Three New Bioinformatics Tools Available

The NIAID-sponsored Pathogen Functional Genomics Resource Center (PFGRC) at the J. Craig Venter Institute is pleased to announce the release of three new, free, open-source software tools: Magnolia, Ginkgo and APEX. Magnolia is a microarray data management and export system for researchers who use PFGRC microarrays. The software greatly simplifies the tasks of organizing experimental data and submitting them to a public data repository. Ginkgo is a Comparative Genomic Hybridization (CGH) and expression microarray data analysis package. Several normalization, data filtering and imputation, and replicate microarray functions are implemented in an intuitive graphical framework. The APEX tool is an implementation of the Absolute Protein Expression quantitation technique. It can compute protein abundance values for LC-MS/MS proteomics datasets, quantifying hundreds or thousands of proteins. Links to additional information on each of these new software tools are available from the PFGRC's bioinformatics page.

Microarray Suggestion Criteria

The Pathogen Functional Genomics Resource Center (PFGRC), supported by the National Institute of Allergy and Infectious Diseases (NIAID), designs, constructs, and distributes glass-slide DNA microarrays for pathogens and biodefense-related organisms (Select A-C agents). Currently, the PFGRC supports DNA microarrays for the 38 organisms listed here. To continue providing the infectious disease and biodefense communities with the microarray resources most relevant to their research, the PFGRC is soliciting input on the selection of its next set of reference/species microarrays. The criteria for organism selection may be found here.

Genotyping and SNP detection of Streptococcus pneumoniae

Genotyping and SNP detection of Streptococcus pneumoniae isolates by resequencing arrays

The NIAID-sponsored Pathogen Functional Genomics Resource Center (PFGRC) (NIAID contract N01-AI-15447) has undertaken an evaluation of resequencing oligonucleotide array technology for detecting genotypic variations in microorganisms. The primary goal is to establish expertise in the technology and its methodologies while assessing its value to the scientific community as a means of identifying and discovering genotypic variations in microorganisms.

The PFGRC, in collaboration with Dr. M. Catherine McEllistrem's laboratory at the University of Pittsburgh, has evaluated Affymetrix CustomSeq Resequencing Oligonucleotide Array technology for detecting genotypic variations in Streptococcus pneumoniae isolates. This pilot project is further aimed at developing a novel genotyping platform that enables detection of SNPs from whole genomes of microorganisms.

Non-contiguous regions of Streptococcus pneumoniae isolates were resequenced in this collaborative project. The CustomSeq Resequencing Array consists of 231,688 probes covering 28,961 bases of non-contiguous sequences of the reference Streptococcus pneumoniae TIGR4 strain as shown below. The sequence details can be accessed through the Comprehensive Microbial Resource (CMR) database of JCVI (http://www.jcvi.org/).

Name/Locus     Gene/Sequence                                                           Length (bp)
16S_rRNA       16S rRNA                                                                1413
SP0117         Pneumococcal surface protein A                                          2232
SP0368         Cell wall surface anchor family protein                                 5301
SP0377_SP0378  Choline binding proteins C, J & intergenic region                       2262
SP0390_SP0391  Choline binding proteins G, F & intergenic region                       2078
SP0463_SP0466  Surface anchor, sortase & hypothetical proteins with intergenic region  4788
SP0667         Pneumococcal surface protein - putative                                 996
SP0834         Hemolysin-related protein                                               510
SP1204         Hemolysin A - putative                                                  594
SP1466         Hemolysin                                                               645
SP1833         Cell wall surface anchor family protein                                 2124
SP1961         DNA-directed RNA polymerase, beta subunit                               3609
SP1992         Cell wall surface anchor family protein                                 663
SP2145         Antigen, cell wall surface anchor family                                2082
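
The fragment lengths listed above sum to 29,297 bp, slightly more than the 28,961 bases the array is stated to cover. The short Python sketch below shows one reading that reconciles the figures. It rests on two assumptions that are not stated on this page: that the array uses 25-mer probes, so the first and last 12 bases of each fragment cannot be interrogated, and that each interrogated base is covered by 8 probes (4 possible central nucleotides on each of 2 strands). The constant and function names are illustrative only.

```python
# Fragment lengths (bp) copied from the table above.
FRAGMENT_LENGTHS = {
    "16S_rRNA": 1413,
    "SP0117": 2232,
    "SP0368": 5301,
    "SP0377_SP0378": 2262,
    "SP0390_SP0391": 2078,
    "SP0463_SP0466": 4788,
    "SP0667": 996,
    "SP0834": 510,
    "SP1204": 594,
    "SP1466": 645,
    "SP1833": 2124,
    "SP1961": 3609,
    "SP1992": 663,
    "SP2145": 2082,
}

# ASSUMPTIONS (not taken from the source text):
PROBE_LENGTH = 25           # assumed probe length in bases
PROBES_PER_BASE = 8         # assumed: 4 nucleotides x 2 strands
FLANK = PROBE_LENGTH // 2   # 12 bases at each fragment end left uninterrogated


def interrogated_bases(lengths, flank=FLANK):
    """Count bases that can sit at the centre of a full-length probe."""
    return sum(max(length - 2 * flank, 0) for length in lengths)


total_listed = sum(FRAGMENT_LENGTHS.values())
total_interrogated = interrogated_bases(FRAGMENT_LENGTHS.values())
total_probes = total_interrogated * PROBES_PER_BASE

print(total_listed)        # 29297
print(total_interrogated)  # 28961
print(total_probes)        # 231688
```

Under these assumptions the arithmetic reproduces both published figures (28,961 bases and 231,688 probes), but the agreement does not by itself confirm the array design.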

The data presented in the following pages show the SNPs detected in each of 85 distinct whole-genome samples after hybridization to the TIGR4-based resequencing array. All samples were hybridized in duplicate. A set of bioinformatic filters was applied to the results from each experiment (see here for more information), and the results from the two replicate experiments were then combined, eliminating any SNP that was not present in both results after filtration. Among the samples hybridized were the TIGR4 strain itself and three additional fully sequenced strains: G54, R6 and 670.
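
The replicate-combination step described above amounts to keeping only the calls that survive filtering in both hybridizations. The sketch below illustrates that idea in Python. The record layout (`SnpCall`), the `keep` predicate standing in for the bioinformatic filters, and the example calls are all hypothetical; the actual PFGRC filters are not described on this page and are not reproduced here.

```python
from typing import Callable, Iterable, List, NamedTuple


class SnpCall(NamedTuple):
    """Hypothetical record for one SNP call; field names are illustrative."""
    fragment: str    # array fragment name, e.g. "SP0117"
    position: int    # 1-based position within the fragment
    ref_base: str    # base in the TIGR4 reference
    alt_base: str    # base called in the sample


def combine_replicates(
    rep1: Iterable[SnpCall],
    rep2: Iterable[SnpCall],
    keep: Callable[[SnpCall], bool] = lambda call: True,
) -> List[SnpCall]:
    """Filter each replicate, then keep only calls present in both."""
    filtered1 = {call for call in rep1 if keep(call)}
    filtered2 = {call for call in rep2 if keep(call)}
    return sorted(filtered1 & filtered2)


# Made-up example calls for two replicate hybridizations of one sample.
replicate_a = [
    SnpCall("SP0117", 310, "A", "G"),
    SnpCall("SP0117", 955, "C", "T"),
    SnpCall("SP1961", 42, "G", "A"),
]
replicate_b = [
    SnpCall("SP0117", 310, "A", "G"),
    SnpCall("SP1961", 42, "G", "T"),   # same site, different called base
    SnpCall("SP0834", 77, "T", "C"),
]

consensus = combine_replicates(replicate_a, replicate_b)
print(consensus)
```

In this sketch a call must agree on fragment, position and called base to count as present in both replicates, so a site called with different bases in the two experiments is dropped.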

The use of GenomiPhi-amplified whole-genome samples, rather than PCR-amplified fragments, simplifies the experimental protocol and avoids the PCR failures that would inevitably occur with some clinical samples of unknown sequence composition. However, the higher complexity of a whole-genome sample also increases the frequency of certain artifacts. The bioinformatic filters that we have developed have proven successful in identifying and eliminating the majority of these artifacts.

The "SNP Report" provides:

  1. the nucleotide and its position in the reference strain and in the target fragment
  2. the annotation, context of the ORF, and amino acid sequence when in a coding region

This information may be sorted and organized by nucleotide position or ORF. A separate report provides, for each selected SNP position, an alignment between the reference sequence and the chosen target sequences.
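
As a rough illustration of the two ways of organizing the report described above, the Python sketch below sorts a few rows by nucleotide position and groups them by ORF. The `SnpReportRow` fields and every value in the example rows are hypothetical; they are not taken from the actual SNP Report.

```python
from dataclasses import dataclass
from itertools import groupby
from operator import attrgetter
from typing import Dict, Iterable, List


@dataclass(frozen=True)
class SnpReportRow:
    """Hypothetical report row; field names are illustrative only."""
    orf: str                 # ORF / locus name
    reference_position: int  # position in the reference strain
    fragment_position: int   # position within the target fragment
    ref_base: str
    sample_base: str


def sort_by_position(rows: Iterable[SnpReportRow]) -> List[SnpReportRow]:
    """Order rows by their position in the reference strain."""
    return sorted(rows, key=attrgetter("reference_position"))


def group_by_orf(rows: Iterable[SnpReportRow]) -> Dict[str, List[SnpReportRow]]:
    """Group rows by ORF, with each group ordered by reference position."""
    ordered = sorted(rows, key=attrgetter("orf", "reference_position"))
    return {orf: list(group) for orf, group in groupby(ordered, key=attrgetter("orf"))}


# Made-up rows; positions and bases are placeholders, not real data.
rows = [
    SnpReportRow("SP1961", 5000, 42, "G", "A"),
    SnpReportRow("SP0117", 1200, 310, "A", "G"),
    SnpReportRow("SP0117", 1845, 955, "C", "T"),
]

by_position = sort_by_position(rows)
by_orf = group_by_orf(rows)
print([row.reference_position for row in by_position])
print({orf: len(group) for orf, group in by_orf.items()})
```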

Users of this comparative sequence information may begin to compile a meaningful set of known SNPs that may be applied to their own research projects.