Three New Bioinformatics Tools Available
The NIAID-sponsored Pathogen Functional Genomics Resource Center (PFGRC) at the J. Craig Venter Institute is pleased to announce the release of three new, free, open-source software tools: Magnolia, Ginkgo, and APEX. Magnolia is a microarray data management and export system for researchers who use PFGRC microarrays. The software greatly simplifies the tasks of organizing experimental data and submitting it to a public data repository. Ginkgo is a Comparative Genomic Hybridization (CGH) and expression microarray data analysis package. Several normalization, data filtering and imputation, and replicate microarray functions are implemented in an intuitive graphical framework. The APEX tool is an implementation of the Absolute Protein Expression quantitation technique. It can compute protein abundance values for LC-MS/MS proteomics datasets, quantifying hundreds or thousands of proteins. Links to additional information on each of these new software tools are available from the PFGRC's bioinformatics page.
Microarray Suggestion Criteria
The Pathogen Functional Genomics Resource Center (PFGRC), supported by the National Institute of Allergy and Infectious Diseases (NIAID), designs, constructs, and distributes glass-slide DNA microarrays for pathogens and biodefense-related organisms (Select A-C agents). Currently, the PFGRC supports DNA microarrays for the 38 organisms listed here. To continue providing the infectious disease and biodefense communities with the microarray resources most relevant to their research, the PFGRC is soliciting input on the selection of its next set of reference/species microarrays. The criteria for organism selection may be found here.
Genotyping and SNP detection of Streptococcus pneumoniae isolates by resequencing arrays
The NIAID-sponsored Pathogen Functional Genomics Resource Center (PFGRC) (NIAID contract N01-AI-15447) has undertaken an evaluation of resequencing oligonucleotide array technology for detecting genotypic variations in microorganisms. The primary goal is to establish expertise in the technology and methodologies while assessing its value to the scientific community for identifying and discovering genotypic variations in microorganisms.
The PFGRC, in collaboration with Dr. M. Catherine McEllistrem's laboratory at the University of Pittsburgh, has evaluated Affymetrix CustomSeq Resequencing Oligonucleotide Array technology for detecting genotypic variations in Streptococcus pneumoniae isolates. This pilot project is further aimed at developing a novel genotyping platform that enables detection of SNPs from whole genomes of microorganisms.
Non-contiguous regions of Streptococcus pneumoniae isolates were resequenced in this collaborative project. The CustomSeq Resequencing Array consists of 231,688 probes covering 28,961 bases of non-contiguous sequences of the reference Streptococcus pneumoniae TIGR4 strain as shown below. The sequence details can be accessed through the Comprehensive Microbial Resource (CMR) database of JCVI (http://www.jcvi.org/).
Name/Locus | Gene/Sequence | Length (bp)
16S_rRNA | 16S rRNA | 1413
SP0117 | Pneumococcal surface protein A | 2232
SP0368 | Cell wall surface anchor family protein | 5301
SP0377_SP0378 | Choline binding proteins C, J & intergenic region | 2262
SP0390_SP0391 | Choline binding proteins G, F & intergenic region | 2078
SP0463_SP0466 | Surface anchor, sortase & hypothetical proteins with intergenic region | 4788
SP0667 | Pneumococcal surface protein - putative | 996
SP0834 | Hemolysin-related protein | 510
SP1204 | Hemolysin A - putative | 594
SP1466 | Hemolysin | 645
SP1833 | Cell wall surface anchor family protein | 2124
SP1961 | DNA-directed RNA polymerase, beta subunit | 3609
SP1992 | Cell wall surface anchor family protein | 663
SP2145 | Antigen, cell wall surface anchor family | 2082
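The probe and base counts quoted for the array can be cross-checked against the fragment lengths in the table. The sketch below rests on two assumptions that the text does not state: that the array uses eight probes per interrogated base (four per strand), and that the first and last 12 bases of each tiled fragment are not interrogated. Under those assumptions the arithmetic reproduces both quoted figures, but this is a consistency check, not a description of the actual array design.

```python
# Fragment lengths (bp) copied from the table above, keyed by Name/Locus.
FRAGMENT_LENGTHS = {
    "16S_rRNA": 1413,
    "SP0117": 2232,
    "SP0368": 5301,
    "SP0377_SP0378": 2262,
    "SP0390_SP0391": 2078,
    "SP0463_SP0466": 4788,
    "SP0667": 996,
    "SP0834": 510,
    "SP1204": 594,
    "SP1466": 645,
    "SP1833": 2124,
    "SP1961": 3609,
    "SP1992": 663,
    "SP2145": 2082,
}

# Assumptions (not stated in the text): 8 probes per interrogated base,
# and 12 uninterrogated bases at each end of every fragment.
PROBES_PER_BASE = 8
FLANK = 12


def interrogated_bases(lengths, flank=FLANK):
    """Total bases left after trimming `flank` bases from both ends of each fragment."""
    return sum(length - 2 * flank for length in lengths.values())


def probe_count(lengths, probes_per_base=PROBES_PER_BASE, flank=FLANK):
    """Number of probes needed to cover every interrogated base."""
    return interrogated_bases(lengths, flank) * probes_per_base


print(sum(FRAGMENT_LENGTHS.values()))        # 29297 bases tiled in total
print(interrogated_bases(FRAGMENT_LENGTHS))  # 28961, matching the quoted base count
print(probe_count(FRAGMENT_LENGTHS))         # 231688, matching the quoted probe count
```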
The data presented in the following pages show the SNPs that were detected in each of 85 distinct whole-genome samples after hybridization with the TIGR4-based resequencing array. All samples were hybridized in duplicate. A set of bioinformatic filters was applied to the results from each experiment (see here for more information), and the results from the two experiments were then combined, eliminating any SNP that was not present in both results after filtration. Among the samples hybridized were the TIGR4 strain itself and three additional fully sequenced strains: G54, R6 and 670.
The use of GenomiPhi-amplified whole-genome samples, rather than PCR-amplified fragments, simplifies the experimental protocol and avoids the PCR failures that would inevitably occur with some clinical samples of unknown sequence composition. However, the higher complexity of the whole-genome sample also increases the frequency of certain artifacts. The bioinformatic filters that we have developed have proven successful in identifying and eliminating the majority of these artifacts.
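The duplicate-combination step described above amounts to filtering each replicate and keeping only the calls common to both. A minimal sketch follows; the `SnpCall` fields, the `quality` score, and the `min_quality` filter are hypothetical stand-ins, since the actual PFGRC filters are documented elsewhere, and the example calls are made-up values.

```python
from typing import Callable, Iterable, List, NamedTuple, Tuple


class SnpCall(NamedTuple):
    """One SNP call from a single hybridization (hypothetical layout)."""
    fragment: str     # Name/Locus of the tiled fragment, e.g. "SP0117"
    position: int     # position within the fragment
    ref_base: str     # base in the TIGR4 reference
    called_base: str  # base called for the sample
    quality: float    # hypothetical per-call score


def min_quality(threshold: float) -> Callable[[SnpCall], bool]:
    """Placeholder filter: keep calls whose score reaches the threshold."""
    return lambda call: call.quality >= threshold


def consensus_snps(
    replicate_a: Iterable[SnpCall],
    replicate_b: Iterable[SnpCall],
    keep: Callable[[SnpCall], bool],
) -> List[Tuple[str, int, str, str]]:
    """Filter each replicate, then keep only SNPs present in both."""
    def key(call: SnpCall) -> Tuple[str, int, str, str]:
        return (call.fragment, call.position, call.ref_base, call.called_base)

    passed_a = {key(c) for c in replicate_a if keep(c)}
    passed_b = {key(c) for c in replicate_b if keep(c)}
    return sorted(passed_a & passed_b)


# Made-up calls, for illustration only.
rep_a = [
    SnpCall("SP0117", 101, "A", "G", 0.99),
    SnpCall("SP0117", 250, "C", "T", 0.40),  # fails the filter in replicate A
    SnpCall("SP1961", 77, "G", "A", 0.95),   # absent from replicate B
]
rep_b = [
    SnpCall("SP0117", 101, "A", "G", 0.97),
    SnpCall("SP0117", 250, "C", "T", 0.92),
    SnpCall("SP0368", 12, "T", "C", 0.90),   # absent from replicate A
]

print(consensus_snps(rep_a, rep_b, min_quality(0.9)))
# [('SP0117', 101, 'A', 'G')]
```

Filtering before intersecting means a call that survives in only one replicate is dropped, which is the behavior the paragraph describes.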
The "SNP Report" provides:
- the nucleotide and its position in the reference strain and in the target fragment
- the annotation, context of the ORF, and amino acid sequence when in a coding region
This information may be sorted and organized by nucleotide position or ORF. A separate report provides, for each selected SNP position, an alignment between the reference sequence and the chosen target sequences.
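As a rough illustration of the two orderings mentioned above, the sketch below models a report row and sorts it by nucleotide position or by ORF. The `SnpReportRow` fields are a hypothetical reading of the report contents listed above, not the actual report schema, and the rows are made-up values.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class SnpReportRow:
    """One line of a SNP report (hypothetical field names)."""
    ref_position: int             # position in the reference strain
    fragment_position: int        # position in the target fragment
    ref_base: str
    snp_base: str
    orf: Optional[str]            # None when the SNP is outside a coding region
    annotation: str
    amino_acid: Optional[str]     # None when the SNP is outside a coding region


def sort_by_position(rows: List[SnpReportRow]) -> List[SnpReportRow]:
    """Order rows by position in the reference strain."""
    return sorted(rows, key=lambda r: r.ref_position)


def sort_by_orf(rows: List[SnpReportRow]) -> List[SnpReportRow]:
    """Group rows by ORF, non-coding rows last, then by position within each ORF."""
    return sorted(rows, key=lambda r: (r.orf is None, r.orf or "", r.ref_position))


# Made-up rows, for illustration only.
rows = [
    SnpReportRow(300, 40, "C", "T", "SP0368", "Cell wall surface anchor family protein", "V"),
    SnpReportRow(120, 20, "A", "G", None, "intergenic region", None),
    SnpReportRow(200, 10, "G", "A", "SP0117", "Pneumococcal surface protein A", "K"),
]

print([r.ref_position for r in sort_by_position(rows)])  # [120, 200, 300]
print([r.orf for r in sort_by_orf(rows)])                # ['SP0117', 'SP0368', None]
```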
Users of this comparative sequence information may begin to compile a meaningful set of known SNPs that may be applied to their own research projects.
