Three New Bioinformatics Tools Available
The NIAID-sponsored Pathogen Functional Genomics Resource Center (PFGRC) at the J. Craig Venter Institute is pleased to announce the release of three new, free, open-source software tools: Magnolia, Ginkgo and APEX. Magnolia is a microarray data management and export system for researchers who use PFGRC microarrays. The software greatly simplifies the tasks of organizing experimental data and submitting them to a public data repository. Ginkgo is a Comparative Genomic Hybridization (CGH) and expression microarray data analysis package. Several normalization, data filtering and imputation, and replicate microarray functions are implemented in an intuitive graphical framework. The APEX tool is an implementation of the Absolute Protein Expression quantitation technique. It can compute protein abundance values for LC-MS/MS proteomics datasets, quantifying hundreds or thousands of proteins. Links to additional information on each of these new software tools are available from the PFGRC's bioinformatics page.
Microarray Suggestion Criteria
The Pathogen Functional Genomics Resource Center (PFGRC), supported by the National Institute of Allergy and Infectious Diseases (NIAID), designs, constructs, and distributes glass slide DNA microarrays for pathogens and biodefense-related organisms (Select A-C agents). Currently, the PFGRC supports DNA microarrays for the 38 organisms listed here. To continue providing the infectious disease and biodefense communities with the microarray resources most relevant to their research, the PFGRC is soliciting input on the selection of its next set of reference/species microarrays. The criteria for organism selection may be found here.
Genotyping and SNP detection of Francisella tularensis
Whole-genome genotyping and SNP detection of Francisella tularensis isolates by resequencing arrays
The NIAID-sponsored Pathogen Functional Genomics Resource Center (PFGRC; NIAID contract N01-AI-15447) has evaluated Affymetrix GeneChip® Resequencing Oligonucleotide Array technology for detecting genotypic variations in Francisella tularensis isolates. The primary goal is to establish expertise in the technology and methodologies needed to develop a novel genotyping platform that detects single nucleotide polymorphisms (SNPs) across whole genomes of microorganisms. The whole-genome (re)sequence and SNP information from multiple strains of an infectious agent will be shared with the scientific community, enabling advances in both basic research and translational applications for this Select A agent.
The F. tularensis GeneChip® set was designed on the basis of the DNA sequences of strains LVS (GenBank Accession: AM233362) and SCHU S4 (GenBank Accession: AJ749949), available at http://cmr.jcvi.org. Sequences of the plasmids pOM1 (GenBank Accession: NC_002109) and pFNL10 (GenBank Accession: NC_004952) were obtained from the NCBI database (http://www.ncbi.nlm.nih.gov/). A merged sequence was constructed from these genomic and plasmid sequences for the purposes of GeneChip® design. The F. tularensis LVS and SCHU S4 genomes are 1,895,998 and 1,892,819 bp, respectively. An in silico analysis was performed to identify sequences unique to SCHU S4 (ranging from 1 bp to 11,086 bp), which were appended to the LVS sequence along with the plasmid pOM1 sequence and unique regions from pFNL10. A total of 179,193 bp (9.22%) of repetitive sequence was excluded from the design, resulting in 1,764,558 queryable bases (91% of the F. tularensis genome) for resequencing by hybridization. This merged sequence was tiled onto a set of 6 CustomSeq 300K GeneChip® arrays.
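The percentages quoted above can be checked with a few lines of arithmetic. The sketch below is only a consistency check: it assumes that the merged design sequence is the sum of the excluded repetitive bases and the queryable bases, which the text does not state explicitly, and the function name is illustrative.

```python
# Figures quoted in the paragraph above.
EXCLUDED_REPEAT_BP = 179_193
QUERYABLE_BP = 1_764_558


def design_fractions(excluded_bp: int, queryable_bp: int):
    """Return (merged length, excluded fraction, queryable fraction).

    Assumption (not stated in the text): merged length = excluded + queryable.
    """
    merged = excluded_bp + queryable_bp
    return merged, excluded_bp / merged, queryable_bp / merged


merged_bp, excluded_frac, queryable_frac = design_fractions(
    EXCLUDED_REPEAT_BP, QUERYABLE_BP
)
print(f"merged design sequence: {merged_bp:,} bp")
print(f"excluded repeats:       {excluded_frac:.2%}")
print(f"queryable bases:        {queryable_frac:.2%}")
```

Under that assumption the excluded fraction rounds to 9.22% and the queryable fraction to 91%, which suggests that both percentages are relative to the merged design sequence rather than to a single strain's genome.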
The use of GenomiPhi-amplified ("genomiphied") whole-genome samples, rather than PCR-amplified fragments, simplifies the experimental protocol and avoids the PCR failures that would inevitably occur with some clinical samples of unknown sequence composition. However, the higher complexity of the whole-genome sample also increases the frequency of certain artifacts. The bioinformatic filters that we have developed have proven successful in identifying and eliminating the majority of these artifacts.
The details of the custom whole-genome resequencing array set design, bioinformatic filter development and validation for improved base-call accuracy and polymorphism detection are published (Nucleic Acids Research 2007; doi: 10.1093/nar/gkm918).
The data presented in the following pages show the SNPs that were detected in each of 40 distinct whole-genome samples after hybridization to the resequencing arrays. All samples were hybridized in duplicate. A set of bioinformatic filters was applied to the results from each experiment (see here for more information), and the results from the two duplicate experiments were then combined, eliminating any SNP that was not present in both sets of filtered results.
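The duplicate-combination step described above amounts to a set intersection of the filtered SNP calls. The following is a minimal sketch, not the PFGRC pipeline: the `SnpCall` record, the `consensus_snps` helper, and the example positions are hypothetical, and it assumes two calls match only when the position and both bases agree.

```python
from typing import Iterable, List, NamedTuple


class SnpCall(NamedTuple):
    """Hypothetical record for one filtered SNP call."""
    position: int   # 1-based position in the reference sequence
    ref_base: str   # base in the reference strain
    alt_base: str   # base called in the sample


def consensus_snps(rep_a: Iterable[SnpCall], rep_b: Iterable[SnpCall]) -> List[SnpCall]:
    """Keep only the calls present in both filtered replicates, sorted by position."""
    return sorted(set(rep_a) & set(rep_b))


# Made-up calls for two duplicate hybridizations of the same sample.
rep1 = [SnpCall(1042, "A", "G"), SnpCall(20511, "C", "T"), SnpCall(88790, "G", "A")]
rep2 = [SnpCall(1042, "A", "G"), SnpCall(88790, "G", "C"), SnpCall(20511, "C", "T")]

shared = consensus_snps(rep1, rep2)
for call in shared:
    print(call.position, call.ref_base, call.alt_base)
```

In this example the call at position 88790 is dropped because the two replicates disagree on the called base. Whether such discordant calls should be discarded or flagged for review is a design choice that the text does not specify.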
The "SNP Report" provides:
- the nucleotide and its position in the reference strain and in the target fragment
- the annotation, context of the ORF, and amino acid sequence when in a coding region
This information may be sorted and organized by nucleotide position or ORF. A separate report provides, for each selected SNP position, an alignment between the reference sequence and the chosen target sequences.
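As an illustration of the two orderings mentioned above, the sketch below sorts a small list of report rows by nucleotide position and by ORF. The `SnpReportRow` fields, the ORF labels, and the choice to list intergenic SNPs last are assumptions made for the example, not the actual report schema.

```python
from dataclasses import dataclass
from operator import attrgetter
from typing import List, Optional


@dataclass(frozen=True)
class SnpReportRow:
    """Hypothetical row of an SNP report."""
    ref_position: int    # position in the reference strain
    ref_base: str        # nucleotide in the reference strain
    target_base: str     # nucleotide in the target sample
    orf: Optional[str]   # ORF label, or None when the SNP is intergenic


def sort_by_position(rows: List[SnpReportRow]) -> List[SnpReportRow]:
    """Order rows by reference position."""
    return sorted(rows, key=attrgetter("ref_position"))


def sort_by_orf(rows: List[SnpReportRow]) -> List[SnpReportRow]:
    """Group rows by ORF label, intergenic rows last, ties broken by position."""
    return sorted(rows, key=lambda r: (r.orf is None, r.orf or "", r.ref_position))


# Placeholder rows; the ORF labels are not real locus tags.
rows = [
    SnpReportRow(5300, "C", "T", "ORF_B"),
    SnpReportRow(120, "A", "G", None),
    SnpReportRow(910, "G", "A", "ORF_A"),
    SnpReportRow(4800, "T", "C", "ORF_B"),
]

print([r.ref_position for r in sort_by_position(rows)])
print([(r.orf, r.ref_position) for r in sort_by_orf(rows)])
```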
Users of this comparative sequence information may begin to compile a meaningful set of known SNPs that may be applied to their own research projects.
