Three New Bioinformatics Tools Available
The NIAID-sponsored Pathogen Functional Genomics Resource Center (PFGRC) at the J. Craig Venter Institute is pleased to announce the release of three new, free, open-source software tools: Magnolia, Ginkgo and APEX. Magnolia is a microarray data management and export system for researchers who use PFGRC microarrays. The software greatly simplifies the tasks of organizing experimental data and submitting it to a public data repository. Ginkgo is a Comparative Genomic Hybridization (CGH) and expression microarray data analysis package. It implements several normalization, data filtering and imputation, and replicate microarray functions in an intuitive graphical framework. The APEX tool is an implementation of the Absolute Protein Expression quantitation technique. It can compute protein abundance values for LC-MS/MS proteomics datasets, quantifying hundreds or thousands of proteins. Links to additional information on each of these new software tools are available from the PFGRC's bioinformatics page.
Microarray Suggestion Criteria
The National Institute of Allergy and Infectious Diseases (NIAID)-supported Pathogen Functional Genomics Resource Center (PFGRC) designs, constructs, and distributes glass slide DNA microarrays for pathogens and biodefense-related organisms (Select A-C agents). Currently, the PFGRC supports DNA microarrays for the 38 organisms listed here. To continue providing the infectious disease and biodefense communities with the microarray resources most relevant to their research, the PFGRC is soliciting input on the selection of its next set of reference/species microarrays. The criteria for organism selection may be found here.
Description of Annotation Files
| Column Name | Description | Comments |
| --- | --- | --- |
| Spot ID | A unique identifier for this spot. | This column may be missing from some annotation sheets. |
| Row | The overall row position of the spot on the slide. | |
| Column | The overall column position of the spot on the slide. | |
| Meta Row | The row position of the block containing the spot. | |
| Meta Column | The column position of the block containing the spot. | |
| Sub Row | The row position of the spot within its block. | |
| Sub Column | The column position of the spot within its block. | |
| Oligo ID | A unique identifier for the 70mer oligo at the spot. | |
| Sequence | The DNA sequence of the 70mer oligo. | |
| GC% | The percentage of oligo bases that are G or C. | |
| Internal Repeat Score (IRS) | The oligo's internal repeat score. | |
| Self-Annealing Score (SAS) | The oligo's self-annealing score. | |
| Design Strain | The strain to which the design target belongs. | |
| Design Target [1][2] | A name (e.g. a locus name) that identifies the target for which the oligo was designed. | The oligonucleotide matches the locus given as the design target with 100% sequence identity over 100% of its length. More than one locus, from different strains used in the design, may qualify as a 'Design Target'. In that case, the target shown in this column is the one from the strain designated as the 'Reference Strain' for this chip. |
| Common Name of Design Target | The common name of the design target. | |
| Gene Symbol | A gene symbol, if any, for the design target. | |
| Strain xxx | Information about the expected hits for this spot against the specified strain (xxx will be replaced by an actual strain name). | For arrays that represent multiple strains and/or species, there will be multiple "Strain xxx" columns, one for each strain. The format of the hit information is defined below. |
- Reserved Spots - All spots listed as 'Reserved' are A. thaliana control 70-mers that will be used as a future normalization and control set.
- Obsolete - This is potentially a valid oligonucleotide. However, it does not match any locus from any strain used in the design of the chip with 100% identity over 100% of its length.
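As a small illustration of the GC% column definition, the value can be recomputed from the Sequence column. This is a minimal sketch, not the PFGRC's own code: the function name `gc_percent` is hypothetical, and counting ambiguous bases (such as N) in the denominator is an assumption, since the source does not say how they are handled.

```python
def gc_percent(sequence: str) -> float:
    """Return the percentage of bases in `sequence` that are G or C.

    Assumption: every character counts toward the total length,
    including any ambiguity codes such as N.
    """
    seq = sequence.strip().upper()
    if not seq:
        raise ValueError("empty sequence")
    gc_count = sum(1 for base in seq if base in "GC")
    return 100.0 * gc_count / len(seq)
```

Comparing the result with the GC% column is a quick sanity check that the Sequence column was read correctly.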
Format of hit (alignment) information in the "Strain xxx" column(s):
- TargetName::AlignmentLength::PercentIdentity::E-value::Direction
- Multiple hits are separated by commas.
- TargetName is a name identifying the target sequence within the particular strain (usually a locus name).
- AlignmentLength is the length of the alignment between the 70mer oligo and the target sequence.
- PercentIdentity is the fraction of identical bases in the alignment, as a percentage.
- E-value: For a given BLAST score, the number of hits in a database search that we expect to see by chance with this score or better. The E-value takes into account the size of the database. The lower the E-value, the more significant the score.
- Direction is either "+" or "-". Forward strand hits are shown with direction "+"; these hits are relevant to expression experiments. Reverse strand hits are shown with direction "-". Hits of both directions are relevant to CGH experiments.
- Example: MSMEG4610::70::100.000000::8.0e-12::+,MSMEG4609::70::100.000000::4.8e-11::-
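The hit format above can be parsed with a few lines of Python. This is a minimal sketch, not an official PFGRC parser: the names `Hit`, `parse_hits`, and `expression_hits` are hypothetical, and treating an empty cell as "no hits" is an assumption.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Hit:
    target_name: str
    alignment_length: int
    percent_identity: float
    e_value: float
    direction: str  # "+" (forward strand) or "-" (reverse strand)


def parse_hits(cell: str) -> List[Hit]:
    """Parse one "Strain xxx" cell into a list of Hit records."""
    hits = []
    for entry in cell.split(","):  # multiple hits are comma-separated
        entry = entry.strip()
        if not entry:
            continue  # assumption: an empty cell means no hits
        fields = entry.split("::")
        if len(fields) != 5:
            raise ValueError(f"expected 5 '::'-separated fields: {entry!r}")
        name, length, identity, e_value, direction = fields
        if direction not in ("+", "-"):
            raise ValueError(f"unexpected direction in {entry!r}")
        hits.append(
            Hit(name, int(length), float(identity), float(e_value), direction)
        )
    return hits


def expression_hits(hits: List[Hit]) -> List[Hit]:
    """Keep forward-strand hits, the ones relevant to expression experiments."""
    return [h for h in hits if h.direction == "+"]
```

For CGH experiments, hits in both directions are relevant, so the full list returned by `parse_hits` would be used without filtering.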
ANN Files
The Annotation (ANN, .ann) file is a tab-delimited annotation file format designed to be compatible with the TM4 microarray analysis suite. The TM4 suite includes free, open-source tools for image analysis, normalization and filtering, data management, and statistical and cluster analysis.
The ANN file format is fully described in the PDF manual of the TM4 applications, which are available at http://www.tigr.org/software/Microarray.
The file design is flexible, but it always contains a header row with a Unique ID (UID) column and can contain optional descriptive comments. The UID is used to map the annotation to expression data within the TM4 tools.
The ANN files generated for the PFGRC contain the file creation date, the name of the person who generated the file, the slide type that corresponds to the file, and a description.
e.g. (sample from a PFGRC ANN file)
- # date: 06/21/2005
- # created_by: jgilbert
- # slide_type: N_gonorrhoeae
- # description: .ann annotation file
The column header row must contain a unique ID column, a row index column, and a column index column, labeled UID, R, and C. Additional annotation columns can follow, as shown in the sample column header rows below.
ANN files will have one of the following two column layouts, depending on whether the oligo sequence is included in the file:
- UID R C MR MC SR SC FeatN Locus Strain ComN WellID
- UID R C MR MC SR SC FeatN Locus Strain ComN Seq WellID
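The ANN layout described above can be read with Python's standard `csv` module. This is a minimal sketch under stated assumptions, not the TM4 implementation: the function name `read_ann` is hypothetical, comment lines are assumed to start with `#` and use a `key: value` form (as in the sample header), and the first non-comment line is assumed to be the column header row.

```python
import csv
import io
from typing import Dict, List, Tuple

REQUIRED_COLUMNS = {"UID", "R", "C"}


def read_ann(text: str) -> Tuple[Dict[str, str], List[Dict[str, str]]]:
    """Split ANN file text into header metadata and annotation rows."""
    metadata: Dict[str, str] = {}
    data_lines: List[str] = []
    for line in text.splitlines():
        if line.startswith("#"):
            # Assumed comment form: "# key: value"
            key, sep, value = line.lstrip("#").partition(":")
            if sep:
                metadata[key.strip()] = value.strip()
        elif line.strip():
            data_lines.append(line)

    # Tab-delimited; QUOTE_NONE keeps quote characters in annotations literal.
    reader = csv.DictReader(
        io.StringIO("\n".join(data_lines)),
        delimiter="\t",
        quoting=csv.QUOTE_NONE,
    )
    missing = REQUIRED_COLUMNS - set(reader.fieldnames or [])
    if missing:
        raise ValueError(f"missing required columns: {sorted(missing)}")
    return metadata, list(reader)
```

Because rows are returned as dictionaries keyed by column name, the same code handles both layouts; the `Seq` key is simply present or absent.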
