BMC Bioinformatics. 2010 Dec 21:11:603.
doi: 10.1186/1471-2105-11-603.

Content-based microarray search using differential expression profiles

Jesse M Engreitz et al. BMC Bioinformatics. 2010.

Abstract

Background: With the expansion of public repositories such as the Gene Expression Omnibus (GEO), we are rapidly cataloging cellular transcriptional responses to diverse experimental conditions. Methods that query these repositories based on gene expression content, rather than textual annotations, may enable more effective experiment retrieval as well as the discovery of novel associations between drugs, diseases, and other perturbations.

Results: We develop methods to retrieve gene expression experiments that differentially express the same transcriptional programs as a query experiment. Avoiding thresholds, we generate differential expression profiles that include a score for each gene measured in an experiment. We use existing and novel dimension reduction and correlation measures to rank relevant experiments in an entirely data-driven manner, allowing emergent features of the data to drive the results. A combination of matrix decomposition and p-weighted Pearson correlation proves the most suitable for comparing differential expression profiles. We apply this method to index all GEO DataSets, and demonstrate the utility of our approach by identifying pathways and conditions relevant to transcription factors Nanog and FoxO3.
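A minimal sketch of what a p-weighted Pearson correlation between two differential expression profiles could look like. The abstract does not state the weighting formula, so the per-gene weight used here, (1 - p_x)(1 - p_y), is an assumption chosen for illustration, and the function name `p_weighted_pearson` is hypothetical.

```python
import math


def p_weighted_pearson(x, y, px, py):
    """Weighted Pearson correlation of two differential expression profiles.

    x, y   : per-gene differential expression scores (e.g. log fold-change)
    px, py : per-gene p-values for differential expression in each experiment

    ASSUMPTION: the weight (1 - px) * (1 - py) is illustrative only; it is
    not taken from the paper.
    """
    w = [(1.0 - a) * (1.0 - b) for a, b in zip(px, py)]
    total = sum(w)
    if total == 0:
        raise ValueError("all weights are zero")

    # Weighted means of each profile.
    mx = sum(wi * xi for wi, xi in zip(w, x)) / total
    my = sum(wi * yi for wi, yi in zip(w, y)) / total

    # Weighted covariance and variances (the common 1/total factor cancels).
    cov = sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, x, y))
    vx = sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, x))
    vy = sum(wi * (yi - my) ** 2 for wi, yi in zip(w, y))
    if vx == 0 or vy == 0:
        raise ValueError("a profile has zero weighted variance")
    return cov / math.sqrt(vx * vy)
```

Under this weighting, a gene whose p-value is close to 1 in either experiment contributes almost nothing to the score, so no hard significance threshold is needed.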

Conclusions: Content-based gene expression search generates relevant hypotheses for biological inquiry. Experiments across platforms, tissue types, and protocols inform the analysis of new datasets.

Figures

Figure 1
Schematic diagram of our approach. (A) Creation of differential expression (DE) profiles. An experiment comparing condition X with condition Y is condensed to a single DE profile in gene space. Dimension reduction is applied to create a DE profile in a reduced feature space. (B) Searching a library of DE profiles. The query profile is compared to the DE profiles of all other experiments in our disease compendium (or in GEO) using a similarity measure. Results are ranked by their similarity to the query profile. Italics indicate variable steps in our pipeline.
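The two panels can be sketched roughly as follows, assuming each condition is given as a mapping from gene to replicate expression values and that the DE score is the log2 fold-change of replicate means. The names `de_profile`, `pearson`, and `search` are hypothetical, plain Pearson correlation stands in for whichever similarity measure is chosen, and the dimension reduction step is omitted.

```python
import math


def de_profile(cond_x, cond_y):
    """Condense an X-versus-Y experiment into one score per gene.

    cond_x, cond_y: dict mapping gene -> list of positive expression values.
    ASSUMPTION: the score is log2(mean X / mean Y) over genes in both conditions.
    """
    profile = {}
    for gene in cond_x.keys() & cond_y.keys():
        mean_x = sum(cond_x[gene]) / len(cond_x[gene])
        mean_y = sum(cond_y[gene]) / len(cond_y[gene])
        profile[gene] = math.log2(mean_x / mean_y)
    return profile


def pearson(a, b):
    """Plain Pearson correlation; returns 0.0 if either vector is constant."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    va = sum((u - ma) ** 2 for u in a)
    vb = sum((v - mb) ** 2 for v in b)
    if va == 0 or vb == 0:
        return 0.0
    return cov / math.sqrt(va * vb)


def search(query, library):
    """Rank library profiles by similarity to the query over shared genes."""
    results = []
    for name, profile in library.items():
        genes = sorted(query.keys() & profile.keys())
        if len(genes) < 2:
            continue  # too few shared genes to correlate
        score = pearson([query[g] for g in genes], [profile[g] for g in genes])
        results.append((name, score))
    # Most similar experiment first.
    return sorted(results, key=lambda item: item[1], reverse=True)
```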
Figure 2
Disease compendium. Our collection of 32 disease-related experiments represents several combinations of species, platforms, and tissues. Differential expression profiles based on log fold-change were generated for each experiment, then mapped to human genes through Homologene. We compared DE profiles using Pearson correlation and applied hierarchical clustering to find that the profiles cluster primarily by disease and tissue. One GEO Series appears more than once: GSE3790 provides three profiles that cluster together, comparing normal to diseased tissue in cerebellum, frontal cortex, and caudate nucleus.
Figure 3
Evaluation of dimension reduction methods and similarity measures. Comparison of four dimension reduction methods and six similarity measures using leave-one-out cross-validation in our disease compendium. Bars and AUC estimates indicate standard errors for curves averaged over all cross-validation trials. The three similarity measures based on Pearson correlation outperform the rank-based approaches, with the p-weighted Pearson correlation proving the best at identifying other experiments of the same disease. The ICA projection method for dimension reduction outperforms the module-based approaches, and performs comparably to gene-level analysis. HsGxModules = Human Gene Expression Modules (see Methods).
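One way to picture the leave-one-out evaluation: each profile in turn serves as the query, the remaining profiles are scored against it, and profiles of the same disease count as the relevant hits. The sketch below assumes the reported AUC is the area under a ROC curve, which the caption does not state explicitly; `roc_auc` and `leave_one_out_auc` are hypothetical names, and `similarity` is any function that scores two profiles.

```python
def roc_auc(scores, labels):
    """ROC AUC by pairwise comparison of positives and negatives.

    Ties count one half. Returns None when a class is empty.
    """
    pos = [s for s, is_pos in zip(scores, labels) if is_pos]
    neg = [s for s, is_pos in zip(scores, labels) if not is_pos]
    if not pos or not neg:
        return None
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


def leave_one_out_auc(profiles, diseases, similarity):
    """Average retrieval AUC with each profile held out as the query.

    profiles : dict name -> list of per-gene scores (same gene order)
    diseases : dict name -> disease label
    """
    aucs = []
    for query in profiles:
        others = [name for name in profiles if name != query]
        scores = [similarity(profiles[query], profiles[name]) for name in others]
        labels = [diseases[name] == diseases[query] for name in others]
        auc = roc_auc(scores, labels)
        if auc is not None:  # skip queries with no same-disease partner
            aucs.append(auc)
    if not aucs:
        raise ValueError("no query had both relevant and irrelevant profiles")
    return sum(aucs) / len(aucs)
```

Swapping in a different `similarity` function, or profiles from a different feature space, would let the same loop compare similarity measures and dimension reduction methods.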
Figure 4
Network of GEO differential expression profiles. (A) We calculated p-weighted correlations between 9,415 differential expression profiles from GEO and connected highly similar profiles (q < 0.001). Nodes are colored according to experimental variable (e.g., time). Dense clusters tend to represent multiple profiles from the same experiment. We identified multi-experiment clusters corresponding to processes including muscle injury, mammary gland development, and glioma grade. For a high resolution figure, see Additional files 3 and 4. (B) Close-up of a multi-experiment cluster. DE profile nodes are re-colored to correspond to the GEO DataSet from which they originate, and node shape represents experimental variables. *Compares gestation day 14 to gestation day 16.
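A rough sketch of how the network edges in panel (A) could be selected. The caption gives the cutoff (q < 0.001) but not how q-values were computed, so the Benjamini-Hochberg adjustment below is an assumption, as is the input format (one p-value per profile pair). Both function names are hypothetical.

```python
def bh_qvalues(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    q = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        q[i] = running_min
    return q


def build_edges(pair_pvalues, threshold=0.001):
    """Keep profile pairs whose adjusted p-value falls below the threshold.

    pair_pvalues: dict mapping (profile_a, profile_b) -> p-value of their
    similarity score (ASSUMED input; the source does not describe it).
    """
    pairs = list(pair_pvalues)
    qvalues = bh_qvalues([pair_pvalues[pair] for pair in pairs])
    return [pair for pair, q in zip(pairs, qvalues) if q < threshold]
```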
Figure 5
Search results for Nanog knockdown.
Figure 6
Gene-level analysis of transcription factor search results. Scatterplots comparing the log fold-change of each gene shared by two differential expression profiles. Expression values are centered and scaled. The area of each circle is proportional to the contribution that the gene makes to the final correlation score, and thus is a function of the magnitude as well as the significance of differential expression. (A) Comparison of GSE18326: FoxO3 null versus wild type and GDS2758: normoxia versus hypoxia. (B) Comparison of GDS1824: Nanog knockdown versus control and GDS1688: non-small cell adenocarcinoma versus small cell cancer.
Figure 7
Search results for FoxO3A knockout.
