Comparative Study
BMC Genomics. 2011;12 Suppl 2(Suppl 2):S4.
doi: 10.1186/1471-2164-12-S2-S4. Epub 2011 Jul 27.

Accurate and fast estimation of taxonomic profiles from metagenomic shotgun sequences

Bo Liu et al. BMC Genomics. 2011.

Abstract

Background: A major goal of metagenomics is to characterize the microbial composition of an environment. The most popular approach relies on 16S rRNA sequencing; however, this approach can generate biased estimates because of PCR artifacts and because the copy number of the gene differs even between closely related organisms. The taxonomic composition can also be determined from metagenomic shotgun sequencing data by matching individual reads against a database of reference sequences. One major limitation of prior computational methods used for this purpose is the use of a universal classification threshold for all genes at all taxonomic levels.

Results: We propose that better classification results can be obtained by tuning the taxonomic classifier to each matching length, reference gene, and taxonomic level. We present a novel taxonomic classifier, MetaPhyler (http://metaphyler.cbcb.umd.edu), which uses phylogenetic marker genes as a taxonomic reference. Results on simulated datasets demonstrate that MetaPhyler outperforms other tools commonly used in this context (CARMA, MEGAN and PhymmBL). We also present results from the analysis of a real metagenomic dataset.

Conclusions: We have introduced a novel taxonomic classification method for analyzing the microbial diversity from whole-metagenome shotgun sequences. Compared with previous approaches, MetaPhyler is much more accurate in estimating the phylogenetic composition. In addition, we have shown that MetaPhyler can be used to guide the discovery of novel organisms from metagenomic samples.

Figures

Figure 1
Estimating taxonomic profiles using 16S rRNA targeted sequencing or metagenome shotgun sequencing. Figure 1a shows that the taxonomic profile estimated from 16S rRNA targeted sequencing is biased because of copy number variation. Figure 1b shows that classification of whole-metagenome shotgun sequences may produce biased estimates because of variation in genome size.
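The two biases described in this caption can be illustrated with a small worked example. This is only a sketch: the taxon names, cell counts, 16S copy numbers, and genome sizes below are hypothetical values chosen for illustration and are not taken from the paper.

```python
def normalize(counts):
    """Convert raw counts into relative abundances that sum to 1."""
    total = sum(counts.values())
    return {taxon: value / total for taxon, value in counts.items()}

# Hypothetical community: two taxa present in equal numbers of cells.
cells = {"taxon_A": 100, "taxon_B": 100}

# Hypothetical per-genome properties (illustrative values only).
copies_16s = {"taxon_A": 1, "taxon_B": 4}      # 16S rRNA gene copies per genome
genome_mbp = {"taxon_A": 2.0, "taxon_B": 6.0}  # genome size in Mbp

true_profile = normalize(cells)

# 16S reads are expected to scale with cells * 16S copy number.
profile_16s = normalize({t: cells[t] * copies_16s[t] for t in cells})

# Shotgun reads are expected to scale with cells * genome size.
profile_shotgun = normalize({t: cells[t] * genome_mbp[t] for t in cells})

print("true   :", true_profile)
print("16S    :", profile_16s)
print("shotgun:", profile_shotgun)
```

With these assumed numbers, both naive estimates overstate taxon_B even though the two taxa are equally abundant in cells.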
Figure 2
Evaluation of classification performance. Comparison of the phylogenetic classification performance of MetaPhyler, MEGAN, CARMA and PhymmBL. Sensitivity and precision are calculated across five taxonomic levels using 60 bp and 300 bp simulated metagenomic reads. During classification with MetaPhyler, MEGAN, and PhymmBL, reference sequences from the same genome as the query reads are excluded. CARMA results are from classifications run on the WebCARMA server. The figure shows that MetaPhyler's sensitivity is significantly higher than that of the other three methods, and that its precision is also slightly better at the genus level.
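As a rough illustration of the two metrics named in this caption, the sketch below computes them for a list of reads at a single taxonomic level. It assumes commonly used definitions (sensitivity as correct assignments over all reads, precision as correct assignments over classified reads); the caption does not state the exact formulas, so treat these as assumptions. The function name and example labels are hypothetical.

```python
def sensitivity_precision(true_labels, predicted_labels):
    """Return (sensitivity, precision) for one taxonomic level.

    A prediction of None means the read was left unclassified.
    Assumed definitions (not taken from the caption):
      sensitivity = correctly classified reads / all reads
      precision   = correctly classified reads / classified reads
    """
    pairs = list(zip(true_labels, predicted_labels))
    correct = sum(1 for truth, pred in pairs if pred is not None and pred == truth)
    classified = sum(1 for _, pred in pairs if pred is not None)
    sensitivity = correct / len(pairs) if pairs else 0.0
    precision = correct / classified if classified else 0.0
    return sensitivity, precision

# Hypothetical genus-level example with four reads.
truth = ["Escherichia", "Escherichia", "Bacillus", "Bacillus"]
predicted = ["Escherichia", None, "Bacillus", "Escherichia"]

sens, prec = sensitivity_precision(truth, predicted)
print(f"sensitivity={sens:.3f} precision={prec:.3f}")
```

Leaving a read unclassified lowers sensitivity but not precision, which is why the two metrics are reported separately.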
Figure 3
Comparison of bacterial compositions estimated from different approaches. We have created a simulated metagenomic sample (Table 2) with 100 bp reads to evaluate the performance of different approaches in estimating the bacterial compositions. "16S Ideal" and "Shotgun Ideal" represent results obtained by analyzing 16S rRNA genes and whole genome shotgun sequences assuming the classification accuracy is perfect. Genus "Other" indicates that sequences have been classified into genera other than those in the simulated sample. Different approaches are ranked by their correlation coefficients (shown in legend) between the estimated and true taxonomic profile. When running MetaPhyler, the genomes from which the reads were simulated are removed from the reference database.
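The ranking by correlation coefficient could be computed along the following lines. This sketch assumes a Pearson correlation over genus-level relative abundances, which the caption does not specify; the genus names and abundance values are hypothetical.

```python
import math

def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

def profile_correlation(true_profile, estimated_profile):
    """Correlate two profiles over the union of their genera.

    Genera missing from one profile (for example an "Other" bin that
    exists only in the estimate) are treated as abundance 0.
    """
    genera = sorted(set(true_profile) | set(estimated_profile))
    truth = [true_profile.get(g, 0.0) for g in genera]
    estimate = [estimated_profile.get(g, 0.0) for g in genera]
    return pearson(truth, estimate)

# Hypothetical profiles (relative abundances).
true_profile = {"genus_A": 0.5, "genus_B": 0.3, "genus_C": 0.2}
close_estimate = {"genus_A": 0.48, "genus_B": 0.32, "genus_C": 0.20}
poor_estimate = {"genus_A": 0.2, "genus_B": 0.3, "genus_C": 0.4, "Other": 0.1}

r_close = profile_correlation(true_profile, close_estimate)
r_poor = profile_correlation(true_profile, poor_estimate)
print(f"close estimate r={r_close:.3f}, poor estimate r={r_poor:.3f}")
```

Note that this helper raises a ZeroDivisionError if either profile is constant across all genera, so a real pipeline would need to handle that case.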
Figure 4
Building the MetaPhyler classifier. To build MetaPhyler for a particular phylogenetic marker gene G and for length 60 bp, we first simulate metagenomic reads from all reference marker genes, and as a negative set, from genomic sequences that do not contain marker genes. We then map these simulated reads against reference gene G using BLASTX. To build a classifier for gene G at a specific taxonomic level, say order, in vector B_order we store BLASTX bit scores between gene G and the simulated reads that are from the same order; in vector B_else we store bit scores for aligning all other reads against G. We then find the bit score cutoff b_cut that minimizes Equation 1. Finally, we repeat the previous steps to find bit score cutoffs for simulated reads of other lengths and for other genes.
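The cutoff search described in this caption can be sketched as follows. Equation 1 is not reproduced on this page, so the objective used here (the number of misclassified simulated reads) is a stand-in assumption and not the authors' formula. The function name and the bit scores are hypothetical.

```python
def find_bit_score_cutoff(b_same, b_else):
    """Pick a bit-score cutoff separating same-taxon reads from all others.

    b_same: bit scores of simulated reads from the same taxon as gene G
            (the role played by the same-order score vector in the caption).
    b_else: bit scores of all other simulated reads aligned against G.

    ASSUMPTION: the objective is the count of misclassified reads, used
    here as a placeholder for the paper's Equation 1.
    Returns (cutoff, errors); ties are resolved toward the lowest cutoff.
    """
    candidates = sorted(set(b_same) | set(b_else))
    if not candidates:
        raise ValueError("need at least one bit score")
    # Also allow a cutoff above every score, which classifies nothing.
    candidates.append(candidates[-1] + 1.0)

    best_cutoff, best_errors = None, None
    for cutoff in candidates:
        missed = sum(1 for s in b_same if s < cutoff)    # same-taxon reads rejected
        wrong = sum(1 for s in b_else if s >= cutoff)    # other reads accepted
        errors = missed + wrong
        if best_errors is None or errors < best_errors:
            best_cutoff, best_errors = cutoff, errors
    return best_cutoff, best_errors

# Hypothetical bit scores for one gene, one read length, one taxonomic level.
scores_same_order = [52.0, 48.5, 45.0, 39.0]
scores_other = [41.0, 35.5, 30.0, 28.0]

cutoff, errors = find_bit_score_cutoff(scores_same_order, scores_other)
print(f"cutoff={cutoff} errors={errors}")
```

Repeating this search for each gene, read length, and taxonomic level would yield a table of cutoffs, in the spirit of the per-gene and per-level tuning the caption describes.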
Figure 5
Detecting novel organisms. Because MetaPhyler uses different classification thresholds for different phylogenetic levels, it can avoid assigning an organism to a lower-level taxonomic group if the evidence does not support this assignment. The presence of novel organisms leads to a detectable discrepancy between the number of sequences assigned to a lower taxonomic level, and the number of sequences assigned to a higher (less specific) taxonomic level.
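One simple way to quantify the discrepancy mentioned in this caption is to compare read counts at two adjacent taxonomic levels. The sketch below is an illustration only: the function name, the 0.5 threshold, and the read counts are hypothetical and are not taken from the paper.

```python
def novelty_signal(reads_at_higher_level, reads_at_lower_level, min_fraction=0.5):
    """Flag a clade whose reads mostly fail to resolve at the lower level.

    reads_at_higher_level: reads assigned to a clade at the less specific
                           level (for example, a family).
    reads_at_lower_level:  reads assigned to any child of that clade at
                           the more specific level (for example, its genera).
    min_fraction:          HYPOTHETICAL threshold on the unresolved fraction.

    Returns (unresolved_fraction, flagged).
    """
    if reads_at_higher_level == 0:
        return 0.0, False
    unresolved = max(reads_at_higher_level - reads_at_lower_level, 0)
    fraction = unresolved / reads_at_higher_level
    return fraction, fraction >= min_fraction

# Hypothetical counts: most family-level reads get no genus assignment.
frac_novel, flag_novel = novelty_signal(200, 40)
# Hypothetical counts: nearly all family-level reads resolve to a genus.
frac_known, flag_known = novelty_signal(200, 190)

print(f"possible novel lineage: fraction={frac_novel:.2f} flagged={flag_novel}")
print(f"well-characterized:     fraction={frac_known:.2f} flagged={flag_known}")
```

A large unresolved fraction suggests the reads come from an organism with no close relative in the reference set at the lower level, which is the kind of discrepancy the caption refers to.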

Cited by

  • Marker genes that are less conserved in their sequences are useful for predicting genome-wide similarity levels between closely related prokaryotic strains.
    Lan Y, Rosen G, Hershberg R. Microbiome. 2016 May 3;4(1):18. doi: 10.1186/s40168-016-0162-5. PMID: 27138046 Free PMC article.
  • CLARK: fast and accurate classification of metagenomic and genomic sequences using discriminative k-mers.
    Ounit R, Wanamaker S, Close TJ, Lonardi S. BMC Genomics. 2015 Mar 25;16(1):236. doi: 10.1186/s12864-015-1419-2. PMID: 25879410 Free PMC article.
  • Critical Assessment of Metagenome Interpretation: the second round of challenges.
    Meyer F, Fritz A, Deng ZL, Koslicki D, Lesker TR, Gurevich A, Robertson G, Alser M, Antipov D, Beghini F, Bertrand D, Brito JJ, Brown CT, Buchmann J, Buluç A, Chen B, Chikhi R, Clausen PTLC, Cristian A, Dabrowski PW, Darling AE, Egan R, Eskin E, Georganas E, Goltsman E, Gray MA, Hansen LH, Hofmeyr S, Huang P, Irber L, Jia H, Jørgensen TS, Kieser SD, Klemetsen T, Kola A, Kolmogorov M, Korobeynikov A, Kwan J, LaPierre N, Lemaitre C, Li C, Limasset A, Malcher-Miranda F, Mangul S, Marcelino VR, Marchet C, Marijon P, Meleshko D, Mende DR, Milanese A, Nagarajan N, Nissen J, Nurk S, Oliker L, Paoli L, Peterlongo P, Piro VC, Porter JS, Rasmussen S, Rees ER, Reinert K, Renard B, Robertsen EM, Rosen GL, Ruscheweyh HJ, Sarwal V, Segata N, Seiler E, Shi L, Sun F, Sunagawa S, Sørensen SJ, Thomas A, Tong C, Trajkovski M, Tremblay J, Uritskiy G, Vicedomini R, Wang Z, Wang Z, Wang Z, Warren A, Willassen NP, Yelick K, You R, Zeller G, Zhao Z, Zhu S, Zhu J, Garrido-Oter R, Gastmeier P, Hacquard S, Häußler S, Khaledi A, Maechler F, Mesny F, Radutoiu S, Schulze-Lefert P, Smit N, Strowig T, Bremges A, Sczyrba A, McHardy AC. Nat Methods. 2022 Apr;19(4):429-440. doi: 10.1038/s41592-022-01431-4. Epub 2022 Apr 8. PMID: 35396482 Free PMC article.
  • Lightweight taxonomic profiling of long-read metagenomic datasets with Lemur and Magnet.
    Sapoval N, Liu Y, Curry KD, Kille B, Huang W, Kokroko N, Nute MG, Tyshaieva A, Dilthey A, Molloy EK, Treangen TJ. bioRxiv [Preprint]. 2024 Aug 25:2024.06.01.596961. doi: 10.1101/2024.06.01.596961. PMID: 38895276 Free PMC article. Preprint.
  • Critical Assessment of Metagenome Interpretation-a benchmark of metagenomics software.
    Sczyrba A, Hofmann P, Belmann P, Koslicki D, Janssen S, Dröge J, Gregor I, Majda S, Fiedler J, Dahms E, Bremges A, Fritz A, Garrido-Oter R, Jørgensen TS, Shapiro N, Blood PD, Gurevich A, Bai Y, Turaev D, DeMaere MZ, Chikhi R, Nagarajan N, Quince C, Meyer F, Balvočiūtė M, Hansen LH, Sørensen SJ, Chia BKH, Denis B, Froula JL, Wang Z, Egan R, Don Kang D, Cook JJ, Deltel C, Beckstette M, Lemaitre C, Peterlongo P, Rizk G, Lavenier D, Wu YW, Singer SW, Jain C, Strous M, Klingenberg H, Meinicke P, Barton MD, Lingner T, Lin HH, Liao YC, Silva GGZ, Cuevas DA, Edwards RA, Saha S, Piro VC, Renard BY, Pop M, Klenk HP, Göker M, Kyrpides NC, Woyke T, Vorholt JA, Schulze-Lefert P, Rubin EM, Darling AE, Rattei T, McHardy AC. Nat Methods. 2017 Nov;14(11):1063-1071. doi: 10.1038/nmeth.4458. Epub 2017 Oct 2. PMID: 28967888 Free PMC article.
