mSystems. 2022 Oct 26;7(5):e0074122.
doi: 10.1128/msystems.00741-22. Epub 2022 Sep 7.

MetaPhage: an Automated Pipeline for Analyzing, Annotating, and Classifying Bacteriophages in Metagenomics Sequencing Data

Mattia Pandolfo et al. mSystems.

Abstract

Phages are the most abundant biological entities on the planet, and they play an important role in controlling the density, diversity, and network interactions of bacterial communities through predation and gene transfer. To date, a variety of bacteriophage identification tools have been developed that differ in the phage mining strategies they use, the input files they require, and the results they produce. However, new users attempting bacteriophage analysis can struggle to select the best methods and to interpret the variety of results produced. Here, we present MetaPhage, a comprehensive reads-to-report pipeline that streamlines the use of multiple phage miners and generates an exhaustive report. The report both summarizes and visualizes the key findings and enables further exploration of key results via interactive, filterable tables. The pipeline is implemented in Nextflow, a widely adopted workflow manager that enables optimized parallelization of tasks across different environments, from local servers to the cloud, and ensures reproducible results through containerized packages. MetaPhage is designed for scalability and reproducibility, and it can easily be extended to include new miners and methods as they are developed in this continuously growing field. MetaPhage is freely available under a GPL-3.0 license at https://github.com/MattiaPandolfoVR/MetaPhage.

IMPORTANCE Bacteriophages (viruses that infect bacteria) are the most abundant biological entities on earth and are increasingly studied as members of the resident microbiota in many environments, from oceans to soils and the human gut. Their identification is of great importance for better understanding complex bacterial dynamics and microbial ecosystem function. A variety of metagenome bacteriophage identification tools have been developed that differ in the phage mining strategies they use, the input files they require, and the results they produce. To facilitate the management and execution of such a complex workflow, we developed MetaPhage (MP), a comprehensive reads-to-report pipeline that streamlines the use of multiple phage miners and generates an exhaustive report. The pipeline is implemented in Nextflow, a widely adopted workflow manager that enables optimized parallelization of tasks. MetaPhage is designed for scalability and reproducibility and offers installation-free, dependency-free, and conflict-free workflow execution.

Keywords: NGS; bacteriophages; bioinformatics; metagenomics; phage mining.

Conflict of interest statement

The authors declare no conflict of interest.

Figures

FIG 1
MetaPhage workflow DAG chart. Input sequences can be either single-end or paired-end, while metadata must be provided as .csv files following the strict format described in the MetaPhage manual. Through the configuration file or the command line, the user can choose to skip several steps or phage identification tools.
FIG 2
Interactive subgraph produced by graphanalyzer for a vOTU. Triangles and circles represent reference genomes and vOTUs, respectively. The vOTU of interest and the reference genome from which it inherits its taxonomy are depicted in red and green, respectively. Orange nodes are subclustered together with this vOTU, while yellow nodes are only clustered with it and belong to a different subcluster. The width and color of each edge are proportional to the similarity between the two nodes it connects, ranging from thin, transparent aquamarine (weaker similarity) to thick, opaque red (strongest similarity) and shading through green, yellow, and orange in between. Nodes are positioned approximately according to their mutual similarity, which tends to make clusters visible at first sight. When opened in a browser, the subgraph is interactive: the user can zoom, drag it, and hover over a node with the mouse to show its properties, such as taxonomic classification or cluster type.
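The node and edge styling described in the FIG 2 legend can be sketched as below. This is a hypothetical illustration, not graphanalyzer's code: the `Node` record, the function names, the RGB values (standard CSS colour values), the grey fallback for unrelated nodes, the width range, and the linear interpolation through the palette are all assumptions chosen only to match the legend.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical palette, weakest to strongest similarity, following the FIG 2 legend.
# RGB triples are standard CSS colour values, not taken from graphanalyzer.
PALETTE = [
    (127, 255, 212),  # aquamarine (weakest)
    (0, 128, 0),      # green
    (255, 255, 0),    # yellow
    (255, 165, 0),    # orange
    (255, 0, 0),      # red (strongest)
]


@dataclass
class Node:
    """Hypothetical node record: a reference genome or a vOTU."""
    name: str
    is_reference: bool
    cluster: Optional[str] = None
    subcluster: Optional[str] = None


def node_shape(node: Node) -> str:
    """Triangles for reference genomes, circles for vOTUs."""
    return "triangle" if node.is_reference else "circle"


def node_colour(node: Node, target: Node, taxonomy_source: str) -> str:
    """Colour a node relative to the vOTU whose subgraph is drawn."""
    if node.name == target.name:
        return "red"      # the vOTU of interest
    if node.name == taxonomy_source:
        return "green"    # reference genome the vOTU inherits its taxonomy from
    if node.cluster is not None and node.cluster == target.cluster:
        if node.subcluster is not None and node.subcluster == target.subcluster:
            return "orange"  # same subcluster as the vOTU
        return "yellow"      # same cluster, different subcluster
    return "grey"            # assumption: colour for any other node


def edge_style(weight: float, w_min: float, w_max: float,
               min_width: float = 0.5, max_width: float = 6.0) -> dict:
    """Map a similarity weight to an edge colour, opacity, and width."""
    # Normalise the weight to t in [0, 1]; a degenerate range counts as strongest.
    t = 1.0 if w_max == w_min else (weight - w_min) / (w_max - w_min)
    t = min(max(t, 0.0), 1.0)
    # Locate the palette segment containing t and interpolate linearly within it.
    n_seg = len(PALETTE) - 1
    pos = t * n_seg
    i = min(int(pos), n_seg - 1)
    frac = pos - i
    c0, c1 = PALETTE[i], PALETTE[i + 1]
    rgb = tuple(round(a + (b - a) * frac) for a, b in zip(c0, c1))
    return {
        "rgb": rgb,
        "alpha": 0.2 + 0.8 * t,  # transparent for weak edges, opaque for strong ones
        "width": min_width + (max_width - min_width) * t,
    }
```

For example, with weights spanning 0 to 1, `edge_style(0.5, 0.0, 1.0)` yields a yellow edge of intermediate width and opacity. Checking the taxonomy source before cluster membership keeps the green reference distinguishable even when it shares the vOTU's subcluster.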
FIG 3
MetaPhage produces an HTML report with multiple sections, which can be divided into three main categories: analysis overview, searchable tables, and hyperlinks to results. (A) The analysis overview includes panes to inspect the overall quality of the reads (fastp), the taxonomic composition of the whole sample (Kraken2) with interactive plots for each sample (Krona), the assembly metrics (metaQuast), and custom plots specific to viral diversity produced with R by the pipeline (alpha and beta diversity, heatmap, violin plots). (B) The searchable tables include a summary of the taxonomic analysis of the viral OTUs (vOTUs) performed by vConTACT2; custom filters can be added to exclude some miners or to restrict the search to specific phage clusters in the network. Each vOTU links to its individual subnetwork (generated by the graphanalyzer script). (C) A dedicated section links to the main files produced by the pipeline, including the raw counts table, the taxonomy table, and phyloseq objects for downstream analyses in R.
FIG 4
(A) Heatmaps of the RPM (reads per million) abundances of predicted viral families across each sample in the two data sets tested. (B) Alpha diversity stratified by sampling time, based on the Chao1 metric. vOTUs unclassified at the phylum level were removed. In the shotgun metagenome data set, low-quality vOTUs labeled as “not-determined” by the CheckV quality assessment were also removed.
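The quantities in FIG 4 can be illustrated with a short sketch. It assumes that RPM scales each vOTU's read count by the sample's total read count (RPM = count × 10^6 / total reads), that family-level values are sums over vOTUs, and that Chao1 is the common bias-corrected estimator S_obs + F1(F1 - 1) / (2(F2 + 1)), where F1 and F2 are the numbers of vOTUs observed with exactly one and two reads. The record fields (`id`, `phylum`, `checkv_quality`) and all function names are hypothetical; MetaPhage computes these plots in R, and its exact normalization and estimator may differ.

```python
from collections import Counter
from typing import Dict, Iterable, List, Mapping


def filter_votus(votus: Iterable[Mapping[str, str]],
                 drop_not_determined: bool = True) -> List[str]:
    """Return IDs of vOTUs classified at phylum level, optionally dropping CheckV 'not-determined' ones."""
    kept = []
    for v in votus:
        if not v.get("phylum"):
            continue  # unclassified at phylum level
        if drop_not_determined and v.get("checkv_quality", "").lower() == "not-determined":
            continue  # low-quality vOTU according to CheckV
        kept.append(v["id"])
    return kept


def rpm(counts: Mapping[str, int], total_reads: int) -> Dict[str, float]:
    """Reads per million for each vOTU in one sample."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    # Multiply before dividing so integer counts stay exact as long as possible.
    return {votu: c * 1_000_000 / total_reads for votu, c in counts.items()}


def family_rpm(rpm_values: Mapping[str, float],
               family_of: Mapping[str, str]) -> Dict[str, float]:
    """Sum vOTU-level RPM into family-level values for one sample (one heatmap column)."""
    totals: Dict[str, float] = {}
    for votu, value in rpm_values.items():
        family = family_of.get(votu, "Unclassified")
        totals[family] = totals.get(family, 0.0) + value
    return totals


def chao1(counts: Iterable[int]) -> float:
    """Bias-corrected Chao1 richness from the raw (non-normalized) counts of one sample."""
    observed = [c for c in counts if c > 0]
    freq = Counter(observed)
    f1, f2 = freq[1], freq[2]  # singletons and doubletons
    return len(observed) + f1 * (f1 - 1) / (2 * (f2 + 1))
```

Chao1 is computed on raw counts rather than RPM values because singletons and doubletons lose their meaning once counts are rescaled.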
