Nat Biotechnol. 2021 Apr;39(4):442-450.
doi: 10.1038/s41587-020-00746-x. Epub 2020 Nov 30.

Readfish enables targeted nanopore sequencing of gigabase-sized genomes

Alexander Payne et al. Nat Biotechnol. 2021 Apr.

Abstract

Nanopore sequencers can be used to selectively sequence certain DNA molecules in a pool by reversing the voltage across individual nanopores to reject specific sequences, enabling enrichment and depletion to address biological questions. Previously, we achieved this using dynamic time warping to map the signal to a reference genome, but the method required substantial computational resources and did not scale to gigabase-sized references. Here we overcome this limitation by using graphical processing unit (GPU) base-calling. We show enrichment of specific chromosomes from the human genome and of low-abundance organisms in mixed populations without a priori knowledge of sample composition. Finally, we enrich targeted panels comprising 25,600 exons from 10,000 human genes and 717 genes implicated in cancer, identifying PML-RARA fusions in the NB4 cell line in <15 h sequencing. These methods can be used to efficiently screen any target panel of genes without specialized sample preparation using any computer and a suitable GPU. Our toolkit, readfish, is available at https://www.github.com/looselab/readfish.
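The accept-or-reject step described in the abstract can be illustrated with a minimal sketch. It assumes that a read fragment has already been base-called and mapped, and that the mapping result is available as a list of contig names. The function name `decide`, its parameters and the three return labels are hypothetical and are not taken from readfish.

```python
from typing import Iterable, Set


def decide(mapped_contigs: Iterable[str], targets: Set[str], enrich: bool = True) -> str:
    """Return 'sequence', 'reject' or 'wait' for one read fragment.

    Hypothetical policy for illustration only, not the readfish implementation.
    """
    contigs = set(mapped_contigs)
    if not contigs:
        # No alignment yet: keep reading and try again with more data.
        return "wait"
    on_target = bool(contigs & targets)
    if enrich:
        # Enrichment: keep on-target reads, eject everything else.
        return "sequence" if on_target else "reject"
    # Depletion: eject on-target reads, keep everything else.
    return "reject" if on_target else "sequence"


if __name__ == "__main__":
    targets = {"chr1", "chr2"}
    print(decide(["chr1"], targets))
    print(decide(["chr9"], targets))
    print(decide([], targets))
```

In this sketch a "reject" decision stands for reversing the voltage across the pore, and "wait" stands for deferring the decision until more of the read is available.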

Conflict of interest statement

Competing interests

ML was a member of the MinION access program and has received free flow cells and sequencing reagents in the past. ML has received reimbursement for travel, accommodation and conference fees to speak at events organized by Oxford Nanopore Technologies.

Figures

Figure 1. Human Genome Scale Selective Sequencing.
A) Median read lengths for reads sequenced from GM12878 and mapped against HG38 excluding alt chromosomes. The four panels each represent a quadrant of the flow cell. In the control quadrant all reads are sequenced; in the second, third and fourth quadrants only reads mapping to chromosomes 1-8, 9-14 and 16-20, respectively, are sequenced. The combined length of each of these target sets equates to approximately ½, ¼ and ⅛ of the human genome respectively. B) Heatmap of throughput per channel in each quadrant from the flow cell illustrating reduced yield as the proportion of reads rejected is increased. C) Yield ratio for each chromosome in each condition normalised against yield observed for each chromosome in the control quadrant. D) Yield of on-target reads calculated in a rolling window over the course of the sequencing run showing the loss of enrichment potential. E) Plot of the number of channels contributing sequence data over the course of the sequencing run. Channels are lost at a greater rate when more reads are rejected.
Figure 2. Adaptive sequencing enriching for the least abundant genome and ensuring uniform 40x coverage.
A) Mean read lengths for reads sequenced from the ZymoBIOMICS mock metagenomic community mapped against the provided references (ZymoBIOMICS, USA). Read lengths are reported for the whole run, the deliberately sequenced reads and those which were actively unblocked. B) Cumulative coverage of each ZymoBIOMICS genome during the sequencing run. Total coverage continues to accumulate because unblocked reads, though short, still map. Sequencing was automatically terminated once each sample reached 40x. C) Stacked area graph illustrating how the proportion of bases mapping to each species changes over time. D) In contrast, the proportion of reads mapping to each species over time does not change significantly. Species and composition are: bs - Bacillus subtilis (14%), ef - Enterococcus faecalis (14%), ec - Escherichia coli (14%), lm - Listeria monocytogenes (14%), pa - Pseudomonas aeruginosa (14%), sc - Saccharomyces cerevisiae (2%), se - Salmonella enterica (14%), sa - Staphylococcus aureus (14%).
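The coverage-capped behaviour described in this caption can be sketched as follows. The sketch assumes that the number of sequenced bases per genome is tracked during the run and that a genome leaves the target set once its mean coverage reaches the threshold. The function `remaining_targets` and the genome names and sizes in the example are hypothetical, not values from the study.

```python
from typing import Dict, Set


def remaining_targets(
    bases: Dict[str, int], genome_len: Dict[str, int], depth: float = 40.0
) -> Set[str]:
    """Return the genomes whose mean coverage is still below `depth`.

    Mean coverage is approximated as sequenced bases divided by genome length.
    Hypothetical helper for illustration only.
    """
    return {
        name
        for name, length in genome_len.items()
        if bases.get(name, 0) / length < depth
    }


if __name__ == "__main__":
    # Toy genome sizes and base counts, chosen only to make the arithmetic obvious.
    genome_len = {"genome_a": 1_000, "genome_b": 2_000}
    bases = {"genome_a": 40_000, "genome_b": 40_000}
    todo = remaining_targets(bases, genome_len)
    print(sorted(todo))
    print("run complete" if not todo else "keep sequencing")
```

Under these assumptions the run would end when the returned set is empty, which corresponds to every genome having reached the requested depth.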
Figure 3. Adaptive sequencing enriching for the least abundant genome with centrifuge read classification and ensuring uniform 50x coverage.
A) Mean read lengths for reads sequenced from the ZymoBIOMICS mock metagenomic community mapped against the provided references. Read lengths are reported for the whole run, the deliberately sequenced reads and those which were actively unblocked. B) Cumulative coverage of each ZymoBIOMICS genome during the sequencing run. Total coverage continues to accumulate because unblocked reads, though short, still map. Sequencing was automatically terminated once each sample reached 50x. The small overshoot in sequenced reads coverage is likely caused by the centrifuge step lagging as reads are not instantly written to disk. C) Stacked area graph illustrating how the proportion of bases mapping to each species changes over time. D) In contrast, the proportion of reads mapping to each species over time does not change significantly. Species and composition as in Figure 2.
Figure 4. Half Exome Panel Targeted Sequencing.
A) Mean coverage across each exon target in the genome ordered by chromosome. Exons on odd numbered chromosomes are enriched (green) and those on even numbered chromosomes are depleted (red). B) Mean coverage across each exon for genes within the COSMIC panels. For A and B, horizontal lines represent approximate mean expected coverage for flow cells yielding 10, 20 or 30 Gb of data in a single run. Mean coverage calculated by mosdepth. C,D,E,F) Coverage plots for highlighted genes including BRCA1 (C), PML (D), WIF1 (E) and HOXC13 and HOXC11 (F). The genes in C and D are enriched as they lie on chromosomes 17 and 15, whilst those in E and F are depleted as they lie on chromosome 12. Exon target regions indicated by arrows. In this experiment, different targets were used for the Watson and Crick strands as illustrated by the offsets. Note the absence of target regions for panels E and F.
Figure 5. COSMIC Panel Targeted Sequencing.
A & B) Mean coverage across the selected COSMIC gene regions ordered by chromosome for two independent sequencing runs of NA12878. Horizontal lines represent approximate mean expected coverage for flow cells yielding 10, 20 or 30 Gb of data in a single run. Mean coverage calculated by mosdepth. C,D,E,F) Coverage plots from each run (light green) for highlighted genes including BRCA1 (C), PML (D), WIF1 (E) and HOXC13 and HOXC11 (F). For comparison, coverage in the same regions for a 35X whole genome sequenced nanopore run is shown in blue. COSMIC target regions are indicated by blue bars and include intronic sequence.
Figure 6. COSMIC Panel Targeted Sequencing of NB4.
A & B) Mean coverage across each of the COSMIC target regions ordered by chromosome for two independent sequencing runs of the NB4 cell line. Horizontal dashed lines indicate expected coverage from a flow cell yielding 10, 20 or 30 Gb of sequence data in a single run. C & D) Coverage plots for each NB4 sequencing run shown in orange for PML (C) and RARA (D). E & F) Reads mapping to chromosomes 15 and 17 derived from the NB4 cell line runs 1 and 2 respectively indicating the fusion between PML and RARA. Mappings of example individual reads are shown. Breakpoints identified using svim, visualisations using Ribbon.

References

    1. Loose M, Malla S, Stout M. Real-time selective sequencing using nanopore technology. Nat Methods. 2016;13:751–754.
    2. Masutani B, Morishita S. A framework and an algorithm to detect low-abundance DNA by a handy sequencer and a palm-sized computer. Bioinformatics. 2019;35:584–592.
    3. Kovaka S, Fan Y, Ni B, Timp W, Schatz MC. Targeted nanopore sequencing by real-time mapping of raw electrical signal with UNCALLED. doi: 10.1101/2020.02.03.931923.
    4. Edwards HS, Krishnakumar R, Sinha A, Bird SW, Patel KD, Bartsch MS. Real-Time Selective Sequencing with RUBRIC: Read Until with Basecall and Reference-Informed Criteria. Sci Rep. 2019;9:11475.
    5. Rang FJ, Kloosterman WP, de Ridder J. From squiggle to basepair: computational approaches for improving nanopore sequencing read accuracy. Genome Biol. 2018;19:90.