Nat Methods. 2018 Jun;15(6):461-468.
doi: 10.1038/s41592-018-0001-7. Epub 2018 Apr 30.

Accurate detection of complex structural variations using single-molecule sequencing

Fritz J Sedlazeck et al. Nat Methods. 2018 Jun.

Abstract

Structural variations are the greatest source of genetic variation, but they remain poorly understood because of technological limitations. Single-molecule long-read sequencing has the potential to dramatically advance the field, although high error rates are a challenge with existing methods. Addressing this need, we introduce open-source methods for long-read alignment (NGMLR; https://github.com/philres/ngmlr) and structural variant identification (Sniffles; https://github.com/fritzsedlazeck/Sniffles) that provide unprecedented sensitivity and precision for variant detection, even in repeat-rich regions and for complex nested events that can have substantial effects on human health. In several long-read datasets, including healthy and cancerous human genomes, we discovered thousands of novel variants and categorized systematic errors in short-read approaches. NGMLR and Sniffles can automatically filter false events and operate on low-coverage data, thereby reducing the high costs that have hindered the application of long reads in clinical and research settings.

Conflict of interest statement

Competing interests

M.C.S. and F.J.S. have participated in PacBio-sponsored meetings over the past few years and have received travel reimbursement and honoraria for presenting at these events. Since the initial submission, P.R. has become an employee of Oxford Nanopore. PacBio and Oxford Nanopore had no role in decisions relating to the study/work to be published, data collection or analysis of data.

Figures

Figure 1
Overview of the main steps implemented in NGMLR (left) and Sniffles (right). For details see Supplementary Notes 1 and 2 for NGMLR and Sniffles, respectively.
Figure 2
Alignment improvements using NGMLR for a 228 bp deletion (left) and a 150 bp inversion (right), shown in IGV (ref. 37). The upper track shows BWA-MEM alignments, which indicate these events but cannot localize the precise event and its breakpoints. With the improved NGMLR alignments, Sniffles can precisely pinpoint the location and type of the SV.
Figure 3
Evaluation of NGMLR, Sniffles and related tools using simulated data with 840 SVs. The x axis shows the size of the simulated SVs. For read alignments (top), we simulated PacBio-like (left) and Oxford Nanopore-like reads (right), and distinguish between reads that are precise (green), indicated (yellow), forced (red), unaligned (white), or trimmed but not aligned through the SV (grey). The SV analysis (bottom) used the same alignments as before, and distinguishes between.
Figure 4
Systematic error in short-read-based SV calling. A: An example of a putative translocation identified in the short-read data (top alignments) that overlaps an insertion detected by both PacBio (middle) and Oxford Nanopore sequencing (bottom). B: An example of a putative inversion identified in the short-read data (top) that overlaps an insertion detected by both PacBio (middle) and Oxford Nanopore reads (bottom).
Figure 5
Nested SVs in the SKBR3 cancer cell line. A: Evaluation of Sniffles + NGMLR using simulated data to identify nested SVs. B: A 3 kb region including two deletions flanking an inverted sequence, clearly visible and detected by Sniffles using NGMLR (above) but not detected by the Illumina methods (below). C: The start of an inverted duplication. The breakpoints were reported by Sniffles as the start of an inverted duplication (above) but were not correctly detected by short-read methods (below).
Figure 6
Analysis of SV detection accuracy with different amounts of coverage. A: Theoretical assessment of recall vs coverage for different read lengths, requiring a 50 bp overlap at each breakpoint of an SV event. B: Subsampling experiment of the 55× PacBio NA12878 data. C: Subsampling experiment using 28× Oxford Nanopore NA12878 data. D: Subsampling experiment of the 70× PacBio SKBR3 breast cancer cell line dataset. For plots B–D, Sniffles and NGMLR were run on subsampled data (rate indicated by lines) using different thresholds for Sniffles (s: 1–10, indicated by symbols and colors). In every dataset, Sniffles with NGMLR recovered around 80% of the calls at a precision of ~80% or higher using only 10× to 30× coverage.
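
The caption does not spell out how the theoretical assessment in panel A was computed. As a rough illustration only, the sketch below assumes fixed-length reads with uniformly distributed start positions, so that the number of reads covering a breakpoint with at least 50 bp on either side is approximately Poisson distributed. The function names, the Poisson model, and the restriction to a single breakpoint are assumptions made here for illustration; they are not taken from the paper or from the Sniffles code.

```python
import math


def expected_spanning_reads(coverage, read_length, min_overlap=50):
    """Expected number of reads covering a breakpoint with at least
    `min_overlap` bases on each side (assumed model, fixed read length)."""
    usable = read_length - 2 * min_overlap  # start positions that qualify
    if usable <= 0:
        return 0.0  # read too short to overlap both sides by min_overlap
    # Reads start at a rate of coverage / read_length per base.
    return coverage * usable / read_length


def detection_probability(coverage, read_length, min_support, min_overlap=50):
    """Probability that at least `min_support` reads span the breakpoint,
    treating the spanning-read count as Poisson (an assumption)."""
    lam = expected_spanning_reads(coverage, read_length, min_overlap)
    # P(X < min_support) for X ~ Poisson(lam)
    p_fewer = sum(
        math.exp(-lam) * lam**k / math.factorial(k) for k in range(min_support)
    )
    return 1.0 - p_fewer


if __name__ == "__main__":
    # Hypothetical settings: 10 kb reads, at least 5 supporting reads.
    for cov in (5, 10, 20, 30):
        p = detection_probability(cov, 10_000, min_support=5)
        print(f"{cov}x coverage: P(detect) = {p:.3f}")
```

Under this simplified model, longer reads lose a smaller fraction of their length to the overlap requirement, and raising the minimum support (analogous to the s threshold in panels B–D) lowers the detection probability at a given coverage. Real data deviate from it because read lengths vary and alignment errors are not modeled.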

Cited by

  • Re-examination of two diatom reference genomes using long-read sequencing.
    Filloramo GV, Curtis BA, Blanche E, Archibald JM. BMC Genomics. 2021 May 24;22(1):379. doi: 10.1186/s12864-021-07666-3. PMID: 34030633 Free PMC article.
  • Genomic architecture of autism from comprehensive whole-genome sequence annotation.
    Trost B, Thiruvahindrapuram B, Chan AJS, Engchuan W, Higginbotham EJ, Howe JL, Loureiro LO, Reuter MS, Roshandel D, Whitney J, Zarrei M, Bookman M, Somerville C, Shaath R, Abdi M, Aliyev E, Patel RV, Nalpathamkalam T, Pellecchia G, Hamdan O, Kaur G, Wang Z, MacDonald JR, Wei J, Sung WWL, Lamoureux S, Hoang N, Selvanayagam T, Deflaux N, Geng M, Ghaffari S, Bates J, Young EJ, Ding Q, Shum C, D'Abate L, Bradley CA, Rutherford A, Aguda V, Apresto B, Chen N, Desai S, Du X, Fong MLY, Pullenayegum S, Samler K, Wang T, Ho K, Paton T, Pereira SL, Herbrick JA, Wintle RF, Fuerth J, Noppornpitak J, Ward H, Magee P, Al Baz A, Kajendirarajah U, Kapadia S, Vlasblom J, Valluri M, Green J, Seifer V, Quirbach M, Rennie O, Kelley E, Masjedi N, Lord C, Szego MJ, Zawati MH, Lang M, Strug LJ, Marshall CR, Costain G, Calli K, Iaboni A, Yusuf A, Ambrozewicz P, Gallagher L, Amaral DG, Brian J, Elsabbagh M, Georgiades S, Messinger DS, Ozonoff S, Sebat J, Sjaarda C, Smith IM, Szatmari P, Zwaigenbaum L, Kushki A, Frazier TW, Vorstman JAS, Fakhro KA, Fernandez BA, Lewis MES, Weksberg R, Fiume M, Yuen RKC, Anagnostou E, Sondheimer N, Glazer D, Hartley DM, Scherer SW. Cell. 2022 Nov 10;185(23):4409-4427.e18. doi: 10.1016/j.cell.2022.10.009. PMID: 36368308 Free PMC article.
  • High-resolution silkworm pan-genome provides genetic insights into artificial selection and ecological adaptation.
    Tong X, Han MJ, Lu K, Tai S, Liang S, Liu Y, Hu H, Shen J, Long A, Zhan C, Ding X, Liu S, Gao Q, Zhang B, Zhou L, Tan D, Yuan Y, Guo N, Li YH, Wu Z, Liu L, Li C, Lu Y, Gai T, Zhang Y, Yang R, Qian H, Liu Y, Luo J, Zheng L, Lou J, Peng Y, Zuo W, Song J, He S, Wu S, Zou Y, Zhou L, Cheng L, Tang Y, Cheng G, Yuan L, He W, Xu J, Fu T, Xiao Y, Lei T, Xu A, Yin Y, Wang J, Monteiro A, Westhof E, Lu C, Tian Z, Wang W, Xiang Z, Dai F. Nat Commun. 2022 Sep 24;13(1):5619. doi: 10.1038/s41467-022-33366-x. PMID: 36153338 Free PMC article.
  • Clinically relevant mutations in regulatory regions of metabolic genes facilitate early adaptation to ciprofloxacin in Escherichia coli.
    Pal A, Ghosh D, Thakur P, Nagpal P, Irulappan M, Maruthan K, Mukherjee S, Patil NG, Dutta T, Veeraraghavan B, Vivekanandan P. Nucleic Acids Res. 2024 Sep 23;52(17):10385-10399. doi: 10.1093/nar/gkae719. PMID: 39180403 Free PMC article.
  • Major Impacts of Widespread Structural Variation on Gene Expression and Crop Improvement in Tomato.
    Alonge M, Wang X, Benoit M, Soyk S, Pereira L, Zhang L, Suresh H, Ramakrishnan S, Maumus F, Ciren D, Levy Y, Harel TH, Shalev-Schlosser G, Amsellem Z, Razifard H, Caicedo AL, Tieman DM, Klee H, Kirsche M, Aganezov S, Ranallo-Benavidez TR, Lemmon ZH, Kim J, Robitaille G, Kramer M, Goodwin S, McCombie WR, Hutton S, Van Eck J, Gillis J, Eshed Y, Sedlazeck FJ, van der Knaap E, Schatz MC, Lippman ZB. Cell. 2020 Jul 9;182(1):145-161.e23. doi: 10.1016/j.cell.2020.05.021. Epub 2020 Jun 17. PMID: 32553272 Free PMC article.
