Fast and accurate short read alignment with Burrows-Wheeler transform
- PMID: 19451168
- PMCID: PMC2705234
- DOI: 10.1093/bioinformatics/btp324
Abstract
Motivation: The enormous amount of short reads generated by the new DNA sequencing technologies calls for the development of fast and accurate read alignment programs. A first generation of hash table-based methods has been developed, including MAQ, which is accurate, feature-rich and fast enough to align short reads from a single individual. However, MAQ does not support gapped alignment for single-end reads, which makes it unsuitable for alignment of longer reads where indels may occur frequently. The speed of MAQ is also a concern when the alignment is scaled up to the resequencing of hundreds of individuals.
Results: We implemented the Burrows-Wheeler Alignment tool (BWA), a new read alignment package based on backward search with the Burrows-Wheeler Transform (BWT), to efficiently align short sequencing reads against a large reference sequence such as the human genome, allowing mismatches and gaps. BWA supports both base space reads, e.g. from Illumina sequencing machines, and color space reads from AB SOLiD machines. Evaluations on both simulated and real data suggest that BWA is approximately 10-20x faster than MAQ, while achieving similar accuracy. In addition, BWA outputs alignments in the new standard SAM (Sequence Alignment/Map) format. Variant calling and other downstream analyses after the alignment can be performed with the open source SAMtools software package.
Availability: http://maq.sourceforge.net.
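Illustrative sketch: the Results paragraph names backward search with the BWT as the core technique. The Python below is a minimal toy version of exact-match backward search, not BWA's implementation. It assumes a small in-memory text, builds the suffix array by naive sorting, and stores a full occurrence table, none of which would scale to a human genome. It also does not cover the mismatches and gaps that BWA allows. The function names `bwt_index` and `backward_search` are hypothetical and chosen for illustration only.

```python
from collections import Counter


def bwt_index(text):
    """Build a toy index over text: suffix array, C table and Occ table."""
    # Sentinel assumed to be absent from text and smaller than every symbol.
    text += "$"
    # Naive suffix array: start positions of suffixes in sorted order.
    sa = sorted(range(len(text)), key=lambda i: text[i:])
    # BWT: the character preceding each sorted suffix (wraps to "$" at 0).
    bwt = "".join(text[i - 1] for i in sa)
    # C[c]: number of characters in text that are strictly smaller than c.
    counts = Counter(text)
    C, total = {}, 0
    for c in sorted(counts):
        C[c] = total
        total += counts[c]
    # occ[c][i]: number of occurrences of c in bwt[:i].
    occ = {c: [0] * (len(bwt) + 1) for c in counts}
    for i, ch in enumerate(bwt):
        for c in occ:
            occ[c][i + 1] = occ[c][i] + (1 if c == ch else 0)
    return sa, C, occ


def backward_search(pattern, sa, C, occ):
    """Return sorted start positions of exact matches of pattern."""
    lo, hi = 0, len(sa)  # half-open suffix array interval [lo, hi)
    # Extend the match one character at a time, from the pattern's end.
    for c in reversed(pattern):
        if c not in C:
            return []
        lo = C[c] + occ[c][lo]
        hi = C[c] + occ[c][hi]
        if lo >= hi:
            return []
    return sorted(sa[i] for i in range(lo, hi))


if __name__ == "__main__":
    sa, C, occ = bwt_index("ACGTACGT")
    print(backward_search("CGT", sa, C, occ))
```

Each loop iteration narrows the suffix array interval using only the two tables, so the search cost grows with the pattern length rather than with the size of the indexed text. That property is what makes this family of indexes attractive for aligning many short reads against one large reference.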