Science. 2001 Feb 16;291(5507):1304-51.
doi: 10.1126/science.1058040.

The sequence of the human genome

J C Venter, M D Adams, E W Myers, P W Li, R J Mural, G G Sutton, H O Smith, M Yandell, C A Evans, R A Holt, J D Gocayne, P Amanatides, R M Ballew, D H Huson, J R Wortman, Q Zhang, C D Kodira, X H Zheng, L Chen, M Skupski, G Subramanian, P D Thomas, J Zhang, G L Gabor Miklos, C Nelson, S Broder, A G Clark, J Nadeau, V A McKusick, N Zinder, A J Levine, R J Roberts, M Simon, C Slayman, M Hunkapiller, R Bolanos, A Delcher, I Dew, D Fasulo, M Flanigan, L Florea, A Halpern, S Hannenhalli, S Kravitz, S Levy, C Mobarry, K Reinert, K Remington, J Abu-Threideh, E Beasley, K Biddick, V Bonazzi, R Brandon, M Cargill, I Chandramouliswaran, R Charlab, K Chaturvedi, Z Deng, V Di Francesco, P Dunn, K Eilbeck, C Evangelista, A E Gabrielian, W Gan, W Ge, F Gong, Z Gu, P Guan, T J Heiman, M E Higgins, R R Ji, Z Ke, K A Ketchum, Z Lai, Y Lei, Z Li, J Li, Y Liang, X Lin, F Lu, G V Merkulov, N Milshina, H M Moore, A K Naik, V A Narayan, B Neelam, D Nusskern, D B Rusch, S Salzberg, W Shao, B Shue, J Sun, Z Wang, A Wang, X Wang, J Wang, M Wei, R Wides, C Xiao, C Yan, A Yao, J Ye, M Zhan, W Zhang, H Zhang, Q Zhao, L Zheng, F Zhong, W Zhong, S Zhu, S Zhao, D Gilbert, S Baumhueter, G Spier, C Carter, A Cravchik, T Woodage, F Ali, H An, A Awe, D Baldwin, H Baden, M Barnstead, I Barrow, K Beeson, D Busam, A Carver, A Center, M L Cheng, L Curry, S Danaher, L Davenport, R Desilets, S Dietz, K Dodson, L Doup, S Ferriera, N Garg, A Gluecksmann, B Hart, J Haynes, C Haynes, C Heiner, S Hladun, D Hostin, J Houck, T Howland, C Ibegwam, J Johnson, F Kalush, L Kline, S Koduru, A Love, F Mann, D May, S McCawley, T McIntosh, I McMullen, M Moy, L Moy, B Murphy, K Nelson, C Pfannkoch, E Pratts, V Puri, H Qureshi, M Reardon, R Rodriguez, Y H Rogers, D Romblad, B Ruhfel, R Scott, C Sitter, M Smallwood, E Stewart, R Strong, E Suh, R Thomas, N N Tint, S Tse, C Vech, G Wang, J Wetter, S Williams, M Williams, S Windsor, E Winn-Deen, K Wolfe, J Zaveri, K Zaveri, J F Abril, R Guigó, M J Campbell, K V Sjolander, B Karlak, A Kejariwal, H Mi, B Lazareva, T Hatton, A Narechania, K Diemer, A Muruganujan, N Guo, S Sato, V Bafna, S Istrail, R Lippert, R Schwartz, B Walenz, S Yooseph, D Allen, A Basu, J Baxendale, L Blick, M Caminha, J Carnes-Stine, P Caulk, Y H Chiang, M Coyne, C Dahlke, A Deslattes Mays, M Dombroski, M Donnelly, D Ely, S Esparham, C Fosler, H Gire, S Glanowski, K Glasser, A Glodek, M Gorokhov, K Graham, B Gropman, M Harris, J Heil, S Henderson, J Hoover, D Jennings, C Jordan, J Jordan, J Kasha, L Kagan, C Kraft, A Levitsky, M Lewis, X Liu, J Lopez, D Ma, W Majoros, J McDaniel, S Murphy, M Newman, T Nguyen, N Nguyen, M Nodell, S Pan, J Peck, M Peterson, W Rowe, R Sanders, J Scott, M Simpson, T Smith, A Sprague, T Stockwell, R Turner, E Venter, M Wang, M Wen, D Wu, M Wu, A Xia, A Zandieh, X Zhu


J C Venter et al. Science. 2001.

Erratum in

  • Science 2001 Jun 5;292(5523):1838

Abstract

A 2.91-billion base pair (bp) consensus sequence of the euchromatic portion of the human genome was generated by the whole-genome shotgun sequencing method. The 14.8-billion bp DNA sequence was generated over 9 months from 27,271,853 high-quality sequence reads (5.11-fold coverage of the genome) from both ends of plasmid clones made from the DNA of five individuals. Two assembly strategies, a whole-genome assembly and a regional chromosome assembly, were used, each combining sequence data from Celera and the publicly funded genome effort. The public data were shredded into 550-bp segments to create 2.9-fold coverage of those genome regions that had been sequenced, without including biases inherent in the cloning and assembly procedure used by the publicly funded group. This brought the effective coverage in the assemblies to eightfold, reducing the number and size of gaps in the final assembly relative to what 5.11-fold coverage alone would yield. The two assembly strategies yielded very similar results that largely agree with independent mapping data. The assemblies effectively cover the euchromatic regions of the human chromosomes. More than 90% of the genome is in scaffold assemblies of 100,000 bp or more, and 25% of the genome is in scaffolds of 10 million bp or larger. Analysis of the genome sequence revealed 26,588 protein-encoding transcripts for which there was strong corroborating evidence and an additional approximately 12,000 computationally derived genes with mouse matches or other weak supporting evidence. Although gene-dense clusters are obvious, almost half the genes are dispersed in low G+C sequence separated by large tracts of apparently noncoding sequence. Only 1.1% of the genome is spanned by exons, whereas 24% is in introns, with 75% of the genome being intergenic DNA. Duplications of segmental blocks, ranging in size up to chromosomal lengths, are abundant throughout the genome and reveal a complex evolutionary history.
Comparative genomic analysis indicates vertebrate expansions of genes associated with neuronal function, with tissue-specific developmental regulation, and with the hemostasis and immune systems. DNA sequence comparisons between the consensus sequence and publicly funded genome data provided locations of 2.1 million single-nucleotide polymorphisms (SNPs). A random pair of human haploid genomes differed at a rate of 1 bp per 1250 on average, but there was marked heterogeneity in the level of polymorphism across the genome. Less than 1% of all SNPs resulted in variation in proteins, but the task of determining which SNPs have functional consequences remains an open challenge.
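The coverage figures above follow from simple arithmetic, and the benefit of raising coverage from 5.11-fold to eightfold can be illustrated with the classical Lander-Waterman idealization. The sketch below is not the authors' calculation: it assumes reads land uniformly at random and ignores repeats, cloning bias, and the non-random placement of the shredded public segments, so its numbers are rough orders of magnitude only. The helpers `fold_coverage` and `lw_uncovered_fraction` are hypothetical names introduced for illustration. Dividing 14.8 billion bp by 2.91 billion bp gives about 5.09-fold, close to the quoted 5.11; the small difference presumably reflects a slightly different genome-size denominator or rounding of the 14.8 billion figure.

```python
import math

# Figures quoted in the abstract.
GENOME_BP = 2.91e9      # euchromatic consensus length (bp)
SEQUENCED_BP = 14.8e9   # total sequence generated (bp)
READS = 27_271_853      # high-quality sequence reads


def fold_coverage(total_bp: float, genome_bp: float) -> float:
    """Fold coverage: total sequenced bases divided by genome length."""
    return total_bp / genome_bp


def lw_uncovered_fraction(coverage: float) -> float:
    """Lander-Waterman idealization (an assumption, not the paper's model):
    with reads placed uniformly at random, the probability that a given
    base is missed by every read is approximately e^(-c)."""
    return math.exp(-coverage)


def main() -> None:
    # Implied mean read length and raw coverage from the abstract's totals.
    mean_read_len = SEQUENCED_BP / READS
    c_raw = fold_coverage(SEQUENCED_BP, GENOME_BP)
    print(f"mean read length ~ {mean_read_len:.0f} bp")
    print(f"raw coverage     ~ {c_raw:.2f}x")
    # Celera reads alone vs. combined with the shredded public data.
    for c in (5.11, 8.0):
        missed_bp = lw_uncovered_fraction(c) * GENOME_BP
        print(f"{c:>4}x: ~{missed_bp / 1e6:.1f} Mb expected unsampled")


if __name__ == "__main__":
    main()
```

Under this idealized model the expected unsampled sequence falls from roughly 17.6 Mb at 5.11-fold to about 1 Mb at eightfold, which matches the qualitative effect the abstract describes; the actual number and size of assembly gaps depend on repeat structure and cloning bias, which the model does not capture.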


Comment in

  • More on the sequencing of the human genome.
    Waterston RH, Lander ES, Sulston JE. Proc Natl Acad Sci U S A. 2003 Mar 18;100(6):3022-4; author reply 3025-6. doi: 10.1073/pnas.0634129100. Epub 2003 Mar 11. PMID: 12631699.
