ALFA Release 4

Release Version: [20250407153717]

Key Advancements in Release 4:

We are excited to announce NCBI ALFA Release 4, a major update to one of the largest aggregated variant frequency databases. This release significantly enhances the scale and utility of ALFA, driven by a near doubling of the cohort size to ~409,000 individuals. This expansion provides unprecedented statistical power for estimating allele frequencies across diverse populations and offers substantially improved annotation for clinically relevant variants.

ALFA Release 4 continues to deliver comprehensive allele frequency information, now with even greater precision, directly benefiting genetic research and clinical variant interpretation. All data is available via our FTP site and integrated into NCBI resources.


Key Highlights of ALFA Release 4 (vs. Release 3):

  1. Expanded Cohort:

    • Subject numbers have nearly doubled from ~204k (R3) to ~409k (R4).
    • This provides enhanced resolution for variant frequencies, especially for rarer variants.
    • The number of common variants identified (MAF >= 0.01) has increased to over 15.5 million.
  2. Improved ClinVar Annotation:

    • ALFA R4 now provides frequency data for over 959,000 ClinVar RS IDs (up by 74.4% from R3).
    • The number of ClinVar variants for which ALFA is the exclusive public source of frequency data has risen to over 27,000 (a 35.1% increase).
    • Significant increases in coverage for medically important categories:
      • "Pathogenic" variants with ALFA frequency: up by 41.3%.
      • "Likely Pathogenic" variants with ALFA frequency: up by 59.6%.
  3. Refined Variant Spectrum Understanding:

    • Improved characterization of rare variants, with many R3 "singletons" now confirmed in multiple R4 individuals.
    • The overall trend reflects better alignment with maturing public reference databases, enhancing data consistency.

Input and Output Counts for ALFA Release 4

| Input | Count |
|---|---|
| Studies | 105 |
| Subjects | 408,709 |
| Genotypes | 5,897,518,457,092 |

| Output | Count |
|---|---|
| Total RefSNPs | 904,623,795 |
| Exist in dbSNP [157] | 904,097,097 |
| Novel (ALFA R4) | 526,698 |
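
As a quick sanity check on the output counts above, the two RefSNP categories appear to partition the total. The short Python sketch below verifies that arithmetic using the published figures; the variable names are illustrative only and are not part of any ALFA tooling.

```python
# Output counts copied from the table above.
total_refsnps = 904_623_795
in_dbsnp_157 = 904_097_097
novel_r4 = 526_698

# Assumption: "Exist in dbSNP" and "Novel" are mutually exclusive categories,
# so together they should add up to the reported total.
assert in_dbsnp_157 + novel_r4 == total_refsnps

# Share of RefSNPs reported as novel in this release.
novel_fraction = novel_r4 / total_refsnps
print(f"Novel RefSNPs: {novel_fraction:.4%} of total")
```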

ALFA Release 4 - Population Frequency Summary

| Population | Biosample ID | Subjects | Total_Variant_Count | MAF=0 | MAF>=0.01 | 0.01>MAF>=0.001 | 0.001>MAF<Singleton | Singleton |
|---|---|---|---|---|---|---|---|---|
| European | SAMN10492695 | 329,701 | 897,780,520 | 790,281,250 | 12,741,835 | 10,201,895 | 87,483,679 | 55,426,463 |
| African Others | SAMN10492696 | 1,094 | 889,859,886 | 866,611,118 | 16,876,112 | 6,349,951 | 86,633,823 | 6,515,392 |
| East Asian | SAMN10492697 | 6,475 | 889,375,287 | 877,587,021 | 11,418,542 | 240,538 | 87,716,207 | 3,529,370 |
| African American | SAMN10492698 | 30,249 | 890,716,779 | 823,546,164 | 17,577,679 | 17,328,803 | 85,581,029 | 25,268,059 |
| Latin American 1 | SAMN10492699 | 5,255 | 889,385,923 | 869,141,264 | 13,099,650 | 7,074,875 | 86,921,139 | 6,644,790 |
| Latin American 2 | SAMN10492700 | 11,126 | 889,449,105 | 862,145,708 | 9,823,129 | 17,361,597 | 86,226,437 | 11,158,586 |
| Other Asian | SAMN10492701 | 2,170 | 889,237,795 | 880,171,234 | 8,770,720 | 266,965 | 88,020,011 | 2,615,009 |
| South Asian | SAMN10492702 | 4,391 | 889,167,884 | 875,229,217 | 13,570,436 | 318,927 | 87,527,852 | 4,201,152 |
| Other | SAMN11605645 | 18,248 | 897,771,391 | 859,113,084 | 15,058,701 | 22,430,760 | 86,028,193 | 14,023,271 |
| African | SAMN10492703 | 31,343 | 890,717,262 | 822,541,969 | 17,625,293 | 17,647,001 | 85,544,496 | 25,781,404 |
| Asian | SAMN10492704 | 8,645 | 889,420,891 | 876,105,156 | 9,122,622 | 4,013,713 | 87,628,455 | 4,135,799 |
| Total | SAMN10492705 | 408,709 | 897,812,126 | 736,551,077 | 15,518,943 | 17,542,581 | 86,475,062 | 81,061,967 |

Notes on Population Groups:

  1. African: Total of African American and African Others; see population descriptions.
  2. Asian: Total of East Asian (EAS) and Other Asian (OAS) individuals, excluding South Asian (SAS); see population descriptions.
  3. Total: Represents unique subjects, excluding redundant counts from aggregated African and Asian categories.
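
The aggregation rules in these notes can be checked against the published subject counts. The Python sketch below is a minimal illustration: it sums the nine non-aggregated populations and confirms that the African, Asian, and Total rows follow from them. The dictionary and variable names are hypothetical and chosen only for this example.

```python
# Subject counts for the non-aggregated populations,
# copied from the Population Frequency Summary.
subjects = {
    "European": 329_701,
    "African Others": 1_094,
    "East Asian": 6_475,
    "African American": 30_249,
    "Latin American 1": 5_255,
    "Latin American 2": 11_126,
    "Other Asian": 2_170,
    "South Asian": 4_391,
    "Other": 18_248,
}

# Note 1: African = African American + African Others.
african = subjects["African American"] + subjects["African Others"]

# Note 2: Asian = East Asian + Other Asian (South Asian is kept separate).
asian = subjects["East Asian"] + subjects["Other Asian"]

# Note 3: Total counts each subject once, so it sums only the
# non-aggregated populations and skips the African and Asian rollups.
total = sum(subjects.values())

assert african == 31_343
assert asian == 8_645
assert total == 408_709
```

Summing all eleven population rows instead would double-count the African and Asian subjects, which is why the Total row excludes the aggregated categories.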

Column Descriptions (for Population Frequency Summary):

  • Population: ALFA computed populations.
  • Biosample ID: Population BioSample accession ID.
  • Subjects: Unique subject count by population.
  • Total_Variant_Count: Total unique variant sites reported for the population. (Note: Column name changed from "Total Site Count" in R3 image to "Total_Variant_Count" in R4 image).
  • MAF=0: Sites homozygous for the reference allele; no variant allele detected in the current subject sample.
  • MAF>=0.01: Common variants with Minor Allele Frequency (MAF) >= 0.01.
  • 0.01>MAF>=0.001: Low-frequency variants.
  • 0.001>MAF<Singleton: Rare variants with MAF below 0.001, excluding singletons. (Note: Column name/bin definition refined from "MAF < 0.001" in R3 image).
  • Singleton: Minor allele found in only one individual in that population sample.
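
To make the bin definitions above concrete, here is a minimal Python sketch that assigns a site to one of the five frequency columns. It is an illustration under stated assumptions, not ALFA's actual pipeline: the function `maf_bin` is hypothetical, it treats a minor allele count of 1 as a stand-in for "found in only one individual", and it checks the singleton case before the frequency thresholds.

```python
def maf_bin(minor_allele_count: int, total_allele_count: int) -> str:
    """Return the summary column a site would fall into (hypothetical helper)."""
    if total_allele_count <= 0:
        raise ValueError("total_allele_count must be positive")
    if minor_allele_count < 0 or 2 * minor_allele_count > total_allele_count:
        raise ValueError("minor allele count must be between 0 and half the total")

    # No copies of the minor allele observed in this population sample.
    if minor_allele_count == 0:
        return "MAF=0"

    # Assumption: a single observed copy approximates "one individual".
    # A lone homozygous carrier (two copies) is not handled by this sketch.
    if minor_allele_count == 1:
        return "Singleton"

    maf = minor_allele_count / total_allele_count
    if maf >= 0.01:
        return "MAF>=0.01"          # common
    if maf >= 0.001:
        return "0.01>MAF>=0.001"    # low-frequency
    return "0.001>MAF<Singleton"    # rare, but seen more than once


# Example: 5 minor alleles among 2,000 observed alleles gives MAF = 0.0025.
print(maf_bin(5, 2000))
```

Because the thresholds are frequencies while the singleton rule is a count, the same allele count can land in different bins depending on how many subjects a population has.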

We encourage the research and clinical communities to explore the enhanced ALFA Release 4 dataset to leverage these significant improvements in their work.

Last updated: 2025-05-12T12:47:25Z