ALFA Release 4
Release Version: [20250407153717]
Key Advancements in Release 4:
We are excited to announce NCBI ALFA Release 4, a major update to one of the largest aggregated variant frequency databases. This release significantly enhances the scale and utility of ALFA, driven by a near doubling of the cohort from ~204,000 to ~409,000 individuals. The larger cohort provides greater statistical power for estimating allele frequencies across diverse populations and substantially improves annotation of clinically relevant variants.
ALFA Release 4 continues to deliver comprehensive allele frequency information, now with even greater precision, directly benefiting genetic research and clinical variant interpretation. All data is available via our FTP site and integrated into NCBI resources.
Key Highlights of ALFA Release 4 (vs. Release 3):
- Expanded Cohort:
  - Subject numbers have nearly doubled from ~204k (R3) to ~409k (R4).
  - This provides enhanced resolution for variant frequencies, especially for rarer variants.
  - The number of common variants (MAF >= 0.01) identified has risen to over 15.5 million.
- Improved ClinVar Annotation:
  - ALFA R4 now provides frequency data for over 959,000 ClinVar RS IDs (up by 74.4% from R3).
  - The number of ClinVar variants for which ALFA is the exclusive public source of frequency data has risen to over 27,000 (a 35.1% increase).
  - Significant increases in coverage for medically important categories:
    - "Pathogenic" variants with ALFA frequency: up by 41.3%.
    - "Likely Pathogenic" variants with ALFA frequency: up by 59.6%.
- Refined Variant Spectrum Understanding:
  - Improved characterization of rare variants, with many R3 "singletons" now confirmed in multiple R4 individuals.
  - The overall trend reflects better alignment with maturing public reference databases, enhancing data consistency.
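To put the percentage increases above in context, the following minimal sketch back-calculates the approximate R3 baseline implied by an R4 figure and its stated increase. The function name `implied_baseline` is hypothetical, and because the inputs are rounded lower bounds ("over 959,000", "over 27,000"), the results are rough estimates only, not official R3 counts.

```python
def implied_baseline(current: float, pct_increase: float) -> float:
    """Return the earlier value implied by a current value and a percent increase.

    If current = baseline * (1 + pct_increase / 100), then
    baseline = current / (1 + pct_increase / 100).
    """
    return current / (1 + pct_increase / 100)


# Rounded R4 figures quoted in the highlights (lower bounds, not exact counts).
clinvar_rs_r4 = 959_000        # ClinVar RS IDs with ALFA frequency, up 74.4%
exclusive_r4 = 27_000          # ClinVar variants where ALFA is the only public source, up 35.1%

clinvar_rs_r3_estimate = implied_baseline(clinvar_rs_r4, 74.4)
exclusive_r3_estimate = implied_baseline(exclusive_r4, 35.1)

# Cohort growth ratio from the rounded subject counts (~204k to ~409k).
cohort_ratio = 409_000 / 204_000

print(f"Implied R3 ClinVar RS IDs: ~{clinvar_rs_r3_estimate:,.0f}")
print(f"Implied R3 ALFA-exclusive variants: ~{exclusive_r3_estimate:,.0f}")
print(f"Cohort growth ratio: ~{cohort_ratio:.2f}x")
```

The same helper could be applied to the "Pathogenic" and "Likely Pathogenic" increases once the corresponding R4 counts are known; those counts are not given in these notes.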
Input and Output Counts for ALFA Release 4
Input | Count |
---|---|
Studies | 105 |
Subjects | 408,709 |
Genotypes | 5,897,518,457,092 |
Output | Count |
---|---|
Total RefSNPs | 904,623,795 |
Exist in dbSNP [157] | 904,097,097 |
Novel (ALFA R4) | 526,698 |
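As a quick consistency check on the output counts, the novel RefSNP count should equal the total minus those already present in dbSNP. The short sketch below verifies this with the figures from the table; the variable names are illustrative only.

```python
# Output counts copied from the table above.
total_refsnps = 904_623_795
existing_in_dbsnp = 904_097_097
novel_reported = 526_698

# Novel RefSNPs are those in ALFA R4 that are not already in dbSNP.
novel_from_difference = total_refsnps - existing_in_dbsnp
assert novel_from_difference == novel_reported

# Share of all ALFA R4 RefSNPs that are novel.
novel_fraction = novel_from_difference / total_refsnps
print(f"Novel RefSNPs: {novel_from_difference:,} ({novel_fraction:.4%} of total)")
```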
ALFA Release 4 - Population Frequency Summary
Population | Biosample ID | Subjects | Total_Variant_Count | MAF=0 | MAF>=0.01 | 0.01>MAF>=0.001 | 0.001>MAF<Singleton | Singleton |
---|---|---|---|---|---|---|---|---|
European | SAMN10492695 | 329,701 | 897,780,520 | 790,281,250 | 12,741,835 | 10,201,895 | 87,483,679 | 55,426,463 |
African Others | SAMN10492696 | 1,094 | 889,859,886 | 866,611,118 | 16,876,112 | 6,349,951 | 86,633,823 | 6,515,392 |
East Asian | SAMN10492697 | 6,475 | 889,375,287 | 877,587,021 | 11,418,542 | 240,538 | 87,716,207 | 3,529,370 |
African American | SAMN10492698 | 30,249 | 890,716,779 | 823,546,164 | 17,577,679 | 17,328,803 | 85,581,029 | 25,268,059 |
Latin American 1 | SAMN10492699 | 5,255 | 889,385,923 | 869,141,264 | 13,099,650 | 7,074,875 | 86,921,139 | 6,644,790 |
Latin American 2 | SAMN10492700 | 11,126 | 889,449,105 | 862,145,708 | 9,823,129 | 17,361,597 | 86,226,437 | 11,158,586 |
Other Asian | SAMN10492701 | 2,170 | 889,237,795 | 880,171,234 | 8,770,720 | 266,965 | 88,020,011 | 2,615,009 |
South Asian | SAMN10492702 | 4,391 | 889,167,884 | 875,229,217 | 13,570,436 | 318,927 | 87,527,852 | 4,201,152 |
Other | SAMN11605645 | 18,248 | 897,771,391 | 859,113,084 | 15,058,701 | 22,430,760 | 86,028,193 | 14,023,271 |
African | SAMN10492703 | 31,343 | 890,717,262 | 822,541,969 | 17,625,293 | 17,647,001 | 85,544,496 | 25,781,404 |
Asian | SAMN10492704 | 8,645 | 889,420,891 | 876,105,156 | 9,122,622 | 4,013,713 | 87,628,455 | 4,135,799 |
Total | SAMN10492705 | 408,709 | 897,812,126 | 736,551,077 | 15,518,943 | 17,542,581 | 86,475,062 | 81,061,967 |
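For readers who want to work with this summary programmatically, here is a minimal sketch of parsing the pipe-delimited rows into typed records. It assumes the table is available as plain text exactly as shown above; the helper names `split_cells` and `parse_row` are hypothetical, and only two rows are included for brevity.

```python
HEADER = (
    "Population | Biosample ID | Subjects | Total_Variant_Count | MAF=0 | "
    "MAF>=0.01 | 0.01>MAF>=0.001 | 0.001>MAF<Singleton | Singleton |"
)
ROWS = [
    "European | SAMN10492695 | 329,701 | 897,780,520 | 790,281,250 | "
    "12,741,835 | 10,201,895 | 87,483,679 | 55,426,463 |",
    "East Asian | SAMN10492697 | 6,475 | 889,375,287 | 877,587,021 | "
    "11,418,542 | 240,538 | 87,716,207 | 3,529,370 |",
]


def split_cells(line: str) -> list[str]:
    """Split a pipe-delimited line into stripped cells, dropping the empty
    cell produced by the trailing pipe."""
    cells = [cell.strip() for cell in line.split("|")]
    if cells and cells[-1] == "":
        cells.pop()
    return cells


def parse_row(header: str, line: str) -> dict:
    """Map column names to values, converting comma-grouped integers to int."""
    record = {}
    for name, value in zip(split_cells(header), split_cells(line)):
        digits = value.replace(",", "")
        record[name] = int(digits) if digits.isdigit() else value
    return record


records = [parse_row(HEADER, row) for row in ROWS]
for rec in records:
    # Share of reported sites that are common (MAF >= 0.01) in this population.
    common_share = rec["MAF>=0.01"] / rec["Total_Variant_Count"]
    print(f'{rec["Population"]}: {rec["Subjects"]:,} subjects, '
          f"{common_share:.2%} of sites common")
```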
Notes on Population Groups:
- African: Total of African American and African Others; see population descriptions.
- Asian: East Asian (EAS) and Other Asian (OAS) individuals combined, excluding South Asian (SAS); see population descriptions.
- Total: Represents unique subjects, excluding redundant counts from aggregated African and Asian categories.
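The relationships described in these notes can be checked directly against the Subjects column. The sketch below, using subject counts copied from the summary table, confirms that the aggregated African and Asian groups are sums of their component populations and that the Total counts each subject once.

```python
# Subject counts for the nine non-aggregated populations (from the summary table).
subjects = {
    "European": 329_701,
    "African Others": 1_094,
    "East Asian": 6_475,
    "African American": 30_249,
    "Latin American 1": 5_255,
    "Latin American 2": 11_126,
    "Other Asian": 2_170,
    "South Asian": 4_391,
    "Other": 18_248,
}

# Aggregated groups as defined in the notes.
african = subjects["African American"] + subjects["African Others"]
asian = subjects["East Asian"] + subjects["Other Asian"]  # South Asian excluded

# Total sums only the base populations, so aggregated groups are not double counted.
total = sum(subjects.values())

assert african == 31_343   # matches the African row
assert asian == 8_645      # matches the Asian row
assert total == 408_709    # matches the Total row
print(f"African: {african:,}  Asian: {asian:,}  Total: {total:,}")
```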
Column Descriptions (for Population Frequency Summary):
- Population: ALFA-computed population group.
- Biosample ID: Population BioSample accession ID.
- Subjects: Unique subject count by population.
- Total_Variant_Count: Total unique variant sites reported for the population. (Note: this column was named "Total Site Count" in the R3 summary.)
- MAF=0: Sites where every genotyped subject is homozygous for the reference allele; no variant allele was detected at the current sample size.
- MAF>=0.01: Common variants with Minor Allele Frequency (MAF) >= 0.01.
- 0.01>MAF>=0.001: Low-frequency variants.
- 0.001>MAF<Singleton: Rare variants with MAF below 0.001, excluding singletons. (Note: this bin was labeled "MAF < 0.001" in the R3 summary.)
- Singleton: Minor allele found in only one individual in that population sample.
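The frequency bins described above can be illustrated with a small classifier. This is a sketch under stated assumptions, not ALFA's actual pipeline: the function name `classify_site` is hypothetical, the singleton check is assumed to take precedence over the frequency thresholds, and the total allele count is assumed to be twice the number of genotyped subjects (diploid autosomal sites).

```python
def classify_site(minor_allele_count: int, total_allele_count: int, carriers: int) -> str:
    """Assign a site to one of the summary-table bins.

    minor_allele_count: observed copies of the minor allele in the population
    total_allele_count: total alleles genotyped (assumed 2 x subjects)
    carriers: number of individuals carrying the minor allele
    """
    if total_allele_count <= 0:
        raise ValueError("total_allele_count must be positive")
    if minor_allele_count == 0:
        return "MAF=0"
    # Assumption: a minor allele seen in exactly one individual is a singleton,
    # regardless of where its frequency falls.
    if carriers == 1:
        return "Singleton"
    maf = minor_allele_count / total_allele_count
    if maf >= 0.01:
        return "MAF>=0.01"            # common
    if maf >= 0.001:
        return "0.01>MAF>=0.001"      # low-frequency
    return "0.001>MAF<Singleton"      # rare, excluding singletons


# Example using the East Asian subject count from the table (6,475 subjects).
total_alleles = 2 * 6_475
for mac, carriers in [(0, 0), (1, 1), (5, 5), (20, 20), (200, 190)]:
    print(mac, carriers, classify_site(mac, total_alleles, carriers))
```

Because the thresholds are frequencies, the same allele count can land in different bins in different populations: 20 copies is low-frequency among 6,475 subjects but would be rare among the 329,701 European subjects.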
We encourage the research and clinical communities to explore the enhanced ALFA Release 4 dataset to leverage these significant improvements in their work.