Frequently Asked Questions
- 1. What is dbSNP?
- 2. How can I search for SNPs in dbSNP?
- 3. What information is included in a dbSNP record?
- 4. What is an rsID, and why is it important?
- 5. How do I retrieve SNP data using NCBI eUtils API?
- 6. How do I interpret the clinical significance of a SNP?
- 7. Can I download the entire dbSNP database?
- 8. What genome assemblies does dbSNP support?
- 9. How can I find population frequency data for an SNP?
- 10. How often is dbSNP updated?
- 11. How to get flanking sequences with dbSNP redesign?
- 12. How can a variant like rs2084511460 have a frequency of 0 and still be considered and reported as an SNP?
- 13. How does dbSNP compute Genotype Frequency and Hardy-Weinberg Equilibrium (HWE)?
1. What is dbSNP?
Answer: dbSNP (Database of Single Nucleotide Polymorphisms) is a public database maintained by the National Center for Biotechnology Information (NCBI) that catalogs genetic variations, including single nucleotide variations (SNVs), insertions, deletions, and other short genetic variations found in humans. See "About dbSNP" for more information, and "dbSNP vs. dbVar" for a comparison with dbVar.
2. How can I search for SNPs in dbSNP?
Answer: You can search for SNVs and other genetic variations in dbSNP using the Entrez search system available on the NCBI website. You can search by rsID (Reference SNP ID), gene name, or chromosomal location, and narrow the results with filters such as clinical significance and population frequency.
3. What information is included in a dbSNP record?
Answer: A dbSNP record includes:
- rsID (Reference SNP ID)
- Gene association
- Chromosomal position
- Alleles and frequency in populations
- Functional annotation (e.g., synonymous, missense, intronic)
- Clinical significance (e.g., benign, pathogenic)
- Linkouts to studies and databases like ClinVar and 1000 Genomes
- Links to publications
4. What is an rsID, and why is it important?
Answer: An rsID (Reference SNP ID) is a unique identifier assigned to each SNP in the dbSNP database. It helps standardize and track SNPs across different research studies and databases. See "About dbSNP Reference (rs) number" for more information.
5. How do I retrieve SNP data using NCBI eUtils API?
Answer: You can use NCBI's eSearch and eFetch APIs to retrieve SNP data. The list of available search terms is here. For example:
- To find SNPs associated with a gene:
  https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?db=snp&term=BRCA1[GENE]
- To fetch detailed information about a specific SNP:
  https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=snp&id=334
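The two requests above can also be built programmatically. The following is a minimal Python sketch using only the standard library. The helper names (`build_esearch_url`, `build_efetch_url`, `fetch`) are illustrative and not part of any NCBI library, `retmax` is the E-utilities parameter that caps the number of returned IDs, and treating the numeric `id` as the rsID without its "rs" prefix is an assumption.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

EUTILS_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"


def build_esearch_url(gene, retmax=20):
    """Build an eSearch URL listing dbSNP records for a gene symbol."""
    params = {"db": "snp", "term": f"{gene}[GENE]", "retmax": retmax}
    # urlencode percent-encodes the square brackets in the search term.
    return f"{EUTILS_BASE}/esearch.fcgi?{urlencode(params)}"


def build_efetch_url(rsid):
    """Build an eFetch URL for a single dbSNP record.

    Assumption: the numeric id is the rsID without its "rs" prefix.
    """
    text = str(rsid).lower()
    numeric_id = text[2:] if text.startswith("rs") else text
    if not numeric_id.isdigit():
        raise ValueError(f"not a valid rsID: {rsid!r}")
    params = {"db": "snp", "id": numeric_id}
    return f"{EUTILS_BASE}/efetch.fcgi?{urlencode(params)}"


def fetch(url, timeout=30):
    """Download the response body as text (requires network access)."""
    with urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8")


if __name__ == "__main__":
    # Only print the URLs here; call fetch(url) to actually send a request.
    print(build_esearch_url("BRCA1"))
    print(build_efetch_url("rs334"))
```

Keeping URL construction separate from the network call makes the request easy to inspect before it is sent.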
6. How do I interpret the clinical significance of a SNP?
Answer: dbSNP includes clinical significance annotations based on ClinVar submissions. Common categories include:
- Pathogenic - Likely to cause disease
- Likely pathogenic - Strong evidence of disease association
- Benign - No known harmful effect
- Likely benign - Low likelihood of disease association
- Uncertain significance (VUS) - Insufficient evidence to classify
7. Can I download the entire dbSNP database?
Answer: Yes, the dbSNP database can be downloaded from the NCBI FTP site in various formats, such as VCF (Variant Call Format) and JSON.
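Once a VCF file has been downloaded, each data line can be split into the fixed columns defined by the VCF format. A minimal Python sketch follows. The function names (`parse_vcf_line`, `iter_vcf_records`) are hypothetical, and the sample line is made up for illustration and does not reproduce a real dbSNP record.

```python
def parse_vcf_line(line):
    """Split one tab-delimited VCF data line into its eight fixed columns."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 8:
        raise ValueError("a VCF data line needs at least 8 tab-separated columns")
    chrom, pos, variant_id, ref, alt, _qual, _filter, info = fields[:8]

    # INFO is a semicolon-separated list of KEY=value pairs and bare flags.
    info_map = {}
    if info != ".":
        for entry in info.split(";"):
            key, sep, value = entry.partition("=")
            info_map[key] = value if sep else True

    return {
        "chrom": chrom,
        "pos": int(pos),
        "id": variant_id,
        "ref": ref,
        "alt": alt.split(","),  # multiple alternate alleles are comma-separated
        "info": info_map,
    }


def iter_vcf_records(lines):
    """Yield parsed records, skipping header lines (which start with '#')."""
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue
        yield parse_vcf_line(line)


# Made-up sample line, for illustration only.
sample = "chr_example\t12345\trs0000001\tA\tG,T\t.\t.\tKEY1=value1;FLAG1"
record = parse_vcf_line(sample)
print(record["id"], record["pos"], record["alt"])
```

The full dbSNP files are large and typically compressed, so reading them line by line (or using a dedicated VCF library) is likely to be more practical than loading them into memory.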
8. What genome assemblies does dbSNP support?
Answer: dbSNP provides variant annotations for different genome assemblies, including:
- Human Genome Reference (GRCh38, GRCh37)
- Human Telomere-To-Telomere (Coming Soon)
9. How can I find population frequency data for an SNP?
Answer: dbSNP integrates population allele frequency data from projects such as:
- ALFA
- 1000 Genomes Project
- gnomAD (Genome Aggregation Database)
- ExAC (Exome Aggregation Consortium)
You can find allele frequencies in different ethnic groups within dbSNP reports.
10. How often is dbSNP updated?
Answer: dbSNP is updated periodically as new variant data is submitted. Always check the latest release notes on the NCBI dbSNP homepage.
11. How to get flanking sequences with dbSNP redesign?
Answer: With the dbSNP redesign, the flanking sequences upstream and downstream of a SNP site are not shown directly on the RefSNP page. Instead of going back to the ‘classic’ site, a user can get the flanking sequences in a few steps:
- Step 1: Mouse over the marker near the lock icon and right-click to bring up the pop-up menu options.
- Step 2: Select “Marker Detail” from the pop-up menu.
- Step 3: Copy the flanking sequences in the marker detail box.
An example screenshot is shown below. A user can select “Reveal in Sequence View” in Step 2, if longer flanking sequences are needed.
12. How can a variant like rs2084511460 have a frequency of 0 and still be considered and reported as an SNP?
Answer: The frequency of 0 for the rs2084511460 variant in dbSNP indicates that the ALFA project has examined this variant, but no alternate allele was found within its dataset. This could mean either:
- the alternate allele is extremely rare and wasn't detected in over 5,000 samples; or
- it might not be a genuine variant despite its having been reported in a dbGaP study.
Note: Although the ALFA frequency for a variant may be listed as 0, an alternate allele has been observed somewhere (by virtue of its presence in dbSNP), and frequency information outside the scope of ALFA may still be found on the variant page in dbSNP.
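To see why a rare but genuine allele can show a frequency of 0, consider the chance of missing it entirely in a sample. The Python sketch below is a back-of-the-envelope illustration, not the method ALFA or dbSNP uses. It assumes diploid samples, chromosomes sampled independently, and hypothetical true allele frequencies; the function name `prob_allele_unobserved` is illustrative.

```python
def prob_allele_unobserved(true_freq, n_samples, ploidy=2):
    """Probability that no copy of an allele appears among the sampled chromosomes.

    Assumes each chromosome carries the allele independently with
    probability true_freq.
    """
    if not 0.0 <= true_freq <= 1.0:
        raise ValueError("true_freq must be between 0 and 1")
    n_chromosomes = n_samples * ploidy
    return (1.0 - true_freq) ** n_chromosomes


# Hypothetical true frequencies, evaluated for 5,000 diploid samples.
for freq in (1e-3, 1e-4, 1e-5):
    p = prob_allele_unobserved(freq, 5000)
    print(f"true frequency {freq:g}: P(no copies observed) = {p:.3f}")
```

Under these assumptions, an allele with a true frequency of 1 in 100,000 would go unobserved in roughly 90% of samples of this size, so an observed frequency of 0 does not by itself rule out a real variant.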
13. How does dbSNP compute Genotype Frequency and Hardy-Weinberg Equilibrium (HWE)?
Answer: See the Hardy-Weinberg Equilibrium Help page.