NyuWa variants

Variant ID in NyuWa Chinese Population Variant Database (NCVD)

A variant ID in NCVD has the format Chromosome-Position-ReferenceAllele-AlternativeAllele, for example 13-48045719-C-T. The position, reference allele and alternative allele are left-aligned and normalized (https://genome.sph.umich.edu/wiki/Variant_Normalization). Position coordinates in NCVD are based on the human genome assembly GRCh38/hg38.
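As an illustration of the ID format, the following minimal Python sketch splits a variant ID into its four fields. The function name parse_variant_id, the Variant type, the accepted chromosome names (1-22, X, Y, M/MT) and the restriction of alleles to A/C/G/T are assumptions made for this example; they are not taken from the NCVD implementation.

```python
import re
from typing import NamedTuple


class Variant(NamedTuple):
    chrom: str
    pos: int
    ref: str
    alt: str


# Assumed grammar: chromosome 1-22, X, Y, M or MT; alleles made of A/C/G/T only.
_VARIANT_RE = re.compile(
    r"([1-9]|1[0-9]|2[0-2]|X|Y|MT?)-([0-9]+)-([ACGT]+)-([ACGT]+)"
)


def parse_variant_id(text: str) -> Variant:
    """Split 'Chromosome-Position-Ref-Alt' into typed fields."""
    match = _VARIANT_RE.fullmatch(text.strip())
    if match is None:
        raise ValueError(f"not a valid variant ID: {text!r}")
    chrom, pos, ref, alt = match.groups()
    return Variant(chrom, int(pos), ref, alt)


if __name__ == "__main__":
    print(parse_variant_id("13-48045719-C-T"))
```

The sketch only checks the shape of the ID. It does not verify that the alleles are left-aligned and normalized, which requires the reference sequence.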

Search in NyuWa Chinese Population Variant Database (NCVD)

Searches can be run from the Home page and the Search page. The database currently supports the following 4 types of search key for querying a variant, gene or region:

(1) Variant such as 13-48045719-C-T.
(2) dbSNP ID such as rs116855232.
(3) Region. The format is Chromosome:StartPosition-EndPosition, for example, 13:48045719-48045806.
(4) Gene. The format is an official gene name such as NUDT15.
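The four key types above can be told apart by their shape. The sketch below shows one possible way to classify a search key with regular expressions. The function name classify_search_key and the exact patterns are hypothetical, and the real search box may accept or reject inputs differently.

```python
import re

# Assumed chromosome names: one or two digits, X, Y, M or MT.
_CHROM = r"(?:[0-9]{1,2}|X|Y|MT?)"

# Order matters: the gene pattern is the most permissive, so it is tried last.
_PATTERNS = [
    ("variant", re.compile(_CHROM + r"-[0-9]+-[ACGT]+-[ACGT]+")),
    ("dbsnp", re.compile(r"rs[0-9]+")),
    ("region", re.compile(_CHROM + r":[0-9]+-[0-9]+")),
    ("gene", re.compile(r"[A-Za-z][A-Za-z0-9_.-]*")),
]


def classify_search_key(key: str) -> str:
    """Return 'variant', 'dbsnp', 'region', 'gene' or 'unknown'."""
    key = key.strip()
    for name, pattern in _PATTERNS:
        if pattern.fullmatch(key):
            return name
    return "unknown"


if __name__ == "__main__":
    for example in ("13-48045719-C-T", "rs116855232",
                    "13:48045719-48045806", "NUDT15"):
        print(example, "->", classify_search_key(example))
```

A key that matches only the gene pattern is not guaranteed to be an official gene name; that still has to be checked against a gene list.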

Search result

After a search key is submitted, a result table lists the variants that match the query. The table gives an overview of each variant: Variant ID, dbSNP ID, annotation from Ensembl genes (Region, Gene ID, Exonic Function, Consequence), and statistics from our own high-depth sequencing of 2,999 Chinese genomes (Allele Count, Allele Number, Allele Frequency).
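The three statistics are related in the usual way: Allele Frequency is Allele Count divided by Allele Number, where Allele Number counts the called alleles at the site (at most 2 x 2,999 = 5,998 for an autosomal site in diploid samples). A minimal sketch, with allele_frequency as a hypothetical helper name:

```python
def allele_frequency(allele_count: int, allele_number: int) -> float:
    """Allele Frequency = Allele Count / Allele Number."""
    if allele_number <= 0:
        raise ValueError("allele_number must be positive")
    if not 0 <= allele_count <= allele_number:
        raise ValueError("allele_count must lie between 0 and allele_number")
    return allele_count / allele_number


if __name__ == "__main__":
    # Made-up counts for illustration only, not taken from the database.
    print(allele_frequency(1, 4))
```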

Variant annotation page

Basic information

Each variant in the search result links to a detailed variant annotation page, which is organized into modules. The first module gives the basic information of the variant: Allele Count, Allele Number, Allele Frequency and Number of Homozygotes, with the same values as in the search result. If the variant is also present in dbSNP or gnomAD, links to the variant in those databases are provided. The Browser link opens the genome browser on the region from 100 bp upstream to 100 bp downstream of the variant.
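The 100 bp window can be written in the same Chromosome:StartPosition-EndPosition form used for region searches. The sketch below assumes the window is centered on the variant position and is clamped at coordinate 1; how the site treats multi-base alleles or chromosome ends is not documented here, and browser_window is a hypothetical name.

```python
def browser_window(chrom: str, pos: int, flank: int = 100) -> str:
    """Region string covering `flank` bp on each side of `pos`."""
    start = max(1, pos - flank)  # coordinates are 1-based, so never go below 1
    end = pos + flank            # not clamped to chromosome length in this sketch
    return f"{chrom}:{start}-{end}"


if __name__ == "__main__":
    print(browser_window("13", 48045719))  # 13:48045619-48045819
```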

Quality metrics

The next module presents the genotype quality metrics and site quality metrics of the variant. The distributions of these metrics are all computed from our own genome sequencing data. The genotype quality metrics include genotype quality (GQ), approximate read depth (DP) and allele balance for heterozygotes. The site quality metrics include SiteQuality, FS, MQRankSum, InbreedingCoeff, ReadPosRankSum, VQSLOD, QD, DP, BaseQRankSum, MQ and ClippingRankSum, defined as follows:

FS: phred-scaled p-value from Fisher's exact test for strand bias.
MQRankSum: Z-score from the Wilcoxon rank sum test of Alt vs. Ref read mapping qualities.
InbreedingCoeff: inbreeding coefficient estimated from the per-sample genotype likelihoods, compared against the Hardy-Weinberg expectation.
ReadPosRankSum: Z-score from the Wilcoxon rank sum test of Alt vs. Ref read position bias.
VQSLOD: log odds of being a true variant versus a false one under the trained Gaussian mixture model; a low value is a likely reason for a variant to be filtered out.
QD: variant confidence (quality) normalized by depth.
DP: depth of informative coverage for each sample; reads with MQ=255 or with bad mates are filtered.
BaseQRankSum: Z-score from the Wilcoxon rank sum test of Alt vs. Ref base qualities.
MQ: root mean square (RMS) mapping quality.
ClippingRankSum: Z-score from the Wilcoxon rank sum test of Alt vs. Ref number of hard-clipped bases.
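Two of these metrics are simple to compute by hand. A phred-scaled value Q corresponds to a p-value of 10^(-Q/10), which is how an FS score maps back to the Fisher's exact test p-value. Allele balance for a heterozygous genotype is assumed here to be the fraction of reads supporting the alternative allele; this definition and the function names are assumptions for illustration, not taken from the NCVD pipeline.

```python
import math


def phred_to_pvalue(phred: float) -> float:
    """Convert a phred-scaled score (e.g. FS) back to a p-value."""
    return 10 ** (-phred / 10)


def pvalue_to_phred(p_value: float) -> float:
    """Convert a p-value in (0, 1] to a phred-scaled score."""
    if not 0 < p_value <= 1:
        raise ValueError("p_value must be in (0, 1]")
    return -10 * math.log10(p_value)


def allele_balance(ref_reads: int, alt_reads: int) -> float:
    """Assumed definition: alt-supporting reads / (ref + alt reads)."""
    total = ref_reads + alt_reads
    if total == 0:
        raise ValueError("no reads at this site")
    return alt_reads / total


if __name__ == "__main__":
    print(phred_to_pvalue(20))     # about 0.01
    print(allele_balance(12, 12))  # 0.5
```

Under this definition, a heterozygous call whose allele balance is far from 0.5 is a common sign of a genotyping artifact.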

External data population frequency

The database also reports the alternative allele frequency from the 1000 Genomes Project Phase 3 (1KGP3) dataset (2,504 genomes) [1] and the gnomAD version 3 genome dataset (71,702 genomes) (https://gnomad.broadinstitute.org). The samples in both datasets are divided into populations as follows.

dataset	Population	Description	Genomes
1KGP3	AFR	African	661
	AMR	American	347
	EAS	East Asian	504
	EUR	European	503
	SAS	South Asian	489
	total		2,504
gnomAD v3 Genomes	AFR	African/African-American	21,042
	AMI	Amish	450
	AMR	Latino/Admixed American	6,835
	ASJ	Ashkenazi Jewish	1,662
	EAS	East Asian	1,567
	FIN	Finnish	5,244
	NFE	Non-Finnish European	32,299
	SAS	South Asian	1,526
	OTH	Other (population not assigned)	1,077
	total		71,702

Region annotation

The region annotation is a gene-based annotation against Ensembl Gene and RefSeq Gene, produced with ANNOVAR [2]. The Region column contains one or two of the following items: exonic, splicing, UTR5, UTR3, intronic, ncRNA_exonic, ncRNA_splicing, ncRNA_intronic, upstream, downstream, intergenic. The Gene ID and Gene Detail columns give the name of the gene and the relative position within the gene where the variant is located, respectively. The Exonic Function column gives the functional consequence of the variant (possible values: nonsynonymous SNV, synonymous SNV, frameshift insertion, frameshift deletion, nonframeshift insertion, nonframeshift deletion, unknown). The Consequence column contains the gene name, the transcript identifier and the sequence change in the corresponding transcript (e.g. NUDT15:NM_018283:exon3:c.C415T:p.R139C).
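A Consequence entry of the form shown above can be split on colons into its parts. The sketch below assumes exactly five colon-separated fields (gene, transcript, exon, coding change, protein change), as in the example; entries with a different shape are rejected rather than guessed at. The names Consequence and parse_consequence are hypothetical.

```python
from typing import NamedTuple


class Consequence(NamedTuple):
    gene: str
    transcript: str
    exon: str
    cdna_change: str
    protein_change: str


def parse_consequence(text: str) -> Consequence:
    """Split 'GENE:TRANSCRIPT:exonN:c.XXX:p.XXX' into named fields."""
    parts = text.strip().split(":")
    if len(parts) != 5:
        raise ValueError(f"expected 5 colon-separated fields, got {len(parts)}")
    return Consequence(*parts)


if __name__ == "__main__":
    c = parse_consequence("NUDT15:NM_018283:exon3:c.C415T:p.R139C")
    print(c.gene, c.transcript, c.protein_change)
```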

Nonsynonymous impact

The nonsynonymous impact module presents the results of 5 prediction tools (SIFT, PolyPhen2_HDIV, PolyPhen2_HVAR, FATHMM, CADD) for nonsynonymous SNVs. This information is also taken from ANNOVAR [2].

Loss of function prediction

Loss of Function (LoF) variants are identified with LOFTEE, a package recently developed by the gnomAD group that classifies stop-gained, splice-site-disrupting and frameshift variants as “low-confidence” (LC) or “high-confidence” (HC) LoF variants [3].

Disease annotation

The disease annotation comes from the ClinVar disease database [4] and is also taken from ANNOVAR [2].

Pharmacogenomics

Pharmacogenomic variants and related drug information were collected from 34 Clinical Pharmacogenetics Implementation Consortium (CPIC) guidelines (https://cpicpgx.org/). The pharmacogenomics annotation was then added to the matching variants in this database.

Browser

The Browser link opens a genome browser page that shows the coordinates of the variant on the human genome alongside other tracks such as Genes and Gene Predictions, Comparative Genomics and Variation.

References

[1] AUTON A, ABECASIS G R, ALTSHULER D M, et al. A global reference for human genetic variation [J]. Nature, 2015, 526(7571): 68-74.

[2] WANG K, LI M Y, HAKONARSON H. ANNOVAR: functional annotation of genetic variants from high-throughput sequencing data [J]. Nucleic Acids Research, 2010, 38(16): e164.

[3] KARCZEWSKI K J, FRANCIOLI L C, TIAO G, et al. The mutational constraint spectrum quantified from variation in 141,456 humans [J]. Nature, 2020, 581(7809): 434-443.

[4] LANDRUM M J, LEE J M, BENSON M, et al. ClinVar: improving access to variant interpretations and supporting evidence [J]. Nucleic Acids Research, 2018, 46(D1): D1062-D1067.