Introduction

Short tandem repeats (STRs) are abundant and highly mutable in the human genome, and many STR loci have been associated with human genetic disorders. We present a comprehensive map of 366,034 polymorphic STRs (pSTRs) constructed from 6,487 deeply sequenced genomes, comprising 3,983 Chinese samples (~31.5x, NyuWa) and 2,504 samples from the 1000 Genomes Project (~33.3x, 1KGP). This study is one of the largest and most recent genome-wide surveys of STR variation across diverse populations and will further our understanding of how this mutability impacts the human genome.

Data

This database contains genome-wide information on copy number variation and other characteristics of polymorphic short tandem repeats (pSTRs) in humans, including:

· LoF pSTR alleles: LoF_pSTR_alleles.txt

· eSTRs identified in this study: eSTR.txt

· 3'aSTR identified in this study: aSTR.txt

· Highly variable pSTRs within superpopulations: Highly_variable_pSTRs.txt

· Expanded pSTRs within superpopulations: Expanded_pSTRs.txt
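
The column layout of the files listed above is not described on this page, so it may help to inspect the first few rows before any analysis. The following is a minimal sketch, not part of the database itself: it assumes the files are UTF-8, tab-delimited text, and the helper name preview_table is hypothetical. If a file uses a different delimiter, pass it explicitly.

```python
import csv
from itertools import islice


def preview_table(path, delimiter="\t", n_rows=5):
    """Return the first n_rows of a delimited text file as lists of fields.

    Assumption: the file is UTF-8 text with one record per line and
    fields separated by `delimiter` (tab by default). This layout is
    not confirmed by the database description.
    """
    with open(path, newline="", encoding="utf-8") as handle:
        reader = csv.reader(handle, delimiter=delimiter)
        # islice stops after n_rows, so large files are not read in full.
        return list(islice(reader, n_rows))


# Example call (requires a downloaded copy of the file):
# for row in preview_table("eSTR.txt"):
#     print(row)
```

Reading only the first rows keeps the check cheap for large files and shows whether a header line is present before committing to a parser.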

Reference

Shi, Y., Niu, Y., Zhang, P., Luo, H., Liu, S., Zhang, S., Wang, J., Li, Y., Liu, X., Song, T., Xu, T., & He, S. Characterization of genome-wide STR variation in 6,487 human genomes. Under review.

Contact us

Key Laboratory of RNA Biology,
Center for Big Data Research in Health,
Institute of Biophysics,
Chinese Academy of Sciences.

Email: Yirong Shi (shiyirong20@mails.ucas.ac.cn);
Yiwei Niu (niuyiwei16@mails.ucas.ac.cn);
Principal Investigator: Shunmin He (heshunmin@ibp.ac.cn)