Welcome to SmProt v2.0!

"Small proteins" is the general term for proteins shorter than 100 amino acids. SmProt contains records of small proteins encoded by genes, especially those from UTRs and non-coding RNAs. The selected small proteins were identified from ribosome profiling data, literature, mass spectrometry (MS), etc., in eight species: Homo sapiens, Mus musculus, Rattus norvegicus, Drosophila melanogaster, Danio rerio, Saccharomyces cerevisiae, Caenorhabditis elegans and Escherichia coli. SmProt also annotates the collected small proteins with their sequences, genomic locations, tissues/cell lines, assessments of coding potential, functions, variants, and related diseases that have been verified or predicted.
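As a minimal sketch of the length criterion in the definition above, the following Python snippet keeps only sequences shorter than 100 amino acids. The function names, the dictionary input format, and the handling of a trailing `*` stop symbol are illustrative assumptions and are not part of SmProt itself.

```python
# Threshold taken from the definition above: shorter than 100 amino acids.
MAX_SMALL_PROTEIN_LENGTH = 100


def is_small_protein(sequence: str) -> bool:
    """Return True if the amino acid sequence is non-empty and shorter than 100 residues."""
    # Assumption: a trailing '*' marks a stop codon and is not counted as a residue.
    residues = sequence.strip().rstrip("*")
    return 0 < len(residues) < MAX_SMALL_PROTEIN_LENGTH


def filter_small_proteins(records: dict[str, str]) -> dict[str, str]:
    """Keep only the entries of a name -> sequence mapping that qualify as small proteins."""
    return {name: seq for name, seq in records.items() if is_small_protein(seq)}


if __name__ == "__main__":
    # Hypothetical example records, made up for illustration only.
    example = {
        "short_orf": "MKT" * 10 + "*",  # 30 residues plus a stop symbol
        "long_protein": "M" * 250,      # well above the threshold
    }
    print(sorted(filter_small_proteins(example)))
```

The threshold is kept as a named constant so that a different cutoff can be tried without touching the filtering logic.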

UPDATE: Extra attention to reliability, variants, the relationship between small proteins and diseases, a vast increase in tissues/cell lines/datasets, translation initiation, PhyloCSF score, translation level, and other detailed information. Over 4000 conserved small protein families identified from human microbiomes by this paper were also collected for display.


What's new!

1. Reliability is ensured by a more accurate algorithm, a specially designed pipeline, score evaluation and translation evidence.
2. Much more comprehensive annotation, curated from multiple data sources. All small proteins derived from ribosome profiling datasets are completely new.
3. Variants in 5'UTRs called from a WGS project and variants in sORFs called from Ribo-seq datasets were added.
4. Disease-specific translation events and variants in sORFs were predicted.
5. Small proteins with non-AUG translation initiation were added to the database.
6. 419 Ribo-seq datasets were used in the update. The number of small proteins increased to more than 3.6 million records.
7. Over 4000 conserved small protein families identified from human microbiomes were collected for display in humanMicroBio.



Publication: Li Y., et al. 2021. SmProt: A Reliable Repository with Comprehensive Annotation of Small Proteins Identified from Ribosome Profiling. Genomics Proteomics Bioinformatics.

Hao Y., et al. 2017. SmProt: a database of small proteins encoded by annotated coding and non-coding RNA loci. Brief Bioinform bbx005.

Previous release: SmProt v1.0.



September 2021: Accepted by Genomics, Proteomics and Bioinformatics.
January 2021: Kept assessing, integrating and uploading new data, and optimizing the website structure.
December 2020: Refactored the entire SmProt website.
October 2020: Evaluated the impact of variants on sORF translation.
July 2020: Constructed a pipeline for calling variants from Ribo-seq data.
June 2020: Collected new small proteins from the literature.
May 2020: Finished analysis of our WGS project data.
February 2020: Released nearly 3 million additional records of small proteins.
October 2019: Started to call variants from our WGS project with 2902 individuals.
August 2019: Finished Ribo-seq/TI-seq data curation.
May 2019: Collected small proteins from new literature.
March 2019: Started to run the Ribo-seq/TI-seq data analysis pipeline.
January 2019: Started to collect new Ribo-seq/TI-seq datasets.
May 2018: Constructed a new pipeline for SmProt v2.0.
March 2018: Conceived and investigated new points for the update.
January 2017: SmProt article published in Briefings in Bioinformatics.
June 2016: First public release.
December 2015: Started to collect small proteins from the literature.
June 2015: Started to collect MS datasets and ribosome profiling datasets.
