The term "small protein" generally refers to a protein shorter than 100 amino acids. SmProt contains records of small proteins encoded by genes, especially those encoded in UTRs and non-coding RNAs. The small proteins were identified from ribosome profiling data, literature, mass spectrometry (MS), and other sources, covering eight species: Homo sapiens, Mus musculus, Rattus norvegicus, Drosophila melanogaster, Danio rerio, Saccharomyces cerevisiae, Caenorhabditis elegans and Escherichia coli. SmProt also annotates the collected small proteins with their sequences, genomic locations, tissues/cell lines, an assessment of coding potential, function, variants, and related diseases that have been verified or predicted.
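As a minimal sketch of the length criterion above, the following Python snippet keeps only sequences shorter than 100 amino acids. The function names, record names, and sequences are hypothetical and chosen for illustration; this is not SmProt's own code or data.

```python
# Illustrative only: the 100-residue cutoff comes from the definition above;
# the record names and sequences are made up for this sketch.
MAX_LENGTH = 100  # small proteins are shorter than this many amino acids


def is_small_protein(sequence: str, max_length: int = MAX_LENGTH) -> bool:
    """Return True if the sequence is non-empty and has fewer than max_length residues."""
    # Drop surrounding whitespace and a trailing stop symbol, if present.
    residues = sequence.strip().rstrip("*")
    return 0 < len(residues) < max_length


def filter_small_proteins(records: dict[str, str]) -> dict[str, str]:
    """Keep only the records whose sequence passes the length criterion."""
    return {name: seq for name, seq in records.items() if is_small_protein(seq)}


example_records = {
    "hypothetical_peptide_A": "MKTLLVAGAVLLS",  # short peptide, kept
    "hypothetical_protein_B": "M" + "A" * 149,   # 150 residues, dropped
}
small = filter_small_proteins(example_records)
```

The cutoff is exclusive here, so a sequence of exactly 100 residues is not counted as a small protein.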
UPDATE: Extra attention on reliability, variants, the relationship between small proteins and diseases, a vast increase in tissues/cell lines/datasets, translation initiation, PhyloCSF score, and other detailed information. Over 4000 conserved small protein families identified from human microbiomes by this paper were also collected for display.
1. Reliability is guaranteed by a more accurate algorithm, a specially designed pipeline, score evaluation, and translation evidence.
2. Much more comprehensive annotation, curated from multiple data sources. All small proteins derived from ribosome profiling datasets are completely new!
3. Variants in 5'UTRs called from a WGS project and variants in sORFs called from Ribo-seq datasets were added.
4. Disease-specific translation events and variants in sORFs were predicted.
5. Small proteins with non-AUG translation initiation were added to the database.
6. The update employed 419 Ribo-seq datasets. The number of small protein records increased to more than 3.6 million.
7. Over 4000 conserved small protein families identified from human microbiomes were collected for display in humanMicroBio.
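To illustrate what a small ORF (sORF) with a non-AUG start looks like at the sequence level, here is a minimal Python sketch that scans the forward strand of a DNA string for short open reading frames. It is not SmProt's pipeline, which relies on Ribo-seq and other translation evidence. The set of near-cognate start codons (CTG, GTG, TTG), the function name, and the 100-codon cutoff as applied here are assumptions made for this example.

```python
# Assumption: ATG plus three near-cognate codons are treated as possible starts.
START_CODONS = {"ATG", "CTG", "GTG", "TTG"}
STOP_CODONS = {"TAA", "TAG", "TGA"}
MAX_CODONS = 100  # report only ORFs encoding fewer than 100 amino acids


def find_sorfs(dna: str, max_codons: int = MAX_CODONS):
    """Yield (start, end, start_codon) for forward-strand ORFs below max_codons.

    Coordinates are 0-based and end-exclusive; `end` includes the stop codon.
    Every candidate start is reported, so nested ORFs sharing a stop all appear.
    """
    dna = dna.upper()
    for start in range(len(dna) - 2):
        codon = dna[start:start + 3]
        if codon not in START_CODONS:
            continue
        # Walk in-frame from the codon after the start until the first stop.
        for pos in range(start + 3, len(dna) - 2, 3):
            if dna[pos:pos + 3] in STOP_CODONS:
                n_codons = (pos - start) // 3  # coding codons, stop excluded
                if n_codons < max_codons:
                    yield start, pos + 3, codon
                break


hits = list(find_sorfs("AACTGAAACCCTAAGG"))
```

A sequence-only scan like this produces many candidates that are never translated, which is why experimental evidence such as ribosome profiling is needed to support a real annotation.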