Download Center
Microprotein Metadata (.csv)
Includes detailed information for all microproteins, such as smORF_id, sequence, length, sequence properties, amino acid composition and structure confidence scores.
Microprotein Sequences (.fasta)
All microprotein sequences in FASTA format for downstream sequence analysis.
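For downstream analysis, the FASTA file can be read with a few lines of standard-library Python. The sketch below is a minimal reader, assuming the record ID is the first whitespace-delimited token of each header line. The record IDs and sequences in the example are made up for illustration and are not taken from the dataset.

```python
from typing import Dict, Iterable, List, Optional


def read_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Parse FASTA-formatted lines into a {record_id: sequence} mapping."""
    records: Dict[str, str] = {}
    current_id: Optional[str] = None
    chunks: List[str] = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new header closes the previous record, if any.
            if current_id is not None:
                records[current_id] = "".join(chunks)
            parts = line[1:].split()
            current_id = parts[0] if parts else ""
            chunks = []
        else:
            # Sequence lines may be wrapped, so collect and join them.
            chunks.append(line)
    if current_id is not None:
        records[current_id] = "".join(chunks)
    return records


# Hypothetical two-record example; real IDs come from the downloaded file.
example_lines = [
    ">demo_1 illustrative header",
    "MKT",
    "LLV",
    ">demo_2",
    "MAA",
]
seqs = read_fasta(example_lines)
lengths = {rid: len(seq) for rid, seq in seqs.items()}
```

To read the downloaded file instead, pass an open file handle, for example `read_fasta(open(path))`, where `path` is wherever the file was saved.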
smORF Nucleotide Sequences (.fasta)
All smORF nucleotide sequences in FASTA format for downstream sequence analysis.
Genomic Coordinates of smORFs (.bed)
The genomic locations of all identified smORFs in BED format based on the human reference genome hg38,
including chromosome, start and end positions, smORF_id, and strand orientation.
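BED coordinates are 0-based and half-open, so the genomic span of an interval is `end - start`. The sketch below parses one tab-separated BED line under two assumptions that should be checked against the downloaded file: the fourth column holds the smORF_id, and the strand is the last column (which holds for both a 5-column layout and standard 6-column BED). The example line is invented for illustration.

```python
from typing import NamedTuple


class SmorfInterval(NamedTuple):
    chrom: str
    start: int  # 0-based, inclusive
    end: int    # 0-based, exclusive
    smorf_id: str
    strand: str


def parse_bed_line(line: str) -> SmorfInterval:
    """Parse one tab-separated BED line into a SmorfInterval."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 5:
        raise ValueError(f"expected at least 5 columns, got {len(fields)}")
    # Assumption: column 4 is the smORF_id and the last column is the strand.
    return SmorfInterval(
        chrom=fields[0],
        start=int(fields[1]),
        end=int(fields[2]),
        smorf_id=fields[3],
        strand=fields[-1],
    )


# Hypothetical line, not taken from the dataset.
interval = parse_bed_line("chr1\t999\t1089\tdemo_smORF\t+")
span = interval.end - interval.start  # genomic span in nucleotides
```

The span equals the smORF length only when the smORF lies in a single contiguous block; for a smORF split across exons, the span would also cover the intervening intron.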
Data Sources for smORF Collection (.tsv)
Contains detailed information about the data sources used for compiling the smORF dataset,
including Internal ID, Database, and Source ID.
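A quick way to summarize this table is to count entries per source database. The sketch below assumes the file has a header row with the column names written exactly as `Internal ID`, `Database`, and `Source ID`; adjust the names if the downloaded file differs. The rows in the example are placeholders, not real records.

```python
import csv
import io
from collections import Counter
from typing import TextIO


def count_by_database(handle: TextIO) -> Counter:
    """Count rows per value of the 'Database' column in a tab-separated file."""
    reader = csv.DictReader(handle, delimiter="\t")
    return Counter(row["Database"].strip() for row in reader)


# Placeholder content standing in for the downloaded .tsv file.
sample = io.StringIO(
    "Internal ID\tDatabase\tSource ID\n"
    "demo_1\tDB_A\tA001\n"
    "demo_2\tDB_A\tA002\n"
    "demo_3\tDB_B\tB001\n"
)
counts = count_by_database(sample)
```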
Complete prediction results of 213 bioactivities for all microproteins (.pkl)
Complete subcellular localization prediction results of all microproteins (.csv)
Complete essentiality prediction results of all microproteins (.pkl)
All Predicted Microprotein Structures (.tar.gz)
All microprotein structure files predicted by AlphaFold2, packaged by ID in a single tar.gz file.
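When only a few structures are needed, single files can be pulled from the archive without unpacking everything. The sketch below assumes each structure file is named after its ID (file name without extension equals the ID); the actual naming scheme and file extension inside the archive should be checked first. The demo builds a tiny stand-in archive so that the example runs on its own.

```python
import io
import os
import tarfile
import tempfile
from typing import List


def extract_by_id(archive_path: str, wanted_id: str, out_dir: str) -> List[str]:
    """Copy archive members whose file stem equals wanted_id into out_dir."""
    written: List[str] = []
    os.makedirs(out_dir, exist_ok=True)
    with tarfile.open(archive_path, "r:gz") as tar:
        for member in tar:
            name = os.path.basename(member.name)
            stem = os.path.splitext(name)[0]
            if member.isfile() and stem == wanted_id:
                # Write via extractfile() so paths stored in the archive
                # cannot place files outside out_dir.
                src = tar.extractfile(member)
                target = os.path.join(out_dir, name)
                with open(target, "wb") as dst:
                    dst.write(src.read())
                written.append(target)
    return written


# Build a small stand-in archive with hypothetical member names.
workdir = tempfile.mkdtemp()
demo_archive = os.path.join(workdir, "demo_structures.tar.gz")
with tarfile.open(demo_archive, "w:gz") as tar:
    for demo_name in ("structures/demo_A.pdb", "structures/demo_B.pdb"):
        payload = b"REMARK placeholder structure\n"
        info = tarfile.TarInfo(name=demo_name)
        info.size = len(payload)
        tar.addfile(info, io.BytesIO(payload))

written = extract_by_id(demo_archive, "demo_A", os.path.join(workdir, "out"))
```

Matching on the full file stem rather than a prefix avoids picking up `demo_A1` when `demo_A` is requested.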
Microprotein–PDB Exact-Match Structure Metadata (.csv)
BLAST-based alignment results between HMPA microproteins and experimentally resolved PDB protein structures with sequence length <150 aa. This table retains only exact sequence matches (identity = 100%) and provides the corresponding PDB structure metadata.
Custom Batch Download
If you require specific structures, sequences, or other bulk data, please contact us: cuiqinghua@bjmu.edu.cn.