Protein localization patterns vary among genes. Differences in localization across subcellular compartments have proved to be an important characteristic for measuring gene essentiality, but the diversity of genes' subcellular localization has not been analyzed. We therefore introduce a Subcellular Diversity Index (SDI) to measure this diversity and explore its correlation with gene essentiality. The SDI is based on the Cellular Component Ontology (GO-CCO) [1] and the semantic similarity measure of Wang et al. [2]. We found that SDI correlated with a few well-established measures of gene essentiality and performed well in predicting essential genes. SDI also showed promise for identifying novel drug targets: it performed even better in predicting drug targets, and drug targets with higher SDI scores appeared to cause more side effects. Because our analysis indicated that SDI offers insight different from that of other gene essentiality measures, we developed this database so that researchers can use SDI to screen for potentially important genes from various perspectives.
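For readers unfamiliar with the semantic similarity measure of Wang et al. [2], the following is a minimal sketch of that measure on a toy ontology. It is not the SDI source code and does not show how SDI combines similarities into a diversity score. The term names and the TOY_DAG structure are hypothetical and chosen only for illustration; the edge weights (0.8 for is_a, 0.6 for part_of) are the values reported in [2].

```python
from typing import Dict, List, Tuple

# Semantic contribution factors from Wang et al. (2007).
WEIGHTS = {"is_a": 0.8, "part_of": 0.6}

# Hypothetical toy ontology (NOT real GO-CCO): child -> [(parent, relation)].
TOY_DAG: Dict[str, List[Tuple[str, str]]] = {
    "root": [],
    "organelle": [("root", "is_a")],
    "membrane": [("root", "is_a")],
    "nucleus": [("organelle", "is_a")],
    "nuclear_membrane": [("nucleus", "part_of"), ("membrane", "is_a")],
}


def s_values(term: str, dag: Dict[str, List[Tuple[str, str]]]) -> Dict[str, float]:
    """S-value of every ancestor of `term` (and of `term` itself, which is 1)."""
    s = {term: 1.0}
    stack = [term]
    while stack:
        child = stack.pop()
        for parent, relation in dag[child]:
            contribution = WEIGHTS[relation] * s[child]
            # Keep the maximum contribution over all paths to this ancestor.
            if contribution > s.get(parent, 0.0):
                s[parent] = contribution
                stack.append(parent)
    return s


def wang_similarity(a: str, b: str, dag: Dict[str, List[Tuple[str, str]]]) -> float:
    """Similarity of two terms: shared S-values over total semantic values."""
    sa, sb = s_values(a, dag), s_values(b, dag)
    shared = sa.keys() & sb.keys()
    numerator = sum(sa[t] + sb[t] for t in shared)
    return numerator / (sum(sa.values()) + sum(sb.values()))


if __name__ == "__main__":
    print(round(wang_similarity("nucleus", "nuclear_membrane", TOY_DAG), 3))
```

In this toy example the two terms share the ancestors "nucleus", "organelle" and "root", so the similarity is high but below 1. A term compared with itself always scores 1.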
Currently, users can
In the future, the SDI database will
Statistics
The current version of the SDI database integrates 122,435 entries of gene SDI scores and rankings for eight species: human, mouse, rat, fruit fly, roundworm, zebrafish, thale cress and yeast. Three identifiers can be used to query gene SDI: Entrez gene ID, official NCBI gene symbol and Ensembl gene ID.
References
[1] Gene Ontology Consortium. The Gene Ontology (GO) database and informatics resource. Nucleic Acids Research 32(suppl_1), D258–D261 (2004).
[2] Wang, J. Z., Du, Z., Payattakool, R., Yu, P. S. & Chen, C.-F. A new method to measure the semantic similarity of GO terms. Bioinformatics 23, 1274–81 (2007).
Citing SDI
Jia K, Zhou Y and Cui Q. Quantifying gene essentiality based on the context of cellular components. Frontiers in Genetics. 2020, 10: 1342. doi:10.3389/fgene.2019.01342.
PubMed: 32038710
The SDI of genes of all 8 species is available here. (All_species_SDI.zip, 933KB)
The SDI of human genes is available here. (HUMAN_SDI.txt.gz, 169KB)
The SDI of mouse genes is available here. (MOUSE_SDI.txt.gz, 163KB)
The SDI of rat genes is available here. (RAT_SDI.txt.gz, 158KB)
The SDI of thale cress genes is available here. (THALE-CRESS_SDI.txt.gz, 153KB)
The SDI of zebrafish genes is available here. (ZEBRAFISH_SDI.txt.gz, 102KB)
The SDI of roundworm genes is available here. (ROUNDWORM_SDI.txt.gz, 69KB)
The SDI of fly genes is available here. (FLY_SDI.txt.gz, 60KB)
The SDI of yeast genes is available here. (YEAST_SDI.txt.gz, 47KB)
The SDI-based predictions of human drug targets are available here. (SDI_Drugtar.txt.gz, 171KB)
The source code for calculating SDI is available here. (SDI_Code.zip, 26.9MB, implemented in Python 2.7)
The file go-basic.obo was downloaded from the Gene Ontology Consortium (http://geneontology.org/page/download-ontology).
The file gene2go was downloaded from NCBI (https://ftp.ncbi.nih.gov/gene/DATA/).
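The .txt.gz files listed above can be inspected without decompressing them to disk first. The following is a minimal sketch, assuming the file has already been downloaded to a local path. The function name peek_gz is hypothetical, and the column layout of the files is not described on this page, so the sketch only prints raw lines rather than parsing fields.

```python
import gzip
from itertools import islice
from typing import List


def peek_gz(path: str, n: int = 5) -> List[str]:
    """Return the first n lines of a gzip-compressed text file."""
    with gzip.open(path, "rt", encoding="utf-8") as handle:
        return [line.rstrip("\n") for line in islice(handle, n)]


# Example (assumes HUMAN_SDI.txt.gz is in the working directory):
# for line in peek_gz("HUMAN_SDI.txt.gz"):
#     print(line)
```

Looking at the first few lines this way shows the delimiter and any header row before the file is loaded into an analysis tool.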
We try to understand life science using computing.
Dr. Qinghua Cui
Department of Biomedical Informatics, Peking University Health Science Center
38 Xueyuan Rd, Beijing 100191 China
Email : cuiqinghua@hsc.pku.edu.cn
Homepage : http://www.cuilab.cn/