SDI
Database for the subcellular diversity index

Welcome to the SDI Database

Protein localization patterns vary among genes. Differences in localization across subcellular compartments have proved to be an important characteristic for measuring gene essentiality, but the diversity of a gene's subcellular localizations has not been analyzed. We therefore introduce the Subcellular Diversity Index (SDI) to measure this diversity, and explore its correlation with gene essentiality. The SDI is based on the Gene Ontology Cellular Component Ontology (GO-CCO) [1] and the semantic similarity measure of Wang et al. [2]. We found that SDI correlated with a few well-established measures of gene essentiality and performed well in predicting essential genes. SDI also showed promise for identifying novel drug targets: it performed even better in predicting drug targets, and drug targets with higher SDI scores appeared to cause more side effects. Because our analysis indicated that SDI offers insight different from that of other gene essentiality measures, we developed this database so that researchers can use SDI to screen for potentially important genes from various perspectives.
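To give a feel for the semantic similarity measure of Wang et al. [2] mentioned above, here is a minimal sketch in Python 3. It is not the SDI formula itself and not the code distributed with this database. The tiny ontology, the term names, and the function names are hypothetical; the edge weights (0.8 for is_a, 0.6 for part_of) are the values commonly used with this method and are assumed here.

```python
# Hypothetical toy DAG in the spirit of GO-CCO: child -> [(parent, relation)].
PARENTS = {
    "nucleolus": [("nucleus", "part_of")],
    "nucleus": [("organelle", "is_a")],
    "mitochondrion": [("organelle", "is_a")],
    "organelle": [("cellular_component", "is_a")],
    "cellular_component": [],
}

# Assumed semantic contribution factors per relation type.
WEIGHTS = {"is_a": 0.8, "part_of": 0.6}


def s_values(term):
    """Return the S-value of `term` and of each of its ancestors.

    The term itself gets 1.0; an ancestor gets the largest product of
    edge weights along any path leading up from the term.
    """
    s = {term: 1.0}
    stack = [term]
    while stack:
        current = stack.pop()
        for parent, relation in PARENTS[current]:
            contribution = WEIGHTS[relation] * s[current]
            if contribution > s.get(parent, 0.0):
                s[parent] = contribution
                stack.append(parent)  # re-propagate the improved value
    return s


def wang_similarity(term_a, term_b):
    """Similarity of two terms: shared S-values over total S-values."""
    sa, sb = s_values(term_a), s_values(term_b)
    shared = set(sa) & set(sb)
    numerator = sum(sa[t] + sb[t] for t in shared)
    return numerator / (sum(sa.values()) + sum(sb.values()))


similarity = wang_similarity("nucleus", "mitochondrion")
print(round(similarity, 4))
```

In this toy graph, two terms that share only their upper ancestors get a similarity strictly between 0 and 1, and a term compared with itself gets exactly 1.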

Currently, users can

  • query for SDI values and rankings of given genes in one of the eight species provided in the database;
  • query for SDI values and rankings of the homologs of given genes across all eight species provided in the database;
  • query for the predicted probabilities of being a human drug target, from a binary logistic regression model based on SDI;
  • download all data in the database;
  • download the source code for calculating SDI.

In the future, the SDI database will

  • be updated regularly;
  • be improved, where possible, with better algorithms and statistics for calculating the SDI scores.

Statistics

The current version of the SDI database integrates 122,435 entries of gene SDI scores and rankings for eight species: human, mouse, rat, fruit fly, roundworm, zebrafish, thale cress and yeast. Three identifiers can be used to query gene SDI: Entrez Gene ID, NCBI official gene symbol, and Ensembl gene ID.


References

[1] Gene Ontology Consortium. The Gene Ontology (GO) database and informatics resource. Nucleic Acids Research 32, D258–D261 (2004).

[2] Wang, J. Z., Du, Z., Payattakool, R., Yu, P. S. & Chen, C.-F. A new method to measure the semantic similarity of GO terms. Bioinformatics 23, 1274–81 (2007).


Citing SDI

Jia K, Zhou Y and Cui Q. Quantifying gene essentiality based on the context of cellular components. Frontiers in Genetics. 2020, 10: 1342. doi:10.3389/fgene.2019.01342.

PubMed: 32038710

STEP 1 | Select Species
STEP 2 | Select Identifier
STEP 3 | Input your gene list, or click here for an example.

Query for the SDI values and rankings.


  1. Choose the species of your gene list.
  2. Choose the identifier of your gene list.
  3. Paste your gene list in the box.
  4. Click ‘Query’ to retrieve the SDI values and rankings.

Is your species not listed, or do you want to profile your gene list in all 8 species?


  1. Choose ‘All’ in STEP 1.
  2. Choose ‘Official Gene Symbol’ in STEP 2.
  3. Enter your gene symbols in STEP 3 and click ‘Query’ to look up the SDI of the homologs of the given genes in all 8 species.

Notice.


  1. Ensembl gene IDs are supported for only 5 species.
  2. The Official Gene Symbols and Ensembl Gene IDs in the database were converted from Entrez Gene IDs, using files obtained from NCBI (https://ftp.ncbi.nih.gov/gene/DATA/).
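As an illustration of this kind of identifier conversion, the sketch below (Python 3) maps Entrez Gene IDs to symbols and Ensembl gene IDs. It assumes the tab-separated layout of the NCBI gene_info file, where the second, third and sixth columns hold the Gene ID, the symbol and the cross-references. The page does not say which NCBI files were actually used, so the choice of file, the function name, and the trimmed sample rows are assumptions for demonstration only.

```python
import io

# Illustrative rows trimmed to the first six columns of a gene_info-style
# file (assumed layout: tax_id, GeneID, Symbol, LocusTag, Synonyms, dbXrefs).
SAMPLE = (
    "#tax_id\tGeneID\tSymbol\tLocusTag\tSynonyms\tdbXrefs\n"
    "9606\t7157\tTP53\t-\t-\tMIM:191170|HGNC:HGNC:11998|Ensembl:ENSG00000141510\n"
    "9606\t672\tBRCA1\t-\t-\tMIM:113705|HGNC:HGNC:1100|Ensembl:ENSG00000012048\n"
)


def build_id_maps(handle):
    """Return (Entrez ID -> symbol, Entrez ID -> Ensembl gene ID) dicts."""
    symbol_of, ensembl_of = {}, {}
    for line in handle:
        if line.startswith("#"):
            continue  # skip the header line
        fields = line.rstrip("\n").split("\t")
        gene_id, symbol, xrefs = fields[1], fields[2], fields[5]
        symbol_of[gene_id] = symbol
        # Cross-references are separated by '|'; keep the Ensembl one if present.
        for xref in xrefs.split("|"):
            if xref.startswith("Ensembl:"):
                ensembl_of[gene_id] = xref.split(":", 1)[1]
    return symbol_of, ensembl_of


symbol_of, ensembl_of = build_id_maps(io.StringIO(SAMPLE))
print(symbol_of["7157"], ensembl_of["7157"])
```

A gene without an Ensembl cross-reference simply has no entry in the second dictionary, which is one plausible reason why an identifier type may be unavailable for some genes or species.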

STEP 1 | Select Identifier
STEP 2 | Input your gene list, or click here for an example.

Query for the prediction scores of human drug targets.


  1. Choose the identifier of your gene list.
  2. Paste your gene list in the box.
  3. Click ‘Query’ to retrieve the predicted probabilities of being a drug target, from a binary logistic regression model based on SDI.

More information.


  1. SDI was used as the only variable in the logistic regression model, and 10-fold cross-validation was used for validation.
  2. The regression model was fitted on 19,341 human genes.
  3. The predicted probability (ranging from 0 to 1), its ranking, and whether the gene is a known drug target are available in the query result.
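The modelling setup described above can be sketched in Python 3 as follows: a logistic regression with a single predictor, checked by 10-fold cross-validation. This is an illustrative sketch, not the model behind the database. The data are synthetic, the function names are hypothetical, the fit uses plain gradient descent, and accuracy is used as the validation metric only because the page does not state which metric was used.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def fit_logistic(xs, ys, lr=0.5, epochs=500):
    """Fit p(y=1|x) = sigmoid(w*x + b) by batch gradient descent."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        grad_w = grad_b = 0.0
        for x, y in zip(xs, ys):
            err = sigmoid(w * x + b) - y
            grad_w += err * x
            grad_b += err
        w -= lr * grad_w / n
        b -= lr * grad_b / n
    return w, b


def predict(model, x):
    """Predicted probability of the positive class for one score."""
    w, b = model
    return sigmoid(w * x + b)


def k_fold_accuracy(xs, ys, k=10, seed=0):
    """Mean held-out accuracy over k folds (threshold 0.5)."""
    idx = list(range(len(xs)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    correct = 0
    for fold in folds:
        held_out = set(fold)
        train = [i for i in idx if i not in held_out]
        model = fit_logistic([xs[i] for i in train], [ys[i] for i in train])
        correct += sum((predict(model, xs[i]) >= 0.5) == bool(ys[i]) for i in fold)
    return correct / len(xs)


# Synthetic stand-in for (SDI score, is-drug-target) pairs; not real data.
rng = random.Random(42)
scores = [rng.random() for _ in range(200)]
labels = [1 if rng.random() < sigmoid(6 * (s - 0.5)) else 0 for s in scores]

model = fit_logistic(scores, labels)
cv_accuracy = k_fold_accuracy(scores, labels)
print(predict(model, 0.9), predict(model, 0.1), cv_accuracy)
```

With a single predictor, the predicted probability is a monotonic function of the score, so ranking genes by predicted probability gives the same order as ranking them by the score itself (or the reverse order if the fitted slope is negative).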

The data are available as shown below:

The SDI of genes of all 8 species are available here. (All_species_SDI.zip, 933KB)

The SDI of human genes are available here. (HUMAN_SDI.txt.gz, 169KB)

The SDI of mouse genes are available here. (MOUSE_SDI.txt.gz, 163KB)

The SDI of rat genes are available here. (RAT_SDI.txt.gz, 158KB)

The SDI of thale cress genes are available here. (THALE-CRESS_SDI.txt.gz, 153KB)

The SDI of zebrafish genes are available here. (ZEBRAFISH_SDI.txt.gz, 102KB)

The SDI of roundworm genes are available here. (ROUNDWORM_SDI.txt.gz, 69KB)

The SDI of fly genes are available here. (FLY_SDI.txt.gz, 60KB)

The SDI of yeast genes are available here. (YEAST_SDI.txt.gz, 47KB)


The SDI-based predictions for human drug targets are available here. (SDI_Drugtar.txt.gz, 171KB)


The source code for calculating SDI is available here. (SDI_Code.zip, 26.9MB, implemented in Python 2.7)

The file go-basic.obo was downloaded from the Gene Ontology Consortium (http://geneontology.org/page/download-ontology).

The file gene2go was downloaded from NCBI (https://ftp.ncbi.nih.gov/gene/DATA/).
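For readers who want to inspect the input annotations, the following Python 3 sketch collects the Cellular Component GO terms of each gene from a gene2go-style file. It assumes the eight-column tab-separated layout of gene2go (tax_id, GeneID, GO_ID, Evidence, Qualifier, GO_term, PubMed, Category) with "Component" marking Cellular Component terms. The function name and sample rows are hypothetical, and this is not the distributed SDI code, which targets Python 2.7.

```python
import io
from collections import defaultdict

# Illustrative rows in the assumed gene2go layout; PubMed column left as '-'.
SAMPLE = (
    "#tax_id\tGeneID\tGO_ID\tEvidence\tQualifier\tGO_term\tPubMed\tCategory\n"
    "9606\t7157\tGO:0005634\tIDA\tlocated_in\tnucleus\t-\tComponent\n"
    "9606\t7157\tGO:0005739\tIDA\tlocated_in\tmitochondrion\t-\tComponent\n"
    "9606\t7157\tGO:0003677\tIDA\tenables\tDNA binding\t-\tFunction\n"
)


def component_terms(handle, tax_id="9606"):
    """Map each Entrez Gene ID of one species to its Cellular Component GO IDs."""
    terms = defaultdict(set)
    for line in handle:
        if line.startswith("#"):
            continue  # skip the header line
        fields = line.rstrip("\n").split("\t")
        if fields[0] == tax_id and fields[7] == "Component":
            terms[fields[1]].add(fields[2])
    return terms


cc_terms = component_terms(io.StringIO(SAMPLE))
print(sorted(cc_terms["7157"]))
```

To run this on the real file, the `io.StringIO(SAMPLE)` handle could be replaced with a handle opened on the downloaded gene2go file.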

We try to understand life science using computing.




Dr. Qinghua Cui
Department of Biomedical Informatics, Peking University Health Science Center
38 Xueyuan Rd, Beijing 100191 China

Email : cuiqinghua@hsc.pku.edu.cn
Homepage : http://www.cuilab.cn/