Welcome to GIC
The human genome is estimated to contain approximately 20,000 protein-coding genes. That is a large number, yet these genes occupy only 1-2% of the genome, and non-protein-coding genes are even more numerous. Studying all of these genes, coding and non-coding alike, requires an enormous investment of time and funding, and more novel genes, especially non-coding ones, will be discovered over time.
So are all genes equally important? Previous research on several model organisms has identified some essential genes and many dispensable genes, depending on whether the organism can survive without the gene under laboratory conditions. Essential genes generally tend to be conserved across more species, evolve more slowly, and encode proteins that are more highly connected in protein interaction networks, among other properties. It may therefore be more practical to focus on the relatively more important genes. Moreover, identifying the essential gene set is itself highly meaningful.
We therefore present the Gene Importance Calculator (GIC). We constructed a logistic regression model from basic sequence features, including sequence length, triplet statistics, and the minimum free energy of the RNA secondary structure predicted by RNAfold (Hofacker et al. 1994). The probabilities predicted by this model correlated well with known measures of gene essentiality, and the model showed convincing prediction accuracy.
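To make the idea of a logistic score over sequence features concrete, here is a minimal Python sketch. It is not the published GIC model: the function names, the choice of overlapping triplets, the log-length and length-normalised energy transforms, and every coefficient are assumptions made for illustration only. The minimum free energy is passed in as a number, on the assumption that it has been computed separately (for example with RNAfold).

```python
import math
from collections import Counter


def triplet_frequencies(seq, triplets):
    """Relative frequency of each listed triplet among all overlapping 3-mers."""
    seq = seq.upper().replace("U", "T")  # treat RNA and DNA input alike
    total = max(len(seq) - 2, 1)         # number of overlapping 3-mers
    counts = Counter(seq[i:i + 3] for i in range(len(seq) - 2))
    return [counts[t] / total for t in triplets]


def gic_like_score(seq, mfe, weights, intercept, triplets):
    """Logistic probability from length, triplet frequencies and free energy.

    The feature transforms and the weights are hypothetical, not GIC's own.
    """
    features = (
        [math.log(len(seq))]                   # sequence length (log scale)
        + triplet_frequencies(seq, triplets)   # triplet statistics
        + [mfe / len(seq)]                     # free energy per nucleotide
    )
    z = intercept + sum(w * x for w, x in zip(weights, features))
    return 1.0 / (1.0 + math.exp(-z))          # logistic function


# Placeholder triplets and coefficients, chosen only to exercise the code.
TRIPLETS = ["CGA", "GCG", "TCG"]
WEIGHTS = [0.5, -1.0, 2.0, 1.5, -0.8]

score = gic_like_score("ATGCGATCGGCGTAA", mfe=-3.2, weights=WEIGHTS,
                       intercept=-1.0, triplets=TRIPLETS)
print(round(score, 3))
```

The output is always a value between 0 and 1, which is what allows it to be read as a predicted probability of essentiality. In a real model the weights would be fitted to genes with known essentiality labels rather than set by hand.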
Dr. Qinghua Cui
Department of Biomedical Informatics
Peking University Health Science Center
38 Xueyuan Rd, Beijing 100191, China
Email: cuiqinghua(at)hsc.pku.edu.cn
Citation: Zeng P, Chen J, Meng Y, Zhou Y, Yang J, and Cui Q. Defining essentiality score of protein-coding genes and long noncoding RNAs. Frontiers in Genetics-Bioinformatics and Computational Biology 2018 (accepted).