[Keywords]
[Abstract]
The results of the Human Genome Project have promoted the development of bioinformatics. Searching for functionally related disease genes, also called similar phenotype genes, on the basis of disease phenotype similarity has become an emerging research topic because of its considerable research value and wide range of applications. However, no previous work in the biomedical field has applied computational methods to search for similar phenotype genes via a network of gene-disease-phenotype relations. To fill this gap, this study builds a disease information network with three types of heterogeneous nodes (gene, disease, and phenotype) from public disease databases, and designs an algorithm, gSim-Miner, that searches for similar phenotype genes in this network. Pruning strategies tailored to the characteristics of disease phenotype data are proposed to improve the efficiency of gSim-Miner. Experiments on real-world data sets demonstrate that the disease information network is suitable for searching similar phenotype genes, and that gSim-Miner is effective, efficient, and scalable.
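The abstract does not spell out how gSim-Miner scores or prunes candidates. Purely as an illustrative sketch, assuming that each gene is represented by the set of phenotypes reachable along gene-disease-phenotype paths and that genes are compared with the Jaccard coefficient (a swapped-in measure, not necessarily the one used by gSim-Miner), the Python below shows how such a network might be queried. The pruning step uses a standard size bound: for sets with sizes s ≤ l, the Jaccard similarity cannot exceed s/l, so candidates failing that bound are skipped without computing intersections. The functions `build_network` and `similar_genes` and the toy data are hypothetical.

```python
from collections import defaultdict


def build_network(gene_disease, disease_phenotype):
    """Map each gene to the phenotypes reachable via gene-disease-phenotype paths.

    Hypothetical helper: edges are given as (gene, disease) and (disease, phenotype) pairs.
    """
    disease_to_phen = defaultdict(set)
    for disease, phenotype in disease_phenotype:
        disease_to_phen[disease].add(phenotype)

    gene_to_phen = defaultdict(set)
    for gene, disease in gene_disease:
        # Union the phenotypes of every disease the gene is linked to.
        gene_to_phen[gene] |= disease_to_phen.get(disease, set())
    return dict(gene_to_phen)


def similar_genes(gene_to_phen, query, threshold):
    """Return (gene, Jaccard similarity) pairs with similarity >= threshold, best first."""
    q = gene_to_phen.get(query, set())
    results = []
    for gene, phen in gene_to_phen.items():
        if gene == query or not q or not phen:
            continue
        # Size-bound pruning: Jaccard(A, B) <= min(|A|, |B|) / max(|A|, |B|).
        small, large = sorted((len(q), len(phen)))
        if small / large < threshold:
            continue
        sim = len(q & phen) / len(q | phen)
        if sim >= threshold:
            results.append((gene, sim))
    return sorted(results, key=lambda item: (-item[1], item[0]))


if __name__ == "__main__":
    # Toy data, invented for illustration only.
    gene_disease = [("G1", "D1"), ("G2", "D1"), ("G2", "D2"), ("G3", "D3")]
    disease_phenotype = [("D1", "P1"), ("D1", "P2"), ("D2", "P3"),
                         ("D3", "P4"), ("D3", "P1")]
    network = build_network(gene_disease, disease_phenotype)
    print(similar_genes(network, "G1", 0.5))
```

With the toy data, G1 reaches {P1, P2}, G2 reaches {P1, P2, P3} and G3 reaches {P1, P4}; at a threshold of 0.5 only G2 (similarity 2/3) is returned. The size bound is cheap and exact, which is why bounds of this kind are a common first pruning step in set-similarity search.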
[CLC number]
TP311
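[Foundation items]
National Natural Science Foundation of China (61572332, 81473446); China Postdoctoral Science Foundation (2016T90850); Fundamental Research Funds for the Central Universities (2016SCU04A22)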