Search of Genes with Similar Phenotype Based on Disease Information Network
Author: Hou Yongxu, Duan Lei, Li Ling, Lu Li, Tang Changjie

CLC Number: TP311

Fund Project: National Natural Science Foundation of China (61572332, 81473446); China Postdoctoral Science Foundation (2016T90850); Fundamental Research Funds for the Central Universities (2016SCU04A22)

    Abstract:

    The results of the Human Genome Project have promoted the development of bioinformatics. Searching for functionally correlated disease genes, also called similar phenotype genes, by exploiting disease phenome similarity has become an emerging research topic because of its research value and wide range of applications. However, no previous work in the biomedical field has applied computational methods to search for similar phenotype genes via a network of "gene-disease-phenotype" relations. To fill this gap, this study builds a disease information network containing three types of heterogeneous nodes (gene, disease, and phenotype) from an open disease database. In addition, an algorithm called gSim-Miner is designed to search for similar phenotype genes via the disease information network, and pruning strategies based on the characteristics of disease phenotype data are proposed to improve its efficiency. Experiments on real-world data sets demonstrate that the disease information network is feasible and that gSim-Miner is effective, efficient, and extensible.
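The abstract does not state which similarity measure or pruning rules gSim-Miner uses, so the following is only an illustrative sketch of how similar phenotype genes can be searched over a gene-disease-phenotype network. It substitutes PathSim (Sun et al., 2011), a meta-path-based similarity for heterogeneous information networks, along the symmetric meta path Gene-Disease-Phenotype-Disease-Gene: for genes x and y, s(x, y) = 2 * paths(x, y) / (paths(x, x) + paths(y, y)). The toy data, the function names (`phenotype_profile`, `path_count`, `pathsim`, `top_k_similar`), and the inverted-index candidate filter are hypothetical and are not the paper's pruning strategies.

```python
from collections import defaultdict

# Hypothetical toy data (not from the paper): gene -> diseases, disease -> phenotypes.
GENE_DISEASE = {
    "G1": {"D1"},
    "G2": {"D2"},
    "G3": {"D3"},
    "G4": {"D1", "D3"},
}
DISEASE_PHENOTYPE = {
    "D1": {"P1", "P2"},
    "D2": {"P1", "P2"},
    "D3": {"P3"},
}


def phenotype_profile(gene, gene_disease, disease_phenotype):
    """Count Gene-Disease-Phenotype path instances from `gene` to each phenotype."""
    counts = defaultdict(int)
    for disease in gene_disease.get(gene, ()):
        for phenotype in disease_phenotype.get(disease, ()):
            counts[phenotype] += 1
    return counts


def path_count(a, b):
    """Number of G-D-P-D-G path instances between two genes, given their profiles."""
    return sum(n * b[p] for p, n in a.items() if p in b)


def pathsim(a, b):
    """PathSim score for the symmetric meta path G-D-P-D-G (0.0 if no paths)."""
    denom = path_count(a, a) + path_count(b, b)
    return 2 * path_count(a, b) / denom if denom else 0.0


def top_k_similar(query, k, gene_disease, disease_phenotype):
    """Rank genes by PathSim to `query`, scoring only genes that share a phenotype.

    A gene sharing no phenotype with the query has zero G-D-P-D-G paths to it,
    so skipping it via the inverted index loses no non-zero scores.
    """
    profiles = {g: phenotype_profile(g, gene_disease, disease_phenotype)
                for g in gene_disease}
    # Inverted index: phenotype -> genes that reach it through some disease.
    index = defaultdict(set)
    for gene, profile in profiles.items():
        for phenotype in profile:
            index[phenotype].add(gene)
    q = profiles.get(query, {})
    candidates = set()
    for phenotype in q:
        candidates |= index[phenotype]
    candidates.discard(query)
    scored = [(pathsim(q, profiles[g]), g) for g in candidates]
    # Highest score first; ties broken by gene identifier for determinism.
    scored.sort(key=lambda item: (-item[0], item[1]))
    return scored[:k]


print(top_k_similar("G1", 2, GENE_DISEASE, DISEASE_PHENOTYPE))
```

Here the call returns `[(1.0, 'G2'), (0.8, 'G4')]`: G2 reaches exactly the same phenotypes as G1, while G4 additionally reaches P3, which lowers its normalized score. G3 is never scored because it shares no phenotype with G1.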

Citation:

Hou YX, Duan L, Li L, Lu L, Tang CJ. Search of genes with similar phenotype based on disease information network. Journal of Software, 2018,29(3):721-733.
History
  • Received: July 31, 2017
  • Revised: September 5, 2017
  • Online: December 5, 2017
Copyright: Institute of Software, Chinese Academy of Sciences