Framework for Domain-Oriented Academic Literatures Retrieval
Author: Qiu Jiangtao, Tang Changjie, Li Qing

    Abstract:

    A literature retrieval system that returns papers related to the domain of a user's query and ranks them by importance can help users quickly become familiar with an academic domain. This paper develops a framework for domain-oriented literature retrieval that combines link analysis and content analysis to find and rank the important papers in a domain. The framework defines a score function that evaluates both the importance of a paper and its relevance to the domain. The study first proposes a community-core discovery algorithm, which finds a collection of papers related to the query's domain in the citation network and calculates an importance score for each of them. To assign the remaining papers a domain-relevance score, a supervised non-negative matrix factorization method is also developed, using the identified domain-related papers as prior knowledge. Experiments on synthetic and real datasets demonstrate the feasibility and applicability of the framework.
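    The idea of one score that reflects both importance and domain relevance can be illustrated with a minimal sketch. The abstract does not give the actual score function, so everything here is an assumption: a plain PageRank over the citation graph stands in for the importance score of the community-core algorithm, the `relevance` values are placeholders for what the supervised non-negative matrix factorization step would produce, and multiplying the two is only one plausible way to combine them. The names `pagerank`, `combined_scores`, and `rank_papers` are hypothetical.

```python
def pagerank(citations, damping=0.85, iterations=50):
    """Power-iteration PageRank over a citation graph.

    `citations` maps each paper id to the list of paper ids it cites.
    Used here only as a stand-in importance measure (an assumption).
    """
    papers = set(citations)
    for cited in citations.values():
        papers.update(cited)
    papers = sorted(papers)
    n = len(papers)
    rank = {p: 1.0 / n for p in papers}
    for _ in range(iterations):
        new_rank = {p: (1.0 - damping) / n for p in papers}
        for p in papers:
            cited = citations.get(p, [])
            if cited:
                # A paper passes its rank evenly to the papers it cites.
                share = damping * rank[p] / len(cited)
                for q in cited:
                    new_rank[q] += share
            else:
                # A paper that cites nothing spreads its rank uniformly.
                share = damping * rank[p] / n
                for q in papers:
                    new_rank[q] += share
        rank = new_rank
    return rank


def combined_scores(importance, relevance):
    """Hypothetical score: importance times domain relevance in [0, 1]."""
    return {p: importance[p] * relevance.get(p, 0.0) for p in importance}


def rank_papers(citations, relevance):
    """Return paper ids ordered from highest to lowest combined score."""
    scores = combined_scores(pagerank(citations), relevance)
    return sorted(scores, key=scores.get, reverse=True)


# Toy data: three papers cite paper "C"; "D" is judged off-domain.
example_citations = {"A": ["C"], "B": ["C"], "C": [], "D": ["C"]}
example_relevance = {"A": 0.9, "B": 0.8, "C": 1.0, "D": 0.0}
example_ranking = rank_papers(example_citations, example_relevance)
```

    In this toy graph the heavily cited, fully relevant paper "C" comes first and the off-domain paper "D" comes last, however often it cites others. A product was chosen for the sketch because it suppresses papers that are important but unrelated to the domain; a weighted sum would be an equally hypothetical alternative.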

Get Citation

Qiu JT, Tang CJ, Li Q. Framework for domain-oriented academic literatures retrieval. Ruanjian Xuebao/Journal of Software, 2013,24(4):798-809.

History
  • Received: January 5, 2012
  • Revised: March 19, 2012
  • Online: March 26, 2013