Method of Entity Set Expansion Based on Frequent Pattern Under Meta Path
Author: Zheng Yuyan, Tian Ying, Shi Chuan

Fund Project:

National Key Research and Development Program of China (973) (2017YFB0803304); National Natural Science Foundation of China (61772082, 61375058); Beijing Municipal Natural Science Foundation of China (4182043)

    Abstract:

    Entity set expansion (ESE) is the task of expanding a small set of seed entities that share a specific semantic class into a more complete set of entities of that class. As a popular data mining task, ESE has many applications, such as dictionary construction and query suggestion. Existing ESE methods mainly rely on text or Web information: the intrinsic relations among entities are inferred from their co-occurrences in text or on the Web. With the rapid growth of knowledge graphs in recent years, it has become possible to expand entity sets according to how entities co-occur in a knowledge graph. This paper studies entity set expansion in knowledge graphs: given several seed entities, how can more entities be obtained by leveraging a knowledge graph? First, the knowledge graph is modeled as a heterogeneous information network (HIN), which contains multiple types of entities or relationships. Next, a novel method of entity set expansion based on frequent patterns under meta paths, called FPMP_ESE, is proposed. FPMP_ESE employs meta paths to capture the implicit common traits of the seed entities. To find the important meta paths between entities, an automatic meta path generation method based on frequent patterns, called FPMPG, is designed. Then, two heuristic methods and a positive-unlabeled (PU) learning method are developed to assign weights to the meta paths. Finally, experiments on the real-world dataset Yago show that the proposed method is more effective and more efficient than other methods.
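The pipeline outlined in the abstract (model the graph as a HIN, find meta paths that frequently connect the seeds, weight them, rank candidates) can be illustrated with a minimal Python sketch. It is not the authors' FPMPG or FPMP_ESE implementation: the toy triples, the function names, the path-length limit, and the use of raw support as the path weight are all assumptions made for illustration, standing in for the paper's heuristic and PU learning weighting schemes.

```python
from collections import defaultdict
from itertools import permutations

# Toy knowledge graph as (subject, relation, object) triples.
# Hypothetical data for illustration only, not taken from Yago.
TRIPLES = [
    ("Kobe", "playsFor", "Lakers"),
    ("Shaq", "playsFor", "Lakers"),
    ("Magic", "playsFor", "Lakers"),
    ("Kobe", "bornIn", "Philadelphia"),
    ("Will", "bornIn", "Philadelphia"),
    ("Shaq", "bornIn", "Newark"),
    ("Magic", "bornIn", "Lansing"),
]


def build_adjacency(triples):
    """Index each triple in both directions; '^-1' marks the inverse relation."""
    adj = defaultdict(list)
    for subj, rel, obj in triples:
        adj[subj].append((rel, obj))
        adj[obj].append((rel + "^-1", subj))
    return adj


def meta_paths_from(adj, source, max_len=2):
    """Map each entity reachable from source to the relation sequences linking them."""
    reached = defaultdict(set)
    frontier = [(source, ())]
    for _ in range(max_len):
        next_frontier = []
        for node, path in frontier:
            for rel, neighbor in adj.get(node, []):
                if neighbor == source:  # do not walk back through the source
                    continue
                new_path = path + (rel,)
                reached[neighbor].add(new_path)
                next_frontier.append((neighbor, new_path))
        frontier = next_frontier
    return reached


def frequent_meta_paths(adj, seeds, max_len=2, min_support=0.5):
    """Keep meta paths connecting at least min_support of the ordered seed pairs."""
    pairs = list(permutations(seeds, 2))
    counts = defaultdict(int)
    for a, b in pairs:
        for path in meta_paths_from(adj, a, max_len).get(b, set()):
            counts[path] += 1
    return {p: c / len(pairs) for p, c in counts.items()
            if c / len(pairs) >= min_support}


def expand_seed_set(triples, seeds, max_len=2, min_support=0.5):
    """Rank non-seed entities by support-weighted connectivity to the seeds."""
    adj = build_adjacency(triples)
    weights = frequent_meta_paths(adj, seeds, max_len, min_support)  # assumed: weight = support
    scores = defaultdict(float)
    for seed in seeds:
        for entity, paths in meta_paths_from(adj, seed, max_len).items():
            if entity in seeds:
                continue
            for path in paths:
                if path in weights:
                    scores[entity] += weights[path] / len(seeds)
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


ranked = expand_seed_set(TRIPLES, ["Kobe", "Shaq"])
print(ranked)
```

In this toy graph the only meta path shared by the seed pairs is playsFor followed by its inverse, so entities on the same team are ranked, while an entity linked to just one seed through a birthplace is filtered out by the support threshold. Exhaustive enumeration like this would not scale to a graph the size of Yago; it only shows the role that frequent meta paths play in the ranking.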

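Citation:

Zheng YY, Tian Y, Shi C. Method of entity set expansion based on frequent pattern under meta path. Journal of Software, 2018, 29(10): 2915-2930 (in Chinese with English abstract).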
History
  • Received: July 20, 2017
  • Revised: November 8, 2017
  • Online: February 8, 2018