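Accurate and Efficient Method for Constructing Domain Knowledge Graph
Authors:

Yang Yuji (杨玉基), Xu Bin (许斌), Hu Jiawei (胡家威), Tong Meihan (仝美涵), Zhang Peng (张鹏), Zheng Li (郑莉)

Author biographies:

Yang Yuji (1994-), male, born in Gongyi, Henan, master's degree; main research interests: knowledge graphs and data mining.
Xu Bin (1973-), male, Ph.D., associate professor, doctoral supervisor, CCF senior member; main research interests: knowledge graphs, data mining, and service computing.
Hu Jiawei (1991-), male, engineer; main research interest: applications of artificial intelligence.
Tong Meihan (1995-), female, Ph.D.; main research interests: knowledge engineering and information extraction.
Zhang Peng (1979-), male, engineer, CCF professional member; main research interests: knowledge graph construction and application, and text semantic mining.
Zheng Li (1963-), female, professor, CCF professional member; main research interest: computer applications.

Corresponding author:

Yang Yuji, E-mail: yangyujiyyj@gmail.com

Fund project:

National High Technology Research and Development Program of China (863 Program) (2015AA015401)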

    Abstract:

    As the data foundation of the Semantic Web, knowledge graphs play a vital role in areas such as knowledge-based question answering and semantic search, and they have long been a hot topic in both research and engineering. However, building a large-scale, high-quality knowledge graph usually takes a great deal of labor and time. Balancing accuracy against efficiency so that a high-quality domain knowledge graph can be built quickly is a major challenge in knowledge engineering. This paper presents a systematic study of domain knowledge graph construction and proposes an accurate and efficient construction method, the "four-step method". The method was applied to build knowledge graphs for nine subjects in China's basic (K-12) education, producing subject knowledge graphs with high accuracy in a relatively short time, which demonstrates its effectiveness for constructing domain knowledge graphs. Taking the geography knowledge graph as an example, the "four-step method" yielded 670 thousand instances and 14.21 million triples, and for the annotated data both subject-knowledge coverage and knowledge accuracy exceed 99%.

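Cite this article

Yang YJ, Xu B, Hu JW, Tong MH, Zhang P, Zheng L. Accurate and efficient method for constructing domain knowledge graph. Ruan Jian Xue Bao/Journal of Software, 2018,29(10):2931-2947 (in Chinese with English abstract).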

History
  • Received: 2017-07-22
  • Last revised: 2017-11-08
  • Published online: 2018-02-08