KGDB: Knowledge Graph Database System with Unified Model and Query Language
Author: Liu Baozhu, Wang Xin, Liu Pengkai, Li Sizhuo, Zhang Xiaowang, Yang Yajun

Fund Project:

National Key Research and Development Program (2019YFE0198600); National Natural Science Foundation of China (61972275); CCF-Huawei Database Innovation Research Plan (CCF-Huawei DBIR2019004B)

    Abstract:

    Knowledge graphs are an important cornerstone of artificial intelligence. They currently have two main data models, the RDF graph and the property graph, each with its own query languages: SPARQL for RDF graphs, and mainly Cypher for property graphs. Over the last decade, different communities have developed separate data management methods for the two models, and the resulting inconsistency in data models and query languages hinders the wider application of knowledge graphs. KGDB is a knowledge graph database system with a unified data model and query language. (1) It proposes a unified storage scheme based on the relational model that stores both RDF graphs and property graphs efficiently and meets the storage and query workload requirements of knowledge graph data. (2) It uses a clustering method based on characteristic sets to handle the storage of untyped triples. (3) It makes SPARQL and Cypher, two different knowledge graph query languages, interoperable, so that both can operate on the same knowledge graph. Extensive experiments were carried out on real-world and synthetic datasets. The results show that, compared with existing knowledge graph database management systems, KGDB provides both more efficient storage management and higher query efficiency. KGDB saves 30% of storage space on average compared with gStore and Neo4j. On basic graph pattern matching queries over the real-world dataset, KGDB's query efficiency is generally higher than that of gStore and Neo4j, by up to two orders of magnitude.
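
Point (2) of the abstract rests on characteristic sets. As a rough illustration only, the Python sketch below computes the characteristic set of each subject (the set of predicates on its outgoing triples) and groups subjects whose sets are identical, which is one simple way to assign untyped entities to a shared table. The function name `characteristic_sets` and the sample triples are hypothetical, and grouping by exact set equality is an assumption made here for brevity; the abstract does not specify how KGDB's clustering method merges or compares sets.

```python
from collections import defaultdict


def characteristic_sets(triples):
    """Group subjects by their characteristic set.

    The characteristic set of a subject is the set of predicates that
    appear on its outgoing triples. Subjects with identical sets are
    placed in the same group (an assumption of this sketch, not
    necessarily how KGDB clusters them).
    """
    # Collect the predicates emitted by each subject.
    predicates_by_subject = defaultdict(set)
    for subject, predicate, _obj in triples:
        predicates_by_subject[subject].add(predicate)

    # Group subjects that share exactly the same predicate set.
    groups = defaultdict(list)
    for subject, predicates in predicates_by_subject.items():
        groups[frozenset(predicates)].append(subject)
    return dict(groups)


# Hypothetical untyped triples: no subject carries an explicit type.
triples = [
    ("alice", "name", "Alice"),
    ("alice", "age", "30"),
    ("bob", "name", "Bob"),
    ("bob", "age", "25"),
    ("kgdb", "title", "KGDB"),
]

groups = characteristic_sets(triples)
for predicates, subjects in groups.items():
    print(sorted(predicates), subjects)
```

In this toy input, `alice` and `bob` share the predicates `name` and `age` and so fall into one group, while `kgdb` forms its own. In a relational layout, each group could map to one table whose columns are the predicates in the set.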

Get Citation

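Liu BZ, Wang X, Liu PK, Li SZ, Zhang XW, Yang YJ. KGDB: Knowledge graph database system with unified model and query language. Ruan Jian Xue Bao/Journal of Software, 2021,32(3):781-804 (in Chinese).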

History
  • Received: July 20, 2020
  • Revised: September 03, 2020
  • Online: January 21, 2021
  • Published: March 06, 2021