[关键词]
[摘要]
知识图谱是人工智能的重要基石,其目前主要有RDF图和属性图两种数据模型,在这两种数据模型之上有数种查询语言.RDF图上的查询语言为SPARQL,属性图上的查询语言主要为Cypher.10年来,各个社区开发了分别针对RDF图和属性图的不同数据管理方法,不统一的数据模型和查询语言限制了知识图谱的更广泛应用.KGDB(knowledge graph database)是统一模型和语言的知识图谱数据库管理系统:(1)以关系模型为基础,提出了统一的存储方案,支持RDF图和属性图的高效存储,满足知识图谱数据存储和查询负载的需求;(2)使用基于特征集的聚类方法解决无类型实体的存储问题;(3)实现了SPARQL和Cypher两种不同知识图谱查询语言的互操作性,使其能够操作同一个知识图谱.在真实数据集与合成数据集上进行的大量实验表明:KGDB与已有的知识图谱数据库管理系统相比,不仅能够提供更加高效的存储管理,而且具有更高的查询效率.KGDB平均比gStore和Neo4j节省了30%的存储空间,基本图模式查询上的实验表明:在真实数据集上的查询速度普遍高于gStore和Neo4j,最快可提高2个数量级.
[Key word]
[Abstract]
Knowledge graph is an important cornerstone of artificial intelligence, which currently has two main data models: RDF graph and property graph. There are several query languages on these two data models. The query language on RDF graph is SPARQL, and the query language on property graph is mainly Cypher. Over the last decade, various communities have developed different data management methods for RDF graphs and property graphs. Inconsistent data models and query languages hinder the wider application of knowledge graphs. KGDB is a knowledge graph database system with unified data model and query language. (1) Based on the relational model, a unified storage scheme is proposed, which supports the efficient storage of RDF graphs and property graphs, and meets the requirement of knowledge graph data storage and query load. (2) Using the clustering method based on characteristic sets, KGDB can handle the issue of untyped triple storage. (3) It realizes the interoperability of SPARQL and Cypher, which are two different knowledge graph query languages, and enables them to operate on the same knowledge graph. The extensive experiments on real-world datasets and synthetic datasets are carried out. The experimental results show that, compared with the existing knowledge graph database management systems, KGDB can not only provide more efficient storage management, but also has higher query efficiency. KGDB saves 30% of the storage space on average compared with gStore and Neo4j. The experimental results on basic graph pattern matching query show that, for the real-world dataset, the query efficiency of KGDB is generally higher than that of gStore and Neo4j, and can be improved by at most two orders of magnitude.
[中图分类号]
[基金项目]
国家重点研发计划(2019YFE0198600);国家自然科学基金(61972275);CCF-华为数据库创新研究计划(CCF-Huawei DBIR2019004B)