 |
|
|
|
 |
 |
 |
|
 |
|
 |
|
|
王鑫,徐强,柴乐乐,杨雅君,柴云鹏.大规模RDF图数据上高效率分布式查询处理.软件学报,2019,30(3):498-514 |
大规模RDF图数据上高效率分布式查询处理 |
Efficient Distributed Query Processing on Large Scale RDF Graph Data |
投稿时间:2018-07-20 修订日期:2018-09-20 |
DOI:10.13328/j.cnki.jos.005696 |
中文关键词: 星形分解 分布式 基本图模式匹配 大规模RDF图 MapReduce |
英文关键词:star decomposition distributed basic graph pattern matching large scale RDF graphs MapReduce |
基金项目:国家自然科学基金(61572353,61402323,61472427);天津市自然科学基金(17JCYBJC15400);数字出版技术国家重点实验室开放课题;北京自然科学基金(4172031) |
作者 | 单位 | E-mail | 王鑫 | 天津大学 智能与计算学部, 天津 300354 天津市认知计算与应用重点实验室, 天津 300354 | | 徐强 | 天津大学 智能与计算学部, 天津 300354 天津市认知计算与应用重点实验室, 天津 300354 | | 柴乐乐 | 天津大学 智能与计算学部, 天津 300354 天津市认知计算与应用重点实验室, 天津 300354 | | 杨雅君 | 天津大学 智能与计算学部, 天津 300354 天津市认知计算与应用重点实验室, 天津 300354 数字出版技术国家重点实验室, 北京 100871 | yjyang@tju.edu.cn | 柴云鹏 | 中国人民大学 信息学院, 北京 100872 | |
|
摘要点击次数: 2390 |
全文下载次数: 2099 |
中文摘要: |
知识图谱是智能数据的主要表现形式,随着知识图谱领域的不断发展,大量的智能图数据以资源描述框架(resource description framework,简称RDF)形式发布出来.RDF图上的SPARQL查询语义对应于图同态,是一个NP-完全问题.因此,如何使用分布式方法在大规模RDF图上有效回答SPARQL查询是一个富有挑战性的问题.目前已有研究使用MapReduce计算模型处理大规模RDF数据,但其将SPARQL查询拆分成单个的查询子句,没有考虑RDF数据的丰富语义和自身的图特性,导致MapReduce迭代次数过多.首先,利用RDF数据内嵌的语义和结构信息作为启发式信息,将查询图分解为星形的集合,可以在更少次迭代内得到查询结果.同时,分解算法给出中间结果较少的星形匹配顺序,基于此顺序,每轮MapReduce操作通过连接操作匹配一个新的星形,直至产生最终的答案.最后,在标准合成数据集WatDiv和真实数据集DBpedia上进行大量的实验评估.实验结果表明:所提基于星形分解的分布式SPARQL BGP匹配算法能够高效回答查询,查询时间比SHARD和S2X算法的查询时间平均提高一个数量级,且优化算法的查询时间与基本算法相比缩短了49.63%~78.71%. |
英文摘要: |
Knowledge graphs are the main representation form of intelligent data. With the development of knowledge graphs, more and more intelligent data has been released in the form of the resource description framework (RDF). It is known that the semantics of SPARQL correspond to graph homomorphism which is an NP-complete problem. Therefore, how to efficiently answer SPARQL queries in parallel over big RDF graphs has been widely recognized as a challenging problem. There are some research works using the MapReduce computational model to process big RDF graph. However, SPARQL queries in these works are decomposed into the set of query clauses without considering any semantics and graph structure embedded in RDF graph, which leads to overmuch MapReduce iterations. This study first decomposes the SPARQL query graph into a set of stars by utilizing the semantic and structural information embedded RDF graphs as heuristics, which can be matched in fewer MapReduce iterations. Meanwhile, a matching order of these stars is given to reduce intermediate results in MapReduce iterations. During the matching phase, each round of MapReduce adds one star with the join operation. The extensive experiments on both synthetic dataset WatDiv, and real-world dataset DBpedia are carried out. The experiments results demonstrate that the proposed star decomposition-based method can answer SPARQL BGP queries efficiently, which outperforms SHARD and S2X by one order of magnitude. Finally, extensive experiments show that the performance of the optimization algorithms is improved by 49.63% and 78.71% than the basic algorithm over both synthetic and real datasets. |
HTML 下载PDF全文 查看/发表评论 下载PDF阅读器 |
|
|
|
|
|
|
 |
|
|
|
|
 |
|
 |
|
 |
|