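[Keywords (from the Chinese)]
[Abstract (from the Chinese)]
Graphs, as a data structure for representing complex information, are widely used in social networks, knowledge graphs, the semantic web, bioinformatics, cheminformatics, and other fields. As applications in these fields spread and deepen, managing such complex graph data has become a major challenge for graph database technology. Graph similarity search is one of the most active topics in graph data management and a major line of research on graph querying. This paper focuses on graph similarity search based on graph edit distance. First, an analysis of representative existing algorithms shows that each of the filtering rules proposed so far has its own strengths, weaknesses, and range of applicability. Second, to address these limitations in the filtering phase of existing methods, a filtering framework oriented toward relational databases is proposed. The new framework supports all existing filtering rules, so that different rules can be combined to optimize graph similarity search and improve query efficiency. The method retains as much as possible the strengths of the different filtering rules while overcoming their weaknesses, and is therefore generally applicable across different queries. Finally, experiments on the PubChem dataset compare the time the algorithms take to compute query results and verify the efficiency and scalability of the proposed algorithm. The experimental results show that the proposed method outperforms existing algorithms.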
[Keywords]
[Abstract]
Graphs are widely used to model complex data in areas such as social networks, knowledge bases, the semantic web, bioinformatics, and cheminformatics. As more and more graph data are collected, managing such complex data has become a challenging problem. The database community has had a long-standing interest in querying graph databases, and graph similarity search is one of the most popular topics. This paper focuses on the graph similarity search problem under edit distance constraints. First, several state-of-the-art methods are investigated, revealing that every proposed pruning rule has limitations and that none outperforms the others across all queries. To address this problem, a novel approach is then proposed that supports graph similarity search within a query evaluation framework based on the relational model. The approach develops a unified filtering framework by combining all the existing pruning rules. It avoids the limitations of the individual pruning rules and is more widely applicable. A series of experiments is conducted on the PubChem dataset to evaluate the proposed approach. The results show that the new approach outperforms the existing state-of-the-art methods.
[CLC number]
TP311
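[Funding]
National Natural Science Foundation of China (61502504, 61702432); Research Funds of Renmin University of China (Fundamental Research Funds for the Central Universities) (15XNLF09); Education and Scientific Research Project for Young and Middle-aged Teachers of Fujian Province (JAT160003)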