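[Keywords]
[Abstract (translated from the Chinese)]
In real life, large amounts of data can be modeled as multidimensional networks. How to analyze such networks more effectively is a key concern for researchers. OLAP (online analytical processing) technology has been proven to be an effective tool for analyzing multidimensional relational data, but applying OLAP to manage and analyze multidimensional network data in support of effective decision making remains a major challenge. A graph cube model, the path-dimension cube, is designed and proposed. For this model, the materialization process is divided into two parts, relation-path materialization and associated-dimension materialization; a materialization strategy is proposed for each, and the corresponding algorithms are designed on the Spark framework. On this basis, the relevant GraphOLAP (graph online analytical processing) operations are designed and refined for network data, which enriches the framework's analytical perspectives and improves its ability to analyze multidimensional networks. Finally, the algorithms are implemented on Spark, and multidimensional networks built from data in several real application scenarios are analyzed on the framework. The experimental results show that the proposed graph cube model and materialization algorithms are effective and scalable to a certain degree.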
[Keywords]
[Abstract]
Much real-world data can be modeled as multidimensional networks. How to analyze such networks from multiple views and at multiple granularities remains a focus of current research. OLAP (online analytical processing) technology has been proven to be an effective tool for analyzing multidimensional relational data. However, managing and analyzing multidimensional heterogeneous networks with OLAP technology to support effective decision making is still a major challenge. In this paper, a P&D (path and dimension) graph cube model is proposed. Based on this model, graph cube materialization is divided into two parts, path-related materialization and dimension-related materialization, and a materialization algorithm is designed for each on the Spark framework. Several GraphOLAP operations are also refined to improve the ability to analyze multidimensional networks. Finally, the algorithms are implemented on Spark, and multidimensional networks constructed from real datasets are analyzed using the framework. The experimental results validate the effectiveness and scalability of the P&D graph cube model and the materialization algorithms.
[Chinese Library Classification number]
TP311
[Funding]
National Key Basic Research Program of China (973 Program) (2013CB329606); National Natural Science Foundation of China (61772082)