大规模图的分布式核分解算法
作者:
作者单位:

作者简介:

通讯作者:

中图分类号:

TP301

基金项目:

国家自然科学基金(62172146, 62102143); 湖南省自然科学基金(2023JJ10016, 2022JJ30009, 2023JJ30083); 湖南省教育厅科学研究重点项目(22A0592)


Distributed Approaches to Core Decomposition on Large-scale Graphs
Author:
Affiliation:

Fund Project:

  • 摘要
  • |
  • 图/表
  • |
  • 访问统计
  • |
  • 参考文献
  • |
  • 相似文献
  • |
  • 引证文献
  • |
  • 资源附件
  • |
  • 文章评论
    摘要:

    随着互联网信息技术的发展, 社交网络、计算机网络及生物信息网络等领域涌现海量大规模图数据. 鉴于传统图数据管理技术在处理大规模图时存在存储及性能方面的局限, 大规模图的分布式处理技术已成为图数据库领域的研究热点, 并得到工业界和学术界的广泛关注. 图的核分解用于计算图中所有顶点的核值, 有助于挖掘重要图结构信息, 在社区搜索、蛋白质结构分析和网络结构可视化等诸多应用中发挥着关键作用. 当前以顶点为中心计算模式的分布式核分解算法中采用一种广播的消息传递机制, 一方面, 存在大量的冗余通信及计算开销; 另一方面, 处理大规模图核分解过程中易产生内存溢出问题. 为此, 分别提出基于全局激活和层次剥离计算框架, 并提出分布式核分解新算法, 通过引入基于顶点核值局部性特点的消息剪枝策略和以计算节点为中心的计算新模式, 保证算法有效性的同时提升其性能. 在国家超级计算长沙中心分布式集群上, 分别针对大规模真实和合成数据集, 算法总耗时性能提升比例为37%–98%, 验证所提模型和算法的有效性和高效性.

    Abstract:

    With the development of Internet information technology, large-scale graphs have widely emerged in social networks, computer networks, and biological information networks. In view of the storage and performance limitations of traditional graph data management technology when dealing with large-scale graphs, distributed management technology has become a hotspot in industry and academia fields. The core decomposition is adopted to get core numbers of vertices in a graph and plays a key role in many applications, including community search, protein structure analysis, and network structure visualization. The existing distributed core decomposition algorithm applied a broadcast message delivery mechanism based on the vertex-centric mode, which may generate a large amount of redundant communication and computation overhead and lead to memory overflow when processing large-scale graphs. To address these issues, this study proposes novel distributed core decomposition algorithms based on global activation and peeling calculation frameworks, respectively. In addition, there are several strategies designed to improve algorithm performance. Based on the locality of the vertex core number, the study proposes a new message-pruning strategy and a new worker-centric computing mode, thereby improving the efficiency of our algorithms. To verify those strategies, this study deploys the proposed models and algorithms on the distributed cluster of the National Supercomputing Center in Changsha, and the effectiveness and efficiency of the proposed methods are evaluated through a large number of experiments on real and synthetic data sets. The total time performance of the algorithm is improved by 37% to 98%.

    参考文献
    相似文献
    引证文献
引用本文

翁同峰,周旭,李肯立,胡逸騉.大规模图的分布式核分解算法.软件学报,,():1-22

复制
分享
文章指标
  • 点击次数:
  • 下载次数:
  • HTML阅读次数:
  • 引用次数:
历史
  • 收稿日期:2023-05-03
  • 最后修改日期:2023-08-03
  • 录用日期:
  • 在线发布日期: 2024-01-31
  • 出版日期:
您是第位访问者
版权所有:中国科学院软件研究所 京ICP备05046678号-3
地址:北京市海淀区中关村南四街4号,邮政编码:100190
电话:010-62562563 传真:010-62562533 Email:jos@iscas.ac.cn
技术支持:北京勤云科技发展有限公司

京公网安备 11040202500063号