Community Model and Query Algorithm Based on Neighborhood k-core

Fund Project:

National Key R&D Program of China (2021VEB3301301); National Natural Science Foundation of China (62072034, U2241211); China Postdoctoral Science Foundation (2023M730251, 2023TQ0026)
    Abstract:

    Real-world networks often exhibit community structure, and community query is a fundamental task in graph data mining. Existing studies have proposed various models for identifying communities in networks, such as k-core-based and k-truss-based models. However, these models typically constrain only the number of neighbors of the nodes or edges in a community and ignore the relationships among those neighbors, that is, the neighborhood structure of the nodes. As a result, nodes within a community tend to have low local density. To address this limitation, this study integrates the neighborhood structure of nodes into the k-core dense subgraph model, proposes a community model based on the neighborhood-connected k-core, and defines the density of a community. Based on this model, the study investigates the densest single-community query problem, which returns the community that contains the query node set and has the highest density. In real-world graph data, a set of query nodes may be distributed across multiple disjoint communities. The study therefore further investigates the density-threshold-based multi-community query problem, which returns multiple communities that contain the query node set, each with a density no lower than a user-specified threshold. For both problems, the study first defines the concept of edge density and proposes baseline algorithms based on it. To improve query efficiency, an index tree and an improved index tree are designed, which support outputting results in polynomial time. Comparisons with the baseline algorithms on multiple datasets verify the effectiveness of the neighborhood-connected k-core community model and the efficiency of the proposed query algorithms.
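
    The abstract does not state the formal definition of the proposed model, so the following is only an illustrative sketch of the limitation it describes, not the authors' algorithm. It computes the standard k-core by peeling low-degree nodes and then counts the edges among a node's neighbors. The function names (`build_graph`, `k_core`, `neighbor_edges`, `example_edges`) and the toy graph are assumptions made for this illustration.

```python
from collections import defaultdict


def build_graph(edges):
    """Build an undirected adjacency map from (u, v) pairs, ignoring self-loops."""
    adj = defaultdict(set)
    for u, v in edges:
        if u != v:
            adj[u].add(v)
            adj[v].add(u)
    return adj


def k_core(adj, k):
    """Return the node set of the standard k-core.

    Nodes whose degree among the surviving nodes is below k are
    deleted repeatedly until no such node remains.
    """
    alive = set(adj)
    degree = {u: len(adj[u]) for u in adj}
    stack = [u for u in alive if degree[u] < k]
    while stack:
        u = stack.pop()
        if u not in alive:
            continue  # already deleted via another path
        alive.remove(u)
        for v in adj[u]:
            if v in alive:
                degree[v] -= 1
                if degree[v] < k:
                    stack.append(v)
    return alive


def neighbor_edges(adj, nodes, u):
    """Count the edges that connect two neighbors of u, restricted to `nodes`."""
    nbrs = adj[u] & nodes
    # Each edge among the neighbors is seen from both endpoints.
    return sum(len(adj[v] & nbrs) for v in nbrs) // 2


def example_edges():
    """Toy graph (hypothetical): a 4-clique on 0..3 sharing node 3 with a 4-cycle."""
    clique = [(a, b) for a in range(4) for b in range(a + 1, 4)]
    cycle = [(3, 4), (4, 5), (5, 6), (6, 3)]
    return clique + cycle


if __name__ == "__main__":
    adj = build_graph(example_edges())
    core2 = k_core(adj, 2)
    print(sorted(core2))
    print(neighbor_edges(adj, core2, 0), neighbor_edges(adj, core2, 5))
```

    In this toy graph every node has degree at least 2, so the 2-core keeps the whole graph. Node 0 sits in the clique and its neighbors are fully connected to each other, whereas node 5 sits on the cycle and its two neighbors share no edge. A degree constraint alone treats both nodes the same, which is the kind of gap a neighborhood-aware model aims to close.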

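Cite this article

Zhang Qi, Cheng Miaomiao, Li Ronghua, Wang Guoren. Community Model and Query Algorithm Based on Neighborhood k-core. Journal of Software, 2024, 35(3): 1051-1073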

History
  • Received: 2023-07-16
  • Last revised: 2023-09-05
  • Published online: 2023-11-08
  • Published: 2024-03-06