异质信息网络的复杂条件社区搜索
作者:
作者简介:

王家龙(1993-),男,博士生,主要研究领域为数据挖掘,社会网络分析;杨杰(1996-),男,硕士生,主要研究领域为数据挖掘,社区搜索;周丽华(1968-),女,博士,教授,博士生导师,CCF专业会员,主要研究领域为数据挖掘,社会网络分析,人工智能;王丽珍(1962-),女,博士,教授,博士生导师,CCF杰出会员,主要研究领域为数据挖掘,数据仓库,计算机算法;王睿康(1999-),男,本科生,主要研究领域为社会网络分析.

通讯作者:

周丽华,E-mail:lhzhou@ynu.edu.cn

中图分类号:

TP311

基金项目:

国家自然科学基金(62062066,61762090,61966036);云南省基础研究计划重点项目(202201AS070015);云南省高校物联网技术及应用重点实验室;云南大学研究生科研创新基金(2021Y024)


Complex Conditional Community Search over Heterogeneous Information Networks
Author:
  • 摘要
  • | |
  • 访问统计
  • |
  • 参考文献 [55]
  • |
  • 相似文献 [20]
  • | | |
  • 文章评论
    摘要:

    社区是信息网络的重要属性, 社区搜索旨在寻找满足用户给定条件的节点集合, 是信息网络分析的重要研究内容. 异质信息网络由于包含更加全面、丰富的结构和语义信息, 所以异质信息网络的社区搜索近年来受到人们的广泛关注. 针对现有异质信息网络的社区搜索方法难以满足复杂条件社区搜索要求的不足, 定义了复杂条件社区搜索问题, 提出了考虑非对称元路径、受限元路径和禁止节点约束的搜索算法. 3种算法分别通过元路径补全策略、调整带标签的批量搜索策略和拆分复杂搜索条件的方式搜索社区, 同时针对禁止节点约束的搜索算法设计了基于剪枝策略和近似策略的优化算法以提高搜索效率. 在真实数据集上进行了大量实验, 实验结果证明了所提算法的有效性和高效性.

    Abstract:

    Community is an important attribute of information networks. Community search, as an important content of information network analysis, aims to find a set of nodes that meet the conditions specified by the user. As heterogeneous information networks contain more comprehensive and richer structural and semantic information, community search in such networks has received extensive attention in recent years. However, the existing community search methods for heterogeneous information networks cannot be directly applied when the search conditions are complex. For this reason, this study defines community search under complex conditions and proposes search algorithms considering asymmetric meta-paths, constrained meta-paths, and prohibited node constraints. These three algorithms respectively use the meta-path completion strategy, the strategy of adjusting batch search with labeling, and the way of dividing complex search conditions to search communities. Moreover, two optimization algorithms respectively based on the pruning strategy and the approximate strategy are designed to improve the efficiency of the search algorithm with prohibited node constraints. A large number of experiments are performed on real datasets, and the experimental results verify the effectiveness and efficiency of the proposed algorithms.

    参考文献
    [1] Sozio M, Gionis A. The community-search problem and how to plan a successful cocktail party. In: Proc. of the 16th ACM SIGKDD Int’l Conf. on Knowledge Discovery and Data Mining. Washington: ACM, 2010. 939-948.
    [2] Tang L, Liu H. Community detection and mining in social media. Synthesis Lectures on Data Mining and Knowledge Discovery, 2010, 2(1): 1-137. [doi: 10.2200/S00298ED1V01Y201009DMK003]
    [3] Chen JC, Yuan B. Detecting functional modules in the yeast protein-protein interaction network. Bioinformatics, 2006, 22(18): 2283-2290. [doi: 10.1093/bioinformatics/btl370]
    [4] 乔少杰, 李天瑞, 韩楠, 高云君, 元昌安, 王晓腾, 唐常杰. 大数据环境下移动对象自适应轨迹预测模型. 软件学报, 2015, 26(11): 2869-2883. http://www.jos.org.cn/1000-9825/4889.htm
    Qiao SJ, Li TR, Han N, Gao YJ, Yuan CA, Wang XT, Tang CJ. Self-adaptive trajectory prediction model for moving objects in big data environment. Ruan Jian Xue Bao/Journal of Software, 2015, 26(11): 2869-2883 (in Chinese with English abstract). http://www.jos.org.cn/1000-9825/4889.htm
    [5] 乔少杰, 金琨, 韩楠, 唐常杰, 格桑多吉, Gutierrez LA. 一种基于高斯混合模型的轨迹预测算法. 软件学报, 2015, 26(5): 1048-1063. http://www.jos.org.cn/1000-9825/4796.htm
    Qiao SJ, Jin K, Han N, Tang CJ, Ge SDJ, Gutierrez LA. Trajectory prediction algorithm based on gaussian mixture model. Ruan Jian Xue Bao/Journal of Software, 2015, 26(5): 1048-1063 (in Chinese with English abstract). http://www.jos.org.cn/1000-9825/4796.htm
    [6] 乔少杰, 吴凌淳, 韩楠, 黄发良, 毛睿, 元昌安, Gutierrez LA. 情景感知驱动的移动对象多模式轨迹预测技术综述. 软件学报, 2023, 34(1): 312-333. http://www.jos.org.cn/1000-9825/6395.htm
    Qiao SJ, Wu LC, Han N, Huang FL, Mao R, Yuan CA, Gutierrez LA. Multiple-motion-pattern trajectory prediction of moving objects with context awareness: A survey. Ruan Jian Xue Bao/Journal of Software, 2023, 34(1): 312-333 (in Chinese with English abstract). http://www.jos.org.cn/1000-9825/6395.htm
    [7] 乔少杰, 韩楠, 岳昆, 易玉根, 黄发良, 元昌安, 丁鹏, Gutierrez LA. 基于数据场聚类的共享单车需求预测模型. 软件学报, 2022, 33(4): 1451-1476. http://www.jos.org.cn/1000-9825/6461.htm
    Qiao SJ, Han N, Yue K, Yi YG, Huang FL, Yuan CA, Ding P, Gutierrez LA. Shared-bike demand prediction model based on station clustering. Ruan Jian Xue Bao/Journal of Software, 2022, 33(4): 1451-1476 (in Chinese with English abstract). http://www.jos.org.cn/1000-9825/6461.htm
    [8] Porter MA, Onnela JP, Mucha PJ. Communities in networks. Notices of the American Mathematical Society, 2009, 56(9): 1082-1100. (查阅所有网上资料, 页码信息未找到, 请核对补充)
    [9] Jiang CJ, Li Z. Network community detection. In: Jiang CJ, Li Z, eds. Mobile Information Service for Networks. Singapore: Springer, 2020. 71-101.
    [10] 郑玉艳, 王明省, 石川, 王锐. 异质信息网络中基于元路径的社团发现算法研究. 中文信息学报, 2018, 32(9): 132-142. [doi: 10.3969/j.issn.1003-0077.2018.09.018]
    Zheng YY, Wang MS, Shi C, Wang R. Research on community detection algorithm based on meta path in heterogeneous information network. Journal of Chinese Information Processing, 2018, 32(9): 132-142 (in Chinese with English abstract). [doi: 10.3969/j.issn.1003-0077.2018.09.018]
    [11] Huang X, Lakshmanan LVS, Xu JL. Community search over big graphs: Models, algorithms, and opportunities. In: Proc. of the 33rd IEEE Int’l Conf. on Data Engineering. San Diego: IEEE, 2017. 1451-1454.
    [12] Fang YX, Huang X, Qin L, Zhang Y, Zhang WJ, Cheng R, Lin XM. A survey of community search over big graphs. The VLDB Journal, 2020, 29(1): 353-392. [doi: 10.1007/s00778-019-00556-x]
    [13] 单菁, 申德荣, 寇月, 聂铁铮, 于戈. 基于重叠社区搜索的传播热点选择方法. 软件学报, 2017, 28(2): 326-340. http://www.jos.org.cn/1000-9825/5117.htm
    Shan J, Shen DR, Kou Y, Nie TZ, Yu G. Approach for hot spread node selection based on overlapping community search. Ruan Jian Xue Bao/Journal of Software, 2017, 28(2): 326-340 (in Chinese with English abstract). http://www.jos.org.cn/1000-9825/5117.htm
    [14] Cui WY, Xiao YH, Wang HX, Wang W. Local search of communities in large graphs. In: Proc. of the 2014 ACM SIGMOD Int’l Conf. on Management of Data. Snowbird: ACM, 2014. 991-1002.
    [15] 竺俊超, 王朝坤. 复杂条件下的社区搜索方法. 软件学报, 2019, 30(3): 552-572. http://www.jos.org.cn/1000-9825/5699.htm
    Zhu JC, Wang CK. Approaches to community search under complex conditions. Ruan Jian Xue Bao/Journal of Software, 2019, 30(3): 552-572 (in Chinese with English abstract). http://www.jos.org.cn/1000-9825/5699.htm
    [16] Shi C, Li YT, Zhang JW, Sun YZ, Yu PS. A survey of heterogeneous information network analysis. IEEE Transactions on Knowledge and Data Engineering, 2017, 29(1): 17-37. [doi: 10.1109/TKDE.2016.2598561]
    [17] Fang YX, Yang YX, Zhang WJ, Lin XM, Cao X. Effective and efficient community search over large heterogeneous information networks. Proceedings of the VLDB Endowment, 2020, 13(6): 854-867. [doi: 10.14778/3380750.3380756]
    [18] Yang YX, Fang YX, Lin XM, Zhang WJ. Effective and efficient truss computation over large heterogeneous information networks. In: Proc. of the 36th IEEE Int’l Conf. on Data Engineering. Dallas: IEEE, 2020. 901-912.
    [19] Jian X, Wang Y, Chen L. Effective and efficient relational community detection and search in large dynamic heterogeneous information networks. Proceedings of the VLDB Endowment, 2020, 13(6): 1723-1736. [doi: 10.14778/3401960.3401969]
    [20] Barbieri N, Bonchi F, Galimberti E, Gullo F. Efficient and effective community search. Data Mining and Knowledge Discovery, 2015, 29(5): 1406-1433. [doi: 10.1007/s10618-015-0422-1]
    [21] Huang X, Cheng H, Qin L, Tian WT, Yu JX. Querying k-Truss community in large and dynamic graphs. In: Proc. of the 2014 ACM SIGMOD Int’l Conf. on Management of Data. Snowbird: ACM, 2014. 1311-1322.
    [22] Akbas E, Zhao PX. Truss-based community search: A truss-equivalence based indexing approach. Proceedings of the VLDB Endowment, 2017, 10(11): 1298-1309. [doi: 10.14778/3137628.3137640]
    [23] Wang CK, Zhu JC. Forbidden nodes aware community search. In: Proc. of the 33rd AAAI Conf. on Artificial Intelligence. Honolulu: AAAI, 2019. 758-765.
    [24] Fang YX, Wang ZR, Cheng R, Wang HZ, Hu JF. Effective and efficient community search over large directed graphs. IEEE Transactions on Knowledge and Data Engineering, 2019, 31(11): 2093-2107. [doi: 10.1109/TKDE.2018.2872982]
    [25] Giatsidis C, Thilikos DM, Vazirgiannis M. D-cores: Measuring collaboration of directed graphs based on degeneracy. In: Proc. of the 11th IEEE Int’l Conf. on Data Mining. Vancouver: IEEE, 2011. 201-210.
    [26] Fang YX, Cheng R, Luo SQ, Hu JF. Effective community search for large attributed graphs. Proceedings of the VLDB Endowment, 2016, 9(12): 1233-1244. [doi: 10.14778/2994509.2994538]
    [27] Fang YX, Cheng R, Chen YK, Luo SQ, Hu JF. Effective and efficient attributed community search. The VLDB Journal, 2017, 26(6): 803-828. [doi: 10.1007/s00778-017-0482-5]
    [28] Huang X, Lakshmanan LVS. Attribute-driven community search. Proceedings of the VLDB Endowment, 2017, 10(9): 949-960. [doi: 10.14778/3099622.3099626]
    [29] Zhang ZW, Huang X, Xu JL, Choi B, Shang ZC. Keyword-centric community search. In: Proc. of the 35th IEEE Int’l Conf. on Data Engineering. Macao: IEEE, 2019. 422-433.
    [30] Liu Q, Zhu YF, Zhao MJ, Huang X, Xu JL, Gao YJ. VAC: Vertex-centric attributed community search. In: Proc. of the 36th IEEE Int’l Conf. on Data Engineering. Dallas: IEEE, 2020. 937-948.
    [31] Fang YX, Cheng R, Li XD, Luo SQ, Hu JF. Effective community search over large spatial graphs. Proceedings of the VLDB Endowment, 2017, 10(6): 709-720. [doi: 10.14778/3055330.3055337]
    [32] Fang YX, Wang Z, Cheng R, Li XD, Luo SQ, Hu JF, Chen XJ. On spatial-aware community search. IEEE Transactions on Knowledge and Data Engineering, 2019, 31(4): 783-798. [doi: 10.1109/TKDE.2018.2845414]
    [33] Wang K, Cao X, Lin XM, Zhang WJ, Qin L. Efficient computing of radius-bounded k-cores. In: Proc. of the 34th IEEE Int’l Conf. on Data Engineering. Paris: IEEE, 2018. 233-244.
    [34] Zhu QJ, Hu HB, Xu C, Xu JL, Lee WC. Geo-social group queries with minimum acquaintance constraints. The VLDB Journal, 2017, 26(5): 709-727. [doi: 10.1007/s00778-017-0473-6]
    [35] Luo JH, Cao X, Xie XK, Qu Q. Best co-located community search in attributed networks. In: Proc. of the 28th ACM Int’l Conf. on Information and Knowledge Management. Beijing: ACM, 2019. 2453-2456.
    [36] Al-Baghdadi A, Lian X. Topic-based community search over spatial-social networks. Proceedings of the VLDB Endowment, 2020, 13(12): 2104-2117. [doi: 10.14778/3407790.3407812]
    [37] Li RH, Su J, Qin L, Yu JX, Dai QQ. Persistent community search in temporal networks. In: Proc. of the 34th IEEE Int’l Conf. on Data Engineering. Paris: IEEE, 2018. 797-808.
    [38] 徐兰天, 李荣华, 王国仁, 王彪. 面向时序图的k-truss社区搜索算法研究. 计算机科学与探索, 2020, 14(9): 1482-1489. [doi: 10.3778/j.issn.1673-9418.1909050]
    Xu LT, Li RH, Wang GR, Wang B. Research on k-truss community search algorithm for temporal networks. Journal of Frontiers of Computer Science & Technology, 2020, 14(9): 1482-1489 (in Chinese with English abstract). [doi: 10.3778/j.issn.1673-9418.1909050]
    [39] Wang ZZ, Yuan Y, Zhou XM, Qin HC. Effective and efficient community search in directed graphs across heterogeneous social networks. In: Proc. of the 31st Australasian Database Conf. on Databases Theory and Applications. Melbourne: Springer, 2020. 161-172.
    [40] Qiao LP, Zhang ZW, Yuan Y, Chen C, Wang GR. Keyword-centric community search over large heterogeneous information networks. In: Proc. of the 26th Int’l Conf. on Database Systems for Advanced Applications. Taipei: Springer, 2021. 158-173.
    [41] 杨杰. 异质信息网络复杂条件社区搜索研究 [硕士学位论文]. 昆明: 云南大学, 2021.
    Yang J. The research on complex conditional community search in heterogeneous information network [MS. Thesis]. Kunming: Yunnan University, 2021 (in Chinese with English abstract).
    [42] 孙艺洲, 韩家炜. 异构信息网络挖掘: 原理和方法. 北京: 机械工业出版社, 2017.
    Sun YZ, Han JW. Mining Heterogeneous Information Networks: Principles and Methodologies. Beijing: China Machine Press, 2017 (in Chinese).
    [43] Shi C, Li YT, Yu PS, Wu B. Constrained-meta-path-based ranking in heterogeneous information network. Knowledge and Information Systems, 2016, 49(2): 719-747. [doi: 10.1007/s10115-016-0916-1]
    [44] Sun YZ, Han JW, Yan XF, Yu PS, Wu TY. PathSim: Meta path-based top-K similarity search in heterogeneous information networks. Proceedings of the VLDB Endowment, 2011, 4(11): 992-1003. [doi: 10.14778/3402707.3402736]
    [45] Shi C, Kong XN, Huang Y, Yu PS, Wu B. HeteSim: A general framework for relevance measure in heterogeneous networks. IEEE Transactions on Knowledge and Data Engineering, 2014, 26(10): 2479-2492. [doi: 10.1109/TKDE.2013.2297920]
    引证文献
    网友评论
    网友评论
    分享到微博
    发 布
引用本文

王家龙,杨杰,周丽华,王丽珍,王睿康.异质信息网络的复杂条件社区搜索.软件学报,2023,34(10):4830-4850

复制
分享
文章指标
  • 点击次数:596
  • 下载次数: 2519
  • HTML阅读次数: 681
  • 引用次数: 0
历史
  • 收稿日期:2021-07-31
  • 最后修改日期:2021-09-18
  • 在线发布日期: 2022-12-08
  • 出版日期: 2023-10-06
文章二维码
您是第20060976位访问者
版权所有:中国科学院软件研究所 京ICP备05046678号-3
地址:北京市海淀区中关村南四街4号,邮政编码:100190
电话:010-62562563 传真:010-62562533 Email:jos@iscas.ac.cn
技术支持:北京勤云科技发展有限公司

京公网安备 11040202500063号