多维图结构聚类的社交关系挖掘算法

doi:10.13328/j.cnki.jos.005454

微信服务号

微信订阅号

2025年3月30日 14:18 星期日

首页 > 过刊浏览>2018年第29卷第3期 >839-852. DOI:10.13328/j.cnki.jos.005454

PDF HTML阅读 XML下载导出引用引用提醒

多维图结构聚类的社交关系挖掘算法
DOI:
                        10.13328/j.cnki.jos.005454
                    
CSTR:
                        
                    
作者:
                        李振军李振军
深圳大学 计算机与软件学院, 广东 深圳 518060
在期刊界中查找
在百度中查找
在本站中查找
代强强代强强
深圳大学 计算机与软件学院, 广东 深圳 518060
在期刊界中查找
在百度中查找
在本站中查找
李荣华李荣华
深圳大学 计算机与软件学院, 广东 深圳 518060
在期刊界中查找
在百度中查找
在本站中查找
毛睿毛睿
深圳大学 计算机与软件学院, 广东 深圳 518060
在期刊界中查找
在百度中查找
在本站中查找
乔少杰乔少杰
成都信息工程大学 网络空间安全学院, 四川 成都 610225
在期刊界中查找
在百度中查找
在本站中查找

                    
作者单位:
作者简介:李振军(1979-),男,广西柳州人,博士,工程师,主要研究领域为数据挖掘,深度学习;毛睿(1975-),男,博士,教授,CCF高级会员,主要研究领域为数据挖掘,数据库,统计方法,机器学习,计算生物;代强强(1992-),男,硕士,主要研究领域为社交数据挖掘,图算法;乔少杰(1981-),男,博士,教授,CCF高级会员,主要研究领域为轨迹数据挖掘,机器学习;李荣华(1985-),男,博士,讲师,主要研究领域为图数据挖掘,社交网络分析.
通讯作者:代强强,E-mail:1134133685@qq.com
中图分类号:TP311
基金项目:国家自然科学基金（61402292，61772091）；国家自然科学基金广东省联合基金（U1301252）；教育部人文社会科学研究规划基金（15YJAZH058）

Social Relationship Mining Algorithm by Multi-Dimensional Graph Structural Clustering

Author:

LI Zhen-Jun
LI Zhen-Jun
College of Computer Science & Software Engineering, Shenzhen University, Shenzhen 518060, China
在期刊界中查找
在百度中查找
在本站中查找
DAI Qiang-Qiang
DAI Qiang-Qiang
College of Computer Science & Software Engineering, Shenzhen University, Shenzhen 518060, China
在期刊界中查找
在百度中查找
在本站中查找
LI Rong-Hua
LI Rong-Hua
College of Computer Science & Software Engineering, Shenzhen University, Shenzhen 518060, China
在期刊界中查找
在百度中查找
在本站中查找
MAO Rui
MAO Rui
College of Computer Science & Software Engineering, Shenzhen University, Shenzhen 518060, China
在期刊界中查找
在百度中查找
在本站中查找
QIAO Shao-Jie
QIAO Shao-Jie
School of Cybersecurity, Chengdu University of Information Technology, Chengdu 610225, China
在期刊界中查找
在百度中查找
在本站中查找

Affiliation:

Fund Project:

National Natural Science Foundation of China (61402292, 61772091);National Natural Science Foundation of China Guangdong Joint Fund Project (U1301252);Planning Foundation for Humanities and Social Sciences of Ministry of Education of China (15YJAZH058)

摘要

图/表

访问统计

参考文献 [27]

相似文献 [20]

引证文献

资源附件

文章评论

摘要:

社交关系的数据挖掘一直是大图数据研究领域中的热门问题.图聚类算法如SCAN （structural clustering algorithm for network）虽然可以迅速地从海量图数据中获得关系紧密的社区结构，但这类社区往往只表示了社交对象的聚集，无法反馈对象间的真实社交关系，如家庭成员、同事、同学等.要获取对象间真实的社交关系，需要更多维度地挖掘现实中社交对象间复杂的交互关系.对象间的交互维度很多，例如通话、见面、微信、电子邮件等，而传统SCAN等聚类算法仅能够挖掘单维度的交互数据.在研究社交对象间的多维社交关系图数据与传统图结构聚类算法的基础上，提出了一种有效的子空间聚类算法SCA （subspace cluster algorithm），对多维度下子空间的图结构聚类进行研究，目的是探索如何通过图数据挖掘发现对象间真实的社交关系.SCA算法遵循自底向上的原则，能够发现社交图数据中所有子空间的聚类集.为提升SCA的运行速度，利用其子空间聚类的单调性进行了性能优化，进而提出了剪枝算法SCA+.最后进行了大规模的性能测试实验以及真实数据的案例研究，其结果验证了算法的效率和效用.

关键词:图聚类;多维图数据;社交关系;子空间

Abstract:

Social relationship mining is a hot topic in the area of massive graph analysis. Graph clustering algorithms such as SCAN (structural clustering algorithm for networks) can quickly discover the communities from the massive graph data. However, relationships in these communities fail to reflect the ‘real’ social information such as family, colleagues and classmates. In reality, social data is very complex, and there are many types of interaction among each individual, such as calling, meeting, chatting in WeChat, and sending emails. However, traditional SCAN algorithm can only handle single dimensional graph data. Based on the study of multidimensional social graph data and traditional clustering algorithms, this paper first proposes an efficient subspace clustering algorithm named SCA by mining multi-dimensional clusters in subspaces as a mean to explore real social relationships. SCA follows the bottom-up principle and can discover the set of clusters from the social graph data in all dimensions. To improve the efficiency of SCA, the paper also develops a pruning algorithm called SCA+ based on the monotonicity of subspace clustering. Extensive experiments on several real-world multi-dimensional graph data demonstrate the efficiency and effectiveness of the proposed algorithms.

Key words:graph clustering;multi-dimensional graph data;social relationship;subspace

参考文献

[1] Ye X, Huang Q, Li W. Integrating big social data. Computing and Modeling for Spatial Social Science, 2017,43(5):377-378.[doi:10.1080/15230406.2016.1212302]

[2] Peng S, Wang G, Xie D. Social influence analysis in social networking big data:Opportunities and challenges. IEEE Network, 2017,31(1):11-17.[doi:10.1109/MNET.2016.1500104NM]

[3] Bello-Orgaz G, Jung JJ, Camacho D. Social big data:Recent achievements and new challenges. Information Fusion, 2016,28:45-59.[doi:10.1016/j.inffus.2015.08.005]

[4] Mokken RJ. Cliques, clubs and clans. Quality & Quantity, 1979,13:161-173.[doi:10.1007/BF00139635]

[5] Seidman SB, Foster BL. A graph-Theoretic generalization of the clique concept. Journal of Mathematical Sociology, 1978,6:139-154.[doi:10.1080/0022250X.1978.9989883]

[6] Borgatti SP, Everett MG, Shirey PR. LS sets, lambda sets and other cohesive subsets. Social Networks, 1990,12:337-357.[doi:10.1016/0378-8733(90)90014-Z]

[7] Seidman SB. Internal cohesion of LS sets in graphs. Social Networks, 1983,5:97-107.[doi:10.1016/0378-8733(83)90020-5]

[8] Seidman SB. LS sets as cohesive subsets of graphs and hypergraphs. Mathematical Social Sciences, 1983,6:87-91.[doi:10.1016/0165-4896(83)90048-3]

[9] Seidman SB. Network structure and minimum degree. Social Networks, 1983,5:269-287.[doi:10.1016/0378-8733(83)90028-X]

[10] Cohen JD. Trusses:Cohesive subgraphs for social network analysis. National Security Agency Technical Report, 2008.

[11] Wang J, Cheng J. Truss decomposition in massive networks. Proc. of the VLDB Endowment (PVLDB), 2012,5:812-823.[doi:10.14778/2311906.2311909]

[12] Batagelj V, Zaversnik M. An O(m) algorithm for cores decomposition of networks. Computer Science, 2003,1(6):34-37.

[13] Xu X, Yuruk N, Feng Z, Schweiger TAJ. SCAN:A structural clustering algorithm for networks. In:Proc. of the ACM SIGKDD Int'l Conf. on Knowledge Discovery and Data Mining. ACM Press, 2007. 824-833.[doi:10.1145/1281192.1281280]

[14] Shiokawa H, Fujiwara Y, Onizuka M. SCAN++:Efficient algorithm for finding clusters, hubs and outliers on large-scale graphs. VLDB Endowment, 2015,8(11):1178-1193.[doi:10.14778/2809974.2809980]

[15] Chang L, Li W, Lin X, Qin L, Zhang W. pSCAN:Fast and exact structural graph clustering. IEEE Trans. on Knowledge & Data Engineering, 2017,29(2):387-401.[doi:10.1109/TKDE.2016.2618795]

[16] Chen JM, Chen JJ, Liu J, Huang YL, Wang Y, Feng X. Clustering algorithm for large scale social network based on structure similarity. Journal of Electronics and Information, 2015,37(2):449-454(in Chinese with English abstract).[doi:10.11999/JEIT 140512]

[17] Chen G, Huang RZ, Zhong WL. Multi-Dimensional text representation based on social features. Computer Engineering and Science, 2016,38(11):2348-2355(in Chinese with English abstract).[doi:10.3969/j.issn.1007-130X.2016.11.029]

[18] Lu J, Peng X, Deng W, Mian A. Regularization techniques for high-dimensional data analysis. Image & Vision Computing, 2017,60:1-3.[doi:10.1016/j.imavis.2017.03.001]

[19] He L, Cai YC, Yang Z. Survey of clustering algorithms for high-dimensional data. Application Research of Computers, 2010,27(1):23-26(in Chinese with English abstract).[doi:10.3969/j.issn.1001-3695.2010.01.006]

[20] Kröger P, Kriegel HP, Kailing K. Density-Connected subspace clustering for high-dimensional data. In:Proc. of the Siam Int'l Conf. on Data Mining. 2004. 246-257.[doi:10.1137/1.9781611972740.23]

[21] Andris C. Integrating social network data into GISystems. Int'l Journal of Geographical Information Science, 2016,30(10):1-23.[doi:10.1080/13658816.2016.1153103]

[22] Gao Q, Zhang FL, Wang RJ, Zhou F. Track data:A survey of key technologies in data processing. Ruan Jian Xue Bao/Journal of Software, 2017,28(4):959-992(in Chinese with English abstract). http://www.jos.org.cn/1000-9825/5143.htm[doi:10.13328/j.cnki. jos.005143]

附中文参考文献:

[16] 陈季梦,陈佳俊,刘杰,黄亚楼,王嫄,冯霞.基于结构相似度的大规模社交网络聚类算法.电子与信息学报,2015,37(2):449-454.[doi:10.11999/JEIT140512]

[17] 陈功,黄瑞章,钟文良.基于社交特征的多维度文本表示方法.计算机工程与科学,2016,38(11):2348-2355.[doi:10.3969/j.issn. 1007-130X.2016.11.029]

[19] 贺玲,蔡益朝,杨征.高维数据聚类方法综述.计算机应用研究,2010,27(1):23-26.[doi:10.3969/j.issn.1001-3695.2010.01.006]

[22] 高强,张凤荔,王瑞锦,周帆.轨迹大数据:数据处理关键技术研究综述.软件学报,2017,28(4):959-992. http://www.jos.org.cn/1000-9825/5143.htm[doi:10.13328/j.cnki.jos.005143]

引用本文

李振军,代强强,李荣华,毛睿,乔少杰.多维图结构聚类的社交关系挖掘算法.软件学报,2018,29(3):839-852

复制

文章指标

点击次数:4274
下载次数: 8617
HTML阅读次数: 3532
引用次数: 0

历史

收稿日期:2017-08-02
最后修改日期:2017-09-05
录用日期:
在线发布日期: 2017-12-05
出版日期:

微信服务号

微信订阅号

引用本文

分享

文章指标

历史

文章二维码

微信服务号

微信订阅号

引用本文

分享

微信扫一扫：分享

文章指标

历史

文章二维码