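[Keywords]
[Abstract]
Links in a social network can be divided into positive and negative relationships according to their latent meanings. Labeling each link in the network as positive or negative forms a signed network. Signed networks are widely used in sociology, informatics, biology, and many other fields, and predicting the signs of links in signed networks has become an active research topic. In the era of big data, as signed networks keep growing, the scalability of sign prediction algorithms has become an increasingly pressing problem. Some researchers have proposed sign prediction methods for distributed environments, which partly alleviate this problem. However, because most of these algorithms adopt a server-client distributed framework, the problem has not been fundamentally solved. This paper proposes an end-to-end distributed framework, the client to client distributed framework (C2CDF). In contrast to the centralized communication pattern of the traditional server-client architecture, all nodes in C2CDF are equal peers and there is no centralized communication, which reduces the bandwidth bottleneck and the load on the cluster. Experiments on three different real-world datasets, covering sign prediction in social networks, advertisement click-through rate prediction, and forest type prediction, show that C2CDF achieves a 2.3x to 3.3x speedup while also attaining higher accuracy. It also generalizes well: besides sign prediction in social networks, it can be applied to other domains such as advertisement click prediction.
[Keywords]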
[Abstract]
The edges of a network can be divided into positive and negative relationships according to their latent meanings. Labeling each edge with a plus or minus sign yields a signed network. Signed networks are widely used in fields such as sociology, informatics, and biology, and sign prediction in signed networks has become an active research topic. On large datasets, however, the scalability of sign prediction algorithms remains a major challenge. Many works have proposed distributed sign prediction methods for signed networks, but because most of them are built on a server/client framework, their computational efficiency is still fundamentally limited. This paper proposes the client to client distributed framework (C2CDF). Unlike the traditional server/client framework, C2CDF removes the server node and lets client nodes communicate directly with one another, which relieves the bandwidth pressure on the cluster. Experiments on three real-world datasets, covering sign prediction in signed social networks, click-through rate prediction, and forest type prediction, show that C2CDF is a general approach: it applies not only to sign prediction in signed networks but also to other prediction tasks. On all three datasets, C2CDF outperforms FM trained with the traditional SGD algorithm, and it achieves a 2.3x to 3.3x speedup over the method implemented under the server/client framework while also obtaining higher accuracy.
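The abstract attributes C2CDF's speedup to removing the central server, whose links become a bandwidth bottleneck. The actual C2CDF communication protocol is not described in this section, so the following is only a minimal sketch under stated assumptions: it substitutes ring all-reduce, a standard peer-to-peer aggregation scheme, as a stand-in for client-to-client communication, and compares it with a single-server star topology by counting how many values the busiest node must send and receive to sum one gradient vector across all nodes. The function names `parameter_server_allreduce` and `ring_allreduce` are hypothetical illustrations, not part of C2CDF.

```python
# Hypothetical sketch: ring all-reduce stands in for C2CDF's (unspecified)
# client-to-client protocol; a single-server star stands in for server/client.
from typing import Callable, List, Tuple

Vector = List[int]


def parameter_server_allreduce(grads: List[Vector]) -> Tuple[List[Vector], int]:
    """Star topology: every worker pushes its full gradient to one server,
    which sums them and sends the full result back to every worker.
    Returns (per-worker results, values sent + received by the busiest node)."""
    n, m = len(grads), len(grads[0])
    total = [sum(col) for col in zip(*grads)]
    server_traffic = n * m + n * m  # n full pushes received, n full results sent
    worker_traffic = m + m          # one push sent, one result received
    return [list(total) for _ in range(n)], max(server_traffic, worker_traffic)


def ring_allreduce(grads: List[Vector]) -> Tuple[List[Vector], int]:
    """Peer-to-peer ring all-reduce: each node only talks to its successor.
    Phase 1 (reduce-scatter) leaves each node holding one fully summed chunk;
    phase 2 (all-gather) circulates those chunks until every node has all of them.
    Returns (per-node results, values sent + received by the busiest node)."""
    n, m = len(grads), len(grads[0])
    assert m % n == 0, "for simplicity the vector length must divide evenly"
    size = m // n
    data = [list(g) for g in grads]
    traffic = [0] * n

    def span(c: int) -> range:
        return range(c * size, (c + 1) * size)

    def run_phase(chunk_of: Callable[[int, int], int],
                  combine: Callable[[int, int], int]) -> None:
        for step in range(n - 1):
            # Snapshot all outgoing messages first so each step is simultaneous.
            msgs = []
            for i in range(n):
                c = chunk_of(i, step)
                msgs.append(((i + 1) % n, c, [data[i][k] for k in span(c)]))
                traffic[i] += size  # values sent by node i
            for dst, c, payload in msgs:
                traffic[dst] += size  # values received by node dst
                for k, val in zip(span(c), payload):
                    data[dst][k] = combine(data[dst][k], val)

    run_phase(lambda i, s: (i - s) % n, lambda old, new: old + new)  # reduce-scatter
    run_phase(lambda i, s: (i + 1 - s) % n, lambda old, new: new)    # all-gather
    return data, max(traffic)


if __name__ == "__main__":
    workers, params = 4, 8
    grads = [[10 * w + k for k in range(params)] for w in range(workers)]
    ps_result, ps_load = parameter_server_allreduce(grads)
    ring_result, ring_load = ring_allreduce(grads)
    assert ps_result == ring_result  # both produce the same summed vector
    print("busiest node, parameter server:", ps_load)  # 64
    print("busiest node, ring all-reduce:", ring_load)  # 24
```

In general, with n nodes and m parameters the server handles 2nm values per round, which grows linearly with the number of workers, while each ring node handles 4(n-1)m/n values, which stays below 4m no matter how many nodes join. The ring pays for this with 2(n-1) sequential communication steps per round, so latency rather than per-node bandwidth becomes the limiting factor.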
[CLC number]
TP311
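[Foundation items]
National Natural Science Foundation of China (61772537, 61772536, 61702522, 61532021); National Key Research and Development Program of China (2016YFB1000702)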