2014, 25(12):2733-2752. DOI: 10.13328/j.cnki.jos.004724
Abstract: The main objective of many studies in the physical, behavioral, social, and biological sciences is the elucidation of cause-effect relationships among variables or events. Many causality problems arise when new words and behaviors are mapped from individuals to the Internet or are created by the Internet itself. Causality is hidden behind correlations. Conclusions drawn from correlation analysis alone are likely to be unreliable or even wrong, and without causal knowledge, correlation-based methods cannot support intervention, control, or management. Causal analysis is therefore necessary in social media. This paper first introduces the value, importance, and necessity of causality analysis, and then the causality problems that exist in social media. It then gives a brief overview of recent research on causal inference, covering its basic theory, open problems, and research status. Finally, it compares previous studies to suggest future research directions and applications of causality in social media.
CHAI Bian-Fang, JIA Cai-Yan, YU Jian
2014, 25(12):2753-2766. DOI: 10.13328/j.cnki.jos.004722
Abstract: The growth of the Internet and the emergence of online social websites have led to massive networks that are large in scale, complex in structure, and dynamic over time. Exploring the latent structure underlying a network is fundamental to understanding and analyzing it. Probabilistic models have become effective tools for structure exploration in diverse areas owing to their modeling flexibility, interpretability, and sound theoretical framework; however, they suffer from computational bottlenecks. Recently, several approaches based on probabilistic models have been developed to explore structure in massive networks. They address the computational problems from three aspects: network representation, structural assumptions, and parameter estimation methods. This study classifies existing approaches into two categories by parameter estimation method, namely approaches based on stochastic variational inference and online EM approaches, and analyzes in detail their design motivations, principles, advantages, and disadvantages. The properties and performance of classical models are compared qualitatively and quantitatively, and from this comparison, principles are derived for developing structure detection approaches for massive networks. Finally, the core problems of probabilistic-model-based structure exploration in massive networks are summarized, and future trends in this area are projected.
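The abstract names stochastic variational inference (SVI) as one of the two estimation families but gives no formulas. As a hedged illustration of the general idea only (not any specific model surveyed), the sketch below shows the generic SVI global-parameter update. At step t, a noisy intermediate estimate computed from a sampled part of the network is blended into the current global parameters with a decreasing Robbins-Monro step size rho_t = (t + tau)^(-kappa). The function names, the defaults for `tau` and `kappa`, and the toy estimate stream are assumptions chosen for illustration.

```python
import random

def step_size(t, tau=1.0, kappa=0.7):
    """Robbins-Monro schedule rho_t = (t + tau)^(-kappa).
    With kappa in (0.5, 1], sum(rho_t) diverges while sum(rho_t^2) converges."""
    return (t + tau) ** (-kappa)

def svi_update(lam, lam_hat, rho):
    """Blend current global parameters lam with a noisy intermediate estimate lam_hat."""
    return [(1.0 - rho) * l + rho * h for l, h in zip(lam, lam_hat)]

def run_svi(noisy_estimate, lam0, n_steps, tau=1.0, kappa=0.7):
    """noisy_estimate(t) is a hypothetical stand-in for the per-minibatch estimate
    that a real network model would compute from sampled nodes or node pairs."""
    lam = list(lam0)
    for t in range(n_steps):
        lam = svi_update(lam, noisy_estimate(t), step_size(t, tau, kappa))
    return lam

if __name__ == "__main__":
    rng = random.Random(0)
    # Toy stream: unbiased noisy estimates of a "true" parameter vector [2.0, 5.0].
    target = [2.0, 5.0]
    print(run_svi(lambda t: [x + rng.gauss(0.0, 1.0) for x in target], [0.0, 0.0], 5000))
```

Each update touches only a small sample, which is why SVI scales to large networks. The shrinking step size averages out the sampling noise over many steps.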
2014, 25(12):2767-2776. DOI: 10.13328/j.cnki.jos.004730
Abstract: Web users' online interactions with one another often make some user-generated content (e.g., forum threads and Weibo topics) popular. Modeling and predicting the popularity of online content has great research importance and practical value in many domains. To predict the popularity of forum threads, this paper discusses several dynamic factors that might affect the popularity of online content, based on information about its dynamic evolution in the early stage. It then proposes a popularity prediction algorithm that exploits the locality property and combines multiple dynamic factors. The proposed algorithm is evaluated on a Douban group dataset. Experimental results show that the proposed method performs better than the baseline methods in predicting the popularity of forum threads.
YANG Zhen, WANG Lai-Tao, LAI Ying-Xu
2014, 25(12):2777-2789. DOI: 10.13328/j.cnki.jos.004729
Abstract: An improved semantic distance for short text is proposed. The new method calculates the semantic distance between two word strings as a balance between the degree of word-sequence alignment and the degree of meaning matching between the strings. First, after linguistic preprocessing, the degree of word-sequence alignment is computed as a structural distance, which measures the maximum matching based on the HIT-CIR Tongyici Cilin (extended edition). Then, the meaning matching between the word strings is computed by an improved edit distance that assigns each word a weight according to its word type. Finally, the semantic distance between the word strings is measured as a balance of the structural distance and the word-meaning matching distance. In addition, to eliminate the influence of sentence length, the proposed semantic distance is adjusted using the number of distinct words estimated by Heaps' law and Zipf's law. Experimental results show that the presented methods are more efficient than classical edit distance models.
ZHANG Lin, QIAN Guan-Qun, FAN Wei-Guo, HUA Kun, ZHANG Li
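The abstract does not give the word-type weights or the exact cost of each edit operation. The sketch below assumes a word-level Levenshtein distance where inserting or deleting a word costs that word's weight, and substituting two different words costs the larger of their two weights. The weight table `TYPE_WEIGHTS`, `DEFAULT_WEIGHT`, and their values are hypothetical. The sketch also includes Heaps' law, V(n) = K * n^beta, as one way to estimate the number of distinct words in a text of n tokens. K and beta are corpus-dependent constants, and the defaults here are placeholders. The abstract does not state how the paper combines this estimate with the distance, so that step is left out.

```python
# Hypothetical per-type weights: content words count more than function words.
TYPE_WEIGHTS = {"noun": 1.0, "verb": 0.8, "adj": 0.6}
DEFAULT_WEIGHT = 0.2  # assumed weight for every other word type

def word_weight(token):
    """token is a (word, word_type) pair."""
    return TYPE_WEIGHTS.get(token[1], DEFAULT_WEIGHT)

def weighted_edit_distance(a, b):
    """Word-level edit distance whose operation costs depend on word weights."""
    m, n = len(a), len(b)
    d = [[0.0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        d[i][0] = d[i - 1][0] + word_weight(a[i - 1])  # delete a prefix of a
    for j in range(1, n + 1):
        d[0][j] = d[0][j - 1] + word_weight(b[j - 1])  # insert a prefix of b
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            wa, wb = word_weight(a[i - 1]), word_weight(b[j - 1])
            sub = 0.0 if a[i - 1][0] == b[j - 1][0] else max(wa, wb)
            d[i][j] = min(d[i - 1][j] + wa,       # delete a[i-1]
                          d[i][j - 1] + wb,       # insert b[j-1]
                          d[i - 1][j - 1] + sub)  # match or substitute
    return d[m][n]

def heaps_distinct_words(n_tokens, k=10.0, beta=0.5):
    """Heaps' law estimate of vocabulary size: V(n) = k * n**beta."""
    return k * n_tokens ** beta

if __name__ == "__main__":
    s1 = [("I", "pron"), ("like", "verb"), ("tea", "noun")]
    s2 = [("I", "pron"), ("love", "verb"), ("tea", "noun")]
    print(weighted_edit_distance(s1, s2))  # one verb substitution
    print(heaps_distinct_words(100))
```

With these weights, changing a verb costs 0.8. Changing a function word would cost only 0.2. This is the kind of type sensitivity the abstract describes.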
2014, 25(12):2790-2807. DOI: 10.13328/j.cnki.jos.004728
Abstract: This paper studies a newly emerging type of user review (referred to here as "light reviews") generated on smart mobile devices. The similarities and differences between this research and earlier studies are pointed out. Light reviews are characterized by shorter texts, a larger span, and, in most cases, fewer words per review. Review length and scale also follow a power-law distribution. A series of experiments on light reviews yields some interesting findings: (1) There is an inverse relationship between classification accuracy and review length; (2) Traditional feature selection and feature weighting methods do not perform well enough on light reviews; (3) The ratio of polarity words, the most important feature in sentiment analysis, is higher in short reviews than in long reviews; (4) Short and long reviews share a relatively high proportion of feature terms. Based on these findings, the paper proposes a feature selection method based on short-text co-occurrence features. By combining the informational advantages of short reviews with traditional feature selection methods, the method preserves as much useful information and detail as possible while removing noise. Experimental results show that the method is effective and achieves a higher classification rate.
ZHOU Xiao-Ping, LIANG Xun, ZHANG Hai-Yan
2014, 25(12):2808-2823. DOI: 10.13328/j.cnki.jos.004720
Abstract: Detecting user communities with denser common interests and network structure plays an important role in targeted marketing and self-oriented services. Current community detection methods often treat user-generated content and user relationships separately, which results in unreasonable community structures. Although some methods combine the two factors, they are complex. The link community algorithm (LCA) is an efficient, state-of-the-art method for overlapping community discovery. However, LCA does not take users' actual interest characteristics into account when calculating the similarity between links. To address user community detection on micro-blogs, this paper proposes an R-C model with three design choices. It takes user relationships as the network nodes. It treats the intersection of the interest characteristics of the two users in a link as that link's interest characteristics. It uses the user shared by two links as the underlying connection between those links. A community detection method based on the R-C model is also discussed, and its clustering complexity is analyzed. Finally, compared with node-based CNM and LCA, the R-C-model-based method is shown to be better at finding user communities with closer relationships and denser common interests.
HU Yun, WANG Chong-Jun, WU Jun, XIE Jun-Yuan, LI Hui
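The construction of the R-C model can be sketched in a few lines. The abstract states three rules: each user relationship becomes a node, that node's interests are the intersection of its two users' interests, and two relationship-nodes are connected when they share a user. It does not state how link similarity is computed. The sketch therefore uses the Jaccard index of the two links' interest sets as a hypothetical similarity. The function name `build_rc_graph` and the tag-set representation of interests are also assumptions. The pairwise loop is quadratic in the number of links and is meant only for illustration.

```python
from itertools import combinations

def build_rc_graph(edges, interests):
    """edges: list of (u, v) user pairs; interests: dict user -> set of interest tags.
    Returns each link's interest set and the weighted edges between links."""
    # Each user relationship becomes an R-C node whose interests are the
    # intersection of the two users' interests.
    link_interest = {e: interests[e[0]] & interests[e[1]] for e in edges}
    rc_edges = {}
    for e1, e2 in combinations(edges, 2):
        if set(e1) & set(e2):  # the two links share a user
            a, b = link_interest[e1], link_interest[e2]
            union = a | b
            # Hypothetical similarity: Jaccard index of the links' interest sets.
            rc_edges[(e1, e2)] = len(a & b) / len(union) if union else 0.0
    return link_interest, rc_edges

if __name__ == "__main__":
    interests = {"A": {"sports", "music"}, "B": {"sports", "music", "tech"},
                 "C": {"sports"}, "D": {"tech"}}
    links, rc = build_rc_graph([("A", "B"), ("B", "C"), ("C", "D")], interests)
    print(links)
    print(rc)
```

The weighted link graph that results could then be clustered, for example by hierarchical clustering as in link-community methods. The resulting link clusters map back to overlapping user communities.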
2014, 25(12):2824-2836. DOI: 10.13328/j.cnki.jos.004721
Abstract: Micro-blog cyberspace is a booming multi-mode network of numerous overlapping communities, covering a huge number of users and topics related to nature, society, and everyday life. Based on an in-depth analysis of the entities in the network and their inherent relationships, this paper proposes a structural model dominated by user-topic relations for overlapping community representation and detection. The model also incorporates the follow relationship, along with the blog-forward and blog-comment relationships. By adding a virtual community to the actual communities of the network, the paper also proposes an improved global belongingness matrix for representing user roles, which can properly describe each user's degree of participation and importance in the network. Experimental results on a Sina micro-blog dataset show that the new method is effective and efficient at finding meaningful communities in the micro-blog. Furthermore, the proposed model and algorithms can be adapted in various ways to similar social network analyses and can support research on community evolution.
ZHANG Yu-Bo, ZHANG Xi-Zhe, ZHANG Bin
2014, 25(12):2837-2851. DOI: 10.13328/j.cnki.jos.004723
Abstract: Accurately locating the source of information is important for controlling its diffusion in social networks. In previous studies, a feasible approach is to locate the source using diffusion-process information collected by observers. The localization accuracy therefore depends closely on the observers' positions. This paper proposes a method for optimally deploying observers. Considering the information diffusion process from a single source, it first analyzes the relationship between the accuracy of locating a specified source and the positions of the observers. Based on this relationship, it identifies a key factor related to the accuracy of locating any source. It then proposes an observer deployment method based on the r-coverage rate. Using the observers' r-coverage rate as the objective function, it implements an r-coverage-rate-first observer selection algorithm. The proposed method is tested on both model networks and real networks. Results show that the proposed method is effective. The observer deployment method is significant for controlling Internet rumors and computer viruses.
ZOU Ben-You, LI Cui-Ping, TAN Li-Wen, CHEN Hong, WANG Shao-Qing
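The abstract does not define the r-coverage rate or describe the selection procedure. The sketch below assumes that a node is r-covered if it lies within r hops of at least one observer, and that the r-coverage rate is the fraction of covered nodes. It then selects observers greedily, each time adding the node that covers the most still-uncovered nodes. The names `r_ball` and `greedy_observers` are hypothetical, and this greedy rule is an assumed instance of "r-coverage rate first" selection, not necessarily the paper's exact algorithm.

```python
from collections import deque

def r_ball(adj, src, r):
    """Return the set of nodes within r hops of src (breadth-first search)."""
    seen = {src}
    queue = deque([(src, 0)])
    while queue:
        u, dist = queue.popleft()
        if dist == r:
            continue  # do not expand beyond r hops
        for v in adj[u]:
            if v not in seen:
                seen.add(v)
                queue.append((v, dist + 1))
    return seen

def greedy_observers(adj, k, r):
    """Pick up to k observers, each time adding the node that covers the most
    uncovered nodes. Returns the observers and the resulting r-coverage rate."""
    balls = {u: r_ball(adj, u, r) for u in adj}
    covered, observers = set(), []
    for _ in range(min(k, len(adj))):
        candidates = [u for u in adj if u not in observers]
        best = max(candidates, key=lambda u: len(balls[u] - covered))
        observers.append(best)
        covered |= balls[best]
    return observers, len(covered) / len(adj)

if __name__ == "__main__":
    # A path graph 0-1-2-3-4-5-6 as a toy network.
    path = {i: [j for j in (i - 1, i + 1) if 0 <= j < 7] for i in range(7)}
    print(greedy_observers(path, k=2, r=1))
```

Greedy selection is a standard heuristic for maximum-coverage objectives. For a coverage function of this kind, it is guaranteed to reach at least a 1 - 1/e fraction of the optimal coverage.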
2014, 25(12):2852-2864. DOI: 10.13328/j.cnki.jos.004725
Abstract: In social networks, recommender systems can help users deal with information overload and provide them with personalized recommendations. Recommender systems in social networks make use of trust relationships between users. However, state-of-the-art algorithms use only a single trust relationship, which cannot capture a user's trust in friends when looking for different items. This paper proposes a topic-based trust recommendation algorithm using a tensor factorization model. Because social information changes rapidly, state-of-the-art algorithms often need to redo the factorization. To address this issue, the paper also presents an effective incremental method. When the data changes, it adaptively updates the previously factorized components rather than recomputing them on the whole dataset. Experiments show that the proposed method achieves better performance and that the incremental method is well suited to the rapid changes in social networks.
ZHANG Xin, HE Ben, LUO Tie-Jian, LI Dong-Xing
2014, 25(12):2865-2876. DOI: 10.13328/j.cnki.jos.004726
Abstract: Recently, Twitter search has drawn much attention from social network researchers. Although Twitter's rich features can be incorporated into learning to rank, retrieval effectiveness can be hurt by a lack of training data. Transductive learning, a common semi-supervised learning method, has played an important role in dealing with the lack of training data. Because noise is introduced during the iterative process of transductive learning, a clustering-based transductive method is proposed. The clustering-based transductive approach has two important parameters: the clustering threshold and the number of documents to be clustered. This paper extends the method by using a different clustering algorithm. Extensive experiments on the standard TREC Tweets11 collection show that both parameters affect retrieval effectiveness. Furthermore, the robustness of the clustering-based transductive approach across different query sets is also studied. Finally, the paper proposes an adaptive clustering-based approach that introduces a so-called cluster coherence measure as a quality controller. Experimental results show that the proposed method is more robust.
WU Xin-Dong, LI Ya-Dong, HU Dong-Hui
2014, 25(12):2877-2892. DOI: 10.13328/j.cnki.jos.004727
Abstract: With advances in computing technology and information technology, social networks have emerged as a new tool for people to exchange information and build interaction networks. They have become a key topic for social software studies in social computing. Social network forensics seeks to acquire, organize, analyze, and visualize user information as direct, objective, and fair evidence from a third-party perspective. With the rapid development of the Internet, social network forensics faces new challenges. User information is diverse, real-time and dynamic, huge in volume, and interactive, and the trustworthiness of photos must also be established. Social network forensics has therefore become a hot issue in opinion analysis, affective computing, content analysis of social networking relations, and the study of individual, group, and social behaviors in social networks and social computing. This paper designs a forensic model for social network forensics and implements it on Sina microblogging. The model provides user information analysis, facial image recognition, and location presentation for the trustworthiness analysis of digital evidence. It also applies visualization to reduce the difficulty of analysis and forensics on massive social network data.
WANG Ying, WANG Xin, ZUO Wan-Li
2014, 25(12):2893-2904. DOI: 10.13328/j.cnki.jos.004731
Abstract: With the pervasiveness of social media, trust, as the basis of human interactions, has played an important role in information sharing, experience exchange, and public opinion. However, trust is a complex and abstract concept influenced by many factors, and it is difficult to identify the factors that induce it and to analyze its formation mechanism. Theories from the social sciences help explain social phenomena, and social networks reflect real-world user correlations. Building on these observations, this paper investigates the trust prediction problem from the perspective of social science. It constructs a trust prediction model by studying how trust relations arise and develop, based on social status theory and homophily theory. First, it briefly introduces social status theory and homophily theory and verifies that both hold in social networks. Then, it proposes social status regularization and homophily regularization, based on the effects of the two theories on predicting trust relations. Finally, it builds the trust prediction model SocialTrust, which incorporates non-negative matrix tri-factorization, social status theory, and homophily theory. Experimental results demonstrate the effectiveness of the proposed method, which achieves higher trust prediction accuracy than the baseline methods.