Effective Link-Based Measure of Node Similarity on Information Networks
Author:
Affiliation:

  • Article
  • | |
  • Metrics
  • |
  • Reference [33]
  • |
  • Related [20]
  • | | |
  • Comments
    Abstract:

    Information networks are ubiquitous. These networks can be modeled by graphs where nodes refer to objects and edges are relationships between objects in the networks. Measuring nodes similarity in a graph is a fundamental problem of graph data management. There are many real-world applications based on similarity between objects, such as networks analyses, information retrieval and recommendation systems. A number of link-based similarity measures have been proposed. Both SimRank and personalized PageRank are the most popular and influential similarity measures. The two measures differ from each other because they depend on different paths. This paper proposes a similarity measure that takes advantages of both SimRank and personalized PageRank. The new measure strengthens the principle of SimRank: "Two objects are similar if they are referenced by similar objects". The paper also analyzes the new similarity measure in theory and proposes an optimization algorithm which reduces the time cost from O(kn4) to O(knl) where k is the number of iteration, n is the number of nodes and l is the number of edges. Experimental results demonstrate the effectiveness of the new similarity measure and efficiency of the optimization algorithm.

    Reference
    [1] Liben-Nowell D, Kleinberg J. The link prediction problem for social networks. Journal of the American Society for Information Science and Technology, 2007,58(7):1019-1031. [doi: 10.1002/asi.20591]
    [2] Antonellis I, Molina HG, Chang CC. Simrank++: Query rewriting through link analysis of the click graph. Proc. of the VLDB Endowment, 2008,1(1):408-421. [doi: 10.14778/1453856.1453903]
    [3] Jin R, Lee VE, Hong H. Axiomatic ranking of network role similarity. In: Proc. of the 17th ACM SIGKDD Int'l Conf. on Knowledge Discovery and Data Mining (KDD 2011). New York: ACM Press, 2011. 922-930. [doi: 10.1145/2020408.2020561]
    [4] Gyöngyi Z, Garcia-Molina H, Pedersen J. Combating Web spam with trustrank. In: Proc. of the 30st Int'l Conf. on Very Large Data Bases (VLDB 2004). New York: ACM Press, 2004. 576-587.
    [5] Gupta P, Goel A, Lin J, Sharma A, Wang D, Zadeh R. WTF: The who to follow service at Twitter. In: Proc. of the 22Nd Int'l Conf. on World Wide Web (WWW 2013). New York: ACM Press, 2013. 505-514.
    [6] Fujiwara Y, Nakatsuji M, Onizuka M, Kitsuregawa M. Fast and exact top-k search for random walk with restart. Proc. of the VLDB Endowment, 2012,5(5):442-453. [doi: 10.14778/2140436.2140441]
    [7] Jeh G, Widom J. Scaling personalized Web search. In: Proc. of the 12th Int'l Conf. on World Wide Web (WWW 2003). New York: ACM Press, 2003. 271-279. [doi: 10.1145/775152.775191]
    [8] Jeh G, Widom J. SimRank: A measure of structural-context similarity. In: Proc. of the 8th ACM SIGKDD Int'l Conf. on Knowledge Discovery and Data Mining (KDD 2002). New York: ACM Press, 2002. 536-543. [doi: 10.1145/775047.775126]
    [9] Sarkar P, Moore AW, Prakash A. Fast incremental proximity search in large graphs. In: Proc. of the 25th Internet Conf. on Machine Learning (ICML 2008). New York: ACM Press, 2008. 896-903. [doi: 10.1145/1390156.1390269]
    [10] Fujiwara Y, Nakatsuji M, Yamamuro M, Shiokawa H, Onizuka M. Efficient personalized PageRank with accuracy assurance. In: Proc. of the 18th ACM SIGKDD Int'l Conf. on Knowledge Discovery and Data Mining (KDD 2012). New York: ACM Press, 2012. 15-23. [doi: 10.1145/2339530.2339538]
    [11] Zhu F, Fang Y, Jing Y. Incremental and accuracy-aware personalized PageRank through scheduled approximation. Proc. of the VLDB Endowment, 2013,6(6):481-492. [doi: 10.14778/2536336.2536348]
    [12] Fujiwara Y, Nakatsuji M, Yamamuro M, Shiokawa H, Onizuka M. Efficient ad-hoc search for personalized PageRank. In: Proc. of the 2013 ACM SIGMOD Int'l Conf. on Management of Data. New York: ACM Press, 2013. 445-456. [doi: 10.1145/2463676. 2463717]
    [13] Abbassi Z, Mirrokni VS. A recommender system based on local random walks and spectral methods. In: Proc. of the 9th WebKDD and 1st SNA-KDD 2007 Workshop on Web Mining and Social Network Analysis. New York: ACM Press, 2007. 102-108. [doi: 10. 1007/978-3-642-00528-2_8]
    [14] Sun L, Cheng R, Li X, Cheung DW, Han J. On link-based similarity join. Proc. of the VLDB Endowment, 2011,4(11):714-725.
    [15] Zheng W, Zou L, Feng Y, Chen L, Zhao D. Efficient SimRank-based similarity join over large graphs. Proc. of the VLDB Endowment, 2011,4(11):493-504. [doi: 10.14778/2536349.2536350]
    [16] Fujiwara Y, Nakatsuji M, Shiokawa H, Onizuka M. Efficient search algorithm for SimRank. In: Proc. of the 29th Int'l Conf. on Data Engineering (ICDE 2013). Washington: IEEE Computer Society, 2013. 589-600. [doi: 10.1109/ICDE.2013.6544858]
    [17] Yu W, Lin X, Zhang W. Towards efficient SimRank computation on large networks. In: Proc. of the 29th Int'l Conf. on Data Engineering (ICDE 2013). Washington: IEEE Computer Society, 2013. 601-612. [doi: 10.1109/ICDE.2013.6544859]
    [18] Lizorkin D, Velikhov P. Accuracy estimate and optimization techniques for simrank computation. The VLDB Journal, 2010,19(1): 45-66. [doi: 10.1007/s00778-009-0168-8]
    [19] Lin ZJ, Lyu MR, King I. MatchSim: A novel similarity measure based on maximum neighborhood matching. Knowledge and Information Systems, 2012,32(1):141-166. [doi: 10.1007/s10115-011-0427-z]
    [20] Xiao Y, Wu W, Pei J, Wang W, He Z. Efficiently indexing shortest paths by exploiting symmetry in graphs. In: Proc. of the 12th Int'l Conf. on Extending Database Technology (EDBT 2009). New York: ACM Press, 2009. 493-504. [doi: 10.1145/1516360.1516 418]
    [21] Tribl S, Leser U. Fast and practical indexing and querying of very large graphs. In: Proc. of the 2007 ACM SIGMOD Int'l Conf. on Management of Data. New York: ACM Press, 2007. 845-856. [doi: 10.1145/1247480.1247573]
    [22] Schloegel K, Karypis G, Kumar V. Parallel static and dynamic multi-constraint graph partitioning. Concurrency and Computation: Practice and Experience, 2002,14(3):219-240. [doi: 10.1002/cpe.605]
    [23] Fan W, Li, Ma S, Tang N, Wu Y. Graph pattern matching: From intractable to polynomial time. Proc. of the VLDB Endowment, 2010,3(1):264-275.
    [24] Zhao P, Han J, Sun Y. P-Rank: A comprehensive structural similarity measure over information networks. In: Proc. of the 18th ACM Conf. on Information and Knowledge Management. New York: ACM Press, 2009. 553-562. [doi: 10.1145/1645953.16460 25]
    [25] Xi WS, Fox EA, Fan B. SimFusion: Measuring similarity using unified relationship matrix. In: Proc. of the 28th Annual Int'l ACM SIGIR Conf. on Research and Development in Information Retrieval. New York: ACM Press, 2005. 130-137. [doi: 10.1145/10760 34.1076059]
    [26] Sun Y, Han J. Pathsim: Meta path-based top-k similarity search in heterogeneous information networks. Proc. of the VLDB Endowment, 2011,4(11):992-1003.
    [27] Fang Y, Chang KCC, Lauw HW. RoundTripRank: Graph-Based proximity with importance and specificity. In: Proc. of the 29th Int'l Conf. on Data Engineering (ICDE 2013). Washington: IEEE Computer Society, 2013. 613-624. [doi: 10.1109/ICDE.2013.654 4860]
    [28] Lee P, Lakshmanan LVS, Yu JX. On top-k structural similarity search. In: Proc. of the 28th Int'l Conf. on Data Engineering (ICDE 2012). Washington: IEEE Computer Society, 2012. 774-785. [doi: 10.1109/ICDE.2012.109]
    [29] Fogaras D, Csalogany K, Racz B, Sarlos T. Towards scaling fully personalized PageRank: Algorithms, lower bounds, and experiments. Internet Mathematics, 2005,17(2):333-358. [doi: 10.1080/15427951.2005.10129104]
    [30] Bahman B, Chakrabarti K, Xin D. Fast personalized PageRank on MapReduce. In: Proc. of the 2011 ACM SIGMOD Int'l Conf. on Management of Data. New York: ACM Press, 2011. 973-984. [doi: 10.1145/1989323.1989425]
    [31] Fogara D, Racz B. Practical algorithms and lower bounds for similarity search in massive graphs. IEEE Trans. on Knowledge and Data Engineering, 2007,19(5):585-598. [doi: 10.1109/TKDE.2007.1008]
    [32] Li C, Han J, He G. Fast computation of simrank for static and dynamic information networks. In: Proc. of the 13th Int'l Conf. on Extending Database Technology (EDBT 2010). New York: ACM Press, 2010. 465-476. [doi: 10.1145/1739041.1739098]
    [33] Li P, Liu H, Yu JX, He J, Du X. Fast single-pair simrank computation. In: Proc. of the SIAM Int'l Conf. on Data Mining. Philadelphia: SIAM, 2010. 571-582.
    Cited by
    Comments
    Comments
    分享到微博
    Submit
Get Citation

张应龙,李翠平,陈红.信息网络中一个有效的基于链接的结点相似度度量.软件学报,2014,25(11):2602-2615

Copy
Share
Article Metrics
  • Abstract:3216
  • PDF: 5982
  • HTML: 1746
  • Cited by: 0
History
  • Received:September 02,2013
  • Revised:January 21,2014
  • Online: November 05,2014
You are the first2037969Visitors
Copyright: Institute of Software, Chinese Academy of Sciences Beijing ICP No. 05046678-4
Address:4# South Fourth Street, Zhong Guan Cun, Beijing 100190,Postal Code:100190
Phone:010-62562563 Fax:010-62562533 Email:jos@iscas.ac.cn
Technical Support:Beijing Qinyun Technology Development Co., Ltd.

Beijing Public Network Security No. 11040202500063