图数据表示与压缩技术综述
作者:
基金项目:

国家自然科学基金(61202477); 国家科技支撑计划(2012BAH46B02); 中国科学院战略性科技先导专项(XDA060 30602)


Survey on Succinct Representation of Graph Data
Author:
  • 摘要
  • | |
  • 访问统计
  • |
  • 参考文献 [69]
  • |
  • 相似文献 [20]
  • | | |
  • 文章评论
    摘要:

    对包含亿万个节点和边的图数据进行高效、紧凑的表示和压缩,是大规模图数据分析处理的基础.图数据压缩技术可以有效地降低图数据的存储空间,同时支持在压缩形式的图数据上进行快速访问.通过深入分析该技术的发展现状,将该技术分为基于传统存储结构的压缩技术、网页图压缩技术、社交网络图压缩技术、面向特定查询的图压缩技术4类.分别对每类技术详细分析了其代表方法并比较了它们之间的性能差异.最后对该技术进行了总结和展望.

    Abstract:

    How to effectively compress and represent the large-scale graphic data becomes the fundamental issue for analysis and processing. Graphic data compression technology is an effective solution to significantly reduce the storage space while supporting fast access in the compressed form. An in-depth analysis is provided on the current development of the technologies, including compression technology based on the traditional storage structure, Web graph compression technology, social network compression technology and compression technology for a particular query. A detailed analysis and performance comparison about the representative methods of each technology is presented. Finally, the summary and prospect are listed.

    参考文献
    [1] The Graph 500 List. http://www.graph500.org
    [2] Randall KH, Stata R, Wickremesinghe RG, Wiener, JL. The link database: Fast access to graphs of the Web. In: Proc. of the 2002 Data Compression Conf. (DCC). Piscataway: IEEE, 2002. 122~131. [doi: 10.1109/DCC.2002.999950]
    [3] Kumar R, Novak J, Tomkins A. Structure and evolution of online social networks. In: Proc. of the 12th ACM SIGKDD Int’l Conf. on Knowledge Discovery and Data Mining (KDD). New York: ACM, 2006. 611~617. [doi: 10.1145/1150402.1150476]
    [4] Perugini S, Gon?alves MA, Fox EA. Recommender systems research: A connection-centric survey. Journal of Intelligent Information Systems, 2004,23(2):107~143. [doi: 10.1023/B:JIIS.0000039532.05533.99]
    [5] Brin S, Page L. The anatomy of a large-scale hypertextual Web search engine. Computer Networks and ISDN Systems, 1998,30(1): 107~117. [doi: 10.1016/S0169-7552(98)00110-X]
    [6] Kleinberg JM. Authoritative sources in a hyperlinked environment. Journal of the ACM (JACM), 1999,46(5):604~632. [doi: 10.1145/324133.324140]
    [7] Saito K, Kimura M, Ohara K, Motoda H. Efficient discovery of influential nodes for SIS models in social networks. Knowledge and Information Systems, 2012,30(3):613~635. [doi: 10.1007/s10115-011-0396-2]
    [8] Zhuge H. Communities and emerging semantics in semantic link network: Discovery and learning. IEEE Trans. on Knowledge and Data Engineering, 2009,21(6):785~799. [doi: 10.1109/TKDE.2008.141]
    [9] Cha M, Mislove A, Gummadi KP. A measurement-driven analysis of information propagation in the flickr social network. In: Proc. of the 18th Int’l Conf. on World Wide Web (WWW). New York: ACM, 2009. 721~730. [doi: 10.1145/1526709.1526806]
    [10] Musia? K, Kazienko P, Bródka P. User position measures in social networks. In: Proc. of the 3rd Workshop on Social Network Mining and Analysis. New York: ACM, 2009. 1~9. [doi: 10.1145/1731011.1731017]
    [11] Mislove A, Marcon M, Gummadi KP, Druschel P, Bhattacharjee B. Measurement and analysis of online social networks. In: Proc. of the 7th ACM SIGCOMM Conf. on Internet Measurement (IMC). New York: ACM, 2007. 29~42. [doi: 10.1145/1298306. 1298311]
    [12] Saito H, Toyoda M, Kitsuregawa M, Aihara K. A large-scale study of link SPAM detection by graph algorithms. In: Proc. of the 3rd Int’l Workshop on Adversarial Information Retrieval on the Web. New York: ACM, 2007. 45~48. [doi: 10.1145/1244408. 1244417]
    [13] Donato D, Laura L, Leonardi S, Millozzi S. Algorithms and experiments for the Webgraph. Journal of Graph Algorithms and Applications, 2006,10(2):219~236. [doi: 10.7155/jgaa.00125]
    [14] GlobalWebIndex. 2013. http://www.globalwebindex.net/products/report/stream-report-social-q1-2013
    [15] China Internet Network Information Center. 2012. http://www.cnnic.net.cn/research/bgxz/tjbg/201201/t20120116_23668.html
    [16] Vitter JS. External memory algorithms and data structures: Dealing with massive data. ACM Computing Surveys (CsUR), 2001, 33(2):209~271. [doi: 10.1145/384192.384193]
    [17] Vitter JS. Algorithms and data structures for external memory. Foundations and Trends in Theoretical Computer Science, 2008, 2(4):305~474. [doi: 10.1561/0400000014]
    [18] Badue C, Ribeiro-Neto B, Baeza-Yates R, Ziviani N. Distributed query processing using partitioned inverted files. In: Proc. of the 8th Symp. on String Processing and Information Retrieval (SPIRE). Piscataway: IEEE, 2001. 10~20. [doi: 10.1109/SPIRE.2001. 989733]
    [19] Tomasic A, Garcia-Molina H. Query processing and inverted indices in shared-nothing text document information retrieval systems. The Int’l Journal on Very Large Data Bases-Parallelism in Database Systems, 1993,2(3):8~17. [doi: 10.1007/BF01228671]
    [20] Yu G, Gu Y, Bao YB, Wang ZG. Large scale graph data processing on cloud computing environments. Chinese Journal of Computers, 2011,34(10):1753~1767 (in Chinese with English abstract). [doi: 10.3724/SP.J.1016.2011.01753]
    [21] Boldi P, Vigna S. The Webgraph Framework I: Compression techniques. In: Proc. of the 13th Int’l Conf. on World Wide Web (WWW). New York: ACM, 2004. 595~602. [doi: 10.1145/988672.988752]
    [22] Boldi P, Vigna S. The WebGraph Framework II: Codes for the World-Wide Web. In: Proc. of the Conf. on Data Compression (DCC). Piscataway: IEEE, 2004. [doi: 10.1109/DCC.2004.1281504]
    [23] Jacobson G. Space-Efficient static trees and graphs. In: Proc. of the 30th Annual Symp. on Foundations of Computer Science. Piscataway: IEEE, 1989. 549~554. [doi: 10.1109/SFCS.1989.63533]
    [24] Munro JI, Raman V. Succinct representation of balanced parentheses, static trees and planar graphs. In: Proc. of the 54th Annual Symp. on Foundations of Computer Science. Piscataway: IEEE, 1997. 118~126. [doi: 10.1109/SFCS.1997.646100]
    [25] Chuang RC, Garg A, He X, Kao M, Lu H. Compact encodings of planar graphs via canonical orderings and multiple parentheses. Automata, Languages and Programming, 1998,1443(1):118~129. [doi: 10.1007/BFb0055046]
    [26] Deo N, Litow B. A structural approach to graph compression. In: Proc. of the 23th MFCS Workshop on Communications. 1998. 91~101.
    [27] He X, Kao M, Lu H. A fast general methodology for information-theoretically optimal encodings of graphs. SIAM Journal on Computing, 2000,30(3):838~846. [doi: 10.1137/S0097539799359117]
    [28] Chakrabarti D, Papadimitriou S, Modha DS, Faloutsos C. Fully automatic cross-associations. In: Proc. of the 10th ACM SIGKDD Int’l Conf. on Knowledge Discovery and Data Mining (KDD). New York: ACM, 2004. 79~88. [doi: 10.1145/1014052.1014064]
    [29] Adler M, Mitzenmacher M. Towards compressing web graphs. In: Proc. of the Conf. on Data Compression (DCC). Piscataway: IEEE, 2001. 203~212. [doi: 10.1109/DCC.2001.917151]
    [30] Boldi P, Santini M, Vigna S. A large time-aware Web graph. ACM SIGIR Forum, 2008,42(2):33~38. [doi: 10.1145/1480506. 1480511]
    [31] Boldi P, Santini M, Vigna S. Permuting Web graphs. In: Algorithms and Models for the Web-Graph. Heidelberg: Springer-Verlag, 2009. 116~126. [doi: 10.1007/978-3-540-95995-3_10]
    [32] Boldi P, Rosa M, Santini M, Vigna S. Layered label propagation: A multiresolution coordinate-free ordering for compressing social networks. In: Proc. of the 20th Int’l Conf. on World Wide Web (WWW). New York: ACM, 2011. 587~596. [doi: 10.1145/1963405. 1963488]
    [33] Buehrer G, Chellapilla K. A scalable pattern mining approach to Web graph compression with communities. In: Proc. of the 2008 Int’l Conf. on Web Search and Data Mining (WSDM). New York: ACM, 2008. 95~106. [doi: 10.1145/1341531.1341547]
    [34] Asano Y, Miyawaki Y, Nishizeki T. Computing and Combinatorics. Heidelberg: Springer-Verlag, 2008. 1~11. [doi: 10.1007/978-3- 540-69733-6_1]
    [35] Chierichetti F, Kumar R, Lattanzi S, Mitzenmacher M, Panconesi A, Raghavan P. On compressing social networks. In: Proc. of the 15th ACM SIGKDD Int’l Conf. on Knowledge Discovery and Data Mining (KDD). New York: ACM, 2009. 219~228. [doi: 10. 1145/1557019.1557049]
    [36] Apostolico A, Drovandi G. Graph compression by BFS. Algorithms, 2009,2(3):1031~1044. [doi: 10.3390/a2031031]
    [37] Hernández C, Navarro G. Compression of Web and social graphs supporting neighbor and community queries. In: Proc. of the 6th ACM Workshop on Social Network Mining and Analysis (SNAKDD). New York: ACM, 2011. 1~10.
    [38] Hernández C, Navarro G. Compressed representation of Web and social networks via dense subgraphs. In: Proc. of 19th Int’l Symp. on String Processing and Information Retrieval (SPIRE). Heidelberg: Springer-Verlag, 2012. 264~276. [doi: 10.1007/978-3-642- 34109-0_28]
    [39] Brisaboa N R, Ladra S, Navarro G. k2-Trees for compact Web graph representation. In: Proc. of the 16th Int’l Symp. on String Processing and Information Retrieval (SPIRE). Heidelberg: Springer-Verlag, 2009. 18~30. [doi: 10.1007/978-3-642-03784-9_3]
    [40] Claude F, Navarro G. A Fast and compact Web graph representations. ACM Trans. on the Web (TWEB), 2010,4(4):1~31. [doi: 10. 1145/1841909.1841913]
    [41] Larsson NJ, Moffat A. Off-Line dictionary-based compression. Proc. of the IEEE, 2000,88(11):1722~1732. [doi: 10.1109/5. 892708]
    [42] Ziv J, Lempel A. Compression of individual sequences via variable-rate coding. IEEE Trans. on Information Theory, 1978,24(5): 530~536. [doi: 10.1109/TIT.1978.1055934]
    [43] Maserrat H, Pei J. Neighbor query friendly compression of social networks. In: Proc. of the 16th ACM SIGKDD Int’l Conf. on Knowledge Discovery and Data Mining (KDD). New York: ACM, 2010. 533~542. [doi: 10.1145/1835804.1835873]
    [44] Fan W, Li J, Wang X, Wu Y. Query preserving graph compression. In: Proc. of the 2012 ACM SIGMOD Int’l Conf. on Management of Data (SIGMOD). New York: ACM, 2012. 157~168. [doi: 10.1145/2213836.2213855]
    [45] González R, Grabowski S, M?kinen V, Navarro G. Practical implementation of rank and select queries. In: Proc. of the 4th Workshop on Efficient and Experimental Algorithms (WEA). CTI Press and Ellinika Grammata, 2005. 27~38.
    [46] Brisaboa NR, Ladra S, Navarro G. Compact representation of Web graphs with extended functionality. Information Systems, 2014, 39:152~174. [doi: 10.1016/j.is.2013.08.003]
    [47] Brisaboa NR, Ladra S, Navarro G. DACs: Bringing direct access to variable-length codes. Information Processing and Management: an International Journal, 2013,49(1):392~404. [doi: 10.1016/j.ipm.2012.08.003]
    [48] Claude F, Navarro G. Extended compact Web graph representations. In: Algorithms and Applications. Heidelberg: Springer-Verlag, 2010. 77~91. [doi: 10.1007/978-3-642-12476-1_5]
    [49] He M, Munro JI. Succinct representations of dynamic strings. In: Proc. of the 17th Int’l Symp. on String Processing and Information Retrieval (SPIRE). Heidelberg: Springer-Verlag, 2010. 334~346. [doi: 10.1007/978-3-642-16321-0_35]
    [50] Feder T, Motwani R. Clique partitions, graph compression and speeding-up algorithms. In: Proc. of the 23rd Annual ACM Symp. on Theory of Computing. New York: ACM, 1991. 123~133. [doi: 10.1145/103418.103424]
    [51] Claude F, Ladra S. Practical representations for Web and social graphs. In: Proc. of the 20th ACM Int’l Conf. on Information and Knowledge Management (CIKM). New York: ACM, 2011. 1185~1190. [doi: 10.1145/2063576.2063747]
    [52] Garey MR, Johnson DS, Stockmeyer L. Some simplified NP-complete problems. In: Proc. of the 6th Annual ACM Symp. on Theory of Computing. New York: ACM, 1974. 47~63. [doi: 10.1145/800119.803884]
    [53] Broder AZ, Charikar M, Frieze AM, Mitzenmacher M. Min-Wise independent permutations. Journal of Computer and System Sciences, 2000,60(3):630~659. [doi: 10.1006/jcss.1999.1690]
    [54] Shi Q, Xiao Y, Bessis N, Lu Y, Chen Y, Hill R. Optimizing K2 trees: A case for validating the maturity of network of practices. Computers & Mathematics with Applications, 2012,63(2):427~436. [doi: 10.1016/j.camwa.2011.07.060]
    [55] Raman R, Raman V, Rao SS. Succinct indexable dictionaries with applications to encoding k-ary trees and multisets. In: Proc. of the 30th Annual ACM-SIAM Symp. on Discrete Algorithms (SODA). Philadelphia: Society for Industrial and Applied Mathematics, 2002. 233~242.
    [56] Grossi R, Gupta A, Vitter JS. High-Order entropy-compressed text indexes. In: Proc. of the 14th Annual ACM-SIAM Symp. on Discrete Algorithms. Philadelphia: Society for Industrial and Applied Mathematics, 2003. 841~850.
    [57] Dovier A, Piazza C, Policriti A. A fast bisimulation algorithm. In: Proc. of the 13th Int’l Conf., Computer Aided Verification. Heidelberg: Springer-Verlag, 2001. 79~90. [doi: 10.1007/3-540-44585-4_8]
    [58] Zhang X, Zhao C, Wang P, Zhou F. Mining link patterns in linked data. In: Web-Age Information Management. Heidelberg: Springer-Verlag, 2012. 83~94. [doi: 10.1007/978-3-642-32281-5_9]
    [59] Babu MM, Teichmann SA. Evolution of transcription factors and the gene regulatory network in Escherichia coli. Nucleic Acids Research, 2003,31(4):1234~1244. [doi: 10.1093/nar/gkg210]
    [60] Jeong H, Mason SP, Barabási AL, Oltvai ZN. Lethality and centrality in protein networks. Nature, 2001,411(6833):41~42. [doi: 10. 1038/35075138]
    [61] Ravasz E, Somera AL, Mongru DA, Oltvai ZN, Barabási AL. Hierarchical organization of modularity in metabolic networks. Science, 2002,297(5586):1551~1555. [doi: 10.1126/science.1073374]
    [62] Jiang X, Zhang X, Gao F, Pu C, Wang P. Graph Compression Strategies for Instance-Focused Semantic Mining, Linked Data and Knowledge Graph. Heidelberg: Springer-Verlag, 2013. 50~61. [doi: 10.1007/978-3-642-54025-7_5]
    [63] Ahnert SE. Power graph compression reveals dominant relationships in genetic transcription networks. Molecular BioSystems, 2013,9(11):2681~2685. [doi: 10.1039/C3MB70236G]
    [64] Stanford Large Network Dataset Collection. 2009. http://snap.stanford.edu/data
    [65] Laboratory for Web Algorithmics. 2011. http://law.di.unimi.it/datasets.php
    [66] WebGraphs on recoded.cl. 2010. http://webgraphs.recoded.cl/
    [67] WebGraph homepage. 2009. http://webgraph.di.unimi.it/
    [68] Drovandi G. PhD Web Site. 2012. http://www.dia.uniroma3.it/~drovandi/software.php
    [69] Hernandez C, Navarro G. Compressed representations for Web and social graphs. Knowledge and Information Systems, 2013, 40(1):1~35. [doi: 10.1007/s10115-013-0648-4]
    引证文献
    网友评论
    网友评论
    分享到微博
    发 布
引用本文

张宇,刘燕兵,熊刚,贾焰,刘萍,郭莉.图数据表示与压缩技术综述.软件学报,2014,25(9):1937-1952

复制
分享
文章指标
  • 点击次数:6389
  • 下载次数: 8962
  • HTML阅读次数: 3275
  • 引用次数: 0
历史
  • 收稿日期:2014-01-26
  • 最后修改日期:2014-06-09
  • 在线发布日期: 2014-09-09
文章二维码
您是第19727216位访问者
版权所有:中国科学院软件研究所 京ICP备05046678号-3
地址:北京市海淀区中关村南四街4号,邮政编码:100190
电话:010-62562563 传真:010-62562533 Email:jos@iscas.ac.cn
技术支持:北京勤云科技发展有限公司

京公网安备 11040202500063号