Institute of Information Engineering, The Chinese Academy of Sciences, Beijing 100093, China;University of Chinese Academy of Sciences, Beijing 100049, China 在期刊界中查找 在百度中查找 在本站中查找
How to effectively compress and represent the large-scale graphic data becomes the fundamental issue for analysis and processing. Graphic data compression technology is an effective solution to significantly reduce the storage space while supporting fast access in the compressed form. An in-depth analysis is provided on the current development of the technologies, including compression technology based on the traditional storage structure, Web graph compression technology, social network compression technology and compression technology for a particular query. A detailed analysis and performance comparison about the representative methods of each technology is presented. Finally, the summary and prospect are listed.
[2] Randall KH, Stata R, Wickremesinghe RG, Wiener, JL. The link database: Fast access to graphs of the Web. In: Proc. of the 2002 Data Compression Conf. (DCC). Piscataway: IEEE, 2002. 122~131. [doi: 10.1109/DCC.2002.999950]
[3] Kumar R, Novak J, Tomkins A. Structure and evolution of online social networks. In: Proc. of the 12th ACM SIGKDD Int’l Conf. on Knowledge Discovery and Data Mining (KDD). New York: ACM, 2006. 611~617. [doi: 10.1145/1150402.1150476]
[4] Perugini S, Gon?alves MA, Fox EA. Recommender systems research: A connection-centric survey. Journal of Intelligent Information Systems, 2004,23(2):107~143. [doi: 10.1023/B:JIIS.0000039532.05533.99]
[5] Brin S, Page L. The anatomy of a large-scale hypertextual Web search engine. Computer Networks and ISDN Systems, 1998,30(1): 107~117. [doi: 10.1016/S0169-7552(98)00110-X]
[6] Kleinberg JM. Authoritative sources in a hyperlinked environment. Journal of the ACM (JACM), 1999,46(5):604~632. [doi: 10.1145/324133.324140]
[7] Saito K, Kimura M, Ohara K, Motoda H. Efficient discovery of influential nodes for SIS models in social networks. Knowledge and Information Systems, 2012,30(3):613~635. [doi: 10.1007/s10115-011-0396-2]
[8] Zhuge H. Communities and emerging semantics in semantic link network: Discovery and learning. IEEE Trans. on Knowledge and Data Engineering, 2009,21(6):785~799. [doi: 10.1109/TKDE.2008.141]
[9] Cha M, Mislove A, Gummadi KP. A measurement-driven analysis of information propagation in the flickr social network. In: Proc. of the 18th Int’l Conf. on World Wide Web (WWW). New York: ACM, 2009. 721~730. [doi: 10.1145/1526709.1526806]
[10] Musia? K, Kazienko P, Bródka P. User position measures in social networks. In: Proc. of the 3rd Workshop on Social Network Mining and Analysis. New York: ACM, 2009. 1~9. [doi: 10.1145/1731011.1731017]
[11] Mislove A, Marcon M, Gummadi KP, Druschel P, Bhattacharjee B. Measurement and analysis of online social networks. In: Proc. of the 7th ACM SIGCOMM Conf. on Internet Measurement (IMC). New York: ACM, 2007. 29~42. [doi: 10.1145/1298306. 1298311]
[12] Saito H, Toyoda M, Kitsuregawa M, Aihara K. A large-scale study of link SPAM detection by graph algorithms. In: Proc. of the 3rd Int’l Workshop on Adversarial Information Retrieval on the Web. New York: ACM, 2007. 45~48. [doi: 10.1145/1244408. 1244417]
[13] Donato D, Laura L, Leonardi S, Millozzi S. Algorithms and experiments for the Webgraph. Journal of Graph Algorithms and Applications, 2006,10(2):219~236. [doi: 10.7155/jgaa.00125]
[15] China Internet Network Information Center. 2012. http://www.cnnic.net.cn/research/bgxz/tjbg/201201/t20120116_23668.html
[16] Vitter JS. External memory algorithms and data structures: Dealing with massive data. ACM Computing Surveys (CsUR), 2001, 33(2):209~271. [doi: 10.1145/384192.384193]
[17] Vitter JS. Algorithms and data structures for external memory. Foundations and Trends in Theoretical Computer Science, 2008, 2(4):305~474. [doi: 10.1561/0400000014]
[18] Badue C, Ribeiro-Neto B, Baeza-Yates R, Ziviani N. Distributed query processing using partitioned inverted files. In: Proc. of the 8th Symp. on String Processing and Information Retrieval (SPIRE). Piscataway: IEEE, 2001. 10~20. [doi: 10.1109/SPIRE.2001. 989733]
[19] Tomasic A, Garcia-Molina H. Query processing and inverted indices in shared-nothing text document information retrieval systems. The Int’l Journal on Very Large Data Bases-Parallelism in Database Systems, 1993,2(3):8~17. [doi: 10.1007/BF01228671]
[20] Yu G, Gu Y, Bao YB, Wang ZG. Large scale graph data processing on cloud computing environments. Chinese Journal of Computers, 2011,34(10):1753~1767 (in Chinese with English abstract). [doi: 10.3724/SP.J.1016.2011.01753]
[21] Boldi P, Vigna S. The Webgraph Framework I: Compression techniques. In: Proc. of the 13th Int’l Conf. on World Wide Web (WWW). New York: ACM, 2004. 595~602. [doi: 10.1145/988672.988752]
[22] Boldi P, Vigna S. The WebGraph Framework II: Codes for the World-Wide Web. In: Proc. of the Conf. on Data Compression (DCC). Piscataway: IEEE, 2004. [doi: 10.1109/DCC.2004.1281504]
[23] Jacobson G. Space-Efficient static trees and graphs. In: Proc. of the 30th Annual Symp. on Foundations of Computer Science. Piscataway: IEEE, 1989. 549~554. [doi: 10.1109/SFCS.1989.63533]
[24] Munro JI, Raman V. Succinct representation of balanced parentheses, static trees and planar graphs. In: Proc. of the 54th Annual Symp. on Foundations of Computer Science. Piscataway: IEEE, 1997. 118~126. [doi: 10.1109/SFCS.1997.646100]
[25] Chuang RC, Garg A, He X, Kao M, Lu H. Compact encodings of planar graphs via canonical orderings and multiple parentheses. Automata, Languages and Programming, 1998,1443(1):118~129. [doi: 10.1007/BFb0055046]
[26] Deo N, Litow B. A structural approach to graph compression. In: Proc. of the 23th MFCS Workshop on Communications. 1998. 91~101.
[27] He X, Kao M, Lu H. A fast general methodology for information-theoretically optimal encodings of graphs. SIAM Journal on Computing, 2000,30(3):838~846. [doi: 10.1137/S0097539799359117]
[28] Chakrabarti D, Papadimitriou S, Modha DS, Faloutsos C. Fully automatic cross-associations. In: Proc. of the 10th ACM SIGKDD Int’l Conf. on Knowledge Discovery and Data Mining (KDD). New York: ACM, 2004. 79~88. [doi: 10.1145/1014052.1014064]
[29] Adler M, Mitzenmacher M. Towards compressing web graphs. In: Proc. of the Conf. on Data Compression (DCC). Piscataway: IEEE, 2001. 203~212. [doi: 10.1109/DCC.2001.917151]
[30] Boldi P, Santini M, Vigna S. A large time-aware Web graph. ACM SIGIR Forum, 2008,42(2):33~38. [doi: 10.1145/1480506. 1480511]
[31] Boldi P, Santini M, Vigna S. Permuting Web graphs. In: Algorithms and Models for the Web-Graph. Heidelberg: Springer-Verlag, 2009. 116~126. [doi: 10.1007/978-3-540-95995-3_10]
[32] Boldi P, Rosa M, Santini M, Vigna S. Layered label propagation: A multiresolution coordinate-free ordering for compressing social networks. In: Proc. of the 20th Int’l Conf. on World Wide Web (WWW). New York: ACM, 2011. 587~596. [doi: 10.1145/1963405. 1963488]
[33] Buehrer G, Chellapilla K. A scalable pattern mining approach to Web graph compression with communities. In: Proc. of the 2008 Int’l Conf. on Web Search and Data Mining (WSDM). New York: ACM, 2008. 95~106. [doi: 10.1145/1341531.1341547]
[34] Asano Y, Miyawaki Y, Nishizeki T. Computing and Combinatorics. Heidelberg: Springer-Verlag, 2008. 1~11. [doi: 10.1007/978-3- 540-69733-6_1]
[35] Chierichetti F, Kumar R, Lattanzi S, Mitzenmacher M, Panconesi A, Raghavan P. On compressing social networks. In: Proc. of the 15th ACM SIGKDD Int’l Conf. on Knowledge Discovery and Data Mining (KDD). New York: ACM, 2009. 219~228. [doi: 10. 1145/1557019.1557049]
[36] Apostolico A, Drovandi G. Graph compression by BFS. Algorithms, 2009,2(3):1031~1044. [doi: 10.3390/a2031031]
[37] Hernández C, Navarro G. Compression of Web and social graphs supporting neighbor and community queries. In: Proc. of the 6th ACM Workshop on Social Network Mining and Analysis (SNAKDD). New York: ACM, 2011. 1~10.
[38] Hernández C, Navarro G. Compressed representation of Web and social networks via dense subgraphs. In: Proc. of 19th Int’l Symp. on String Processing and Information Retrieval (SPIRE). Heidelberg: Springer-Verlag, 2012. 264~276. [doi: 10.1007/978-3-642- 34109-0_28]
[39] Brisaboa N R, Ladra S, Navarro G. k2-Trees for compact Web graph representation. In: Proc. of the 16th Int’l Symp. on String Processing and Information Retrieval (SPIRE). Heidelberg: Springer-Verlag, 2009. 18~30. [doi: 10.1007/978-3-642-03784-9_3]
[40] Claude F, Navarro G. A Fast and compact Web graph representations. ACM Trans. on the Web (TWEB), 2010,4(4):1~31. [doi: 10. 1145/1841909.1841913]
[41] Larsson NJ, Moffat A. Off-Line dictionary-based compression. Proc. of the IEEE, 2000,88(11):1722~1732. [doi: 10.1109/5. 892708]
[42] Ziv J, Lempel A. Compression of individual sequences via variable-rate coding. IEEE Trans. on Information Theory, 1978,24(5): 530~536. [doi: 10.1109/TIT.1978.1055934]
[43] Maserrat H, Pei J. Neighbor query friendly compression of social networks. In: Proc. of the 16th ACM SIGKDD Int’l Conf. on Knowledge Discovery and Data Mining (KDD). New York: ACM, 2010. 533~542. [doi: 10.1145/1835804.1835873]
[44] Fan W, Li J, Wang X, Wu Y. Query preserving graph compression. In: Proc. of the 2012 ACM SIGMOD Int’l Conf. on Management of Data (SIGMOD). New York: ACM, 2012. 157~168. [doi: 10.1145/2213836.2213855]
[45] González R, Grabowski S, M?kinen V, Navarro G. Practical implementation of rank and select queries. In: Proc. of the 4th Workshop on Efficient and Experimental Algorithms (WEA). CTI Press and Ellinika Grammata, 2005. 27~38.
[46] Brisaboa NR, Ladra S, Navarro G. Compact representation of Web graphs with extended functionality. Information Systems, 2014, 39:152~174. [doi: 10.1016/j.is.2013.08.003]
[47] Brisaboa NR, Ladra S, Navarro G. DACs: Bringing direct access to variable-length codes. Information Processing and Management: an International Journal, 2013,49(1):392~404. [doi: 10.1016/j.ipm.2012.08.003]
[48] Claude F, Navarro G. Extended compact Web graph representations. In: Algorithms and Applications. Heidelberg: Springer-Verlag, 2010. 77~91. [doi: 10.1007/978-3-642-12476-1_5]
[49] He M, Munro JI. Succinct representations of dynamic strings. In: Proc. of the 17th Int’l Symp. on String Processing and Information Retrieval (SPIRE). Heidelberg: Springer-Verlag, 2010. 334~346. [doi: 10.1007/978-3-642-16321-0_35]
[50] Feder T, Motwani R. Clique partitions, graph compression and speeding-up algorithms. In: Proc. of the 23rd Annual ACM Symp. on Theory of Computing. New York: ACM, 1991. 123~133. [doi: 10.1145/103418.103424]
[51] Claude F, Ladra S. Practical representations for Web and social graphs. In: Proc. of the 20th ACM Int’l Conf. on Information and Knowledge Management (CIKM). New York: ACM, 2011. 1185~1190. [doi: 10.1145/2063576.2063747]
[52] Garey MR, Johnson DS, Stockmeyer L. Some simplified NP-complete problems. In: Proc. of the 6th Annual ACM Symp. on Theory of Computing. New York: ACM, 1974. 47~63. [doi: 10.1145/800119.803884]
[53] Broder AZ, Charikar M, Frieze AM, Mitzenmacher M. Min-Wise independent permutations. Journal of Computer and System Sciences, 2000,60(3):630~659. [doi: 10.1006/jcss.1999.1690]
[54] Shi Q, Xiao Y, Bessis N, Lu Y, Chen Y, Hill R. Optimizing K2 trees: A case for validating the maturity of network of practices. Computers & Mathematics with Applications, 2012,63(2):427~436. [doi: 10.1016/j.camwa.2011.07.060]
[55] Raman R, Raman V, Rao SS. Succinct indexable dictionaries with applications to encoding k-ary trees and multisets. In: Proc. of the 30th Annual ACM-SIAM Symp. on Discrete Algorithms (SODA). Philadelphia: Society for Industrial and Applied Mathematics, 2002. 233~242.
[56] Grossi R, Gupta A, Vitter JS. High-Order entropy-compressed text indexes. In: Proc. of the 14th Annual ACM-SIAM Symp. on Discrete Algorithms. Philadelphia: Society for Industrial and Applied Mathematics, 2003. 841~850.
[57] Dovier A, Piazza C, Policriti A. A fast bisimulation algorithm. In: Proc. of the 13th Int’l Conf., Computer Aided Verification. Heidelberg: Springer-Verlag, 2001. 79~90. [doi: 10.1007/3-540-44585-4_8]
[58] Zhang X, Zhao C, Wang P, Zhou F. Mining link patterns in linked data. In: Web-Age Information Management. Heidelberg: Springer-Verlag, 2012. 83~94. [doi: 10.1007/978-3-642-32281-5_9]
[59] Babu MM, Teichmann SA. Evolution of transcription factors and the gene regulatory network in Escherichia coli. Nucleic Acids Research, 2003,31(4):1234~1244. [doi: 10.1093/nar/gkg210]
[60] Jeong H, Mason SP, Barabási AL, Oltvai ZN. Lethality and centrality in protein networks. Nature, 2001,411(6833):41~42. [doi: 10. 1038/35075138]
[61] Ravasz E, Somera AL, Mongru DA, Oltvai ZN, Barabási AL. Hierarchical organization of modularity in metabolic networks. Science, 2002,297(5586):1551~1555. [doi: 10.1126/science.1073374]
[62] Jiang X, Zhang X, Gao F, Pu C, Wang P. Graph Compression Strategies for Instance-Focused Semantic Mining, Linked Data and Knowledge Graph. Heidelberg: Springer-Verlag, 2013. 50~61. [doi: 10.1007/978-3-642-54025-7_5]
[63] Ahnert SE. Power graph compression reveals dominant relationships in genetic transcription networks. Molecular BioSystems, 2013,9(11):2681~2685. [doi: 10.1039/C3MB70236G]
[64] Stanford Large Network Dataset Collection. 2009. http://snap.stanford.edu/data
[65] Laboratory for Web Algorithmics. 2011. http://law.di.unimi.it/datasets.php
[66] WebGraphs on recoded.cl. 2010. http://webgraphs.recoded.cl/
[68] Drovandi G. PhD Web Site. 2012. http://www.dia.uniroma3.it/~drovandi/software.php
[69] Hernandez C, Navarro G. Compressed representations for Web and social graphs. Knowledge and Information Systems, 2013, 40(1):1~35. [doi: 10.1007/s10115-013-0648-4]