Real-time Interactive Analysis on Big Data

doi:10.13328/j.cnki.jos.005886


Journal of Software (Ruan Jian Xue Bao), 2020, Vol. 31, No. 1, pp. 162-182.

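Author biographies:
YUAN Zhe (born 1994), male, from Nanchang, Jiangxi; bachelor's degree; CCF student member. Research interests: graph computing, knowledge graphs, information retrieval.
WEN Ji-Rong (born 1972), male; Ph.D., professor, doctoral supervisor; CCF distinguished member. Research interests: information retrieval, data mining, machine learning.
WEI Zhe-Wei (born 1986), male; Ph.D., professor, doctoral supervisor; CCF professional member. Research interests: graph computing, algorithms for massive data, data stream algorithms.
LIU Jia-Jun (born 1984), male; Ph.D., associate professor. Research interests: data mining in multimedia, computer vision, and social media; databases; machine learning.
YAO Bin (born 1981), male; Ph.D., associate professor; CCF professional member. Research interests: database management, big data analytics.
ZHENG Kai (born 1983), male; Ph.D., professor, doctoral supervisor; CCF professional member. Research interests: databases.
Corresponding author: WEN Ji-Rong, E-mail: jrwen@ruc.edu.cn
CLC number: TP311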

Authors:

YUAN Zhe (袁喆), School of Information, Renmin University of China, Beijing 100872, China
WEN Ji-Rong (文继荣), School of Information, Renmin University of China, Beijing 100872, China
WEI Zhe-Wei (魏哲巍), School of Information, Renmin University of China, Beijing 100872, China
LIU Jia-Jun (刘家俊), School of Information, Renmin University of China, Beijing 100872, China
YAO Bin (姚斌), Department of Computer Science and Engineering, Shanghai Jiao Tong University, Shanghai 200240, China
ZHENG Kai (郑凯), School of Computer Science and Engineering, University of Electronic Science and Technology of China, Chengdu 610054, China

Fund Project:

National Natural Science Foundation of China (61832017, 61972401, 61932001, 61602487, 61922054, 61872235, 61729202, U1636210, 61972069, 61836007, 61532018); Beijing Outstanding Young Scientist Program (BJJWZYJH01201910002009); National Key Research and Development Program of China (2018YFC1504504, 2016YFB0700502)



Abstract:

Real-time interactive analysis targets multi-objective, multi-perspective analysis tasks. Through multiple rounds of user-database interaction, it progressively clarifies the analysis task and its goals, builds a comprehensive understanding of the relevant domain, and ultimately yields sound and complete analytical results. Compared with the traditional database model, in which a query is submitted and answered in a single round of interaction, real-time interactive analysis places greater emphasis on the responsiveness of the interaction and the timeliness of the query results. It has become a research hotspot in recent years. Addressing several key problems that real-time interactive analysis currently faces, this survey reviews the theoretical foundations, data models, and system architectures of existing work.

Key words: real-time interactive analysis; cross-modal data; approximate query processing


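Cite this article:

Yuan Z, Wen JR, Wei ZW, Liu JJ, Yao B, Zheng K. Real-time interactive analysis on big data. Ruan Jian Xue Bao/Journal of Software, 2020,31(1):162-182 (in Chinese with English abstract).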


History:

Received: 2018-09-14
Last revised: 2019-06-13
Published online: 2019-11-07
Publication date: 2020-01-06
