Progress and Trend in Novel Data Management System

Ruan Jian Xue Bao/Journal of Software, 2019, 30(1): 164-193. DOI: 10.13328/j.cnki.jos.005646

Authors:

CUI Bin
School of Electronics Engineering and Computer Science, Peking University, Beijing 100871, China

GAO Jun
School of Electronics Engineering and Computer Science, Peking University, Beijing 100871, China

TONG Yong-Xin
State Key Laboratory of Software Development Environment (Beijing University of Aeronautics and Astronautics), Beijing 100083, China

XU Jian-Qiu
College of Computer Science and Technology, Nanjing University of Aeronautics and Astronautics, Nanjing 211106, China

ZHANG Dong-Xiang
College of Computer Science and Engineering, University of Electronic Science and Technology of China, Chengdu 611731, China

ZOU Lei
Institute of Computer Science and Technology, Peking University, Beijing 100871, China

Author biographies:
CUI Bin (1975-), male, born in Ningbo, Zhejiang, Ph.D., professor, doctoral supervisor, CCF distinguished member. His research interests include database systems and big data management and analytics.
GAO Jun (1975-), male, Ph.D., professor, doctoral supervisor, CCF senior member. His research interests include distributed data management, graph data management, and deep analytics.
TONG Yong-Xin (1982-), male, Ph.D., associate professor, CCF professional member. His research interests include crowdsourced data management, crowd intelligence, spatio-temporal data management and mining, and uncertain data management and mining.
XU Jian-Qiu (1982-), male, Ph.D., associate professor, CCF professional member. His research interests include spatial and moving object data management.
ZHANG Dong-Xiang (1985-), male, Ph.D., professor, CCF professional member. His research interests include spatio-temporal big data analytics and intelligent transportation optimization.
ZOU Lei (1981-), male, Ph.D., professor, doctoral supervisor, CCF senior member. His research interests include graph database systems and knowledge graph analysis.

Corresponding author: CUI Bin, E-mail: bin.cui@pku.edu.cn

Fund project: National Natural Science Foundation of China (61832001, 61572040, 61822201, 61622201, 61602087)

Abstract:

With the emergence of novel computing techniques and application domains, traditional database management systems face new challenges and are shifting from a single processing approach suited to conventional applications toward multiple data processing approaches tailored to specialized applications. This paper presents a comprehensive survey of recent progress and future directions in novel data management systems, covering distributed databases, graph databases, stream databases, spatial-temporal databases, and crowdsourcing databases. Specifically, distributed data management techniques play a key role in the scalable processing of massive data. Graph database techniques are driven by the need to process large-scale graph-structured data in applications such as social networks. Stream data management techniques have been developed to handle dynamically changing data. Spatial-temporal databases are mainly applied to the management of moving objects. Last but not least, the need to integrate multi-source, heterogeneous, and low-quality data has motivated crowdsourcing database techniques. Finally, the paper surveys other active research directions and discusses the future development trends of novel database management systems.

Key words: distributed databases; graph databases; stream databases; spatial-temporal databases; crowdsourcing databases

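Cite this article:

CUI Bin, GAO Jun, TONG Yong-Xin, XU Jian-Qiu, ZHANG Dong-Xiang, ZOU Lei. Progress and trend in novel data management system. Ruan Jian Xue Bao/Journal of Software, 2019, 30(1): 164-193.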
History:

Received: 2018-07-03
Last revised: 2018-08-21
Published online: 2018-11-23