New Landscape of Data Management Technologies

doi:10.3724/SP.J.1001.2013.04345


Author:
QIN Xiong-Pai, WANG Hui-Ju, LI Fu-Rong, LI Cui-Ping, CHEN Hong, ZHOU Xuan, DU Xiao-Yong, WANG Shan

Affiliation:
Key Laboratory of Data Engineering and Knowledge Engineering (Renmin University of China), Ministry of Education, Beijing 100872, China; Sa Shi-Xuan Big Data Management and Analytics Research Center (Sino-Australia), Beijing 100872, China; Information School, Renmin University of China, Beijing 100872, China

Fund Project:
National Natural Science Foundation of China (61070054, 60873017, 61170013); National Science and Technology Major Project "Core Electronic Devices, High-End Generic Chips and Basic Software" (2010ZX01042-001-002, 2010ZX01042-002-002-03); EMC China Research Institute "EMC Global CTO Office" fund


Abstract:

Revolutionary progress in data collection techniques, a dramatic drop in the price of storage devices, and the need to extract knowledge from data have given rise to big data, and data management technology has entered the big data era. Relational database technology has developed for 40 years since the 1970s and now faces difficulties such as limited system scalability and support for only a narrow range of data types. In recent years, noSQL technologies have risen rapidly: they manage, process, and analyze many types of data effectively, achieve good performance through parallel processing, and scale well enough to keep up with ever-growing data volumes. Following the historical path of database technology, this paper unfolds the new landscape of data management technologies from the application dimension (operational and analytic applications), discusses important and challenging open problems, and introduces the authors' own research work.

Key words: RDBMS (relational database management system); noSQL; big data; operational; analytic; new landscape


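Cite this article

QIN Xiong-Pai, WANG Hui-Ju, LI Fu-Rong, LI Cui-Ping, CHEN Hong, ZHOU Xuan, DU Xiao-Yong, WANG Shan. New landscape of data management technologies. Journal of Software, 2013, 24(2): 175-197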


History

Received: 2012-06-12
Revised: 2012-10-16
Published online: 2013-02-02
