Survey on NoSQL for Management of Big Data
Funding:

National Basic Research Program of China (973 Program) (2012CB316201); National Natural Science Foundation of China (61033007, 61003060)

    Abstract:

    Many application-specific NoSQL database systems have emerged to meet the new requirements of big data management. This paper surveys research on NoSQL databases built on the key-value data model. First, it introduces the characteristics of big data and the key technical problems facing systems that support big data management. It then reviews frontier research and open challenges, covering system architecture, data models, access methods, indexing techniques, transaction properties, system elasticity, dynamic load balancing, replication strategies, data consistency strategies, flash-based multi-level caching, MapReduce-based data processing strategies, and new-generation data management systems. Finally, it outlines prospects for future research.

Cite this article

Shen Derong, Yu Ge, Wang Xite, Nie Tiezheng, Kou Yue. Survey on NoSQL for Management of Big Data. Journal of Software (软件学报), 2013,24(8):1786-1803

History
  • Received: 2013-03-28
  • Last revised: 2012-10-19
  • Published online: 2013-05-23