分布式数据库中一致性与可用性的关系
作者:
基金项目:

国家高技术研究发展计划(863)(2015AA015307);国家自然科学基金(61772202)


Consistency and Availability in Distributed Database Systems
Author:
Fund Project:

National High Technology Research and Development Program of China (863) (2015AA015307); National Natural Science Foundation of China (61772202)

  • 摘要
  • | |
  • 访问统计
  • |
  • 参考文献 [60]
  • |
  • 相似文献 [20]
  • | | |
  • 文章评论
    摘要:

    随着各类应用在数据量和业务量上的扩展,单机数据库系统越发难以应对现实需求.分布式数据库能够根据业务的需求动态地扩容,因此逐步开始受到应用的青睐.近年来,分布式数据库产品层出不穷,并在互联网应用中被大量投入使用.然而,分布式数据库的系统复杂度前所未有.为了让系统可用,设计者需要在多种属性中作合理选择和折中.从而造成现有的数据库产品形态各异、优缺点对比分明.至今为止,尚未有人对分布式数据库的设计空间和折中方案进行过深入分析和整理.在对多个分布式数据库产品进行深入理解之后认识到:分布式数据库系统的设计方案可以通过3个属性进行基本刻画——操作一致性、事务一致性和系统可用性.虽然这3个属性并不新颖,但它们在数据库语境下的含义在文献中尚未得到充分澄清.对这3个属性进行澄清,并通过它们对典型数据库产品的格局进行概括、对现有的分布式数据库技术进行综述.此外,还对这3个属性之间的相互关系进行深入分析,以期帮助未来的开发者在分布式数据库的设计过程中做出合理选择.

    Abstract:

    The rapid growth of data and workload makes centralized database systems less and less favorable to today's applications. A distributed database system can scale out dynamically to satisfy the business development. As a result, it has gained much more attention from applications. Since the needs for distributed DB became apparent, an increasing number of products have emerged and been adopted by the Web. However, due to the complexity of distributed DB systems, their designers have to trade off among several desired properties, resulting in dramatic difference in their designs and advantages. To the best of public knowledge, no one has performed a comprehensive analysis on the design space and the tradeoff choices of modern distributed DB systems. After reviewing and understanding a significant number of real world DB products, this study believe that a distributed DB system can be generally described using three dimensions-operational consistency, transactional consistency and availability. While these dimensions are not new, their concepts are somehow blurred in the literature. This paper clarifies the three concepts in the context of database, based on which can draw a sensible landscape of the existing products and technologies. The paper also provides an analysis of the relationship among the three dimensions, intending to help developers make right choice when designing new distributed DB systems.

    参考文献
    [1] Chang F, Dean J, Ghemawat S, Hsieh WC, Wallach DA, Burrows M, Chandra T, Fikes A, Gruber R. Bigtable:A distributed storage system for structured data. ACM Trans. on Computer Systems, 2008,26(2):4:1-4:26.[doi:10.1145/1365815.1365816]
    [2] Lakshman A, Malik P. Cassandra:A decentralized structured storage system. SIGOPS Operating Systems Review, 2010,44(2):35-40.[doi:10.1145/1773912.1773922]
    [3] Plugge E, Hawkins T, Membrey P. The definitive guide to MongoDB:The NoSQL database for cloud and desktop computing. In:Springer Ebooks. Berkely, 2010.
    [4] Anderson JC, Lehnardt J, Slater N. CouchDB:The Definitive Guide:Time to Relax. Sebastopol:O'Reilly Media, Inc., 2010.
    [5] DeCandia G, Hastorun D, Jampani M, Kakulapati G, Lakshman A, Pilchin A, Sivasubramanian S, Vosshall P, Vogels W. Dynamo:Amazon's highly available key-value store. In:Proc. of the 21st SOSP. New York:ACM, 2007. 205-220.[doi:10.1145/1294261. 1294281]
    [6] Stonebraker M, Weisberg A. The VoltDB main memory DBMS. IEEE Data Engineering Bulletin, 2013,36(2):21-27.
    [7] Corbett JC, Dean J, Epstein M, Fikes A, Frost C, Furman JJ, Ghemawat S, Gubarev A, Heiser C, Hochschild P, Hsieh W, Kanthak S, Kogan E, Li H, Lloyd A, Melnik S, Mwaura D, Nagle D, Quinlan S, Rao R, Rolig L, Saito Y, Szymaniak M, Taylor C, Wang R, Woodford D. Spanner:Google's globally distributed database. ACM Trans. on Computer Systems, 2013,31(3):8:1-8:22.
    [8] Gray J, Reuter A. Transaction Processing:Concepts and Techniques. San Francisco:Morgan Kaufmann Publishers, 1993.
    [9] Brewer EA. Towards robust distributed systems. In:Proc. of the PODC. 2000. 7.[doi:10.1145/343477.343502]
    [10] Gilbert S, Lynch NA. Brewer's conjecture and the feasibility of consistent, available, partition-tolerant Web services. ACM SIGACT News, 2002,33(2):51-59.[doi:10.1145/564585.564601]
    [11] Vogels W. Eventually consistent. Communications of the ACM, 2009,52(1):40-44.[doi:10.1145/1435417.1435432]
    [12] Adya A, Liskov BH. Weak consistency:A generalized theory and optimistic implementations for distributed transactions[Ph.D. Thesis]. Massachusetts Institute of Technology, Department of Electrical Engineering and Computer Science, 1999.
    [13] Cerone A, Bernardi G, Gotsman A. A framework for transactional consistency models with atomic visibility. In:Proc. of the LIPIcs-Leibniz Int'l in Informatics. Schloss Dagstuhl-Leibniz-Zentrum fuer Informatik. 2015. 42.[doi:10.4230/LIPIcs.CONCUR. 2015.58]
    [14] Bailis P, Fekete A, Hellerstein JM, Ghodsi A, Stoica I. Scalable atomic visibility with RAMP transactions. ACM Trans. on Database Systems, 2016,41(3):15:1-15:45.[doi:10.1145/2588555.2588562]
    [15] Ganesh A, Bamford RJ. Consistent read in a distributed database environment:U.S. Patent 7,334,004[P], 2008-2-19.
    [16] Ardekani MS, Sutra P, Shapiro M. Non-Monotonic snapshot isolation:Scalable and strong consistency for geo-replicated transactional systems. In:Proc. of the 32nd SRDS. 2013. 163-172.[doi:10.1109/SRDS.2013.25]
    [17] Thomson A, Diamond T, Weng SC, Ren K, Shao P, Abadi DJ. Calvin:Fast distributed transactions for partitioned database systems. In:Proc. of the SIGMOD. 2012. 1-12.[doi:10.1145/2213836.2213838]
    [18] Davison SB, Molina HG, Skeen D. Consistency in partitioned networks. ACM Computing Surveys, 1985,17(3):341-370.[doi:10. 1145/5505.5508]
    [19] Herlihy M, Wing JM. Linearizability:A correctness condition for concurrent objects. ACM Trans. on Programming Languages and Systems, 1990,12(3):463-492.[doi:10.1145/78969.78972]
    [20] Petersen K, Spreitzer M, Terry DB, Theimer M, Demers AJ. Flexible update propagation for weakly consistent replication. In:Proc. of the 16th SOSP. 1997. 288-301.[doi:10.1145/269005.266711]
    [21] Abadi D. Consistency tradeoffs in modern distributed database system design:CAP is only part of the story. IEEE Computer, 2012, 45(2):37-42.[doi:10.1109/MC.2012.33]
    [22] Bernstein PA, Das S:Rethinking eventual consistency. In:Proc. of the SIGMOD. 2013. 923-928.[doi:10.1145/2463676.2465339]
    [23] Papadimitriou CH. The serializability of concurrent database updates. Journal of the ACM, 1979,26(4):631-653.[doi:10.1145/322154.322158]
    [24] Mohan C, Haderle DJ, Lindsay BG, Pirahesh H, Schwarz PM. ARIES:A transaction recovery method supporting fine-granularity locking and partial rollbacks using write-ahead logging. ACM Trans. on Database Systems, 1992,17(1):94-162.[doi:10.1145/128765.128770]
    [25] Malviya N, Weisberg A, Madden S, Stonebraker M. Rethinking main memory OLTP recovery. In:Proc. of the 30th ICDE. 2014. 604-615.[doi:10.1109/ICDE.2014.6816685]
    [26] Yao C, Agrawal D, Chen G, Ooi BC, Wu S. Adaptive logging:Optimizing logging and recovery costs in distributed in-memory databases. In:Proc. of the SIGMOD. 2016. 1119-1134.[doi:10.1145/2882903.2915208]
    [27] Lomet DB, Tzoumas K, Zwilling MJ. Implementing performance competitive logical recovery. The Proc. of the VLDB Endowment, 2011,4(7):430-439.[doi:10.14778/1988776.1988779]
    [28] Wang JH, Cai P, Qian WN, Zhou AY. Log replication and recovery in cluster-based database system. Ruan Jian Xue Bao/Journal of Software, 2017,28(3):476-489(in Chinese with English abstract). http://www.jos.org.cn/1000-9825/5162.htm[doi:10.13328/j. cnki.jos.005162]
    [29] Bernstein PA, Hadzilacos V, Goodman N. Concurrency Control and Recovery in Database Systems. Boston:Addison-Wesley. 1987.
    [30] Kung HT, Robinson JT. On optimistic methods for concurrency control. In:Proc. of the VLDB. 1979. 351.[doi:10.1109/VLDB. 1979.718150]
    [31] Fekete A, Liarokapis D, O'Neil PE, Shasha DE. Making snapshot isolation serializable. ACM Trans. on Database Systems, 2005, 30(2):492-528.[doi:10.1145/1071610.1071615]
    [32] Berenson H, Bernstein P, Gray J, Melton J, O'Neil E, O'Neil P. A critique of ANSI SQL isolation levels. In:Proc. of the SIGMOD. 1995. 1-10.[doi:10.1145/568271.223785]
    [33] Lamport L. Paxos made simple, fast and Byzantine. In:Proc. of the 6th OPODIS. 2002. 7-9. http://dblp.org/rec/bib/conf/opodis/Lamport02
    [34] Lamport L. The part-time parliament. ACM Trans. on Computer Systems, 1998,16(2):133-169.[doi:10.1145/279227.279229]
    [35] Chandra TD, Griesemer R, Redstone J. Paxos made live:An engineering perspective. In:Proc. of the PODC. 2007. 398-407.[doi:10.1145/1281100.1281103]
    [36] Gray J, Lamport L. Consensus on transaction commit. ACM Trans. on Database Systems, 2006,31(1):133-160.[doi:10.1145/1132863.1132867]
    [37] Ongaro D, Ousterhout JK. In search of an understandable consensus algorithm. In:Proc. of the USENIX ATC. 2014. 305-319. https://www.usenix.org/conference/atc14
    [38] Burrows M. The Chubby lock service for loosely-coupled distributed systems. In:Proc. of the 7th OSDI. 2006. 335-350. http://www.usenix.org/events/osdi06/tech
    [39] O'Neil PE, Cheng E, Gawlick D, O'Neil EJ:The log-structured merge-tree (LSM-Tree). Acta Informatica, 1996,33(4):351-385.[doi:10.1007/s002360050048]
    [40] Ghemawat S, Gobioff H, Leung ST. The Google file system. In:Proc. of the 19th SOSP. 2003. 29-43. http://doi.acm.org/10.1145/945445
    [41] Baker J, Bond C, Corbett JC, Furman JJ, Khorlin A, Larson J, Leon JM, Li YW, Lloyd A, Yushprakh V. Megastore:Providing scalable, highly available storage for interactive services. In:Proc. of the CIDR. 2011. 223-234. http://cidrdb.org/cidr2011
    [42] Lloyd W, Freedman MJ, Kaminsky M, Andersen DG. Don't settle for eventual consistency. Communications of the ACM, 2014, 57(5):61-68.[doi:10.1145/2596624]
    [43] Lamport L. Time, clocks, and the ordering of events in a distributed system. Communications of the ACM, 1978,21(7):558-565.[doi:10.1145/359545.359563]
    [44] Bacon DF, Bales N, Bruno N, Cooper BF, Dickinson A, Fikes A, Fraser C, Gubarev A, Joshi M, Kogan E, Lloyd A, Melnik S, Rao R, Shue D, Taylor C, van der Holst M, Woodford D. Spanner:Becoming a SQL system. In:Proc. of the SIGMOD. 2017. 331-343.[doi:10.1145/3035918.3056103]
    [45] Samaras G, Britton K, Citron A, Mohan C. Two-Phase commit optimizations and tradeoffs in the commercial environment. In:Proc. of the ICDE. 1993. 520-529.[doi:10.1109/ICDE.1993.344028]
    [46] Thomson A, Abadi DJ. The case for determinism in database systems. The Proc. of the VLDB Endowment, 2010,3(1):70-80.[doi:10.14778/1920841.1920855]
    [47] Stonebraker M, Madden S, Abadi DJ, Harizopoulos S, Hachem N, Helland P. The end of an architectural era (It's time for a complete rewrite). In:Proc. of the VLDB. 2007. 1150-1160. http://www.vldb.org/conf/2007/papers
    [48] RenK, Thomson A, Abadi DJ. Lightweight locking for main memory database systems. In:Proc. of the VLDB. 2012. 145-156.[doi:10.14778/2535568.2448947]
    [49] Bailis P, Davidson A, Fekete A, Ghodsi A, Hellerstein JM, Stoica I. Highly available transactions:Virtues and limitations. In:Proc. of the VLDB. 2013. 181-192.[doi:10.14778/2732232.2732237]
    [50] Viotti P, Vukolić M. Consistency in non-transactional distributed storage systems. ACM Computing Surveys, 2016,49(1):19:1-19:34.[doi:10.1145/2926965]
    [51] Levandoski JJ, Lomet DB, Mokbel MF, Zhao K. Deuteronomy:Transaction support for cloud data. In:Proc. of the CIDR. 2011. 123-133. http://cidrdb.org/cidr2011
    [52] Levandoski JJ, Lomet DB, Sengupta S, Stutsman R, Wang R. High performance transactions in deuteronomy. In:Proc. of the CIDR. 2015. http://cidrdb.org/cidr2015
    [53] Levandoski JJ, Lomet DB, Sengupta S, Stutsman R, Wang R. Multi-Version range concurrency control in Deuteronomy. The Proc. of the VLDB Endowment, 2015,8(13):2146-2157.[doi:10.14778/2831360.2831368]
    [54] Gray J, Helland P, O'Neil P, Shasha D. The dangers of replication and a solution. In:Proc. of the SIGMOD. 1996. 173-182.[doi:10.1145/233269.233330]
    [55] Mu S, Nelson L, Lloyd W, Li J. Consolidating concurrency control and consensus for commits under conflict. In:Proc. of the OSDI. 2016. 517-532. https://www.usenix.org/conference/osdi16
    [56] Zhang I, Sharma NK, Szekeres A, Krishnamurthy A, Ports DRK. Building consistent transactions with inconsistent replication. In:Proc. of the SOSP. 2015. 267-278.[doi:10.1145/2815400.2815404]
    [57] Lin ZY, Lai YX, Lin C, Xie Y, Zou Q. Research on cloud databases. Ruan Jian Xue Bao/Journal of Software, 2012,23(5):1148-1166(in Chinese with English abstract). http://www.jos.org.cn/1000-9825/4195.htm[doi:10.3724/SP.J.1001.2012.04195]
    附中文参考文献:
    [28] 王嘉豪,蔡鹏,钱卫宁,周傲英.集群数据库系统的日志复制和故障恢复.软件学报,2017,28(3):476-489. http://www.jos.org.cn/1000-9825/5162.htm[doi:10.13328/j.cnki.jos.005162]
    [57] 林子雨,赖永炫,林琛,谢怡,邹权.云数据库研究.软件学报,2012,23(5):1148-1166. http://www.jos.org.cn/1000-9825/4195.htm[doi:10.3724/SP.J.1001.2012.04195]
    引证文献
    网友评论
    网友评论
    分享到微博
    发 布
引用本文

朱涛,郭进伟,周欢,周烜,周傲英.分布式数据库中一致性与可用性的关系.软件学报,2018,29(1):131-149

复制
分享
文章指标
  • 点击次数:6430
  • 下载次数: 9339
  • HTML阅读次数: 4610
  • 引用次数: 0
历史
  • 收稿日期:2017-09-16
  • 最后修改日期:2017-10-16
  • 录用日期:2017-11-07
  • 在线发布日期: 2017-12-01
文章二维码
您是第19727885位访问者
版权所有:中国科学院软件研究所 京ICP备05046678号-3
地址:北京市海淀区中关村南四街4号,邮政编码:100190
电话:010-62562563 传真:010-62562533 Email:jos@iscas.ac.cn
技术支持:北京勤云科技发展有限公司

京公网安备 11040202500063号