Consistency and Availability in Distributed Database Systems
Author:
Affiliation:

Fund Project:

National High Technology Research and Development Program of China (863) (2015AA015307); National Natural Science Foundation of China (61772202)

  • Article
  • | |
  • Metrics
  • |
  • Reference [60]
  • |
  • Related [20]
  • | | |
  • Comments
    Abstract:

    The rapid growth of data and workload makes centralized database systems less and less favorable to today's applications. A distributed database system can scale out dynamically to satisfy the business development. As a result, it has gained much more attention from applications. Since the needs for distributed DB became apparent, an increasing number of products have emerged and been adopted by the Web. However, due to the complexity of distributed DB systems, their designers have to trade off among several desired properties, resulting in dramatic difference in their designs and advantages. To the best of public knowledge, no one has performed a comprehensive analysis on the design space and the tradeoff choices of modern distributed DB systems. After reviewing and understanding a significant number of real world DB products, this study believe that a distributed DB system can be generally described using three dimensions-operational consistency, transactional consistency and availability. While these dimensions are not new, their concepts are somehow blurred in the literature. This paper clarifies the three concepts in the context of database, based on which can draw a sensible landscape of the existing products and technologies. The paper also provides an analysis of the relationship among the three dimensions, intending to help developers make right choice when designing new distributed DB systems.

    Reference
    [1] Chang F, Dean J, Ghemawat S, Hsieh WC, Wallach DA, Burrows M, Chandra T, Fikes A, Gruber R. Bigtable:A distributed storage system for structured data. ACM Trans. on Computer Systems, 2008,26(2):4:1-4:26.[doi:10.1145/1365815.1365816]
    [2] Lakshman A, Malik P. Cassandra:A decentralized structured storage system. SIGOPS Operating Systems Review, 2010,44(2):35-40.[doi:10.1145/1773912.1773922]
    [3] Plugge E, Hawkins T, Membrey P. The definitive guide to MongoDB:The NoSQL database for cloud and desktop computing. In:Springer Ebooks. Berkely, 2010.
    [4] Anderson JC, Lehnardt J, Slater N. CouchDB:The Definitive Guide:Time to Relax. Sebastopol:O'Reilly Media, Inc., 2010.
    [5] DeCandia G, Hastorun D, Jampani M, Kakulapati G, Lakshman A, Pilchin A, Sivasubramanian S, Vosshall P, Vogels W. Dynamo:Amazon's highly available key-value store. In:Proc. of the 21st SOSP. New York:ACM, 2007. 205-220.[doi:10.1145/1294261. 1294281]
    [6] Stonebraker M, Weisberg A. The VoltDB main memory DBMS. IEEE Data Engineering Bulletin, 2013,36(2):21-27.
    [7] Corbett JC, Dean J, Epstein M, Fikes A, Frost C, Furman JJ, Ghemawat S, Gubarev A, Heiser C, Hochschild P, Hsieh W, Kanthak S, Kogan E, Li H, Lloyd A, Melnik S, Mwaura D, Nagle D, Quinlan S, Rao R, Rolig L, Saito Y, Szymaniak M, Taylor C, Wang R, Woodford D. Spanner:Google's globally distributed database. ACM Trans. on Computer Systems, 2013,31(3):8:1-8:22.
    [8] Gray J, Reuter A. Transaction Processing:Concepts and Techniques. San Francisco:Morgan Kaufmann Publishers, 1993.
    [9] Brewer EA. Towards robust distributed systems. In:Proc. of the PODC. 2000. 7.[doi:10.1145/343477.343502]
    [10] Gilbert S, Lynch NA. Brewer's conjecture and the feasibility of consistent, available, partition-tolerant Web services. ACM SIGACT News, 2002,33(2):51-59.[doi:10.1145/564585.564601]
    [11] Vogels W. Eventually consistent. Communications of the ACM, 2009,52(1):40-44.[doi:10.1145/1435417.1435432]
    [12] Adya A, Liskov BH. Weak consistency:A generalized theory and optimistic implementations for distributed transactions[Ph.D. Thesis]. Massachusetts Institute of Technology, Department of Electrical Engineering and Computer Science, 1999.
    [13] Cerone A, Bernardi G, Gotsman A. A framework for transactional consistency models with atomic visibility. In:Proc. of the LIPIcs-Leibniz Int'l in Informatics. Schloss Dagstuhl-Leibniz-Zentrum fuer Informatik. 2015. 42.[doi:10.4230/LIPIcs.CONCUR. 2015.58]
    [14] Bailis P, Fekete A, Hellerstein JM, Ghodsi A, Stoica I. Scalable atomic visibility with RAMP transactions. ACM Trans. on Database Systems, 2016,41(3):15:1-15:45.[doi:10.1145/2588555.2588562]
    [15] Ganesh A, Bamford RJ. Consistent read in a distributed database environment:U.S. Patent 7,334,004[P], 2008-2-19.
    [16] Ardekani MS, Sutra P, Shapiro M. Non-Monotonic snapshot isolation:Scalable and strong consistency for geo-replicated transactional systems. In:Proc. of the 32nd SRDS. 2013. 163-172.[doi:10.1109/SRDS.2013.25]
    [17] Thomson A, Diamond T, Weng SC, Ren K, Shao P, Abadi DJ. Calvin:Fast distributed transactions for partitioned database systems. In:Proc. of the SIGMOD. 2012. 1-12.[doi:10.1145/2213836.2213838]
    [18] Davison SB, Molina HG, Skeen D. Consistency in partitioned networks. ACM Computing Surveys, 1985,17(3):341-370.[doi:10. 1145/5505.5508]
    [19] Herlihy M, Wing JM. Linearizability:A correctness condition for concurrent objects. ACM Trans. on Programming Languages and Systems, 1990,12(3):463-492.[doi:10.1145/78969.78972]
    [20] Petersen K, Spreitzer M, Terry DB, Theimer M, Demers AJ. Flexible update propagation for weakly consistent replication. In:Proc. of the 16th SOSP. 1997. 288-301.[doi:10.1145/269005.266711]
    [21] Abadi D. Consistency tradeoffs in modern distributed database system design:CAP is only part of the story. IEEE Computer, 2012, 45(2):37-42.[doi:10.1109/MC.2012.33]
    [22] Bernstein PA, Das S:Rethinking eventual consistency. In:Proc. of the SIGMOD. 2013. 923-928.[doi:10.1145/2463676.2465339]
    [23] Papadimitriou CH. The serializability of concurrent database updates. Journal of the ACM, 1979,26(4):631-653.[doi:10.1145/322154.322158]
    [24] Mohan C, Haderle DJ, Lindsay BG, Pirahesh H, Schwarz PM. ARIES:A transaction recovery method supporting fine-granularity locking and partial rollbacks using write-ahead logging. ACM Trans. on Database Systems, 1992,17(1):94-162.[doi:10.1145/128765.128770]
    [25] Malviya N, Weisberg A, Madden S, Stonebraker M. Rethinking main memory OLTP recovery. In:Proc. of the 30th ICDE. 2014. 604-615.[doi:10.1109/ICDE.2014.6816685]
    [26] Yao C, Agrawal D, Chen G, Ooi BC, Wu S. Adaptive logging:Optimizing logging and recovery costs in distributed in-memory databases. In:Proc. of the SIGMOD. 2016. 1119-1134.[doi:10.1145/2882903.2915208]
    [27] Lomet DB, Tzoumas K, Zwilling MJ. Implementing performance competitive logical recovery. The Proc. of the VLDB Endowment, 2011,4(7):430-439.[doi:10.14778/1988776.1988779]
    [28] Wang JH, Cai P, Qian WN, Zhou AY. Log replication and recovery in cluster-based database system. Ruan Jian Xue Bao/Journal of Software, 2017,28(3):476-489(in Chinese with English abstract). http://www.jos.org.cn/1000-9825/5162.htm[doi:10.13328/j. cnki.jos.005162]
    [29] Bernstein PA, Hadzilacos V, Goodman N. Concurrency Control and Recovery in Database Systems. Boston:Addison-Wesley. 1987.
    [30] Kung HT, Robinson JT. On optimistic methods for concurrency control. In:Proc. of the VLDB. 1979. 351.[doi:10.1109/VLDB. 1979.718150]
    [31] Fekete A, Liarokapis D, O'Neil PE, Shasha DE. Making snapshot isolation serializable. ACM Trans. on Database Systems, 2005, 30(2):492-528.[doi:10.1145/1071610.1071615]
    [32] Berenson H, Bernstein P, Gray J, Melton J, O'Neil E, O'Neil P. A critique of ANSI SQL isolation levels. In:Proc. of the SIGMOD. 1995. 1-10.[doi:10.1145/568271.223785]
    [33] Lamport L. Paxos made simple, fast and Byzantine. In:Proc. of the 6th OPODIS. 2002. 7-9. http://dblp.org/rec/bib/conf/opodis/Lamport02
    [34] Lamport L. The part-time parliament. ACM Trans. on Computer Systems, 1998,16(2):133-169.[doi:10.1145/279227.279229]
    [35] Chandra TD, Griesemer R, Redstone J. Paxos made live:An engineering perspective. In:Proc. of the PODC. 2007. 398-407.[doi:10.1145/1281100.1281103]
    [36] Gray J, Lamport L. Consensus on transaction commit. ACM Trans. on Database Systems, 2006,31(1):133-160.[doi:10.1145/1132863.1132867]
    [37] Ongaro D, Ousterhout JK. In search of an understandable consensus algorithm. In:Proc. of the USENIX ATC. 2014. 305-319. https://www.usenix.org/conference/atc14
    [38] Burrows M. The Chubby lock service for loosely-coupled distributed systems. In:Proc. of the 7th OSDI. 2006. 335-350. http://www.usenix.org/events/osdi06/tech
    [39] O'Neil PE, Cheng E, Gawlick D, O'Neil EJ:The log-structured merge-tree (LSM-Tree). Acta Informatica, 1996,33(4):351-385.[doi:10.1007/s002360050048]
    [40] Ghemawat S, Gobioff H, Leung ST. The Google file system. In:Proc. of the 19th SOSP. 2003. 29-43. http://doi.acm.org/10.1145/945445
    [41] Baker J, Bond C, Corbett JC, Furman JJ, Khorlin A, Larson J, Leon JM, Li YW, Lloyd A, Yushprakh V. Megastore:Providing scalable, highly available storage for interactive services. In:Proc. of the CIDR. 2011. 223-234. http://cidrdb.org/cidr2011
    [42] Lloyd W, Freedman MJ, Kaminsky M, Andersen DG. Don't settle for eventual consistency. Communications of the ACM, 2014, 57(5):61-68.[doi:10.1145/2596624]
    [43] Lamport L. Time, clocks, and the ordering of events in a distributed system. Communications of the ACM, 1978,21(7):558-565.[doi:10.1145/359545.359563]
    [44] Bacon DF, Bales N, Bruno N, Cooper BF, Dickinson A, Fikes A, Fraser C, Gubarev A, Joshi M, Kogan E, Lloyd A, Melnik S, Rao R, Shue D, Taylor C, van der Holst M, Woodford D. Spanner:Becoming a SQL system. In:Proc. of the SIGMOD. 2017. 331-343.[doi:10.1145/3035918.3056103]
    [45] Samaras G, Britton K, Citron A, Mohan C. Two-Phase commit optimizations and tradeoffs in the commercial environment. In:Proc. of the ICDE. 1993. 520-529.[doi:10.1109/ICDE.1993.344028]
    [46] Thomson A, Abadi DJ. The case for determinism in database systems. The Proc. of the VLDB Endowment, 2010,3(1):70-80.[doi:10.14778/1920841.1920855]
    [47] Stonebraker M, Madden S, Abadi DJ, Harizopoulos S, Hachem N, Helland P. The end of an architectural era (It's time for a complete rewrite). In:Proc. of the VLDB. 2007. 1150-1160. http://www.vldb.org/conf/2007/papers
    [48] RenK, Thomson A, Abadi DJ. Lightweight locking for main memory database systems. In:Proc. of the VLDB. 2012. 145-156.[doi:10.14778/2535568.2448947]
    [49] Bailis P, Davidson A, Fekete A, Ghodsi A, Hellerstein JM, Stoica I. Highly available transactions:Virtues and limitations. In:Proc. of the VLDB. 2013. 181-192.[doi:10.14778/2732232.2732237]
    [50] Viotti P, Vukolić M. Consistency in non-transactional distributed storage systems. ACM Computing Surveys, 2016,49(1):19:1-19:34.[doi:10.1145/2926965]
    [51] Levandoski JJ, Lomet DB, Mokbel MF, Zhao K. Deuteronomy:Transaction support for cloud data. In:Proc. of the CIDR. 2011. 123-133. http://cidrdb.org/cidr2011
    [52] Levandoski JJ, Lomet DB, Sengupta S, Stutsman R, Wang R. High performance transactions in deuteronomy. In:Proc. of the CIDR. 2015. http://cidrdb.org/cidr2015
    [53] Levandoski JJ, Lomet DB, Sengupta S, Stutsman R, Wang R. Multi-Version range concurrency control in Deuteronomy. The Proc. of the VLDB Endowment, 2015,8(13):2146-2157.[doi:10.14778/2831360.2831368]
    [54] Gray J, Helland P, O'Neil P, Shasha D. The dangers of replication and a solution. In:Proc. of the SIGMOD. 1996. 173-182.[doi:10.1145/233269.233330]
    [55] Mu S, Nelson L, Lloyd W, Li J. Consolidating concurrency control and consensus for commits under conflict. In:Proc. of the OSDI. 2016. 517-532. https://www.usenix.org/conference/osdi16
    [56] Zhang I, Sharma NK, Szekeres A, Krishnamurthy A, Ports DRK. Building consistent transactions with inconsistent replication. In:Proc. of the SOSP. 2015. 267-278.[doi:10.1145/2815400.2815404]
    [57] Lin ZY, Lai YX, Lin C, Xie Y, Zou Q. Research on cloud databases. Ruan Jian Xue Bao/Journal of Software, 2012,23(5):1148-1166(in Chinese with English abstract). http://www.jos.org.cn/1000-9825/4195.htm[doi:10.3724/SP.J.1001.2012.04195]
    附中文参考文献:
    [28] 王嘉豪,蔡鹏,钱卫宁,周傲英.集群数据库系统的日志复制和故障恢复.软件学报,2017,28(3):476-489. http://www.jos.org.cn/1000-9825/5162.htm[doi:10.13328/j.cnki.jos.005162]
    [57] 林子雨,赖永炫,林琛,谢怡,邹权.云数据库研究.软件学报,2012,23(5):1148-1166. http://www.jos.org.cn/1000-9825/4195.htm[doi:10.3724/SP.J.1001.2012.04195]
    Cited by
    Comments
    Comments
    分享到微博
    Submit
Get Citation

朱涛,郭进伟,周欢,周烜,周傲英.分布式数据库中一致性与可用性的关系.软件学报,2018,29(1):131-149

Copy
Share
Article Metrics
  • Abstract:6440
  • PDF: 9354
  • HTML: 4652
  • Cited by: 0
History
  • Received:September 16,2017
  • Revised:October 16,2017
  • Adopted:November 07,2017
  • Online: December 01,2017
You are the first2033165Visitors
Copyright: Institute of Software, Chinese Academy of Sciences Beijing ICP No. 05046678-4
Address:4# South Fourth Street, Zhong Guan Cun, Beijing 100190,Postal Code:100190
Phone:010-62562563 Fax:010-62562533 Email:jos@iscas.ac.cn
Technical Support:Beijing Qinyun Technology Development Co., Ltd.

Beijing Public Network Security No. 11040202500063