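Systematic Definition and Classification of Data Anomalies in Data Base Management Systems
Authors: Li Haixiang, Li Xiaoyan, Liu Chang, Du Xiaoyong, Lu Wei, Pan Anqun
Author biographies:

Li Haixiang (1974-), male, master's degree, chief architect at Tencent, CCF professional member. Research interests: distributed computing, cloud databases, transaction processing, and query optimization.
Du Xiaoyong (1963-), male, Ph.D., professor, doctoral supervisor, CCF Fellow. Research interests: intelligent information retrieval, high-performance databases, and unstructured data management.
Li Xiaoyan (1989-), female, Ph.D. candidate. Research interests: statistical learning and intelligent information processing.
Lu Wei (1981-), male, Ph.D., professor, doctoral supervisor, CCF professional member. Research interests: fundamental database theory, development of big data systems, query processing in spatio-temporal settings, and cloud database systems and applications.
Liu Chang (1997-), male, bachelor's degree, software engineer at Tencent. Research interests: research and development of database products.
Pan Anqun (1982-), male, master's degree, expert software engineer for Tencent Cloud Database, CCF professional member. Research interests: cloud computing, distributed database systems, and blockchain.

Corresponding author:

Li Haixiang, E-mail: blueseali@tencent.com

Funding:

National Key Research and Development Program of China (2017YFB1001803); National Natural Science Foundation of China (61872008)
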
    Abstract:

    There is as yet no unified definition of a data anomaly; the term refers to a specific pattern of data operations that may violate the consistent state of a database. Known data anomalies include dirty write, dirty read, non-repeatable read, phantom, lost update, read skew, and write skew. To improve the efficiency of concurrency control algorithms, data anomalies are also used to define isolation levels, and weaker isolation levels are adopted to improve the efficiency of transaction processing systems. This work systematically studies data anomalies and the corresponding isolation levels, reports 22 new data anomalies that have not been reported in other literature, and classifies all data anomalies. Based on this classification, it proposes a new system of isolation levels at different granularities and reveals the regularities behind defining isolation levels in terms of data anomalies, making concepts such as data anomalies and isolation levels simpler and clearer to understand.

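Cite this article

Li HX, Li XY, Liu C, Du XY, Lu W, Pan AQ. Systematic definition and classification of data anomalies in data base management systems. Journal of Software, 2022, 33(3): 909-930 (in Chinese).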

History
  • Received: 2021-06-30
  • Last revised: 2021-07-31
  • Published online: 2021-10-21
  • Publication date: 2022-03-06