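Funding: National Natural Science Foundation of China (62072082, U2241212, U1811261, 62202088); Key Research and Development Program of Liaoning Province (2020JH2/10100037); Fundamental Research Funds for the Central Universities (N2216015, N2216012)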
Data replication is an important means of improving the availability of distributed databases. Placing replicas of part of the database in different regions also shortens the response time of local read and write operations, and increasing the number of replicas improves the linear scalability of read workloads. Given these advantages, many multi-replica distributed database systems have emerged in recent years, both in China and abroad. They include mainstream industrial systems such as Google Spanner, CockroachDB, TiDB, and OceanBase, as well as notable academic systems such as Calvin, Aria, and Berkeley Anna. Alongside these benefits, however, multi-replica databases introduce a series of challenges, including consistency maintenance, cross-node transactions, and transaction isolation. This study summarizes and analyzes existing replication architectures, consistency maintenance strategies, cross-node transaction concurrency control, and related techniques. It compares the differences and similarities among several representative multi-replica database systems in terms of distributed transaction processing. Finally, it builds a cross-region distributed cluster on Alibaba Cloud and experimentally evaluates the distributed transaction processing capability of these representative systems.