Abstract:Many applications such as social networking, online shopping and online finance may receive highly concurrent data access from massive Internet users. In this scenario, traditional single node database systems gradually become the bottleneck of the system, and the main reason for many successful Internet applications is the use of cluster-based data management systems. Compared with traditional database systems, cluster-based distributed database systems have better scalability and availability, and log replication is one of the core components to build these features. Master-slave based log replication cannot handle the uncertain logs while failure occurs, resulting in the risk of inconsistency among different copies. Consensus algorithms cannot be directly applied to the database system due to the lack of transaction consistency model, and they also have issues in leader election with livelock, as well as double master and continuous election problem. This paper introduces a log replication strategy and corresponding recovery technique for cluster environments, which can effectively process the uncertain logs and provide two read consistency options, i.e. strong and weak consistency. A lightweight master election algorithm is also presented to avoid the master election issues. The algorithms are implemented in the OceanBase distributed database system and tested using benchmark tool. Experiments show that the proposed method can improve the scalability and availability.