支持乱序执行的Raft协议
作者:
作者单位:

作者简介:

谷晓松(1997-),男,学士,CCF学生会员,主要研究领域为分布式数据一致性,分布式系统,形式化方法.
乔磊(1982-),男,博士,研究员,CCF专业会员,主要研究领域为航天器嵌入式操作系统设计及验证.
魏恒峰(1986-),男,博士,CCF专业会员,主要研究领域为分布数据一致性,形式化方法.
黄宇(1982-),男,博士,教授,博士生导师,CCF专业会员,主要研究领域为分布式算法,分布式系统,网络化软件系统.

通讯作者:

魏恒峰,Email:hfwei@nju.edu.cn;乔磊,Email:fly2moon@aliyun.com

中图分类号:

基金项目:

国家自然科学基金(61932021,61772258);五〇二所空间先进计算与电子信息专业实验室开放基金(OBCandETL-2020-04)


Raft with Out-of-order Executions
Author:
Affiliation:

Fund Project:

National Natural Science Foundation of China (61932021, 61772258); Space Advanced Computing and Electronic Information Laboratory of BICE (OBCandETL-2020-04)

  • 摘要
  • |
  • 图/表
  • |
  • 访问统计
  • |
  • 参考文献
  • |
  • 相似文献
  • |
  • 引证文献
  • |
  • 资源附件
  • |
  • 文章评论
    摘要:

    PolarFS是阿里巴巴开发的分布式文件系统,它实现了分布式共识协议Raft的一种变体,称为ParallelRaft.ParallelRaft突破了Raft中顺序提交、顺序执行的限制,允许状态机乱序执行用户命令.然而文献表明:ParallelRaft并未开源,仅有简短的文字描述,更缺乏严格的形式化规约.更进一步,它的正确性也尚未经过必要的数学论证或形式化检验.旨在为ParallelRaft提供严格的形式化规约并证明其正确性,主要贡献包括:首先,为了理清ParallelRaft与Raft之间的关系,提出了允许乱序提交、顺序执行的ParallelRaft-SE (sequential execution)协议,并建立了从ParallelRaft-SE到Multi-Paxos的精化关系;其次,现有的ParallelRaft描述忽略了可能会违反状态一致性的“幽灵日志”问题,因此在ParallelRaft-SE的基础上提出了ParallelRaft-CE (concurrent execution)协议.ParallelRaft-CE限制了ParallelRaft-SE在乱序提交阶段的并行度,避免了“幽灵日志”问题,支持乱序执行,并保证乱序执行下的状态机一致性.证明了ParallelRaft-CE的正确性.最后,使用TLA+给出了ParallelRaft-SE和ParallelRaft-CE的形式化规约,并对协议参与者数量较小的情形,使用TLC模型检验与模拟测试工具验证了从ParallelRaft-SE到Multi-Paxos的精化关系以及ParallelRaft-CE的正确性.

    Abstract:

    PolarFS is a distributed file system developed by Alibaba Inc. with ultra-low latency and high availability. It implemented a variant of the Raft consensus protocol, called ParallelRaft. ParallelRaft breaks Raft's strict serialization restrictions in the commitment and execution of log entries and enables state machines to commit and execute log entries in an out-of-order way. However, ParallelRaft is not open-sourced. It has only a brief description, lacking formal specification. Moreover, the correctness of ParallelRaft has not been manually proven or formally checked. The purpose of the study is to provide a precise formal specification for ParallelRaft and to prove its correctness. Specifically, the following main contributions are accomplished. First, to clarify the relationship between Raft and ParallelRaft, ParallelRaft-SE (sequential execution) is proposed, which allows out-of-order commitment but prohibits out-of-order executions. Also, a refinement mapping from ParallelRaft-SE to Multi-Paxos is established. Second, it is discovered that ParallelRaft, according to its brief description in the literature, neglects the so-called "ghost log entries" phenomenon, which may violate the consistency among state machines. Therefore, based on ParallelRaft-SE, ParallelRaft-CE (concurrent execution) is proposed. ParallelRaft-CE avoids the "ghost log entries" phenomenon and ensures the consistency among state machine when executing concurrently by limiting parallelism in the commitment of log entries. The correctness of ParallelRaft-CE is proved manually. Finally, the formal specifications of ParallelRaft-SE and ParallelRaft-CE using TLA+ (TLA stands for temporal logic of actions) are provided, and the refinement mapping from ParallelRaft-SE to Multi-Paxos and the correctness of ParallelRaft-CE are verified using the TLC model checker when the number of participants of the protocols is small.

    参考文献
    相似文献
    引证文献
引用本文

谷晓松,魏恒峰,乔磊,黄宇.支持乱序执行的Raft协议.软件学报,2021,32(6):1748-1778

复制
分享
文章指标
  • 点击次数:
  • 下载次数:
  • HTML阅读次数:
  • 引用次数:
历史
  • 收稿日期:2020-08-30
  • 最后修改日期:2021-01-18
  • 录用日期:
  • 在线发布日期: 2021-02-07
  • 出版日期: 2021-06-06
您是第位访问者
版权所有:中国科学院软件研究所 京ICP备05046678号-3
地址:北京市海淀区中关村南四街4号,邮政编码:100190
电话:010-62562563 传真:010-62562533 Email:jos@iscas.ac.cn
技术支持:北京勤云科技发展有限公司

京公网安备 11040202500063号