拜占庭系统技术研究综述
作者:
基金项目:

国家自然科学基金(60925006); 国家高技术研究发展计划(863)(2013AA013201)


Research on the Technologies of Byzantine System
Author:
  • 摘要
  • | |
  • 访问统计
  • |
  • 参考文献 [49]
  • |
  • 相似文献 [20]
  • |
  • 引证文献
  • | |
  • 文章评论
    摘要:

    随着分布式系统规模的增大,设计复杂度也不断提升,系统可靠性所面临的问题也越来越严峻.由于拜占庭协议能够容忍包括人为失误、软件bug 和安全漏洞等各种形式的错误,其系统技术和实现方法越来越受到研究者们的重视.介绍和总结了目前拜占庭系统技术的研究成果,分析了目前拜占庭系统的研究现状,并探讨了拜占庭系统的发展趋势.通过分析得出:1) 拜占庭系统性能上仍然与已经实用的非拜占庭系统相距较大,占用资源数量仍然较多,需要进一步研究其性能和资源优化技术;2) 通过检测错误或者定期修复来降低系统中的错误,是延长系统可持续运行时间的方法,需要研究新的、高效的全面检测拜占庭服务器、合理定期修复等保障系统可持续运行的方法;3)实际应用背景和需求及其特定错误类型的处理方法对拜占庭协议和功能等提出了不一样的要求,需要研究拜占庭系统在实际中的应用和可用性.

    Abstract:

    Nowadays, in order to resolve the reliability problem in an enlarging distributed system, Byzantine fault tolerant system has researched popularly for its ability of tolerating arbitrary faults. In this paper, the definitions of Byzantine system and the estimation methods of improving the performance of Byzantine system are introduced. After that, some unresolved problems and some future development trends will be indicated. Finally, after analyzing the status of studies, several conclusions are drawn: 1) The cost of running a Byzantine system is still much higher than non-Byzantine system. Plans to increase the performance and decrease the overhead are need to be explored in further study. 2) While detecting Byzantine faults and proactive recovery can keep Byzantine system from breaking down, they still have some drawbacks. How to eliminate the drawbacks should be studied. 3) Different applications require different aspect of optimization. How to make practical Byzantine systems are needed to be studied.

    参考文献
    [1] Amazon. Amazon Web service. http://aws.amazon.com/products/
    [2] Houstn D, Ferdows A. Dropbox. https://www.dropbox.com/
    [3] Microsoft. Windows azure. http://www.windowsazure.com/en-us/
    [4] Schroeder B, Gibson GA. A large-scale study of failures in high-performance computing systems. IEEE Trans. on Dependable andSecure Computing, 2010. 337-350. [doi: 10.1109/TDSC.2009.4]
    [5] http://games.qq.com/a/20110503/000013.htm
    [6] http://www.chinanews.com/it/2011/09-05/3305704.shtml
    [7] http://tech.163.com/07/ 0813/11/3LP86PEM000915BD.html
    [8] Lamport L, Shostak R, Pease M. The Byzantine generals problem. ACM Trans. on Programming Languages and Systems, 1982,4(3):382-401. [doi: 10.1145/357172.357176]
    [9] Clement A, Kapritsos M, Lee S, Wang Y, Alvisi L, Dahlin M, Riche T. Upright cluster services. In: Proc. of the ACM SIGOPS22nd Symp. on Operating Systems Principles. New York: ACM Press, 2009. 277-290. [doi: 10.1145/1629575.1629602]
    [10] Castro M, Liskov B. Practical Byzantine fault tolerance and proactive recovery. ACM Trans. on Computer Systems, 2002,20(4):398-461. [doi: 10.1145/571637.571640]
    [11] Cowling J, Myers D, Liskov B, Rodrigues R, Shrira L. HQ replication: A hybrid quorum protocol for Byzantine fault tolerance. In:Proc. of the 7th Symp. on Operating Systems Design and Implementation. Berkeley: USENIX Association, 2006. 177-190.
    [12] Kotla R, Alvisi L, Dahlin M, Clement A, Wong E. Zyzzyva: Speculative Byzantine fault tolerance. In: Proc. of the 21st ACMSIGOPS Symp. on Operating Systems Principles. New York: ACM Press, 2007, 45-58. [doi: 10.1145/1294261.1294267]
    [13] Serafini M, Nokor P, Dobre D, Majuntke M, Suri M. Scrooge: Reducing the costs of fast Byzantine replication in presence ofunresponsive replicas. In: Proc. of the 2010 IEEE/IFIP Int’l Conf. on Dependable Systems and Networks. 2010. 353-362. [doi:10.1109/DSN.2010.5544295]
    [14] Yin J, Martin JP, Venkataramani A, Alvisi L, Dahlin M. Separating agreement from execution for Byzantine fault tolerant services.In: Proc. of the 19th ACM Symp. on Operating Systems Principles. New York: ACM Press, 2003. 253-267. [doi: 10.1145/945445.945470]
    [15] Wood T, Singh R, Venkataramani A, Shenoy P, Ceeehet E. ZZ and the art of practical BFT execution. In: Proc. of the 6th Conf. onComputer Systems. New York: ACM Press, 2011. 123-138. [doi: 10.1145/1966445.1966457]
    [16] Wester B, Cowling J, Nightingale EB, Chen PM, Flinn J, Liskov B. Tolerating latency in replicated state machines through clientspeculation. In: Proc. of the 6th USENIX Symp. on Networked Systems Design and Implementation. Berkeley: USENIXAssociation, 2009. 245-260.
    [17] Hendricks J, Sinnamohideen S, Ganger GR, Reiter MK. Zzyzx: Scalable fault tolerance through Byzantine locking. In: Proc. of the2010 IEEE/IFIP Int’l Conf. on Dependable Systems and Networks. 2010. 363-372. [doi: 10.1109/DSN.2010.5544297]
    [18] Pease M, Shostak R, Lamport L. Reaching agreement in the presence of faults. Journal of the ACM, 1980,27(2):228-234. [doi:10.1145/322186.322188]
    [19] Matin JP, Alvisi L. Fast Byzantine consensus. IEEE Trans. on Dependable and Secure Computing, 2006,3(3):202-215. [doi:10.1109/TDSC.2006.35]
    [20] Baker MG, Hartman JH, Kupfer MD, Shirriff KW, Ousterhout JK. Measurements of a distributed file system. In: Proc. of the 13thACM Symp. on Operating Systems Principles. New York: ACM Press, 1991,25(5):198-212. [doi: 10.1145/121132.121164]
    [21] Leung AW, Pasupathy S, Goodson G, Miller EL. Measurement and analysis of large-scale network file system workloads. In: Proc.of the USENIX 2008 Annual Technical Conf. on Annual Technical Conf. Berkeley: USENIX Association, 2008. 213-226.
    [22] Amir Y, Coan B, Kirsch J, Lane J. Byzantine replication under attack. In: Proc. of the DSN. 2008.
    [23] Malkhi D, Reiter M. Byzantine quorum systems. Jorunal of Distributed Computing, 1988,11(4):203-213.
    [24] Martin JP, Alvisi L, Dahlin M. Minimal Byzantine storage. In: Proc. of the 16th Int’l Conf. on Distributed Computing. London:Springer-Verlag, 2002. 311-325. [doi: 10.1007/3-540-36108-1_21]
    [25] Abd-EL-Malek M, Ganger GR, Goodson GR, Reiter MK, Wylie JJ. Lazy verification in fault-tolerant distributed storage systems.In: Proc. of the 24th IEEE Symp. on Reliable Distributed Systems. 2005. 179-190. [doi: 10.1109/RELDIS.2005.20]
    [26] Goodson GR, Wylie JJ, Ganger GR, Reiter MK. Efficient Byzantine-tolerant erasure-coded storage. In: Proc. of the 2004 Int’l Conf.on Dependable Systems and Networks. 2004. 135-144.
    [27] Hendricks J, Ganger GR, Reiter MK. Low-Overhead Byzantine fault-tolerant storage. In: Proc. of the 21st ACM SIGOPS Symp. onOperating Systems Principles. New York: ACM Press, 2007. 73-86. [doi: 10.1145/1294261.1294269]
    [28] Hendricks J, Granger GR, Reiter MK. Verifying distributed erasure-coded data. In: Proc. of the 26th annual ACM Symp. onPrinciples of Distributed Computing. New York: ACM Press, 2007. 139-146. [doi: 10.1145/1281100.1281122]
    [29] Abd-El-Malek M, Ganger G, Goodson G, Reiter M. Fault-Scalable Byzantine fault-tolerant services. In: Proc. of the 20th ACMSymp. on Operating Systems Principles. New York: ACM Press, 2005. 59-74. [doi: 10.1145/1095810.1095817]
    [30] Luo XH, Shu JW. Summary of research for erasure code in storage system. Journal of Computer Research and Development, 2012,49(1):1-11 (in Chinese with English abstract).
    [31] Li MQ, Shu JW, Zheng WM. GRID codes: Strip-based erasure codes with high fault tolerance for storage systems. ACM Trans. onStorage, 2009,4(4):1-22. [doi: 10.1145/1480439.1480444]
    [32] Patterson D, Chen P, Gibson G, Katz R. Introduction to redundant arrays of inexpensive disks (RAID). In: Proc. of the SpringCOMPCON’89. 1989. 112-117. [doi: 10.1109/CMPCON.1989.301912]
    [33] Cachin C, Tessaro S. Asynchronous verifiable information dispersal. In: Proc. of the 24th IEEE Symp. on Reliable DistributedSystems. 2005. 191-201. [doi: 10.1109/RELDIS.2005.9]
    [34] Alvisi L, Malkhi D, Pierce E, Reiter MK. Fault detection for Byzantine quorum systems. IEEE Trans. on Parallel and DistributedSystems, 2001,12(9):996-1007. [doi: 10.1109/71.954640]
    [35] Chandra TD, Toueg S. Unreliable failure detectors for reliable distributed systems. Journal of the ACM, 1996,43(2):225-267. [doi:10.1145/226643.226647]
    [36] Lima M, Greve F, Arantes L, Sens P. Byzantine failure detection for dynamic distributed systems. SANTOSDELIMA-2010-584597,RR-7222. Paris, 2010.
    [37] Delporte-Gallet C, Fauconnier H, Guerraoui R, Hadzilacos V, Houznetsov P, Toueq S. The weakest failure detectors to solvecertain fundamental problems in distributed computing. In: Proc. of the 23rd Annual ACM Symp. on Principles of DistributedComputing. New York: ACM Press, 2004. 338-346. [doi: 10.1145/1011767.1011818]
    [38] Liu G, Zhou J, Sun Y, Qin L. A fault detection mechanism in erasure-code Byzantine fault-tolerance quorum. Wuhan UniversityJournal of Natural Sciences, 2006,11(6):1453-1456. [doi: 10.1007/BF02831796]
    [39] Reiser HP, Kapitza R. Hypervisor-Based efficient proactive recovery. In: Proc. of the 26th IEEE Int’l Symp. on ReliableDistributed Systems. 2007. 83-92. [doi: 10.1109/SRDS.2007.25]
    [40] Sousa P, Bessani AN, Correia M, Neves NF, Verissimo P. Highly available intrusion-tolerant services with proactive-reactiverecovery. IEEE Trans. on Parallel and Distributed Systems, 2010,21(4):452-465. [doi: 10.1109/TPDS.2009.83]
    [41] Distler T, Kapitza R, Reiser HP. Efficient state transfer for hypervisor-based proactive recovery. In: Proc. of the 2nd Workshop onRecent Advances on Intrusiton-Tolerant Systems. New York: ACM Press, 2008. [doi: 10.1145/1413901.1413905]
    [42] Paulo E. Travelling through wormholes: A new look at distributed systems models. Journal of SIGACT News, 2006,37(1):66-81.[doi: 10.1145/1122480.1122497]
    [43] Chun BG, Maniatis P, Shenker S, Kubiatowicz J. Tiered fault tolerance for long-term integrity. In: Proc. of the 7th Conf. on Fileand Storage Technologies. Berkeley: USENIX Association, 2009. 267-282.
    [44] Chun BG, Maniatis P, Shenker S, Kubiatowicz J. Attested append-only memory: Making adversaries stick to their word. In: Proc.of the 21st ACM SIGOPS Symp. on Operating Systems Principles. New York: ACM Press, 2007. 189-204. [doi: 10.1145/1294261.1294280]
    [45] Felten EW. Understanding trusted computing: Will its benefits outweigh its drawbacks? Journal of IEEE Security Privacy, 2003,1(3):60-62. [doi: 10.1109/MSECP.2003.1203224]
    [46] Levin D, Douceur JR, Lorch JR, Moscibroda T. TrInc: Small trusted hardware for large distributed systems. In: Proc. of the 6thUSENIX Symp. on Networked Systems Design and Implementation. Berkeley: USENIX Association, 2009. 1-14.
    [47] Trusted Computing Group. Trusted platform module. http://www.trustedcomputinggroup.org/developers/trusted_platform_module
    [48] Singh A, Fonseca P, Kuznetsov P, Rodrigues R, Maniatis P. Zeno: Eventually consistent Byzantine-fault tolerance. In: Proc. of the6th USENIX Symp. on Networked Systems Design and Implementation. Berkeley: USENIX Association, 2009. 169-184.
    [49] Aiyer AS, Alvisi L, Clement A, Dahlin M, Martin JP, Porth C. BAR fault tolerance for cooperative services. In: Proc. of the 20thACM Symp. on Operating Systems Principles. New York: ACM Press, 2005. 45-58. [doi: 10.1145/1095810.1095816]
    网友评论
    网友评论
    分享到微博
    发 布
引用本文

范捷,易乐天,舒继武.拜占庭系统技术研究综述.软件学报,2013,24(6):1346-1360

复制
分享
文章指标
  • 点击次数:9493
  • 下载次数: 16498
  • HTML阅读次数: 0
  • 引用次数: 0
历史
  • 收稿日期:2012-04-11
  • 最后修改日期:2013-03-07
  • 在线发布日期: 2013-04-01
文章二维码
您是第19732267位访问者
版权所有:中国科学院软件研究所 京ICP备05046678号-3
地址:北京市海淀区中关村南四街4号,邮政编码:100190
电话:010-62562563 传真:010-62562533 Email:jos@iscas.ac.cn
技术支持:北京勤云科技发展有限公司

京公网安备 11040202500063号