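Optimized Fault Tolerance as Services Provisioning for Cloud Applications
Authors: YANG Na, LIU Jing
Author biographies:

YANG Na (1991-), female, born in Linyi, Shandong, master's degree; main research area: software fault tolerance. LIU Jing (1981-), male, Ph.D., associate professor, CCF senior member; main research areas: cloud computing, fault-tolerant computing, and software testing.

Corresponding author:

LIU Jing, E-mail: liujing@imu.edu.cn

Fund Project:

National Natural Science Foundation of China (61662051, 61262017)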

    Abstract:

    Providing efficient and continuously available fault tolerance services is essential to the reliable operation of cloud applications. Adopting the fault-tolerance-as-a-service model, this study proposes an optimized method for dynamically provisioning cloud fault tolerance services. The fault tolerance requirements of a cloud application are specified in terms of the reliability and response time of its components. Building on three commonly used fault tolerance techniques, namely replication, checkpointing, and NVP (N-version programming), and taking full account of the overhead of dynamically switching between fault tolerance services, the method computes an optimal solution among the feasible provisioning plans. Two scenarios are treated separately: the underlying cloud resources that support the fault tolerance services are sufficient, or they are not. Experimental results show that the proposed method reduces both the fault tolerance service fees paid by cloud applications and the cost of the underlying cloud resources that support those services, and that it improves the capacity of fault tolerance service providers to deliver efficient and reliable fault tolerance as a service to multiple cloud applications.
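The selection problem described in the abstract can be illustrated with a small sketch. It is not the paper's optimization method: it is a brute-force stand-in that assumes a single component, a flat per-period price, and a fixed switching cost. The names `FTService`, `cheapest_feasible`, and `CANDIDATES`, and every number in the candidate list, are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FTService:
    """Hypothetical description of one fault tolerance service offer."""
    name: str             # e.g. "replication", "checkpoint", "nvp"
    reliability: float    # probability that the protected component succeeds
    response_time: float  # expected response time in milliseconds
    price: float          # fee per billing period, arbitrary units


def cheapest_feasible(candidates, min_reliability, max_response_time,
                      current=None, switch_cost=0.0):
    """Return (service, total_cost) for the cheapest candidate that meets
    both requirements, or None if no candidate is feasible.

    Assumption: moving away from the currently deployed service (named by
    `current`) adds a fixed `switch_cost` to the candidate's price.
    """
    best = None
    for svc in candidates:
        # Discard candidates that violate the application's requirements.
        if svc.reliability < min_reliability:
            continue
        if svc.response_time > max_response_time:
            continue
        # Charge the switching overhead only when the service changes.
        switching = current is not None and svc.name != current
        total = svc.price + (switch_cost if switching else 0.0)
        if best is None or total < best[1]:
            best = (svc, total)
    return best


# Made-up candidate offers, for illustration only.
CANDIDATES = [
    FTService("replication", 0.999, 120.0, 10.0),
    FTService("checkpoint", 0.990, 200.0, 4.0),
    FTService("nvp", 0.9995, 150.0, 12.0),
]

if __name__ == "__main__":
    # No service deployed yet: only price decides among feasible offers.
    print(cheapest_feasible(CANDIDATES, 0.995, 160.0))
    # NVP already deployed: the switching cost can make staying cheaper.
    print(cheapest_feasible(CANDIDATES, 0.995, 160.0,
                            current="nvp", switch_cost=3.0))
```

With these made-up numbers, the switching cost is what keeps the already deployed NVP service preferable to the nominally cheaper replication offer, which is the kind of trade-off the abstract says the method accounts for.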

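Cite this article

Yang N, Liu J. Optimized fault tolerance as services provisioning for cloud applications. Ruan Jian Xue Bao/Journal of Software, 2019,30(4):1191-1202 (in Chinese with English abstract).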

History
  • Received: 2016-09-08
  • Last revised: 2017-01-18
  • Published online: 2019-04-01