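A Low-Cost Transparent Checkpoint-Recovery Protocol for Mobile Computing
Authors: Li Qinghua, Jiang Tingyao, Zhang Hongjun
Funding: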

Supported by the National Natural Science Foundation of China under Grant No. 60273075; the National High-Tech Research and Development Plan of China (863 Program) under Grant No. 863-306-11-01-06


A Transparent Low-Cost Recovery Protocol for Mobile-to-Mobile Communication
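    Abstract (translated from the Chinese original):

    Checkpoint-recovery protocols in mobile computing systems face many problems that do not arise in traditional distributed systems. Among the existing checkpoint-recovery mechanisms that support mobile computing, approaches based on establishing globally consistent checkpoints cannot guarantee independent recovery from failures, and message-logging approaches based on m-MSS-m communication require every message exchanged between mobile hosts to be relayed through a mobile support station. This paper proposes a message-logging-based fault-tolerance protocol that supports direct communication between mobile hosts (m-m), and gives the corresponding algorithms and a proof of correctness. Compared with m-MSS-m communication, m-m communication helps reduce channel contention and message delivery latency. Simulation results show that the proposed protocol incurs lower failure-free overhead and a shorter recovery time than traditional protocols.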

    Abstract:

    Mobile computing brings new challenges and requirements for checkpointing and recovery protocols. Existing checkpointing-only schemes, which rely on creating globally consistent checkpoints, cannot guarantee independent recovery. Message logging schemes based on mobile-MSS-mobile communication relay every message exchanged among mobile hosts through a mobile support station (MSS), which may incur heavy contention on the wireless network and high message transmission latency relative to direct mobile host to mobile host (m-m) communication. This paper presents a novel recovery protocol for m-m communication in which two key problems, message ordering and duplicate messages, are effectively solved. A proof of the protocol's correctness is also given. Finally, simulation results indicate that the proposed approach outperforms traditional approaches in terms of failure-free overhead and recovery overhead.
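
    The abstract names message ordering and duplicate messages as the two key problems of direct m-m communication but does not describe how the protocol solves them. The sketch below is not the paper's algorithm. It only illustrates one common way a receiving mobile host could handle both problems, assuming each sender stamps its messages with a per-sender sequence number. The names `Message`, `MobileHost`, `seq`, and `receive` are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Message:
    sender: str
    seq: int       # assumed per-sender sequence number, starting at 0
    payload: str


@dataclass
class MobileHost:
    name: str
    # sender -> next sequence number this host may deliver
    next_expected: dict = field(default_factory=dict)
    # (sender, seq) -> message that arrived early and is held back
    pending: dict = field(default_factory=dict)
    # messages in the order they were delivered (the receiver-side log)
    log: list = field(default_factory=list)

    def receive(self, msg: Message) -> list:
        """Return the payloads delivered as a result of this arrival."""
        expected = self.next_expected.get(msg.sender, 0)
        # Duplicate: already delivered, or already waiting in the buffer.
        if msg.seq < expected or (msg.sender, msg.seq) in self.pending:
            return []
        self.pending[(msg.sender, msg.seq)] = msg
        delivered = []
        # Deliver every consecutive message that is now available.
        while (msg.sender, expected) in self.pending:
            ready = self.pending.pop((msg.sender, expected))
            self.log.append(ready)
            delivered.append(ready.payload)
            expected += 1
        self.next_expected[msg.sender] = expected
        return delivered
```

    In this sketch a message that arrives ahead of its predecessor is buffered, and a message retransmitted after a failure is discarded if it was already delivered, so the log keeps a single per-sender delivery order that could be replayed during recovery.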

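Cite this article

Li QH, Jiang TY, Zhang HJ. A low-cost transparent checkpoint-recovery protocol for mobile computing. Journal of Software (软件学报), 2005,16(1):135-144 (in Chinese).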

History
  • Received: 2002-12-16
  • Last revised: 2003-11-10