A QoS-Based Adaptive Failure Detector for Grids
Authors:
Funding:

Supported by the Defense Pre-Research Project of the "Tenth Five-Year Plan" of China under Grant No. 41312.1.2, and by the Defense Pre-Research Foundation of China under Grant No. 514160401HT0151.

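    Abstract:

    Failure detectors are one of the fundamental components needed to build a reliable grid computing environment. A grid hosts many distributed applications with differing QoS requirements for failure detection. To remain effective and scalable, a grid failure detector should therefore deliver exactly the failure-detection QoS each application requires, while avoiding the redundant load that would result from running a separate failure detector for every distinct QoS requirement. Based on the basic QoS evaluation metrics, and using a PULL-mode active detection strategy, a new failure detector, GA-FD (adaptive failure detector for grid), is implemented. GA-FD can simultaneously support the quantitatively specified QoS requirements of multiple applications, and it requires no assumptions about message behavior or clock synchronization. It is also proved that GA-FD can implement a failure detector of class ◇P in the partially synchronous model, and the corresponding experiments and data are presented.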
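    The abstract does not give GA-FD's algorithm, so the following is only a minimal Python sketch of the general idea it describes: one PULL-mode (query/reply) probe stream shared by several applications, each with its own quantitative bound on detection time. Everything named here is an assumption made for illustration, not the authors' design: the class `PullFailureDetector`, its parameters, and the timeout estimate (mean round-trip time plus a multiple of its standard deviation, a common adaptive heuristic). All timestamps come from the monitor's local clock, so no clock synchronization is needed.

```python
from dataclasses import dataclass, field
from statistics import mean, pstdev
from typing import Dict, List


@dataclass
class PullFailureDetector:
    """Illustrative PULL-mode detector; not the GA-FD algorithm itself."""
    safety_factor: float = 2.0    # assumed margin on RTT variability
    window: int = 100             # number of recent RTT samples kept
    initial_timeout: float = 1.0  # timeout used before any reply is seen
    min_interval: float = 0.05    # floor on the query interval (seconds)
    last_reply: float = 0.0       # local time of the most recent reply
    rtts: List[float] = field(default_factory=list)
    bounds: Dict[str, float] = field(default_factory=dict)

    def register(self, app: str, max_detection_time: float) -> None:
        """Record an application's upper bound on detection time."""
        if max_detection_time <= 0:
            raise ValueError("max_detection_time must be positive")
        self.bounds[app] = max_detection_time

    def record_reply(self, sent_at: float, received_at: float) -> None:
        """Store the round-trip time of one answered query (local clock)."""
        self.rtts.append(received_at - sent_at)
        del self.rtts[:-self.window]  # keep only the newest samples
        self.last_reply = received_at

    def timeout(self) -> float:
        """Adaptive reply timeout: mean RTT plus a margin on its spread."""
        if not self.rtts:
            return self.initial_timeout
        return mean(self.rtts) + self.safety_factor * pstdev(self.rtts)

    def polling_interval(self) -> float:
        """Query interval chosen so the strictest bound can still be met.

        A live target must produce a reply at least once per
        interval + timeout, so that sum is kept within the strictest bound.
        """
        if not self.bounds:
            raise ValueError("no application has registered a requirement")
        strictest = min(self.bounds.values())
        return max(strictest - self.timeout(), self.min_interval)

    def suspects(self, app: str, now: float) -> bool:
        """Suspect the target once silence exceeds this application's bound."""
        return now - self.last_reply > self.bounds[app]
```

    In this sketch a single probe stream serves every registered application: the strictest bound sets the query rate, while each application applies its own bound when deciding whether to suspect the target, so an application with a looser bound tolerates longer silences and sees fewer false suspicions.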


Cite this article

董剑, 左德承, 刘宏伟, 杨孝宗. A QoS-Based Adaptive Failure Detector for Grids. Journal of Software (软件学报), 2006, 17(11): 2362-2372

History
  • Received: 2006-06-06
  • Last revised: 2006-08-07