Fault-Torlerance Method for CPU-GPU Heterogeneous System
Author:
Affiliation:

Clc Number:

Fund Project:

  • Article
  • |
  • Figures
  • |
  • Metrics
  • |
  • Reference
  • |
  • Related
  • |
  • Cited by
  • |
  • Materials
  • |
  • Comments
    Abstract:

    In recent years, heterogeneous parallel architecture has become an important development trend of supercomputer because it mitigates the problem of increasingly high power consumption. As a high performance and power efficiency accelerator, GPU (graphics processing unit) has been extensively used in HPC (high performance computing) area. However, the inherent unreliability of the GPU hardware deteriorates the reliability of supercomputer. Presently, most research of FT (fault-tolerance) techniques for CPU-GPU heterogeneous system isolates the GPU from the system, and does FT work for it at the granularity of a single GPU invocation. This paper proposes a new Lazy FT method for CPU-GPU heterogeneous system, introduces a FT framework and its constraints based on directives, and demonstrates the validity of the Lazy FT method. The experimental results show that, compared with existing FT methods, the cost of LazyFT is very cheap.

    Reference
    Related
    Cited by
Get Citation

徐新海,杨学军,林宇斐,林一松,唐滔.一种面向CPU-GPU 异构系统的容错方法.软件学报,2011,22(10):2538-2552

Copy
Share
Article Metrics
  • Abstract:
  • PDF:
  • HTML:
  • Cited by:
History
  • Received:April 28,2010
  • Revised:May 18,2011
  • Adopted:
  • Online:
  • Published:
You are the firstVisitors
Copyright: Institute of Software, Chinese Academy of Sciences Beijing ICP No. 05046678-4
Address:4# South Fourth Street, Zhong Guan Cun, Beijing 100190,Postal Code:100190
Phone:010-62562563 Fax:010-62562533 Email:jos@iscas.ac.cn
Technical Support:Beijing Qinyun Technology Development Co., Ltd.

Beijing Public Network Security No. 11040202500063