Efficient Partial Redundancy Fault Tolerance Compilation: Replicating Critical Subgraph of Error Flow

    Abstract:

    Traditional fault-tolerant compilation replicates all computations and registers to guarantee fault tolerance, but this incurs substantial overhead in both storage utilization and performance. This paper proposes the concept of the critical subgraph of an error flow graph, based on error flow analysis. Methods are given for generating critical subgraphs from critical nodes or from critical paths, and a partial redundancy algorithm is proposed that replicates only the critical subgraph. The partial redundancy algorithm preserves effective fault tolerance while improving performance and reducing power dissipation and storage usage. Experimental results show that, compared with full redundancy, which replicates the entire error flow graph, partial redundancy reduces register usage by 6.25%, reduces power dissipation by over 17%, reduces total execution cycles by nearly 26%, and improves performance by over 22%, at the cost of 6.25% lower node coverage.
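
The abstract does not spell out how a critical subgraph is built, so the following is only a minimal sketch under stated assumptions, not the authors' algorithm. It assumes the error flow graph is a directed graph given as (source, destination) edges, and that the critical subgraph generated from a set of critical nodes consists of those nodes plus every node from which an error could propagate to them. The names `critical_subgraph` and `node_coverage` and the sample graph are hypothetical.

```python
from collections import deque


def critical_subgraph(edges, critical_nodes):
    """Return the critical nodes plus every node that can reach one of them.

    Assumption: an error at node u can affect node v whenever there is a
    directed path from u to v in the error flow graph.
    """
    # Build a predecessor map so the graph can be walked backwards.
    preds = {}
    for src, dst in edges:
        preds.setdefault(dst, set()).add(src)

    selected = set(critical_nodes)
    queue = deque(critical_nodes)
    while queue:
        node = queue.popleft()
        for pred in preds.get(node, ()):
            if pred not in selected:
                selected.add(pred)
                queue.append(pred)
    return selected


def node_coverage(edges, replicated):
    """Fraction of graph nodes that fall inside the replicated set."""
    nodes = {n for edge in edges for n in edge}
    return len(replicated & nodes) / len(nodes)


# Hypothetical five-node error flow graph; only "d" is marked critical.
edges = [("a", "b"), ("b", "d"), ("c", "d"), ("c", "e")]
replicated = critical_subgraph(edges, ["d"])
coverage = node_coverage(edges, replicated)
```

In this toy graph, node "e" cannot influence the critical node and is left unreplicated, which is the kind of trade-off the abstract describes: fewer replicated nodes in exchange for lower node coverage.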

Get Citation

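Gao L, Wang ZY, Yang XJ. Efficient partial redundancy fault tolerance compilation: Replicating critical subgraph of error flow. Journal of Software, 2007, 18(9):2105-2116 (in Chinese with English abstract).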

History
  • Received: July 20, 2006
  • Revised: July 20, 2006