Efficient Fault Tolerant Compilation: Compress Error Flow to Reduce Power and Enhance Performance
    Abstract:

    Many reliability-critical applications require computers to deliver high performance, low power dissipation, and fault tolerance at the same time. Traditional software fault tolerance relies on a large number of branch instructions to detect errors, which incurs substantial overhead in both performance and power dissipation. This paper proposes an error flow model and uses it to explain the error flow compressing (EFC) algorithm. EFC greatly reduces the number of branch instructions while leaving the total instruction count unchanged. Simulations on Wattch using the FFT benchmark from the StreamIT project show that, compared with the traditional EDDI error detection algorithm and with loop parameter n=225, EFC reduces total branch instructions by more than 24%, improves IPC by more than 12%, and reduces power dissipation by nearly 5%. Further analysis shows that the reduction in branch instructions can exceed 43% when the innermost iteration contains 8 store instructions.
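The contrast drawn in the abstract, between checking for errors with a branch at every store and folding those checks together, can be illustrated with a minimal sketch. This is not the paper's algorithm: the function names, the XOR/OR accumulation, and the branch counter are assumptions made for illustration, and the counts below describe only this toy, not the reported 24% or 43% figures.

```python
def store_with_per_store_checks(master, shadow):
    """EDDI-style sketch: compare master and shadow copies before every store."""
    check_branches = 0
    stored = []
    for m, s in zip(master, shadow):
        check_branches += 1           # one compare-and-branch per store
        if m != s:
            return None, check_branches   # error detected, stop before storing
        stored.append(m)
    return stored, check_branches


def store_with_compressed_checks(master, shadow):
    """Hypothetical compressed variant: accumulate mismatches, branch once."""
    error_flag = 0
    stored = []
    for m, s in zip(master, shadow):
        error_flag |= m ^ s           # branch-free: any differing bit sets the flag
        stored.append(m)              # note: the store happens before the check
    check_branches = 1                # a single compare-and-branch at the end
    if error_flag != 0:
        return None, check_branches
    return stored, check_branches


if __name__ == "__main__":
    master = list(range(1, 9))        # 8 values to store, as in the 8-store case
    shadow = list(master)             # duplicated computation, fault-free
    print(store_with_per_store_checks(master, shadow)[1])
    print(store_with_compressed_checks(master, shadow)[1])
```

In this sketch each comparison branch is replaced by one XOR/OR accumulation step, which mirrors the abstract's statement that branches drop while the instruction count stays the same. The cost, in this toy at least, is that a fault is reported only after the stores have been issued.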

Get Citation

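Gao Long, Yang Xuejun. Efficient fault tolerant compilation: Compress error flow to reduce power and enhance performance. Journal of Software, 2006, 17(12): 2425-2437 (in Chinese).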

History
  • Received: October 08, 2005
  • Revised: February 23, 2006