Abstract:Application-Level checkpointing is a widely concerned technique used in large-scale scientific computing fields, and programmers to choose the appropriate place to save crucial data: henceforth, the fault-tolerant overhead can be reduced. There are two key issues in adopting this technique: find the proper place and reduce the scale of global checkpoints saving datum. The same problem is encountered when emerging heterogeneous systems with general purpose computation on GPUs. Towards architecture of heterogeneous system and characterization of application, this paper performs static analysis for the checkpointing configurations and placements, and two novelty approaches are proposed: ‘synchronous checkpoint placement’ and the ‘asynchronous checkpoint placement’. The placement problem of checkpoints can be mathematically modeled and solved. Finally, their performances are evaluated via conducting experiments.