Abstract: The extended graph-oriented distributed programming model (ExGOM) provides a system architecture that supports dynamic configuration. Dynamic configuration covers expanding and shrinking the system during execution, upgrading it while it runs, and reconfiguring it after a fault occurs. One problem in reconfiguration is how to recover the system to the consistent state that existed just before the fault occurred. This paper focuses on this problem and proposes an asynchronous rollback algorithm and a crash recovery mechanism based on fault-sensitive graphs. It also addresses the case of multiple faulty processes on a single host with a transient fault. Compared with other asynchronous rollback and recovery algorithms, the algorithm presented in this paper localizes the region affected by faults: only fault-sensitive nodes are rolled back, which minimizes system overhead.
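
The idea of rolling back only fault-sensitive nodes can be illustrated with a minimal Python sketch. It is not the algorithm of this paper: it assumes, purely for illustration, that a node is fault-sensitive if it is faulty itself or has received, directly or transitively, a message from a faulty node since that node's last checkpoint. The names `fault_sensitive_nodes`, `rollback`, `sent_to`, `checkpoints`, and `state` are hypothetical.

```python
from collections import deque


def fault_sensitive_nodes(sent_to, faulty):
    """Collect the nodes affected by a fault (assumed definition).

    sent_to maps a node to the nodes it has sent messages to since its
    last checkpoint. A node counts as fault-sensitive here if it is
    faulty or reachable from a faulty node along these edges.
    """
    sensitive = set(faulty)
    queue = deque(faulty)
    while queue:
        node = queue.popleft()
        for receiver in sent_to.get(node, ()):
            if receiver not in sensitive:
                sensitive.add(receiver)
                queue.append(receiver)
    return sensitive


def rollback(checkpoints, state, sensitive):
    """Restore only the fault-sensitive nodes to their checkpoints.

    All other nodes keep their current state untouched.
    """
    return {
        node: checkpoints[node] if node in sensitive else current
        for node, current in state.items()
    }


# Hypothetical system: p1 -> p2 -> p3 exchanged messages; p4 -> p5 did too.
sent_to = {"p1": ["p2"], "p2": ["p3"], "p4": ["p5"]}
checkpoints = {"p1": 10, "p2": 20, "p3": 30, "p4": 40, "p5": 50}
state = {"p1": 11, "p2": 22, "p3": 33, "p4": 44, "p5": 55}

sensitive = fault_sensitive_nodes(sent_to, ["p1"])
recovered = rollback(checkpoints, state, sensitive)
```

With `p1` faulty, only `p1`, `p2`, and `p3` return to their checkpoints in this sketch, while `p4` and `p5` keep running from their current state. That is the sense in which the fault region is localized.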