Abstract: Cooperative checkpointing with rollback recovery is an effective fault-tolerance technique that is widely used in parallel and distributed computer systems, such as computer clusters. To reduce time and space overhead, this paper presents a cooperative checkpointing algorithm based on message counting. The algorithm reduces the message complexity of synchronization from O(n^2) to O(n), which improves the system's efficiency and scalability, and it is also suitable for non-FIFO message-passing systems.