Abstract: As graph data grows in scale and graph processing grows in complexity, the shift to distributed graph processing is inevitable. However, graph processing jobs suffer from severe reliability problems caused by uncertainty originating both inside and outside the distributed graph processing system. This study first analyzes the uncertainty factors of distributed graph processing frameworks and the robustness of different types of graph processing jobs, and then proposes an evaluation framework for fault tolerance in distributed graph processing based on the cost, efficiency, and quality of fault tolerance. Drawing on related research, it also analyzes, evaluates, and compares four fault-tolerance mechanisms for distributed graph processing: checkpointing-based, logging-based, replication-based, and algorithm-compensation-based fault tolerance. Finally, directions for future research are discussed.