Survey of State-of-the-art Fault Tolerance for Distributed Graph Processing Jobs
Author:
Affiliation:

Fund Project:

Key-area Research and Development Program of Guangdong Province, China (2020B010164003)

  • Article
  • | |
  • Metrics
  • |
  • Reference [76]
  • |
  • Related [20]
  • | | |
  • Comments
    Abstract:

    As the growth of graph data scale and complexity of graph processing, the trend of distributed graph processing shall be inevitable. However, graph processing jobs run with severe reliability problems caused by the uncertainty originated from inside and outside the distributed graph processing system. This study first analyzes the uncertainty factors of the distributed graph processing frameworks and the robustness of different types of graph processing jobs; then proposes an evaluation framework of fault tolerance for distributed graph processing based on cost, efficiency, and quality of fault tolerance. This study also analyzes, evaluates, and compares the four fault-tolerant mechanisms of distributed graph processing-checkpointing based fault tolerance, logging based fault tolerance, replication based fault tolerance, and algorithm compensation based fault tolerance-combining related researches. Finally, the direction of future researches is prospected.

    Reference
    [1] Sahu S, Mhedhbi A, Salihoglu S, Lin J, Özsu MT. The ubiquity of large graphs and surprising challenges of graph processing. Proc. of the VLDB Endowment, 2017,11(4):420-431.[doi:10.1145/3164135.3164139]
    [2] Sahu S, Mhedhbi A, Salihoglu S, Lin J, Özsu MT. The ubiquity of large graphs and surprising challenges of graph processing:extended survey. The VLDB Journal, 2019,29(2-3):595-618.[doi:10.1007/s00778-019-00548-x]
    [3] Salihoglu S, Ozsu MT. Response to "scale up or scale out for graph processing". IEEE Internet Computing, 2018,22(5):18-24.
    [4] Grzegorz MG, Austern MH, Bik AJC, Dehnert JC, Horn I, Leiser N, Czajkowski G. Pregel:A system for large-scale graph processing. In:Proc. of the 2010 ACM SIGMOD Int'l Conf. on Management of Data. New York:Association for Computing Machinery, 2010. 135-146.[doi:10.1145/1807167.1807184]
    [5] Low Y, Gonzalez J, Kyrola A, Bickson D, Guestrin C, Hellerstein JM. GraphLab:A new framework for parallel machine learning. In:Proc. of the 26th Conf. on Uncertainty in Artificial Intelligence. Catalina Island, 2010. 340-349.
    [6] Low Y, Bickson D, Gonzalez J, Guestrin C, Kyrola A, Hellerstein JM. Distributed GraphLab:A framework for machine learning and data mining in the cloud. Proc. of the VLDB Endowment, 2012,5(8):716-727.[doi:10.14778/2212351.2212354]
    [7] Gonzalez JE, Low Y, Gu H, Bickson D, Guestrin C. PowerGraph:Distributed graph-parallel computation on natural graphs. In:Proc. of the 10th USENIX Conf. on Operating Systems Design and Implementation. USENIX Association, 2012. 17-30.
    [8] Giraph. http://giraph.apache.org
    [9] Gonzalez JE, Xin RS, Dave A, Crankshaw D, Franklin MJ, Stoica I. GraphX:Graph processing in a distributed dataflow framework. In:Proc. of the 11th USENIX Conf. on Operating Systems Design and Implementation. USENIX Association, 2014. 599-613.
    [10] Gelly. http://flink.iteblog.com/dev/libs/gelly
    [11] Egwutuoha IP, Levy D, Selic B, Chen S. A survey of fault tolerance mechanisms and checkpoint/restart implementations for high performance computing systems. The Journal of Supercomputing, 2013,65(3):1302-1326.[doi:10.1007/s11227-013-0884-0]
    [12] Novaković D, Vasić N, Novaković S, Kostić D, Bianchini R. DeepDive:Transparently identifying and managing performance interference in virtualized environments. In:Proc. of the 2013 USENIX Conf. on Annual Technical Conf. USENIX Association, 2013. 219-230.
    [13] Wang KJ, Jia T, Li Y. State-of-the-art survey of scheduling and resource management technology for colocation jobs. Ruan Jian Xue Bao/Journal of Software, 2020,31(10):3100-3119(in Chinese with English abstract). http://www.jos.org.cn/1000-9825/6066.htm[doi:10.13328/j.cnki.jos.006066]
    [14] Fried J, Ruan Z, Ousterhout A, Belay A. Caladan:Mitigating interference at microsecond timescale. In:Proc. of the 14th USENIX Symp. on Operating Systems Design and Implementation. USENIX Association, 2020. 281-297.
    [15] Bartlett J, Gray J, Horst B. Fault tolerance in Tandem computer systems. In:The Evolution of Fault-tolerant Computing. 1987, 55-76.
    [16] Bartlett W, Spainhower L. Commercial fault tolerance:A tale of two systems. IEEE Trans. on Dependable and Secure Computing, 2004,1(1):87-96.[doi:10.1109/TDSC.2004.4]
    [17] Borg A, Blau W, Graetsch W, Herrmann F, Oberle W. Fault tolerance under UNIX. ACM Trans. on Computer Systems, 1989, 7(1):1-24.[doi:10.1145/58564.58565]
    [18] Zhong H, Nieh J. CRAK:Linux checkpoint/restart as a kernel module. 2001. http://systems.cs.columbia.edu/files/wpid-cucs-014-01.pdf
    [19] Agarwal S, Garg R, Gupta MS, Moreira JE. Adaptive incremental checkpointing for massively parallel systems. In:Proc. of the 18th Annual Int'l Conf. on Supercomputing. New York:Association for Computing Machinery, 2004. 277-286.[doi:10.1145/1006209.1006248]
    [20] Hargrove PH, Duell JC. Berkeley laboratory checkpoint/restart (BLCR) for Linux clusters. Journal of Physics (Conf. Series), 2006, 46(1):494-9.[doi:10.1088/1742-6596/46/1/067]
    [21] Plank JS, Kai L. ICKP:A consistent checkpointer for multicomputers. IEEE Parallel & Distributed Technology:Systems & Applications, 1994,2(2):62-67.[doi:10.1109/88.311574]
    [22] Plank JS, Kai L, Puening MA. Diskless checkpointing. IEEE Trans. on Parallel and Distributed Systems, 1998,9(10):972-986.[doi:10.1109/71.730527]
    [23] Sankaran S, Squyres JM, Barrett B, Sahay V, Lumsdaine A, Duell J, Hargrove P, Roman E. The LAM/MPI checkpoint/restart framework:System-initiated checkpointing. The Int'l Journal of High Performance Computing Applications, 2005,19(4):479-493.[doi:10.1177/1094342005056139]
    [24] Zheng G, Ni X, Kalé LV. A scalable double in-memory checkpoint and restart scheme towards exascale. In:Proc. of the IEEE/IFIP Int'l Conf. on Dependable Systems and Networks Workshops (DSN 2012). Boston, 2012. 1-6.[doi:10.1109/DSNW.2012. 6264677]
    [25] Heidari S, Simmhan Y, Calheiros RN, Buyya R. Scalable graph processing frameworks:A taxonomy and open challenges. ACM Computing Surveys (CSUR), 2018,51(3):1-53.
    [26] Mccune RR, Weninger T, Madey G. Thinking like a Vertex:A survey of vertex-centric frameworks for large-scale distributed graph processing. ACM Computing Surveys (CSUR), 2015,48(2):1-39.
    [27] Dean J, Ghemawat S. MapReduce:Simplified data processing on large clusters. Communications of the ACM, 2008,51(1):107-113.
    [28] Zaharia M, Chowdhury M, Das T, Dave A, Ma J, McCauley M, Franklin MJ, Shenker S, Stoica I. Resilient distributed datasets:A fault-tolerant abstraction for in-memory cluster computing. In:Proc. of the Presented as Part of the 9th USENIX Symp. on Networked Systems Design and Implementation (NSDI 12). 2012. 15-28.
    [29] Stutz P, Bernstein A, Cohen W. Signal/collect:Graph algorithms for the (semantic) Web. In:Patel-Schneider PF, et al. eds. Proc. of the Int'l Semantic Web Conf. (ISWC). Berlin:Springer-Verlag, 2010. 764-780.
    [30] Bronevetsky G, Marques D, Pingali K, Stodghill P. Automated application-level checkpointing of MPI programs. In:Proc. of the 9th ACM SIGPLAN Symp. on Principles and Practice of Parallel Programming. New York:Association for Computing Machinery, 2003. 84-94.[doi:10.1145/781498.781513]
    [31] Beguelin A, Seligman E, Stephan P. Application level fault tolerance in heterogeneous networks of workstations. Journal of Parallel and Distributed Computing, 1997,43(2):147-155.
    [32] Dathathri R, Gill G, Hoang L, Pingali K. Phoenix:A substrate for resilient distributed graph analytics. In:Proc. of the 24th Int'l Conf. on Architectural Support for Programming Languages and Operating Systems. New York:Association for Computing Machinery, 2019. 615-630.[doi:10.1145/3297858.3304056]
    [33] Hoang L, Pontecorvi M, Dathathri R, Gill G, You B, Pingali K, Ramachandran V. A round-efficient distributed betweenness centrality algorithm. In:Proc. of the 24th Symp. on Principles and Practice of Parallel Programming (PPoPP 2019). New York:Association for Computing Machinery, 2019. 272-286.
    [34] Iyer AP, Liu Z, Jin X, Venkataraman S, Braverman V, Stoica I. ASAP:Fast, approximate graph pattern mining at scale. In:Proc. of the 13th USENIX Conf. on Operating Systems Design and Implementation. Carlsbad:USENIX Association, 2018. 745-761.
    [35] Zhang Y, Gao Q, Gao L, Wang C. Maiter:An asynchronous graph processing framework for delta-based accumulative iterative computation. IEEE Trans. on Parallel and Distributed Systems, 2014,25(8):2091-2100.[doi:10.1109/TPDS.2013.235]
    [36] Wang Z, Gao L, Gu Y, Bao Y, Yu G. A fault-tolerant framework for asynchronous iterative computations in cloud environments. In:Proc. of the 7th ACM Symp. on Cloud Computing. New York:Association for Computing Machinery, 2016. 71-83.
    [37] Wang Z, Gao L, Gu Y, Bao Y, Yu G. A fault-tolerant framework for asynchronous iterative computations in cloud environments. IEEE Trans. on Parallel and Distributed Systems, 2018,29(8):1678-1692.
    [38] Avizienis A, Laprie JC, Randell B, Landwehr C. Basic concepts and taxonomy of dependable and secure computing. IEEE Trans. on Dependable and Secure Computing, 2004,1(1):11-33.
    [39] Poola D, Salehi MA, Ramamohanarao K, Buyya R. Chapter 15-A Taxonomy and Survey of Fault-tolerant Workflow Management Systems in Cloud and Distributed Computing Environments. Elsevier Inc., 2017. 285-320.
    [40] Salfner F, Lenk M, Malek M. A survey of online failure prediction methods. ACM Computing Surveys, 2010,42(3):1-42.
    [41] Wang Z, Gu Y, Bao Y, Yu G, Gao L. An I/O-efficient and adaptive fault-tolerant framework for distributed graph computations. Distributed and Parallel Databases, 2017,35(2):177-196.
    [42] Jhawar R, Piuri V, Santambrogio M. A comprehensive conceptual system-level approach to fault tolerance in cloud computing. In:Proc. of the 2012 IEEE Int'l Systems Conf. (SysCon 2012). Vancouver, 2012. 1-5.
    [43] Isard M, Budiu M, Yu Y, Birrell A, Fetterly D. Dryad:Distributed data-parallel programs from sequential building blocks. In:Proc. of the 2nd ACM SIGOPS/EuroSys European Conf. on Computer Systems. New York:Association for Computing Machinery, 2007. 59-72.
    [44] Power R, Li J. Piccolo:Building fast, distributed programs with partitioned tables. In:Proc. of the 9th USENIX Conf. on Operating Systems Design and Implementation. 2010. 293-306.
    [45] Carbone P, Fóra G, Ewen S, Haridi S, Tzoumas K. Lightweight asynchronous snapshots for distributed dataflows. arXiv Preprint arXiv:1506.08603, 2015.
    [46] Garg R, Kumar P. A review of checkpointing fault tolerance techniques in distributed mobile systems. Int'l Journal on Computer Science and Engineering, 2010,2(4):1052-1063.
    [47] Bi YH, Jiang SY, Wang ZG, Leng FL, Bao YB, Yu G, Qian L. A multi-level fault tolerance mechanism for disk-resident Pregel-like systems. Journal of Computer Research and Development, 2016,53(11):2530-2541
    [48] Yan D, Cheng J, Chen H, Long C, Bangalore P. Lightweight fault tolerance in Pregel-like systems. In:Proc. of the 48th Int'l Conf. on Parallel Processing. New York:Association for Computing Machinery, 2019. 1-10
    [49] Xue J, Yang Z, Qu Z, Hou S, Dai Y. Seraph:An efficient, low-cost system for concurrent graph processing. In:Proc. of the 23rd Int'l Symp. on High-performance Parallel and Distributed Computing. 2014. 227-238.
    [50] Xu C, Holzemer M, Kaul M, Soto J, Markl V. On fault tolerance for distributed iterative dataflow processing. IEEE Trans. on Knowledge and Data Engineering, 2017,29(8):1709-1722.
    [51] Xu C, Holzemer M, Kaul M, Markl V. Efficient fault-tolerance for iterative graph processing on distributed dataflow systems. In:Proc. of the 32nd IEEE Int'l Conf. on Data Engineering (ICDE). 2016. 613-624.
    [52] Vora K, Tian C, Gupta R, Hu Z. CoRAL:Confined recovery in distributed asynchronous graph processing. In:Proc. of the 32nd Int'l Conf. on Architectural Support for Programming Languages and Operating Systems. 2017. 223-236.
    [53] Elnozahy EN, Alvisi L, Wang YM, Johnson D. A survey of rollback-recovery protocols in message-passing systems. ACM Computing Surveys, 2002,34(3):375-408.
    [54] Lu W, Shen Y, Wang T, Zhang M, Jagadish HV, Du X. Fast failure recovery in vertex-centric distributed graph processing systems. IEEE Trans. on Knowledge and Data Engineering, 2019,31(4):733-746.
    [55] Shen Y, Chen G, Jagadish HV, Lu W, Ooi BC, Tudor BM. Fast failure recovery in distributed graph processing systems. Proc. of the VLDB Endowment, 2014,8(4):437-448.
    [56] Kaur J, Kinger S. Analysis of different techniques used for fault tolerance. Int'l Journal of Computer Science and Information Technologies, 2014,5(3):4086-4090.
    [57] Pundir M, Leslie LM, Gupta I, Campbell RH. Zorro:Zero-cost reactive failure recovery in distributed graph processing. In:Proc. of the 6th ACM Symp. on Cloud Computing. New York:Association for Computing Machinery, 2015. 195-208.
    [58] Wang P, Zhang K, Chen R, Chen H, Guan H. Replication-based fault-tolerance for large-scale graph processing. In:Proc. of the 44th Annual IEEE/IFIP Int'l Conf. on Dependable Systems and Networks. 2014. 562-573.[doi:10.1109/DSN.2014.58]
    [59] Chen R, Yao Y, Wang P, Zhang K, Guan H, Zang B, Chen H. Replication-based fault-tolerance for large-scale graph processing. IEEE Trans. on Parallel and Distributed Systems, 2018,29(7):1621-1635.
    [60] Presser D, Lung LC, Correia M. Greft:Arbitrary fault-tolerant distributed graph processing. In:Proc. of the 2015 IEEE Int'l Congress on Big Data. New York, 2015. 452-459.
    [61] Schelter S, Ewen S, Tzoumas K, Markl V. "All roads lead to Rome":Optimistic recovery for distributed iterative data processing. In:Proc. of the 22nd ACM Int'l Conf. on Information & Knowledge Management. 2013. 1919-1928.
    [62] Marcotte P, Gregoire F, Petrillo F. Multiple fault-tolerance mechanisms in cloud systems:A systematic review. In:Proc. of the 2019 IEEE Int'l Symp. on Software Reliability Engineering Workshops (ISSREW). Berlin, 2019. 414-421.
    [63] Gan Y, Zhang Y, Hu K, Cheng D, He Y, Pancholi M, Delimitrou C. SEER:Leveraging big data to navigate the complexity of performance debugging in cloud microservices. In:Proc. of the 24th Int'l Conf. on Architectural Support for Programming Languages and Operating Systems. 2019. 19-33.
    [64] Mariappan M, Vora K. Graphbolt:Dependency-driven synchronous processing of streaming graphs. In:Proc. of the 14th EuroSys Conf. 2019. 1-16.
    [65] Sheng F, Cao Q, Cai H, Yao J, Xie C. GraPU:Accelerate streaming graph analysis through preprocessing buffered updates. In:Proc. of the ACM Symp. on Cloud Computing. 2018. 301-312.
    [66] Feng G, Ma Z, Li D, Zhu X, Cai Y, Han W, Chen W. RisGraph:A real-time streaming system for evolving graphs. arXiv Preprint arXiv:2004.00803, 2020.
    [67] Mcsherry F, Murray DG, Isaacs R, Isard M. Differential dataflow. In:Proc. of the 6th Biennial Conf. on Innovative Data Systems Research (CIDR 2013). 2013.
    [68] Ammar K, McSherry F, Salihoglu S, Joglekar M. Distributed evaluation of subgraph queries using worstcase optimal lowmemory dataflows. arXiv Preprint arXiv:1802.03760, 2018.
    [69] Peng N, Poon H, Quirk C, Toutanova K, Yih W. Cross-sentence n-ary relation extraction with graph LSTMS. Trans. of the Association for Computational Linguistics, 2017,5:101-115.
    [70] Veličković P, Cucurull G, Casanova A, Romero A, Liò P, Bengio Y. Graph attention networks. arXiv Preprint arXiv:1710.10903, 2017.
    [71] Bresson X, Laurent T. Residual gated graph ConvNets. arXiv Preprint arXiv:1711.07553, 2017.
    [72] Ma L, Yang Z, Miao Y, Xue J, Wu M, Zhou L, Dai Y. Neugraph:Parallel deep neural network computation on large graphs. In:Proc. of the 2019 USENIX Annual Technical Conf. (USENIX ATC 19). 2019. 443-458.
    [73] Zhu R, Zhao K, Yang H, Lin W, Zhou C, Ai B, Li Y, Zhou J. AliGraph:A comprehensive graph neural network platform. Proc. of the VLDB Endowment, 2019,12(12):2094-2105.
    附中文参考文献:
    [13] 王康瑾,贾统,李影.在离线混部作业调度与资源管理技术研究综述.软件学报,2020,31(10):3100-3119. http://www.jos.org.cn/1000-9825/6066.htm[doi:10.13328/j.cnki.jos.006066]
    [47] 毕亚辉,姜苏洋,王志刚,冷芳玲,鲍玉斌,于戈,钱岭.面向磁盘驻留的类Pregel系统的多级容错处理机制.计算机研究与发展,2016,53(11):2530-2541.
    Cited by
    Comments
    Comments
    分享到微博
    Submit
Get Citation

张程博,李影,贾统.面向分布式图计算作业的容错技术研究综述.软件学报,2021,32(7):2078-2102

Copy
Share
Article Metrics
  • Abstract:2589
  • PDF: 6408
  • HTML: 4495
  • Cited by: 0
History
  • Received:September 15,2020
  • Revised:October 26,2020
  • Online: January 22,2021
  • Published: July 06,2021
You are the first2038061Visitors
Copyright: Institute of Software, Chinese Academy of Sciences Beijing ICP No. 05046678-4
Address:4# South Fourth Street, Zhong Guan Cun, Beijing 100190,Postal Code:100190
Phone:010-62562563 Fax:010-62562533 Email:jos@iscas.ac.cn
Technical Support:Beijing Qinyun Technology Development Co., Ltd.

Beijing Public Network Security No. 11040202500063