Survey on Dynamic Testing Technologies for Distributed Systems

Authors: Chen Yuanliang, Ma Fuchen, Zhou Yuanhang, Yan Zhen, Jiang Yu, Sun Jiaguang
Corresponding author: Jiang Yu, E-mail: jiangyu198964@126.com
CLC number: TP311
Fund project: National Key Research and Development Program of China (2022YFB3104000)

    Abstract:

    Distributed systems are the pillars of the modern computing ecosystem: they make computing more powerful, reliable, and flexible, and they underpin key fields ranging from cloud computing and big data processing to the Internet of Things. However, owing to the complexity of such systems, code defects are inevitably introduced during implementation, posing a serious threat to system availability, robustness, and security. Testing distributed systems and detecting their defects is therefore critically important. Dynamic testing performs real-time analysis while a system runs in order to detect defects and evaluate its behavior and functionality; it is widely used for defect detection across a variety of system applications and has successfully uncovered many code defects. This study first proposes a four-layer defect threat model for distributed systems and, based on it, analyzes their testing requirements and main challenges, then proposes a general framework for dynamic testing of distributed systems. Typical dynamic testing tools are introduced from the perspective of detecting different types of system defects. The study then summarizes the key techniques of distributed dynamic testing, including multidimensional test input generation, system-critical state awareness, and the construction of defect judgment criteria. The coverage and defect discovery capabilities of current mainstream dynamic testing tools for distributed systems are evaluated, and preliminary experimental results show that multidimensional test input generation can effectively improve testing efficiency. Finally, emerging trends and possible future directions in dynamic testing of distributed systems are discussed.
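The general workflow the abstract summarizes — generating test inputs, injecting faults while the system runs, observing system state, and applying a defect judgment criterion — can be sketched in miniature. The toy replicated store below, including the `Replica`/`Cluster` classes and the seeded stale-recovery defect, is an illustrative invention for this sketch, not code from any tool surveyed in the paper:

```python
import random

class Replica:
    """One node of a toy replicated key-value store."""
    def __init__(self):
        self.store = {}
        self.crashed = False

    def write(self, key, value):
        if not self.crashed:          # a crashed node misses the update
            self.store[key] = value

class Cluster:
    def __init__(self, n=3):
        self.replicas = [Replica() for _ in range(n)]

    def client_write(self, key, value):
        # Deliberately naive best-effort replication.
        for r in self.replicas:
            r.write(key, value)

    def recover_all(self):
        # Seeded defect: recovery does not resynchronize state, so a
        # replica that missed writes while crashed comes back stale.
        for r in self.replicas:
            r.crashed = False

def consistent(cluster):
    """Defect judgment criterion: all replicas agree on every key."""
    stores = [r.store for r in cluster.replicas]
    return all(s == stores[0] for s in stores)

def dynamic_test(seed=0, rounds=200):
    """One testing campaign: random inputs plus crash/recover faults."""
    rng = random.Random(seed)
    cluster = Cluster()
    for i in range(rounds):
        op = rng.choice(["write", "crash", "recover"])
        if op == "write":
            cluster.client_write(rng.choice("abc"), i)   # input generation
        elif op == "crash":
            rng.choice(cluster.replicas).crashed = True  # fault injection
        else:
            cluster.recover_all()
        # State awareness: check the criterion once all nodes are live.
        if all(not r.crashed for r in cluster.replicas) and not consistent(cluster):
            return i  # defect manifested at step i
    return None
```

Because the seeded defect requires a particular interleaving (crash, then write, then recover) before the criterion can fire, the sketch also illustrates why input generation and fault injection must be combined: random client writes alone would never expose it.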

Cite this article:

Chen YL, Ma FC, Zhou YH, Yan Z, Jiang Y, Sun JG. Survey on dynamic testing technologies for distributed systems. Ruan Jian Xue Bao/Journal of Software, 2025, 36(7): 2964–3002 (in Chinese).
History
  • Received: 2024-08-21
  • Revised: 2024-10-15
  • Published online: 2024-12-10