Survey on Service Dependency Discovery Technologies for Microservice Systems

    Abstract:

    Microservice architectures are widely deployed. They can greatly improve the efficiency of software development, reduce the cost of system updates and maintenance, and enhance the extensibility of software systems. However, microservices change frequently and combine heterogeneous components, which leads to frequent faults that propagate quickly and have a wide impact. Meanwhile, complex call dependencies and logical dependencies between microservices make it difficult to locate and diagnose faults promptly and accurately, which challenges the intelligent operation and maintenance of microservice systems. Service dependency discovery identifies and infers the call dependencies or logical dependencies between services from runtime data and constructs a service dependency graph. The graph helps to detect and locate faults and diagnose their causes promptly and accurately at runtime, and it supports intelligent operation and maintenance tasks such as resource scheduling and change management. This study first analyzes the problem of service dependency discovery in microservice systems and then summarizes the state of the art from the perspective of three types of runtime data: monitoring data, system log data, and trace data. Next, it discusses the application of service dependency discovery to intelligent operation and maintenance, covering fault cause localization, resource scheduling, and change management based on the service dependency graph. Finally, the study discusses how service dependency discovery can accurately discover call dependencies and logical dependencies and how the service dependency graph can be used for change management, and it outlines future research directions.
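
    As a rough illustration of the trace-based case described in the abstract, the sketch below derives a service dependency graph from parent/child span relationships. It is a minimal example under stated assumptions, not a method taken from the surveyed work: the span tuple layout, the service names, and the helper names `build_dependency_graph` and `downstream` are all hypothetical.

```python
from collections import defaultdict

# Hypothetical span records: (trace_id, span_id, parent_span_id, service).
# Real tracing systems carry more fields; this layout is an assumption.
SPANS = [
    ("t1", "a", None, "frontend"),
    ("t1", "b", "a", "cart"),
    ("t1", "c", "b", "redis"),
    ("t2", "d", None, "frontend"),
    ("t2", "e", "d", "checkout"),
    ("t2", "f", "e", "cart"),
]


def build_dependency_graph(spans):
    """Return {caller: {callee: call_count}} from parent/child span links."""
    # Index every span by (trace_id, span_id) so parents can be looked up.
    service_of = {(t, s): svc for t, s, _, svc in spans}
    graph = defaultdict(lambda: defaultdict(int))
    for trace_id, _, parent_id, callee in spans:
        if parent_id is None:
            continue  # root span: no caller
        caller = service_of.get((trace_id, parent_id))
        if caller is None or caller == callee:
            continue  # missing parent or an intra-service span
        graph[caller][callee] += 1
    return {caller: dict(callees) for caller, callees in graph.items()}


def downstream(graph, service):
    """Return every service that `service` depends on, directly or transitively."""
    seen, stack = set(), [service]
    while stack:
        for callee in graph.get(stack.pop(), {}):
            if callee not in seen:
                seen.add(callee)
                stack.append(callee)
    return seen


GRAPH = build_dependency_graph(SPANS)
```

    A call such as `downstream(GRAPH, "checkout")` lists the services a change to `checkout` could be affected by, which is the kind of query that fault localization and change management would run against the graph.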

Get Citation

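Zhang QX, Wu YF, Yang Y, Jia T, Li Y, Wu ZH. Survey on service dependency discovery technologies for microservice systems. Ruan Jian Xue Bao/Journal of Software, 2024, 35(1): 118–135 (in Chinese with English abstract).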

History
  • Received: December 2, 2021
  • Revised: August 22, 2022
  • Online: May 31, 2023
  • Published: January 6, 2024