Transparent Request Tracing and Sampling Method for Java-based Microservice System

CLC Number: TP311

    Abstract:

    Microservices are becoming the mainstream architecture for cloud-based software systems because they support agile development and rapid deployment. However, a microservice system is structurally complex: it often has hundreds of service instances, and the call relationships between services are intricate. When an anomaly occurs, it is therefore difficult to locate its root cause. End-to-end request tracing has become a standard component of microservice systems to address this problem. However, current distributed request tracing methods are intrusive to applications and rely heavily on developers’ expertise in request tracing. In addition, they cannot start or stop tracing at runtime. These defects increase the burden on developers and restrict the adoption of distributed request tracing in practice. This study designs and implements a transparent request tracing system named Trace++, which generates tracing code automatically and injects it into the running application using dynamic code instrumentation. Trace++ has low intrusiveness to programs, is transparent to developers, and can start or stop tracing flexibly. In addition, the adaptive sampling method of Trace++ effectively reduces the cost of request tracing. Experiments on TrainTicket, a microservice system, show that Trace++ discovers the dependencies between services accurately and that, when request tracing is enabled, its performance cost is close to that of source code instrumentation. When request tracing is stopped, Trace++ incurs no performance cost. Moreover, the adaptive sampling method preserves representative trace data while reducing the volume of trace data by 89.4%.

Get Citation

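Huang ZC, Chen PF, Yu GB, Chen HY. Transparent request tracing and sampling method for Java-based microservice system. Ruan Jian Xue Bao/Journal of Software, 2023, 34(7): 3167–3187 (in Chinese with English abstract).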

History
  • Received: February 08, 2021
  • Revised: August 03, 2021
  • Online: November 30, 2022
  • Published: July 06, 2023