基于调用链控制流分析的大型微服务系统性能建模与异常定位
作者:
作者单位:

作者简介:

通讯作者:

中图分类号:

TP311

基金项目:

国家重点研发计划(2019YFB1802504, 2019YFE0105500); 国家自然科学基金(62072264)


Performance Modeling and Anomaly Location of Large Microservice Systems Based on Trace Control Flow Analysis
Author:
Affiliation:

Fund Project:

  • 摘要
  • |
  • 图/表
  • |
  • 访问统计
  • |
  • 参考文献
  • |
  • 相似文献
  • |
  • 引证文献
  • |
  • 资源附件
  • |
  • 文章评论
    摘要:

    大型微服务系统中组件众多、依赖关系复杂, 由于故障传播的涟漪效应, 一个故障可能引起大规模服务异常, 快速识别异常并定位根因是服务质量保证的关键. 目前主要采用的调用链分析方法, 常常面临调用链结构复杂、实例数量庞大、存在大量小样本等问题, 因此提出基于调用链控制流分析, 将大量调用链结构聚合为少量方法调用模型; 并提出基于方法调用模型的执行时间分解模型及预测方法, 将实际值与预测值的相对误差超过设定阈值的待检测数据判定为异常. 采用百度凤巢广告业务系统某天超过17亿条调用链日志记录开展实验分析, 结果表明: 与数据驱动的调用序列分析方法相比, 提出的基于模型的方法可以大幅缩减调用链结构数量, 并有效分析和检测微服务性能异常及其根因.

    Abstract:

    In a large microservice system, there usually exist many services with complex dependencies among them. A failure in one component may propagate widely and cause large-scale service anomalies. To ensure system quality, it is critical to effectively identify abnormalities and locate root causes. Invocation-chain analysis is a commonly used method for service performance modeling and anomaly detection. Existing techniques are mostly data-driven, facing many challenges of big data analysis such as diversified chain structures, a vast number of instances, and imbalanced datasets that many structures have only a small number of samples. In counter to the problems, the study proposes a model-based approach which builds high-level abstractions of method invocation models based on control-flow analysis. The instances of various invocation-chain structures are clustered into various method invocation models, which can greatly reduce the size of chain structures. Performance models are built for the method invocation models, and thresholds are defined based on the predicted execution time derived from the performance model. Outliers in the trace logs are thus identified as candidates of anomalies. Experiments were exercised on real industry logs from Baidu PhoenixNest Ads system. A one-day log with over 1.7 billion records was selected. The experiment results show that, compared with pure data-driven sequence analysis methods, the proposed model-based approach can greatly reduce the size of invocation-chain structures while effectively analyzing and detecting microservice performance anomalies and root causes.

    参考文献
    相似文献
    引证文献
引用本文

于庆洋,白晓颖,李明杰,李奇原,刘涛,刘泽胤,裴丹.基于调用链控制流分析的大型微服务系统性能建模与异常定位.软件学报,2022,33(5):1849-1864

复制
分享
文章指标
  • 点击次数:
  • 下载次数:
  • HTML阅读次数:
  • 引用次数:
历史
  • 收稿日期:2020-03-30
  • 最后修改日期:2020-06-11
  • 录用日期:
  • 在线发布日期: 2022-05-09
  • 出版日期: 2022-05-06
文章二维码
您是第位访问者
版权所有:中国科学院软件研究所 京ICP备05046678号-3
地址:北京市海淀区中关村南四街4号,邮政编码:100190
电话:010-62562563 传真:010-62562533 Email:jos@iscas.ac.cn
技术支持:北京勤云科技发展有限公司

京公网安备 11040202500063号