Neural-network-based Compression and Query Approach for Distributed Tracing Data

doi:10.13328/j.cnki.jos.007315


Journal of Software (软件学报), 2025, Vol. 36, No. 9, pp. 4287-4312

Authors: WANG Shang, ZHANG Chen-Xi, PENG Xin

Affiliation: School of Computer Science, Fudan University, Shanghai 200438, China; Shanghai Key Laboratory of Data Science (Fudan University), Shanghai 200438, China

CLC number: TP311


Abstract:

Distributed tracing data is an essential type of observability data and plays a crucial role in operation and maintenance tasks such as performance analysis, fault diagnosis, and system understanding. As systems grow rapidly in scale and complexity, the volume of tracing data keeps increasing and places higher demands on storage. Data compression is therefore a key way to reduce the storage cost of tracing data. Existing compression methods do not fully exploit the characteristics of tracing data to compress it efficiently, and they do not support complex queries on compressed data. This study proposes a neural-network-based approach for compressing and querying distributed tracing data. The approach uses a novel redundancy extraction technique to identify pattern redundancy and structural redundancy in tracing data, and combines neural network models with arithmetic coding to compress the data efficiently. It also supports efficient queries on the compressed data without decompressing all of it. The approach is evaluated on multiple tracing datasets of different sizes collected from four open-source microservice systems. Experimental results show that it achieves high compression ratios (105.5–197.6), on average four times those of existing general-purpose compression algorithms. The evaluation also examines query efficiency on the compressed data: in the best case, the approach is faster than existing query tools.

Key words: distributed tracing; lossless compression; querying; neural network
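
The abstract does not describe the model or the coder in detail. As a rough illustration of why pairing a predictive model with arithmetic coding suits repetitive trace fields, the sketch below computes the ideal arithmetic-coding cost (about -log2 p bits per symbol) of a stream of span operation names. Everything in it is an assumption made for illustration and not the authors' method: the operation names are hypothetical, and a simple adaptive frequency model stands in for a neural network.

```python
import math
from collections import Counter


def ideal_code_length_bits(symbols, alphabet):
    """Bits an ideal arithmetic coder would spend on `symbols` when each
    symbol is predicted by an adaptive, add-one-smoothed frequency model
    (a hypothetical stand-in for a learned predictor)."""
    counts = Counter({s: 1 for s in alphabet})  # every symbol starts at count 1
    total = len(alphabet)
    bits = 0.0
    for s in symbols:
        p = counts[s] / total      # probability the model assigns to s
        bits += -math.log2(p)      # ideal arithmetic-coding cost of s
        counts[s] += 1             # update the model after coding s
        total += 1
    return bits


def compression_ratio(original_size, compressed_size):
    """Compression ratio as original size divided by compressed size."""
    return original_size / compressed_size


# Hypothetical, highly repetitive stream of span operation names.
alphabet = ["GET /order", "SELECT order", "GET /user"]
stream = ["GET /order"] * 90 + ["SELECT order"] * 8 + ["GET /user"] * 2

adaptive_bits = ideal_code_length_bits(stream, alphabet)
uniform_bits = len(stream) * math.log2(len(alphabet))  # no prediction at all

print(f"adaptive model: {adaptive_bits:.1f} bits")
print(f"uniform coding: {uniform_bits:.1f} bits")
print(f"ratio between the two: {compression_ratio(uniform_bits, adaptive_bits):.2f}")
```

The better the model predicts the next symbol, the fewer bits the coder spends, which is why a stronger predictor can push the ratio higher. Under the same definition of compression ratio, the lower bound reported in the abstract (105.5) would shrink 1 GB of tracing data to roughly 9.5 MB.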

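Cite this article:

WANG Shang, ZHANG Chen-Xi, PENG Xin. Neural-network-based Compression and Query Approach for Distributed Tracing Data. Journal of Software (软件学报), 2025, 36(9): 4287-4312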


History

Received: 2024-01-15
Last revised: 2024-06-06
Published online: 2025-06-04
