Optimization of Concept Drift Malware Classification Based on BERT and Autoencoder

doi:10.13328/j.cnki.jos.007253

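Authors:

ZHAO Hao-Jun
National Engineering Research Center for Big Data Technology and System, Wuhan 430074, China; Key Laboratory of Services Computing Technology and System, Ministry of Education, Wuhan 430074, China; Hubei Engineering Research Center on Big Data Security, Wuhan 430074, China; School of Cyber Science and Engineering, Huazhong University of Science and Technology, Wuhan 430074, China

ZOU De-Qing
National Engineering Research Center for Big Data Technology and System, Wuhan 430074, China; Key Laboratory of Services Computing Technology and System, Ministry of Education, Wuhan 430074, China; Hubei Engineering Research Center on Big Data Security, Wuhan 430074, China; School of Cyber Science and Engineering, Huazhong University of Science and Technology, Wuhan 430074, China

XUE Wen-Jie
National Engineering Research Center for Big Data Technology and System, Wuhan 430074, China; Key Laboratory of Services Computing Technology and System, Ministry of Education, Wuhan 430074, China; Hubei Engineering Research Center on Big Data Security, Wuhan 430074, China; School of Cyber Science and Engineering, Huazhong University of Science and Technology, Wuhan 430074, China

WU Yue-Ming
School of Computing and Data Science, Nanyang Technological University, Singapore 639798, Singapore

JIN Hai
National Engineering Research Center for Big Data Technology and System, Wuhan 430074, China; Key Laboratory of Services Computing Technology and System, Ministry of Education, Wuhan 430074, China; Hubei Key Laboratory of Cluster and Grid Computing, Wuhan 430074, China; School of Computer Science and Technology, Huazhong University of Science and Technology, Wuhan 430074, China
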
CLC number: TP311
Funding: National Natural Science Foundation of China, General Program (62172168)



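Abstract:

Software concept drift refers to the way the structure and composition of software of the same type change over time. In malware classification, concept drift means that the structural and compositional features of samples from the same malware family change over time, so the performance of classification algorithms built on fixed patterns degrades as time passes. Existing static malware classification methods all degrade significantly under concept drift and therefore struggle to meet the needs of practical applications. To address this problem, and drawing on the commonalities between natural language understanding and byte-stream analysis of binary programs, a highly accurate and robust malware classification method is proposed based on BERT and a custom autoencoder architecture. The method first extracts execution-oriented opcode sequences from malware through disassembly analysis, which reduces redundant information. It then uses BERT to model the contextual semantics of the sequences and embed them as vectors, capturing the deep program semantics of the malware. Next, it screens for task-relevant features through geometric median subspace projection and a bottleneck autoencoder. Finally, a classifier composed of fully connected layers outputs the classification result. The practical effectiveness of the method is validated through comparative experiments against nine state-of-the-art malware classification methods in both normal and concept drift scenarios. The results show that the proposed method achieves an F1 score of 99.49% in the normal scenario, higher than every compared method, and that in the concept drift scenario its F1 score exceeds those of the compared methods by 10.78% to 43.71%.

Keywords: malware static analysis; concept drift; robustness optimization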
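
The abstract names geometric median subspace projection as one of the feature-screening steps, but it does not describe how the median is computed or how the subspace is built. As a minimal illustration of the geometric median alone, the Python sketch below uses Weiszfeld's iteration, a common choice that is assumed here rather than taken from the paper. The function name `geometric_median`, its parameters, and the toy 2-D points are hypothetical; the BERT embeddings, the subspace construction, and the autoencoder are not modeled.

```python
import math


def geometric_median(points, max_iter=200, tol=1e-9):
    """Approximate the geometric median of equal-length vectors.

    Uses Weiszfeld's iteration (an assumption of this sketch, not a detail
    given in the abstract). A point that coincides with the current estimate
    is skipped to avoid division by zero, which is a simplification.
    """
    dim = len(points[0])
    # Start from the coordinate-wise mean.
    current = [sum(p[d] for p in points) / len(points) for d in range(dim)]
    for _ in range(max_iter):
        weighted = [0.0] * dim
        weight_sum = 0.0
        for p in points:
            dist = math.dist(p, current)
            if dist < 1e-12:
                continue  # estimate sits on this point; skip it
            w = 1.0 / dist  # closer points get larger weights
            weight_sum += w
            for d in range(dim):
                weighted[d] += w * p[d]
        if weight_sum == 0.0:
            break  # every point coincides with the estimate
        updated = [x / weight_sum for x in weighted]
        if math.dist(updated, current) < tol:
            return updated
        current = updated
    return current


# Toy data (hypothetical): four corners of the unit square plus one far outlier.
points = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [100.0, 100.0]]
mean = [sum(p[d] for p in points) / len(points) for d in range(2)]
median = geometric_median(points)
print("mean:", mean)
print("geometric median:", median)
```

On this toy input the coordinate-wise mean is pulled out to (20.4, 20.4) by the single outlier, while the geometric median stays inside the unit square at roughly (0.79, 0.79). Robustness to outlying points is a general property of the geometric median; whether it is the authors' reason for using it is not stated in the abstract.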


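Cite this article

ZHAO Hao-Jun, ZOU De-Qing, XUE Wen-Jie, WU Yue-Ming, JIN Hai. Optimization of Concept Drift Malware Classification Based on BERT and Autoencoder. Journal of Software, ,():1-17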


History

Received: 2023-12-09
Last revised: 2024-04-28
Published online: 2024-12-04
