Optimization of Concept Drift Malware Classification Based on BERT and Autoencoder

doi:10.13328/j.cnki.jos.007253



Authors:

ZHAO Hao-Jun
National Engineering Research Center for Big Data Technology and System, Wuhan 430074, China; Key Laboratory of Services Computing Technology and System, Ministry of Education, Wuhan 430074, China; Hubei Engineering Research Center on Big Data Security, Wuhan 430074, China; School of Cyber Science and Engineering, Huazhong University of Science and Technology, Wuhan 430074, China

ZOU De-Qing
National Engineering Research Center for Big Data Technology and System, Wuhan 430074, China; Key Laboratory of Services Computing Technology and System, Ministry of Education, Wuhan 430074, China; Hubei Engineering Research Center on Big Data Security, Wuhan 430074, China; School of Cyber Science and Engineering, Huazhong University of Science and Technology, Wuhan 430074, China

XUE Wen-Jie
National Engineering Research Center for Big Data Technology and System, Wuhan 430074, China; Key Laboratory of Services Computing Technology and System, Ministry of Education, Wuhan 430074, China; Hubei Engineering Research Center on Big Data Security, Wuhan 430074, China; School of Cyber Science and Engineering, Huazhong University of Science and Technology, Wuhan 430074, China

WU Yue-Ming
School of Computing and Data Science, Nanyang Technological University, Singapore 639798, Singapore

JIN Hai
National Engineering Research Center for Big Data Technology and System, Wuhan 430074, China; Key Laboratory of Services Computing Technology and System, Ministry of Education, Wuhan 430074, China; Hubei Key Laboratory of Cluster and Grid Computing, Wuhan 430074, China; School of Computer Science and Technology, Huazhong University of Science and Technology, Wuhan 430074, China
CLC Number: TP311


Abstract:

Software concept drift refers to the change, over time, in the structure and composition of software of the same type. In malware classification, it means that the structural and compositional characteristics of samples from the same family shift over time, so classifiers whose model stays fixed after training lose accuracy as time passes. Existing static malware classification methods degrade significantly under concept drift, which makes them difficult to rely on in practice. Exploiting the commonalities between natural language understanding and binary byte-stream analysis, this paper proposes a highly accurate and robust malware classification method based on BERT and a custom autoencoder architecture. The method first extracts execution-oriented opcode sequences through disassembly analysis, reducing redundant information. It then uses BERT to capture the contextual semantics of these sequences and embed them as vectors, representing the deep program semantics of each sample. Next, it selects task-relevant features through geometric median subspace projection and bottleneck autoencoders. Finally, a classifier built from fully connected layers outputs the classification result. The method is evaluated against nine state-of-the-art malware classification methods in both normal and concept drift scenarios. In normal scenarios it achieves an F1 score of 99.49%, outperforming all nine methods; in concept drift scenarios its F1 score exceeds theirs by 10.78% to 43.71%.
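The abstract says only that opcode sequences are extracted "through disassembly analysis" in an execution-oriented way. The sketch below shows one plausible interpretation, assuming the disassembler has already recovered a control-flow graph (CFG) of basic blocks: walk the graph from the entry point and emit mnemonics in reachable order, so unreachable bytes and padding never reach the BERT tokenizer. The `Block` type, the `NOISE_OPCODES` set, the depth-first order, and the 512-token cap are all hypothetical choices for illustration, not the authors' implementation.

```python
from __future__ import annotations

from dataclasses import dataclass, field

# Hypothetical set of padding instructions treated as redundant (an assumption).
NOISE_OPCODES = {"nop", "int3"}


@dataclass
class Block:
    """A basic block from a disassembler: its mnemonics and control-flow successors."""
    opcodes: list[str]
    successors: list[str] = field(default_factory=list)


def execution_ordered_opcodes(cfg: dict[str, Block], entry: str, max_len: int = 512) -> list[str]:
    """Depth-first walk of the CFG from `entry`, emitting each reachable block's
    opcodes once, in traversal order, with padding instructions removed.

    Blocks never reached from `entry` contribute nothing, and the output is
    truncated to `max_len` tokens (an assumed model input limit).
    """
    seq: list[str] = []
    visited: set[str] = set()
    stack = [entry]
    while stack and len(seq) < max_len:
        name = stack.pop()
        if name in visited or name not in cfg:
            continue  # already emitted, or a jump target outside the recovered CFG
        visited.add(name)
        block = cfg[name]
        seq.extend(op for op in block.opcodes if op not in NOISE_OPCODES)
        # Push in reverse so the first listed successor is explored next.
        stack.extend(reversed(block.successors))
    return seq[:max_len]
```

Depth-first order approximates following one branch to completion before the next; a breadth-first or per-function ordering would be an equally defensible assumption, since the abstract does not say which traversal is used.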
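The abstract does not define how the geometric median subspace projection is constructed. One plausible reading, assumed here purely for illustration, is to centre the BERT embeddings on their geometric median (a centre that, unlike the mean, is robust to outlying samples) and project them onto the leading eigenvectors of the median-centred scatter matrix. The geometric median is computed with Weiszfeld's algorithm and the directions with power iteration plus deflation; all function names and parameters below are hypothetical, and the stdlib-only code is a sketch rather than the paper's method.

```python
import math
import random


def geometric_median(points, iters=200, eps=1e-9):
    """Weiszfeld's algorithm: the point minimising the sum of Euclidean distances.

    Simplification: a data point that coincides with the current estimate is
    skipped for that step instead of applying the full Vardi-Zhang correction.
    """
    dim = len(points[0])
    y = [sum(p[i] for p in points) / len(points) for i in range(dim)]  # start at the mean
    for _ in range(iters):
        num, den = [0.0] * dim, 0.0
        for p in points:
            d = math.dist(p, y)
            if d < eps:
                continue
            w = 1.0 / d  # closer points get larger weights
            den += w
            for i in range(dim):
                num[i] += w * p[i]
        if den == 0.0:  # every point coincides with y
            return y
        new_y = [v / den for v in num]
        if math.dist(new_y, y) < eps:
            return new_y
        y = new_y
    return y


def _remove_components(v, basis):
    """Subtract the projection of v onto each unit vector already in basis."""
    for b in basis:
        dot = sum(a * c for a, c in zip(v, b))
        v = [a - dot * c for a, c in zip(v, b)]
    return v


def top_directions(centered, k, iters=300, seed=0):
    """Leading k eigenvectors of the scatter matrix (k must not exceed the dimension)."""
    rng = random.Random(seed)
    dim = len(centered[0])
    scatter = [[sum(x[i] * x[j] for x in centered) for j in range(dim)] for i in range(dim)]
    basis = []
    for _ in range(k):
        v = _remove_components([rng.gauss(0.0, 1.0) for _ in range(dim)], basis)
        norm = math.sqrt(sum(a * a for a in v))
        v = [a / norm for a in v]
        for _ in range(iters):  # power iteration
            w = [sum(scatter[i][j] * v[j] for j in range(dim)) for i in range(dim)]
            w = _remove_components(w, basis)  # deflation: stay orthogonal to earlier directions
            norm = math.sqrt(sum(a * a for a in w))
            if norm < 1e-12:  # no variance left in the remaining subspace
                break
            v = [a / norm for a in w]
        basis.append(v)
    return basis


def median_subspace_project(points, k):
    """Centre points on their geometric median, then project onto k leading directions."""
    m = geometric_median(points)
    centered = [[a - b for a, b in zip(p, m)] for p in points]
    basis = top_directions(centered, k)
    return [[sum(a * b for a, b in zip(x, u)) for u in basis] for x in centered]
```

In practice the input would be a list of embedding vectors (one per sample), and the projected `k`-dimensional features would then feed the bottleneck autoencoder; a numerical library would replace these pure-Python loops for real embedding sizes.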

Key words: malware static analysis; concept drift; robust optimization

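Citation:

ZHAO Hao-Jun, ZOU De-Qing, XUE Wen-Jie, WU Yue-Ming, JIN Hai. Optimization of Concept Drift Malware Classification Based on BERT and Autoencoder. Journal of Software (软件学报): 1-17.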


History

Received: December 09, 2023
Revised: April 28, 2024
Online: December 04, 2024

Copyright: Institute of Software, Chinese Academy of Sciences
Address: 4# South Fourth Street, Zhong Guan Cun, Beijing 100190
Phone: 010-62562563  Fax: 010-62562533  Email: jos@iscas.ac.cn