Optimization of Concept Drift Malware Classification Based on BERT and Autoencoder
Author:
Affiliation:

Clc Number:

TP311

Fund Project:

  • Article
  • |
  • Figures
  • |
  • Metrics
  • |
  • Reference
  • |
  • Related
  • |
  • Cited by
  • |
  • Materials
  • |
  • Comments
    Abstract:

    Software concept drift means that the structure and composition of the same type of software will change over time. In malware classification, concept drift means that the structure and composition characteristics of malware samples from the same family can change over time. This will cause a decline in the performance of fixed-mode malware classification algorithms over time. Existing methods for static malware classification experience significant performance degradation when faced with concept drift scenarios, making it difficult to meet the needs of practical applications. To address this problem, given the commonalities between natural language understanding and binary byte stream analysis, a highly accurate and robust malware classification method is proposed based on BERT and a custom autoencoder architecture. This method extracts execution-oriented malware opcode sequences through disassembly analysis to reduce redundant information. Then, it uses BERT to understand the contextual semantics of the sequences and perform vector embedding to effectively understand the deep program semantics of the malware samples. It also screens effective task-related features through the geometric median subspace projection and bottleneck autoencoders. Finally, a classifier composed of fully connected layers is used to output the classification results. The practical effectiveness of the proposed method is validated through comparative experiments with nine state-of-the-art malware classification methods in both normal and concept drift scenarios. Experimental results show that the proposed method achieves an F1 score of 99.49% in normal scenarios, outperforming those nine methods. Moreover, in concept drift scenarios, the F1 score is improved by 10.78% to 43.71% compared to the nine methods.

    Reference
    Related
    Cited by
Get Citation

赵浩钧,邹德清,薛文杰,吴月明,金海.基于BERT与自编码器的概念漂移恶意软件分类优化.软件学报,,():1-17

Copy
Share
Article Metrics
  • Abstract:
  • PDF:
  • HTML:
  • Cited by:
History
  • Received:December 09,2023
  • Revised:April 28,2024
  • Adopted:
  • Online: December 04,2024
  • Published:
You are the firstVisitors
Copyright: Institute of Software, Chinese Academy of Sciences Beijing ICP No. 05046678-4
Address:4# South Fourth Street, Zhong Guan Cun, Beijing 100190,Postal Code:100190
Phone:010-62562563 Fax:010-62562533 Email:jos@iscas.ac.cn
Technical Support:Beijing Qinyun Technology Development Co., Ltd.

Beijing Public Network Security No. 11040202500063