DGA Domain Name Detection Method Based on Double Branch Feature Extraction and Adaptive Capsule Network
Authors: YANG Hong-Yu, ZHANG Tao, ZHANG Liang, CHENG Xiang, HU Ze
Author biographies:

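YANG Hong-Yu (1969-), male, PhD, professor, doctoral supervisor, CCF senior member. His main research interests include network and system security, software security detection, and network security situational awareness.
CHENG Xiang (1988-), male, PhD, experimentalist, CCF professional member. His main research interests include network and system security, network security situational awareness, and APT attack detection.
ZHANG Tao (1995-), male, master's student. His main research interests include network information security and DGA domain name detection.
HU Ze (1989-), male, PhD, CCF professional member. His main research interests include artificial intelligence, natural language processing, and network information security.
ZHANG Liang (1987-), male, PhD, research fellow. His main research interests include reinforcement learning, deep-learning-based signal processing, and network and system security.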

Corresponding author:

HU Ze, E-mail: zhu@cauc.edu.cn

Funding:

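National Natural Science Foundation of China (62201576, U1833107); Fundamental Research Funds for the Central Universities (3122022050); Open Fund of the Information Security Evaluation Center of Civil Aviation University of China (ISECCA-202202); Discipline Funds of Civil Aviation University of China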


    Abstract:

    Existing methods for detecting domain names produced by domain generation algorithms (DGAs) generally suffer from weak feature extraction and a high compression ratio of feature information, which leads to loss of feature information, destruction of feature structure, and poor detection performance. To address these problems, this study proposes a DGA domain name detection method based on double branch feature extraction and an adaptive capsule network. First, the original samples are reconstructed through sample cleaning and dictionary construction to generate a reconstructed sample set. Second, a double branch feature extraction network processes the reconstructed samples: a sliced pyramid network extracts the local features of domain names, a Transformer extracts their global features, and lightweight attention fuses the features from different levels. Then, an adaptive capsule network computes the importance coefficients of the domain name feature maps, converts the textual domain name features into vector features, and computes a classification probability based on text features through feature transfer. In parallel, a multilayer perceptron processes the statistical features of domain names to compute a classification probability based on statistical features. Finally, the classification probabilities obtained from these two perspectives are combined to detect DGA domain names. Extensive experiments show that the proposed method achieves state-of-the-art results in both DGA domain name detection and DGA family classification: the F1-score for DGA domain name detection improves by 0.76% to 5.57%, and the macro-averaged F1-score for DGA family classification improves by 1.79% to 3.68%.
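The abstract describes the pipeline only at a high level. The following minimal Python sketch illustrates four of the named steps under explicit assumptions; it is not the authors' implementation. `build_char_dict` and `encode` are hypothetical stand-ins for dictionary construction and sample reconstruction (the reserved indices and `max_len` are illustrative). `shannon_entropy` is one statistical feature commonly used in DGA detection, not necessarily one the authors use. `squash` is the standard capsule non-linearity of Sabour et al. (2017). `fuse` assumes a simple weighted average with a hypothetical weight `alpha`, because the abstract does not state how the two probabilities are combined.

```python
import math
from collections import Counter

PAD, UNK = 0, 1  # reserved indices for padding and unseen characters (assumption)


def build_char_dict(domains):
    """Hypothetical dictionary construction: map each character seen in the
    training domains (lower-cased, sorted) to an index starting at 2."""
    chars = sorted({c for d in domains for c in d.lower()})
    return {c: i + 2 for i, c in enumerate(chars)}


def encode(domain, char_dict, max_len=16):
    """Hypothetical sample reconstruction: turn a domain into a fixed-length
    index sequence, truncating long names and right-padding short ones."""
    ids = [char_dict.get(c, UNK) for c in domain.lower()[:max_len]]
    return ids + [PAD] * (max_len - len(ids))


def shannon_entropy(s):
    """Character-level Shannon entropy in bits (an assumed statistical feature)."""
    counts = Counter(s)
    n = len(s)
    return -sum(c / n * math.log2(c / n) for c in counts.values())


def squash(s):
    """Capsule squashing (Sabour et al., 2017): v = |s|^2 / (1 + |s|^2) * s / |s|.
    Keeps the vector's direction and maps its length into [0, 1)."""
    norm_sq = sum(x * x for x in s)
    if norm_sq == 0:
        return [0.0] * len(s)
    scale = norm_sq / (1 + norm_sq) / math.sqrt(norm_sq)
    return [scale * x for x in s]


def fuse(p_text, p_stat, alpha=0.5):
    """Combine the text-based and statistics-based class probabilities
    by a weighted average (assumed fusion rule)."""
    return [alpha * a + (1 - alpha) * b for a, b in zip(p_text, p_stat)]
```

For example, `encode("google", build_char_dict(["google", "xjw3k9qz"]), max_len=8)` yields `[5, 9, 9, 5, 8, 4, 0, 0]`, because the sorted character set assigns indices from 2 upward and the remaining positions are padded. The squashing step is shown because it lets a capsule's vector length be read as a probability-like score while preserving its direction.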

Cite this article:

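YANG Hong-Yu, ZHANG Tao, ZHANG Liang, CHENG Xiang, HU Ze. DGA domain name detection method based on double branch feature extraction and adaptive capsule network. Journal of Software, 2024, 35(8): 3626-3646.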

History
  • Received: 2023-09-10
  • Last revised: 2023-10-30
  • Published online: 2024-01-05
  • Publication date: 2024-08-06
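Copyright: Institute of Software, Chinese Academy of Sciences. Beijing ICP registration No. 05046678-3
Address: No. 4, South Fourth Street, Zhongguancun, Haidian District, Beijing; postal code: 100190
Tel: 010-62562563  Fax: 010-62562533  Email: jos@iscas.ac.cn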