MCL4DGA: DGA Domain Detection Method Based on Multi-view Contrastive Learning

doi:10.13328/j.cnki.jos.007003


Volume 35, Issue 11, 2024, pp. 5228-5248


Author:
WANG Ji-Hu (School of Software, Shandong University, Jinan 250101, China)
LIU Zi-Yan (Information and Telecommunication Company of State Grid Shandong Electric Power Company, Jinan 250021, China)
NI Jin-Chao (Information and Telecommunication Company of State Grid Shandong Electric Power Company, Jinan 250021, China)
KONG Fan-Yu (School of Software, Shandong University, Jinan 250101, China)
SHI Yu-Liang (School of Software, Shandong University, Jinan 250101, China; Dareway Software Co. Ltd., Jinan 250200, China)

                    
CLC Number: TP393


Abstract:

In cyber security, the deceptive domain names produced by a domain generation algorithm (DGA) are called DGA domains. They are usually random combinations of letters and digits that resemble real domains, which makes them highly camouflaged. Attackers exploit this disguise to bypass security detection when carrying out cyber attacks, and detecting DGA domains effectively has become a research hotspot. Traditional statistical machine learning methods require domain feature sets to be constructed by hand. However, the quality of manually or semi-automatically constructed features varies, which affects detection accuracy. Given the strong automatic feature extraction and representation capability of deep neural networks, this paper proposes MCL4DGA, a DGA domain detection method based on multi-view contrastive learning. Unlike existing methods, it combines attention-based, convolutional, and recurrent neural networks to capture the global, local, and bidirectional multi-view feature dependencies of domain sequences. In addition, the self-supervision signals derived from contrastive learning enhance the expressiveness of the multi-view feature encoders and thereby improve detection accuracy. Experimental comparison with current methods on a real dataset verifies the effectiveness of the proposed method.
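The abstract does not give the exact contrastive objective, so the following is only a minimal sketch of how a self-supervision signal between two views could be computed. It assumes a generic InfoNCE-style loss with cosine similarity, which may differ from the loss used in MCL4DGA. The names `info_nce`, `cnn_view`, `rnn_view_aligned`, `rnn_view_swapped`, the toy embeddings, and the temperature value are all hypothetical and chosen for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def info_nce(view_a, view_b, temperature=0.5):
    """Mean InfoNCE-style loss between two views of the same batch.

    view_a[i] and view_b[i] are assumed to embed the same domain name
    (a positive pair); every other row of view_b acts as a negative.
    This is a generic formulation, not necessarily the paper's loss.
    """
    losses = []
    for i, anchor in enumerate(view_a):
        logits = [cosine(anchor, other) / temperature for other in view_b]
        # Log-sum-exp with the maximum subtracted for numerical stability.
        top = max(logits)
        log_denominator = top + math.log(sum(math.exp(x - top) for x in logits))
        losses.append(log_denominator - logits[i])
    return sum(losses) / len(losses)


# Hypothetical 2-D embeddings of two domains from two different encoders.
cnn_view = [[1.0, 0.0], [0.0, 1.0]]
rnn_view_aligned = [[0.9, 0.1], [0.1, 0.9]]   # views agree on each domain
rnn_view_swapped = [[0.1, 0.9], [0.9, 0.1]]   # views disagree

aligned_loss = info_nce(cnn_view, rnn_view_aligned)
swapped_loss = info_nce(cnn_view, rnn_view_swapped)
print(aligned_loss < swapped_loss)
```

The loss is lower when the two views place the same domain close together, which is the kind of agreement a contrastive term rewards. In a full model such a term would typically be added to the classification loss, though the abstract does not say how the two are combined.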

Key words: cyber security; DGA (domain generation algorithm) domain detection; deep neural network (DNN); contrastive learning (CL)

Citation:

Wang JH, Liu ZY, Ni JC, Kong FY, Shi YL. MCL4DGA: DGA domain detection method based on multi-view contrastive learning. Journal of Software, 2024, 35(11): 5228-5248


History

Received: March 28, 2022
Revised: February 4, 2023
Online: November 29, 2023
Published: November 6, 2024
