Malicious Domain Name Detection Method Based on Graph Contrastive Learning
Authors: Zhang Zhen, Zhang Sanfeng, Yang Wang

CLC Number: TP393

    Abstract:

    The domain name plays an important role in cybercrime. Existing malicious domain name detection methods struggle to exploit the rich topological and attribute information of domain names and require large amounts of labeled data, which limits detection performance and raises costs. To address these problems, this study proposes a malicious domain name detection method based on graph contrastive learning. Domain names and IP addresses are modeled as two node types in a heterogeneous graph, and a feature matrix is built for each node type from its attributes. Three types of meta paths are constructed from the inclusion relationship between domain names, a similarity measure between domain names, and the correspondence between domain names and IP addresses. In the pre-training stage, a contrastive learning model based on an asymmetric encoder is applied, which avoids the damage that graph data augmentation can do to graph structure and semantics and reduces the demand for computing resources. The inductive graph neural network encoders HeteroSAGE and HeteroGAT are trained with a node-centric mini-batch strategy that explores the aggregation relationship between each target node and its neighbors, addressing the poor applicability of transductive graph neural networks in dynamic scenarios. For the downstream classification task, logistic regression and random forest classifiers are compared. Experimental results on public data sets show that detection performance improves by two to six percentage points over related work.
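    The abstract names three meta paths and a node-centric mini-batch strategy but does not define the inclusion rule or the similarity measure. The Python sketch below shows one way meta-path neighbor sets could be built from (domain, IP) resolution records and then sampled per target node. The subdomain-suffix inclusion rule, the character-bigram Jaccard similarity, the 0.5 threshold, the meta-path names, and the functions `build_metapath_neighbors` and `sample_minibatch` are all hypothetical choices for illustration, not the authors' method.

```python
import random
from collections import defaultdict
from itertools import combinations


def bigrams(name):
    """Set of character bigrams of a domain name (hypothetical similarity feature)."""
    return {name[i:i + 2] for i in range(len(name) - 1)}


def jaccard(a, b):
    """Jaccard similarity of two sets; 0.0 when both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def build_metapath_neighbors(resolutions, sim_threshold=0.5):
    """Build domain-to-domain neighbor sets for three meta paths.

    resolutions: iterable of (domain, ip) pairs, e.g. passive DNS records.
    Returns {meta_path_name: {domain: set_of_neighbor_domains}}.
    """
    ip_to_domains = defaultdict(set)
    domains = set()
    for domain, ip in resolutions:
        ip_to_domains[ip].add(domain)
        domains.add(domain)

    paths = {name: defaultdict(set) for name in ("D-IP-D", "D-inc-D", "D-sim-D")}

    def link(path, a, b):
        paths[path][a].add(b)
        paths[path][b].add(a)

    # Meta path 1: domain -> IP -> domain (two domains share a resolved IP).
    for shared in ip_to_domains.values():
        for a, b in combinations(sorted(shared), 2):
            link("D-IP-D", a, b)

    # Meta paths 2 and 3 compare every domain pair (O(n^2), acceptable for a sketch).
    for a, b in combinations(sorted(domains), 2):
        # Assumed inclusion rule: one name is a subdomain of the other.
        if a.endswith("." + b) or b.endswith("." + a):
            link("D-inc-D", a, b)
        # Assumed similarity rule: bigram Jaccard at or above the threshold.
        if jaccard(bigrams(a), bigrams(b)) >= sim_threshold:
            link("D-sim-D", a, b)

    # Convert to plain dicts so later lookups never create empty entries.
    return {name: dict(adj) for name, adj in paths.items()}


def sample_minibatch(neighbors, targets, fanout, rng):
    """GraphSAGE-style node-centric sampling: for each target domain and
    each meta path, keep at most `fanout` randomly chosen neighbors."""
    batch = {}
    for node in targets:
        batch[node] = {}
        for path, adj in neighbors.items():
            candidates = sorted(adj.get(node, ()))
            batch[node][path] = rng.sample(candidates, min(fanout, len(candidates)))
    return batch


if __name__ == "__main__":
    records = [
        ("mail.example.com", "192.0.2.1"),
        ("example.com", "192.0.2.1"),
        ("examp1e.com", "198.51.100.7"),
        ("shop.example.org", "203.0.113.5"),
    ]
    nbrs = build_metapath_neighbors(records)
    batch = sample_minibatch(nbrs, ["example.com"], fanout=2, rng=random.Random(0))
    print(batch["example.com"])
```

    Capping the neighbors per meta path at a fixed fanout keeps the cost of each mini-batch bounded regardless of graph size, which is what lets an inductive encoder be trained batch by batch and applied to domains that appear later. At realistic scale, the pairwise loop for the inclusion and similarity paths would need an index (for example, grouping by registered domain) instead of comparing every pair.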


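Citation: 张震, 张三峰, 杨望. 基于图对比学习的恶意域名检测方法. 软件学报, 2024, 35(10): 4837–4858.
Zhang Z, Zhang SF, Yang W. Malicious domain name detection method based on graph contrastive learning. Ruan Jian Xue Bao/Journal of Software, 2024, 35(10): 4837–4858 (in Chinese with English abstract).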

History
  • Received: September 06, 2022
  • Revised: January 17, 2023
  • Online: September 13, 2023
  • Published: October 06, 2024
Copyright: Institute of Software, Chinese Academy of Sciences