Malicious Domain Name Detection Method Based on Graph Contrastive Learning
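Authors: Zhang Zhen (张震), Zhang Sanfeng (张三峰), Yang Wang (杨望)

About the authors:

Zhang Zhen (1998-), male, master's student; main research interests: graph neural networks, graph self-supervised learning, and malicious domain name detection. Zhang Sanfeng (1979-), male, Ph.D., associate professor, CCF professional member; main research interests: threat intelligence analysis, intelligent security, and adversarial examples. Yang Wang (1979-), male, Ph.D., lecturer, CCF professional member; main research interests: threat intelligence analysis, malware detection, and cybersecurity emergency response.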

Corresponding author:

Zhang Sanfeng, E-mail: sfzhang@seu.edu.cn

CLC number:

TP393

Funding:

National Natural Science Foundation of China (62272100)


    Abstract:

    Domain names are a key link in carrying out cybercrime. Existing malicious domain name detection methods struggle to exploit rich topological and attribute information and also require large amounts of labeled data, which limits detection performance and raises cost. To address this problem, this paper proposes a malicious domain name detection method based on graph contrastive learning. Domain names and IP addresses serve as the two node types of a heterogeneous graph, and a feature matrix is built for each node type from its attributes. Three types of meta-paths are constructed from the inclusion relationship between domain names, a similarity measure, and the correspondence between domain names and IP addresses. In the pre-training stage, a contrastive learning model based on an asymmetric encoder is used, which avoids the damage that graph data augmentation does to graph structure and semantics and also reduces the demand for computing resources. The inductive graph neural network encoders HeteroSAGE and HeteroGAT are trained in a node-centric mini-batch mode to mine the aggregation relationship between a target node and its neighbors, avoiding the poor applicability of transductive graph neural networks in dynamic scenarios. For the downstream classification task, algorithms such as logistic regression and random forest are compared. Experiments on public datasets show that detection performance improves by 2 to 6 percentage points over existing work.
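The graph construction described in the abstract can be illustrated with a small sketch. It is not the authors' implementation: the abstract does not define the inclusion test, the similarity measure, or the sampling scheme, so the subdomain-suffix test, the `difflib` character ratio, the 0.85 threshold, and the helper names `build_relations` and `sample_neighbors` are all assumptions made for illustration. The sample domains and addresses are made up and use documentation address ranges.

```python
import random
from difflib import SequenceMatcher
from itertools import combinations


def build_relations(resolutions, sim_threshold=0.85):
    """Derive three domain-to-domain relations from a DNS resolution table.

    resolutions: dict mapping a domain name to the set of IPs it resolves to.
    Returns a dict of relation name -> set of (domain_a, domain_b) pairs.
    """
    domains = sorted(resolutions)
    contains, similar, shares_ip = set(), set(), set()
    for a, b in combinations(domains, 2):
        # Inclusion (assumed definition): one name is a subdomain of the other.
        if a.endswith("." + b) or b.endswith("." + a):
            contains.add((a, b))
        # Similarity (assumed measure): character-level ratio from difflib.
        if SequenceMatcher(None, a, b).ratio() >= sim_threshold:
            similar.add((a, b))
        # Domain-IP-domain path: both names resolve to a common address.
        if resolutions[a] & resolutions[b]:
            shares_ip.add((a, b))
    return {"contains": contains, "similar": similar, "shares_ip": shares_ip}


def sample_neighbors(edges, node, k, rng):
    """Return at most k neighbors of `node` under one relation.

    Fixed-size neighbor sampling of this kind is what lets a node-centric
    mini-batch be built without loading the whole graph.
    """
    neighbors = sorted({b if a == node else a for a, b in edges if node in (a, b)})
    return neighbors if len(neighbors) <= k else rng.sample(neighbors, k)


# Hypothetical resolution records, for illustration only.
resolutions = {
    "example.com": {"192.0.2.1"},
    "mail.example.com": {"192.0.2.2"},
    "examp1e.com": {"198.51.100.7"},
    "other.net": {"192.0.2.1"},
}
relations = build_relations(resolutions)
batch_neighbors = sample_neighbors(
    relations["shares_ip"], "example.com", 2, random.Random(0)
)
```

In this toy table, `mail.example.com` is linked to `example.com` by inclusion, the look-alike `examp1e.com` by similarity, and `other.net` through the shared address. A real pipeline would attach node features and pass the relations to a heterogeneous graph encoder, which this sketch does not attempt.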

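Cite this article

Zhang Zhen, Zhang Sanfeng, Yang Wang. Malicious domain name detection method based on graph contrastive learning. Journal of Software (软件学报), 2024, 35(10): 4837-4858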

History
  • Received: 2022-09-06
  • Last revised: 2023-01-17
  • Published online: 2023-09-13