Network-side Alert Prioritization Method Based on Multivariate Data Fusion
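Author biographies:

Wang Weijing (王维靖, 1997-), female, Ph.D. candidate, CCF student member. Her main research interests are intelligent operations (AIOps) and software engineering.
Wang Xingkai (王星凯, 1992-), male, Ph.D., CCF professional member. His main research interests are intelligent security analysis, AI security, and security knowledge graphs.
Chen Junjie (陈俊洁, 1992-), male, Ph.D., associate professor, doctoral supervisor, CCF senior member. His main research interest is software analysis and testing.
Wu Fudi (吴复迪, 1993-), male. His main research interest is security attack and defense and its automation.
Yang Lin (杨林, 1995-), male, Ph.D. candidate, CCF student member. His main research interests are software engineering and software test generation.
Zhang Runzi (张润滋, 1989-), male, Ph.D., CCF senior member. His main research interests are intelligent security operations and threat hunting.
Hou Dejun (侯德俊, 1979-), male, senior engineer. His main research interests are network and cyberspace security, and software development and security.
Wang Zan (王赞, 1979-), male, Ph.D., professor, doctoral supervisor, CCF professional member. His main research interests are software testing and machine learning.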

Corresponding author:

Hou Dejun, E-mail: hdj@tju.edu.cn

Funding:

Beijing Nova Program (Z211100002121150)



    Abstract:

    Network security monitoring systems deployed on network nodes generate a massive number of network-side alerts every day. This places security engineers under heavy pressure, desensitizes them to high-risk alerts, and causes them to miss network attacks that should be detected in time. Because cyber attacks are complex and constantly changing, and network-side alerts carry limited information, existing alert prioritization/classification methods designed for IT operations are unsuitable for network-side alerts. This paper therefore proposes NAP (network-side alert prioritization), the first network-side alert prioritization method, based on multivariate data fusion. First, NAP uses a multi-strategy context encoder based on source and destination IP addresses to capture the contextual information of network-side alerts. Second, it uses a text encoder that combines an attention-based bidirectional GRU (gated recurrent unit) model with the ChineseBERT model to learn the semantic information of network-side alerts from text data such as alert messages. Finally, NAP builds a ranking model that assigns each alert a ranking value and sorts the alerts in descending order of that value, placing high-risk alerts with attack intent at the top and thereby streamlining the network-side alert management process. Experiments on three groups of network attack-and-defense data from NSFOCUS show that NAP prioritizes network-side alerts effectively and stably and significantly outperforms the compared methods. For example, the average NDCG@k for k ∈ [1, 10] (i.e., the normalized discounted cumulative gain of the top 1 to 10 ranked results) ranges from 0.8931 to 0.9583, an improvement of more than 64.73% over the state-of-the-art method. In addition, applying NAP to real-world network-side alert data from Tianjin University further confirms its practicality.
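
The abstract reports ranking quality as NDCG@k for k ∈ [1, 10]. As an illustration only, the Python sketch below sorts alerts by a ranking value in descending order and scores that order with the standard NDCG@k definition (gain divided by log2(rank + 1), normalized by the ideal ordering). It assumes binary relevance labels (1 for a high-risk alert with attack intent, 0 otherwise), for which the linear gain used here coincides with the exponential gain 2^rel - 1; the exact NDCG variant used to evaluate NAP is not stated on this page. The function names (`rank_alerts`, `dcg_at_k`, `ndcg_at_k`, `ndcg_curve`) and the toy scores and labels are hypothetical; in NAP the ranking values would come from its ranking model.

```python
import math
from typing import Sequence


def rank_alerts(scores: Sequence[float]) -> list[int]:
    """Return alert indices ordered by ranking value, highest first."""
    # sorted() is stable, so alerts with equal scores keep their input order.
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)


def dcg_at_k(relevances: Sequence[float], k: int) -> float:
    """Discounted cumulative gain over the first k positions."""
    # The item at 1-based rank r contributes rel / log2(r + 1);
    # with 0-based index i that is rel / log2(i + 2).
    return sum(rel / math.log2(i + 2) for i, rel in enumerate(relevances[:k]))


def ndcg_at_k(scores: Sequence[float], labels: Sequence[int], k: int) -> float:
    """NDCG@k of the descending-score ranking against ground-truth labels."""
    ranked = [labels[i] for i in rank_alerts(scores)]
    ideal = sorted(labels, reverse=True)  # best possible ordering of the labels
    idcg = dcg_at_k(ideal, k)
    # Assumed convention: if no alert is relevant, report 0.0.
    return dcg_at_k(ranked, k) / idcg if idcg > 0 else 0.0


def ndcg_curve(scores: Sequence[float], labels: Sequence[int],
               k_max: int = 10) -> list[float]:
    """NDCG@k for every k in [1, k_max]."""
    return [ndcg_at_k(scores, labels, k) for k in range(1, k_max + 1)]


if __name__ == "__main__":
    # Hypothetical toy data: 1 = high-risk attack alert, 0 = benign or noise.
    scores = [0.9, 0.2, 0.75, 0.1]
    labels = [1, 0, 0, 1]
    print(rank_alerts(scores))  # [0, 2, 1, 3]
    for k, value in enumerate(ndcg_curve(scores, labels, k_max=4), start=1):
        print(f"NDCG@{k} = {value:.4f}")
```

Because `sorted()` is stable, tied ranking values keep their input order; since ties can change NDCG@k, a real evaluation would need to fix its tie-breaking rule explicitly.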

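Cite this article:

Wang WJ, Chen JJ, Yang L, Hou DJ, Wang XK, Wu FD, Zhang RZ, Wang Z. Network-side alert prioritization method based on multivariate data fusion. Ruan Jian Xue Bao/Journal of Software, 2024, 35(8): 3610-3625.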

History
  • Received: 2023-09-10
  • Last revised: 2023-10-30
  • Published online: 2024-01-05
  • Published: 2024-08-06