Software vulnerabilities pose significant threats to real-world systems. In recent years, learning-based vulnerability detection methods, especially deep learning-based approaches, have gained widespread attention because they can extract implicit vulnerability features from large-scale vulnerability samples. However, different types of vulnerabilities exhibit different features, and the data are unevenly distributed across these types. As a result, existing deep learning-based vulnerability detection methods struggle to identify the specific type of a vulnerability accurately. To address this issue, this study proposes MulVD, a deep learning-based multi-class vulnerability detection method. MulVD constructs a structure-aware graph neural network (SA-GNN) that adaptively extracts local and representative vulnerability patterns and rebalances the data distribution without introducing noise. The approach is evaluated on both binary and multi-class vulnerability detection tasks. Experimental results demonstrate that MulVD significantly improves on the performance of existing deep learning-based vulnerability detection techniques.
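The abstract does not say how MulVD rebalances the data. For reference only, the following is a minimal sketch of one generic way to counter class imbalance in multi-class training without synthesizing new samples: weighting the cross-entropy loss by inverse class frequency. SMOTE-style oversampling interpolates synthetic samples that may be noisy. Reweighting adds no samples, so it introduces no such noise. This is not MulVD's SA-GNN mechanism. The function names `inverse_frequency_weights` and `weighted_cross_entropy` and the three-class label set are hypothetical.

```python
from collections import Counter
import math


def inverse_frequency_weights(labels):
    """Return a weight per class: n_samples / (n_classes * class_count).

    Rare classes receive larger weights. Each class contributes the same
    total weight (n_samples / n_classes) to the loss, so frequent classes
    do not dominate training. (Generic technique, not MulVD's method.)
    """
    counts = Counter(labels)
    n_samples, n_classes = len(labels), len(counts)
    return {cls: n_samples / (n_classes * cnt) for cls, cnt in counts.items()}


def weighted_cross_entropy(probs, label, weights):
    """Cross-entropy of one prediction, scaled by the weight of its true class.

    `probs` maps each class to the model's predicted probability.
    """
    return -weights[label] * math.log(probs[label])


# Hypothetical imbalanced labels: 0 = non-vulnerable, 1 and 2 = two vulnerability types.
labels = [0] * 6 + [1] * 3 + [2] * 1
weights = inverse_frequency_weights(labels)
print({cls: round(w, 3) for cls, w in sorted(weights.items())})
# prints {0: 0.556, 1: 1.111, 2: 3.333}
```

The rarest class (2) receives six times the weight of the majority class (0). A misclassified rare-type sample therefore raises the loss proportionally more, and no synthetic samples are added.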