GKCI:改进的基于图神经网络的关键类识别方法
作者:
作者简介:

周纯英(1998-),女,硕士,主要研究领域为复杂网络,软件缺陷预测,软件关键类识别,图神经网络;曾诚(1976-),男,博士,教授,CCF专业会员,主要研究领域为人工智能,行业软件研发;何鹏(1988-),男,博士,副教授,CCF专业会员,主要研究领域为软件工程,复杂网络,软件度量,软件缺陷预测;张龑(1973-),男,博士,教授,主要研究领域为信息安全,代码测试,教育大数据

通讯作者:

何鹏,penghe@hubu.edu.cn

基金项目:

国家自然科学基金(62102136);湖北省重点研发计划(2021BAA184,2021BAA188,2022BAA044);湖北省技术创新专项(2020AEA008)


GKCI: An Improved GNN-based Key Class Identification Method
Author:
  • 摘要
  • | |
  • 访问统计
  • |
  • 参考文献 [38]
  • |
  • 相似文献 [20]
  • | | |
  • 文章评论
    摘要:

    研究人员将软件系统中的关键类作为理解和维护一个系统的起点,而关键类上的缺陷给系统带来了极大的安全隐患.因此,识别关键类可提高软件的可靠性和稳定性.常用的识别方法是将软件系统抽象为一个类依赖网络,再根据定义好的度量指标和计算规则计算每个节点的重要性得分,如此基于非训练框架得到的关键类,并没有充分利用软件网络的结构信息.针对这一问题,基于图神经网络技术提出了一种有监督的关键类识别方法.首先,将软件系统抽象为类粒度的软件网络,并利用网络嵌入学习方法Node2Vec得到类节点的表征向量,再通过一个全连接层将节点的表征向量转换为具体分值;然后,利用改进的图神经网络模型,综合考虑类节点之间的依赖方向和权重,进行节点分值的聚合操作;最后,模型输出每个类节点的最终得分并进行降序排列,从而实现关键类的识别.在8个Java开源软件系统上,通过与基准方法的实验对比,验证了该方法的有效性.实验结果表明:在前10个候选关键类中,所提方法比最先进的方法提升了6.4%的召回率和3.5%的精确率.

    Abstract:

    Researchers use key classes as starting points for software understanding and maintenance. These key classes may cause a significant security risk to the software if they have defects. Therefore, identifying key classes can improve the reliability and stability of the software. Most of the existing methods are based on non-trainable solutions, which calculate the score of each node according to a certain calculation rule, and cannot fully utilize the structural information available in the software network. To solve these problems, a supervised deep learning method is proposed based on graph neural network technology. First, the project is built as a software network and the network embedding learning method Node2Vec is used to learn the node representation. Then, the node representation is mapped into a score through a simple dense network. Second, the aggregation function of the graph neural networks (GNNs) is improved to aggregate important scores instead of node embedding. The direction and weight information between nodes are also considered when aggregating the scores of neighbor nodes. Finally, the nodes are ranked in descending order according to the predicted score output by the model. To evaluate the effectiveness of the proposed method, it is applied to eight Java open-source software systems. The experimental results show that the proposed method performs better than benchmark methods. In the top 10 key candidates, the proposed method achieves 6.4% higher recall and 3.5% higher precision than the state-of-the-art.

    参考文献
    [1] LaToza TD, Venolia G, DeLine R. Maintaining mental models: A study of developer work habits. In: Proc. of the 28th Int'l Conf. on Software Engineering. 2006. 492-501.
    [2] Maggie H, Katerina GP. Common trends in software fault and failure data. IEEE Trans. on software engineering. 2009, 35(4): 484-496.
    [3] Wang S, Liu T, Nam J, et al. Deep semantic feature learning for software defect prediction. IEEE Trans. on Software Engineering, 2018.
    [4] Li J, He P, Zhu J, et al. Software defect prediction via convolutional neural network. In: Proc. of the 2017 IEEE Int'l Conf. on Software Quality, Reliability and Security (QRS). IEEE, 2017. 318-328.
    [5] Qu Y, Liu T, Chi J, et al. node2defect: Using network embedding to improve software defect prediction. In: Proc. of the 33rd IEEE/ACM Int'l Conf. on Automated Software Engineering (ASE). IEEE, 2018. 844-849.
    [6] Ding Y, Li B, He P. An improved approach to identifying key classes in weighted software network. Mathematical Problems in Engineering, 2016.
    [7] Jiang L, Jing Y, Hu S, et al. Identifying node importance in a complex network based on node bridging feature. Applied Sciences, 2018, 8(10): 1914.
    [8] Zhang J, Luo Y. Degree centrality, betweenness centrality, and closeness centrality in social network. In: Proc. of the 2nd Int'l Conf. on Modelling, Simulation and Applied Mathematics (MSAM 2017). Atlantis Press, 2017. 300-303.
    [9] Mahmoody A, Tsourakakis CE, Upfal E. Scalable betweenness centrality maximization via sampling. In: Proc. of the 22nd ACM SIGKDD Int'l Conf. on Knowledge Discovery and Data Mining. 2016. 1765-1773.
    [10] Lü L, Zhou T, Zhang QM, et al. The H-index of a network node and its relation to degree and coreness. Nature Communications, 2016, 7(1): 1-7.
    [11] Garas A, Schweitzer F, Havlin S. A k-shell decomposition method for weighted networks. New Journal of Physics, 2012, 14(8): 083030.
    [12] Tortosa L, Vicent JF, Yeghikyan G. An algorithm for ranking the nodes of multiplex networks with data based on the PageRank concept. Applied Mathematics and Computation, 2021, 392: 125676.
    [13] Pan WF, Li B, Ma YT, et al. Identifying the key packages using weighted PageRank algorithm. Acta Electonica Sinica, 2014, 42(11): 2174 (in Chinese with English abstract). 潘伟丰, 李兵, 马于涛, 等. 基于加权PageRank算法的关键包识别方法. 电子学报, 2014, 42(11): 2174.
    [14] Zeng C, Zhou CY, Lv SK, et al. GCN2defect: Graph convolutional networks for SMOTETomek-based software defect prediction. In: Proc. of the 32nd IEEE Int'l Symp. on Software Reliability Engineering (ISSRE). IEEE, 2021. 69-79.
    [15] Gong M, Zhou H, Qin AK, et al. Self-paced co-training of graph neural networks for semi-supervised node classification. IEEE Trans. on Neural Networks and Learning Systems, 2022.
    [16] Liu J, Zheng J, Wu J, et al. FA-GNN: Filter and augment graph neural networks for account classification in Ethereum. IEEE Trans. on Network Science and Engineering, 2022.
    [17] Guo JY, Li RH, Zhang Y, Wang GR. Graph neural network based anomaly detection in dynamic networks. Ruan Jian Xue Bao/ Jourmal of Sofware, 2020, 31(3): 748-762 (in Chinese with English abstract). http://www.jos.org.cn/1000-9825/5903.htm[doi: 10. 13328/j.cnki.jos.005903] 郭嘉琰, 李荣华, 张岩, 王国仁. 基于图神经网络的动态网络异常检测算法. 软件学报, 2020, 31(3): 748-762. http://www.jos.org.cn/1000-9825/5903.htm [doi: 10. 13328/j.cnki.jos.005903]
    [18] Xie Y, Liang Y, Gong M, et al. Semisupervised graph neural networks for graph classification. IEEE Trans. on Cybernetics, 2022.
    [19] Bai L, Cui L, Jiao Y, et al. Learning backtrackless aligned-spatial graph convolutional networks for graph classification. IEEE Trans. on Pattern Analysis and Machine Intelligence, 2020, 44(2): 783-798.
    [20] Park N, Kan A, Dong XL, et al. Estimating node importance in knowledge graphs using graph neural networks. In: Proc. of the 25th ACM SIGKDD Int'l Conf. on Knowledge Discovery & Data Mining. 2019. 596-606.
    [21] Grover A, Leskovec J. Node2Vec: Scalable feature learning for networks. In: Proc. of the 22nd ACM SIGKDD Int'l Conf. on Knowledge Discovery and Data Mining. 2016. 855-864.
    [22] Veličković P, Cucurull G, Casanova A, et al. Graph attention networks. arXiv: 1710.10903, 2017.
    [23] Du X, Wang T, Pan W, et al. COSPA: Identifying key classes in object-oriented software using preference aggregation. IEEE Access, 2021, 9: 114767-114780.
    [24] Zhang JX, Song K, He P, et al. Identification of key classes in software systems based on graph neural networks. Computer Science, 2021, 48(12): 149-158 (in Chinese with English abstract). 张健雄, 宋坤, 何鹏, 等. 基于图神经网络的软件系统中关键类的识别. 计算机科学, 2021, 48(12): 149-158.
    [25] Pan W, Ming H, Chang CK, et al. ElementRank: Ranking Java software classes and packages using a multilayer complex network-based approach. IEEE Trans. on Software Engineering, 2019, 47(10): 2272-2295.
    [26] Pan W, Song B, Li K, et al. Identifying key classes in object-oriented software using generalized k-core decomposition. Future Generation Computer Systems, 2018, 81: 188-202.
    [27] Şora I, Chirila CB. Finding key classes in object-oriented software systems by techniques based on static analysis. Information and Software Technology, 2019, 116: 106176.
    [28] Li H, Wang T, Pan W, et al. Mining key classes in java projects by examining a very small number of classes: A complex network-based approach. IEEE Access, 2021, 9: 28076-28088.
    [29] Young HP, Levenglick A. A consistent extension of Condorcet's election principle. SIAM Journal on Applied Mathematics, 1978, 35(2): 285-300.
    [30] Young HP. Condorcet's theory of voting. American Political Science Review, 1988, 82(4): 1231-1244.
    [31] Shah NB, Wainwright MJ. Simple, robust and optimal ranking from pairwise comparisons. The Journal of Machine Learning Research, 2017, 18(1): 7246-7283.
    [32] Fan C, Zeng L, Ding Y, et al. Learning to identify high betweenness centrality nodes from scratch: A novel graph neural network approach. In: Proc. of the 28th ACM Int'l Conf. on Information and Knowledge Management. 2019. 559-568.
    [33] Kitsak M, Gallos LK, Havlin S, et al. Identification of influential spreaders in complex networks. Nature physics, 2010, 6(11): 888-893.
    [34] Wu Z, Pan S, Chen F, et al. A comprehensive survey on graph neural networks. IEEE Trans. on Neural Networks and Learning Systems, 2020, 32(1): 4-24.
    [35] Hamilton W, Ying Z, Leskovec J. Inductive representation learning on large graphs. Advances in Neural Information Processing Systems, 2017, 30.
    [36] He P, Li B, Ma Y, et al. Using software dependency to bug prediction. Mathematical Problems in Engineering, 2013.
    [37] He P, Wang P, Li B, et al. An evolution analysis of software system based on multi-granularity software network. Acta Electonica Sinica, 2018, 46(2): 257 (in Chinese with English abstract). 何鹏, 王鹏, 李兵, 等. 基于多粒度软件网络模型的软件系统演化分析. 电子学报, 2018, 46(2): 11.
    [38] Macbeth G, Razumiejczyk E, Ledesma RD. Cliff's Delta calculator: A non-parametric effect size program for two groups of observations. Universitas Psychologica, 2011, 10(2): 545-555.
    引证文献
    网友评论
    网友评论
    分享到微博
    发 布
引用本文

周纯英,曾诚,何鹏,张龑. GKCI:改进的基于图神经网络的关键类识别方法.软件学报,2023,34(6):2509-2525

复制
分享
文章指标
  • 点击次数:1471
  • 下载次数: 4016
  • HTML阅读次数: 2156
  • 引用次数: 0
历史
  • 收稿日期:2022-09-05
  • 最后修改日期:2022-10-10
  • 在线发布日期: 2023-01-13
  • 出版日期: 2023-06-06
文章二维码
您是第19705195位访问者
版权所有:中国科学院软件研究所 京ICP备05046678号-3
地址:北京市海淀区中关村南四街4号,邮政编码:100190
电话:010-62562563 传真:010-62562533 Email:jos@iscas.ac.cn
技术支持:北京勤云科技发展有限公司

京公网安备 11040202500063号