Software Change Prediction Based on Hybrid Graph Representation

CLC Number: TP311

    Abstract:

    Software change prediction aims to identify change-prone modules, helping software managers and developers allocate resources efficiently and reduce maintenance overhead. Extracting effective features from code is vital to building accurate prediction models. In recent years, researchers have shifted from traditional hand-crafted features to semantic features with stronger representation capabilities, typically extracted from abstract syntax tree (AST) node sequences. However, existing studies ignore the structural information in the AST and the rich semantic information in the code, so extracting the semantic features of code remains a challenging problem. To address this, this study proposes a change prediction method based on hybrid graph representation. First, the model combines the AST, control flow graph (CFG), data flow graph (DFG), and other structural information to construct a program graph representation of the code. Then, it uses a graph neural network to learn the semantic features of the program graph and uses the learned features to predict change-proneness. By integrating multiple kinds of semantic information, the model represents the code better. The effectiveness of the proposed method is verified by comparing it with the latest change prediction methods on various change datasets.
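The two steps described in the abstract (build a program graph with several edge types, then propagate information over it) can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes Python source parsed with the standard `ast` module, approximates control flow with "next statement" edges and data flow with "last assignment to use" edges, and replaces the trained graph neural network with one untrained round of per-edge-type mean aggregation. The names `build_hybrid_graph`, `one_hot_features`, `propagate`, `readout`, and `SAMPLE` are hypothetical.

```python
import ast
from collections import defaultdict

def build_hybrid_graph(source):
    """Return (labels, edges); edges are (src, dst, edge_type) triples."""
    tree = ast.parse(source)
    index, labels, edges = {}, [], []

    def nid(node):
        # Assign a stable integer id to each AST node on first use.
        if id(node) not in index:
            index[id(node)] = len(labels)
            labels.append(type(node).__name__)
        return index[id(node)]

    # AST edges: parent -> child, restricted to statement and expression nodes.
    for parent in ast.walk(tree):
        for child in ast.iter_child_nodes(parent):
            if isinstance(child, (ast.stmt, ast.expr)):
                edges.append((nid(parent), nid(child), "ast"))

    # Crude control flow (assumption): link consecutive statements in a block.
    for node in ast.walk(tree):
        for field in ("body", "orelse"):
            block = getattr(node, field, None)
            if isinstance(block, list):
                for a, b in zip(block, block[1:]):
                    edges.append((nid(a), nid(b), "cfg"))

    # Crude data flow (assumption): link each variable use to the most recent
    # assignment in source order, ignoring branches. Loads on a line are
    # handled before stores so that `x = x + 1` reads the previous x.
    names = [n for n in ast.walk(tree) if isinstance(n, ast.Name)]
    names.sort(key=lambda n: (n.lineno, isinstance(n.ctx, ast.Store), n.col_offset))
    last_def = {}
    for n in names:
        if isinstance(n.ctx, ast.Store):
            last_def[n.id] = nid(n)
        elif n.id in last_def:
            edges.append((last_def[n.id], nid(n), "dfg"))
    return labels, edges

def one_hot_features(labels):
    """Initial node features: one-hot encoding of the AST node type."""
    vocab = sorted(set(labels))
    return [[1.0 if lab == v else 0.0 for v in vocab] for lab in labels]

def propagate(features, edges):
    """One untrained round of mean aggregation, kept separate per edge type."""
    incoming = defaultdict(list)  # (dst, edge_type) -> source node ids
    for src, dst, etype in edges:
        incoming[(dst, etype)].append(src)
    updated = []
    for v, feat in enumerate(features):
        h = list(feat)
        for etype in ("ast", "cfg", "dfg"):
            srcs = incoming.get((v, etype), [])
            for k in range(len(h)):
                if srcs:
                    h[k] += sum(features[s][k] for s in srcs) / len(srcs)
        updated.append(h)
    return updated

def readout(features):
    """Average node vectors into a single graph-level vector."""
    return [sum(col) / len(features) for col in zip(*features)]

SAMPLE = """
x = 1
y = x + 2
if y > 2:
    x = y
print(x)
"""

labels, edges = build_hybrid_graph(SAMPLE)
graph_vector = readout(propagate(one_hot_features(labels), edges))
print(len(labels), len(edges), len(graph_vector))
```

In a real pipeline the graph-level vector would come from a trained graph neural network and feed a classifier that outputs change-proneness. The sketch only shows how several edge types can share one node set.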

Get Citation

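Yang XY, Liu A, Zhao L, Chen L, Zhang XF. Software change prediction based on hybrid graph representation. Ruan Jian Xue Bao/Journal of Software, 2024, 35(8): 3824-3842.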

History
  • Received: August 21, 2022
  • Revised: November 17, 2022
  • Online: September 13, 2023
  • Published: August 06, 2024