Code Search with Generative Adversarial Game
Author:
Affiliation:

CLC Number: TP311

    Abstract:

    Deep learning-based code search methods perform retrieval by computing the similarity between the learned representations of a code snippet and a natural-language description. However, this approach does not consider the true probability distribution of relevance between code and descriptions. To address this problem, this study proposes a code search method based on a generative adversarial game, which combines the code-description relevance modeling of the classical probabilistic model with the feature extraction of the vector space model. The generative adversarial game is used to incorporate the probability distribution between code and descriptions into the alternating training of a generator and a discriminator. In the process, the code encoder and the description encoder are optimized, yielding high-quality code and description representations for the code search task. Experiments on a public dataset show that the proposed method improves Recall@10, MRR@10, and NDCG@10 by 8.4%, 32.5%, and 24.3%, respectively, compared with the DeepCS method.
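The three metrics named in the abstract can be illustrated with a minimal sketch. It assumes each query has exactly one relevant code snippet with binary relevance, and that `ranks` (a hypothetical input, not from the paper) holds the 1-based position of that snippet in each query's result list. The paper's exact definitions may differ.

```python
import math

def recall_at_k(ranks, k=10):
    """Fraction of queries whose relevant snippet appears in the top k."""
    return sum(1 for r in ranks if r <= k) / len(ranks)

def mrr_at_k(ranks, k=10):
    """Mean reciprocal rank; a rank beyond k contributes 0."""
    return sum(1.0 / r if r <= k else 0.0 for r in ranks) / len(ranks)

def ndcg_at_k(ranks, k=10):
    """NDCG under the single-relevant-item assumption.

    With one relevant item, the ideal DCG is 1, so each query
    contributes 1 / log2(rank + 1) if the item is in the top k.
    """
    return sum(1.0 / math.log2(r + 1) if r <= k else 0.0 for r in ranks) / len(ranks)

# Hypothetical ranks for four queries; the third falls outside the top 10.
ranks = [1, 3, 12, 2]
print(recall_at_k(ranks), mrr_at_k(ranks), ndcg_at_k(ranks))
```

Recall@k only asks whether the relevant snippet was retrieved, while MRR@k and NDCG@k also reward placing it higher. That difference is one plausible reason the reported gains differ so much across the three metrics.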

Citation:

Zhang XP, Liu JX, Hu HZ, Liu Y. Code search with generative adversarial game. Ruan Jian Xue Bao/Journal of Software, 2024, 35(12): 5382–5396.

History
  • Received: January 25, 2023
  • Revised: April 17, 2023
  • Online: February 5, 2024
  • Published: December 6, 2024