Revisiting Retrieval-augmentation Strategy in Code Completion

doi:10.13328/j.cnki.jos.007226

Author:

ZOU Bai-Han
School of Computer Science, Fudan University, Shanghai 200438, China; Shanghai Key Laboratory of Data Science (Fudan University), Shanghai 201203, China
WANG Ying
School of Computer Science, Fudan University, Shanghai 200438, China; Shanghai Key Laboratory of Data Science (Fudan University), Shanghai 201203, China
PENG Xin
School of Computer Science, Fudan University, Shanghai 200438, China; Shanghai Key Laboratory of Data Science (Fudan University), Shanghai 201203, China
LOU Yi-Ling
School of Computer Science, Fudan University, Shanghai 200438, China; Shanghai Key Laboratory of Data Science (Fudan University), Shanghai 201203, China
LIU Li-Hua
Alibaba Group, Hangzhou 310030, China
ZHANG Xin-Dong
Alibaba Group, Hangzhou 310030, China
LIN Fan
Alibaba Group, Hangzhou 310030, China
LIU Ming-Wei
School of Computer Science, Fudan University, Shanghai 200438, China; Shanghai Key Laboratory of Data Science (Fudan University), Shanghai 201203, China

CLC Number: TP311


Abstract:

When writing code, software developers often refer to code snippets elsewhere in the project that implement similar functionality. Code generation models behave similarly: they use the code context provided in the input as a reference when generating code. Retrieval-augmented code completion builds on this idea. External code retrieved from a retrieval corpus is supplied as additional context to prompt the generation model to complete unfinished code. Existing retrieval-augmented code completion methods directly concatenate the input code and the retrieval results to form the input of the generation model. This carries the risk that retrieved code fragments mislead the model rather than guide it, resulting in inaccurate or irrelevant completions. Moreover, the retrieved code is concatenated with the input regardless of whether it is actually relevant to the input code. Consequently, the effectiveness of these methods depends heavily on the accuracy of the retrieval stage: if that stage fails to return usable code fragments, the quality of the subsequent completion may suffer. This paper presents an empirical study of the retrieval augmentation strategies used in existing code completion methods. Through qualitative and quantitative experiments, it analyzes how each stage of retrieval augmentation affects code completion. The study identifies three factors that influence the effect of retrieval augmentation: code granularity, code retrieval method, and post-processing method. Based on these findings, a code completion method named MAGIC (multi-stage optimization for retrieval augmented code completion) is proposed, which improves retrieval augmentation by optimizing the code retrieval strategy stage by stage.
Improved strategies, namely code segmentation, retrieval-reranking, and template prompt generation, are designed to strengthen the support that the code retrieval module gives the code completion model. These strategies also reduce the interference of irrelevant code during generation and improve the quality of the generated code. Experimental results on a Java code dataset show that, compared with existing retrieval-augmented code completion methods, MAGIC improves edit similarity and exact match by 6.76% and 7.81%, respectively. Compared with a large code model with 6B parameters, MAGIC saves 94.5% of GPU memory and 73.8% of inference time while improving edit similarity and exact match by 5.62% and 4.66%, respectively.
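
The three strategies named in the abstract (code segmentation, retrieval-reranking, and template prompt generation) can be illustrated with a minimal sketch. This is not the MAGIC implementation, whose details the abstract does not give. Every function name, the fixed-size line window, the Jaccard and cosine token scores, and the 0.2 relevance threshold are assumptions chosen only to show how a staged pipeline can drop irrelevant retrieved code instead of always concatenating it with the input.

```python
import re
from collections import Counter
from math import sqrt


def tokenize(code):
    """Split code into identifier-like tokens (a crude stand-in for a real lexer)."""
    return re.findall(r"[A-Za-z_]\w*", code)


def split_into_segments(source, window=4):
    """Stage 1 (assumed form): cut source into fixed-size windows of non-blank lines."""
    lines = [ln for ln in source.splitlines() if ln.strip()]
    return ["\n".join(lines[i:i + window]) for i in range(0, len(lines), window)]


def jaccard(a, b):
    """Cheap set-overlap score used for the first retrieval pass."""
    sa, sb = set(tokenize(a)), set(tokenize(b))
    union = sa | sb
    return len(sa & sb) / len(union) if union else 0.0


def cosine(a, b):
    """Token-frequency cosine similarity used for reranking."""
    ca, cb = Counter(tokenize(a)), Counter(tokenize(b))
    dot = sum(ca[t] * cb[t] for t in ca)
    norm = sqrt(sum(v * v for v in ca.values())) * sqrt(sum(v * v for v in cb.values()))
    return dot / norm if norm else 0.0


def retrieve_and_rerank(query, segments, top_k=5, keep=1, min_score=0.2):
    """Stage 2 (assumed form): retrieve top_k by Jaccard, rerank by cosine,
    and discard candidates below min_score so irrelevant code is not passed on."""
    candidates = sorted(segments, key=lambda s: jaccard(query, s), reverse=True)[:top_k]
    reranked = sorted(candidates, key=lambda s: cosine(query, s), reverse=True)
    return [s for s in reranked if cosine(query, s) >= min_score][:keep]


def build_prompt(unfinished, references):
    """Stage 3 (assumed form): wrap references in a comment template placed
    before the unfinished code; with no references, return the input unchanged."""
    if not references:
        return unfinished
    blocks = []
    for ref in references:
        commented = "\n".join("// " + ln for ln in ref.splitlines())
        blocks.append("// Reference code from the project:\n" + commented)
    return "\n".join(blocks) + "\n" + unfinished
```

In this sketch the prompt for the generation model is `build_prompt(query, retrieve_and_rerank(query, split_into_segments(corpus)))`. When nothing in the corpus passes the threshold, the model sees only the unfinished code, which is the behavior that direct concatenation lacks.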

Key words: retrieval augmentation; large language model; code completion; prompt learning; multi-stage optimization

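Citation:

ZOU Bai-Han, WANG Ying, PENG Xin, LOU Yi-Ling, LIU Li-Hua, ZHANG Xin-Dong, LIN Fan, LIU Ming-Wei. Revisiting Retrieval-augmentation Strategy in Code Completion. Journal of Software (软件学报): 1-27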


History

Received: December 21, 2023
Revised: March 08, 2024
Online: July 03, 2024

Copyright: Institute of Software, Chinese Academy of Sciences Beijing ICP No. 05046678-4
Address：4# South Fourth Street, Zhong Guan Cun, Beijing 100190,Postal Code：100190
Phone：010-62562563 Fax：010-62562533 Email：jos@iscas.ac.cn