Abstract: When writing code, software developers often refer to code snippets in the same project that implement similar functionality. Code generation models behave similarly: when generating a code fragment, they use the code context provided in the input as a reference. Retrieval-augmented code completion exploits this by retrieving external code from a retrieval corpus and supplying it as additional context to prompt the generation model to complete the unfinished code fragment. Existing retrieval-augmented code completion methods directly concatenate the input code with the retrieval results and feed the combined text to the generation model. Although this gives the model more context, it also carries a risk: the retrieved code fragments may mislead the model rather than guide it, producing inaccurate or irrelevant code. Moreover, the retrieved code is concatenated with the input code regardless of whether it is actually relevant, so the effectiveness of these methods depends heavily on the accuracy of the code retrieval stage. If the retrieval stage fails to return usable code fragments, the subsequent code completion may suffer as well. This paper presents an empirical study of the retrieval augmentation strategies used in existing code completion methods. Through qualitative and quantitative experiments, it analyzes how each stage of retrieval augmentation affects code completion performance. The study focuses on three factors that influence the effect of retrieval augmentation: code granularity, the code retrieval method, and the post-processing method.
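The risk of directly joining the input code with whatever the retriever returns can be illustrated with a minimal sketch. This is not the implementation of any method studied in this paper: the token-overlap (Jaccard) retriever, the two-snippet corpus, and the names `retrieve_top1` and `build_prompt` are all hypothetical and chosen only for illustration.

```python
import re


def tokens(code: str) -> set[str]:
    """Split code into a set of identifier-like and numeric tokens."""
    return set(re.findall(r"[A-Za-z_]\w*|\d+", code))


def jaccard(a: str, b: str) -> float:
    """Token-set Jaccard similarity, used here as a stand-in retriever score."""
    ta, tb = tokens(a), tokens(b)
    union = ta | tb
    return len(ta & tb) / len(union) if union else 0.0


def retrieve_top1(query: str, corpus: list[str]) -> tuple[str, float]:
    """Return the highest-scoring snippet and its score (hypothetical retriever)."""
    best = max(corpus, key=lambda snippet: jaccard(query, snippet))
    return best, jaccard(query, best)


def build_prompt(unfinished: str, corpus: list[str]) -> str:
    """Baseline behaviour: prepend the top result no matter how low its score is."""
    retrieved, _score = retrieve_top1(unfinished, corpus)
    return retrieved + "\n" + unfinished


# Toy retrieval corpus of Java snippets (illustrative data only).
corpus = [
    "int sum(int[] xs) { int s = 0; for (int x : xs) s += x; return s; }",
    'String greet(String name) { return "Hello, " + name; }',
]

relevant_query = "int total(int[] xs) { int s = 0; for (int x : xs)"
unrelated_query = "void closeSocket(Socket sock) {"
```

For `relevant_query` the retriever returns the summation snippet, which is a useful reference. For `unrelated_query` every snippet scores zero, yet `build_prompt` still prepends one of them, so the model receives context that has nothing to do with the code being completed.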
Based on the findings of the empirical study, this paper proposes MAGIC (multi-stage optimization for retrieval augmented code completion), a code completion method that improves retrieval augmentation by optimizing the code retrieval strategy stage by stage. MAGIC introduces improved strategies such as code segmentation, retrieval and reranking, and template-based prompt generation. These strategies strengthen the support that the code retrieval module gives the code completion model, reduce the interference of irrelevant code during generation, and improve the quality of the generated code. Experimental results on a Java code dataset show that, compared with existing retrieval-augmented code completion methods, MAGIC improves edit similarity and exact match by 6.76% and 7.81%, respectively. Compared with a large code model with 6B parameters, MAGIC saves 94.5% of GPU memory and 73.8% of inference time while improving edit similarity and exact match by 5.62% and 4.66%, respectively.
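For readers unfamiliar with the two reported metrics, the following minimal sketch shows one common way to compute them for a single prediction. It assumes character-level Levenshtein distance normalized by the longer string, and exact match after trimming surrounding whitespace; the exact normalization and preprocessing used in the experiments are not stated here, so these choices are assumptions.

```python
def levenshtein(a: str, b: str) -> int:
    """Character-level edit distance (insertions, deletions, substitutions)."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def edit_similarity(pred: str, ref: str) -> float:
    """1 - distance / max length, in [0, 1]; assumed normalization."""
    longest = max(len(pred), len(ref))
    if longest == 0:
        return 1.0
    return 1.0 - levenshtein(pred, ref) / longest


def exact_match(pred: str, ref: str) -> bool:
    """True when prediction and reference agree after trimming whitespace."""
    return pred.strip() == ref.strip()
```

Under these assumptions, a prediction `return a - b;` against the reference `return a + b;` differs by one character out of 13, giving an edit similarity of 12/13 and an exact match of false. Dataset-level scores would typically be averages of these per-sample values.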