Code-search-oriented Function Multigraph Embedding

doi:10.13328/j.cnki.jos.006940

Journal of Software, Volume 35, Issue 8, 2024, pp. 3809-3823

Authors:
XU Yang, CHEN Xiao-Jie, TANG De-You, HUANG Han
College of Software Engineering, South China University of Technology, Guangzhou 510006, China

CLC Number: TP311

Abstract:

Improving the accuracy of matching natural language queries against highly structured programming language source code is a fundamental concern in code search, and accurate extraction of code features is one of the key challenges. The semantics of a statement depends not only on the statement itself but also on its context, and structural models of code provide rich contextual information for understanding what the code does. This study proposes a code search method based on function multigraph embedding. Using an early fusion strategy, it fuses the data dependencies between code statements into the control flow graph (CFG) to construct a function multigraph that represents the code. Through data dependency edges, the multigraph explicitly expresses dependencies between indirect predecessor and successor nodes that the CFG lacks, enriching the contextual information of each statement node. Because the multigraph's edges are heterogeneous, the study uses a relational graph convolutional network (R-GCN) to extract code features from the function multigraph. Experiments on a public dataset show that the proposed method improves the mean reciprocal rank (MRR) by more than 5% over existing methods based on code text and structural models. Ablation experiments further show that the control flow graph contributes more to search accuracy than the data dependence graph (DDG).
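
The fusion and feature-extraction steps described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the toy function, the edge lists, the feature vectors, the weight matrices, and the names `build_multigraph` and `rgcn_layer` are all assumptions made for illustration. The sketch keeps CFG and DDG edges over the same statement nodes, tags each edge with its relation type, and applies one relation-aware propagation step in the general style of a relational graph convolutional network, with a separate weight matrix per edge type and mean normalization per relation.

```python
from collections import defaultdict

# Hypothetical toy function, one node per statement (not from the paper):
#   0: x = read()
#   1: y = x + 1
#   2: if y > 0:
#   3:     z = y * 2
#   4: print(x)
CFG_EDGES = [(0, 1), (1, 2), (2, 3), (2, 4), (3, 4)]  # control flow
DDG_EDGES = [(0, 1), (1, 2), (1, 3), (0, 4)]          # definition-to-use


def build_multigraph(cfg_edges, ddg_edges):
    """Early fusion: both edge sets share the statement nodes, and every
    edge carries its relation type. Parallel edges are allowed."""
    edges = [(src, dst, "cfg") for src, dst in cfg_edges]
    edges += [(src, dst, "ddg") for src, dst in ddg_edges]
    return edges


def matvec(weight, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * v for w, v in zip(row, vector)) for row in weight]


def rgcn_layer(features, edges, rel_weights, self_weight):
    """One relational graph convolution step:
    h_i' = ReLU(W_0 h_i + sum_r mean_{j in N_i^r} W_r h_j),
    where N_i^r are the incoming neighbours of node i under relation r."""
    neighbours = defaultdict(lambda: defaultdict(list))
    for src, dst, rel in edges:
        neighbours[dst][rel].append(src)

    updated = []
    for i, h_i in enumerate(features):
        acc = matvec(self_weight, h_i)  # self-loop term
        for rel, sources in neighbours[i].items():
            for j in sources:
                msg = matvec(rel_weights[rel], features[j])
                acc = [a + m / len(sources) for a, m in zip(acc, msg)]
        updated.append([max(0.0, a) for a in acc])  # ReLU
    return updated


# Assumed 2-dimensional statement features and fixed (untrained) weights.
graph = build_multigraph(CFG_EDGES, DDG_EDGES)
features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, 0.5], [0.0, 0.0]]
identity = [[1.0, 0.0], [0.0, 1.0]]
weights = {"cfg": identity, "ddg": [[0.5, 0.0], [0.0, 0.5]]}
hidden = rgcn_layer(features, graph, weights, identity)
```

In this toy example, statement 4 is only an indirect successor of statement 0 in the CFG (via statements 1 and 2), but the DDG edge from 0 to 4 makes the dependency explicit, so information from statement 0 reaches statement 4 in a single propagation step. In a real model the per-relation weights would be learned and several layers would typically be stacked.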

Key words: code search; control flow graph (CFG); data dependence graph (DDG); function multigraph

Citation:

XU Yang, CHEN Xiao-Jie, TANG De-You, HUANG Han. Code-search-oriented Function Multigraph Embedding. Journal of Software, 2024, 35(8): 3809-3823

History

Received: May 9, 2022
Revised: October 4, 2022
Online: July 26, 2023
Published: August 6, 2024
