Code-search-oriented Function Multigraph Embedding

doi:10.13328/j.cnki.jos.006940


Journal of Software, 2024, 35(8): 3809-3823


Authors: XU Yang (徐杨), CHEN Xiao-Jie (陈晓杰), TANG De-You (汤德佑), HUANG Han (黄翰)
Affiliation: College of Software Engineering, South China University of Technology, Guangzhou 510006, China
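About the authors: XU Yang (born 1970), male, Ph.D., lecturer, CCF professional member; his main research interests are intelligent software engineering, machine learning, and distributed computing. CHEN Xiao-Jie (born 1995), male, M.S.; his main research interests are deep learning and intelligent software engineering. TANG De-You (born 1976), male, Ph.D., associate professor, CCF professional member; his main research interests are databases, high-performance computing, and software optimization. HUANG Han (born 1980), male, Ph.D., professor, doctoral supervisor, CCF distinguished member; his main research interests are the theoretical foundations and applications of micro-computing methods, and intelligent software engineering.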
Corresponding author: HUANG Han, E-mail: hhan@scut.edu.cn
CLC number: TP311
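Fund Project: General Program of the Natural Science Foundation of Guangdong Province (2020A1515010696, 2022A1515011491); General Program of the National Natural Science Foundation of China (61876207, 62276103); General Program of the Fundamental Research Funds for the Central Universities (2020ZYGXZR014); Open Fund of the Guangdong Provincial Key Laboratory of Fiscal and Tax Big Data (2019B121203012)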



Abstract:

How to improve the matching accuracy between two heterogeneous inputs, natural language queries and highly structured programming-language source code, is a fundamental problem in code search. Accurate extraction of code features is one of the keys to improving matching accuracy. The semantics of a code statement depends not only on the statement itself but also on its context, and structural models of code provide rich contextual information for understanding what the code does. This study proposes a code search method based on function multigraph embedding. Using an early fusion strategy, the method fuses the data dependencies between code statements into the control flow graph (CFG), building a function multigraph to represent the code. Through data-dependence edges, the multigraph explicitly expresses dependencies between statement nodes that are not direct predecessors or successors of each other, which the CFG lacks, and thus enriches the context of each statement node. To handle the heterogeneous edge types of the multigraph, the method uses a relational graph convolutional network (R-GCN) to extract code features from the function multigraph. Experiments on a public dataset show that, compared with existing methods based on code text and on structural models, the proposed method improves the mean reciprocal rank (MRR) by more than 5%. Ablation experiments further show that the control flow graph contributes more to search accuracy than the data dependence graph.
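The abstract names two mechanisms without giving implementation details: early fusion of data-dependence edges into the CFG as typed edges of a single multigraph, and a relational graph convolution that handles each edge type separately. The pure-Python sketch below illustrates both under our own assumptions; it is not the authors' implementation. The relation names `"cfg"`/`"ddg"`, the toy function and its edge lists, the random stand-in statement embeddings, the mean pooling, and every function name (`build_function_multigraph`, `rgcn_layer`, `mean_pool`, `mrr`) are hypothetical. The layer follows the standard R-GCN update of Schlichtkrull et al. (a self-connection plus one weight matrix per relation, normalized by the number of in-neighbours along that relation); the paper's statement encoder, layer count, and training objective are not described here. `mrr` uses the usual definition of mean reciprocal rank, the metric the abstract reports.

```python
import random
from collections import defaultdict
from typing import Dict, List, Tuple

Vector = List[float]
Matrix = List[List[float]]
Edge = Tuple[int, int, str]  # (source statement, target statement, relation type)


def build_function_multigraph(num_nodes: int, cfg_edges, ddg_edges) -> List[Edge]:
    """Hypothetical early fusion: CFG and DDG edges share one node set.
    Each edge keeps its type, so a CFG edge and a DDG edge between the same
    pair of statements survive as two parallel edges (hence a multigraph)."""
    edges = [(s, t, "cfg") for s, t in cfg_edges] + [(s, t, "ddg") for s, t in ddg_edges]
    for s, t, _ in edges:
        if not (0 <= s < num_nodes and 0 <= t < num_nodes):
            raise ValueError(f"edge ({s}, {t}) references an unknown statement")
    return edges


def matvec(w: Matrix, v: Vector) -> Vector:
    return [sum(a * b for a, b in zip(row, v)) for row in w]


def rgcn_layer(h: List[Vector], edges: List[Edge], w_self: Matrix,
               w_rel: Dict[str, Matrix]) -> List[Vector]:
    """One relational graph convolution layer (Schlichtkrull et al. form):
    h_i' = ReLU(W_0 h_i + sum_r sum_{j in N_i^r} W_r h_j / |N_i^r|),
    where N_i^r are the in-neighbours of node i along edges of type r.
    Edges whose type has no entry in w_rel are ignored."""
    in_nbrs = defaultdict(list)  # (target, relation) -> source nodes
    for s, t, r in edges:
        in_nbrs[(t, r)].append(s)
    out = []
    for i, h_i in enumerate(h):
        acc = matvec(w_self, h_i)  # self-connection W_0 h_i
        for r, w_r in w_rel.items():  # separate weight matrix per edge type
            nbrs = in_nbrs.get((i, r), [])
            for j in nbrs:
                msg = matvec(w_r, h[j])
                acc = [a + m / len(nbrs) for a, m in zip(acc, msg)]
        out.append([max(0.0, a) for a in acc])  # ReLU
    return out


def mean_pool(h: List[Vector]) -> Vector:
    """Average statement vectors into one code vector (a pooling assumption)."""
    return [sum(col) / len(h) for col in zip(*h)]


def mrr(ranks: List[int]) -> float:
    """Mean reciprocal rank: ranks[k] is the 1-based rank of the correct
    code snippet returned for query k."""
    return sum(1.0 / r for r in ranks) / len(ranks)


def rand_matrix(rows: int, cols: int) -> Matrix:
    return [[random.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


if __name__ == "__main__":
    # Toy function, one node per statement:
    #   s0: x = a + 1   s1: if b > 0   s2: y = x * 2   s3: y = 0   s4: return y
    cfg = [(0, 1), (1, 2), (1, 3), (2, 4), (3, 4)]
    ddg = [(0, 2), (2, 4), (3, 4)]  # s0 -> s2 links statements not adjacent in the CFG
    graph = build_function_multigraph(5, cfg, ddg)

    random.seed(0)
    dim = 4
    h0 = rand_matrix(5, dim)  # stand-in statement embeddings
    h1 = rgcn_layer(h0, graph, rand_matrix(dim, dim),
                    {"cfg": rand_matrix(dim, dim), "ddg": rand_matrix(dim, dim)})
    print(len(graph), len(mean_pool(h1)))  # 8 4
    print(round(mrr([1, 2, 4]), 4))  # 0.5833
```

In the toy function, the DDG edge s0 -> s2 connects two statements that are not adjacent in the CFG, while the pairs (s2, s4) and (s3, s4) carry both a CFG edge and a DDG edge. Keeping a separate weight matrix per relation lets the layer treat such parallel edges differently instead of collapsing them into one untyped edge.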

Key words: code search; control flow graph (CFG); data dependence graph (DDG); function multigraph

Cite this article:

XU Yang, CHEN Xiao-Jie, TANG De-You, HUANG Han. Code-search-oriented Function Multigraph Embedding. Journal of Software, 2024, 35(8): 3809-3823


History:

Received: 2022-05-09
Last revised: 2022-10-04
Accepted:
Published online: 2023-07-26
Published: 2024-08-06
