[Keywords]
[Abstract]
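Source code retrieval is an important research problem in software engineering; its main task is to retrieve and reuse the APIs (application program interfaces) of software projects. As software projects grow larger and more complex, source code retrieval must, on the one hand, improve the accuracy of API queries expressed in natural language and, on the other hand, locate and present the relationships between a target API and its related code, so as to better help users understand the API's implementation logic and usage scenarios. To this end, a graph-embedding-based method for retrieving software project source code is proposed. The method automatically builds a code structure graph from a project's source code and represents the source code through graph embedding. On this basis, a user can enter a natural language question, and the method retrieves and returns a connected code subgraph composed of the relevant APIs and their associated information, thereby improving the efficiency of API retrieval and reuse. In retrieval experiments on the open-source projects Apache Lucene and POI, the F1 score of the method's results is 10% higher than that of the existing shortest-path-based method, and the average response time is significantly shorter.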
[Keywords]
[Abstract]
Searching software source code and locating a software project's APIs (application program interfaces) are important research problems in software engineering. As software projects grow larger and more complex, existing search tools face two main challenges. First, searches based on natural language questions need to return more accurate results. Second, the relationships among APIs need to be shown so that their underlying logic and usage scenarios can be understood more quickly. This study proposes a novel approach, based on graph embedding, to searching a software project's APIs. It aims to improve the accuracy of natural-language-based code graph search. A software project's code graph is built automatically from its source code, and the source code is then represented through graph embedding. For a natural language question, a connected code subgraph, composed of the relevant APIs and their associated relationships, is returned as the best answer. In experiments, the Apache Lucene and POI projects are used as examples for API search tasks. The results show that the proposed approach improves the F1-score by 10% over the existing shortest-path-based approach while significantly reducing the average response time.
[CLC number]
[Funding]
National Key Research and Development Program of China (2016YFB1000801); National Science Fund for Distinguished Young Scholars (61525201)