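[Keywords]
[Abstract]
GitHub is a well-known open-source software development community in which developers use issue tracking systems to handle issues in open-source projects. While discussing a defect issue, developers may point to issues in other projects that are related to it (called cross-project related issues), which provide reference information for fixing the defect. However, GitHub hosts more than 200 million open-source projects and 1.2 billion issues, so identifying and retrieving cross-project related issues manually is extremely time-consuming. This paper proposes CPIRecom, a method that automatically recommends cross-project related issues for defect issues. First, to build a pre-selection set, issues are filtered by the number of historical related issue pairs between projects and by the time interval between issue postings. Second, for accurate recommendation, textual features are extracted with the pre-trained BERT model and project features are analyzed. A random forest algorithm then computes the probability that each pre-selected issue is related to the defect issue, and the recommendation list is obtained by ranking these probabilities. The use of CPIRecom on the GitHub platform is simulated; the method achieves a mean reciprocal rank of 0.603 and a top-5 recall of 0.715.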
[Keywords]
[Abstract]
GitHub is a well-known open-source software development community in which developers use the issue tracking system of each open-source project to address issues. When discussing an issue about a defect, developers may point out issues from other projects that are related to the defect, called cross-project issues, which provide reference information for fixing it. However, GitHub hosts more than 200 million open-source projects and 1.2 billion issues, which makes identifying and retrieving cross-project issues manually extremely time-consuming. This study presents CPIRecom, a method for recommending cross-project issues for open-source software defects. First, a pre-selection set is built by filtering issues according to the number of historical related issue pairs between projects and the time interval between issue reports. Then, for accurate recommendation, the method extracts textual features with the pre-trained BERT model, analyzes project features, uses a random forest classifier to compute the probability that each issue in the pre-selection set is related to the defect, and ranks the issues by this probability to obtain the recommendation list. The application of CPIRecom on the GitHub platform is simulated. On the simulated test set, CPIRecom achieves a mean reciprocal rank of 0.603 and a Recall@5 of 0.715.
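As an illustration of the two reported metrics, the following minimal sketch computes mean reciprocal rank and Recall@5 over a set of ranked recommendation lists. It assumes the common textbook definitions (reciprocal rank of the first related issue; fraction of related issues found in the top k, averaged over queries); the exact definitions used in the evaluation may differ. The function names and the issue identifiers are hypothetical and chosen only for this example.

```python
from typing import Hashable, List, Sequence, Set, Tuple

# One query = (ranked recommendation list, set of truly related issues).
Query = Tuple[Sequence[Hashable], Set[Hashable]]


def reciprocal_rank(ranked: Sequence[Hashable], relevant: Set[Hashable]) -> float:
    """Return 1/rank of the first related issue, or 0.0 if none is ranked."""
    for position, item in enumerate(ranked, start=1):
        if item in relevant:
            return 1.0 / position
    return 0.0


def recall_at_k(ranked: Sequence[Hashable], relevant: Set[Hashable], k: int = 5) -> float:
    """Return the fraction of related issues that appear in the top k."""
    if not relevant:
        return 0.0
    hits = len(set(ranked[:k]) & relevant)
    return hits / len(relevant)


def mean_reciprocal_rank(queries: List[Query]) -> float:
    """Average the reciprocal rank over all queries."""
    return sum(reciprocal_rank(r, rel) for r, rel in queries) / len(queries)


def mean_recall_at_k(queries: List[Query], k: int = 5) -> float:
    """Average Recall@k over all queries."""
    return sum(recall_at_k(r, rel, k) for r, rel in queries) / len(queries)


# Hypothetical issue identifiers, for illustration only.
example_queries: List[Query] = [
    (["a/x#1", "b/y#2", "c/z#3"], {"b/y#2"}),       # first hit at rank 2
    (["d/w#4", "e/v#5"], {"e/v#5", "f/u#6"}),       # one of two related issues found
    (["g/t#7"], {"h/s#8"}),                         # no related issue recommended
]

mrr = mean_reciprocal_rank(example_queries)
recall5 = mean_recall_at_k(example_queries, k=5)
print(f"MRR = {mrr:.3f}, Recall@5 = {recall5:.3f}")
```

Under these definitions, a mean reciprocal rank of 0.603 would mean that the first related issue appears, on average, between the first and second positions of the recommendation list.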
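[CLC number]
[Funding]
Science and Technology Innovation 2030 "New Generation Artificial Intelligence" Major Project (2021ZD0112901); National Natural Science Foundation of China (62177003)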