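Fusing Code and Documents to Mine Software Functional Features
Authors: Shen Qi (沈琦), Qian Ying (钱莹), Zou Yanzhen (邹艳珍), Wu Shijun (伍仕骏), Xie Bing (谢冰)

Author biographies:

Shen Qi (b. 1995), male, Ph.D. candidate. Research interests: software engineering, software reuse, and automatic code generation.
Qian Ying (b. 1994), female, master's degree. Research interests: software engineering and software reuse.
Zou Yanzhen (b. 1976), female, Ph.D., associate professor, CCF professional member. Research interests: software engineering, software reuse, knowledge graphs, and intelligent software development.
Wu Shijun (b. 1998), male, Ph.D. candidate. Research interests: software engineering and software reuse.
Xie Bing (b. 1970), male, Ph.D., professor, doctoral supervisor, CCF senior member. Research interests: software engineering, formal methods, software reuse, and intelligent software development.

Corresponding author:

Zou Yanzhen, E-mail: zouyz@pku.edu.cn

CLC number:

TP311

Funding:

National Natural Science Foundation of China (61972006); National Natural Science Fund for Distinguished Young Scholars (61525201)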

Abstract:

In software reuse, a concise and clear natural language description of software functionality is the basis for quickly understanding a candidate software project or code library. However, open source software often lacks high-quality functional documentation, which makes this process more complex and difficult. This study proposes a functional feature mining approach that fuses code and documentation. The approach describes functional features as verb-object phrases, automatically extracts them by iteratively mining source code and software documents such as Stack Overflow discussion threads, associates each functional feature with a corresponding API usage example, and builds a hierarchical functional feature view for users. In experiments on several open source software projects and their related heterogeneous data, the functional features generated by the approach cover 95.38% of the functions listed in the official documentation, and the accuracy of the mined sentences and functional features reaches 93.78% and 92.57%, respectively. Compared with the existing tools TaskNav and APITasks, the approach improves average accuracy by 28.78% and 11.56%, respectively.
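To make the idea of verb-object functional features concrete, the following is a minimal sketch in Python. It is not the authors' pipeline: the verb list, the stop-word list, and the function names `split_identifier`, `extract_verb_object`, and `build_feature_view` are all assumptions made for illustration, and a real system would likely rely on a part-of-speech tagger or parser rather than a fixed vocabulary. The sketch treats method names (code) and discussion titles (documents) uniformly, extracts a verb and its object from each, and groups the results into a two-level view (verb, then object).

```python
import re
from collections import defaultdict

# Hypothetical seed vocabulary (an assumption of this sketch, not from the paper).
ACTION_VERBS = {"create", "read", "write", "parse", "convert",
                "add", "remove", "set", "get"}
STOP_WORDS = {"a", "an", "the", "how", "to", "do", "i", "in", "of",
              "from", "for", "with", "using", "on", "into"}


def split_identifier(name):
    """Split a camelCase or snake_case identifier into lowercase words."""
    parts = re.findall(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+|\d+", name)
    return [p.lower() for p in parts]


def extract_verb_object(text):
    """Return (verb, object) for the first known verb in the text, else None."""
    words = []
    for token in re.findall(r"[A-Za-z_][A-Za-z0-9_]*", text):
        words.extend(split_identifier(token))
    for i, word in enumerate(words):
        if word not in ACTION_VERBS:
            continue
        obj = []
        for w in words[i + 1:]:
            if w in STOP_WORDS:
                if obj:
                    break      # the object phrase ends at the next stop word
                continue       # skip stop words before the object starts
            obj.append(w)
        if obj:
            return word, " ".join(obj)
    return None


def build_feature_view(sources):
    """Group features as verb -> object -> list of supporting sources."""
    view = defaultdict(lambda: defaultdict(list))
    for source in sources:
        feature = extract_verb_object(source)
        if feature:
            verb, obj = feature
            view[verb][obj].append(source)
    return {verb: dict(objs) for verb, objs in view.items()}


if __name__ == "__main__":
    samples = [
        "createWorkbook",                        # method name (code)
        "How do I create a workbook in Excel?",  # discussion title (document)
        "read_cell_value",
        "How to read cell value from a sheet",
        "Why is my build failing?",              # no known verb, so it is skipped
    ]
    for verb, objects in build_feature_view(samples).items():
        print(verb, objects)
```

In this sketch a code identifier and a discussion title that describe the same capability fall under the same verb-object key, which loosely mirrors the idea of combining evidence from both sources. The iterative mining and the association of API usage examples described in the abstract are not modeled here.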

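Cite this article

Shen Q, Qian Y, Zou YZ, Wu SJ, Xie B. Fusing code and documents to mine software functional features. Ruan Jian Xue Bao/Journal of Software, 2021,32(4):1023-1038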

History
  • Received: 2020-09-13
  • Last revised: 2020-10-26
  • Published online: 2021-01-22
  • Publication date: 2021-04-06