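Method of API Completion Based on Object Type
Authors: TANG Ze, LI Chuanyi, GE Jidong, LUO Bin
Corresponding author: LI Chuanyi, E-mail: lcy@nju.edu.cn
CLC number: TP311
Funding: National Natural Science Foundation of China (61802167, 61972197, 61802095); Natural Science Foundation of Jiangsu Province (BK20201250); sub-project of the cooperation agreement of the Huawei-Nanjing University Next Generation Programming Innovation Lab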
    Abstract:

    In recent years, as software technology has spread into more industries and fields and as software architecture and services computing have matured, the software industry has produced a large number of feature-rich third-party APIs and libraries, and developers increasingly rely on them when implementing software functions. Learning to use these APIs, however, is difficult and time-consuming, mainly for two reasons: 1) the relevant documentation is missing or wrong; 2) sample code showing API usage is scarce. Automatic API completion methods that help developers use APIs correctly and quickly are therefore of great practical value. Most existing API completion methods treat the code segment to be completed as plain text and ignore the influence that the type of the object an API belongs to has on API prediction. This study explores the role of object types in API completion and, inspired by the object state diagram, designs a completion method that uses the type of the object each API belongs to as a feature. Specifically, the subsequences of calls on the same object type are first extracted from the API call sequence, and a deep learning model encodes the state of each object; the object states are then used to generate a state representation of the entire method block, on which the completion is based. The method is evaluated on six popular Java projects. The experimental results show that the proposed object-type-aware API completion method achieves significantly higher prediction accuracy than the baseline models.
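    The first step described in the abstract, extracting per-type subsequences from an API call sequence, can be illustrated with a minimal sketch. This is not the authors' implementation: the input format (a list of `(object_type, api_name)` pairs), the function name `split_by_object_type`, and the sample calls are all assumptions made for illustration, and the deep learning encoder that would consume these subsequences is not shown.

```python
from typing import Dict, List, Tuple

# Hypothetical input format: one (object_type, api_name) pair per call,
# listed in the order the calls appear in the method body.
ApiCall = Tuple[str, str]


def split_by_object_type(calls: List[ApiCall]) -> Dict[str, List[str]]:
    """Group an API call sequence into one subsequence per object type.

    The relative order of calls inside each subsequence is preserved,
    and types appear in the order they are first seen.
    """
    groups: Dict[str, List[str]] = {}
    for object_type, api_name in calls:
        groups.setdefault(object_type, []).append(api_name)
    return groups


# Illustrative call sequence (made up for this sketch).
calls = [
    ("FileReader", "new"),
    ("BufferedReader", "new"),
    ("BufferedReader", "readLine"),
    ("StringBuilder", "append"),
    ("BufferedReader", "readLine"),
    ("BufferedReader", "close"),
]
subsequences = split_by_object_type(calls)
```

    In a pipeline like the one the abstract outlines, each subsequence would presumably be passed to a sequence encoder to obtain an object state, and the object states would then be combined into a representation of the whole method block.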

Cite this article

TANG Ze, LI Chuanyi, GE Jidong, LUO Bin. Method of API completion based on object type. Journal of Software (软件学报), 2022, 33(5): 1736-1757

History
  • Received: 2021-08-11
  • Last revised: 2021-10-09
  • Published online: 2022-01-28
  • Published: 2022-05-06