API Misuse Bug Detection Based on Deep Learning

doi:10.13328/j.cnki.jos.005722

Ruan Jian Xue Bao/Journal of Software, 2019, 30(5): 1342-1358

Authors: WANG Xin, CHEN Chi, ZHAO Yi-Fan, PENG Xin, ZHAO Wen-Yun
Affiliation (all authors): Software School, Fudan University, Shanghai 201203, China; Shanghai Key Laboratory of Data Science (Fudan University), Shanghai 201203, China
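About the authors: WANG Xin (1996-), female, from Quanzhou, Fujian, master's student, CCF student member; research interest: intelligent software development. PENG Xin (1979-), male, PhD, professor, doctoral supervisor, CCF senior member; research interests: intelligent software development, mobile computing, cloud computing. CHEN Chi (1993-), male, PhD student; research interest: code recommendation. ZHAO Wen-Yun (1964-), male, PhD, professor, doctoral supervisor, CCF senior member; research interests: software engineering, smart cities. ZHAO Yi-Fan (1995-), male, master's student, CCF professional member; research interest: intelligent software development.
Corresponding author: PENG Xin, E-mail: pengxin@fudan.edu.cn
Fund project: National Key Research and Development Program of China (2016YFB1000801)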

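Abstract:

Developers often need to use various application programming interfaces (APIs) to reuse existing software frameworks, class libraries, and the like. Because APIs are complex and their documentation is often incomplete, developers frequently misuse them, which introduces code defects. Detecting API misuse automatically requires API usage specifications against which API-using code can be checked. However, specifications suitable for automated detection are hard to obtain, and writing and maintaining them by hand is costly. To address this problem, this study applies recurrent neural networks from deep learning to learning API usage specifications and to detecting API misuse bugs. Training samples of API usage specifications are constructed through static analysis of a large body of open-source Java code, and a recurrent neural network is trained on these samples to learn the specifications. On this basis, the approach predicts statements in API-using code from their context and compares the predictions with the actual code to uncover potential API misuse bugs. The approach was implemented and evaluated experimentally on Java cryptography-related APIs and the code that uses them. The results show that it can, to a certain extent, detect API misuse bugs automatically.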

Key words: API misuse; usage specification; bug detection; deep learning
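The detection idea summarized in the abstract (learn which API call a given context usually leads to, predict that call, and flag code whose actual call disagrees) can be illustrated with a minimal Python sketch. It is not the authors' implementation: the paper trains a recurrent neural network, whereas this sketch swaps in a plain n-gram count model so that it stays self-contained. The training sequences, the `ContextModel` class, the context window `order`, and the top-n flagging rule are all hypothetical; the tokens merely name real `javax.crypto.Cipher` and `java.security.MessageDigest` methods.

```python
from collections import Counter, defaultdict

START = "<s>"  # padding token marking the start of a method body

# Hypothetical training samples: API call sequences that a static analyzer
# might extract from method bodies in open-source Java code.
TRAINING_SEQUENCES = [
    ["Cipher.getInstance", "Cipher.init", "Cipher.doFinal"],
    ["Cipher.getInstance", "Cipher.init", "Cipher.update", "Cipher.doFinal"],
    ["Cipher.getInstance", "Cipher.init", "Cipher.doFinal"],
    ["MessageDigest.getInstance", "MessageDigest.update", "MessageDigest.digest"],
]


class ContextModel:
    """Stand-in for the paper's RNN: an n-gram count model (not a neural
    network) that records which call follows the previous `order` calls."""

    def __init__(self, order=2):
        self.order = order
        self.counts = defaultdict(Counter)

    def _contexts(self, seq):
        # Yield (preceding-context tuple, actual next call) pairs.
        padded = [START] * self.order + list(seq)
        for i in range(self.order, len(padded)):
            yield tuple(padded[i - self.order:i]), padded[i]

    def fit(self, sequences):
        for seq in sequences:
            for ctx, nxt in self._contexts(seq):
                self.counts[ctx][nxt] += 1
        return self

    def predict(self, ctx, top_n=1):
        # Most frequent next calls for this context; [] if never seen.
        return [call for call, _ in self.counts.get(ctx, Counter()).most_common(top_n)]

    def check(self, seq, top_n=1):
        """Compare predictions with the actual code: report
        (position, actual call, predicted calls) wherever the actual call is
        not among the top-n predictions. Unseen contexts are not reported."""
        findings = []
        for pos, (ctx, actual) in enumerate(self._contexts(seq)):
            predicted = self.predict(ctx, top_n)
            if predicted and actual not in predicted:
                findings.append((pos, actual, predicted))
        return findings


if __name__ == "__main__":
    model = ContextModel(order=2).fit(TRAINING_SEQUENCES)
    # Suspicious usage: the cipher is never initialized before doFinal.
    print(model.check(["Cipher.getInstance", "Cipher.doFinal"]))
    # -> [(1, 'Cipher.doFinal', ['Cipher.init'])]
```

The sequence without `Cipher.init` is flagged because every training sequence that began with `Cipher.getInstance` continued with `Cipher.init`; in real Java code, calling `doFinal` on an uninitialized `Cipher` throws `IllegalStateException`. The `top_n` threshold is a design choice: with `top_n=1` the legitimate `Cipher.update` call in the four-call sequence would also be flagged, while `top_n=2` accepts it, so a larger threshold trades missed bugs for fewer false alarms.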

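Cite this article:

WANG Xin, CHEN Chi, ZHAO Yi-Fan, PENG Xin, ZHAO Wen-Yun. API misuse bug detection based on deep learning. Ruan Jian Xue Bao/Journal of Software, 2019, 30(5): 1342-1358.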

History:

Received: 2018-08-31
Last revised: 2018-10-31
Published online: 2019-05-08
