Natural Language Data Driven Approach for Software Intelligent Safety Evaluation

Journal of Software, 2018, Vol. 29, No. 8, pp. 2336-2349. DOI: 10.13328/j.cnki.jos.005526

Authors:

ZHANG Yi-Fan
State Key Laboratory for Novel Software Technology (Nanjing University), Nanjing 210023, China; Department of Computer Science and Technology, Nanjing University, Nanjing 210023, China
TANG En-Yi
State Key Laboratory for Novel Software Technology (Nanjing University), Nanjing 210023, China; Software Institute, Nanjing University, Nanjing 210093, China
SU Yan-Zi
State Key Laboratory for Novel Software Technology (Nanjing University), Nanjing 210023, China; Software Institute, Nanjing University, Nanjing 210093, China
YANG Kai-Mao
State Key Laboratory for Novel Software Technology (Nanjing University), Nanjing 210023, China; Software Institute, Nanjing University, Nanjing 210093, China
KUANG Hong-Yu
State Key Laboratory for Novel Software Technology (Nanjing University), Nanjing 210023, China; Software Institute, Nanjing University, Nanjing 210093, China
CHEN Xin
State Key Laboratory for Novel Software Technology (Nanjing University), Nanjing 210023, China; Department of Computer Science and Technology, Nanjing University, Nanjing 210023, China

Author biographies:
ZHANG Yi-Fan (1989-), male, born in Hangzhou, Zhejiang, Ph.D. candidate; research interests: software engineering, software development, device drivers.
YANG Kai-Mao (1995-), male, bachelor's degree; research interests: software engineering.
TANG En-Yi (1982-), male, Ph.D., assistant researcher, CCF professional member; research interests: software engineering, novel software testing methods, program analysis methods.
KUANG Hong-Yu (1985-), male, Ph.D., assistant researcher; research interests: software traceability, text analysis, program comprehension.
SU Yan-Zi (1996-), male, undergraduate student; research interests: deep learning, natural language analysis and understanding.
CHEN Xin (1975-), male, Ph.D., associate professor; research interests: software engineering, software testing, verification techniques.

Corresponding author: TANG En-Yi, E-mail: eytang@nju.edu.cn

Fund Project: National Key Research and Development Program of China (2016YFB1000802); National Natural Science Foundation of China (61772260, 61402222)

Abstract:

Software safety is a key property that measures whether software can withstand malicious attacks. Because attacks are ubiquitous on today's Internet, it has become necessary to estimate the number and types of vulnerabilities that software may contain, that is, to evaluate its safety. In practice, users need to evaluate not only unreleased or newly released software but also software that has been on the market for some time. For example, a user choosing among several competing products wants a low-cost, reasonably objective third-party evaluation and comparison. This paper proposes an intelligent, natural language data driven approach to meet this need. The approach evaluates the safety of software from the usage experience of its existing users. It first adaptively crawls the natural language comments that users have written about the software, then derives safety evaluation metrics through a dual training of a deep learning method and a machine learning evaluation model. Because the self-adaptive Web crawler adjusts its feature words based on feedback and uses search engines to acquire heterogeneous data, the evaluation can draw on a wide range of natural language data. In addition, one-to-many machine translation training effectively solves the problem of converting natural language data into a semantic encoding, so that the machine learning model for safety evaluation can be built on the semantic features of natural language. Experiments on the Common Vulnerabilities and Exposures (CVE) database and the National Vulnerability Database (NVD) show that the approach is effective at evaluating the number, type, and severity of vulnerabilities in software.

Key words: software safety evaluation; natural language processing; machine learning; Web crawler
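
The abstract says the crawler adjusts its feature words from feedback but gives no details of the mechanism. As a rough illustration only, the sketch below shows one simple way such an adjustment could work: terms that appear in documents judged relevant more often than in documents judged irrelevant are promoted to feature words. The function names, the stopword list, and the scoring rule are all assumptions made for this example and are not taken from the paper.

```python
import re
from collections import Counter

# Hypothetical stopword list for the example; a real crawler would use a fuller one.
STOPWORDS = {"the", "a", "an", "has", "that", "is", "of", "and"}


def tokenize(text):
    """Lowercase the text and return its alphabetic tokens, minus stopwords."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOPWORDS]


def update_feature_words(feature_words, relevant_docs, irrelevant_docs, top_k=2):
    """Extend feature_words with terms that the feedback marks as useful.

    Assumed scoring rule (not the paper's): a term's score is the number of
    relevant documents containing it minus the number of irrelevant documents
    containing it. The top_k new terms with a positive score are appended.
    """
    rel = Counter(t for d in relevant_docs for t in set(tokenize(d)))
    irr = Counter(t for d in irrelevant_docs for t in set(tokenize(d)))
    scores = {t: rel[t] - irr[t] for t in rel if t not in feature_words}
    # Sort by descending score, breaking ties alphabetically for determinism.
    ranked = sorted(scores, key=lambda t: (-scores[t], t))
    new_terms = [t for t in ranked if scores[t] > 0][:top_k]
    return list(feature_words) + new_terms


relevant = [
    "The app has a vulnerability that allows an overflow exploit.",
    "An overflow exploit crashed the app.",
]
irrelevant = ["The app has a nice interface."]
updated = update_feature_words(["vulnerability"], relevant, irrelevant)
print(updated)
```

In a crawler loop, the updated list would seed the next round of search-engine queries, and the relevance judgments on the newly fetched pages would feed the following update.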

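Cite this article

ZHANG Yi-Fan, TANG En-Yi, SU Yan-Zi, YANG Kai-Mao, KUANG Hong-Yu, CHEN Xin. Natural Language Data Driven Approach for Software Intelligent Safety Evaluation. Journal of Software, 2018, 29(8): 2336-2349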

History

Received: 2017-07-18
Last revised: 2017-09-28
Published online: 2018-03-13