Determination of Code Quality Attribute for Complex User's Comments
Authors: Xu Haiyan, Jiang Ying
Author biographies:

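Xu Haiyan (1996-), female, bachelor's degree, CCF student member. Her main research interests are software engineering and software quality assurance and testing.
Jiang Ying (1974-), female, Ph.D., professor, doctoral supervisor, CCF senior member. Her main research interests are software quality assurance and testing, cloud computing, big data analysis, and intelligent software engineering.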

Corresponding author:

Jiang Ying, E-mail: jy_910@163.com

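Fund project:

National Key R&D Program of China (2018YFB1003904); National Natural Science Foundation of China (61462049, 61063006, 60703116); Key Project of Yunnan Applied Basic Research (2017FA033); Open Foundation of Yunnan Key Laboratory of Computer Technology Application (2020101)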




    Abstract:

    As developer communities and code-hosting platforms have become the primary means by which programmers obtain code, the number of user comments on code has increased dramatically. The comments users leave after using a piece of code carry information about a variety of static and dynamic code quality attributes. However, because most of these comments are complex sentences, the code quality attributes they refer to are difficult to identify. Determining the code quality attributes expressed in complex user comments helps analyze the code quality information they contain, and lets developers improve code quality in a targeted way once they understand how users use the code and which quality attributes users care about. This study proposes a method for determining code quality attributes in complex user comments. First, each complex comment is split into clauses, and a directed graph of dependency syntactic relations is constructed for each clause. Next, the topic of each clause is extracted with topic judgment rules based on the clause's dependency relations. Then, the code quality attribute corresponding to each topic is identified using an initial code quality attribute feature thesaurus, and the representation of that attribute and its representation result are obtained for each topic. Finally, topic processing rules are applied to analyze the representations and representation results of code quality attributes across the whole comment, producing the attribute-related results for the complex comment, while the initial feature thesaurus is continuously expanded. Experimental results show that the proposed method can effectively determine the code quality attributes of complex user comments.
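    The four steps described in the abstract can be illustrated with the minimal Python sketch below. It is not the authors' implementation: the paper's actual topic judgment rules, topic processing rules, and thesaurus contents are not given here, so every rule and word list in the code is a hypothetical stand-in. Clause splitting is reduced to cutting at punctuation, and the dependency parse is written by hand with LTP-style labels (HED for the root predicate, SBV for subject, VOB for object, ADV for adverbial) instead of being produced by a parser. The assumed topic rule takes the subject of the root predicate (falling back to its object), the representation is the predicate word, the representation result comes from a small hypothetical polarity lexicon, a later clause that opens with a contrastive connective overrides an earlier judgment of the same attribute, and a topic word that is matched only through its predicate is added to that attribute's feature words.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Hypothetical seed thesaurus: code quality attribute -> feature words.
SEED_THESAURUS = {
    "performance": {"speed", "fast", "slow", "memory"},
    "readability": {"naming", "comments", "readable"},
    "reliability": {"bug", "crash", "error"},
}
# Hypothetical polarity lexicon mapping a predicate to a representation result.
POLARITY = {"fast": "positive", "clear": "positive", "readable": "positive",
            "slow": "negative", "confusing": "negative", "crashes": "negative"}
# Hypothetical connectives that mark a contrastive clause.
CONTRAST = {"but", "however", "但是", "可是"}


@dataclass
class Token:
    word: str
    head: int  # 1-based index of the head token; 0 marks the root
    rel: str   # LTP-style dependency label, e.g. HED, SBV, VOB, ADV


def split_clauses(comment: str) -> list[str]:
    """Split a complex comment into clauses at ASCII or full-width punctuation."""
    parts = re.split(r"[,;.!?，；。！？]", comment)
    return [p.strip() for p in parts if p.strip()]


def build_graph(tokens: list[Token]) -> dict[int, list[tuple[int, str]]]:
    """Directed dependency graph: head index -> [(dependent index, relation)]."""
    graph: dict[int, list[tuple[int, str]]] = {}
    for i, tok in enumerate(tokens, start=1):
        graph.setdefault(tok.head, []).append((i, tok.rel))
    return graph


def extract_topic(tokens: list[Token]) -> tuple[str, str]:
    """Assumed topic rule: subject (SBV) of the root predicate, else its object
    (VOB), else the predicate itself. Returns (topic, predicate)."""
    graph = build_graph(tokens)
    root = graph[0][0][0]  # the single token attached to the virtual root
    deps = {rel: i for i, rel in graph.get(root, [])}
    predicate = tokens[root - 1].word
    for rel in ("SBV", "VOB"):
        if rel in deps:
            return tokens[deps[rel] - 1].word, predicate
    return predicate, predicate


def find_attribute(word: str, thesaurus: dict[str, set[str]]) -> Optional[str]:
    """Return the attribute whose feature words contain `word`, if any."""
    for attr, words in thesaurus.items():
        if word in words:
            return attr
    return None


def judge(parsed_clauses: list[list[Token]], thesaurus: dict[str, set[str]]):
    """Return {attribute: (representation, result)}; expands `thesaurus` in place."""
    results: dict[str, tuple[str, str]] = {}
    for tokens in parsed_clauses:
        topic, predicate = extract_topic(tokens)
        attr = find_attribute(topic, thesaurus)
        if attr is None:
            attr = find_attribute(predicate, thesaurus)
            if attr is None:
                continue  # the clause mentions no known attribute
            thesaurus[attr].add(topic)  # expansion: learn the unseen topic word
        result = POLARITY.get(predicate, "neutral")
        contrastive = tokens[0].word.lower() in CONTRAST
        # Hypothetical processing rule: a later clause overrides only if contrastive.
        if attr not in results or contrastive:
            results[attr] = (predicate, result)
    return results


if __name__ == "__main__":
    # Hand-written LTP-style parses of the glossed clauses of
    # "命名清晰,速度很快,但是加载很慢" (naming clear, speed very fast, but loading very slow).
    parsed = [
        [Token("naming", 2, "SBV"), Token("clear", 0, "HED")],
        [Token("speed", 3, "SBV"), Token("very", 3, "ADV"), Token("fast", 0, "HED")],
        [Token("but", 4, "ADV"), Token("loading", 4, "SBV"),
         Token("very", 4, "ADV"), Token("slow", 0, "HED")],
    ]
    thesaurus = {k: set(v) for k, v in SEED_THESAURUS.items()}
    print(split_clauses("命名清晰,速度很快,但是加载很慢"))
    print(judge(parsed, thesaurus))
    print(sorted(thesaurus["performance"]))
```

    Run as a script, the example reports readability as ('clear', 'positive') and performance as ('slow', 'negative'): the contrastive third clause overrides the second clause's judgment of performance, and because "loading" is reached only through its predicate "slow", it is added to the performance feature words. The seed thesaurus is copied before use so that the expansion step does not alter the initial word lists.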

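Cite this article:

Xu HY, Jiang Y. Determination of code quality attribute for complex user's comments. Ruan Jian Xue Bao/Journal of Software, 2021,32(7):2183-2203 (in Chinese with English abstract).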

History
  • Received: 2020-09-14
  • Last revised: 2020-10-26
  • Published online: 2021-01-22
  • Publication date: 2021-07-06