Prediction Method for Question Deletion in Software Question and Answer Community

doi:10.13328/j.cnki.jos.006556

Journal of Software, 2022, Vol. 33, No. 5, pp. 1699–1710

Authors: JIANG Jing (蒋竞), MIAO Meng (苗萌), ZHAO Li-Xian (赵丽娴), ZHANG Li (张莉)
Affiliation: School of Computer Science and Engineering, Beihang University, Beijing 100191, China
Corresponding author: ZHANG Li, lily@buaa.edu.cn
CLC number: TP311
Fund projects: Science and Technology Innovation 2030 "New Generation Artificial Intelligence" Major Project (2018AAA0102304); National Natural Science Foundation of China (62177003); Fundamental Research Funds for the Central Universities (YWF-20-BJ-J-1018)

Abstract:

Stack Overflow is one of the most popular software question and answer communities, where users can post questions and receive answers from other users. To maintain question quality, the website needs to promptly discover and delete questions that are of low quality or off-topic for the community. Currently, Stack Overflow relies mainly on manual inspection to find questions that should be deleted. However, this approach often fails to find and delete such questions in time, and it increases the burden on community administrators. To find questions that need to be deleted more quickly, this study proposes MulPredictor, a method that automatically predicts question deletion. MulPredictor extracts the semantic content features, semantic statistical features, and meta features of a question, and uses a random forest classifier to calculate the probability that the question will be deleted. Experimental results show that, compared with the existing methods DelPredictor and NLPPredictor, MulPredictor improves accuracy by 16.34% and 12.78%, respectively, on the balanced test set, and by 12.38% and 14.14%, respectively, on the random test set. In addition, this study analyzes the features that matter for question deletion and finds that features of the code segments, the question title, and the first paragraph of the question body have a significant impact on whether a question is deleted.

Key words: prediction of question deletion; question quality; question classification; software question and answer community; Stack Overflow
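The pipeline described in the abstract (extract features from a question, then let a random forest output a deletion probability) can be illustrated with a minimal, standard-library-only sketch. It is not MulPredictor: the paper's semantic content features are not reproduced here, and the hand-picked features below (title length, first-paragraph length, number and total length of code segments, tag count, whether the title ends with "?") are hypothetical stand-ins, chosen only because the abstract names the title, the first paragraph, and the code segments as influential. The sketch assumes Stack Overflow-style HTML bodies with `<p>` and `<code>` tags. The classifier is a deliberately tiny random-forest-style ensemble: each tree is a depth-1 stump fitted with the Gini criterion on a bootstrap sample and a random subset of features, and the deletion probability is the average of the trees' leaf probabilities. All function names (`extract_features`, `fit_forest`, `deletion_probability`) are illustrative assumptions.

```python
import random
import re
from dataclasses import dataclass
from typing import List, Sequence


# Hypothetical stand-in features; NOT the paper's semantic content features.
# Assumes the body is Stack Overflow-style HTML with <p> and <code> tags.
def extract_features(title: str, body: str, tags: Sequence[str]) -> List[float]:
    paragraphs = re.findall(r"<p>(.*?)</p>", body, flags=re.S)
    first_para = re.sub(r"<[^>]+>", " ", paragraphs[0]) if paragraphs else ""
    code_segments = re.findall(r"<code>(.*?)</code>", body, flags=re.S)
    return [
        float(len(title.split())),                  # title length (words)
        float(len(first_para.split())),             # first-paragraph length (words)
        float(len(code_segments)),                  # number of code segments
        float(sum(len(c) for c in code_segments)),  # total code length (chars)
        float(len(tags)),                           # meta feature: tag count
        float(title.rstrip().endswith("?")),        # title phrased as a question
    ]


@dataclass
class Stump:
    """A depth-1 decision tree storing P(deleted) on each side of one split."""
    feature: int
    threshold: float
    left_p: float   # P(deleted) when x[feature] <= threshold
    right_p: float  # P(deleted) when x[feature] > threshold

    def predict_proba(self, x: Sequence[float]) -> float:
        return self.left_p if x[self.feature] <= self.threshold else self.right_p


def gini(labels: Sequence[int]) -> float:
    """Gini impurity of binary labels (1 = deleted, 0 = kept)."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2.0 * p * (1.0 - p)


def fit_stump(X, y, features) -> Stump:
    """Pick the (feature, threshold) split with the lowest weighted Gini impurity."""
    best_score, best = None, None
    for f in features:
        for t in sorted({row[f] for row in X}):
            left = [yi for row, yi in zip(X, y) if row[f] <= t]
            right = [yi for row, yi in zip(X, y) if row[f] > t]
            score = len(left) * gini(left) + len(right) * gini(right)
            if best_score is None or score < best_score:
                left_p = sum(left) / len(left)  # never empty: t is an observed value
                right_p = sum(right) / len(right) if right else left_p
                best_score, best = score, Stump(f, t, left_p, right_p)
    return best


def fit_forest(X, y, n_trees: int = 25, max_features: int = 2,
               seed: int = 0) -> List[Stump]:
    """Bagging plus random feature subsets, with stumps as the base trees."""
    rng = random.Random(seed)
    n_samples, n_features = len(X), len(X[0])
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(n_samples) for _ in range(n_samples)]  # bootstrap sample
        feats = rng.sample(range(n_features), max_features)         # random feature subset
        forest.append(fit_stump([X[i] for i in idx], [y[i] for i in idx], feats))
    return forest


def deletion_probability(forest: List[Stump], x: Sequence[float]) -> float:
    """Soft voting: average the per-tree probabilities that the question is deleted."""
    return sum(stump.predict_proba(x) for stump in forest) / len(forest)
```

Stumps keep the sketch short and easy to check by hand; a practical system would more likely use a library implementation such as scikit-learn's `RandomForestClassifier`, whose `predict_proba` method also averages per-tree class probabilities but grows full trees.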

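Cite this article:

JIANG Jing, MIAO Meng, ZHAO Li-Xian, ZHANG Li. Prediction method for question deletion in software question and answer community. Journal of Software, 2022, 33(5): 1699–1710.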

Article metrics

Views: 1007
Downloads: 4212
HTML views: 2677
Citations: 0

History

Received: 2021-08-10
Last revised: 2021-10-09
Published online: 2022-01-28
Published: 2022-05-06
