Prediction Method for Question Deletion in Software Question and Answer Community

CLC Number: TP311

    Abstract:

    Stack Overflow is one of the most popular software question and answer communities, where users can post questions and receive answers from others. To ensure question quality, the site must promptly find and delete questions that are of low quality or fall outside the community’s scope. Currently, Stack Overflow relies mainly on manual inspection to find questions that need to be deleted. However, manual inspection rarely finds and deletes such questions in time, and it adds to the workload of community administrators. To find these questions quickly, this study proposes MulPredictor, a method that automatically predicts question deletion. The method extracts the semantic content features, the semantic statistical features, and the meta features of a question, and uses a random forest classifier to calculate the probability that the question will be deleted. Experimental results show that, compared with the existing methods DelPredictor and NLPPredictor, MulPredictor increases accuracy by 16.34% and 12.78%, respectively, on a balanced test set, and by 12.38% and 14.14%, respectively, on a random test set. In addition, this study analyzes which features matter most for question deletion and finds that the code segment, the question’s title, and the first paragraph of the question’s body have the most significant impact.
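
    The pipeline described in the abstract (extract features from a question, then let a random forest estimate the deletion probability) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the four features, the `<code>` markup convention, the function names, and the use of depth-1 trees (decision stumps) in place of full trees are not taken from the paper, whose actual feature set and classifier configuration are not given on this page.

```python
import random
import re

CODE_RE = re.compile(r"<code>.*?</code>", re.DOTALL)

def extract_features(title, body):
    """Turn one question into a numeric vector (illustrative features, not the paper's list)."""
    code_segments = CODE_RE.findall(body)
    prose = CODE_RE.sub(" ", body)                       # body text with code removed
    paragraphs = [p for p in prose.split("\n\n") if p.strip()]
    first_paragraph = paragraphs[0] if paragraphs else ""
    return [
        len(title.split()),            # title length in words
        len(first_paragraph.split()),  # first-paragraph length in words
        len(code_segments),            # number of code segments
        len(prose.split()),            # prose length in words
    ]

def train_stump(rows, labels, rng):
    """Fit a depth-1 tree on a bootstrap sample, using one randomly chosen feature."""
    n = len(rows)
    sample = [rng.randrange(n) for _ in range(n)]        # bootstrap: draw n rows with replacement
    feature = rng.randrange(len(rows[0]))                # random feature for this tree
    best = None
    for i in sample:
        threshold = rows[i][feature]
        left = [labels[j] for j in sample if rows[j][feature] <= threshold]
        right = [labels[j] for j in sample if rows[j][feature] > threshold]
        # Each side predicts its majority label, so its errors are the minority count.
        errors = (min(sum(left), len(left) - sum(left))
                  + min(sum(right), len(right) - sum(right)))
        if best is None or errors < best[0]:
            left_vote = int(2 * sum(left) >= len(left))
            right_vote = int(2 * sum(right) >= len(right)) if right else left_vote
            best = (errors, threshold, left_vote, right_vote)
    _, threshold, left_vote, right_vote = best
    return feature, threshold, left_vote, right_vote

def train_forest(rows, labels, n_trees=25, seed=0):
    """Train n_trees stumps; labels are 1 for deleted questions and 0 otherwise."""
    rng = random.Random(seed)
    return [train_stump(rows, labels, rng) for _ in range(n_trees)]

def predict_deletion_probability(forest, row):
    """Estimate the deletion probability as the fraction of trees voting 'deleted'."""
    votes = [lv if row[f] <= t else rv for f, t, lv, rv in forest]
    return sum(votes) / len(votes)
```

    In practice one would more likely hand the feature vectors to an existing random forest implementation; the hand-written forest above only makes the two ingredients visible, namely bootstrap sampling with random feature selection per tree, and a probability obtained by averaging the trees' votes.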

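Citation

Jiang J, Miao M, Zhao LX, Zhang L. Prediction method for question deletion in software question and answer community. Journal of Software, 2022, 33(5): 1699-1710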

History
  • Received: August 10, 2021
  • Revised: October 9, 2021
  • Online: January 28, 2022
  • Published: May 6, 2022