Automatic Generation of Large-Granularity Pull Request Description
Author: 邝砾, 施如意, 赵雷浩, 张欢, 高洪皓

Fund Project:

National Key R&D Program of China (2018YFB1003800); National Natural Science Foundation of China (61772560)

Abstract:

On the GitHub platform, many project contributors omit descriptions when submitting pull requests (PRs), making their PRs easy for reviewers to overlook or reject. It is therefore worthwhile to generate PR descriptions automatically to help increase the PR acceptance rate. The performance of existing PR description generation methods is usually affected by PR granularity, and these methods struggle to generate effective descriptions for large-granularity PRs. This work therefore focuses on generating descriptions for large-granularity PRs. The text in a PR is first preprocessed, and a word-sentence heterogeneous graph is constructed in which words serve as secondary nodes, thereby establishing connections between PR sentences. Features are then extracted from the heterogeneous graph and fed into a graph neural network for further graph representation learning, through which sentence nodes learn richer content information via message passing between nodes. Finally, the sentences carrying key information are selected to form the PR description. In addition, because the dataset lacks manually labeled tags, supervised learning cannot be used for training; reinforcement learning is therefore used to guide the generation of PR descriptions. The training objective is to minimize the negative expected reward, which requires no ground truth and directly improves the quality of the results. Experiments on a real dataset show that the proposed method outperforms existing methods in F1 score and readability.
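The word-sentence heterogeneous graph described in the abstract can be illustrated with a minimal sketch. All names below are illustrative, and the TF-IDF edge weighting is an assumed stand-in for the paper's actual node features: PR sentences and words form two node types, a sentence-word edge is added whenever the word occurs in the sentence, and any two sentences sharing a word are thus connected through a two-hop path via that secondary word node.

```python
import math
from collections import Counter

def build_word_sentence_graph(sentences):
    """Build a bipartite word-sentence graph: words act as secondary
    nodes linking the sentences that contain them. Edge weights use
    TF-IDF (an illustrative choice, not necessarily the paper's exact
    feature)."""
    tokenized = [s.lower().split() for s in sentences]
    n = len(tokenized)
    # Document frequency: in how many sentences each word appears.
    df = Counter(w for toks in tokenized for w in set(toks))
    edges = {}  # (sentence_index, word) -> TF-IDF weight
    for i, toks in enumerate(tokenized):
        tf = Counter(toks)
        for w, c in tf.items():
            idf = math.log(n / df[w]) + 1.0
            edges[(i, w)] = (c / len(toks)) * idf
    return edges

def connected_sentences(edges, i):
    """Sentences reachable from sentence i through a shared word node,
    i.e. the two-hop neighbors a GNN layer would pass messages to."""
    words_i = {w for (s, w) in edges if s == i}
    return sorted({s for (s, w) in edges if w in words_i and s != i})
```

For example, given the sentences "fix the bug", "add tests", and "bug in parser", sentences 0 and 2 become connected through the shared word node "bug", while sentence 1 stays isolated; message passing over such paths is what lets sentence nodes accumulate content information from related sentences.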

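The reinforcement-learning objective of minimizing the negative expected reward can be sketched with a toy REINFORCE-style estimator. The score vector, the softmax selection policy, and the reward function below are all hypothetical stand-ins (the paper defines its own reward that requires no ground-truth labels); the sketch only shows how ascending reward-weighted log-probability gradients is equivalent to minimizing -E[R].

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed for reproducibility

def policy_probs(scores):
    """Softmax over per-sentence selection scores (a toy stand-in for
    scores produced from the GNN's sentence representations)."""
    e = np.exp(scores - scores.max())
    return e / e.sum()

def reinforce_step(scores, reward_fn, lr=0.1):
    """One REINFORCE update: sample a sentence, observe its reward,
    and move the scores along reward * grad log p(action)."""
    p = policy_probs(scores)
    a = rng.choice(len(p), p=p)
    r = reward_fn(a)
    grad_log_p = -p          # d/d(scores) of log softmax ...
    grad_log_p[a] += 1.0     # ... for the sampled action a
    return scores + lr * r * grad_log_p

# Hypothetical reward: pretend sentence 2 carries the key information.
reward = lambda a: 1.0 if a == 2 else 0.0
scores = np.zeros(4)
for _ in range(500):
    scores = reinforce_step(scores, reward)
```

After training, the policy concentrates its probability mass on the rewarded sentence, mirroring how the model learns to select key-information sentences without labeled descriptions.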
Get Citation

邝砾, 施如意, 赵雷浩, 张欢, 高洪皓. Automatic generation of large-granularity pull request description. Journal of Software (软件学报), 2021, 32(6): 1597-1611.
History
  • Received: August 09, 2020
  • Revised: October 26, 2020
  • Online: February 07, 2021
  • Published: June 06, 2021
Copyright: Institute of Software, Chinese Academy of Sciences. Beijing ICP No. 05046678-4
Address: 4# South Fourth Street, Zhong Guan Cun, Beijing 100190
Phone: 010-62562563  Fax: 010-62562533  Email: jos@iscas.ac.cn
