Empirical Study on Pull-request Revisions in Open Source Software Community of TensorFlow

    Abstract:

    The recent boom in artificial intelligence (AI) benefits from open collaboration in the open source software (OSS) community. An increasing number of OSS developers contribute to AI projects by submitting pull requests (PRs). However, the quality of PRs submitted by external contributors varies, so the project management teams have to review PRs and ask contributors to revise them when necessary. Because revisions directly affect review efficiency and PR acceptance, a better understanding of PR revisions is important. This paper presents an empirical study based on a set of PRs and their revision histories collected from the TensorFlow project. It first manually analyzes a sample of commit messages and PR review comments to construct a taxonomy of revision types. Then, a large set of PR revisions is manually labeled according to this taxonomy. Based on the labeled dataset, the study investigates the frequency distribution and order distribution of each revision type, as well as the correlations between different types of revisions. The findings show that there are 11 types of revisions, which can be classified into three categories. Evolvability revisions occur more frequently than other revision types, and functional revisions are more likely than evolvability and other revisions to occur in early PR updates. Structure-related revisions have a high chance of co-occurring with, or occurring adjacent to, other revisions. Configuration-related or rebasing revisions are more likely to appear in succession. These results can help AI open source practitioners and researchers better understand the PR revision process, in particular by guiding PR review and revision behavior and improving the collaborative efficiency of open source communities.
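    The kinds of statistics named in the abstract (frequency, order, co-occurrence, and adjacent occurrence of revision types) can be illustrated with a minimal sketch. This is not the authors' analysis code. The toy data, the type labels, and the function names are all hypothetical, and the definitions are assumptions: "co-occurrence" is taken to mean two types labeled on the same PR update, and "adjacent occurrence" to mean two types labeled on consecutive updates of the same PR.

```python
from collections import Counter
from itertools import combinations

# Hypothetical toy data (not from the paper): each PR is a chronological list
# of updates, and each update is the set of revision types labeled for it.
prs = {
    "pr-a": [{"functional"}, {"structure", "naming"}, {"rebasing"}, {"rebasing"}],
    "pr-b": [{"functional", "structure"}, {"configuration"}, {"configuration"}],
}

def revision_frequency(prs):
    """Count how many updates carry each revision type."""
    counts = Counter()
    for updates in prs.values():
        for update in updates:
            counts.update(update)
    return counts

def mean_relative_position(prs):
    """Average position of each type within its PR (0.0 = first update, 1.0 = last)."""
    totals, counts = Counter(), Counter()
    for updates in prs.values():
        n = len(updates)
        for i, update in enumerate(updates):
            pos = i / (n - 1) if n > 1 else 0.0
            for rev_type in update:
                totals[rev_type] += pos
                counts[rev_type] += 1
    return {t: totals[t] / counts[t] for t in counts}

def co_occurrence(prs):
    """Count unordered pairs of types labeled on the same update (assumed definition)."""
    pairs = Counter()
    for updates in prs.values():
        for update in updates:
            for a, b in combinations(sorted(update), 2):
                pairs[(a, b)] += 1
    return pairs

def adjacent_occurrence(prs):
    """Count ordered pairs (earlier type, later type) on consecutive updates (assumed definition)."""
    pairs = Counter()
    for updates in prs.values():
        for prev, nxt in zip(updates, updates[1:]):
            for a in prev:
                for b in nxt:
                    pairs[(a, b)] += 1
    return pairs
```

    Under these assumed definitions, a type that tends to "appear in succession" shows up as a high count for the pair `(t, t)` in `adjacent_occurrence`, and a type that tends to occur early has a `mean_relative_position` close to 0.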

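Citation

Li ZX, Yu Y, Wang T, Cai ML, Wang HM. Empirical study on pull-request revisions in open source software community of TensorFlow. Ruan Jian Xue Bao/Journal of Software, 2023, 34(9): 4056-4068.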

History
  • Received: September 4, 2022
  • Revised: October 13, 2022
  • Online: January 13, 2023
  • Published: September 6, 2023