Method Name Recommendation Based on Source Code Depository and Feature Matching
Author:
Affiliation:

Fund Project:

National Natural Science Foundation of China (61272169, 61472034, 61003065, 61371194); National Program on Key Basic Research Project of China (973 Program) (2013CB329303); Program for New Century Excellent Talents in University of Ministry of Education of China (NCET-13-0041); Beijing Higher Education Young Elite Teacher Project (YETP1183); Major Scientific and Technological Projects of Press and Publication, China (GAPP_ZDKJ_BQ/01)

    Abstract:

    The quality of method names is critical to the readability and maintainability of programs. However, it is difficult for software engineers, especially inexperienced ones who are not native English speakers, to come up with high-quality method names. To address this issue, this paper proposes an approach to recommending method names. First, a method corpus is constructed from open source applications. For a given method f to be named, similar methods are retrieved from the method corpus. The names of the retrieved methods are split into phrases, and features of these methods are extracted as well. A mapping between these phrases and features is then created to derive a list of candidate phrases and features for the method to be named. Finally, these phrases are assembled into candidate method names. The proposed approach is evaluated on 1430 methods from open source applications. The evaluation results suggest that 22.7 percent of the recommended method names are identical to the original ones, and 57.9 percent have the same or almost the same keywords as the original ones.
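
The retrieval and phrase-ranking steps described in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation: the abstract does not say how similarity is measured or how phrases are scored, so the Jaccard similarity over feature sets, the similarity-weighted voting, the `top_k` and `min_sim` parameters, and all function names and corpus entries here are assumptions made for illustration only.

```python
import re
from collections import Counter


def split_name(name):
    """Split a camelCase or snake_case method name into lowercase words."""
    parts = re.findall(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+|\d+", name)
    return [p.lower() for p in parts]


def jaccard(a, b):
    """Jaccard similarity of two feature collections (assumed measure)."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 0.0


def recommend_phrases(target_features, corpus, top_k=2, min_sim=0.2):
    """Rank words from the names of similar corpus methods.

    corpus is a list of (method_name, feature_set) pairs. Each word in a
    retrieved name receives a vote weighted by that method's similarity
    to the target (a hypothetical scoring rule).
    """
    scored = [(jaccard(target_features, feats), name) for name, feats in corpus]
    similar = sorted((s for s in scored if s[0] >= min_sim), reverse=True)[:top_k]
    votes = Counter()
    for sim, name in similar:
        for word in split_name(name):
            votes[word] += sim
    return [word for word, _ in votes.most_common()]


# Hypothetical corpus: features are identifiers used in each method body.
corpus = [
    ("readFileContent", {"File", "BufferedReader", "readLine", "String"}),
    ("readLines", {"BufferedReader", "readLine", "List"}),
    ("sendHttpRequest", {"HttpClient", "execute", "String"}),
]
target = {"File", "BufferedReader", "readLine"}
ranked = recommend_phrases(target, corpus)
print(ranked)
```

In this toy corpus the unrelated `sendHttpRequest` method falls below the similarity threshold, so only words from the two file-reading methods are ranked, and `read` comes first because both retrieved names contain it. Assembling the ranked words into a full candidate name is left out of the sketch.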

Get Citation

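Gao Y, Liu H, Fan XZ, Niu ZD. Method name recommendation based on source code depository and feature matching. Ruan Jian Xue Bao/Journal of Software, 2015,26(12):3062-3074 (in Chinese with English abstract).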

History
  • Received: December 03, 2013
  • Revised: January 08, 2015
  • Online: December 04, 2015