Survey on Intelligent Search and Construction Methods of Program
Author: Liu Binbin, Dong Wei, Wang Ji

Fund Project:

National Natural Science Foundation of China (61690203, 61532007); National Program on Key Basic Research Project of China (973) (2014CB340703)

    Abstract:

    The rapid development of the Internet, machine learning, and artificial intelligence, together with the emergence of a large number of open-source software projects and communities, has brought new opportunities and challenges to software engineering. There are billions of lines of code on the Internet. This code, especially code that is of high quality and widely used, embodies all kinds of knowledge, which has given rise to the new idea of intelligent software development: making full use of the code resources, knowledge, and collective intelligence available on the Internet to improve the efficiency and quality of software development. Its key technologies are program search and program construction, which have great theoretical and practical value. At present, research in these areas focuses mainly on code search, program synthesis, code recommendation and completion, defect detection, code style improvement, and automatic program repair. This paper surveys the main current research along these lines, describes the specific theoretical and technical approaches in detail, and summarizes the challenges that current research faces. Several directions for future research are also proposed.
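To make the corpus-driven idea behind code recommendation and completion more concrete, here is a minimal sketch (not the authors' method) of a token-level trigram model. It counts which token follows each two-token context in a code corpus and suggests the most frequent continuations. The toy corpus, the helper names `train` and `suggest`, and the choice of n = 3 are illustrative assumptions.

```python
from collections import Counter, defaultdict


def train(token_streams, n=3):
    """Count, for every (n-1)-token context, which token follows it in the corpus."""
    counts = defaultdict(Counter)
    for tokens in token_streams:
        for i in range(len(tokens) - n + 1):
            context = tuple(tokens[i:i + n - 1])
            counts[context][tokens[i + n - 1]] += 1
    return counts


def suggest(counts, context, k=3):
    """Return up to k most frequent next tokens for the given context (empty if unseen)."""
    return [tok for tok, _ in counts.get(tuple(context), Counter()).most_common(k)]


# Hypothetical corpus of pre-tokenized Java-like statements.
corpus = [
    ["reader", ".", "readLine", "(", ")", ";"],
    ["reader", ".", "close", "(", ")", ";"],
    ["writer", ".", "close", "(", ")", ";"],
    ["reader", ".", "readLine", "(", ")", ";"],
]
model = train(corpus)

print(suggest(model, ["reader", "."]))  # prints ['readLine', 'close']
```

The two-token context is what lets the model rank `readLine` first after `reader .` but offer only `close` after `writer .`. Practical completion models typically add smoothing for unseen contexts and train on far larger corpora.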

Citation: Liu Binbin, Dong Wei, Wang Ji. Survey on intelligent search and construction methods of program. Ruan Jian Xue Bao/Journal of Software, 2018,29(8):2180-2197.

History
  • Received: July 18, 2017
  • Revised: September 28, 2017
  • Online: March 13, 2018
Copyright: Institute of Software, Chinese Academy of Sciences
Address: 4# South Fourth Street, Zhong Guan Cun, Beijing 100190
Phone: 010-62562563  Fax: 010-62562533  Email: jos@iscas.ac.cn