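Program Generation and Code Completion Techniques Based on Deep Learning: Literature Review
Authors:

Hu Xing, Li Ge, Liu Fang, Jin Zhi

Author biographies:

Hu Xing (b. 1993), female, born in Shangqiu, Henan, Ph.D. candidate; research interests: program analysis, deep learning. Liu Fang (b. 1994), female, Ph.D. candidate; research interests: deep learning, software engineering. Li Ge (b. 1977), male, Ph.D., associate professor, CCF professional member; research interests: deep learning, program analysis, knowledge engineering. Jin Zhi (b. 1962), female, Ph.D., professor, doctoral supervisor, CCF Fellow; research interests: requirements engineering, knowledge engineering.

Corresponding authors:

Li Ge, E-mail: lige@pku.edu.cn; Jin Zhi, E-mail: zhijin@pku.edu.cn

Fund Project:

National Program on Key Basic Research Project of China (973) (2015CB352201); National Natural Science Foundation of China (61620106007, 61751210)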

    Abstract:

    Automated software development has long been a research hotspot in software engineering. Internet technology has promoted the growth of open source software and open source communities, and the resulting large-scale code and data present an opportunity for automated software development. Meanwhile, deep learning has begun to be applied to software engineering tasks. How to apply deep learning to large-scale code so that machines can write programs automatically is a shared goal of the artificial intelligence and software engineering communities. Machines that write programs automatically can assist, or even to some extent replace, programmers, which greatly reduces programmers' development burden and improves the efficiency and quality of software development. At present, automatic programming based on deep learning is pursued mainly in two directions: program generation and code completion. This paper surveys the applications in both directions and the main deep learning models involved.

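Cite this article

Hu X, Li G, Liu F, Jin Z. Program generation and code completion techniques based on deep learning: Literature review. Journal of Software (软件学报), 2019, 30(5): 1206-1223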

History
  • Received: 2018-08-31
  • Last revised: 2018-10-31
  • Published online: 2019-05-08