Review on Financial Trading System Based on Reinforcement Learning
Authors:
About the authors:

梁天新 (b. 1984), male, from Qiqihar, Heilongjiang; Ph.D. candidate, CCF student member; research interests: natural language processing, deep learning, machine learning, and reinforcement learning. 王良 (b. 1963), male, Ph.D., associate professor, CCF senior member; research interests: intelligent science, database management systems, and database system evaluation and performance optimization. 杨小平 (b. 1956), male, Ph.D., professor, doctoral supervisor; research interests: information systems engineering, e-government, and network security technology. 韩镇远 (b. 1993), male, master's student; research interests: deep learning and natural language processing.

Corresponding author:

王良, E-mail: wangliang@ruc.edu.cn

Fund project:

National Natural Science Foundation of China (71531012)




    Abstract:

    In recent years, reinforcement learning has made great progress in electronic games, board games, and decision-making control, and it has also driven the rapid development of financial trading systems. Financial trading has become a hot research topic in reinforcement learning, with broad application demand and academic significance in the stock, foreign exchange, and futures markets in particular. Following the development of the reinforcement learning models commonly used in finance, this paper reviews research achievements on trading systems, adaptive algorithms, and trading strategies. Finally, the difficulties and challenges of applying reinforcement learning to financial trading are discussed, and future development trends of reinforcement learning trading systems are outlined.
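The direct (recurrent) reinforcement learning approach that runs through much of the surveyed work [7-9] trades on a recurrent position signal and optimizes a risk-adjusted performance measure such as the Sharpe ratio directly, without learning a value function. The sketch below is an illustrative toy version, not any surveyed author's implementation: the window size, learning rate, transaction-cost level, the synthetic return series, and the finite-difference gradient (the original papers derive exact gradients through the recurrence) are all assumptions for demonstration.

```python
import numpy as np

rng = np.random.default_rng(0)

def positions(theta, r, m):
    """Recurrent trading signal F_t = tanh(w . r_{t-m:t} + u*F_{t-1} + b), in [-1, 1]."""
    w, u, b = theta[:m], theta[m], theta[m + 1]
    F = np.zeros(len(r))
    for t in range(m, len(r)):
        F[t] = np.tanh(w @ r[t - m:t] + u * F[t - 1] + b)
    return F

def sharpe(theta, r, m, cost=1e-3):
    """Sharpe ratio of strategy returns R_t = F_{t-1}*r_t - cost*|F_t - F_{t-1}|."""
    F = positions(theta, r, m)
    R = F[:-1] * r[1:] - cost * np.abs(np.diff(F))
    return R.mean() / (R.std() + 1e-9)

# Hypothetical return series with a weak positive drift.
r = 0.001 + 0.01 * rng.standard_normal(500)
m = 5
theta = 0.1 * rng.standard_normal(m + 2)

s0 = sharpe(theta, r, m)
# Ascend the Sharpe ratio with central finite-difference gradients.
eps, lr = 1e-4, 0.2
for _ in range(50):
    g = np.zeros_like(theta)
    for i in range(len(theta)):
        d = np.zeros_like(theta)
        d[i] = eps
        g[i] = (sharpe(theta + d, r, m) - sharpe(theta - d, r, m)) / (2 * eps)
    theta += lr * g

print(f"Sharpe before: {s0:.3f}, after: {sharpe(theta, r, m):.3f}")
```

The key design point is that the position F_{t-1} feeds back into the next decision, so transaction costs enter the objective and the trader learns to avoid overtrading, which is the motivation given in the surveyed direct-reinforcement work.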

    References
    [1] Fama Eugene F. Random walks in stock market prices. Financial Analysts Journal, 1965,21(5):55-59.
    [2] Farmer JD. Market force, ecology and evolution. Computing in Economics & Finance, 1998,11(5):895-953(59).[doi:10.1093/icc/11.5.895]
    [3] Lo AW. The adaptive markets hypothesis:Market efficiency from an evolutionary perspective. Social Science Electronic Publishing, 2004.[doi:10.3905/jpm.2004.442611]
    [4] Lo AW. Reconciling efficient markets with behavioral finance:The adaptive markets hypothesis. Journal of Investment Consulting, 2005. http://ssrn.com/abstract=728864
    [5] Sutton RS, Barto AG. Reinforcement Learning: An Introduction. Cambridge: MIT Press, 1998. http://legacydirs.umiacs.umd.edu/~hal/courses/2016F_RL/RL9.pdf
    [6] Kuleshov V, Precup D. Algorithms for multi-armed bandit problems. arXiv preprint arXiv:1402.6028, 2014. http://cn.arxiv.org/pdf/1402.6028
    [7] Moody J, Saffell M. Reinforcement learning for trading. In:Proc. of the Conf. on Advances in Neural Information Processing Systems Ⅱ. MIT Press, 1999. 917-923.
    [8] Moody J, Wu L, Liao Y, Saffell M. Performance functions and reinforcement learning for trading systems and portfolios. Journal of Forecasting, 1998,17(5-6):441-470.[doi:10.1002/(sici)1099-131x(1998090)17:5/6<441::aid-for707>3.3.co;2-r]
    [9] Moody J, Saffell M. Learning to trade via direct reinforcement. IEEE Trans. on Neural Networks, 2001,12(4):875-889.[doi:10.1109/72.935097]
    [10] Gold C. FX trading via recurrent reinforcement learning. In:Proc. of the IEEE Int'l Conf. on Computational Intelligence for Financial Engineering. IEEE, 2003. 363-370.[doi:10.1109/cifer.2003.1196283]
    [11] Gorse D. Application of stochastic recurrent reinforcement learning to index trading. In:Proc. of the Esann 2011, European Symp. on Artificial Neural Networks. Bruges:DBLP, 2011. http://pdfs.semanticscholar.org/e7aa/08a404bb879cae6fcb751394a29465078e56.pdf
    [12] Hinton GE, Salakhutdinov RR. Reducing the dimensionality of data with neural networks. Science, 2006,313(5786):504-507.[doi:10.1126/science.1127647]
    [13] Zhang J, Maringer D. Indicator selection for daily equity trading with recurrent reinforcement learning. In:Proc. of the Conf. Companion on Genetic and Evolutionary Computation. ACM Press, 2013. 1757-1758.[doi:10.1145/2464576.2480773]
    [14] Zhang J, Maringer D. Using a genetic algorithm to improve recurrent reinforcement learning for equity trading. Computational Economics, 2016,47(4):551-567.[doi:10.1007/s10614-015-9490-y]
    [15] Werbos PJ. Advanced forecasting methods for global crisis warning and models of intelligence. General Systems Yearbook, 1977, 22(6):25-38.
    [16] Bertsekas DP, Tsitsiklis JN. Neuro-dynamic programming:An overview. In:Proc. of the IEEE Conf. on Decision and Control. IEEE, 1995. 560-564.[doi:10.1109/cdc.1995.478953]
    [17] Lewis FL, Vrabie D. Reinforcement learning and adaptive dynamic programming for feedback control. IEEE Circuits and Systems Magazine, 2009,9(3):32-50.[doi:10.1109/MCAS.2009.933854]
    [18] Liu D, Wei Q. Policy iteration adaptive dynamic programming algorithm for discrete-time nonlinear systems. IEEE Trans. on Neural Networks and Learning Systems, 2014,25(3):621-634.[doi:10.1109/tnnls.2013.2281663]
    [19] Zhao H, Wang B, Liao J, Wang H, Tan G. Adaptive dynamic programming for control:algorithms and stability. Communications & Control Engineering, 2013,54(45):6019-6022.
    [20] Atiya AF, Parlos AG, Ingber L. A reinforcement learning method based on adaptive simulated annealing. In:Proc. of the 2003 IEEE Midwest Symp. on Circuits and Systems. IEEE, 2003. 121-124.[doi:10.1109/mwscas.2003.1562233]
    [21] Jangmin O, Lee J, Lee JW, Zhang BT. Adaptive stock trading with dynamic asset allocation using reinforcement learning. Information Sciences, 2006,176(15):2121-2147.[doi:10.1016/j.ins.2005.10.009]
    [22] Dempster MAH, Leemans V. An automated FX trading system using adaptive reinforcement learning. Expert Systems with Applications, 2006,30(3):543-552.[doi:10.1016/j.eswa.2005.10.012]
    [23] Bertoluzzo F, Corazza M. Making financial trading by recurrent reinforcement learning. In:Proc. of the Int'l Conf. on Knowledge-based and Intelligent Information and Engineering Systems. Berlin, Heidelberg:Springer-Verlag, 2007. 619-626.[doi:10.1007/978-3-540-74827-4_78].
    [24] Tan Z, Quek C, Cheng PYK. Stock trading with cycles:A financial application of ANFIS and reinforcement learning. Expert Systems with Applications, 2011,38(5):4741-4755.[doi:10.1016/j.eswa.2010.09.001]
    [25] Almahdi S, Yang SY. An adaptive portfolio trading system:A risk-return portfolio optimization using recurrent reinforcement learning with expected maximum drawdown. Expert Systems with Applications, 2017,87:267-279.[doi:10.1016/j.eswa.2017. 06.023]
    [26] Hamilton JD. A new approach to the economic analysis of nonstationary time series and the business cycle. Econometrica, 1989, 57(2):357-384.[doi:10.2307/1912559]
    [27] Hamilton JD, Susmel R. Autoregressive conditional heteroskedasticity and changes in regime. Journal of Econometrics, 1994, 64(1-2):307-333.[doi:10.1016/0304-4076(94)90067-1]
    [28] Gray SF. Modeling the conditional distribution of interest rates as a regime-switching process. Journal of Financial Economics, 1996,42(1):27-62.[doi:10.1016/0304-405x(96)00875-6]
    [29] Maringer D, Ramtohul T. Regime-switching recurrent reinforcement learning for investment decision making. Computational Management Science, 2012,9(1):89-107.[doi:10.1007/s10287-011-0131-1]
    [30] Maringer D, Ramtohul T. Threshold recurrent reinforcement learning model for automated trading. In:Proc. of the Applications of Evolutionary Computation, Evoapplications 2010:Evocomnet, Evoenvironment, Evofin, Evomusart, and Evotranslog. Istanbul:DBLP, 2010. 212-221.[doi:10.1007/978-3-642-12242-2_22]
    [31] Maringer D, Ramtohul T. Regime-switching recurrent reinforcement learning in automated trading. In:Proc. of the Natural Computing in Computational Finance. Berlin, Heidelberg:Springer-Verlag, 2011. 93-121.[doi:10.1007/978-3-642-23336-4_6]
    [32] Maringer D, Zhang J. Transition variable selection for regime switching recurrent reinforcement learning. In:Proc. of the Computational Intelligence for Financial Engineering & Economics. IEEE, 2014. 407-413.[doi:10.1109/cifer.2014.6924102]
    [33] Wierstra D, Förster A, Peters J, Schmidhuber J. Recurrent policy gradients. Logic Journal of Igpl, 2010,18(2010):620-634.[doi:10.1093/jigpal/jzp049]
    [34] Baird L, Moore A. Gradient descent for general reinforcement learning. In:Proc. of the Conf. on Advances in Neural Information Processing Systems Ⅱ. MIT Press, 1999. 968-974.
    [35] Watkins CJCH. Learning from delayed rewards [Ph.D. Thesis]. Cambridge: King's College, University of Cambridge, 1989.
    [36] Jaakkola T, Jordan MI, Singh SP. On the convergence of stochastic iterative dynamic programming algorithms. Neural Computation, 1993,6(6):1185-1201.[doi:10.21236/ada276517]
    [37] Tsitsiklis JN. Asynchronous stochastic approximation and Q-learning. Machine Learning, 1994,16(3):185-202.[doi:10.1007/bf00993306]
    [38] Watkins CJCH, Dayan P. Technical note:Q-learning. Machine Learning, 1992,8(3-4):279-292.[doi:10.1007/978-1-4615-3618-5_4]
    [39] Moore AW, Atkeson CG. Prioritized sweeping:Reinforcement learning with less data and less time. Machine Learning, 1993,13(1):103-130.[doi:10.1007/bf00993104]
    [40] Mahadevan S, Maggioni M. Proto-value functions:A laplacian framework for learning representation and control in markov decision processes. Journal of Machine Learning Research, 2007,8:2169-2231.[doi:10.1145/1102351.1102421]
    [41] Sutton RS, McAllester D, Singh S, Mansour Y. Policy gradient methods for reinforcement learning with function approximation. In: Advances in Neural Information Processing Systems 12. MIT Press, 1999. 1057-1063.
    [42] Lee JW, Jangmin O. A multi-agent Q-learning framework for optimizing stock trading systems. In:Proc. of the Int'l Conf. on Database and Expert Systems Applications. Springer-Verlag, 2002. 153-162.[doi:10.1007/3-540-46146-9_16]
    [43] Lee JW, Park J, Jangmin O, Lee J, Hong E. A multiagent approach to Q-learning for daily stock trading. IEEE Trans. on Systems Man & Cybernetics-Part A:Systems & Humans, 2007,37(6):864-877.[doi:10.1109/tsmca.2007.904825]
    [44] Li J, Chan L. Reward adjustment reinforcement learning for risk-averse asset allocation. In:Proc. of the IEEE Int'l Joint Conf. on Neural Network. 2006. 534-541.[doi:10.1109/ijcnn.2006.246728]
    [45] Bertoluzzo F, Corazza M. Reinforcement learning for automatic financial trading:Introduction and some applications. Working Papers, 2012.[doi:10.2139/ssrn.2192034]
    [46] Bertoluzzo F, Corazza M. Testing different reinforcement learning configurations for financial trading:Introduction and applications. Procedia Economics & Finance, 2012,3(338):68-77.[doi:10.1016/s2212-5671(12)00122-0]
    [47] Corazza M, Bertoluzzo F. Q-learning-based financial trading systems with applications. Social Science Electronic Publishing, 2014.[doi:10.2139/ssrn.2507826]
    [48] Du X, Zhai JJ, Lv KP. Algorithm trading using Q-learning and recurrent reinforcement learning. 2016. http://cs229.stanford.edu/proj2009/LvDuZhai.pdf
    [49] Eilers D, Dunis CL, von Mettenheim HJ, Breitner MH. Intelligent trading of seasonal effects:A decision support algorithm based on reinforcement learning. Decision Support Systems, 2014,64:100-108.[doi:10.1016/j.dss.2014.04.011]
    [50] Konda VR, Tsitsiklis JN. Actor-critic algorithms. SIAM Journal on Control and Optimization, 2003,42(4):1143-1166. http://papers.nips.cc/paper/1786-actor-critic-algorithms.pdf
    [51] Li H, Dagli CH, Enke D. Short-term stock market timing prediction under reinforcement learning schemes. In:Proc. of the IEEE Int'l Symp. on Approximate Dynamic Programming and Reinforcement Learning. IEEE, 2007. 233-240.[doi:10.1109/adprl.2007. 368193]
    [52] Bekiros SD. Heterogeneous trading strategies with adaptive fuzzy actor-Critic reinforcement learning:A behavioral approach. Journal of Economic Dynamics & Control, 2010,34(6):1153-1170.[doi:10.1016/j.jedc.2010.01.015]
    [53] Mnih V, Kavukcuoglu K, Silver D, Graves A, Antonoglou I, Wierstra D. Playing atari with deep reinforcement learning. arXiv preprint arXiv:1312.5602, 2013.
    [54] Mnih V, Kavukcuoglu K, Silver D, Rusu AA, Veness J. Human-level control through deep reinforcement learning. Nature, 2015, 518(7540):529.[doi:10.1038/nature14236]
    [55] Lillicrap TP, Hunt JJ, Pritzel A, Heess N, Erez T, Tassa Y. Continuous control with deep reinforcement learning. arXiv preprint arXiv:1509.02971, 2015.
    [56] Mnih V, Badia AP, Mirza M, Graves A, Lillicrap TP, Harley T. Asynchronous methods for deep reinforcement learning. In:Proc. of the 33rd Int'l Conf. on Machine Learning. 2016. 1928-1937.
    [57] Krizhevsky A, Sutskever I, Hinton GE. ImageNet classification with deep convolutional neural networks. In:Proc. of the 26th Annual Conf. on Neural Information Processing Systems. Nevada, 2012. 1097-1105.[doi:10.1145/3065386]
    [58] Russakovsky O, Deng J, Su H, Krause J, Satheesh S, Ma S. ImageNet large scale visual recognition challenge. Int'l Journal of Computer Vision, 2015,115(3):211-252.[doi:10.1007/s11263-015-0816-y]
    [59] Graves A, Mohamed A, Hinton G. Speech recognition with deep recurrent neural networks. In:Proc. of the IEEE Int'l Conf. on Acoustics, Speech and Signal Processing. Vancouver, 2013. 6645-6649.[doi:10.1109/icassp.2013.6638947]
    [60] Li YX, Zhang JQ, Pan D, Hu D. A study of speech recognition based on RNN-RBM language model. Journal of Computer Research and Development, 2014,51(9):1936-1944(in Chinese with English abstract).[doi:10.7544/issn1000-1239.2014.20140211]
    [61] Cho K, Van Merriënboer B, Gulcehre C, Bahdanau D, Bougares F, Schwenk H, Bengio Y. Learning phrase representations using RNN encoder-decoder for statistical machine translation. In:Proc. of the Conf. on Empirical Methods in Natural Language Processing. Doha, 2014. 1724-1734.[doi:10.3115/v1/d14-1179]
    [62] Yang Z, Tao DP, Zhang SY, Jin LW. Similar handwritten Chinese character recognition based on deep neural networks with big data. Journal on Communications, 2014,35(9):184-189(in Chinese with English abstract).[doi:10.3969/j.issn.1000-436x.2014. 09.019]
    [63] Karpathy A, Toderici G, Shetty S, Leung T, Sukthankar R, Li F. Large-scale video classification with convolutional neural networks. In:Proc. of the IEEE Conf. on Computer Vision and Pattern Recognition. Columbus, 2014. 1725-1732.[doi:10.1109/cvpr.2014.223]
    [64] Sun ZJ, Xue L, Xu YM, Wang Z. Overview of deep learning. Application Research of Computers, 2012,29(8):2806-2810(in Chinese with English abstract).[doi:10.3969/j.issn.1001-3695.2012.08.002]
    [65] Deng Y, Bao F, Kong Y, Ren Z, Dai Q. Deep direct reinforcement learning for financial signal representation and trading. IEEE Trans. on Neural Networks and Learning Systems, 2017,28(3):653-664.[doi:10.1109/tnnls.2016.2522401]
    [66] Lu DW. Agent inspired trading using recurrent reinforcement learning and LSTM neural networks. Papers, 2017. https://arxiv.org/pdf/1707.07338.pdf
    [67] Silver D, Lever G, Heess N, Degris T, Wierstra D, Riedmiller M. Deterministic policy gradient algorithms. In:Proc. of the Int'l Conf. on Machine Learning. 2014. 387-395.
    [68] Jiang ZY, Xu DX, Liang JJ. A deep reinforcement learning framework for the financial portfolio management problem. arXiv preprint arXiv:1706.10059, 2017. https://arxiv.org/abs/1706.10059
    [69] Silver D, Schrittwieser J, Simonyan K, Antonoglou I, Huang A, Guez A. Mastering the game of Go without human knowledge. Nature, 2017,550(7676):354-359.[doi:10.1038/nature24270]
Cite this article:

梁天新, 杨小平, 王良, 韩镇远. Review on financial trading system based on reinforcement learning. Journal of Software (Ruan Jian Xue Bao), 2019,30(3):845-864 (in Chinese).
History
  • Received: 2018-07-19
  • Revised: 2018-09-20
  • Published online: 2019-03-06