Review on Financial Trading System Based on Reinforcement Learning
Author: Liang Tianxin, Yang Xiaoping, Wang Liang, Han Zhenyuan
Affiliation:

Fund Project: National Natural Science Foundation of China (71531012)

    Abstract:

    In recent years, reinforcement learning has made great progress in electronic games, chess, and decision-making control, and it has also driven the rapid development of financial trading systems. Financial trading has become an active research topic in reinforcement learning, with broad application demand and academic significance in the stock, foreign exchange, and futures markets. This paper surveys the research achievements in trading systems, adaptive algorithms, and trading strategies, organized around the progress of the reinforcement learning models commonly used in finance. Finally, the difficulties and challenges that reinforcement learning faces in financial trading systems are discussed, and future development trends are outlined.
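To ground the surveyed line of work, the sketch below illustrates a direct (recurrent) reinforcement learning trader in the spirit of Moody and Saffell [7-9]: the position F_t = tanh(theta · [1, recent returns, F_{t-1}]) feeds back into the next decision, profits are charged a transaction cost proportional to position changes, and the parameters are tuned to maximize the Sharpe ratio. This is a minimal sketch, not the authors' implementation; the function names, window size, cost level, synthetic return series, and the random-search optimizer (the original papers ascend an analytic gradient of a differential Sharpe ratio) are all illustrative assumptions.

```python
# Minimal sketch of a recurrent reinforcement learning (RRL) trader, cf. [7-9].
# All names, constants, and the random-search optimizer are illustrative
# assumptions, not a reference implementation.
import numpy as np

def rrl_positions(theta, returns, window):
    """Roll the recurrent trader F_t = tanh(theta . [1, r_{t-window..t-1}, F_{t-1}])
    over a return series; positions lie in [-1, 1] (short to long)."""
    positions = np.zeros(len(returns))
    for t in range(window, len(returns)):
        features = np.concatenate(([1.0], returns[t - window:t], [positions[t - 1]]))
        positions[t] = np.tanh(theta @ features)
    return positions

def strategy_sharpe(theta, returns, window, cost=1e-4):
    """Sharpe ratio of trading profits R_t = F_{t-1} * r_t - cost * |F_t - F_{t-1}|."""
    F = rrl_positions(theta, returns, window)
    profits = F[:-1] * returns[1:] - cost * np.abs(np.diff(F))
    return profits.mean() / (profits.std() + 1e-12)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    returns = rng.normal(0.0005, 0.01, size=1000)   # synthetic daily returns (assumption)
    window = 8
    theta = rng.normal(scale=0.1, size=window + 2)  # bias + window returns + last position
    best = strategy_sharpe(theta, returns, window)
    # Random-search stand-in for the analytic gradient ascent used in [7-9].
    for _ in range(200):
        candidate = theta + rng.normal(scale=0.05, size=theta.shape)
        score = strategy_sharpe(candidate, returns, window)
        if score > best:
            theta, best = candidate, score
    print(f"in-sample Sharpe per period after search: {best:.3f}")
```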

    References
    [1] Fama EF. Random walks in stock market prices. Financial Analysts Journal, 1965,21(5):55-59.
    [2] Farmer JD. Market force, ecology and evolution. Industrial and Corporate Change, 2002,11(5):895-953.[doi:10.1093/icc/11.5.895]
    [3] Lo AW. The adaptive markets hypothesis:Market efficiency from an evolutionary perspective. Journal of Portfolio Management, 2004,30(5):15-29.[doi:10.3905/jpm.2004.442611]
    [4] Lo AW. Reconciling efficient markets with behavioral finance:The adaptive markets hypothesis. Journal of Investment Consulting, 2005. http://ssrn.com/abstract=728864
    [5] Sutton RS, Barto AG. Reinforcement Learning:An Introduction. Cambridge:MIT Press, 1998. http://legacydirs.umiacs.umd.edu/~hal/courses/2016F_RL/RL9.pdf
    [6] Kuleshov V, Precup D. Algorithms for the multi-armed bandit problem. arXiv preprint arXiv:1402.6028, 2014. http://cn.arxiv.org/pdf/1402.6028
    [7] Moody J, Saffell M. Reinforcement learning for trading. In:Proc. of the Conf. on Advances in Neural Information Processing Systems 11. MIT Press, 1999. 917-923.
    [8] Moody J, Wu L, Liao Y, Saffell M. Performance functions and reinforcement learning for trading systems and portfolios. Journal of Forecasting, 1998,17(5-6):441-470.[doi:10.1002/(sici)1099-131x(1998090)17:5/6<441::aid-for707>3.3.co;2-r]
    [9] Moody J, Saffell M. Learning to trade via direct reinforcement. IEEE Trans. on Neural Networks, 2001,12(4):875-889.[doi:10.1109/72.935097]
    [10] Gold C. FX trading via recurrent reinforcement learning. In:Proc. of the IEEE Int'l Conf. on Computational Intelligence for Financial Engineering. IEEE, 2003. 363-370.[doi:10.1109/cifer.2003.1196283]
    [11] Gorse D. Application of stochastic recurrent reinforcement learning to index trading. In:Proc. of the European Symp. on Artificial Neural Networks (ESANN 2011). Bruges, 2011. http://pdfs.semanticscholar.org/e7aa/08a404bb879cae6fcb751394a29465078e56.pdf
    [12] Hinton GE, Salakhutdinov RR. Reducing the dimensionality of data with neural networks. Science, 2006,313(5786):504-507.[doi:10.1126/science.1127647]
    [13] Zhang J, Maringer D. Indicator selection for daily equity trading with recurrent reinforcement learning. In:Proc. of the Conf. Companion on Genetic and Evolutionary Computation. ACM Press, 2013. 1757-1758.[doi:10.1145/2464576.2480773]
    [14] Zhang J, Maringer D. Using a genetic algorithm to improve recurrent reinforcement learning for equity trading. Computational Economics, 2016,47(4):551-567.[doi:10.1007/s10614-015-9490-y]
    [15] Werbos PJ. Advanced forecasting methods for global crisis warning and models of intelligence. General Systems Yearbook, 1977, 22(6):25-38.
    [16] Bertsekas DP, Tsitsiklis JN. Neuro-dynamic programming:An overview. In:Proc. of the IEEE Conf. on Decision and Control. IEEE, 1995. 560-564.[doi:10.1109/cdc.1995.478953]
    [17] Lewis FL, Vrabie D. Reinforcement learning and adaptive dynamic programming for feedback control. IEEE Circuits and Systems Magazine, 2009,9(3):32-50.[doi:10.1109/MCAS.2009.933854]
    [18] Liu D, Wei Q. Policy iteration adaptive dynamic programming algorithm for discrete-time nonlinear systems. IEEE Trans. on Neural Networks and Learning Systems, 2014,25(3):621-634.[doi:10.1109/tnnls.2013.2281663]
    [19] Zhao H, Wang B, Liao J, Wang H, Tan G. Adaptive dynamic programming for control:Algorithms and stability. Communications & Control Engineering, 2013,54(45):6019-6022.
    [20] Atiya AF, Parlos AG, Ingber L. A reinforcement learning method based on adaptive simulated annealing. In:Proc. of the 2003 IEEE Midwest Symp. on Circuits and Systems. IEEE, 2003. 121-124.[doi:10.1109/mwscas.2003.1562233]
    [21] Jangmin O, Lee J, Lee JW, Zhang BT. Adaptive stock trading with dynamic asset allocation using reinforcement learning. Information Sciences, 2006,176(15):2121-2147.[doi:10.1016/j.ins.2005.10.009]
    [22] Dempster MAH, Leemans V. An automated FX trading system using adaptive reinforcement learning. Expert Systems with Applications, 2006,30(3):543-552.[doi:10.1016/j.eswa.2005.10.012]
    [23] Bertoluzzo F, Corazza M. Making financial trading by recurrent reinforcement learning. In:Proc. of the Int'l Conf. on Knowledge-based and Intelligent Information and Engineering Systems. Berlin, Heidelberg:Springer-Verlag, 2007. 619-626.[doi:10.1007/978-3-540-74827-4_78].
    [24] Tan Z, Quek C, Cheng PYK. Stock trading with cycles:A financial application of ANFIS and reinforcement learning. Expert Systems with Applications, 2011,38(5):4741-4755.[doi:10.1016/j.eswa.2010.09.001]
    [25] Almahdi S, Yang SY. An adaptive portfolio trading system:A risk-return portfolio optimization using recurrent reinforcement learning with expected maximum drawdown. Expert Systems with Applications, 2017,87:267-279.[doi:10.1016/j.eswa.2017.06.023]
    [26] Hamilton JD. A new approach to the economic analysis of nonstationary time series and the business cycle. Econometrica, 1989, 57(2):357-384.[doi:10.2307/1912559]
    [27] Hamilton JD, Susmel R. Autoregressive conditional heteroskedasticity and changes in regime. Journal of Econometrics, 1994, 64(1-2):307-333.[doi:10.1016/0304-4076(94)90067-1]
    [28] Gray SF. Modeling the conditional distribution of interest rates as a regime-switching process. Journal of Financial Economics, 1996,42(1):27-62.[doi:10.1016/0304-405x(96)00875-6]
    [29] Maringer D, Ramtohul T. Regime-switching recurrent reinforcement learning for investment decision making. Computational Management Science, 2012,9(1):89-107.[doi:10.1007/s10287-011-0131-1]
    [30] Maringer D, Ramtohul T. Threshold recurrent reinforcement learning model for automated trading. In:Proc. of the Applications of Evolutionary Computation (EvoApplications 2010):EvoCOMNET, EvoENVIRONMENT, EvoFIN, EvoMUSART, and EvoTRANSLOG. Istanbul, 2010. 212-221.[doi:10.1007/978-3-642-12242-2_22]
    [31] Maringer D, Ramtohul T. Regime-switching recurrent reinforcement learning in automated trading. In:Proc. of the Natural Computing in Computational Finance. Berlin, Heidelberg:Springer-Verlag, 2011. 93-121.[doi:10.1007/978-3-642-23336-4_6]
    [32] Maringer D, Zhang J. Transition variable selection for regime switching recurrent reinforcement learning. In:Proc. of the Computational Intelligence for Financial Engineering & Economics. IEEE, 2014. 407-413.[doi:10.1109/cifer.2014.6924102]
    [33] Wierstra D, Förster A, Peters J, Schmidhuber J. Recurrent policy gradients. Logic Journal of the IGPL, 2010,18(5):620-634.[doi:10.1093/jigpal/jzp049]
    [34] Baird L, Moore A. Gradient descent for general reinforcement learning. In:Proc. of the Conf. on Advances in Neural Information Processing Systems 11. MIT Press, 1999. 968-974.
    [35] Watkins CJCH. Learning from delayed rewards[Ph.D. Thesis]. Cambridge:King's College, University of Cambridge, 1989.
    [36] Jaakkola T, Jordan MI, Singh SP. On the convergence of stochastic iterative dynamic programming algorithms. Neural Computation, 1994,6(6):1185-1201.[doi:10.21236/ada276517]
    [37] Tsitsiklis JN. Asynchronous stochastic approximation and Q-learning. Machine Learning, 1994,16(3):185-202.[doi:10.1007/bf00993306]
    [38] Watkins CJCH, Dayan P. Technical note:Q-learning. Machine Learning, 1992,8(3-4):279-292.[doi:10.1007/978-1-4615-3618-5_4]
    [39] Moore AW, Atkeson CG. Prioritized sweeping:Reinforcement learning with less data and less time. Machine Learning, 1993,13(1):103-130.[doi:10.1007/bf00993104]
    [40] Mahadevan S, Maggioni M. Proto-value functions:A Laplacian framework for learning representation and control in Markov decision processes. Journal of Machine Learning Research, 2007,8:2169-2231.[doi:10.1145/1102351.1102421]
    [41] Sutton RS, McAllester DA, Singh SP, Mansour Y. Policy gradient methods for reinforcement learning with function approximation. In:Proc. of the Conf. on Advances in Neural Information Processing Systems 12. MIT Press, 2000. 1057-1063.
    [42] Lee JW, Jangmin O. A multi-agent Q-learning framework for optimizing stock trading systems. In:Proc. of the Int'l Conf. on Database and Expert Systems Applications. Springer-Verlag, 2002. 153-162.[doi:10.1007/3-540-46146-9_16]
    [43] Lee JW, Park J, Jangmin O, Lee J, Hong E. A multiagent approach to Q-learning for daily stock trading. IEEE Trans. on Systems Man & Cybernetics-Part A:Systems & Humans, 2007,37(6):864-877.[doi:10.1109/tsmca.2007.904825]
    [44] Li J, Chan L. Reward adjustment reinforcement learning for risk-averse asset allocation. In:Proc. of the IEEE Int'l Joint Conf. on Neural Network. 2006. 534-541.[doi:10.1109/ijcnn.2006.246728]
    [45] Bertoluzzo F, Corazza M. Reinforcement learning for automatic financial trading:Introduction and some applications. Working Papers, 2012.[doi:10.2139/ssrn.2192034]
    [46] Bertoluzzo F, Corazza M. Testing different reinforcement learning configurations for financial trading:Introduction and applications. Procedia Economics & Finance, 2012,3(338):68-77.[doi:10.1016/s2212-5671(12)00122-0]
    [47] Corazza M, Bertoluzzo F. Q-learning-based financial trading systems with applications. Social Science Electronic Publishing, 2014.[doi:10.2139/ssrn.2507826]
    [48] Du X, Zhai JJ, Lv KP. Algorithm trading using Q-learning and recurrent reinforcement learning. 2016. http://cs229.stanford.edu/proj2009/LvDuZhai.pdf
    [49] Eilers D, Dunis CL, von Mettenheim HJ, Breitner MH. Intelligent trading of seasonal effects:A decision support algorithm based on reinforcement learning. Decision Support Systems, 2014,64:100-108.[doi:10.1016/j.dss.2014.04.011]
    [50] Konda VR, Tsitsiklis JN. Actor-critic algorithms. In:Proc. of the Conf. on Advances in Neural Information Processing Systems 12. MIT Press, 2000. 1008-1014. http://papers.nips.cc/paper/1786-actor-critic-algorithms.pdf
    [51] Li H, Dagli CH, Enke D. Short-term stock market timing prediction under reinforcement learning schemes. In:Proc. of the IEEE Int'l Symp. on Approximate Dynamic Programming and Reinforcement Learning. IEEE, 2007. 233-240.[doi:10.1109/adprl.2007.368193]
    [52] Bekiros SD. Heterogeneous trading strategies with adaptive fuzzy actor-critic reinforcement learning:A behavioral approach. Journal of Economic Dynamics & Control, 2010,34(6):1153-1170.[doi:10.1016/j.jedc.2010.01.015]
    [53] Mnih V, Kavukcuoglu K, Silver D, Graves A, Antonoglou I, Wierstra D. Playing Atari with deep reinforcement learning. arXiv preprint arXiv:1312.5602, 2013.
    [54] Mnih V, Kavukcuoglu K, Silver D, Rusu AA, Veness J. Human-level control through deep reinforcement learning. Nature, 2015,518(7540):529-533.[doi:10.1038/nature14236]
    [55] Lillicrap TP, Hunt JJ, Pritzel A, Heess N, Erez T, Tassa Y. Continuous control with deep reinforcement learning. arXiv preprint arXiv:1509.02971, 2015.
    [56] Mnih V, Badia AP, Mirza M, Graves A, Lillicrap TP, Harley T. Asynchronous methods for deep reinforcement learning. In:Proc. of the 33rd Int'l Conf. on Machine Learning. 2016. 1928-1937.
    [57] Krizhevsky A, Sutskever I, Hinton GE. ImageNet classification with deep convolutional neural networks. In:Proc. of the 26th Annual Conf. on Neural Information Processing Systems. Nevada, 2012. 1097-1105.[doi:10.1145/3065386]
    [58] Russakovsky O, Deng J, Su H, Krause J, Satheesh S, Ma S. ImageNet large scale visual recognition challenge. Int'l Journal of Computer Vision, 2015,115(3):211-252.[doi:10.1007/s11263-015-0816-y]
    [59] Graves A, Mohamed A, Hinton G. Speech recognition with deep recurrent neural networks. In:Proc. of the IEEE Int'l Conf. on Acoustics, Speech and Signal Processing. Vancouver, 2013. 6645-6649.[doi:10.1109/icassp.2013.6638947]
    [60] Li YX, Zhang JQ, Pan D, Hu D. A study of speech recognition based on RNN-RBM language model. Journal of Computer Research and Development, 2014,51(9):1936-1944(in Chinese with English abstract).[doi:10.7544/issn1000-1239.2014.20140211]
    [61] Cho K, Van Merriënboer B, Gulcehre C, Bahdanau D, Bougares F, Schwenk H, Bengio Y. Learning phrase representations using RNN encoder-decoder for statistical machine translation. In:Proc. of the Conf. on Empirical Methods in Natural Language Processing. Doha, 2014. 1724-1734.[doi:10.3115/v1/d14-1179]
    [62] Yang Z, Tao DP, Zhang SY, Jin LW. Similar handwritten Chinese character recognition based on deep neural networks with big data. Journal on Communications, 2014,35(9):184-189(in Chinese with English abstract).[doi:10.3969/j.issn.1000-436x.2014.09.019]
    [63] Karpathy A, Toderici G, Shetty S, Leung T, Sukthankar R, Li F. Large-scale video classification with convolutional neural networks. In:Proc. of the IEEE Conf. on Computer Vision and Pattern Recognition. Columbus, 2014. 1725-1732.[doi:10.1109/cvpr.2014.223]
    [64] Sun ZJ, Xue L, Xu YM, Wang Z. Overview of deep learning. Application Research of Computers, 2012,29(8):2806-2810(in Chinese with English abstract).[doi:10.3969/j.issn.1001-3695.2012.08.002]
    [65] Deng Y, Bao F, Kong Y, Ren Z, Dai Q. Deep direct reinforcement learning for financial signal representation and trading. IEEE Trans. on Neural Networks and Learning Systems, 2017,28(3):653-664.[doi:10.1109/tnnls.2016.2522401]
    [66] Lu DW. Agent inspired trading using recurrent reinforcement learning and LSTM neural networks. arXiv preprint arXiv:1707.07338, 2017. https://arxiv.org/pdf/1707.07338.pdf
    [67] Silver D, Lever G, Heess N, Degris T, Wierstra D, Riedmiller M. Deterministic policy gradient algorithms. In:Proc. of the Int'l Conf. on Machine Learning. 2014. 387-395.
    [68] Jiang ZY, Xu DX, Liang JJ. A deep reinforcement learning framework for the financial portfolio management problem. arXiv preprint arXiv:1706.10059, 2017. https://arxiv.org/abs/1706.10059
    [69] Silver D, Schrittwieser J, Simonyan K, Antonoglou I, Huang A, Guez A. Mastering the game of Go without human knowledge. Nature, 2017,550(7676):354-359.[doi:10.1038/nature24270]
Get Citation

Liang TX, Yang XP, Wang L, Han ZY. Review on financial trading system based on reinforcement learning. Journal of Software, 2019,30(3):845-864 (in Chinese with English abstract).

History
  • Received: July 19, 2018
  • Revised: September 20, 2018
  • Online: March 06, 2019