Review on Financial Trading System Based on Reinforcement Learning
Author: Liang Tianxin, Yang Xiaoping, Wang Liang, Han Zhenyuan
Affiliation:

Fund Project: National Natural Science Foundation of China (71531012)

    Abstract:

    In recent years, reinforcement learning has made great progress in electronic games, chess, and decision-making control, and it has also driven the rapid development of financial trading systems. Financial trading has become an active research topic in reinforcement learning, with broad application demand and academic significance in the stock, foreign exchange, and futures markets. This paper surveys the research achievements in trading systems, adaptive algorithms, and trading strategies, organized around the progress of the reinforcement learning models commonly used in finance. Finally, the difficulties and challenges that reinforcement learning faces in financial trading systems are discussed, and future development trends are outlined.
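To ground the surveyed line of work, the sketch below illustrates a direct (recurrent) reinforcement learning trader in the spirit of Moody and Saffell [7-9]: the position F_t = tanh(theta · [1, recent returns, F_{t-1}]) feeds back into the next decision, profits are charged a transaction cost proportional to position changes, and the parameters are tuned to maximize the Sharpe ratio. This is a minimal sketch, not the authors' implementation; the function names, window size, cost level, synthetic return series, and the random-search optimizer (the original papers ascend an analytic gradient of a differential Sharpe ratio) are all illustrative assumptions.

```python
# Minimal sketch of a recurrent reinforcement learning (RRL) trader, cf. [7-9].
# All names, constants, and the random-search optimizer are illustrative
# assumptions, not a reference implementation.
import numpy as np

def rrl_positions(theta, returns, window):
    """Roll the recurrent trader F_t = tanh(theta . [1, r_{t-window..t-1}, F_{t-1}])
    over a return series; positions lie in [-1, 1] (short to long)."""
    positions = np.zeros(len(returns))
    for t in range(window, len(returns)):
        features = np.concatenate(([1.0], returns[t - window:t], [positions[t - 1]]))
        positions[t] = np.tanh(theta @ features)
    return positions

def strategy_sharpe(theta, returns, window, cost=1e-4):
    """Sharpe ratio of trading profits R_t = F_{t-1} * r_t - cost * |F_t - F_{t-1}|."""
    F = rrl_positions(theta, returns, window)
    profits = F[:-1] * returns[1:] - cost * np.abs(np.diff(F))
    return profits.mean() / (profits.std() + 1e-12)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    returns = rng.normal(0.0005, 0.01, size=1000)   # synthetic daily returns (assumption)
    window = 8
    theta = rng.normal(scale=0.1, size=window + 2)  # bias + window returns + last position
    best = strategy_sharpe(theta, returns, window)
    # Random-search stand-in for the analytic gradient ascent used in [7-9].
    for _ in range(200):
        candidate = theta + rng.normal(scale=0.05, size=theta.shape)
        score = strategy_sharpe(candidate, returns, window)
        if score > best:
            theta, best = candidate, score
    print(f"in-sample Sharpe per period after search: {best:.3f}")
```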

    References
    [1] Fama EF. Random walks in stock market prices. Financial Analysts Journal, 1965,21(5):55-59.
    [2] Farmer JD. Market force, ecology and evolution. Industrial and Corporate Change, 2002,11(5):895-953.[doi:10.1093/icc/11.5.895]
    [3] Lo AW. The adaptive markets hypothesis:Market efficiency from an evolutionary perspective. Journal of Portfolio Management, 2004,30(5):15-29.[doi:10.3905/jpm.2004.442611]
    [4] Lo AW. Reconciling efficient markets with behavioral finance:The adaptive markets hypothesis. Journal of Investment Consulting, 2005. http://ssrn.com/abstract=728864
    [5] Sutton RS, Barto AG. Reinforcement Learning:An Introduction. Cambridge:MIT Press, 1998. http://legacydirs.umiacs.umd.edu/~hal/courses/2016F_RL/RL9.pdf
    [6] Kuleshov V, Precup D. Algorithms for the multi-armed bandit problem. arXiv preprint arXiv:1402.6028, 2014. http://cn.arxiv.org/pdf/1402.6028
    [7] Moody J, Saffell M. Reinforcement learning for trading. In:Proc. of the Conf. on Advances in Neural Information Processing Systems 11. MIT Press, 1999. 917-923.
    [8] Moody J, Wu L, Liao Y, Saffell M. Performance functions and reinforcement learning for trading systems and portfolios. Journal of Forecasting, 1998,17(5-6):441-470.[doi:10.1002/(sici)1099-131x(1998090)17:5/6<441::aid-for707>3.3.co;2-r]
    [9] Moody J, Saffell M. Learning to trade via direct reinforcement. IEEE Trans. on Neural Networks, 2001,12(4):875-889.[doi:10.1109/72.935097]
    [10] Gold C. FX trading via recurrent reinforcement learning. In:Proc. of the IEEE Int'l Conf. on Computational Intelligence for Financial Engineering. IEEE, 2003. 363-370.[doi:10.1109/cifer.2003.1196283]
    [11] Gorse D. Application of stochastic recurrent reinforcement learning to index trading. In:Proc. of the European Symp. on Artificial Neural Networks (ESANN 2011). Bruges, 2011. http://pdfs.semanticscholar.org/e7aa/08a404bb879cae6fcb751394a29465078e56.pdf
    [12] Hinton GE, Salakhutdinov RR. Reducing the dimensionality of data with neural networks. Science, 2006,313(5786):504-507.[doi:10.1126/science.1127647]
    [13] Zhang J, Maringer D. Indicator selection for daily equity trading with recurrent reinforcement learning. In:Proc. of the Conf. Companion on Genetic and Evolutionary Computation. ACM Press, 2013. 1757-1758.[doi:10.1145/2464576.2480773]
    [14] Zhang J, Maringer D. Using a genetic algorithm to improve recurrent reinforcement learning for equity trading. Computational Economics, 2016,47(4):551-567.[doi:10.1007/s10614-015-9490-y]
    [15] Werbos PJ. Advanced forecasting methods for global crisis warning and models of intelligence. General Systems Yearbook, 1977, 22(6):25-38.
    [16] Bertsekas DP, Tsitsiklis JN. Neuro-dynamic programming:An overview. In:Proc. of the IEEE Conf. on Decision and Control. IEEE, 1995. 560-564.[doi:10.1109/cdc.1995.478953]
    [17] Lewis FL, Vrabie D. Reinforcement learning and adaptive dynamic programming for feedback control. IEEE Circuits and Systems Magazine, 2009,9(3):32-50.[doi:10.1109/MCAS.2009.933854]
    [18] Liu D, Wei Q. Policy iteration adaptive dynamic programming algorithm for discrete-time nonlinear systems. IEEE Trans. on Neural Networks and Learning Systems, 2014,25(3):621-634.[doi:10.1109/tnnls.2013.2281663]
    [19] Zhao H, Wang B, Liao J, Wang H, Tan G. Adaptive dynamic programming for control:Algorithms and stability. Communications & Control Engineering, 2013,54(45):6019-6022.
    [20] Atiya AF, Parlos AG, Ingber L. A reinforcement learning method based on adaptive simulated annealing. In:Proc. of the 2003 IEEE Midwest Symp. on Circuits and Systems. IEEE, 2003. 121-124.[doi:10.1109/mwscas.2003.1562233]
    [21] Jangmin O, Lee J, Lee JW, Zhang BT. Adaptive stock trading with dynamic asset allocation using reinforcement learning. Information Sciences, 2006,176(15):2121-2147.[doi:10.1016/j.ins.2005.10.009]
    [22] Dempster MAH, Leemans V. An automated FX trading system using adaptive reinforcement learning. Expert Systems with Applications, 2006,30(3):543-552.[doi:10.1016/j.eswa.2005.10.012]
    [23] Bertoluzzo F, Corazza M. Making financial trading by recurrent reinforcement learning. In:Proc. of the Int'l Conf. on Knowledge-based and Intelligent Information and Engineering Systems. Berlin, Heidelberg:Springer-Verlag, 2007. 619-626.[doi:10.1007/978-3-540-74827-4_78].
    [24] Tan Z, Quek C, Cheng PYK. Stock trading with cycles:A financial application of ANFIS and reinforcement learning. Expert Systems with Applications, 2011,38(5):4741-4755.[doi:10.1016/j.eswa.2010.09.001]
    [25] Almahdi S, Yang SY. An adaptive portfolio trading system:A risk-return portfolio optimization using recurrent reinforcement learning with expected maximum drawdown. Expert Systems with Applications, 2017,87:267-279.[doi:10.1016/j.eswa.2017.06.023]
    [26] Hamilton JD. A new approach to the economic analysis of nonstationary time series and the business cycle. Econometrica, 1989, 57(2):357-384.[doi:10.2307/1912559]
    [27] Hamilton JD, Susmel R. Autoregressive conditional heteroskedasticity and changes in regime. Journal of Econometrics, 1994, 64(1-2):307-333.[doi:10.1016/0304-4076(94)90067-1]
    [28] Gray SF. Modeling the conditional distribution of interest rates as a regime-switching process. Journal of Financial Economics, 1996,42(1):27-62.[doi:10.1016/0304-405x(96)00875-6]
    [29] Maringer D, Ramtohul T. Regime-switching recurrent reinforcement learning for investment decision making. Computational Management Science, 2012,9(1):89-107.[doi:10.1007/s10287-011-0131-1]
    [30] Maringer D, Ramtohul T. Threshold recurrent reinforcement learning model for automated trading. In:Proc. of the Applications of Evolutionary Computation (EvoApplications 2010):EvoCOMNET, EvoENVIRONMENT, EvoFIN, EvoMUSART, and EvoTRANSLOG. Istanbul, 2010. 212-221.[doi:10.1007/978-3-642-12242-2_22]
    [31] Maringer D, Ramtohul T. Regime-switching recurrent reinforcement learning in automated trading. In:Proc. of the Natural Computing in Computational Finance. Berlin, Heidelberg:Springer-Verlag, 2011. 93-121.[doi:10.1007/978-3-642-23336-4_6]
    [32] Maringer D, Zhang J. Transition variable selection for regime switching recurrent reinforcement learning. In:Proc. of the Computational Intelligence for Financial Engineering & Economics. IEEE, 2014. 407-413.[doi:10.1109/cifer.2014.6924102]
    [33] Wierstra D, Förster A, Peters J, Schmidhuber J. Recurrent policy gradients. Logic Journal of the IGPL, 2010,18(5):620-634.[doi:10.1093/jigpal/jzp049]
    [34] Baird L, Moore A. Gradient descent for general reinforcement learning. In:Proc. of the Conf. on Advances in Neural Information Processing Systems 11. MIT Press, 1999. 968-974.
    [35] Watkins CJCH. Learning from delayed rewards[Ph.D. Thesis]. Cambridge:King's College, University of Cambridge, 1989.
    [36] Jaakkola T, Jordan MI, Singh SP. On the convergence of stochastic iterative dynamic programming algorithms. Neural Computation, 1994,6(6):1185-1201.[doi:10.21236/ada276517]
    [37] Tsitsiklis JN. Asynchronous stochastic approximation and Q-learning. Machine Learning, 1994,16(3):185-202.[doi:10.1007/bf00993306]
    [38] Watkins CJCH, Dayan P. Technical note:Q-learning. Machine Learning, 1992,8(3-4):279-292.[doi:10.1007/978-1-4615-3618-5_4]
    [39] Moore AW, Atkeson CG. Prioritized sweeping:Reinforcement learning with less data and less time. Machine Learning, 1993,13(1):103-130.[doi:10.1007/bf00993104]
    [40] Mahadevan S, Maggioni M. Proto-value functions:A Laplacian framework for learning representation and control in Markov decision processes. Journal of Machine Learning Research, 2007,8:2169-2231.[doi:10.1145/1102351.1102421]
    [41] Sutton RS, McAllester DA, Singh SP, Mansour Y. Policy gradient methods for reinforcement learning with function approximation. In:Proc. of the Conf. on Advances in Neural Information Processing Systems 12. MIT Press, 2000. 1057-1063.
    [42] Lee JW, Jangmin O. A multi-agent Q-learning framework for optimizing stock trading systems. In:Proc. of the Int'l Conf. on Database and Expert Systems Applications. Springer-Verlag, 2002. 153-162.[doi:10.1007/3-540-46146-9_16]
    [43] Lee JW, Park J, Jangmin O, Lee J, Hong E. A multiagent approach to Q-learning for daily stock trading. IEEE Trans. on Systems Man & Cybernetics-Part A:Systems & Humans, 2007,37(6):864-877.[doi:10.1109/tsmca.2007.904825]
    [44] Li J, Chan L. Reward adjustment reinforcement learning for risk-averse asset allocation. In:Proc. of the IEEE Int'l Joint Conf. on Neural Network. 2006. 534-541.[doi:10.1109/ijcnn.2006.246728]
    [45] Bertoluzzo F, Corazza M. Reinforcement learning for automatic financial trading:Introduction and some applications. Working Papers, 2012.[doi:10.2139/ssrn.2192034]
    [46] Bertoluzzo F, Corazza M. Testing different reinforcement learning configurations for financial trading:Introduction and applications. Procedia Economics & Finance, 2012,3(338):68-77.[doi:10.1016/s2212-5671(12)00122-0]
    [47] Corazza M, Bertoluzzo F. Q-learning-based financial trading systems with applications. Social Science Electronic Publishing, 2014.[doi:10.2139/ssrn.2507826]
    [48] Du X, Zhai JJ, Lv KP. Algorithm trading using Q-learning and recurrent reinforcement learning. 2016. http://cs229.stanford.edu/proj2009/LvDuZhai.pdf
    [49] Eilers D, Dunis CL, von Mettenheim HJ, Breitner MH. Intelligent trading of seasonal effects:A decision support algorithm based on reinforcement learning. Decision Support Systems, 2014,64:100-108.[doi:10.1016/j.dss.2014.04.011]
    [50] Konda VR, Tsitsiklis JN. Actor-critic algorithms. In:Proc. of the Conf. on Advances in Neural Information Processing Systems 12. MIT Press, 2000. 1008-1014. http://papers.nips.cc/paper/1786-actor-critic-algorithms.pdf
    [51] Li H, Dagli CH, Enke D. Short-term stock market timing prediction under reinforcement learning schemes. In:Proc. of the IEEE Int'l Symp. on Approximate Dynamic Programming and Reinforcement Learning. IEEE, 2007. 233-240.[doi:10.1109/adprl.2007.368193]
    [52] Bekiros SD. Heterogeneous trading strategies with adaptive fuzzy actor-critic reinforcement learning:A behavioral approach. Journal of Economic Dynamics & Control, 2010,34(6):1153-1170.[doi:10.1016/j.jedc.2010.01.015]
    [53] Mnih V, Kavukcuoglu K, Silver D, Graves A, Antonoglou I, Wierstra D. Playing Atari with deep reinforcement learning. arXiv preprint arXiv:1312.5602, 2013.
    [54] Mnih V, Kavukcuoglu K, Silver D, Rusu AA, Veness J. Human-level control through deep reinforcement learning. Nature, 2015,518(7540):529-533.[doi:10.1038/nature14236]
    [55] Lillicrap TP, Hunt JJ, Pritzel A, Heess N, Erez T, Tassa Y. Continuous control with deep reinforcement learning. arXiv preprint arXiv:1509.02971, 2015.
    [56] Mnih V, Badia AP, Mirza M, Graves A, Lillicrap TP, Harley T. Asynchronous methods for deep reinforcement learning. In:Proc. of the 33rd Int'l Conf. on Machine Learning. 2016. 1928-1937.
    [57] Krizhevsky A, Sutskever I, Hinton GE. ImageNet classification with deep convolutional neural networks. In:Proc. of the 26th Annual Conf. on Neural Information Processing Systems. Nevada, 2012. 1097-1105.[doi:10.1145/3065386]
    [58] Russakovsky O, Deng J, Su H, Krause J, Satheesh S, Ma S. ImageNet large scale visual recognition challenge. Int'l Journal of Computer Vision, 2015,115(3):211-252.[doi:10.1007/s11263-015-0816-y]
    [59] Graves A, Mohamed A, Hinton G. Speech recognition with deep recurrent neural networks. In:Proc. of the IEEE Int'l Conf. on Acoustics, Speech and Signal Processing. Vancouver, 2013. 6645-6649.[doi:10.1109/icassp.2013.6638947]
    [60] Li YX, Zhang JQ, Pan D, Hu D. A study of speech recognition based on RNN-RBM language model. Journal of Computer Research and Development, 2014,51(9):1936-1944(in Chinese with English abstract).[doi:10.7544/issn1000-1239.2014.20140211]
    [61] Cho K, Van Merriënboer B, Gulcehre C, Bahdanau D, Bougares F, Schwenk H, Bengio Y. Learning phrase representations using RNN encoder-decoder for statistical machine translation. In:Proc. of the Conf. on Empirical Methods in Natural Language Processing. Doha, 2014. 1724-1734.[doi:10.3115/v1/d14-1179]
    [62] Yang Z, Tao DP, Zhang SY, Jin LW. Similar handwritten Chinese character recognition based on deep neural networks with big data. Journal on Communications, 2014,35(9):184-189(in Chinese with English abstract).[doi:10.3969/j.issn.1000-436x.2014.09.019]
    [63] Karpathy A, Toderici G, Shetty S, Leung T, Sukthankar R, Li F. Large-scale video classification with convolutional neural networks. In:Proc. of the IEEE Conf. on Computer Vision and Pattern Recognition. Columbus, 2014. 1725-1732.[doi:10.1109/cvpr.2014.223]
    [64] Sun ZJ, Xue L, Xu YM, Wang Z. Overview of deep learning. Application Research of Computers, 2012,29(8):2806-2810(in Chinese with English abstract).[doi:10.3969/j.issn.1001-3695.2012.08.002]
    [65] Deng Y, Bao F, Kong Y, Ren Z, Dai Q. Deep direct reinforcement learning for financial signal representation and trading. IEEE Trans. on Neural Networks and Learning Systems, 2017,28(3):653-664.[doi:10.1109/tnnls.2016.2522401]
    [66] Lu DW. Agent inspired trading using recurrent reinforcement learning and LSTM neural networks. arXiv preprint arXiv:1707.07338, 2017. https://arxiv.org/pdf/1707.07338.pdf
    [67] Silver D, Lever G, Heess N, Degris T, Wierstra D, Riedmiller M. Deterministic policy gradient algorithms. In:Proc. of the Int'l Conf. on Machine Learning. 2014. 387-395.
    [68] Jiang ZY, Xu DX, Liang JJ. A deep reinforcement learning framework for the financial portfolio management problem. arXiv preprint arXiv:1706.10059, 2017. https://arxiv.org/abs/1706.10059
    [69] Silver D, Schrittwieser J, Simonyan K, Antonoglou I, Huang A, Guez A. Mastering the game of Go without human knowledge. Nature, 2017,550(7676):354-359.[doi:10.1038/nature24270]
Get Citation

Liang TX, Yang XP, Wang L, Han ZY. Review on financial trading system based on reinforcement learning. Journal of Software, 2019,30(3):845-864 (in Chinese with English abstract).

History
  • Received: July 19, 2018
  • Revised: September 20, 2018
  • Online: March 06, 2019