Supported by the National Natural Science Foundation of China under Grant No. 60745002 and the National Basic Research Program of China (973 Program) under Grant No. 2003CB317002
Optimal Action Criterion and Algorithm Improvement of Real-Time Dynamic Programming
This paper aims to improve the efficiency of the real-time dynamic programming (RTDP) algorithm for solving Markov decision problems. Several typical convergence criteria are compared and analyzed, and a new criterion, called the optimal action criterion, together with a corresponding branch strategy, is proposed on the basis of upper- and lower-bound theory. This criterion guarantees that the agent can act earlier in a real-time decision process while still retaining an optimal policy of sufficient precision. It can be proved that, under certain conditions, such an incremental method yields an optimal policy with arbitrary precision. Building on these techniques, a bounded incremental real-time dynamic programming (BI-RTDP) algorithm is designed. In experiments on two typical real-time simulation systems, BI-RTDP outperforms the other state-of-the-art RTDP algorithms tested.
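To illustrate the kind of bound-based decision rule the abstract describes, the sketch below shows one plausible form of an "optimal action criterion": maintain admissible upper and lower bounds on the optimal value function, and commit to an action once every alternative's upper-bound Q-value falls within epsilon of the chosen action's lower-bound Q-value. The tiny MDP, the variable names, and the sweep schedule are all illustrative assumptions, not the paper's actual BI-RTDP algorithm or benchmark.

```python
# Hypothetical sketch of a bound-based "optimal action criterion".
# All names and the toy MDP are illustrative; this is not the paper's
# BI-RTDP algorithm, only the bound-gap idea it builds on.

GAMMA = 0.9   # discount factor
EPS = 0.01    # required precision of the committed action

# Tiny deterministic MDP: (state, action) -> (next_state, reward).
MDP = {
    (0, 'a'): (1, 1.0),
    (0, 'b'): (0, 0.5),
    (1, 'a'): (1, 0.0),
    (1, 'b'): (0, 0.2),
}
STATES = [0, 1]
ACTIONS = ['a', 'b']

# Admissible initial bounds on the optimal value function:
# 0 from below, max_reward / (1 - gamma) from above.
v_lo = {s: 0.0 for s in STATES}
v_up = {s: 1.0 / (1 - GAMMA) for s in STATES}

def q(v, s, a):
    """One-step Q-value of action a at state s under value bounds v."""
    ns, r = MDP[(s, a)]
    return r + GAMMA * v[ns]

def backup(s):
    """Bellman backup applied to both bounds; keeps them admissible."""
    v_lo[s] = max(q(v_lo, s, a) for a in ACTIONS)
    v_up[s] = max(q(v_up, s, a) for a in ACTIONS)

def epsilon_optimal_action(s):
    """Return an action certified epsilon-optimal by the bounds, or None.

    The candidate is the greedy action w.r.t. the lower bound; it is
    committed once no other action's upper-bound Q-value exceeds the
    candidate's lower-bound Q-value by more than EPS.
    """
    best = max(ACTIONS, key=lambda a: q(v_lo, s, a))
    if all(q(v_up, s, a) - q(v_lo, s, best) <= EPS
           for a in ACTIONS if a != best):
        return best
    return None

# Sweep until the agent can act at state 0 with epsilon-certified optimality.
sweeps = 0
while epsilon_optimal_action(0) is None:
    for s in STATES:
        backup(s)
    sweeps += 1

print(epsilon_optimal_action(0), sweeps)
```

Because both bounds contract toward the optimal value function, the gap between them shrinks geometrically, so the criterion is eventually satisfied; the agent can act as soon as it is, typically long before the bounds themselves have fully converged. This early-commitment property is the efficiency argument the abstract refers to.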