Abstract: This paper aims to improve the efficiency of the real-time dynamic programming (RTDP) algorithm for solving Markov decision problems. Several typical convergence criteria are compared and analyzed. Based on upper- and lower-bound theory, a criterion called the optimal action criterion and a corresponding branch strategy are proposed. This criterion guarantees that the agent can act earlier in a real-time decision process while still retaining an optimal policy of sufficient precision. It can be proved that, under certain conditions, such an incremental method yields an optimal policy with arbitrary precision. With these new techniques, a bounded incremental real-time dynamic programming (BI-RTDP) algorithm is designed. In experiments on two typical real-time simulation systems, BI-RTDP outperforms the other state-of-the-art RTDP algorithms tested.
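To illustrate the flavor of a bound-based optimal action criterion, the following is a minimal sketch (not the paper's exact formulation) of testing whether an action is already provably optimal from upper and lower Q-value bounds alone; the function name, the cost-minimization convention, and the list-based interface are assumptions for illustration.

```python
def provably_optimal_action(q_lower, q_upper):
    """Hedged sketch of a bound-based optimality test for one state.

    q_lower[a] and q_upper[a] bound the true expected cost of action a.
    An action is provably optimal once its upper-bounded cost is no worse
    than the lower-bounded cost of every alternative, so the agent can
    commit to it before the bounds fully converge.
    Returns the index of such an action, or None if the bounds are still
    too loose to decide.
    """
    n = len(q_lower)
    for a in range(n):
        if all(q_upper[a] <= q_lower[b] for b in range(n) if b != a):
            return a
    return None
```

Under this criterion the agent can act as soon as one action's bound interval falls below all others, rather than waiting for value-function convergence at the state.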