Abstract: For conflict games, a rational but conservative action selection method is investigated, namely, minimizing the regret function in the worst case. Under this method, the possible future loss is the lowest achievable under the chosen policy, and a Nash equilibrium mixed policy is obtained without any information about the other agents. Based on regret, a reinforcement learning model and an accompanying algorithm for conflict games in complex multi-agent environments are put forward. The model also builds the agents' belief-updating process on the concept of cross-entropy distance, which further optimizes the action selection policy for conflict games. Based on a Markov repeated-game model, this paper demonstrates the convergence of the algorithm and analyzes the relationship between beliefs and the optimal policy. Additionally, compared with an extended Q-learning algorithm under the MMDP (multi-agent Markov decision process) framework, the proposed algorithm dramatically decreases the number of conflicts, enhances coordination among agents, improves system performance, and helps maintain system stability.
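To make the worst-case regret-minimization idea concrete, the following Python sketch selects the action whose maximum regret over all opponent actions is smallest. It is illustrative only: the function name, the example payoff matrix, and the restriction to pure actions are assumptions made here, whereas the method described in the abstract yields a mixed Nash equilibrium policy.

import numpy as np

def minimax_regret_action(payoff):
    """Select the row action that minimizes worst-case regret.

    payoff[i, j] is the row player's payoff when it plays action i
    and the opponent plays action j. (Hypothetical helper, not the
    paper's algorithm.)
    """
    # Regret of action i against opponent action j: how much better
    # the best response to j would have been.
    regret = payoff.max(axis=0, keepdims=True) - payoff
    # Worst-case (maximum) regret of each action over all possible
    # opponent actions.
    worst_case = regret.max(axis=1)
    # The conservative choice: minimize the worst-case regret.
    return int(worst_case.argmin())

# Example with an assumed 3x3 payoff matrix for the row player.
payoff = np.array([[3.0, 0.0, 1.0],
                   [1.0, 2.0, 1.0],
                   [0.0, 1.0, 2.0]])
print(minimax_regret_action(payoff))  # prints 0 for this matrix

For this matrix, the per-action worst-case regrets are 2, 2, and 3, so the first action is chosen; no model of the opponent's behavior is required, which matches the abstract's claim that the policy is obtained without information about other agents.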