From the viewpoint of decision theory, active queue management (AQM) can be treated as an optimal decision problem. This paper describes a new AQM scheme, Reinforcement Learning Gradient-Descent (RLGD), based on the optimal decision theory of reinforcement learning. To maximize throughput and stabilize the queue length, RLGD adjusts its update step adaptively and does not require knowledge of the rate adjustment scheme used by the source senders. Simulations show that RLGD drives the queue length to the desired value quickly and keeps its oscillations small. The results also show that RLGD is robust to disturbances under various network conditions and significantly outperforms the traditional REM (Random Exponential Marking) and PI (proportional-integral) controllers.
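To make the idea of an adaptively adjusted update step concrete, the following is a minimal sketch of a generic gradient-style AQM update. It is not the RLGD algorithm, whose update rule is not given in this abstract. The sketch assumes a squared-error cost on the queue length, assumes that a higher drop probability shortens the queue (so the unknown gradient magnitude can be folded into the step), and uses a simple sign-based rule to grow or shrink the step. The class name `GradientAqm`, its fields, and the constants 1.1 and 0.5 are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class GradientAqm:
    """Illustrative controller only; every name and constant here is an assumption."""
    target: float            # desired queue length (packets)
    step: float = 1e-3       # current update step
    step_min: float = 1e-5   # lower bound on the step
    step_max: float = 1e-1   # upper bound on the step
    prob: float = 0.0        # drop/mark probability, kept in [0, 1]
    prev_error: float = 0.0  # queue error seen at the previous update

    def update(self, queue_len: float) -> float:
        """Return the new drop probability for the observed queue length."""
        error = queue_len - self.target

        # Assumed adaptation rule: grow the step while the error keeps its
        # sign (still far from the target), shrink it when the sign flips
        # (the queue overshot and is oscillating around the target).
        if error * self.prev_error > 0:
            self.step = min(self.step * 1.1, self.step_max)
        elif error * self.prev_error < 0:
            self.step = max(self.step * 0.5, self.step_min)

        # Descent step on 0.5 * error**2 with respect to the drop probability,
        # assuming d(queue)/d(prob) < 0: a queue above target raises prob.
        self.prob = min(1.0, max(0.0, self.prob + self.step * error))
        self.prev_error = error
        return self.prob
```

A router model would call `update` once per sampling interval with the measured queue length and drop or mark arriving packets with the returned probability. The update uses only the locally observed queue, which mirrors the abstract's point that no knowledge of the senders' rate adjustment scheme is needed.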