Abstract:The two-dimensional strip packing problem is a classic NP-hard combinatorial optimization problem, which has been widely used in daily life and industrial production. This study proposes a reinforcement learning heuristic algorithm for it. The reinforcement learning is used to provide an initial boxing sequence for the heuristic algorithm to effectively improve the heuristic cold start problem. The reinforcement learning model can perform self-driven learning, using only the value of the heuristically calculated solution as a reward signal to optimize the network, so that the network can learn a better packing sequence. A simplified version of the pointer network is used to decode the output boxing sequence. The model consists of an embedding layer, a decoder, and an attention mechanism. Actor-critic algorithm is used to train the model, which improves the efficiency of the model. The reinforcement learning heuristic algorithm is tested on 714 standard problem instances and 400 generated problem instances. Experimental results show that the proposed algorithm can effectively improve the heuristic cold start problem and outperform the state-of-the-art heuristics with much higher solution quality.