Reinforcement Learning Heuristic Algorithm for Solving the Two-dimensional Strip Packing Problem

doi:10.13328/j.cnki.jos.006161

Journal of Software, Volume 32, Issue 12, 2021, pp. 3684-3697

Author: YANG Ming-Gang, CHEN Meng-Fan, YANG Shuang-Yuan, ZHANG De-Fu
Affiliation: School of Informatics, Xiamen University, Xiamen 361005, China
CLC Number: TP301
Fund Project:National Natural Science Foundation of China (61672439)

Abstract:

The two-dimensional strip packing problem is a classic NP-hard combinatorial optimization problem with wide applications in daily life and industrial production. This study proposes a reinforcement learning heuristic algorithm for it. Reinforcement learning supplies an initial packing sequence to the heuristic algorithm, which mitigates the heuristic's cold-start problem. The reinforcement learning model is self-driven: it uses only the value of the solution computed by the heuristic as the reward signal to optimize the network, so that the network learns to produce better packing sequences. A simplified version of the pointer network decodes the output packing sequence; the model consists of an embedding layer, a decoder, and an attention mechanism. The model is trained with the actor-critic algorithm, which improves its efficiency. The reinforcement learning heuristic algorithm is tested on 714 standard problem instances and 400 generated problem instances. Experimental results show that the proposed algorithm effectively mitigates the heuristic cold-start problem and outperforms state-of-the-art heuristics, with much higher solution quality.
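The reward loop described in the abstract (a packing sequence goes in, a heuristic packs it, and the resulting strip height becomes the learning signal) can be illustrated with a small sketch. The next-fit shelf rule below is a simple stand-in chosen for brevity and is not the heuristic used in the paper; the names `shelf_height` and `reward`, and the choice of negative height as the reward, are assumptions made for illustration only.

```python
from typing import Sequence, Tuple

Rect = Tuple[int, int]  # (width, height); hypothetical item representation


def shelf_height(rects: Sequence[Rect], order: Sequence[int], strip_width: int) -> int:
    """Pack rectangles in the given order with a next-fit shelf rule.

    Returns the total strip height used. This is a stand-in heuristic,
    not the one from the paper.
    """
    shelf_base = 0  # y-coordinate of the bottom of the current shelf
    shelf_tall = 0  # tallest rectangle placed on the current shelf
    cursor_x = 0    # next free x-position on the current shelf
    for idx in order:
        w, h = rects[idx]
        if w > strip_width:
            raise ValueError(f"rectangle {idx} is wider than the strip")
        if cursor_x + w > strip_width:
            # The rectangle does not fit: close this shelf, open a new one above.
            shelf_base += shelf_tall
            shelf_tall = 0
            cursor_x = 0
        cursor_x += w
        shelf_tall = max(shelf_tall, h)
    return shelf_base + shelf_tall


def reward(rects: Sequence[Rect], order: Sequence[int], strip_width: int) -> float:
    """Assumed reward signal: lower strip height gives a higher reward."""
    return -float(shelf_height(rects, order, strip_width))


if __name__ == "__main__":
    rects = [(4, 4), (2, 1), (4, 4), (2, 1)]
    for order in ([0, 1, 2, 3], [0, 2, 1, 3]):
        print(order, shelf_height(rects, order, 8), reward(rects, order, 8))
```

With these four rectangles and a strip of width 8, the two orderings yield different heights, which is the kind of signal a sequence-generating policy could learn from: the same heuristic performs better or worse depending only on the order it is given.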

Key words: two-dimensional strip packing problem; reinforcement learning; pointer network; heuristics; hierarchical search

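Citation:

YANG Ming-Gang, CHEN Meng-Fan, YANG Shuang-Yuan, ZHANG De-Fu. Reinforcement Learning Heuristic Algorithm for Solving the Two-dimensional Strip Packing Problem. Journal of Software, 2021, 32(12): 3684-3697
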
History

Received: July 14, 2020
Revised: August 20, 2020
Online: May 21, 2021
Published: December 06, 2021
