Abstract: With the rapid development of merchant review websites, the volume of content on these sites has grown significantly, making it difficult for users to quickly find valuable reviews. This study introduces a new task, “multimodal customized review generation”, which aims to generate customized reviews for specific users about products they have not yet reviewed, thereby giving those users valuable insights into the products. To this end, the study explores a multimodal review generation framework built on a pre-trained language model. Specifically, a multimodal pre-trained language model takes product images and user preferences as inputs, and the visual and textual features are fused to generate customized reviews. Experimental results demonstrate that the proposed model is effective in generating high-quality customized reviews.