Abstract: To make full use of the local spatial relations between point-cloud and multi-view data and thereby further improve the accuracy of 3D shape recognition, a 3D shape recognition network based on multimodal relations is proposed. First, a multimodal relation module (MRM) is designed, which extracts the relation information between any local feature of the point cloud and any local feature of the multi-view data to obtain the corresponding relation features. A cascade pooling, consisting of max pooling and generalized-mean pooling, is then applied to the relation tensor to obtain the global relation feature. There are two types of multimodal relation module, which output the point-view relation feature and the view-point relation feature, respectively. The proposed gating module adopts a self-attention mechanism to find the relation information within the features, so that the aggregated global features can be weighted to suppress redundant information. Extensive experiments show that the MRM gives the network stronger representational ability, and that the gating module makes the final global feature more discriminative and boosts retrieval performance. The proposed network achieves 93.8% and 95.0% classification accuracy, as well as 90.5% and 93.4% mean average retrieval precision, on two standard 3D shape recognition datasets (ModelNet40 and ModelNet10), respectively, outperforming existing works.
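The pipeline the abstract describes (pairwise point-view relations, cascade pooling, and a gating step) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the abstract does not specify the relation function, the exact composition of the cascade pooling, or the gating formula, so the element-wise-product relation, the max+GeM concatenation, and the sigmoid gate below are all assumptions.

```python
import numpy as np

def gem_pool(x, p=3.0):
    """Generalized-mean (GeM) pooling over axis 0; x is assumed non-negative."""
    return np.power(np.mean(np.power(x, p), axis=0), 1.0 / p)

def multimodal_relation(point_feats, view_feats, p=3.0):
    """Sketch of one multimodal relation module (MRM) with cascade pooling.

    point_feats: (N, D) local point-cloud features
    view_feats:  (M, D) local multi-view features

    Assumptions (not given in the abstract): the relation between a point
    feature and a view feature is modeled as their element-wise product, and
    'cascade pooling' as concatenating max-pooled and GeM-pooled vectors.
    """
    # Relation tensor over all point/view pairs: shape (N, M, D)
    rel = point_feats[:, None, :] * view_feats[None, :, :]
    rel = np.maximum(rel, 0.0)                 # non-negativity so GeM is well-defined
    flat = rel.reshape(-1, rel.shape[-1])      # flatten pairs: (N*M, D)
    # Cascade pooling: aggregate the relation tensor into a global feature (2D,)
    global_rel = np.concatenate([flat.max(axis=0), gem_pool(flat, p)])
    # Gating (hypothetical form): self-attention-style sigmoid weights that
    # down-weight redundant channels of the aggregated global feature.
    gate = 1.0 / (1.0 + np.exp(-global_rel))
    return gate * global_rel
```

The symmetric view-point module would swap the roles of the two inputs; the two gated global features would then be fused for classification or retrieval.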