3D Shape Recognition Based on Multimodal Relation Modeling

Corresponding author: Zhu Yingying (朱映映), E-mail: zhuyy@szu.edu.cn

CLC number: TP391

Fund Project: National Natural Science Foundation of China (62072318); Natural Science Foundation of Guangdong Province (2021A1515012014); Shenzhen Science and Technology R&D Fund, Key Project Category B (20220810142553001)

    Abstract:

    To exploit the local spatial relations between point cloud and multi-view data and thereby improve 3D shape recognition accuracy, a 3D shape recognition network based on multimodal relations is proposed. First, a multimodal relation module (MRM) is designed; it extracts the relation information between any local feature of the point cloud and any local feature of the multi-view data to obtain the corresponding relation feature. A cascade pooling consisting of max pooling and generalized mean pooling then processes the relation feature tensor to produce a global relation feature. The MRM comes in two types, which output the point-view relation feature and the view-point relation feature, respectively. In addition, a gating module uses a self-attention mechanism to discover correlations within the features and weights the aggregated global feature accordingly, suppressing redundant information. Extensive experiments show that the MRM gives the network stronger representational ability, and that the gating module makes the final global feature more discriminative and improves retrieval performance. On the standard 3D shape recognition datasets ModelNet40 and ModelNet10, the proposed network achieves classification accuracies of 93.8% and 95.0% and average retrieval precisions of 90.5% and 93.4%, respectively, which is at an advanced level among comparable works.
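
    The cascade pooling step can be illustrated with a minimal sketch. It is not the paper's implementation: the abstract does not say how the MRM computes a relation feature or which axis each pooling stage runs over. The sketch assumes an element-wise product as a stand-in for the learned relation, max pooling over views, and then generalized mean (GeM) pooling over points. The names `relation_tensor`, `max_pool`, `gem_pool`, and `cascade_pool` are hypothetical.

```python
def relation_tensor(point_feats, view_feats):
    """One relation vector per (point, view) pair.

    Assumption: an element-wise product stands in for the learned MRM.
    """
    return [[[p * v for p, v in zip(pf, vf)] for vf in view_feats]
            for pf in point_feats]


def max_pool(vectors):
    """Channel-wise maximum over a list of equal-length vectors."""
    return [max(channel) for channel in zip(*vectors)]


def gem_pool(vectors, p=3.0, eps=1e-6):
    """Generalized mean pooling per channel: (mean(x^p))^(1/p).

    Inputs are clamped to at least eps so the power is well defined.
    p = 1 gives average pooling; large p approaches max pooling.
    """
    pooled = []
    for channel in zip(*vectors):
        mean_pow = sum(max(x, eps) ** p for x in channel) / len(channel)
        pooled.append(mean_pow ** (1.0 / p))
    return pooled


def cascade_pool(relations, p=3.0):
    """Assumed order: max-pool over views, then GeM-pool over points."""
    per_point = [max_pool(row) for row in relations]
    return gem_pool(per_point, p=p)


# Toy input: 2 local point features and 2 local view features, 2 channels each.
points = [[1.0, 2.0], [3.0, 0.5]]
views = [[2.0, 1.0], [0.5, 4.0]]
global_relation = cascade_pool(relation_tensor(points, views), p=3.0)
```

    With `p=1.0` the second stage reduces to a plain average of the per-point maxima, so the exponent `p` controls how strongly the pooled feature is dominated by the largest responses.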

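Cite this article

Chen Haonan, Zhu Yingying, Zhao Junqi, Tian Qi. 3D Shape Recognition Based on Multimodal Relation Modeling. Journal of Software (软件学报), 2024, 35(5): 1-12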

History
  • Received: 2023-04-10
  • Revised: 2023-06-08
  • Published online: 2023-09-11