基于双向拟合掩码重建的多模态自监督点云表示学习
CSTR:
作者:
作者单位:

作者简介:

通讯作者:

中图分类号:

TP18

基金项目:

陕西省重点研发项目(2021GY-025, 2021GXLHZ-097)


Multi-modal Self-supervised Point Cloud Representation Learning Based on Bidirectional Fit Mask Reconstruction
Author:
Affiliation:

Fund Project:

  • 摘要
  • |
  • 图/表
  • |
  • 访问统计
  • |
  • 参考文献
  • |
  • 相似文献
  • |
  • 引证文献
  • |
  • 资源附件
  • |
  • 文章评论
    摘要:

    点云自监督表示学习以无标签预训练的方式, 探索三维拓扑几何空间结构关系并捕获特征表示, 可应用至点云分类、分割以及物体探测等下游任务. 为提升预训练模型的泛化性和鲁棒性, 提出基于双向拟合掩码重建的多模态自监督点云表示学习方法, 主要由3部分构成: (1) 逆密度尺度指导下的“坏教师”模型通过基于逆密度噪声表示和全局特征表示的双向拟合策略, 加速掩码区域逼近真值. (2) 基于StyleGAN的辅助点云生成模型以局部几何信息为基础, 生成风格化点云并与掩码重建结果在阈值约束下融合, 旨在抵抗重建过程噪声对表示学习的不良影响. (3) 多模态教师模型以增强三维特征空间多样性及防止模态信息崩溃为目标, 依靠三重特征对比损失函数, 充分汲取点云-图像-文本样本空间中所蕴含的潜层信息. 所提出的方法在ModelNet、ScanObjectNN和ShapeNet这3种点云数据集上进行微调任务测试. 实验结果表明, 预训练模型在点云分类、线性支持向量机分类、小样本分类、零样本分类以及部件分割等点云识别任务上的效果达到领先水平.

    Abstract:

    Point cloud self-supervised representation learning is conducted in an unlabeled pre-training manner, exploring the structural relationships of 3D topological geometric spaces and capturing feature representations. This approach can be applied to downstream tasks, such as point cloud classification, segmentation, and object detection. To enhance the generalization and robustness of the pretrained models, this study proposes a multi-modal self-supervised method for learning point cloud representations. The method is based on bidirectional fit mask reconstruction and comprises three main components: (1) The “bad teacher” model, guided by the inverse density scale, employs a bidirectional fit strategy that utilizes inverse density noise representation and global feature representation to expedite the convergence of the mask region towards the true value. (2) The StyleGAN-based auxiliary point cloud generation model, grounded in local geometric information, generates stylized point clouds and fuses them with mask reconstruction results while adhering to threshold constraints. The objective is to mitigate the adverse effects of noise on representation learning during the reconstruction process. (3) The multi-modal teacher model aims to enhance the diversity of the 3D feature space and prevent the collapse of modal information. It relies on the triple feature contrast loss function to fully extract the latent information contained in the point cloud-image-text sample space. The proposed method is evaluated on ModelNet, ScanObjectNN, and ShapeNet datasets for fine-tuning tasks. Experimental results demonstrate that the pretrained model achieves state-of-the-art performance in various point cloud recognition tasks, including point cloud classification, linear support vector machine classification, few-shot classification, zero-shot classification, and part segmentation.

    参考文献
    相似文献
    引证文献
引用本文

程浩喆,祝继华,史鹏程,胡乃文,谢奕凡,李仕奇.基于双向拟合掩码重建的多模态自监督点云表示学习.软件学报,,():1-20

复制
分享
文章指标
  • 点击次数:
  • 下载次数:
  • HTML阅读次数:
  • 引用次数:
历史
  • 收稿日期:2023-11-02
  • 最后修改日期:2024-03-15
  • 录用日期:
  • 在线发布日期: 2024-09-11
  • 出版日期:
文章二维码
您是第位访问者
版权所有:中国科学院软件研究所 京ICP备05046678号-3
地址:北京市海淀区中关村南四街4号,邮政编码:100190
电话:010-62562563 传真:010-62562533 Email:jos@iscas.ac.cn
技术支持:北京勤云科技发展有限公司

京公网安备 11040202500063号