Multi-modal Self-supervised Point Cloud Representation Learning Based on Bidirectional Fit Mask Reconstruction

doi:10.13328/j.cnki.jos.007187

Author:

CHENG Hao-Zhe (程浩喆), ZHU Ji-Hua (祝继华), SHI Peng-Cheng (史鹏程), HU Nai-Wen (胡乃文), XIE Yi-Fan (谢奕凡), LI Shi-Qi (李仕奇)

Affiliation:

School of Software Engineering, Xi’an Jiaotong University, Xi’an 710049, China

CLC Number: TP18

Fund Project: Key Research and Development Program of Shaanxi Province (2021GY-025, 2021GXLHZ-097)

Abstract:

Self-supervised point cloud representation learning pre-trains on unlabeled data to explore the structural relationships of 3D topological and geometric space and to capture feature representations that transfer to downstream tasks such as point cloud classification, segmentation, and object detection. To improve the generalization and robustness of pre-trained models, this study proposes a multi-modal self-supervised point cloud representation learning method based on bidirectional fit mask reconstruction. The method comprises three components: (1) A “bad teacher” model, guided by the inverse density scale, applies a bidirectional fit strategy built on an inverse density noise representation and a global feature representation to speed up the convergence of masked regions toward the ground truth. (2) A StyleGAN-based auxiliary point cloud generation model uses local geometric information to generate stylized point clouds and fuses them with the mask reconstruction results under a threshold constraint, in order to counter the adverse effect of reconstruction noise on representation learning. (3) A multi-modal teacher model, which aims to increase the diversity of the 3D feature space and prevent the collapse of modal information, relies on a triple feature contrastive loss to extract the latent information contained in the point cloud-image-text sample space. The proposed method is evaluated through fine-tuning on three point cloud datasets: ModelNet, ScanObjectNN, and ShapeNet. Experimental results show that the pre-trained model achieves state-of-the-art performance on point cloud recognition tasks including point cloud classification, linear support vector machine classification, few-shot classification, zero-shot classification, and part segmentation.
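Component (3) of the abstract relies on a three-way contrastive loss over point cloud, image, and text features, but this page does not give its formula. The sketch below is therefore only an illustration under stated assumptions: it uses a generic InfoNCE-style loss averaged over the three modality pairs, and the function names, the temperature value, and the equal pair weighting are all hypothetical rather than taken from the paper.

```python
import math

def l2_normalize(v):
    """Scale a vector to unit length (a zero vector is returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]

def info_nce(a, b, temperature=0.07):
    """Generic InfoNCE loss: row i of `a` should match row i of `b`.

    Assumption: cosine similarity with a fixed temperature; the paper's
    actual loss may differ.
    """
    a = [l2_normalize(v) for v in a]
    b = [l2_normalize(v) for v in b]
    loss = 0.0
    for i, ai in enumerate(a):
        logits = [sum(x * y for x, y in zip(ai, bj)) / temperature for bj in b]
        m = max(logits)  # subtract the max for a numerically stable log-sum-exp
        log_denom = m + math.log(sum(math.exp(z - m) for z in logits))
        loss += log_denom - logits[i]
    return loss / len(a)

def triple_contrastive_loss(points, images, texts, temperature=0.07):
    """Hypothetical three-way loss: mean InfoNCE over the three modality pairs."""
    return (info_nce(points, images, temperature)
            + info_nce(points, texts, temperature)
            + info_nce(images, texts, temperature)) / 3.0

# Toy embeddings: two samples, one row per sample in each modality.
aligned = [[1.0, 0.0], [0.0, 1.0]]
swapped = [[0.0, 1.0], [1.0, 0.0]]
loss_aligned = triple_contrastive_loss(aligned, aligned, aligned)
loss_mismatched = triple_contrastive_loss(aligned, swapped, aligned)
```

In this toy setup the loss is close to zero when the three modalities agree sample by sample and grows when one modality is mismatched, which is the behaviour a contrastive objective of this kind is meant to reward.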

Key words: 3D point cloud; self-supervised representation learning; multi-modal feature; density scale; generative adversarial network (GAN)
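The key word "density scale" and the abstract's "inverse density scale" are not defined on this page. As a rough illustration only, the sketch below scores each point by the mean distance to its k nearest neighbours, so that points in sparse regions receive larger values. The function name, the choice of k, and the normalisation are assumptions made for this example and may not match the paper's definition.

```python
import math

def inverse_density_scores(points, k=3):
    """Return one score per point, larger where the cloud is sparser.

    Assumption: inverse density is approximated by the mean Euclidean
    distance to the k nearest neighbours, normalised to sum to 1.
    Requires at least two points.
    """
    scores = []
    for i, p in enumerate(points):
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        nearest = dists[:k]
        scores.append(sum(nearest) / len(nearest))
    total = sum(scores) or 1.0  # guard against all points coinciding
    return [s / total for s in scores]

# A tight cluster near the origin plus one isolated point.
cloud = [(0.0, 0.0, 0.0), (0.1, 0.0, 0.0), (0.0, 0.1, 0.0),
         (0.0, 0.0, 0.1), (5.0, 5.0, 5.0)]
scores = inverse_density_scores(cloud, k=2)
sparsest_index = scores.index(max(scores))
```

A score of this kind could, for instance, be used to weight sampling or noise toward sparse regions. How the paper actually applies it within the "bad teacher" model is not stated here.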

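Cite this article

CHENG Hao-Zhe, ZHU Ji-Hua, SHI Peng-Cheng, HU Nai-Wen, XIE Yi-Fan, LI Shi-Qi. Multi-modal Self-supervised Point Cloud Representation Learning Based on Bidirectional Fit Mask Reconstruction. Journal of Software (软件学报): 1-20.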

History

Received: 2023-11-02
Revised: 2024-03-15
Published online: 2024-09-11
