Multi-modal Self-supervised Point Cloud Representation Learning Based on Bidirectional Fit Mask Reconstruction

doi:10.13328/j.cnki.jos.007187


Author:
CHENG Hao-Zhe, ZHU Ji-Hua, SHI Peng-Cheng, HU Nai-Wen, XIE Yi-Fan, LI Shi-Qi
Affiliation:
School of Software Engineering, Xi’an Jiaotong University, Xi’an 710049, China
CLC Number: TP18


Abstract:

Point cloud self-supervised representation learning is conducted in an unlabeled pre-training manner, exploring the structural relationships of 3D topological geometric spaces and capturing feature representations. This approach can be applied to downstream tasks, such as point cloud classification, segmentation, and object detection. To enhance the generalization and robustness of the pretrained models, this study proposes a multi-modal self-supervised method for learning point cloud representations. The method is based on bidirectional fit mask reconstruction and comprises three main components: (1) The “bad teacher” model, guided by the inverse density scale, employs a bidirectional fit strategy that utilizes inverse density noise representation and global feature representation to expedite the convergence of the mask region towards the true value. (2) The StyleGAN-based auxiliary point cloud generation model, grounded in local geometric information, generates stylized point clouds and fuses them with mask reconstruction results while adhering to threshold constraints. The objective is to mitigate the adverse effects of noise on representation learning during the reconstruction process. (3) The multi-modal teacher model aims to enhance the diversity of the 3D feature space and prevent the collapse of modal information. It relies on the triple feature contrast loss function to fully extract the latent information contained in the point cloud-image-text sample space. The proposed method is evaluated on ModelNet, ScanObjectNN, and ShapeNet datasets for fine-tuning tasks. Experimental results demonstrate that the pretrained model achieves state-of-the-art performance in various point cloud recognition tasks, including point cloud classification, linear support vector machine classification, few-shot classification, zero-shot classification, and part segmentation.
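The abstract names a triple feature contrast loss over the point cloud-image-text sample space but does not give its form. As a rough illustration only, the sketch below assumes it can be approximated by summing a symmetric InfoNCE term over the three modality pairs. The function names, the temperature value, and the pairwise-sum form are all assumptions made for this example, not the authors' formulation.

```python
import math

def normalize(v):
    """Scale a vector to unit L2 norm (assumes v is not all zeros)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def info_nce(a, b, temperature=0.07):
    """One-directional InfoNCE: row i of `a` should match row i of `b`."""
    a = [normalize(v) for v in a]
    b = [normalize(v) for v in b]
    loss = 0.0
    for i, ai in enumerate(a):
        # Cosine similarity of sample i against every candidate in b.
        logits = [sum(x * y for x, y in zip(ai, bj)) / temperature for bj in b]
        # Numerically stable log-sum-exp.
        m = max(logits)
        log_sum_exp = m + math.log(sum(math.exp(l - m) for l in logits))
        loss += log_sum_exp - logits[i]
    return loss / len(a)

def symmetric_info_nce(a, b, temperature=0.07):
    """Average of the a-to-b and b-to-a InfoNCE terms."""
    return 0.5 * (info_nce(a, b, temperature) + info_nce(b, a, temperature))

def triple_contrast_loss(points, images, texts, temperature=0.07):
    """Hypothetical tri-modal loss: sum of the three pairwise symmetric terms."""
    return (symmetric_info_nce(points, images, temperature)
            + symmetric_info_nce(points, texts, temperature)
            + symmetric_info_nce(images, texts, temperature))
```

Each argument is a batch of embedding vectors in which row i of every modality describes the same object. Under that assumption the loss is small when matching rows are closest to each other across all three modalities and grows when any one modality is misaligned.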

Key words: 3D point cloud; self-supervised representation learning; multi-modal feature; density scale; generative adversarial network (GAN)

Get Citation

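CHENG Hao-Zhe, ZHU Ji-Hua, SHI Peng-Cheng, HU Nai-Wen, XIE Yi-Fan, LI Shi-Qi. Multi-modal Self-supervised Point Cloud Representation Learning Based on Bidirectional Fit Mask Reconstruction. Journal of Software, , (): 1-20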


History

Received: November 02, 2023
Revised: March 15, 2024
Online: September 11, 2024
