Survey on Semantic Scene Completion Based on RGB-D Images
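Authors: Zhang Kang (张康), An Bozhou (安泊舟), Li Jie (李捷), Yuan Xia (袁夏), Zhao Chunxia (赵春霞)
Author biographies:

Zhang Kang (1993-), male, Ph.D. candidate. Research interests: deep learning, image processing, scene completion.
An Bozhou (1994-), male, Ph.D. candidate. Research interests: deep learning, semantic scene completion.
Li Jie (1988-), male, Ph.D. Research interests: computer vision, scene understanding in robot vision, semantic segmentation, keypoint detection, point clouds.
Yuan Xia (1981-), male, Ph.D., associate professor. Research interests: environment understanding and autonomous navigation for intelligent robots, visual saliency analysis, semantic scene segmentation.
Zhao Chunxia (1964-), female, Ph.D., professor, doctoral supervisor. Research interests: pattern recognition and computer vision, artificial intelligence, mobile robots.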

Corresponding author:

Yuan Xia, E-mail: yuanxia@njust.edu.cn

Funding:

National Natural Science Foundation of China (61773210)


Survey on Semantic Scene Completion Based on RGB-D Images
    Abstract:

    With the continuing development of computer vision, semantic segmentation and shape completion of 3D scenes have attracted growing attention from academia and industry. Semantic scene completion is an emerging research topic in this area: it aims to predict the spatial layout and the semantic labels of a 3D scene simultaneously, and it has developed rapidly in recent years. This survey classifies and summarizes the RGB-D image-based methods proposed in the field. Depending on whether deep learning is used, the methods are divided into traditional methods and deep learning-based methods. The deep learning-based methods are further divided by input data type into methods based on a single depth image and methods that combine color and depth images. Building on this classification and overview, the survey collates the datasets used for the semantic scene completion task and analyzes the experimental results of existing methods. Finally, it summarizes the challenges facing the field and its development prospects.

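Cite this article

Zhang Kang, An Bozhou, Li Jie, Yuan Xia, Zhao Chunxia. Survey on semantic scene completion based on RGB-D images. Journal of Software (软件学报), 2023, 34(1): 444-462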

History
  • Received: 2020-09-16
  • Last revised: 2021-02-21
  • Published online: 2021-10-20
  • Publication date: 2023-01-06