[Keywords]
[Abstract]
Because challenges such as severe occlusion and drastic deformation have long coexisted, accurate and robust video segmentation has become one of the hot topics in computer vision. This study proposes a video segmentation method that combines absorbing Markov chains with skeleton mapping. It progressively produces accurate object contours through three stages: pre-segmentation, post-optimization, and refinement. In the pre-segmentation stage, the method uses a Siamese network and a region proposal network to obtain regions of interest for the object. It then builds absorbing Markov chains over the superpixels in these regions and computes foreground/background labels for the superpixels. Absorbing Markov chains perceive and propagate object features flexibly and effectively, so the target object can be preliminarily segmented from complex scenes. In the post-optimization stage, short-term and long-term spatio-temporal cue models are designed to capture the short-term variation patterns and the long-term stable features of the object. These cues refine the superpixel labels and reduce errors caused by similar objects and noise. In the refinement stage, the goal is to reduce jagged edges and discontinuities in the optimized results. To this end, the study proposes an algorithm that automatically generates foreground and background skeletons from superpixel labels and positions. It also constructs an encoder-decoder skeleton mapping network that learns pixel-level object contours and yields the final, accurate video segmentation results. Extensive experiments on standard datasets show that the proposed method outperforms existing mainstream video segmentation methods and produces segmentation results with higher region similarity and contour accuracy.
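The pre-segmentation stage labels superpixels with an absorbing Markov chain. The abstract does not specify the exact construction, so the following is only a minimal sketch of the general technique, not the authors' method. It rests on these assumptions:

- Each superpixel is described by a feature vector, such as its mean color.
- Adjacency edges between superpixels are given.
- Some superpixels are already known foreground or background seeds, and these act as absorbing states.
- Edge weights use a Gaussian affinity with a hypothetical bandwidth `sigma`.
- A superpixel is labeled foreground when its probability of being absorbed by a foreground seed exceeds 0.5.

The function name `absorbing_labels` and all of its parameters are hypothetical. The absorption probabilities satisfy b = Qb + R, the fixed point of the fundamental-matrix solution (I - Q)^-1 R. The sketch reaches that fixed point with Gauss-Seidel sweeps rather than an explicit matrix inverse.

```python
import math


def absorbing_labels(features, edges, fg_seeds, bg_seeds,
                     sigma=0.1, tol=1e-10, max_iter=10000):
    """Hypothetical sketch: label superpixels via absorbing Markov chain probabilities.

    features: list of feature vectors (tuples of floats), one per superpixel.
    edges: iterable of (i, j) index pairs of adjacent superpixels (undirected).
    fg_seeds / bg_seeds: indices assumed to be known foreground / background;
        they are the absorbing states of the chain.
    Returns (prob, labels): prob[i] is the probability that a random walk
    starting at superpixel i is absorbed by a foreground seed, and labels[i]
    is 1 (foreground) if prob[i] > 0.5, else 0 (background).
    """
    n = len(features)
    absorbing = set(fg_seeds) | set(bg_seeds)

    # Symmetric Gaussian affinity between adjacent superpixels (assumed form).
    weights = [dict() for _ in range(n)]
    for i, j in edges:
        d2 = sum((a - b) ** 2 for a, b in zip(features[i], features[j]))
        w = math.exp(-d2 / sigma ** 2)
        weights[i][j] = w
        weights[j][i] = w

    # Absorbing states keep fixed values: 1 for foreground, 0 for background.
    prob = [0.0] * n
    for s in fg_seeds:
        prob[s] = 1.0

    # Transient states satisfy p_i = sum_j P_ij * p_j with P_ij = w_ij / sum_k w_ik.
    # In-place (Gauss-Seidel) sweeps converge to the absorption probabilities.
    transient = [i for i in range(n) if i not in absorbing]
    for _ in range(max_iter):
        delta = 0.0
        for i in transient:
            total = sum(weights[i].values())
            if total == 0.0:
                continue  # isolated superpixel: no path to any seed, stays 0
            new = sum(w * prob[j] for j, w in weights[i].items()) / total
            delta = max(delta, abs(new - prob[i]))
            prob[i] = new
        if delta < tol:
            break

    labels = [1 if p > 0.5 else 0 for p in prob]
    return prob, labels


if __name__ == "__main__":
    # Five superpixels in a row with a 1-D "color" feature; 0 is a foreground
    # seed, 4 is a background seed.
    feats = [(0.9,), (0.85,), (0.8,), (0.2,), (0.1,)]
    prob, labels = absorbing_labels(feats, [(0, 1), (1, 2), (2, 3), (3, 4)],
                                    fg_seeds=[0], bg_seeds=[4])
    print(labels)  # [1, 1, 1, 0, 0]
```

In this toy chain, superpixels 1 and 2 resemble the foreground seed, so walks starting from them are almost always absorbed there. Superpixel 3 is strongly tied to the background seed. The large feature gap between superpixels 2 and 3 acts as a weak link, which is how similarity-weighted absorption propagates labels along coherent regions.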
[CLC number]
TP391
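[Funding]
National Natural Science Foundation of China (62072015, 61772209); Science and Technology Planning Project of Guangdong Province (2019A050510034); Guangzhou Key Laboratory of Smart Agriculture (201902010081); Key Research and Development Program of Guangzhou (202206010091)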