Semi-supervised Spatiotemporal Transformer Networks for Semantic Segmentation of Surgical Instruments

doi:10.13328/j.cnki.jos.006469

Journal issue: 2022, Vol. 33, No. 4, pp. 1501-1515. DOI: 10.13328/j.cnki.jos.006469

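Author biographies:
Li Yaoqian (李耀仟) (born 1997), male, master's student. Research interests: surgical instrument segmentation, artificial intelligence.
Si Weixin (司伟鑫) (born 1990), male, Ph.D., associate researcher, doctoral supervisor, CCF senior member. Research interests: medical image analysis, computer-assisted intervention.
Li Caizi (李才子) (born 1993), male, Ph.D. student. Research interests: medical image analysis, artificial intelligence.
Jin Yueming (金玥明) (born 1994), female, Ph.D. Research interests: robotic video perception, artificial intelligence.
Liu Ruiqiang (刘瑞强) (born 1997), male, master's student, CCF student member. Research interests: medical image processing, artificial intelligence.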
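Corresponding author: Si Weixin (司伟鑫), E-mail: wx.si@siat.ac.cn
Funding: Shenzhen Fundamental Research Program, Key Projects (JCYJ20200109110208764, JCYJ20200109110420626); National Natural Science Foundation of China (U1813204, 61802385); Natural Science Foundation of Guangdong Province (2021A1515012604)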

Semi-supervised Spatiotemporal Transformer Networks for Semantic Segmentation of Surgical Instrument


Abstract:

Endoscope-based minimally invasive surgical robots are increasingly used in clinical practice. Providing surgeons with precise segmentation of surgical instruments in endoscopic video helps improve operating accuracy and patient prognosis. Training a deep learning model for surgical instrument segmentation currently requires a large amount of accurately labeled intraoperative video, and the high cost of video annotation limits the application of deep learning to this task. Existing semi-supervised methods use frame prediction and frame interpolation to enrich the temporal information and data diversity of sparsely labeled videos, which improves segmentation accuracy when labeled data are limited. However, these methods fall short in frame interpolation quality and in extracting temporal features from consecutive frames. To address these problems, this study proposes a semi-supervised segmentation framework with a spatiotemporal Transformer. The framework improves the temporal consistency and data diversity of sparsely labeled video datasets through high-accuracy frame interpolation and pseudo-label generation. A Transformer module is placed at the bottleneck of the segmentation network, where its self-attention mechanism analyzes global contextual information from both the temporal and the spatial perspective. This enhances high-level semantic features and improves the network's perception of complex environments, helping it overcome the various kinds of interference in surgical videos and thus improve segmentation performance. Using only 30% of the labeled data, the proposed semi-supervised spatiotemporal Transformer network achieves an average Dice of 82.42% and an average IoU of 72.01% on the MICCAI 2017 Surgical Instrument Segmentation Challenge dataset, exceeding the state-of-the-art method by 7.68% and 8.19%, respectively, and outperforming fully supervised methods.
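The abstract reports results as average Dice and average IoU. As a reminder of what these two overlap metrics measure, here is a minimal sketch for a single pair of binary masks. It is not the authors' evaluation code: the function names `dice` and `iou`, the helper `_counts`, the nested-list mask representation, and the smoothing constant `eps` are all assumptions made for illustration, and the challenge's official protocol (per-frame averaging, handling of empty frames) may differ.

```python
from typing import Sequence, Tuple

# Hypothetical mask type: rows of 0/1 values, 1 = instrument pixel.
Mask = Sequence[Sequence[int]]


def _counts(pred: Mask, target: Mask) -> Tuple[int, int, int]:
    """Return (intersection, predicted-positive, target-positive) pixel counts."""
    inter = pred_pos = target_pos = 0
    for pred_row, target_row in zip(pred, target):
        for p, t in zip(pred_row, target_row):
            if p and t:
                inter += 1
            if p:
                pred_pos += 1
            if t:
                target_pos += 1
    return inter, pred_pos, target_pos


def dice(pred: Mask, target: Mask, eps: float = 1e-7) -> float:
    """Dice coefficient: 2|A ∩ B| / (|A| + |B|); eps avoids division by zero."""
    inter, pred_pos, target_pos = _counts(pred, target)
    return (2 * inter + eps) / (pred_pos + target_pos + eps)


def iou(pred: Mask, target: Mask, eps: float = 1e-7) -> float:
    """Intersection over union: |A ∩ B| / |A ∪ B|; eps avoids division by zero."""
    inter, pred_pos, target_pos = _counts(pred, target)
    union = pred_pos + target_pos - inter
    return (inter + eps) / (union + eps)
```

For one mask pair the two metrics are tied by Dice = 2·IoU / (1 + IoU), so Dice is never lower than IoU. That identity does not carry over to dataset averages, because the mean of per-frame Dice values is not a function of the mean of per-frame IoU values alone.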

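Cite this article

Li Yaoqian (李耀仟), Li Caizi (李才子), Liu Ruiqiang (刘瑞强), Si Weixin (司伟鑫), Jin Yueming (金玥明), Wang Ping'an (王平安). Semi-supervised Spatiotemporal Transformer Networks for Semantic Segmentation of Surgical Instrument. Journal of Software (软件学报), 2022, 33(4): 1501-1515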

History

Received: 2021-05-10
Last revised: 2021-07-16
Published online: 2021-10-26
Published: 2022-04-06
