Semi-supervised Spatiotemporal Transformer Networks for Semantic Segmentation of Surgical Instrument

doi:10.13328/j.cnki.jos.006469

Volume 33, Issue 4, 2022, pp. 1501-1515




Abstract:

With the increasingly wide application of surgical robots in clinical practice, providing doctors with precise semantic segmentation of surgical instruments in endoscopic video is of great significance for improving clinicians' operating accuracy and patients' prognosis. Training surgical instrument segmentation models requires a large number of accurately labeled video frames, and the high cost of labeling video data limits the application of deep learning to this task. Current semi-supervised methods enhance the temporal information and data diversity of sparsely labeled videos by predicting and interpolating frames, which can improve segmentation accuracy with limited labeled data. However, these methods are limited in frame interpolation quality and in temporal feature extraction from sequential frames. To tackle these issues, this study proposes a semi-supervised segmentation framework with a spatiotemporal Transformer, which improves the temporal consistency and data diversity of sparsely labeled video datasets by interpolating frames with high accuracy and generating pseudo-labels. The Transformer module is integrated at the bottleneck of the segmentation network to analyze global contextual information from both temporal and spatial perspectives. This enhances high-level semantic features and improves the network's perception of complex environments, which helps it overcome various types of distractions in surgical videos and thus improves segmentation performance. Using only 30% labeled data, the proposed framework achieves an average Dice of 82.42% and an average IoU of 72.01% on the MICCAI 2017 Surgical Instrument Segmentation Challenge dataset, exceeding the state-of-the-art method by 7.68% and 8.19%, respectively, and outperforming fully supervised methods.
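The two metrics reported in the abstract can be illustrated with a minimal sketch. It uses the standard definitions of the Dice coefficient, 2|A∩B| / (|A| + |B|), and intersection over union, |A∩B| / |A∪B|, on binary masks. The function name `dice_and_iou`, the flat-list mask format, and the empty-mask convention are assumptions made for illustration; the abstract does not describe the paper's actual evaluation protocol (for example, how scores are averaged over frames or classes).

```python
def dice_and_iou(pred, target):
    """Return (Dice, IoU) for two equally sized binary masks given as flat 0/1 sequences."""
    if len(pred) != len(target):
        raise ValueError("masks must have the same number of pixels")
    # Pixels marked as instrument in both masks.
    intersection = sum(1 for p, t in zip(pred, target) if p and t)
    pred_area = sum(1 for p in pred if p)
    target_area = sum(1 for t in target if t)
    union = pred_area + target_area - intersection
    if union == 0:
        # Both masks are empty: treated as perfect agreement (a convention assumed here).
        return 1.0, 1.0
    dice = 2 * intersection / (pred_area + target_area)
    iou = intersection / union
    return dice, iou


# Toy 2x4 masks flattened row by row (made-up values, not data from the paper).
pred = [1, 1, 0, 0, 1, 0, 0, 0]
target = [1, 1, 1, 0, 0, 0, 0, 0]
dice, iou = dice_and_iou(pred, target)
print(f"Dice = {dice:.4f}, IoU = {iou:.4f}")  # Dice = 0.6667, IoU = 0.5000
```

For a single pair of masks the two scores are tied by Dice = 2·IoU / (1 + IoU), so Dice is never lower than IoU. Averages taken over many frames need not satisfy this identity exactly.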


Citation:

李耀仟, 李才子, 刘瑞强, 司伟鑫, 金玥明, 王平安. Semi-supervised Spatiotemporal Transformer Networks for Semantic Segmentation of Surgical Instrument. Journal of Software (软件学报), 2022, 33(4): 1501-1515


History

Received: May 10, 2021
Revised: July 16, 2021
Online: October 26, 2021
Published: April 06, 2022

