Non-contact Multi-channel Natural Interactive Surgical Environment under Sterile Conditions
Author: Tao Jianhua, Yang Minghao, Wang Zhiliang, Ban Xiaojuan, Xie Lun, Wang Yunhai, Zeng Qiong, Wang Fei, Wang Hongqian, Liu Bin, Han Zhishuai, Pan Hang, Chen Wenzheng
Affiliation:

Fund Project:

National Key Research & Development Plan of China (2016YFB1001404); National Natural Science Foundation of China (61873269, 61831022, 61425017, 61332017)

Abstract:

A sterile, non-contact environment is a basic requirement of the medical operating room, which is why the operating room and the computer room must be physically isolated. When the attending surgeon needs to view lesion images during an operation, he or she usually instructs a nurse or assistant to manipulate the images in the computer room. Because of this physical isolation, and because the assistant may not accurately understand the surgeon's intention, the nurse or assistant often has to move back and forth between the two rooms, which risks prolonging the operation and increasing blood loss and organ exposure time. Minimizing the time needed to locate lesion images during surgery therefore matters to both doctors and patients. To meet these requirements, a non-contact multi-channel natural interactive surgical environment under sterile conditions is constructed by means of human skeleton extraction, gesture tracking and understanding, far-field speech recognition in the operating-room environment, and multi-modal information processing and fusion. This environment allows the attending surgeon to quickly locate the lesion images to be observed during surgery through voice commands, gestures, and their combination. In an experimental environment close to the real one, the proposed system significantly reduces lesion-image localization time while maintaining accuracy. The intelligent interactive operating room in a sterile environment provides technical and methodological validation for the next generation of efficient surgery.
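The abstract describes the core interaction as a fusion of voice commands and gestures for navigating lesion images. The paper's actual fusion method is not given on this page, so the Python sketch below is only a minimal illustration of one common approach, timestamp-based late fusion of two recognition channels; the event types, command names, confidence thresholds, and the 1.5-second pairing window are all assumptions made for illustration, not the authors' design.

```python
import time
from dataclasses import dataclass
from typing import Optional

# Hypothetical channel outputs; the paper's real event formats are not public.
@dataclass
class VoiceEvent:
    command: str       # e.g. "zoom_in", "next_image" (assumed vocabulary)
    confidence: float  # ASR confidence in [0, 1]
    timestamp: float   # seconds

@dataclass
class GestureEvent:
    action: str        # e.g. "swipe_left", "point" (assumed vocabulary)
    confidence: float  # gesture-recognizer confidence in [0, 1]
    timestamp: float

FUSION_WINDOW_S = 1.5  # assumed window for pairing the two channels

def fuse(voice: Optional[VoiceEvent],
         gesture: Optional[GestureEvent]) -> Optional[str]:
    """Late fusion: pair voice and gesture events that fall inside the same
    time window; otherwise fall back to the more confident single channel."""
    if voice and gesture and abs(voice.timestamp - gesture.timestamp) <= FUSION_WINDOW_S:
        # A complementary pair such as "zoom_in" + "point" yields a richer action.
        if voice.command == "zoom_in" and gesture.action == "point":
            return "zoom_in_at_pointed_region"
        # Otherwise trust the channel with the higher confidence.
        return voice.command if voice.confidence >= gesture.confidence else gesture.action
    if voice and voice.confidence > 0.8:
        return voice.command
    if gesture and gesture.confidence > 0.8:
        return gesture.action
    return None  # neither channel is reliable; ask the surgeon to repeat

# Example: a spoken "zoom_in" arriving 0.4 s after a pointing gesture.
now = time.time()
print(fuse(VoiceEvent("zoom_in", 0.9, now + 0.4),
           GestureEvent("point", 0.85, now)))
```

In this sketch a complementary voice-gesture pair within the window resolves to a single combined action, while an isolated high-confidence event from either channel is accepted on its own; a real system would additionally handle conflicting commands and recognizer latency.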

Get Citation

Tao JH, Yang MH, Wang ZL, Ban XJ, Xie L, Wang YH, Zeng Q, Wang F, Wang HQ, Liu B, Han ZS, Pan H, Chen WZ. Non-contact multi-channel natural interactive surgical environment under sterile conditions. Ruan Jian Xue Bao/Journal of Software, 2019,30(10):2986-3004 (in Chinese with English abstract).
Article Metrics
  • Abstract: 3527
  • PDF: 8334
  • HTML: 3273
  • Cited by: 0
History
  • Received: August 18, 2018
  • Revised: November 01, 2018
  • Online: May 16, 2019