Non-contact Multi-channel Natural Interactive Surgical Environment under Sterile Conditions
Author: Tao Jianhua, Yang Minghao, Wang Zhiliang, Ban Xiaojuan, Xie Lun, Wang Yunhai, Zeng Qiong, Wang Fei, Wang Hongqian, Liu Bin, Han Zhishuai, Pan Hang, Chen Wenzheng
Affiliation:

Fund Project:

National Key Research & Development Plan of China (2016YFB1001404); National Natural Science Foundation of China (61873269, 61831022, 61425017, 61332017)

Abstract:

A sterile, non-contact environment is a basic requirement of the medical operating room, which is why the operating room and the computer room must be physically isolated. When the attending surgeon needs to view lesion images during an operation, he or she usually instructs a nurse or assistant to manipulate the images in the computer room. Because of this physical isolation, and because the assistant may not accurately understand the surgeon's intention, the nurse or assistant often has to move back and forth between the two rooms, which risks prolonging the operation and increasing blood loss and organ exposure time. Minimizing the time needed to locate lesion images during surgery therefore matters to both doctors and patients. To meet these requirements, a non-contact multi-channel natural interactive surgical environment under sterile conditions is constructed by means of human skeleton extraction, gesture tracking and understanding, far-field speech recognition in the operating-room environment, and multi-modal information processing and fusion. This environment allows the attending surgeon to quickly locate the lesion images to be observed during surgery through voice commands, gestures, and their combination. In an experimental environment close to the real one, the proposed system significantly reduces lesion-image localization time while maintaining accuracy. The intelligent interactive operating room in a sterile environment provides technical and methodological validation for the next generation of efficient surgery.
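The abstract describes the core interaction as a fusion of voice commands and gestures for navigating lesion images. The paper's actual fusion method is not given on this page, so the Python sketch below is only a minimal illustration of one common approach, timestamp-based late fusion of two recognition channels; the event types, command names, confidence thresholds, and the 1.5-second pairing window are all assumptions made for illustration, not the authors' design.

```python
import time
from dataclasses import dataclass
from typing import Optional

# Hypothetical channel outputs; the paper's real event formats are not public.
@dataclass
class VoiceEvent:
    command: str       # e.g. "zoom_in", "next_image" (assumed vocabulary)
    confidence: float  # ASR confidence in [0, 1]
    timestamp: float   # seconds

@dataclass
class GestureEvent:
    action: str        # e.g. "swipe_left", "point" (assumed vocabulary)
    confidence: float  # gesture-recognizer confidence in [0, 1]
    timestamp: float

FUSION_WINDOW_S = 1.5  # assumed window for pairing the two channels

def fuse(voice: Optional[VoiceEvent],
         gesture: Optional[GestureEvent]) -> Optional[str]:
    """Late fusion: pair voice and gesture events that fall inside the same
    time window; otherwise fall back to the more confident single channel."""
    if voice and gesture and abs(voice.timestamp - gesture.timestamp) <= FUSION_WINDOW_S:
        # A complementary pair such as "zoom_in" + "point" yields a richer action.
        if voice.command == "zoom_in" and gesture.action == "point":
            return "zoom_in_at_pointed_region"
        # Otherwise trust the channel with the higher confidence.
        return voice.command if voice.confidence >= gesture.confidence else gesture.action
    if voice and voice.confidence > 0.8:
        return voice.command
    if gesture and gesture.confidence > 0.8:
        return gesture.action
    return None  # neither channel is reliable; ask the surgeon to repeat

# Example: a spoken "zoom_in" arriving 0.4 s after a pointing gesture.
now = time.time()
print(fuse(VoiceEvent("zoom_in", 0.9, now + 0.4),
           GestureEvent("point", 0.85, now)))
```

In this sketch a complementary voice-gesture pair within the window resolves to a single combined action, while an isolated high-confidence event from either channel is accepted on its own; a real system would additionally handle conflicting commands and recognizer latency.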

Get Citation

Tao JH, Yang MH, Wang ZL, Ban XJ, Xie L, Wang YH, Zeng Q, Wang F, Wang HQ, Liu B, Han ZS, Pan H, Chen WZ. Non-contact multi-channel natural interactive surgical environment under sterile conditions. Ruan Jian Xue Bao/Journal of Software, 2019,30(10):2986-3004 (in Chinese with English abstract).
Article Metrics
  • Abstract: 3527
  • PDF: 8334
  • HTML: 3273
  • Cited by: 0
History
  • Received: August 18, 2018
  • Revised: November 01, 2018
  • Online: May 16, 2019