Non-Contact Multi-Channel Natural Interactive Surgical Environment under Sterile Conditions
Author biographies:

Tao Jianhua (陶建华, 1971-), male, born in Huai'an, Jiangsu; Ph.D., professor, doctoral supervisor, CCF Fellow; research interests: speech synthesis, speech recognition, affective computing, and human-machine dialogue. Wang Fei (王飞, 1982-), male, senior engineer; research interests: medical informatization, medical big data, and artificial intelligence. Yang Minghao (杨明浩, 1977-), male, Ph.D., associate research fellow, CCF professional member; research interests: human-machine fusion perception and decision-making, embodied cognitive computing theory and methods, and multi-channel interactive information processing. Wang Hongqian (王红迁, 1987-), male, engineer; research interests: big data architecture and development, artificial intelligence, and medical informatization. Wang Zhiliang (王志良, 1956-), male, Ph.D., professor, doctoral supervisor; research interests: artificial psychology theory and methods, robotics, and the Internet of Things. Liu Bin (刘斌, 1984-), male, Ph.D., associate research fellow, CCF professional member; research interests: speech processing and virtual audition. Ban Xiaojuan (班晓娟, 1970-), female, Ph.D., professor-level senior researcher, doctoral supervisor, CCF senior member; research interests: artificial intelligence, natural human-computer interaction, and 3D visualization. Han Zhishuai (韩志帅, 1993-), male, M.S.; research interests: human-computer interaction and computer vision. Xie Lun (解仑, 1968-), male, Ph.D., professor, doctoral supervisor, CCF professional member; research interests: affective computing and intelligent interaction, intelligent robotics, and industrial control system information security. Pan Hang (潘航, 1991-), male, M.S.; research interests: affective computing and pattern recognition. Wang Yunhai (汪云海, 1984-), male, Ph.D., professor, doctoral supervisor, CCF professional member; research interests: computer graphics and data visualization. Chen Wenzheng (陈文拯, 1992-), male, M.S.; research interests: computer vision. Zeng Qiong (曾琼, 1987-), female, postdoctoral researcher, CCF professional member; research interests: computer graphics and data visualization.

Corresponding author:

Yang Minghao (杨明浩), E-mail: mhyang@nlpr.ia.ac.cn

Fund projects:

National Key Research and Development Program of China (2016YFB1001404); National Natural Science Foundation of China (61873269, 61831022, 61425017, 61332017)




    Abstract:

A sterile, non-contact environment is a basic requirement of a medical operating room, which means the computer control room and the operating room must be physically isolated. During an operation, when the attending surgeon needs to view lesion images, he or she usually instructs a nurse or surgical assistant to manipulate those images in the computer control room. Because of the isolation between the two rooms, and because the assistant may not understand the surgeon's intention accurately, the nurse or assistant often has to shuttle between the operating room and the computer room several times. This prolongs the operation and increases the risks of greater blood loss and longer organ exposure, so minimizing the time needed to locate a lesion image during surgery is important for both doctors and patients. To meet this need, a non-contact multi-channel natural interactive surgical environment under sterile conditions is constructed by combining human skeleton extraction from depth images under occlusion, gesture tracking and understanding, far-field speech recognition in the operating-room environment, and multi-modal information processing and fusion. This environment allows the attending surgeon to quickly locate the lesion image to be observed during surgery through voice commands, gestures, or a combination of the two. In an experimental environment close to real conditions, the proposed system significantly reduces lesion-image localization time while maintaining accuracy. The sterile, intelligent interactive operating room provides technical and methodological validation for the next generation of efficient surgery.
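The abstract describes the speech-gesture combination only at a high level. As a minimal illustrative sketch, not the authors' implementation, the following Python fragment shows one common late-fusion rule for such a system: accept a command immediately when both channels agree within a short time window, otherwise fall back to the more confident single channel above a threshold. All identifiers, the command vocabularies, and the 0.8 threshold are assumptions invented for this example.

```python
# Minimal late-fusion sketch (illustrative only; all names are hypothetical,
# not the API of the system described in the paper).
from dataclasses import dataclass
from typing import Optional

@dataclass
class SpeechEvent:          # e.g. output of a far-field ASR front end
    text: str               # recognized command text
    confidence: float       # recognizer confidence in [0, 1]
    timestamp: float        # seconds

@dataclass
class GestureEvent:         # e.g. output of depth-image hand tracking
    kind: str               # "swipe_left", "swipe_right", ...
    confidence: float
    timestamp: float

# Toy mappings from unimodal observations to image-viewer commands.
SPEECH_COMMANDS = {"放大": "zoom_in", "缩小": "zoom_out", "下一张": "next_image"}
GESTURE_COMMANDS = {"swipe_left": "next_image", "swipe_right": "prev_image"}

def fuse(speech: Optional[SpeechEvent],
         gesture: Optional[GestureEvent],
         window: float = 1.0) -> Optional[str]:
    """Late fusion: prefer cross-channel agreement within a time window,
    otherwise accept the more confident single channel above a threshold."""
    s_cmd = SPEECH_COMMANDS.get(speech.text) if speech else None
    g_cmd = GESTURE_COMMANDS.get(gesture.kind) if gesture else None
    if s_cmd and g_cmd and abs(speech.timestamp - gesture.timestamp) <= window:
        if s_cmd == g_cmd:                 # channels agree: accept directly
            return s_cmd
        # disagreement: fall back to the more confident channel
        return s_cmd if speech.confidence >= gesture.confidence else g_cmd
    for cmd, ev in ((s_cmd, speech), (g_cmd, gesture)):
        if cmd and ev.confidence >= 0.8:   # single-channel acceptance threshold
            return cmd
    return None                            # ambiguous: ask the surgeon to repeat

# Example: speech ("下一张" = "next image") and a left swipe arrive almost
# simultaneously and agree, so the fused command is accepted.
print(fuse(SpeechEvent("下一张", 0.72, 10.2), GestureEvent("swipe_left", 0.65, 10.5)))
# -> next_image
```

In a real operating room, the acceptance threshold and fusion window would have to be tuned against the error rates of the far-field recognizer and the gesture tracker; the sketch only illustrates the control flow of combining the two channels.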

Cite this article:

Tao JH, Yang MH, Wang ZL, Ban XJ, Xie L, Wang YH, Zeng Q, Wang F, Wang HQ, Liu B, Han ZS, Pan H, Chen WZ. Non-contact multi-channel natural interactive surgical environment under sterile conditions. Journal of Software, 2019,30(10):2986-3004 (in Chinese with English abstract).

History
  • Received: 2018-08-18
  • Last revised: 2018-11-01
  • Published online: 2019-05-16