Abstract: A sterile, non-contact environment is a basic requirement of a medical operating room, so the computer room must be physically isolated from the operating room. When the attending doctor needs to view images of the lesion during an operation, he or she usually instructs a nurse or assistant to manipulate the images in the computer room. Because the two rooms are isolated, and because the assistant may not accurately understand the attending doctor's intention, the nurse or surgical assistant often has to go back and forth between the operating room and the computer room many times. This increases the risk of prolonged operation time, increased blood loss, and prolonged organ exposure. Minimizing the time needed to locate the lesion image during an operation is therefore important for both doctors and patients. To meet this need, a non-contact, multi-channel natural interactive surgical environment under aseptic conditions is constructed using human skeleton extraction, gesture tracking and understanding, far-field speech recognition in the operating room environment, and multi-modal information processing and fusion technology. This environment allows the attending physician to quickly locate the lesion to be observed during surgery by combining voice commands and gestures with the interaction channels described above. In an experimental setting close to the real environment, the non-contact multi-channel natural interactive surgical environment established in this study significantly reduces the time needed to locate the lesion image while maintaining accuracy. The intelligent interactive operating room in an aseptic environment provides technical and methodological validation for the next generation of efficient surgery.