Abstract: In recent years, digital video capture equipment has been upgraded continuously. While the improved latitude and shutter rates of modern image sensors have greatly expanded the range of scenes that can be recorded, degradation factors such as rain streaks, produced by raindrops passing through the field of view at high speed, are now also more readily captured. Dense rain streaks in the foreground occlude useful information in the background scene and hinder effective image acquisition, so video deraining has become an urgent problem. Previous video deraining methods rely solely on the information contained in conventional frames. However, owing to the physical limits of conventional camera sensors and the constraints of the shutter mechanism, much optical information is lost during video acquisition, which limits subsequent deraining performance. Exploiting the complementarity between event data and conventional video, together with the high dynamic range and high temporal resolution of event information, this study proposes a video deraining network based on event-data fusion, spatial attention, and temporal memory. The sparse event stream is converted through three-dimensional alignment into a representation that matches the image size and is stacked with the frame as input to an event-image fusion module equipped with a spatial attention mechanism, which effectively extracts the spatial information of the image. For consecutive-frame processing, an inter-frame memory module reuses features from the previous frame, and the network is finally constrained by three-dimensional convolution and two loss functions. The proposed method is effective on a publicly available dataset and meets the requirements of real-time video processing.
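To make the two core ideas named in the abstract concrete, the following is a minimal sketch (not the authors' code) of how a sparse event stream might be aligned to the image plane and then fused with the frame through a spatial-attention gate. The function and class names (voxelize_events, EventImageFusion), tensor shapes, and layer widths are assumptions for illustration only.

```python
import torch
import torch.nn as nn


def voxelize_events(events, num_bins, height, width):
    """Accumulate events (x, y, t, polarity) into a (num_bins, H, W) grid
    so the sparse stream matches the spatial size of the video frame."""
    voxel = torch.zeros(num_bins, height, width)
    if events.numel() == 0:
        return voxel
    x, y, t, p = events[:, 0].long(), events[:, 1].long(), events[:, 2], events[:, 3]
    # Normalize timestamps to [0, num_bins) and bucket each event into a temporal bin.
    t = (t - t.min()) / (t.max() - t.min() + 1e-9) * (num_bins - 1e-6)
    b = t.long().clamp(0, num_bins - 1)
    voxel.index_put_((b, y, x), p.float(), accumulate=True)
    return voxel


class EventImageFusion(nn.Module):
    """Fuse the voxelized events with the RGB frame via a spatial attention map."""

    def __init__(self, num_bins=5, channels=32):
        super().__init__()
        self.img_conv = nn.Conv2d(3, channels, 3, padding=1)
        self.evt_conv = nn.Conv2d(num_bins, channels, 3, padding=1)
        # The attention branch predicts a per-pixel gate from both modalities.
        self.attn = nn.Sequential(
            nn.Conv2d(2 * channels, channels, 3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, 1, 1),
            nn.Sigmoid(),
        )

    def forward(self, frame, voxel):
        f_img = self.img_conv(frame)
        f_evt = self.evt_conv(voxel)
        gate = self.attn(torch.cat([f_img, f_evt], dim=1))
        # Event features are weighted per pixel before being added to the image features.
        return f_img + gate * f_evt


if __name__ == "__main__":
    events = torch.tensor([[10, 20, 0.1, 1.0], [11, 20, 0.5, -1.0]])
    voxel = voxelize_events(events, num_bins=5, height=64, width=64)
    frame = torch.rand(1, 3, 64, 64)
    fused = EventImageFusion()(frame, voxel.unsqueeze(0))
    print(fused.shape)  # torch.Size([1, 32, 64, 64])
```

This sketch covers only the event alignment and fusion steps; the inter-frame memory module, three-dimensional convolution, and loss functions described in the abstract are not represented here.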