Abstract: Multi-view stereo (MVS) is widely used in fields such as autonomous driving, augmented reality, heritage conservation, and biomedicine. Deep learning-based MVS methods have been proposed to address the limitations of traditional MVS methods, such as poor performance in low-texture regions and incomplete reconstructions. This study reviews the pioneering work and current development of deep learning-based MVS methods. In particular, it focuses on methods that improve individual functional components and methods that improve the overall architecture, and it analyzes representative models. The study also describes widely used datasets and evaluation metrics and compares the performance of existing methods on these datasets. Finally, promising research directions for MVS are presented.