Abstract: The analysis of unknown forms is a challenging and important problem in document processing. Current methods can only tolerate small breaks in form lines. This paper proposes a strategy for analyzing filled forms of unknown structure, based on extracted lines. Individual edges are validated using the features of the extracted lines and of their local neighborhood. While the horizontal and vertical lines are scanned, candidate edges are validated, and a rectangle is generated if its surrounding edges and their combination are all valid. To preserve the constraints and make full use of global information, the process is applied recursively. The rectangle extraction can tolerate large breaks in form lines, ignore irrelevant segments, and deal with complex configurations such as embedded rectangles. After rectangle extraction, the other form components are extracted by searching the remaining segments. Experiments on a collection of forms with handwritten fields and on documents with tables show that the proposed approach works well even on poor-quality images.
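The idea of validating edges from broken line segments and then assembling rectangles can be illustrated with a minimal sketch. It is not the method of the paper: the coverage-ratio test, the `min_cover` threshold, the dictionary layout of the lines, and the function names are all assumptions made for illustration, and the recursive step and the handling of the remaining segments are omitted.

```python
from itertools import combinations

def coverage(segments, lo, hi):
    """Fraction of the span [lo, hi] covered by the union of 1-D segments."""
    if hi <= lo:
        return 0.0
    covered, cursor = 0.0, lo
    for a, b in sorted(segments):
        a, b = max(a, cursor), min(b, hi)  # clip to the uncovered part of the span
        if b > a:
            covered += b - a
            cursor = b
    return covered / (hi - lo)

def edge_valid(segments, lo, hi, min_cover=0.6):
    """Hypothetical rule: an edge is valid if enough of it is inked, breaks allowed."""
    return coverage(segments, lo, hi) >= min_cover

def find_rectangles(h_lines, v_lines, min_cover=0.6):
    """h_lines maps y -> [(x1, x2), ...]; v_lines maps x -> [(y1, y2), ...].

    Returns (x1, y1, x2, y2) for every pair of horizontal lines and pair of
    vertical lines whose four connecting edges are all valid.
    """
    rects = []
    for y1, y2 in combinations(sorted(h_lines), 2):
        for x1, x2 in combinations(sorted(v_lines), 2):
            edges = [
                (h_lines[y1], x1, x2),  # top edge
                (h_lines[y2], x1, x2),  # bottom edge
                (v_lines[x1], y1, y2),  # left edge
                (v_lines[x2], y1, y2),  # right edge
            ]
            if all(edge_valid(s, lo, hi, min_cover) for s, lo, hi in edges):
                rects.append((x1, y1, x2, y2))
    return rects
```

In this sketch a break is tolerated as long as the inked fraction of an edge stays above `min_cover`, so a larger tolerance for breaks corresponds to a lower threshold. The function reports every valid rectangle, including ones composed of several cells, and makes no attempt to separate embedded rectangles from their enclosing ones.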