Research on Chinese/English Mixed Document Recognition
DOI:
Author:
Affiliation:

Clc Number:

Fund Project:

  • Article
  • |
  • Figures
  • |
  • Metrics
  • |
  • Reference
  • |
  • Related
  • |
  • Cited by
  • |
  • Materials
  • |
  • Comments
    Abstract:

    Currently, OCR (optical character recognition) classifiers are generally designed for one character set (or language). On the other hand, multilingual document increases drastically due to the globalization. Therefore, designing a document processing system with multilingual capability is very important. A general scheme is presented in this paper: two OCR techniques, a system, and a language classification. For embodying the scheme, a Chinese/English mixed document processing system is implemented. Three key problems are considered: the control of the system flow, the classification of Chinese/English regions, and the segmentation of English characters. Compared with old systems presented in other papers, the module of the classification of Chinese/English regions is added in the system, and a novel approach based on the equidistance is applied to the module. To verify the effectiveness of the system, another system is implemented according to the methods presented in other papers. Experiment shows, the new system is more effective than the old system. The recognition rate increases from 98.48% to 99.13% on magazine samples and from 98.68% to 99.25% on book samples, respectively.

    Reference
    Related
    Cited by
Get Citation

王恺,王庆人.中英文混合文章识别问题.软件学报,2005,16(5):786-798

Copy
Share
Article Metrics
  • Abstract:
  • PDF:
  • HTML:
  • Cited by:
History
  • Received:March 03,2004
  • Revised:May 08,2004
  • Adopted:
  • Online:
  • Published:
You are the firstVisitors
Copyright: Institute of Software, Chinese Academy of Sciences Beijing ICP No. 05046678-4
Address:4# South Fourth Street, Zhong Guan Cun, Beijing 100190,Postal Code:100190
Phone:010-62562563 Fax:010-62562533 Email:jos@iscas.ac.cn
Technical Support:Beijing Qinyun Technology Development Co., Ltd.

Beijing Public Network Security No. 11040202500063