The use of digitized cultural heritage plays an important role in digital library projects. Because antique Chinese books are intrinsically handwritten, there are no effective mechanisms for retrieving content from their digitized versions. This paper proposes an original content retrieval method based on visual similarity and studies several of its key techniques. The method extracts morphological, positional and page features from page images, combines them into a feature space, and builds a spatial index over that space. A range search strategy then retrieves all analogs of the query sample. In addition, a precision parameter is defined to dynamically adjust the mapping from morphological features to semantics, and a constraint verification technique is developed to improve overall precision. An operational prototype system demonstrates the method's feasibility and shows that automatic content-based retrieval can be performed effectively directly on page images.
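The abstract compresses the whole retrieval pipeline into a few sentences. The following minimal Python sketch shows one way the pieces could fit together. It is not the authors' implementation, and it rests on several assumptions. Each segmented character image is reduced to a fixed-length morphological feature vector plus its page, column and row. A plain k-d tree stands in for the spatial index. The precision parameter is modeled as the search radius `eps`. Constraint verification is taken to mean that consecutive characters of a multi-character query must occupy consecutive rows in the same column of the same page. All names (`Glyph`, `build`, `range_search`, `find_phrase`) and the sample data are hypothetical.

```python
from dataclasses import dataclass
from math import dist
from typing import Optional


@dataclass(frozen=True)
class Glyph:
    """A character image segmented from a page (hypothetical record layout)."""
    feature: tuple[float, ...]  # morphological feature vector (assumed fixed length)
    page: int                   # page feature: which page image the glyph is on
    column: int                 # positional feature: column index on the page
    row: int                    # positional feature: row index within the column


@dataclass
class Node:
    glyph: Glyph
    axis: int
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def build(glyphs: list[Glyph], depth: int = 0) -> Optional[Node]:
    """Build a k-d tree (assumed stand-in for the spatial index) over the feature vectors."""
    if not glyphs:
        return None
    axis = depth % len(glyphs[0].feature)
    ordered = sorted(glyphs, key=lambda g: g.feature[axis])
    mid = len(ordered) // 2
    return Node(ordered[mid], axis,
                build(ordered[:mid], depth + 1),
                build(ordered[mid + 1:], depth + 1))


def range_search(node: Optional[Node], query: tuple[float, ...],
                 eps: float, out: list[Glyph]) -> list[Glyph]:
    """Collect every glyph whose feature lies within Euclidean distance eps of query."""
    if node is None:
        return out
    if dist(node.glyph.feature, query) <= eps:
        out.append(node.glyph)
    diff = query[node.axis] - node.glyph.feature[node.axis]
    # Always search the query's side; cross the split plane only if the eps-ball reaches it.
    near, far = (node.left, node.right) if diff < 0 else (node.right, node.left)
    range_search(near, query, eps, out)
    if abs(diff) <= eps:
        range_search(far, query, eps, out)
    return out


def find_phrase(root: Optional[Node], phrase: list[tuple[float, ...]],
                eps: float) -> list[Glyph]:
    """Return the first glyph of every placement that matches a non-empty multi-character query."""
    # One range search per query character: all visually similar glyphs.
    hits = [range_search(root, q, eps, []) for q in phrase]
    positions = [{(g.page, g.column, g.row) for g in h} for h in hits]
    # Constraint verification (assumed rule): character i must sit i rows below
    # the first character, in the same column of the same page.
    return [g for g in hits[0]
            if all((g.page, g.column, g.row + i) in positions[i]
                   for i in range(1, len(phrase)))]


# Usage on made-up two-dimensional features.
glyphs = [
    Glyph((0.10, 0.90), page=1, column=0, row=0),
    Glyph((0.80, 0.20), page=1, column=0, row=1),
    Glyph((0.12, 0.88), page=1, column=2, row=5),
    Glyph((0.50, 0.50), page=1, column=2, row=6),
    Glyph((0.11, 0.91), page=2, column=1, row=3),
    Glyph((0.79, 0.21), page=2, column=1, row=4),
]
root = build(glyphs)
phrase = [(0.1, 0.9), (0.8, 0.2)]
key = lambda g: (g.page, g.column, g.row)
print(sorted(map(key, find_phrase(root, phrase, eps=0.05))))   # [(1, 0, 0), (2, 1, 3)]
print(sorted(map(key, find_phrase(root, phrase, eps=0.012))))  # [(1, 0, 0)]
```

In this sketch, shrinking `eps` makes the mapping from shape to character stricter and drops the slightly different handwritten variant on page 2. Widening it tolerates more variation but admits more candidates, such as the glyph in column 2 of page 1. The positional check then rejects that glyph because no matching second character follows it.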