An integrated semantic analysis system is presented, and structured language models based on it are proposed. The semantic analysis system automatically tags the semantic class of each word and analyzes the semantic dependency structure between words, with precisions of 90.85% and 75.84%, respectively. To describe sentence structure and long-distance dependencies, two kinds of structured language models are examined and analyzed. Finally, both language models are evaluated on a Chinese speech recognition task. Experiments show that the best semantic structured language model, the headword trigram model, achieves a 0.8% absolute and an 8% relative error reduction over the trigram model.
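The abstract does not give the model's equations. As a rough illustration only, a headword trigram model can be read as an ordinary trigram whose history is the two nearest preceding headwords instead of the two immediately preceding words. Under that reading, a word can be conditioned on a head several positions back. The sketch below assumes this reading. The `is_head` flags (standing in for the output of the semantic dependency analysis), the names `headword_history` and `HeadwordTrigramLM`, and the linear interpolation with a unigram estimate (weight `lam`) are all hypothetical choices, not the paper's method.

```python
from collections import Counter


def headword_history(tokens, is_head, i):
    """Return the two nearest headwords before position i, padded with <s>.

    Assumption: is_head[j] is True when tokens[j] was marked as a headword
    by some upstream dependency analysis (hypothetical input).
    """
    heads = [w for w, h in zip(tokens[:i], is_head[:i]) if h]
    return tuple((["<s>", "<s>"] + heads)[-2:])


class HeadwordTrigramLM:
    """Toy maximum-likelihood headword trigram, interpolated with a unigram.

    The interpolation scheme is an illustrative assumption; unseen words
    still receive zero probability, so a real system would smooth further.
    """

    def __init__(self, lam=0.7):
        self.lam = lam          # weight on the headword-trigram estimate
        self.tri = Counter()    # counts of (head1, head2, word)
        self.ctx = Counter()    # counts of (head1, head2)
        self.uni = Counter()    # counts of word
        self.total = 0          # number of training tokens

    def train(self, corpus):
        # corpus: iterable of (tokens, is_head) pairs of equal length
        for tokens, is_head in corpus:
            for i, w in enumerate(tokens):
                h = headword_history(tokens, is_head, i)
                self.tri[h + (w,)] += 1
                self.ctx[h] += 1
                self.uni[w] += 1
                self.total += 1

    def prob(self, tokens, is_head, i):
        """P(tokens[i] | two nearest preceding headwords), interpolated."""
        w = tokens[i]
        h = headword_history(tokens, is_head, i)
        p_uni = self.uni[w] / self.total if self.total else 0.0
        p_tri = self.tri[h + (w,)] / self.ctx[h] if self.ctx[h] else 0.0
        return self.lam * p_tri + (1 - self.lam) * p_uni


lm = HeadwordTrigramLM(lam=0.7)
lm.train([(["the", "dog", "barks"], [False, True, True])])
print(lm.prob(["the", "dog", "barks"], [False, True, True], 2))
```

For example, in `the dog that I saw barks` with only `dog` and `barks` marked as heads, the history used to predict `barks` is `(<s>, dog)`. This skips the relative clause, whereas a plain word trigram would condition on `(I, saw)`.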