Semantic Analysis and Structured Language Models
Authors: Li MQ, Li JZ, Wang ZY, Lu DJ
Funding:

Supported by the National High-Tech Research and Development Plan of China (863 Program) under Grant No. 2001AA114071


    Abstract:

    An integrated semantic analysis system is presented, and structured language models are built on top of it. The semantic analysis system automatically tags the semantic class (word sense) of each word and the semantic dependency structure between words, with precisions of 90.85% and 75.84%, respectively. To capture sentence structure and long-distance dependencies, two kinds of language models based on semantic structure are examined and analyzed. Finally, both kinds of models are evaluated on a Chinese speech recognition task. Compared with the trigram model, the best-performing semantic structured language model, the headword trigram model, reduces the character error rate by 0.8% absolute, an 8% relative reduction.
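The abstract does not define the headword trigram model in detail. As a rough illustration only, the sketch below assumes one common reading: each word is predicted from the two most recent preceding headwords instead of the two immediately preceding words, so a subject can condition its verb across an intervening clause. How headwords are chosen (the `is_head` flags), the `<s>` padding symbol, add-one smoothing, and the names `headword_contexts` and `HeadwordTrigram` are all hypothetical and not taken from the paper.

```python
from collections import Counter
from math import log2

BOS = "<s>"  # hypothetical padding symbol meaning "no headword seen yet"


def headword_contexts(words, is_head):
    """For each position i, return the two most recent headwords before i.

    Assumption: is_head[i] is True when word i was marked as a headword
    (e.g. by a semantic dependency analyzer); the paper's criterion may differ.
    """
    history = [BOS, BOS]
    contexts = []
    for w, head in zip(words, is_head):
        contexts.append((history[-2], history[-1]))
        if head:
            history.append(w)  # only headwords enter the conditioning history
    return contexts


class HeadwordTrigram:
    """Add-one smoothed P(w_i | h_{i-2}, h_{i-1}) over preceding headwords."""

    def __init__(self, corpus):
        # corpus: list of (words, is_head) pairs
        self.tri = Counter()
        self.ctx = Counter()
        self.vocab = set()
        for words, is_head in corpus:
            for (h2, h1), w in zip(headword_contexts(words, is_head), words):
                self.tri[(h2, h1, w)] += 1
                self.ctx[(h2, h1)] += 1
                self.vocab.add(w)

    def prob(self, h2, h1, w):
        # +1 in the denominator's vocabulary size acts as one unknown-word slot
        v = len(self.vocab) + 1
        return (self.tri[(h2, h1, w)] + 1) / (self.ctx[(h2, h1)] + v)

    def log2_prob(self, words, is_head):
        """Sentence log-probability (base 2) under the headword contexts."""
        return sum(
            log2(self.prob(h2, h1, w))
            for (h2, h1), w in zip(headword_contexts(words, is_head), words)
        )
```

For the toy sentence `the dog that barked ran` with `dog`, `barked` and `ran` marked as headwords, `ran` is conditioned on (`dog`, `barked`), whereas a plain word trigram would condition it on (`that`, `barked`).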
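The absolute and relative reductions are linked by the definition of relative error reduction. As a back-of-the-envelope check (these baseline figures are inferred, not reported in the abstract, and are only approximate because both percentages are rounded), they imply a trigram baseline character error rate (CER) of roughly 10%:

```latex
\Delta_{\text{rel}} = \frac{\Delta_{\text{abs}}}{\mathrm{CER}_{\text{trigram}}}
\quad\Longrightarrow\quad
\mathrm{CER}_{\text{trigram}} \approx \frac{0.8\%}{8\%} = 10\%,
\qquad
\mathrm{CER}_{\text{headword}} \approx 10\% - 0.8\% = 9.2\%
```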

Cite this article:

Li MQ, Li JZ, Wang ZY, Lu DJ. Semantic analysis and structured language models. Journal of Software, 2005,16(9):1523-1533.

History
  • Received: 2004-05-14
  • Last revised: 2004-09-07
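Copyright: Institute of Software, Chinese Academy of Sciences
Address: No. 4 South Fourth Street, Zhongguancun, Haidian District, Beijing, Postal code: 100190
Tel: 010-62562563  Fax: 010-62562533  Email: jos@iscas.ac.cn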