Domain Dependent Language Model Based on Fuzzy Training Subset
    Abstract:

    Statistical language models are central to speech recognition. For a system that targets a specific topic, a domain-dependent language model performs much better than a general-purpose model. Traditional training methods have two problems: (1) a topic-specific corpus is much smaller than a general corpus; (2) an article usually relates to more than one topic, but this is not taken into account during model training. This paper addresses both problems. The authors present a new way of organizing the corpus, based on fuzzy training subsets, and train the domain-dependent models on these subsets. Self-organized learning is also introduced into the training process to improve the models' predictive ability. According to the authors, the method clearly improves model performance.
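
    The abstract does not give the training procedure, so the following is only a minimal sketch of one way fuzzy training subsets could feed domain-dependent models. It assumes that each article carries a membership degree in [0, 1] for every topic, and that each bigram it contains is added to a topic's counts with that degree as its weight. The function names, the bigram order, and the unsmoothed relative-frequency estimate are all illustrative assumptions, not the authors' method.

```python
from collections import defaultdict


def train_fuzzy_bigrams(documents, memberships, topics):
    """Accumulate per-topic bigram counts, weighting each document by its
    (assumed) fuzzy membership degree in that topic."""
    counts = {t: defaultdict(float) for t in topics}
    context_totals = {t: defaultdict(float) for t in topics}
    for tokens, degrees in zip(documents, memberships):
        for topic in topics:
            weight = degrees.get(topic, 0.0)
            if weight == 0.0:
                continue  # the document is not in this topic's fuzzy subset
            for prev, word in zip(tokens, tokens[1:]):
                counts[topic][(prev, word)] += weight
                context_totals[topic][prev] += weight
    return counts, context_totals


def bigram_probability(counts, context_totals, topic, prev, word):
    """Unsmoothed relative-frequency estimate of P(word | prev) for one topic."""
    total = context_totals[topic].get(prev, 0.0)
    if total == 0.0:
        return 0.0
    return counts[topic].get((prev, word), 0.0) / total


if __name__ == "__main__":
    # Toy corpus: each document belongs to both topics to a different degree.
    docs = [["the", "market", "rose"], ["the", "team", "rose"]]
    degrees = [{"finance": 0.8, "sports": 0.2}, {"finance": 0.1, "sports": 0.9}]
    c, totals = train_fuzzy_bigrams(docs, degrees, ["finance", "sports"])
    print(bigram_probability(c, totals, "finance", "the", "market"))
    print(bigram_probability(c, totals, "sports", "the", "team"))
```

    Because every article contributes to every topic in proportion to its membership, a small topic-specific corpus still draws on related articles, which is the intuition behind both problems the abstract lists. A practical model would also need smoothing, which this sketch omits.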

Citation:

Chen Lang-zhou, Huang Tai-yi. Domain Dependent Language Model Based on Fuzzy Training Subset. Journal of Software, 2000, 11(7): 971-978

History
  • Received: February 8, 1999
  • Revised: June 17, 1999