Heterogeneous Graph Network with Window Mechanism for Spoken Language Understanding
Author: Zhang Qichen, Wang Shuai, Li Jingmei
Affiliation:

    Abstract:

    Spoken language understanding (SLU), a core component of task-oriented dialogue systems, extracts the semantic frame of a user query, i.e., it identifies the user's request and summarizes it in a structured semantic representation. SLU typically comprises two subtasks: intent detection (ID) and slot filling (SF). ID is treated as an utterance classification problem that analyzes semantics at the sentence level, while SF is treated as a sequence labeling task that analyzes semantics at the word level. Because intents and slots are closely correlated, mainstream work employs joint models to exploit knowledge shared across the two tasks. However, ID and SF remain distinct tasks: they capture sentence-level and word-level semantics of an utterance respectively, so the information they carry is heterogeneous and of different granularities. This study proposes a heterogeneous interactive structure for joint ID and SF that uses self-attention and graph attention networks to fully capture the relationship between sentence-level and word-level semantic information across the two correlated tasks. Unlike an ordinary homogeneous structure, the proposed model is a heterogeneous graph architecture containing different types of nodes and edges; such a graph encodes more comprehensive information and richer semantics and better supports interaction between nodes of different granularities. In addition, the study adopts a window mechanism to represent word-level embeddings more precisely, accommodating the local continuity of slot labels, and evaluates the model with the pre-trained language model BERT. On two public datasets, the proposed model achieves intent detection accuracies of 97.98% and 99.11% and slot filling F1 scores of 96.10% and 96.11%, outperforming current mainstream methods.
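    As a concrete illustration, below is a minimal, self-contained PyTorch sketch of the two ideas named in the abstract, not the authors' released implementation. WindowedSlotEncoder builds each word-level representation from a +/- k context window, reflecting the local continuity of slot labels, and HeteroInteraction performs one bidirectional attention step between the two node types of the heterogeneous graph: a sentence-level intent node and word-level slot nodes. All module names, dimensions, and wiring choices are illustrative assumptions.

```python
# A minimal sketch of (1) the window mechanism for word-level (slot)
# representations and (2) one attention step over a heterogeneous graph
# with a sentence-level intent node and word-level slot nodes.
# All names, sizes, and wiring here are illustrative assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F


class WindowedSlotEncoder(nn.Module):
    """Concatenates each token's hidden state with its +/- k neighbors
    (zero-padded at the sequence edges) and projects back to hidden size."""
    def __init__(self, hidden: int, k: int = 1):
        super().__init__()
        self.k = k
        self.proj = nn.Linear((2 * k + 1) * hidden, hidden)

    def forward(self, h: torch.Tensor) -> torch.Tensor:    # h: [B, T, H]
        padded = F.pad(h, (0, 0, self.k, self.k))          # pad the seq dim
        win = padded.unfold(1, 2 * self.k + 1, 1)          # [B, T, H, 2k+1]
        win = win.permute(0, 1, 3, 2).flatten(2)           # [B, T, (2k+1)*H]
        return torch.tanh(self.proj(win))


class HeteroInteraction(nn.Module):
    """One bidirectional attention step between the two node types:
    the intent node aggregates evidence from all slot nodes, and each
    slot node reads the sentence-level intent node. Separate attention
    modules per direction stand in for the graph's typed edges."""
    def __init__(self, hidden: int, heads: int = 4):
        super().__init__()
        self.slot_to_intent = nn.MultiheadAttention(hidden, heads, batch_first=True)
        self.intent_to_slot = nn.MultiheadAttention(hidden, heads, batch_first=True)

    def forward(self, intent: torch.Tensor, slots: torch.Tensor):
        # intent: [B, 1, H] sentence-level node; slots: [B, T, H] word nodes
        upd_i, _ = self.slot_to_intent(intent, slots, slots)   # slot -> intent
        upd_s, _ = self.intent_to_slot(slots, intent, intent)  # intent -> slot
        return intent + upd_i, slots + upd_s                   # residual update


if __name__ == "__main__":
    B, T, H = 2, 10, 64
    tokens = torch.randn(B, T, H)               # encoder output (e.g., BERT)
    slots = WindowedSlotEncoder(H, k=1)(tokens)
    intent = tokens.mean(dim=1, keepdim=True)   # simple sentence-level init
    intent, slots = HeteroInteraction(H)(intent, slots)
    print(intent.shape, slots.shape)            # [2, 1, 64] [2, 10, 64]
```

    In a full model, an intent classifier would read the updated intent node and a slot tagger would label each updated slot node, with the interaction layer stacked several times so the two granularities refine each other.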

Citation:
Zhang QC, Wang S, Li JM. A heterogeneous graph network with window mechanism for spoken language understanding. Ruan Jian Xue Bao/Journal of Software, 2024, 35(4): 1885-1898 (in Chinese).
History
  • Received: May 09, 2022
  • Revised: August 08, 2022
  • Online: June 14, 2023
  • Published: April 06, 2024