INTEGRATION OF PRE-TRAINED NETWORKS WITH CONTINUOUS TOKEN INTERFACE FOR END-TO-END SPOKEN LANGUAGE UNDERSTANDING

Citations

Web of Science: 19
Scopus: 30

Abstract

Most End-to-End (E2E) Spoken Language Understanding (SLU) networks leverage pre-trained Automatic Speech Recognition (ASR) networks but still lack the capability to understand the semantics of utterances, which is crucial for the SLU task. To address this, recent studies use pre-trained Natural Language Understanding (NLU) networks. However, fully utilizing both pre-trained networks is not trivial; many solutions have been proposed, such as Knowledge Distillation (KD), cross-modal shared embedding, and network integration with an interface. We propose a simple and robust integration method for the E2E SLU network with a novel interface, the Continuous Token Interface (CTI). CTI is a junctional representation of the ASR and NLU networks when both networks are pre-trained with the same vocabulary. Thus, we can train our SLU network in an E2E manner without additional modules, such as Gumbel-Softmax. We evaluate our model on SLURP, a challenging SLU dataset, and achieve state-of-the-art scores on the intent classification and slot filling tasks. We also verify that the NLU network, pre-trained with a Masked Language Model (MLM) objective, can utilize the noisy textual representation of CTI. Moreover, training our model with extra data, SLURP-Synth, yields better results.
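
The idea of passing a continuous, differentiable token representation from the ASR network to the NLU network can be illustrated with a small sketch. This is not the authors' implementation: it assumes one plausible reading, in which the per-step ASR output distribution over the shared vocabulary is used to take a weighted average of the NLU network's input embeddings. The function names, the toy embedding table, and the logits are all hypothetical.

```python
import math


def softmax(logits):
    """Numerically stable softmax over one list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def continuous_token_embeddings(asr_logits, nlu_embedding_table):
    """Turn per-step ASR logits into soft NLU input embeddings.

    Assumption (not from the source): each step's embedding is the
    probability-weighted average of the NLU embedding rows. No argmax
    or sampling is involved, so the mapping stays differentiable.
    """
    vocab_size = len(nlu_embedding_table)
    dim = len(nlu_embedding_table[0])
    outputs = []
    for step_logits in asr_logits:
        if len(step_logits) != vocab_size:
            # The interface only makes sense with one shared vocabulary.
            raise ValueError("ASR and NLU must share the same vocabulary")
        probs = softmax(step_logits)
        outputs.append([
            sum(p * nlu_embedding_table[v][d] for v, p in enumerate(probs))
            for d in range(dim)
        ])
    return outputs


# Hypothetical toy setup: 3 vocabulary entries, 2-dimensional embeddings.
embedding_table = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
asr_logits = [
    [10.0, 0.0, 0.0],  # confident step: close to the embedding of token 0
    [0.0, 0.0, 0.0],   # uncertain step: an even mix of all embeddings
]
soft_inputs = continuous_token_embeddings(asr_logits, embedding_table)
```

In this sketch a confident ASR step yields an embedding close to a single token's embedding, while an uncertain step yields a blend, which is the kind of noisy textual input the abstract says an MLM pre-trained NLU network can handle.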

Keywords

end-to-end spoken language understanding; interface of networks; intent classification; slot filling
Title
INTEGRATION OF PRE-TRAINED NETWORKS WITH CONTINUOUS TOKEN INTERFACE FOR END-TO-END SPOKEN LANGUAGE UNDERSTANDING
Authors
Seo, Seunghyun; Kwak, Donghyun; Lee, Bowon
DOI
10.1109/ICASSP43922.2022.9747047
Publication date
2022
Type
Proceedings Paper
Journal
ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings
Pages
7152-7156