NCU Institutional Repository: Item 987654321/90048


    Please use this identifier to cite or link to this item: http://ir.lib.ncu.edu.tw/handle/987654321/90048


    Title: Masked Language Model for Code-Switching Automatic Speech Recognition
    Authors: 陳振鎧;Chen, Cheng-Kai
    Contributors: Department of Computer Science and Information Engineering
    Keywords: Speech Recognition;Code-Switching;Masked Language Model
    Date: 2022-09-23
    Issue Date: 2022-10-04 12:09:05 (UTC+8)
    Publisher: National Central University
    Abstract: In recent years, using a language model to improve the output of an end-to-end speech recognition model has become the mainstream approach in monolingual speech recognition. Compared with monolingual tasks, however, code-switching poses two problems because of its distinctive sentence structure: data for training the language model is extremely scarce, and traditional model architectures struggle to learn multilingual semantic information. To address these two problems, this thesis introduces a masked language model into the code-switching speech recognition system, using general linguistic knowledge and bidirectional context to help the system produce more accurate results. The masked language model is first pre-trained in a self-supervised manner on unlabeled data to acquire general linguistic knowledge, and is then transferred to the code-switching speech recognition domain for adaptation. Moreover, because masked language model training exploits the full bidirectional context, it greatly strengthens semantic understanding and model performance. We therefore leverage these advantages to build a code-switching language model and use it to rescore the output sequences of the end-to-end speech recognition model, improving the performance of the overall system. We propose two ways of using the masked language model: replacing the traditional causal language model, and adding it on top of the standard speech recognition system. Experiments on the code-switching corpus SEAME show that, compared with the standard architecture, the two systems achieve relative mixed error rate reductions of up to 7% and 8.4%, respectively, demonstrating that the proposed method solves the aforementioned problems and enhances the performance of code-switching speech recognition systems.
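    To make the rescoring step concrete, below is a minimal sketch of masked-language-model rescoring for an ASR n-best list, assuming the pseudo-log-likelihood formulation: mask one token at a time and sum the log-probabilities the model assigns to the original tokens under bidirectional context. The checkpoint name, interpolation weight, and example hypotheses are illustrative assumptions, not values from the thesis, which pre-trains and adapts its own model for SEAME.

import torch
from transformers import AutoTokenizer, AutoModelForMaskedLM

# Hypothetical multilingual checkpoint standing in for the thesis's
# pre-trained masked language model.
tokenizer = AutoTokenizer.from_pretrained("bert-base-multilingual-cased")
model = AutoModelForMaskedLM.from_pretrained("bert-base-multilingual-cased")
model.eval()

def pseudo_log_likelihood(sentence: str) -> float:
    """Score a hypothesis by masking each token in turn and summing the
    log-probability the MLM assigns to the original token; every position
    sees both left and right context, unlike a causal LM."""
    input_ids = tokenizer(sentence, return_tensors="pt")["input_ids"][0]
    total = 0.0
    with torch.no_grad():
        for i in range(1, input_ids.size(0) - 1):  # skip [CLS] and [SEP]
            masked = input_ids.clone()
            masked[i] = tokenizer.mask_token_id
            logits = model(masked.unsqueeze(0)).logits[0, i]
            total += torch.log_softmax(logits, dim=-1)[input_ids[i]].item()
    return total

def rescore(nbest, lm_weight=0.3):  # lm_weight is an assumed tuning knob
    """Interpolate each hypothesis's ASR log-score with its MLM score and
    return the best one, i.e. rescore the end-to-end model's n-best list."""
    scored = [(hyp, asr + lm_weight * pseudo_log_likelihood(hyp))
              for hyp, asr in nbest]
    return max(scored, key=lambda x: x[1])

# Hypothetical code-switched n-best list: (hypothesis, ASR log-score).
nbest = [
    ("我 明天 要 present 這個 project", -12.4),
    ("我 明天 要 去 這個 project", -12.1),
]
print(rescore(nbest))

    Relative to causal-LM rescoring, the trade-off is cost: a causal model scores a sentence in one forward pass, while the masked model needs one pass per token, in exchange for the bidirectional context the abstract credits for the accuracy gains.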
    Appears in Collections: [Graduate Institute of Computer Science and Information Engineering] Electronic Thesis & Dissertation

    Files in This Item:

    index.html (0 KB, HTML)


    All items in NCUIR are protected by copyright, with all rights reserved.

