NCU Institutional Repository: Item 987654321/90048


    Please use this identifier to cite or link to this item: http://ir.lib.ncu.edu.tw/handle/987654321/90048


    Title: Masked Language Model for Code-Switching Automatic Speech Recognition
    Authors: 陳振鎧;Chen, Cheng-Kai
    Contributors: Department of Computer Science and Information Engineering
    Keywords: Speech Recognition;Code-Switching;Masked Language Model
    Date: 2022-09-23
    Issue Date: 2022-10-04 12:09:05 (UTC+8)
    Publisher: National Central University
    Abstract: In recent years, using a language model to improve the output of an end-to-end speech recognition model has become the mainstream approach in monolingual speech recognition. Compared with monolingual tasks, however, code-switching poses two problems because of its distinctive sentence structure: data for training the language model is extremely scarce, and traditional model architectures struggle to learn multilingual semantic information. To address these two problems, this thesis introduces a masked language model into the code-switching speech recognition system, using general linguistic knowledge and bidirectional context to help the system produce more accurate results. The masked language model is first pre-trained in a self-supervised manner on unlabeled data to acquire general linguistic knowledge, and is then transferred to the code-switching speech recognition domain for adaptation. Moreover, because masked language model training exploits the full bidirectional context, it greatly strengthens semantic understanding and model performance. We therefore leverage these advantages to build a code-switching language model and use it to rescore the output sequences of the end-to-end speech recognition model, improving the performance of the overall system. We propose two ways of using the masked language model: replacing the traditional causal language model, and adding it on top of the standard speech recognition system. Experiments on the code-switching corpus SEAME show that, compared with the standard architecture, the two systems achieve relative mixed error rate reductions of up to 7% and 8.4%, respectively, demonstrating that the proposed method solves the aforementioned problems and enhances the performance of code-switching speech recognition systems.
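    To make the rescoring step concrete, below is a minimal sketch of masked-language-model rescoring for an ASR n-best list, assuming the pseudo-log-likelihood formulation: mask one token at a time and sum the log-probabilities the model assigns to the original tokens under bidirectional context. The checkpoint name, interpolation weight, and example hypotheses are illustrative assumptions, not values from the thesis, which pre-trains and adapts its own model for SEAME.

import torch
from transformers import AutoTokenizer, AutoModelForMaskedLM

# Hypothetical multilingual checkpoint standing in for the thesis's
# pre-trained masked language model.
tokenizer = AutoTokenizer.from_pretrained("bert-base-multilingual-cased")
model = AutoModelForMaskedLM.from_pretrained("bert-base-multilingual-cased")
model.eval()

def pseudo_log_likelihood(sentence: str) -> float:
    """Score a hypothesis by masking each token in turn and summing the
    log-probability the MLM assigns to the original token; every position
    sees both left and right context, unlike a causal LM."""
    input_ids = tokenizer(sentence, return_tensors="pt")["input_ids"][0]
    total = 0.0
    with torch.no_grad():
        for i in range(1, input_ids.size(0) - 1):  # skip [CLS] and [SEP]
            masked = input_ids.clone()
            masked[i] = tokenizer.mask_token_id
            logits = model(masked.unsqueeze(0)).logits[0, i]
            total += torch.log_softmax(logits, dim=-1)[input_ids[i]].item()
    return total

def rescore(nbest, lm_weight=0.3):  # lm_weight is an assumed tuning knob
    """Interpolate each hypothesis's ASR log-score with its MLM score and
    return the best one, i.e. rescore the end-to-end model's n-best list."""
    scored = [(hyp, asr + lm_weight * pseudo_log_likelihood(hyp))
              for hyp, asr in nbest]
    return max(scored, key=lambda x: x[1])

# Hypothetical code-switched n-best list: (hypothesis, ASR log-score).
nbest = [
    ("我 明天 要 present 這個 project", -12.4),
    ("我 明天 要 去 這個 project", -12.1),
]
print(rescore(nbest))

    Relative to causal-LM rescoring, the trade-off is cost: a causal model scores a sentence in one forward pass, while the masked model needs one pass per token, in exchange for the bidirectional context the abstract credits for the accuracy gains.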
    Appears in Collections: [Graduate Institute of Computer Science and Information Engineering] Electronic Thesis & Dissertation

    Files in This Item:

    index.html (0 KB, HTML)


    All items in NCUIR are protected by copyright, with all rights reserved.

