Master's/Doctoral Thesis 104522097: Full Metadata Record

DC Field Value Language
DC.contributor Department of Computer Science and Information Engineering zh_TW
DC.creator 王騰輝 zh_TW
DC.creator Wang Tenghui en_US
dc.date.accessioned 2022-1-25T07:39:07Z
dc.date.available 2022-1-25T07:39:07Z
dc.date.issued 2022
dc.identifier.uri http://ir.lib.ncu.edu.tw:88/thesis/view_etd.asp?URN=104522097
dc.contributor.department Department of Computer Science and Information Engineering zh_TW
DC.description 國立中央大學 zh_TW
DC.description National Central University en_US
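dc.description.abstract Code-switching is a common mode of expression in which two or more languages are used alternately within a conversation. Chinese and English are among the world's major languages, and mixed Chinese-English expression is indispensable in everyday conversation; however, differences in ethnic group, environment, and speaking habits all affect the compositional characteristics of code-switching. Training corpora of such code-switched text remain scarce in speech recognition research. This thesis therefore trains a neural-network generator to produce code-switched text and uses it as data-augmentation corpus for the kaldi toolkit language model, with the aim of improving the recognition rate for mixed Chinese-English speech. The method is based on the Chinese-English text in the SEAME corpus: a BERT-BiLSTM-CRF model is trained to learn code-switching positions, and the positions in pure-Chinese text that should be code-switched are then translated into English, producing sentences that match the characteristics of this corpus. zh_TW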
dc.description.abstract Code-switching is a common mode of language expression in which two or more languages are interleaved within a conversation. Chinese and English are among the world's major languages, and mixed Chinese-English expression is indispensable in everyday dialogue and communication. However, differences in ethnic group, environment, and speaking habits all affect the compositional characteristics of code-switching. Training corpora of such code-switched text remain scarce in speech recognition research. This thesis therefore uses a neural network to train a generator that produces code-switched text, and uses that text as augmentation data for training the kaldi toolkit language model, with the aim of improving the accuracy of mixed Chinese-English speech recognition. The method is based on the Chinese-English code-switching text in the SEAME corpus. We train a BERT-BiLSTM-CRF model to predict code-switching positions, and then translate the Chinese words at those positions into English, thereby generating sentences that match the characteristics of this corpus. en_US
DC.subject Code-Switching zh_TW
DC.subject Speech Recognition zh_TW
DC.subject Bidirectional Encoder Representations from Transformers zh_TW
DC.subject Natural Language Processing zh_TW
DC.subject Code-Switching en_US
DC.subject Speech Recognition en_US
DC.subject BERT en_US
DC.subject NLP en_US
DC.title Code-Switching Text Data Augmentation for Mixed Chinese-English Speech Recognition zh_TW
dc.language.iso zh-TW zh-TW
DC.title Code-Switching Data Augmentation For Chinese-English Speech Recognition en_US
DC.type Master's/doctoral thesis zh_TW
DC.type thesis en_US
DC.publisher National Central University en_US
