Thesis 104522606: Full Metadata Record

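DC Field | Value | Language
DC.contributor | 資訊工程學系 (Department of Computer Science and Information Engineering) | zh_TW
DC.creator | 李安德 | zh_TW
DC.creator | Ryandhimas Edo Zezario | en_US
dc.date.accessioned | 2017-07-26T07:39:07Z
dc.date.available | 2017-07-26T07:39:07Z
dc.date.issued | 2017
dc.identifier.uri | http://ir.lib.ncu.edu.tw:88/thesis/view_etd.asp?URN=104522606
dc.contributor.department | 資訊工程學系 (Department of Computer Science and Information Engineering) | zh_TW
DC.description | 國立中央大學 (National Central University) | zh_TW
DC.description | National Central University | en_US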
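dc.description.abstract | This study proposes multi-style training with speech enhancement (MTSE) for acoustic modeling to achieve robust automatic speech recognition (ASR). Previous studies have confirmed that acoustic models based on deep neural networks (DNNs) can be trained to be more robust to adverse acoustic conditions by using training data from diverse acoustic conditions, which can be obtained either by collecting data under different recording conditions or by injecting noise into clean utterances. The MTSE approach adopts the same concept: it applies machine-learning-based and spectral-restoration-based speech enhancement to produce restored speech data and uses them to expand the original training set. By augmenting the original training data with the enhancement-restored data, the DNN-based acoustic model can capture additional structure in the input distribution and determine more accurate decision boundaries under heterogeneous conditions. The proposed MTSE approach was evaluated on the Aurora-4 (a standardized English ASR task with simulated noisy speech) and MATBN (a standardized ASR task with real-world recorded noise) datasets. Experimental results show that, compared with the baseline system on the Aurora-4 task, the proposed MTSE system achieves a notable relative reduction of 9.49% in word error rate (WER), from 10.01% to 9.06%, and compared with the baseline system on the MATBN task, a relative reduction of 6.15% in character error rate (CER), from 12.84% to 12.05%. The results indicate that the proposed MTSE approach can be a feasible solution for handling noise in real-world noise-robust ASR. | zh_TW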
dc.description.abstract | This study presents multi-style training with speech enhancement (MTSE) for acoustic modeling to achieve robust automatic speech recognition (ASR). Previous studies have confirmed that acoustic models based on deep neural networks (DNNs) can be trained to be more robust to adverse acoustic conditions by using training data from diverse acoustic conditions, which can be obtained either by collecting data under different recording conditions or by injecting noise into clean utterances. The MTSE approach adopts the same concept: it applies machine-learning-based and spectral-restoration-based speech enhancement to generate restored speech data, which are then used to expand the original training set. By augmenting the original training data with the enhancement-restored data, the DNN-based acoustic models can capture additional structure in the input distribution and determine more accurate decision boundaries under heterogeneous conditions. The proposed MTSE approach was evaluated on the Aurora-4 (a standardized English ASR task with simulated noisy speech) and MATBN (a standardized Mandarin ASR task with real-world recorded noisy speech) datasets. Experimental results show that, compared with the respective baseline systems, the proposed MTSE system yields a notable relative reduction of 9.49% in word error rate (WER) on Aurora-4 (from 10.01% to 9.06%) and a relative reduction of 6.15% in character error rate (CER) on MATBN (from 12.84% to 12.05%). These results suggest that the proposed MTSE approach can be a feasible solution to the noise problem in real-world noise-robust ASR. | en_US
DC.subject | deep learning | zh_TW
DC.subject | deep neural networks | zh_TW
DC.subject | multi-style training | zh_TW
DC.subject | deep denoising autoencoder | zh_TW
DC.subject | extreme learning | zh_TW
DC.subject | hierarchical extreme learning | zh_TW
DC.subject | spectral restoration | zh_TW
DC.subject | automatic speech recognition | zh_TW
DC.subject | deep learning | en_US
DC.subject | deep neural networks | en_US
DC.subject | multi-style training | en_US
DC.subject | deep denoising autoencoder | en_US
DC.subject | extreme learning | en_US
DC.subject | hierarchical extreme learning | en_US
DC.subject | spectral restoration | en_US
DC.subject | automatic speech recognition | en_US
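DC.title | 基於語音增強技術之多模態訓練語料強健類神經網路聲學模型 (Robust Neural Network Acoustic Models Using Multi-Modal Training Corpora Based on Speech Enhancement Techniques) | zh_TW
dc.language.iso | zh-TW | zh-TW
DC.title | Study of Robustness of DNN Acoustic Modeling Based on Multi-style Training with Speech Enhancement | en_US
DC.type | 博碩士論文 (thesis) | zh_TW
DC.type | thesis | en_US
DC.publisher | National Central University | en_US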
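
The percentages quoted in the abstract are relative reductions: the absolute drop in error rate divided by the baseline error rate. The minimal Python sketch below recomputes them from the reported absolute rates. The function name `relative_reduction` is illustrative only and does not come from the thesis.

```python
def relative_reduction(baseline: float, proposed: float) -> float:
    """Relative error-rate reduction (in percent) of `proposed` versus `baseline`."""
    if baseline <= 0:
        raise ValueError("baseline error rate must be positive")
    return (baseline - proposed) / baseline * 100.0


# Absolute error rates (%) as reported in the abstract: (baseline, MTSE).
results = {
    "Aurora-4 WER": (10.01, 9.06),
    "MATBN CER": (12.84, 12.05),
}

for task, (baseline, mtse) in results.items():
    # Print the absolute drop in points and the relative reduction in percent.
    print(f"{task}: {baseline:.2f}% -> {mtse:.2f}% "
          f"(absolute {baseline - mtse:.2f} points, "
          f"relative {relative_reduction(baseline, mtse):.2f}%)")
```

An absolute drop of 0.95 points on Aurora-4 and 0.79 points on MATBN corresponds to relative reductions of about 9.49% and 6.15%, matching the figures in the abstract.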
