Robust Neural Network Acoustic Models Using Multi-style Training Corpora Based on Speech Enhancement Techniques

DC field	Value	Language
DC.contributor	Department of Computer Science and Information Engineering	zh_TW
DC.creator	李安德	zh_TW
DC.creator	Ryandhimas Edo Zezario	en_US
dc.date.accessioned	2017-07-26T07:39:07Z
dc.date.available	2017-07-26T07:39:07Z
dc.date.issued	2017
dc.identifier.uri	http://ir.lib.ncu.edu.tw:88/thesis/view_etd.asp?URN=104522606
dc.contributor.department	Department of Computer Science and Information Engineering	zh_TW
DC.description	National Central University	zh_TW
DC.description	National Central University	en_US
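dc.description.abstract	This study proposes multi-style training with speech enhancement (MTSE) for acoustic modeling to achieve robust automatic speech recognition. Previous studies have confirmed that, by using training data from different acoustic conditions (obtainable either by collecting data under different recording conditions or by injecting noise into clean utterances), acoustic models based on deep neural networks (DNNs) can be trained to be more robust to adverse acoustic conditions. In this study, the MTSE approach adopts the same concept, using machine-learning-based and spectral-restoration-based speech enhancement to produce restored speech data and using these data to expand the original training set. By augmenting the original training data with the enhancement-restored data, the DNN-based acoustic model can capture additional structure in the input distribution and determine more accurate decision boundaries under heterogeneous conditions. The proposed MTSE approach was evaluated on the Aurora-4 (a standardized English ASR task with simulated noisy speech) and MATBN (a standardized ASR task with real-world recorded noise) datasets. Experimental results show that, compared with the Aurora-4 baseline system, the proposed MTSE system achieves a notable relative reduction of 9.49% in word error rate (from 10.01% to 9.06%), and, compared with the MATBN baseline system, a relative reduction of 6.15% in character error rate (CER) (from 12.84% to 12.05%). The results indicate that the proposed MTSE approach can be a feasible solution to the noise problem in real-world noise-robust ASR.	zh_TW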
dc.description.abstract	This study presents multi-style training with speech enhancement (MTSE) for acoustic modeling to achieve robust automatic speech recognition (ASR). Previous studies have confirmed that acoustic models based on deep neural networks (DNNs) can be trained to be more robust to adverse acoustic conditions by using training data from diverse acoustic conditions, which can be obtained either by collecting data under different recording conditions or by injecting noise into clean utterances. The MTSE approach adopts the same concept: it applies machine-learning-based and spectral-restoration-based speech enhancement to generate restored speech data and uses these data to expand the original training set. By augmenting the original training data with the speech-enhancement-restored data, the DNN-based acoustic models can capture additional structure in the input distribution and determine more accurate decision boundaries under heterogeneous conditions. The proposed MTSE approach was evaluated on the Aurora-4 (a standardized English ASR task with simulated noisy speech) and MATBN (a standardized Mandarin ASR task with real-world recorded noisy speech) datasets. Experimental results show that, compared with the respective baseline systems, the proposed MTSE system yields a notable relative word error rate (WER) reduction of 9.49% (from 10.01% to 9.06%) on the Aurora-4 task and a relative character error rate (CER) reduction of 6.15% (from 12.84% to 12.05%) on the MATBN task. These results suggest that the proposed MTSE approach can be a feasible solution to the noise issue in real-world noise-robust ASR.	en_US
DC.subject	deep learning	zh_TW
DC.subject	deep neural networks	zh_TW
DC.subject	multi-style training	zh_TW
DC.subject	deep denoising autoencoder	zh_TW
DC.subject	extreme learning	zh_TW
DC.subject	hierarchical extreme learning	zh_TW
DC.subject	spectral restoration	zh_TW
DC.subject	automatic speech recognition	zh_TW
DC.subject	deep learning	en_US
DC.subject	deep neural networks	en_US
DC.subject	multi-style training	en_US
DC.subject	deep denoising autoencoder	en_US
DC.subject	extreme learning	en_US
DC.subject	hierarchical extreme learning	en_US
DC.subject	spectral restoration	en_US
DC.subject	automatic speech recognition	en_US
DC.title	Robust Neural Network Acoustic Models Using Multi-style Training Corpora Based on Speech Enhancement Techniques	zh_TW
dc.language.iso	zh-TW	zh-TW
DC.title	Study of Robustness of DNN Acoustic Modeling Based on Multi-style Training with Speech Enhancement	en_US
DC.type	Master's/doctoral thesis	zh_TW
DC.type	thesis	en_US
DC.publisher	National Central University	en_US
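The abstract describes MTSE as expanding a multi-condition training set with speech-enhancement-restored copies of its utterances. A minimal pure-Python sketch of that data flow follows, assuming waveforms are lists of floats paired with transcripts. The names `add_noise_at_snr`, `moving_average_enhancer`, and `build_mtse_training_set` are hypothetical, and the moving-average filter is only a placeholder for the enhancers named in the keywords (deep denoising autoencoder, (hierarchical) extreme learning, spectral restoration); it is not the author's method.

```python
import math
from typing import Callable, List, Sequence, Tuple

Signal = List[float]
Example = Tuple[Signal, str]  # (waveform samples, transcript)


def add_noise_at_snr(clean: Sequence[float], noise: Sequence[float], snr_db: float) -> Signal:
    """Inject noise into a clean utterance at a target SNR in dB (noise is tiled/cropped to fit)."""
    tiled = [noise[i % len(noise)] for i in range(len(clean))]
    p_clean = sum(x * x for x in clean) / len(clean)
    p_noise = sum(x * x for x in tiled) / len(tiled)
    # Choose scale so that 10*log10(p_clean / (scale**2 * p_noise)) == snr_db.
    scale = math.sqrt(p_clean / (p_noise * 10 ** (snr_db / 10)))
    return [c + scale * n for c, n in zip(clean, tiled)]


def moving_average_enhancer(signal: Sequence[float], width: int = 3) -> Signal:
    """HYPOTHETICAL stand-in for a real enhancer (DDAE, HELM, spectral restoration)."""
    half = width // 2
    out = []
    for i in range(len(signal)):
        window = signal[max(0, i - half): i + half + 1]
        out.append(sum(window) / len(window))
    return out


def build_mtse_training_set(
    data: Sequence[Example], enhance: Callable[[Sequence[float]], Signal]
) -> List[Example]:
    """Original multi-condition data plus an enhanced copy of every utterance.

    Each enhanced copy keeps its original transcript, so no new labels are needed.
    """
    original = [(list(sig), text) for sig, text in data]
    restored = [(enhance(sig), text) for sig, text in original]
    return original + restored


if __name__ == "__main__":
    clean = [math.sin(0.1 * t) for t in range(160)]
    noise = [0.5 * (-1) ** t for t in range(37)]
    noisy = add_noise_at_snr(clean, noise, snr_db=5.0)
    train = build_mtse_training_set([(clean, "hello"), (noisy, "hello")], moving_average_enhancer)
    print(len(train))  # 4: two original utterances plus two enhanced copies
```

Because each restored copy reuses its transcript, the training set grows without extra labelling, and the DNN acoustic model sees the same targets under one more acoustic condition.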

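The error-rate reductions quoted in the abstract are relative, not absolute: the absolute drops are 0.95 percentage points (Aurora-4 WER) and 0.79 percentage points (MATBN CER). The short sketch below, built around a hypothetical helper `relative_reduction`, reproduces the reported percentages from the absolute error rates.

```python
def relative_reduction(baseline: float, proposed: float) -> float:
    """Relative error-rate reduction (%) of `proposed` with respect to `baseline`."""
    if baseline <= 0:
        raise ValueError("baseline error rate must be positive")
    return (baseline - proposed) / baseline * 100.0


if __name__ == "__main__":
    # Absolute error rates (%) as reported in the abstract.
    aurora4_wer = relative_reduction(10.01, 9.06)
    matbn_cer = relative_reduction(12.84, 12.05)
    print(f"Aurora-4 WER: {aurora4_wer:.2f}% relative")  # Aurora-4 WER: 9.49% relative
    print(f"MATBN CER: {matbn_cer:.2f}% relative")       # MATBN CER: 6.15% relative
```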
Thesis 104522606: full metadata record