Study of Robustness of DNN Acoustic Modeling Based on Multi-style Training with Speech Enhancement

Please use this identifier to cite or link to this item: http://ir.lib.ncu.edu.tw/handle/987654321/74692

Title:	基於語音增強技術之多模態訓練語料強健類神經網路聲學模型;Study of Robustness of DNN Acoustic Modeling Based on Multi-style Training with Speech Enhancement
Authors:	李安德;Zezario, Ryandhimas Edo
Contributors:	Department of Computer Science and Information Engineering
Keywords:	deep learning;deep neural networks;multi-style training;deep denoising autoencoder;extreme learning;hierarchical extreme learning;spectral restoration;automatic speech recognition
Date:	2017-07-26
Issue Date:	2017-10-27 14:36:25 (UTC+8)
Publisher:	National Central University
Abstract:	This study presents a multi-style training with speech enhancement (MTSE) approach for acoustic modeling to achieve robust automatic speech recognition (ASR). Previous studies have confirmed that, by using training data from diverse acoustic conditions (obtained either by collecting data under different recording conditions or by injecting noise into clean utterances), acoustic models based on deep neural networks (DNNs) can be trained to be more robust to adverse acoustic conditions. The MTSE approach adopts the same concept: it uses machine-learning-based and spectral-restoration-based speech enhancement to generate restored speech data, which is then used to expand the original training set. By augmenting the original training data with the enhancement-restored data, the DNN-based acoustic models can capture additional structure in the input distribution and determine more accurate decision boundaries under heterogeneous conditions. The proposed MTSE approach was evaluated on the Aurora-4 (a standardized English ASR task with simulated noisy speech) and MATBN (a standardized Mandarin ASR task with real-world recorded noisy speech) datasets. Experimental results show that, compared with the baseline system, the proposed MTSE system yields a relative reduction of 9.49% in word error rate (WER) on the Aurora-4 task (from 10.01% to 9.06%) and a relative reduction of 6.15% in character error rate (CER) on the MATBN task (from 12.84% to 12.05%). The results suggest that the proposed MTSE approach can be a feasible solution to the noise problem in real-world noise-robust ASR.
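The 9.49% and 6.15% figures in the abstract are consistent with relative reductions, i.e. the absolute drop in error rate divided by the baseline error rate. The short Python sketch below checks that arithmetic using only the error rates quoted in the abstract. The function name `relative_reduction` is an illustrative choice and is not taken from the thesis.

```python
def relative_reduction(baseline: float, proposed: float) -> float:
    """Return the relative reduction, in percent, from baseline to proposed."""
    if baseline <= 0:
        raise ValueError("baseline error rate must be positive")
    return (baseline - proposed) / baseline * 100.0


# Error rates (in percent) as quoted in the abstract.
aurora4_wer_reduction = relative_reduction(10.01, 9.06)  # Aurora-4, WER
matbn_cer_reduction = relative_reduction(12.84, 12.05)   # MATBN, CER

print(f"Aurora-4 relative WER reduction: {aurora4_wer_reduction:.2f}%")
print(f"MATBN relative CER reduction: {matbn_cer_reduction:.2f}%")
```

In absolute terms the improvements are 0.95 percentage points of WER on Aurora-4 and 0.79 percentage points of CER on MATBN.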
Appears in Collections:	[Graduate Institute of Computer Science and Information Engineering] Electronic Thesis & Dissertation

Files in This Item:

File	Description	Size	Format	Views
index.html		0Kb	HTML	206
