Thesis 102522123: Full Metadata Record

DC Field  Value  Language
dc.contributor  Department of Computer Science and Information Engineering  zh_TW
dc.creator  徐家鏞  zh_TW
dc.creator  Chia-yung Hsu  en_US
dc.date.accessioned  2015-08-25T07:39:07Z
dc.date.available  2015-08-25T07:39:07Z
dc.date.issued  2015
dc.identifier.uri  http://ir.lib.ncu.edu.tw:444/thesis/view_etd.asp?URN=102522123
dc.contributor.department  Department of Computer Science and Information Engineering  zh_TW
dc.description  國立中央大學  zh_TW
dc.description  National Central University  en_US
dc.description.abstract  Speech is an indispensable element of human society. As technology advances, people rely on computers to handle an ever greater share of the matters, large and small, in daily life, so enabling computers to process speech data has made speech recognition an important topic. Current speech recognition technology achieves good results on clean digit speech, but the environments we actually live in are full of noise unrelated to the content being recognized, and as the signal-to-noise ratio (SNR) decreases, the recognition rate inevitably drops with it. Finding methods that improve speech recognition in noisy environments is therefore very important for real-world applications. In recent years, research on neural networks for speech recognition has produced substantial results, effectively reducing the influence of environment and speaker variability on the speech signal and greatly improving recognition rates, but the system's recognition capability still has room for improvement. This thesis proposes a new automatic speech recognition architecture that combines Environment Clustering (EC), Mixture of Experts, and neural networks to further improve system performance. The recognition system is divided into an offline and an online phase: in the offline phase, the entire training set is partitioned into several sub-training sets according to acoustic characteristics, and a neural network (called a neural sub-network) is built for each sub-training set; in the online phase, a GMM-gate controls the outputs of the neural sub-networks. The proposed architecture preserves the acoustic characteristics of the sub-training sets and makes the speech recognition system more robust. In experiments on the Aurora 2 continuous digit speech corpus, compared with a conventional recognition system built with a single neural network and measured by word error rate (WER), the proposed system improves the average WER by a relative 6.86%, from 5.25% down to 4.89%.  zh_TW
dc.description.abstract  Speech is an essential element of human society. With advances in science and technology, people rely on computers to handle an ever larger share of everyday tasks, so enabling computers to process speech data has made speech recognition an important issue. Automatic speech recognition (ASR) achieves good results on clean speech, but the environments we live in are full of noise; as the signal-to-noise ratio (SNR) decreases, recognition accuracy inevitably drops. Finding ways to improve the recognition of noisy speech is therefore important for real-world use. Recently, ASR using neural network (NN) based acoustic models (AMs) has achieved significant improvements. However, the mismatch between training and testing conditions (including speaker and speaking environment) still limits the applicability of ASR. This thesis proposes a novel approach that combines the environment clustering (EC) and mixture of experts (MOE) algorithms (hence termed EC-MOE) to enhance the robustness of ASR against such mismatches. In the offline phase, we split the entire training set into several subsets, each characterizing a specific speaker and speaking environment, and use each subset to train an NN-based AM. In the online phase, we use a Gaussian mixture model (GMM) gate to determine the optimal output from the multiple NN-based AMs and render the final recognition result. We evaluated the proposed EC-MOE approach on the Aurora 2 continuous digit recognition task. Compared to the baseline system, in which a single NN-based AM is used for recognition, the proposed approach achieves a relative word error rate (WER) reduction of 6.86% (from 5.25% to 4.89%).  en_US
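The offline/online EC-MOE pipeline described in the abstract can be illustrated with a minimal sketch. This is not the thesis implementation: scikit-learn's GaussianMixture and MLPClassifier merely stand in for the GMM-gate and the NN-based acoustic models, soft gating is shown even though the gate may instead select a single best sub-network, and all names (ECMoE, n_clusters, posteriors) are illustrative.

```python
# Minimal sketch of a GMM-gated mixture of NN acoustic models (EC-MOE style).
# Assumptions, not from the thesis: scikit-learn models stand in for the GMM
# gate and the neural sub-networks, and every cluster's sub-training set is
# assumed to contain every acoustic state label so the experts' outputs align.
import numpy as np
from sklearn.mixture import GaussianMixture
from sklearn.neural_network import MLPClassifier


class ECMoE:
    def __init__(self, n_clusters=4):
        # Offline-phase components: a GMM gate over acoustic features plus
        # one NN "expert" acoustic model per environment cluster.
        self.gate = GaussianMixture(n_components=n_clusters, random_state=0)
        self.experts = [MLPClassifier(hidden_layer_sizes=(256,), max_iter=200)
                        for _ in range(n_clusters)]

    def fit(self, features, state_labels):
        # 1) Split the training set into sub-training sets by acoustic environment.
        self.gate.fit(features)
        cluster_ids = self.gate.predict(features)
        # 2) Train one neural sub-network per sub-training set.
        for k, expert in enumerate(self.experts):
            mask = cluster_ids == k
            expert.fit(features[mask], state_labels[mask])
        return self

    def posteriors(self, features):
        # Online phase: the GMM gate weights each sub-network's state posteriors
        # (a soft-gating variant; hard selection of the best expert also works).
        weights = self.gate.predict_proba(features)             # (T, K)
        outputs = np.stack([e.predict_proba(features)           # (K, T, S)
                            for e in self.experts])
        return np.einsum('tk,kts->ts', weights, outputs)        # (T, S)
```

Usage would look like ECMoE(n_clusters=4).fit(train_features, train_state_labels) followed by posteriors(test_features); the resulting frame-level state posteriors would then feed an HMM decoder to produce the recognized digit string.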
dc.subject  Artificial Neural Network  zh_TW
dc.subject  Robust Speech Recognition  zh_TW
dc.subject  Environment Clustering  zh_TW
dc.subject  Artificial Neural Network  en_US
dc.subject  Robust Speech Recognition  en_US
dc.subject  Environment Clustering  en_US
dc.title  A Study of Neural Network Training Incorporating Regional Information for Robust Speech Recognition  zh_TW
dc.language.iso  zh-TW  zh-TW
dc.title  Artificial Neural Network Incorporating Regional Information Training for Robust Speech Recognition  en_US
dc.type  Master's/doctoral thesis  zh_TW
dc.type  thesis  en_US
dc.publisher  National Central University  en_US
