Thesis 102522123: Full Metadata Record

DC Field  Value  Language
dc.contributor  Department of Computer Science and Information Engineering  zh_TW
dc.creator  徐家鏞  zh_TW
dc.creator  Chia-yung Hsu  en_US
dc.date.accessioned  2015-08-25T07:39:07Z
dc.date.available  2015-08-25T07:39:07Z
dc.date.issued  2015
dc.identifier.uri  http://ir.lib.ncu.edu.tw:444/thesis/view_etd.asp?URN=102522123
dc.contributor.department  Department of Computer Science and Information Engineering  zh_TW
dc.description  國立中央大學  zh_TW
dc.description  National Central University  en_US
dc.description.abstract  Speech is an indispensable element of human society. As technology advances, people rely on computers to handle an ever greater share of the matters, large and small, in daily life, so enabling computers to process speech data has made speech recognition an important topic. Current speech recognition technology achieves good results on clean digit speech, but the environments we actually live in are full of noise unrelated to the content being recognized, and as the signal-to-noise ratio (SNR) decreases, the recognition rate inevitably drops with it. Finding methods that improve speech recognition in noisy environments is therefore very important for real-world applications. In recent years, research on neural networks for speech recognition has produced substantial results, effectively reducing the influence of environment and speaker variability on the speech signal and greatly improving recognition rates, but the system's recognition capability still has room for improvement. This thesis proposes a new automatic speech recognition architecture that combines Environment Clustering (EC), Mixture of Experts, and neural networks to further improve system performance. The recognition system is divided into an offline and an online phase: in the offline phase, the entire training set is partitioned into several sub-training sets according to acoustic characteristics, and a neural network (called a neural sub-network) is built for each sub-training set; in the online phase, a GMM-gate controls the outputs of the neural sub-networks. The proposed architecture preserves the acoustic characteristics of the sub-training sets and makes the speech recognition system more robust. In experiments on the Aurora 2 continuous digit speech corpus, compared with a conventional recognition system built with a single neural network and measured by word error rate (WER), the proposed system improves the average WER by a relative 6.86%, from 5.25% down to 4.89%.  zh_TW
dc.description.abstract  Speech is an essential element of human society. With advances in science and technology, people rely on computers to handle an ever larger share of everyday tasks, so enabling computers to process speech data has made speech recognition an important issue. Automatic speech recognition (ASR) achieves good results on clean speech, but the environments we live in are full of noise; as the signal-to-noise ratio (SNR) decreases, recognition accuracy inevitably drops. Finding ways to improve the recognition of noisy speech is therefore important for real-world use. Recently, ASR using neural network (NN) based acoustic models (AMs) has achieved significant improvements. However, the mismatch between training and testing conditions (including speaker and speaking environment) still limits the applicability of ASR. This thesis proposes a novel approach that combines the environment clustering (EC) and mixture of experts (MOE) algorithms (hence termed EC-MOE) to enhance the robustness of ASR against such mismatches. In the offline phase, we split the entire training set into several subsets, each characterizing a specific speaker and speaking environment, and use each subset to train an NN-based AM. In the online phase, we use a Gaussian mixture model (GMM) gate to determine the optimal output from the multiple NN-based AMs and render the final recognition result. We evaluated the proposed EC-MOE approach on the Aurora 2 continuous digit recognition task. Compared to the baseline system, in which a single NN-based AM is used for recognition, the proposed approach achieves a relative word error rate (WER) reduction of 6.86% (from 5.25% to 4.89%).  en_US
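The offline/online EC-MOE pipeline described in the abstract can be illustrated with a minimal sketch. This is not the thesis implementation: scikit-learn's GaussianMixture and MLPClassifier merely stand in for the GMM-gate and the NN-based acoustic models, soft gating is shown even though the gate may instead select a single best sub-network, and all names (ECMoE, n_clusters, posteriors) are illustrative.

```python
# Minimal sketch of a GMM-gated mixture of NN acoustic models (EC-MOE style).
# Assumptions, not from the thesis: scikit-learn models stand in for the GMM
# gate and the neural sub-networks, and every cluster's sub-training set is
# assumed to contain every acoustic state label so the experts' outputs align.
import numpy as np
from sklearn.mixture import GaussianMixture
from sklearn.neural_network import MLPClassifier


class ECMoE:
    def __init__(self, n_clusters=4):
        # Offline-phase components: a GMM gate over acoustic features plus
        # one NN "expert" acoustic model per environment cluster.
        self.gate = GaussianMixture(n_components=n_clusters, random_state=0)
        self.experts = [MLPClassifier(hidden_layer_sizes=(256,), max_iter=200)
                        for _ in range(n_clusters)]

    def fit(self, features, state_labels):
        # 1) Split the training set into sub-training sets by acoustic environment.
        self.gate.fit(features)
        cluster_ids = self.gate.predict(features)
        # 2) Train one neural sub-network per sub-training set.
        for k, expert in enumerate(self.experts):
            mask = cluster_ids == k
            expert.fit(features[mask], state_labels[mask])
        return self

    def posteriors(self, features):
        # Online phase: the GMM gate weights each sub-network's state posteriors
        # (a soft-gating variant; hard selection of the best expert also works).
        weights = self.gate.predict_proba(features)             # (T, K)
        outputs = np.stack([e.predict_proba(features)           # (K, T, S)
                            for e in self.experts])
        return np.einsum('tk,kts->ts', weights, outputs)        # (T, S)
```

Usage would look like ECMoE(n_clusters=4).fit(train_features, train_state_labels) followed by posteriors(test_features); the resulting frame-level state posteriors would then feed an HMM decoder to produce the recognized digit string.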
dc.subject  Artificial Neural Network  zh_TW
dc.subject  Robust Speech Recognition  zh_TW
dc.subject  Environment Clustering  zh_TW
dc.subject  Artificial Neural Network  en_US
dc.subject  Robust Speech Recognition  en_US
dc.subject  Environment Clustering  en_US
dc.title  A Study of Neural Network Training Incorporating Regional Information for Robust Speech Recognition  zh_TW
dc.language.iso  zh-TW  zh-TW
dc.title  Artificial Neural Network Incorporating Regional Information Training for Robust Speech Recognition  en_US
dc.type  Master's/doctoral thesis  zh_TW
dc.type  thesis  en_US
dc.publisher  National Central University  en_US
