Thesis 104522605 Full Metadata Record

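DC field | Value | Language
DC.contributor | 資訊工程學系 (Department of Computer Science and Information Engineering) | zh_TW
DC.creator | 特利安 | zh_TW
DC.creator | Rezki Trianto | en_US
dc.date.accessioned | 2017-8-22T07:39:07Z
dc.date.available | 2017-8-22T07:39:07Z
dc.date.issued | 2017
dc.identifier.uri | http://ir.lib.ncu.edu.tw:88/thesis/view_etd.asp?URN=104522605
dc.contributor.department | 資訊工程學系 (Department of Computer Science and Information Engineering) | zh_TW
DC.description | 國立中央大學 (National Central University) | zh_TW
DC.description | National Central University | en_US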
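dc.description.abstract | In recent years, automatic speech recognition systems have come into wide use in every corner of daily life, and their rapid development has had a great impact on society. Although speech recognition technology has advanced by leaps and bounds in recent years, many aspects still await breakthroughs. This thesis therefore attempts to propose a new method to improve speech recognition accuracy. The thesis consists broadly of two parts. The first part presents the proposed new recognition method, the fast long short-term memory acoustic model (Fast-LSTM). The method brings the advantages of the time delay neural network (TDNN) into various long short-term memory models in order to increase the speed of the model in speech recognition. We use distant speech and multi-channel audio as the samples for evaluating the model. The results show that, compared with the TDNN and the deep neural network (DNN), the proposed model does increase the speed of speech recognition; in terms of accuracy, however, neither the conventional LSTM nor the proposed Fast-LSTM performs as well as the DNN. The latter part of the thesis discusses some limitations of the experiments and aspects that remain to be improved. The second part of the thesis applies the Fast-LSTM acoustic model to keyword detection. The experimental results show that, in keyword recognition and detection, the Fast-LSTM acoustic model can reduce the error rate by 10% compared with existing models. | zh_TW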
dc.description.abstract | Automatic speech recognition (ASR) has developed rapidly in recent years within machine learning research, and its applications, such as smart assistants and subtitle generation, are now part of everyday life. In this thesis, we propose two systems. The first is an automatic speech recognition system that uses Fast-LSTM acoustic models. It uses the TDNN architecture in the initial layers to learn short temporal features of the input, followed by several LSTM layers on top. The experiments use the CHiME3 dataset, which focuses on distant-talking, multi-channel audio. As the front end, a GEV beamformer supported by a BLSTM network is used to improve the quality of the speech utterances. In the experimental results, the Fast-LSTM model trains faster than the standard LSTM or DNN. However, the DNN obtains a better error rate than the LSTM or Fast-LSTM, achieving a word error rate of 4.87%. Some limitations of the training process are discussed in this thesis. The second system implements the wake-up-word task, a sub-task of speech recognition. The trained Fast-LSTM model serves as the acoustic model in a two-step classification that uses confidence measures for each phoneme generated from the keyword to detect the keyword. The system detects keywords well, producing a 10% error rate. | en_US
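DC.subject | 自動語音辨識 (automatic speech recognition) | zh_TW
DC.subject | 延時類神經網路 (time delay neural network) | zh_TW
DC.subject | 長短期記憶 (long short-term memory) | zh_TW
DC.subject | 喚醒關鍵字 (wake-up-word) | zh_TW
DC.subject | 波束賦形 (beamforming) | zh_TW
DC.subject | automatic speech recognition | en_US
DC.subject | time delay neural network | en_US
DC.subject | long short-term memory | en_US
DC.subject | wake-up-word | en_US
DC.subject | beamforming | en_US
DC.title | 快速-長短期記憶聲學模型於遠距語音辨識及喚醒關鍵字任務 (Fast-LSTM Acoustic Model for Distant Speech Recognition and Wake-up-word Task) | zh_TW
dc.language.iso | zh-TW | zh-TW
DC.title | Fast-LSTM Acoustic Model for Distant Speech Recognition and Wake-up-word Task | en_US
DC.type | 博碩士論文 (master's and doctoral thesis) | zh_TW
DC.type | thesis | en_US
DC.publisher | National Central University | en_US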
