Fast-LSTM Acoustic Model for Distant Speech Recognition and Wake-up-word Task

DC field	Value	Language
DC.contributor	資訊工程學系	zh_TW
DC.creator	特利安	zh_TW
DC.creator	Rezki Trianto	en_US
dc.date.accessioned	2017-08-22T07:39:07Z
dc.date.available	2017-08-22T07:39:07Z
dc.date.issued	2017
dc.identifier.uri	http://ir.lib.ncu.edu.tw:88/thesis/view_etd.asp?URN=104522605
dc.contributor.department	資訊工程學系	zh_TW
DC.description	國立中央大學	zh_TW
DC.description	National Central University	en_US
dc.description.abstract	自動語音辨識系統近年來已廣泛地運用在人類生活的各個角落當中，其快速的發展對人類社會有著極大的影響。儘管語音辨識技術近年突飛猛進，仍然有許多方面尚待突破。因此本文嘗試提出新的方法來改善語音辨識的精準度。本文大致上可分成兩個部分: 第一個部分為本文所提出的新的辨識方法－快速長短期記憶聲學模型 (Fast-LSTM)。這個方法主要將延時類神經網路(TDNN)的優點導入各種不同的長短期記憶模型中，藉以提升模型在語音辨識上的速度。文章中我們藉由長距語音以及多聲道音頻來作為模型檢測的樣本。結果發現，與延時類神經網路與深度神經網路(DNN)比較，本文所提出的模型確實可提升語音辨識的速度，然而於精準度上不論是傳統長短期記憶法與本文所提出的快速長短期記憶法，都不及於深度神經網路來的好。本文後半部分將提及其實驗上的一些限制及待改進的部分。本文的第二個部分為快速長短期記憶聲學模型於關鍵字偵測的運用。實驗結果發現，快速長短期記憶聲學模型在關鍵字的辨識及偵測上可以比過去既有的模型減少10%的錯誤率。	zh_TW
dc.description.abstract	Automatic speech recognition (ASR) has developed rapidly in recent years within machine learning research. ASR is applied in many everyday applications, such as smart assistants and subtitle generation. In this thesis, we propose two systems. The first is an automatic speech recognition system that uses Fast-LSTM acoustic models. The proposed system uses a TDNN architecture in its initial layers to learn short-term temporal features of the input, followed by several LSTM layers above them. The experiments use the CHiME3 dataset, which focuses on distant-talking, multi-channel audio. As the front end, a GEV beamformer that relies on a BLSTM network is used to improve the quality of the speech utterances. In the experimental results, the Fast-LSTM model trains faster than the standard LSTM or DNN. However, the DNN obtains a better error rate than the LSTM or Fast-LSTM, achieving a word error rate of 4.87%. Some limitations of the training process are discussed in this thesis. The second system implements the wake-up-word task, a sub-task of speech recognition. The trained Fast-LSTM model serves as the acoustic model in a two-step classification scheme, and confidence measures for each phoneme generated from the keyword are used to detect the keyword. The resulting system detects keywords well, producing a 10% error rate.	en_US
DC.subject	自動語音辨識	zh_TW
DC.subject	延時類神經網路	zh_TW
DC.subject	長短期記憶	zh_TW
DC.subject	喚醒關鍵字	zh_TW
DC.subject	波束賦形	zh_TW
DC.subject	automatic speech recognition	en_US
DC.subject	time delay neural network	en_US
DC.subject	long short-term memory	en_US
DC.subject	wake-up-word	en_US
DC.subject	beamforming	en_US
DC.title	快速-長短期記憶聲學模型於遠距語音辨識及喚醒關鍵字任務	zh_TW
dc.language.iso	zh-TW	zh-TW
DC.title	Fast-LSTM Acoustic Model for Distant Speech Recognition and Wake-up-word Task	en_US
DC.type	博碩士論文	zh_TW
DC.type	thesis	en_US
DC.publisher	National Central University	en_US

Full metadata record for thesis 104522605