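Abstract (Chinese)
Speech recognition is an area of considerable interest in artificial intelligence, but because of the influence of varying environments, it remains difficult to build a system that recognizes speech as clearly as humans do. This study investigates the role of Mel-Frequency Cepstral Coefficients (MFCCs) and Connectionist Temporal Classification (CTC) in speech recognition systems. It uses the noise-free corpus provided on GitHub to build recurrent neural network models with different processing methods, and selects several variables for discussion and comparison.

Abstract (English)
Speech recognition is an area of artificial intelligence that attracts
considerable attention, but it is limited by varying environmental
conditions. Building a system that recognizes speech as clearly as humans
do remains a difficult problem. This study investigates the role of
Mel-Frequency Cepstral Coefficients (MFCCs) and Connectionist Temporal
Classification (CTC) in speech recognition systems. It uses the
noise-free corpus provided on GitHub to construct recurrent neural
network models in different ways, and selects several variables for
discussion and comparison.

Keywords (Chinese) ★ Speech recognition
Keywords (English)
Thesis Table of Contents
Abstract (Chinese) i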
Abstract ii
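Acknowledgements iii
Table of Contents iv
List of Tables v
List of Figures vi
1. Introduction 1
1-1 Research Motivation 1
1-2 Research Objectives 1
1-3 Research Questions 1
2. Background and Literature Review 2
2-1 Mel-Frequency Cepstral Coefficients (MFCCs) 2
2-1-1 Mel Scale 2
2-1-2 Filtering Process 4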
2-2 Connectionist Temporal Classification (CTC) 7
2-2-1 Overview of How CTC Works 8
2-2-2 Derivation of the Model Training Process 9
2-2-3 Label Error Rate 14
2-3 Recurrent Neural Networks (RNN) 15
2-3-1 Activation Function 15
2-3-2 Recurrent Neural Network Cell 18
2-3-3 Long Short-Term Memory Network Cell 19
2-3-4 Principles of Recurrent Neural Networks 23
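3. Datasets and Experimental Models 24
3-1 Experimental Framework 24
3-2 Datasets 24
3-3 Experimental Variables 26
3-4 Problem Statement and Implementation Workflow 26
4. Results and Discussion 27
4-1 Model Performance in Experiment 1 27
4-1-1 Dataset (8-bit) 27
4-1-2 Dataset (16-bit) 33
4-1-3 Dataset (Reduced Speed) 36
4-2 Model Performance in Experiment 2 36
5. Conclusions and Future Work 41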
