A Study on Lip Reading Recognition using Deep Learning

Name

Hsiang-Ling Wei (魏湘凌)

Department

Department of Information Management, In-service Master Program

Thesis title

A Study on Lip Reading Recognition using Deep Learning
(深度學習唇語辨識之研究)

File

Browse the thesis in the system (open to the public after 2028-7-1)

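Abstract (Chinese)

In recent years, deep learning has become a popular research direction in artificial intelligence. It uses multi-layer neural networks to learn features and patterns from large amounts of data and to produce highly accurate predictions and classifications. It has been applied successfully to speech recognition, image recognition, and natural language processing, and is an important driver of current AI development.
This thesis applies deep learning to lip reading: it analyzes the shape and movement of a speaker's lips in order to recognize what is being said. Using the MIRACL-VC1 Dataset as the sample, convolutional neural networks (CNN) extract lip features, which are then used to train a Long Short-Term Memory (LSTM) model and a Bidirectional Long Short-Term Memory (BiLSTM) network, and the phrase accuracy of the two is compared. With appropriate data preprocessing, such as time series normalization, and parameter tuning, the ResNet152 model shows the better performance in all experiments, and the combination of ResNet152 and BiLSTM achieves the highest accuracy.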

Abstract (English)

In recent years, deep learning has emerged as a popular research direction in artificial intelligence. Deep learning uses multi-layer neural networks to learn features and patterns from large amounts of data and to generate highly accurate predictions and classifications. It has been applied successfully to speech recognition, image recognition, and natural language processing, and has become a significant driving force in the advancement of artificial intelligence.
This thesis applies deep learning to lip reading, analyzing the shape and motion of the lips during speech in order to recognize what is spoken. The MIRACL-VC1 Dataset is used as the sample dataset. Convolutional Neural Networks (CNN) extract lip features, which are then used to train Long Short-Term Memory (LSTM) and Bidirectional Long Short-Term Memory (BiLSTM) models, and the phrase accuracy of these models is compared. With appropriate data preprocessing, such as time series normalization, and parameter adjustment, the experimental results show that the ResNet152 model consistently performs better, and that the highest accuracy is achieved when ResNet152 is combined with BiLSTM.
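The abstract names time series normalization as a preprocessing step but does not define it. One common reading, offered here only as an assumption, is that every clip is brought to the same number of frames before it is passed to the LSTM or BiLSTM. The sketch below illustrates that idea with uniform index sampling; the function name `normalize_sequence_length` and the parameter `target_len` are hypothetical and do not come from the thesis.

```python
from typing import List, Sequence, TypeVar

Frame = TypeVar("Frame")


def normalize_sequence_length(frames: Sequence[Frame], target_len: int) -> List[Frame]:
    """Resample a clip to exactly target_len frames by uniform index sampling.

    Assumption: this is one possible form of "time series normalization";
    the thesis record does not state which method was actually used.
    """
    if target_len <= 0:
        raise ValueError("target_len must be positive")
    if not frames:
        raise ValueError("frames must not be empty")

    n = len(frames)
    if target_len == 1:
        # A single output frame: take the middle of the clip.
        return [frames[n // 2]]

    # Spread target_len indices evenly over [0, n - 1] with integer arithmetic.
    # Clips shorter than target_len repeat frames; longer clips skip frames.
    return [frames[i * (n - 1) // (target_len - 1)] for i in range(target_len)]
```

For example, a 10-frame clip resampled to 5 frames keeps frames 0, 2, 4, 6, and 9, while a 3-frame clip stretched to 5 frames repeats its first two frames. Zero-padding or truncation would be alternative choices; uniform sampling is used here because it always keeps the first and last frame of the utterance.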

Keywords (Chinese)

★ Deep learning
★ Lip reading recognition

Keywords (English)

Table of contents

Abstract (Chinese)
Abstract
Acknowledgments
Table of Contents
List of Figures
List of Tables
Chapter 1 Introduction
1.1 Research Background
1.2 Research Motivation
1.3 Research Objectives
1.4 Research Structure
Chapter 2 Literature Review
2.1 Lip Reading Recognition
2.2 Convolutional Neural Networks (CNN)
2.2.1 Convolutional Layer
2.2.2 Pooling Layer
2.2.3 Fully Connected Layer
2.2.4 Residual Network (ResNet)
2.2.5 VGG Model (Visual Geometry Group Network, VGGNet)
2.2.6 Multi-branch Convolutional Network (InceptionV3)
2.3 Long Short-Term Memory (LSTM)
2.4 Bidirectional Long Short-Term Memory (BiLSTM)
Chapter 3 Research Method
3.1 Research Process
3.2 Data Source
3.3 Image Preprocessing
3.4 Research Design and Architecture
Chapter 4 Experiments
4.1 Parameter Settings
4.2 Prediction Model Results
Chapter 5 Conclusions and Recommendations
5.1 Research Conclusions
5.2 Research Limitations
5.3 Future Directions and Recommendations
References

Advisor

Chih-Fong Tsai (蔡志豐)

Approval date

2023-7-25