Master's/Doctoral Thesis 110453013: Full Metadata Record

DC Field | Value | Language
DC.contributor | Department of Information Management, In-service Master Program | zh_TW
DC.creator | 魏湘凌 | zh_TW
DC.creator | Hsiang-Ling Wei | en_US
dc.date.accessioned | 2023-07-25T07:39:07Z
dc.date.available | 2023-07-25T07:39:07Z
dc.date.issued | 2023
dc.identifier.uri | http://ir.lib.ncu.edu.tw:444/thesis/view_etd.asp?URN=110453013
dc.contributor.department | Department of Information Management, In-service Master Program | zh_TW
DC.description | 國立中央大學 | zh_TW
DC.description | National Central University | en_US
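dc.description.abstract | In recent years, deep learning has become a popular research direction in artificial intelligence. It uses multi-layer neural networks to learn features and patterns from large amounts of data and to produce highly accurate predictions and classifications. It has been applied successfully to speech recognition, image recognition, and natural language processing, and is an important driver of current AI development. This thesis applies deep learning to lip reading: deep learning training techniques are used to analyze the shape and motion of the lips during speech in order to recognize what is said. Using the MIRACL-VC1 Dataset as samples, a convolutional neural network (CNN) extracts lip features, which are then used to train a Long Short-Term Memory (LSTM) model and a Bidirectional Long Short-Term Memory (BiLSTM) network, and their phrase accuracy is compared. With appropriate data preprocessing, such as time series normalization, and parameter tuning, the ResNet152 model performed better in all experiments, and combining ResNet152 with BiLSTM gave the highest accuracy. | zh_TW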
dc.description.abstract | In recent years, deep learning has emerged as a popular research direction in the field of artificial intelligence. Deep learning uses multi-layer neural networks to learn features and patterns from large amounts of data and to generate highly accurate predictions and classifications. It has been applied successfully in domains including speech recognition, image recognition, and natural language processing, and has become a significant driving force in the advancement of artificial intelligence. This thesis applies deep learning to lip reading, using deep learning training techniques to analyze the shape and motion of the lips during speech in order to recognize what is said. The MIRACL-VC1 Dataset is used as the sample dataset. A Convolutional Neural Network (CNN) extracts lip features, which are then used to train Long Short-Term Memory (LSTM) and Bidirectional Long Short-Term Memory (BiLSTM) models, and the phrase accuracy of these models is compared. With appropriate data preprocessing, such as time series normalization, and parameter adjustment, the ResNet152 model performs better in all experiments, and the highest accuracy is achieved when ResNet152 is combined with BiLSTM. | en_US
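DC.subject | Deep learning | zh_TW
DC.subject | Lip reading recognition | zh_TW
DC.title | A Study of Deep Learning Lip Reading Recognition | zh_TW
dc.language.iso | zh-TW | zh-TW
DC.title | A Study on Lip Reading Recognition using Deep Learning | en_US
DC.type | Master's/doctoral thesis | zh_TW
DC.type | thesis | en_US
DC.publisher | National Central University | en_US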
