dc.description.abstract | Video classification is an essential process for analyzing the rich semantic information of video content in computer vision. This thesis presents multimodal deep learning approaches to classify the dynamic patterns of videos, beyond common types of pattern classification. Traditional handcrafted features are insufficient for classifying complex video information due to the similarity of visual content under different illumination conditions. Prior studies of video classification focused on the standalone streams themselves. In contrast, this study leverages deep learning methodologies to significantly improve video analysis performance. Convolutional Neural Networks (CNNs) and Long Short-Term Memory (LSTM) networks are widely used to build complex models and have shown great competency in modeling temporal dynamics in video-based pattern classification.
First, single-stream networks are considered, with the underlying experimental models consisting of CNN, LSTM, and Gated Recurrent Unit (GRU) components. Their layer parameters are fine-tuned, and different dropout values are used with the sequential LSTM (seqLSTM) and GRU (seqGRU) models. In this study, the accuracies of three basic models are compared: (1) a Long-term Recurrent Convolutional Network (LRCN), which combines convolutional layers with long-range temporal recursion; (2) a seqLSTM model, one of the most effective structures for modeling sequential data; and (3) a seqGRU model, which requires fewer computational steps than the LSTM.
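To make the single-stream design concrete, the following is a minimal sketch of an LRCN-style model in PyTorch; the backbone depth, layer sizes, and dropout placement are illustrative assumptions, not the thesis implementation. Swapping nn.LSTM for nn.GRU yields the seqGRU variant.

```python
import torch
import torch.nn as nn

class LRCN(nn.Module):
    """LRCN-style single-stream model: a per-frame CNN encoder feeds an
    LSTM that models long-range temporal dynamics across the clip."""
    def __init__(self, num_classes, feat_dim=256, hidden_dim=128, dropout=0.5):
        super().__init__()
        # Small illustrative CNN encoder (a deeper backbone would be typical)
        self.cnn = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(32, 64, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        self.proj = nn.Linear(64, feat_dim)
        self.lstm = nn.LSTM(feat_dim, hidden_dim, batch_first=True)
        self.drop = nn.Dropout(dropout)  # the tunable dropout value noted above
        self.fc = nn.Linear(hidden_dim, num_classes)

    def forward(self, clips):
        # clips: (batch, time, channels, height, width)
        b, t, c, h, w = clips.shape
        feats = self.cnn(clips.reshape(b * t, c, h, w)).flatten(1)  # (b*t, 64)
        feats = self.proj(feats).reshape(b, t, -1)                  # (b, t, feat_dim)
        out, _ = self.lstm(feats)                                   # (b, t, hidden_dim)
        return self.fc(self.drop(out[:, -1]))                       # classify from last step

clip = torch.randn(2, 16, 3, 64, 64)    # 2 clips of 16 RGB frames each
logits = LRCN(num_classes=51)(clip)     # (2, 51) class scores
```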
Secondly, an approach based on two-stream network architectures, taking both RGB and optical flow data as input, is used to capture spatial-motion relationships. As the main contribution of this work, a novel two-stream neural network concept, named State-Exchanging Long Short-Term Memory (SE-LSTM), is introduced. By exchanging states between the spatial and motion streams, the SE-LSTM can classify the dynamic patterns of videos, integrating short-term motion, spatial, and long-term temporal information. The SE-LSTM extends the general-purpose LSTM by exchanging information through the previous cell states of both the appearance and motion streams. Further, a novel two-stream model, Dual-CNNSELSTM, which combines the SE-LSTM concept with CNNs, is proposed. Various video datasets, covering firework displays, hand gestures, and human actions, are used to validate the proposed SE-LSTM architecture. Experimental results demonstrate that the proposed two-stream Dual-CNNSELSTM architecture significantly outperforms single-stream and two-stream baseline models, achieving accuracies of 81.62%, 79.87%, and 69.86% on the hand gesture, firework display, and HMDB51 human action datasets, respectively. Overall, the results signify that the proposed model is better suited to dynamic pattern classification with static backgrounds than the baseline and Dual-3DCNNLSTM models. | en_US |
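The exact state-exchange formulation is defined in the thesis body; the sketch below is one plausible PyTorch rendering of the SE-LSTM idea, assuming a simple exchange rule in which each stream's input at every step is concatenated with the other stream's previous cell state. The cell wiring and dimensions are illustrative assumptions.

```python
import torch
import torch.nn as nn

class SELSTM(nn.Module):
    """Sketch of a state-exchanging LSTM: appearance and motion LSTM cells
    run in parallel, and each step mixes the other stream's previous cell
    state into its own input (an assumed exchange rule, not the thesis's)."""
    def __init__(self, feat_dim, hidden_dim):
        super().__init__()
        self.app_cell = nn.LSTMCell(feat_dim + hidden_dim, hidden_dim)
        self.mot_cell = nn.LSTMCell(feat_dim + hidden_dim, hidden_dim)
        self.hidden_dim = hidden_dim

    def forward(self, app_seq, mot_seq):
        # app_seq, mot_seq: per-frame CNN features, (batch, time, feat_dim)
        b, t, _ = app_seq.shape
        ha = ca = hm = cm = app_seq.new_zeros(b, self.hidden_dim)
        for i in range(t):
            ca_prev, cm_prev = ca, cm
            # Appearance stream conditioned on the motion stream's previous cell state
            ha, ca = self.app_cell(torch.cat([app_seq[:, i], cm_prev], dim=1), (ha, ca))
            # Motion stream conditioned on the appearance stream's previous cell state
            hm, cm = self.mot_cell(torch.cat([mot_seq[:, i], ca_prev], dim=1), (hm, cm))
        return torch.cat([ha, hm], dim=1)   # fused feature for the final classifier

app = torch.randn(2, 16, 256)        # appearance (RGB) stream features
mot = torch.randn(2, 16, 256)        # motion (optical flow) stream features
fused = SELSTM(256, 128)(app, mot)   # (2, 256) fused representation
```

In a Dual-CNNSELSTM-style arrangement, two CNN encoders (one per stream) would produce app_seq and mot_seq from the RGB frames and optical flow fields before this fusion step.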