Master's/Doctoral Thesis 103582603 Complete Metadata Record

DC Field  Value  Language
dc.contributor  Department of Computer Science and Information Engineering  zh_TW
dc.creator  珊芝莉  zh_TW
dc.creator  S P Kasthuri Arachchi  en_US
dc.date.accessioned  2020-07-03T07:39:07Z
dc.date.available  2020-07-03T07:39:07Z
dc.date.issued  2020
dc.identifier.uri  http://ir.lib.ncu.edu.tw:88/thesis/view_etd.asp?URN=103582603
dc.contributor.department  Department of Computer Science and Information Engineering  zh_TW
dc.description  National Central University  zh_TW
dc.description  National Central University  en_US
dc.description.abstract  Video classification is an important process in computer vision for analyzing the semantic information of video content. This thesis improves on common deep learning classification models and proposes multimodal deep learning methods suited to classifying the dynamic patterns of videos. Under harsh conditions such as varying illumination, the handcrafted features of traditional methods are insufficient and inefficient, especially for videos with complex content. Whereas previous video classification studies focused mainly on the relationships among individual video streams, this thesis adopts deep learning as its strategy and successfully improves video classification accuracy. Most deep learning models use convolutional neural networks and long short-term memory networks as base models; these can classify objects and actions and perform well in temporally dynamic video classification tasks. First, the single-stream networks and underlying experimental networks of this thesis comprise convolutional neural networks (CNN), long short-term memory networks (LSTM), and gated recurrent units (GRU). In the LSTM and GRU models, the parameters and dropout values of each layer are tuned through optimization. Three models are compared in this study: (1) LRCN, which combines convolutional layers with long-range temporal recursion; (2) seqLSTMs, among the most effective models for modeling sequential data; and (3) seqGRUs, which is computationally cheaper than the LSTM. Second, to account for spatial-motion relationships, this thesis proposes a novel model that takes a two-stream input of RGB images and optical-flow images, named the state-exchanging long short-term memory (SE-LSTM), which is a contribution of this thesis. With the SE-LSTM, dynamic videos can be classified using short-term motion, spatial, and long-term temporal information; it extends the LSTM by exchanging the previous cell-state information between the appearance stream and the motion stream. In addition, this thesis proposes Dual-CNNSELSTM, a two-stream model that combines the SE-LSTM with a CNN. To validate the SE-LSTM architecture, this thesis evaluates it on various videos such as fireworks, hand gestures, and human actions. Experimental results show that the proposed two-stream Dual-CNNSELSTM model significantly outperforms other single-stream and two-stream baseline models, reaching accuracies of 81.62%, 79.87%, and 69.86% on the hand gesture, fireworks, and HMDB51 human action datasets, respectively. The overall results therefore demonstrate that the proposed model is well suited to classifying dynamic patterns against static backgrounds, outperforming the Dual-3DCNNLSTM and other models.  zh_TW
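To make the single-stream baseline concrete, below is a minimal PyTorch sketch of an LRCN-style model as described in the abstracts: a small CNN encodes each frame, and an LSTM models the frame sequence. The backbone, feature sizes, dropout value, and classification head are illustrative assumptions, not the thesis's exact configuration.

```python
import torch
import torch.nn as nn

class LRCNSketch(nn.Module):
    """Per-frame CNN encoder followed by an LSTM over time (LRCN-style)."""
    def __init__(self, num_classes: int, feat_dim: int = 256, hidden_dim: int = 128):
        super().__init__()
        # Tiny stand-in CNN backbone; the thesis backbone is not specified here.
        self.cnn = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, kernel_size=3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(32, feat_dim),
        )
        self.lstm = nn.LSTM(feat_dim, hidden_dim, batch_first=True)
        self.drop = nn.Dropout(0.5)  # dropout was tuned in the study; 0.5 is a placeholder
        self.head = nn.Linear(hidden_dim, num_classes)

    def forward(self, clips: torch.Tensor) -> torch.Tensor:
        # clips: (batch, time, 3, H, W) RGB frames
        b, t = clips.shape[:2]
        feats = self.cnn(clips.flatten(0, 1)).view(b, t, -1)  # encode every frame
        seq_out, _ = self.lstm(feats)                          # long-range temporal recursion
        return self.head(self.drop(seq_out[:, -1]))           # classify from the last step
```

The seqLSTM and seqGRU baselines mentioned in the abstracts would, under the same reading, replace the recurrent layer above with stacked LSTM or GRU layers over precomputed frame features.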
dc.description.abstract  Video classification is an essential process for analyzing the pervasive semantic information of video content in computer vision. This thesis presents multimodal deep learning approaches to classifying the dynamic patterns of videos, going beyond common types of pattern classification. Traditional handcrafted features are insufficient for classifying complex video information because visual content can look similar under different illumination conditions. Prior studies of video classification focused on the relationships within standalone streams themselves; in contrast, this study leverages deep learning methodologies to improve video analysis performance significantly. Convolutional Neural Networks (CNNs) and Long Short-Term Memory (LSTM) networks are widely used to build complex models and have shown great competency in modeling temporal dynamics for video-based pattern classification. First, single-stream networks and underlying experimental models consisting of CNN, LSTM, and Gated Recurrent Unit (GRU) components are considered. Their layer parameters are fine-tuned, and different dropout values are used with the sequence LSTM and GRU models. The accuracies of three basic models are compared: (1) a Long-term Recurrent Convolutional Network (LRCN), which combines convolutional layers with long-range temporal recursion; (2) the seqLSTMs model, one of the most effective structures for modeling sequential data; and (3) the seqGRUs model, which requires fewer computational steps than the LSTM. Secondly, a two-stream network architecture taking both RGB and optical-flow data as input is used to capture spatial-motion relationships. As the main contribution of this work, a novel two-stream neural network concept named the state-exchanging long short-term memory (SE-LSTM) is introduced. By exchanging spatial-motion states, the SE-LSTM can classify the dynamic patterns of videos while integrating short-term motion, spatial, and long-term temporal information. The SE-LSTM extends the general-purpose LSTM by exchanging information through the previous cell states of both the appearance and motion streams. Further, a novel two-stream model, Dual-CNNSELSTM, which combines the SE-LSTM concept with a CNN, is proposed. Various video datasets, covering firework displays, hand gestures, and human actions, are used to validate the proposed SE-LSTM architecture. Experimental results demonstrate that the proposed two-stream Dual-CNNSELSTM architecture significantly outperforms other single- and two-stream baseline models, achieving accuracies of 81.62%, 79.87%, and 69.86% on the hand gesture, firework display, and HMDB51 human action datasets, respectively. The overall results therefore signify that the proposed model is best suited to dynamic pattern classification with static backgrounds, outperforming the baseline and Dual-3DCNNLSTM models.  en_US
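The state-exchange mechanism itself can be illustrated with a short sketch. Below is a minimal PyTorch approximation of the SE-LSTM idea under one plausible reading of the abstract: two LSTM cells (an appearance stream and a motion stream) run in lockstep, and at each time step each stream's recurrence receives the other stream's previous cell state. The exact exchange rule, gating, and dimensions used in the thesis are not given in this record, so this is an assumption-laden sketch, not the thesis implementation.

```python
import torch
import torch.nn as nn

class SELSTMSketch(nn.Module):
    """Two coupled LSTM cells that swap previous cell states each step."""
    def __init__(self, feat_dim: int, hidden_dim: int):
        super().__init__()
        self.app_cell = nn.LSTMCell(feat_dim, hidden_dim)  # RGB (appearance) stream
        self.mot_cell = nn.LSTMCell(feat_dim, hidden_dim)  # optical-flow (motion) stream
        self.hidden_dim = hidden_dim

    def forward(self, app_seq: torch.Tensor, mot_seq: torch.Tensor) -> torch.Tensor:
        # app_seq, mot_seq: (batch, time, feat_dim) per-frame CNN features
        b = app_seq.size(0)
        h_a = app_seq.new_zeros(b, self.hidden_dim)
        c_a = app_seq.new_zeros(b, self.hidden_dim)
        h_m = torch.zeros_like(h_a)
        c_m = torch.zeros_like(c_a)
        for t in range(app_seq.size(1)):
            # State exchange (assumed form): each stream is fed the *other*
            # stream's previous cell state instead of its own.
            h_a_new, c_a_new = self.app_cell(app_seq[:, t], (h_a, c_m))
            h_m_new, c_m_new = self.mot_cell(mot_seq[:, t], (h_m, c_a))
            h_a, c_a, h_m, c_m = h_a_new, c_a_new, h_m_new, c_m_new
        # Fuse the final hidden states of both streams for classification.
        return torch.cat([h_a, h_m], dim=1)
```

A classifier head over the concatenated states (for instance, nn.Linear(2 * hidden_dim, num_classes), a hypothetical choice) would complete a Dual-CNNSELSTM-style pipeline in which per-frame CNN features feed both streams.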
dc.subject  Dynamic Pattern Classification  zh_TW
dc.subject  Deep Learning  zh_TW
dc.subject  Spatiotemporal Data  zh_TW
dc.subject  Convolutional Neural Network  zh_TW
dc.subject  Recurrent Neural Network  zh_TW
dc.subject  Dynamic Pattern Classification  en_US
dc.subject  Deep Learning  en_US
dc.subject  Spatiotemporal Data  en_US
dc.subject  Convolutional Neural Network  en_US
dc.subject  Recurrent Neural Network  en_US
dc.title  Modelling Spatial-Motion Multimodal Deep Learning Approaches to Classify Dynamic Patterns of Videos  zh_TW
dc.language.iso  zh-TW  zh-TW
dc.title  Modelling Spatial-Motion Multimodal Deep Learning Approaches to Classify Dynamic Patterns of Videos  en_US
dc.type  Master's/doctoral thesis  zh_TW
dc.type  thesis  en_US
dc.publisher  National Central University  en_US
