Please use this permanent URL to cite or link to this item: http://ir.lib.ncu.edu.tw/handle/987654321/83913


Title: 以多模態時空域建模的深度學習方法分類影像中的動態模式; Modelling Spatial-Motion Multimodal Deep Learning Approaches to Classify Dynamic Patterns of Videos
Authors: 珊芝莉; Arachchi, S P Kasthuri
Contributors: Department of Computer Science and Information Engineering
Keywords: Dynamic Pattern Classification; Deep Learning; Spatiotemporal Data; Convolutional Neural Network; Recurrent Neural Network
Date: 2020-07-03
Upload time: 2020-09-02 17:40:24 (UTC+8)
Publisher: National Central University
Abstract: Video classification is an essential process for analyzing the pervasive semantic information of video content in computer vision. This thesis presents multimodal deep learning approaches to classify the dynamic patterns of videos, going beyond common types of pattern classification. Traditional handcrafted features are insufficient for classifying complex video content, particularly when visually similar content appears under different illumination conditions. Prior studies of video classification focused mainly on the relationships between standalone video streams; in contrast, this study leverages deep learning methodologies to significantly improve video analysis performance. Convolutional Neural Networks (CNNs) and Long Short-Term Memory (LSTM) networks are widely used as base models and have shown great competency in modeling temporal dynamics for video-based pattern classification.
First, single-stream networks are considered as the underlying experimental models, built from a CNN, an LSTM, and a Gated Recurrent Unit (GRU). Their layer parameters are fine-tuned, and different dropout values are used with the sequential LSTM and GRU models. The accuracies of three basic models are compared: (1) a Long-term Recurrent Convolutional Network (LRCN), which combines convolutional layers with long-range temporal recursion; (2) a seqLSTM model, one of the most effective structures for modeling sequential data; and (3) a seqGRU model, which requires fewer computational steps than LSTM.
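The single-stream baseline can be pictured as a per-frame CNN feeding an LSTM over time. The following is a minimal, illustrative PyTorch sketch of an LRCN-style model; the layer sizes, dropout value, clip length, and frame resolution are assumptions for illustration, not the configuration reported in the thesis.

```python
# Minimal single-stream LRCN-style sketch (hypothetical hyperparameters).
import torch
import torch.nn as nn

class LRCNBaseline(nn.Module):
    def __init__(self, num_classes, hidden_size=256):
        super().__init__()
        # Per-frame CNN feature extractor (small illustrative ConvNet).
        self.cnn = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(32, 64, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        # Long-range temporal recursion over the per-frame features.
        self.lstm = nn.LSTM(input_size=64, hidden_size=hidden_size, batch_first=True)
        self.dropout = nn.Dropout(0.5)            # dropout value would be tuned
        self.classifier = nn.Linear(hidden_size, num_classes)

    def forward(self, clips):                     # clips: (batch, time, 3, H, W)
        b, t = clips.shape[:2]
        feats = self.cnn(clips.flatten(0, 1))     # (b*t, 64, 1, 1)
        feats = feats.reshape(b, t, -1)           # (b, t, 64)
        out, _ = self.lstm(feats)
        return self.classifier(self.dropout(out[:, -1]))   # classify from last step

# Example: a batch of 2 clips, 16 frames each, 64x64 RGB, 51 classes (as in HMDB51).
logits = LRCNBaseline(num_classes=51)(torch.randn(2, 16, 3, 64, 64))
```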
Second, a two-stream network architecture that takes both RGB and optical flow data as input is used to capture spatial-motion relationships. As the main contribution of this work, a novel two-stream neural network concept named State-Exchanging Long Short-Term Memory (SE-LSTM) is introduced. By modeling spatial-motion state exchange, the SE-LSTM can classify dynamic patterns of videos by integrating short-term motion, spatial, and long-term temporal information. It extends the general-purpose LSTM by exchanging information with the previous cell states of both the appearance and motion streams. Further, a novel two-stream model, Dual-CNNSELSTM, which combines the SE-LSTM concept with a CNN, is proposed. Various video datasets (firework displays, hand gestures, and human actions) are used to validate the proposed SE-LSTM architecture. Experimental results demonstrate that the proposed two-stream Dual-CNNSELSTM architecture significantly outperforms other single-stream and two-stream baseline models, achieving accuracies of 81.62%, 79.87%, and 69.86% on the hand gesture, firework display, and HMDB51 human action datasets, respectively. The overall results therefore indicate that the proposed model is best suited to dynamic pattern classification with static backgrounds, outperforming the baseline and Dual-3DCNNLSTM models.
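The state-exchanging idea can be sketched as two LSTM cells, one per stream, where each cell receives the other stream's previous cell state before its update. The code below is a conceptual PyTorch illustration under that assumption; the exact exchange rule, feature extractors, and fusion used in the thesis's Dual-CNNSELSTM may differ.

```python
# Conceptual sketch of the state-exchanging idea behind SE-LSTM:
# appearance (RGB) and motion (optical-flow) LSTM cells swap their
# previous cell states at each time step. Illustrative only.
import torch
import torch.nn as nn

class SELSTMPair(nn.Module):
    def __init__(self, feat_dim, hidden):
        super().__init__()
        self.app_cell = nn.LSTMCell(feat_dim, hidden)   # appearance stream
        self.mot_cell = nn.LSTMCell(feat_dim, hidden)   # motion stream
        self.hidden = hidden

    def forward(self, app_seq, mot_seq):                # each: (batch, time, feat_dim)
        b, t, _ = app_seq.shape
        h_a = c_a = app_seq.new_zeros(b, self.hidden)
        h_m = c_m = mot_seq.new_zeros(b, self.hidden)
        for step in range(t):
            prev_c_a, prev_c_m = c_a, c_m
            # State exchange: each stream is conditioned on the other's
            # previous cell state instead of (or alongside) its own.
            h_a, c_a = self.app_cell(app_seq[:, step], (h_a, prev_c_m))
            h_m, c_m = self.mot_cell(mot_seq[:, step], (h_m, prev_c_a))
        return h_a, h_m          # stream features, fused downstream by a classifier

# Example: 16-step sequences of 64-d CNN features from both streams.
h_app, h_mot = SELSTMPair(64, 128)(torch.randn(2, 16, 64), torch.randn(2, 16, 64))
```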
Appears in Collections: [Graduate Institute of Computer Science and Information Engineering] Theses and Dissertations

Files in This Item:

File: index.html (HTML, 0Kb, 138 views)


All items in NCUIR are protected by copyright.

