NCU Institutional Repository — theses, past exam papers, journal articles, and research projects: Item 987654321/89805


    Please use this identifier to cite or link to this item: http://ir.lib.ncu.edu.tw/handle/987654321/89805


    Title: Two-Stage Self-Attention on High-Level Features for Action Recognition; Lightweight Informative Feature with Residual Decoupling Self-Attention for Action Recognition
    Authors: 鄭媛;Cheng, Yuan
    Contributors: Department of Computer Science and Information Engineering
    Keywords: action recognition; self-attention; lightweight
    Date: 2022-07-26
    Issue Date: 2022-10-04 12:00:32 (UTC+8)
    Publisher: National Central University
    Abstract: Action recognition is a fundamental area of computer vision research. Because it supports a very wide range of downstream applications, it remains a technique that calls for continual improvement. As deep learning has advanced, many image-recognition methods have been updated and refined, and these techniques can be applied to action recognition to increase its accuracy and robustness. This thesis therefore applies parts of several recently proposed methods to an existing base model and modifies its architecture to optimize that model.
    This thesis uses the SlowFast network proposed by Facebook AI Research as the base model. Drawing on the high-level feature processing concept of the Actor-Context-Actor Relation Network (ACAR-Net), it proposes the Informative Feature with Residual Self-Attention module (IFRSA). Some of its convolution layers are then replaced with the separable convolution introduced in MobileNet, yielding a lightweight variant, Lightweight IFRSA (LIFRSA); the self-attention in IFRSA is further replaced with a two-stage (decoupling) self-attention, giving the Lightweight Informative Feature with Residual Decoupling Self-Attention (LIFRDeSA).
    Experimental results show that the proposed method not only improves the accuracy of the base model but also accounts for the computational resources it requires, yielding an architecture that is both lightweight and more accurate.
    ;Action recognition aims to detect and classify the actions of one or more people in a video. Because it connects to many different fields and supports numerous applications, the accuracy of this basic task matters for the research built on it. In this thesis we therefore focus on improving the accuracy of previous work while also reducing its computational cost.
    Our base model is the SlowFast network, formerly a state-of-the-art model. Drawing on the high-level feature extraction concept of the Actor-Context-Actor Relation Network (ACAR-Net), we propose the Informative Feature with Residual Self-Attention module (IFRSA). Since its computational cost is high, we first replace some of its convolutions with the separable convolution introduced in MobileNet, and then replace the self-attention layer with decoupling self-attention, yielding the Lightweight Informative Feature with Residual Decoupling Self-Attention (LIFRDeSA).
    Experiments on the AVA dataset show that the LIFRDeSA module improves the baseline's accuracy while keeping the additional computation lightweight.
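    The cost saving from separable convolution can be illustrated by comparing parameter counts (a generic sketch; the channel sizes below are illustrative, not the thesis's actual configuration):

    ```python
    def conv_params(c_in, c_out, k):
        """Parameters of a standard k x k convolution: one k*k kernel
        per (input channel, output channel) pair."""
        return c_in * c_out * k * k

    def separable_conv_params(c_in, c_out, k):
        """Parameters of a depthwise-separable convolution: a depthwise
        k x k kernel per input channel, then a 1x1 pointwise mix."""
        return c_in * k * k + c_in * c_out

    # Illustrative sizes (not from the thesis):
    std = conv_params(256, 256, 3)            # 589824
    sep = separable_conv_params(256, 256, 3)  # 67840, roughly 8.7x fewer
    ```

    The same output shape is produced either way; the depthwise stage handles spatial filtering and the pointwise stage handles channel mixing, which is where the bulk of the saving comes from.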
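    The residual self-attention idea underlying IFRSA can be sketched generically (a minimal NumPy illustration of scaled dot-product attention with a residual connection; the weight matrices, shapes, and function names are illustrative, not the thesis's actual module):

    ```python
    import numpy as np

    def softmax(x, axis=-1):
        e = np.exp(x - x.max(axis=axis, keepdims=True))
        return e / e.sum(axis=axis, keepdims=True)

    def self_attention(x, w_q, w_k, w_v):
        """Scaled dot-product self-attention over a sequence of
        feature vectors x with shape (n, d)."""
        q, k, v = x @ w_q, x @ w_k, x @ w_v
        scores = softmax(q @ k.T / np.sqrt(k.shape[-1]))
        return scores @ v

    def residual_self_attention(x, w_q, w_k, w_v):
        """Residual form: the attention output is added back to its
        input, so the module refines features rather than replacing them."""
        return x + self_attention(x, w_q, w_k, w_v)

    rng = np.random.default_rng(0)
    n, d = 8, 16
    x = rng.standard_normal((n, d))
    w_q, w_k, w_v = (rng.standard_normal((d, d)) * 0.1 for _ in range(3))
    y = residual_self_attention(x, w_q, w_k, w_v)
    ```

    The decoupling (two-stage) variant in the thesis splits this single attention computation into two cheaper stages; its exact formulation is not given in the abstract, so it is not sketched here.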
    Appears in Collections:[Graduate Institute of Computer Science and Information Engineering] Electronic Thesis & Dissertation

    Files in This Item:

    File: index.html (0 Kb, HTML)


    All items in NCUIR are protected by copyright, with all rights reserved.

