Master's/Doctoral Thesis 110523039 — Complete Metadata Record

DC field: value [language]
dc.contributor: 通訊工程學系 (Department of Communication Engineering) [zh_TW]
dc.creator: 王品灃 [zh_TW]
dc.creator: Pin-Feng Wang [en_US]
dc.date.accessioned: 2023-07-19T07:39:07Z
dc.date.available: 2023-07-19T07:39:07Z
dc.date.issued: 2023
dc.identifier.uri: http://ir.lib.ncu.edu.tw:88/thesis/view_etd.asp?URN=110523039
dc.contributor.department: 通訊工程學系 (Department of Communication Engineering) [zh_TW]
dc.description: 國立中央大學 (National Central University) [zh_TW]
dc.description: National Central University [en_US]
dc.description.abstract: In single object tracking, trackers built on hierarchical Vision Transformer (ViT) architectures usually perform worse than those using a plain ViT, and the architectures in the literature all differ from one another, with no general-purpose network architecture. This thesis proposes a general hierarchical network architecture (HyperXTrack), the first to adopt a backbone network architecture as the interaction network for the tracking task, while also incorporating spatio-temporal context: the spatial context is multi-scale information, and the temporal context provides historical information. HyperXTrack performs global and local spatial interaction, with interaction complexity linear in the image resolution. Each HyperXTrack block first matches fine texture features and then performs interactive matching over the appearance contour of the whole object. The interaction backbone adopts the attention mechanism proposed in this thesis, together with the classic stacking rule of applying convolution before attention. Finally, this thesis proposes a lightweight re-pretraining strategy: starting from pre-trained MaxViT weights, the network with modified interaction operations is re-trained for only one epoch, after which its parameters can be transferred to downstream tasks. Experimental results show that the proposed HyperXTrack surpasses OSTrack on the GOT-10k dataset (AO of 75% versus 71%), and that its hierarchical architecture needs only 30M parameters to surpass OSTrack's 93M-parameter ViT architecture. [zh_TW]
dc.description.abstract: In single object tracking, trackers built on hierarchical Vision Transformer (ViT) architectures usually perform worse than those using a plain ViT. At the same time, the network architectures of state-of-the-art trackers are all distinct, so there is no general-purpose network architecture. This thesis presents HyperXTrack, the first architecture to apply a backbone network as the interaction network in visual tracking. In addition, the proposed backbone interacts with spatio-temporal context, where the spatial context is multi-scale information and the temporal context provides historical information. HyperXTrack performs global and local spatial interaction, and its computational complexity is linear in the image resolution. After correlating local texture features, it interacts over the contour of the entire object. The interaction backbone adopts the proposed attention mechanism and the classic stacking rule in which convolutions are applied before the attention mechanism. Finally, this thesis proposes a lightweight re-pretraining strategy: after modifying the interaction operations of the existing MaxViT network, the pre-trained MaxViT weights are reused and re-pretrained for only one epoch, after which the network can be transferred to downstream tasks. Experimental results show that HyperXTrack achieves an AO of 71.8% on the GOT-10k dataset, surpassing OSTrack's 71%, and that its hierarchical architecture needs only 30M parameters to surpass the 93M-parameter OSTrack ViT architecture. [en_US]
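The abstract above attributes the linear-complexity global and local spatial interaction to attention computed within local windows and within a strided global grid, in the MaxViT style that HyperXTrack builds on. A minimal NumPy sketch of that partitioning idea follows; the function names and the single-head, q = k = v simplification are illustrative assumptions, not the thesis implementation:

```python
import numpy as np

def block_partition(x, p):
    # (H, W, C) -> (num_windows, p*p, C): non-overlapping p x p local windows,
    # used for fine local-texture interaction.
    H, W, C = x.shape
    x = x.reshape(H // p, p, W // p, p, C)
    return x.transpose(0, 2, 1, 3, 4).reshape(-1, p * p, C)

def grid_partition(x, g):
    # (H, W, C) -> (num_groups, g*g, C): each group holds a g x g grid of
    # tokens strided across the whole image, giving sparse global interaction.
    H, W, C = x.shape
    x = x.reshape(g, H // g, g, W // g, C)
    return x.transpose(1, 3, 0, 2, 4).reshape(-1, g * g, C)

def softmax(a, axis=-1):
    a = a - a.max(axis=axis, keepdims=True)
    e = np.exp(a)
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(tokens):
    # tokens: (B, N, C); plain single-head attention per group (q = k = v here
    # for brevity). Each group has a fixed N = p*p tokens, so total cost grows
    # only with the number of groups, i.e. linearly in image resolution.
    q = k = v = tokens
    attn = softmax(q @ k.transpose(0, 2, 1) / np.sqrt(tokens.shape[-1]))
    return attn @ v

x = np.random.rand(8, 8, 4)                     # toy 8x8 feature map, 4 channels
local = self_attention(block_partition(x, 4))   # local (window) interaction
global_ = self_attention(grid_partition(x, 4))  # sparse global (grid) interaction
print(local.shape, global_.shape)               # (4, 16, 4) (4, 16, 4)
```

Because attention is quadratic only within each fixed-size group of 16 tokens, doubling the image area doubles the number of groups rather than quadrupling the attention cost, which is the linearity claim in the abstract.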
dc.subject: 單目標追蹤 (single object tracking) [zh_TW]
dc.subject: 階層式 (hierarchical) [zh_TW]
dc.subject: 重新預訓練 (re-pretraining) [zh_TW]
dc.subject: 視覺轉換器 (Vision Transformer) [zh_TW]
dc.subject: 模板更新策略 (template update strategy) [zh_TW]
dc.subject: single object tracking [en_US]
dc.subject: hierarchical [en_US]
dc.subject: re-pretraining [en_US]
dc.subject: vision Transformer [en_US]
dc.subject: template update strategy [en_US]
dc.title: 視覺追蹤的多尺度視覺基礎網路 (Multi-Scale Vision Foundation Networks for Visual Tracking) [zh_TW]
dc.language.iso: zh-TW [zh-TW]
dc.title: Multi-Scale Vision Foundation Networks for Visual Tracking [en_US]
dc.type: 博碩士論文 (thesis) [zh_TW]
dc.type: thesis [en_US]
dc.publisher: National Central University [en_US]
