NCU Institutional Repository: Item 987654321/95398


    Please use this permanent URL to cite or link to this item: http://ir.lib.ncu.edu.tw/handle/987654321/95398


    Title: Advancing Single Object Tracking based on Fusion of Attention and Memory Dynamics
    Authors: Cheewaprakobkit, Pimpa
    Contributors: Department of Computer Science and Information Engineering
    Keywords: Temporal Convolutional Network; attention mechanism; spatial-temporal memory; single object tracking
    Date: 2024-05-30
    Uploaded: 2024-10-09 16:46:16 (UTC+8)
    Publisher: National Central University
    Abstract: Deep neural networks have revolutionized the field of computer vision, leading to significant advancements in single object tracking tasks. However, these networks still encounter challenges in handling dynamic environments where target objects undergo appearance changes and occlusions. Additionally, maintaining consistent tracking across extended periods, especially when faced with similar-looking background objects, presents a significant challenge. The core difficulty in single object tracking arises from the frequent variations a target's appearance can undergo throughout the video sequence. These variations, such as changes in aspect ratio, scale, and pose, can significantly impact the robustness of trackers. Moreover, occlusions by other objects and cluttered backgrounds further complicate the process of maintaining a consistent track.
    To address these challenges, this dissertation proposes a novel tracking architecture that leverages the combined strengths of a temporal convolutional network (TCN), an attention mechanism, and a spatial-temporal memory network. The TCN component plays a critical role by capturing temporal dependencies within the video sequence. This enables the model to learn how an object's appearance evolves over time, resulting in greater resilience to short-term appearance changes. Incorporating an attention mechanism offers a two-fold benefit. Firstly, it reduces the computational complexity of the model by enabling it to focus on the most relevant regions of the frame based on the current context. This is particularly advantageous in scenarios with cluttered backgrounds or multiple similar objects present. Secondly, the attention mechanism directs the model's focus towards informative features that are critical for tracking the target object. The final component, the spatial-temporal memory network, leverages the power of long-term memory. This network stores historical information about the target object, including its appearance and motion patterns. This stored information serves as a reference point for the tracker, allowing it to better adapt to target deformations and occlusions. By effectively combining these three elements, our proposed architecture aims to achieve superior tracking performance compared to existing methods.
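    To make the interplay of the three components concrete, the following is a minimal PyTorch-style sketch of how per-frame target features could pass through a TCN, attend over a long-term memory, and update that memory. The class name, feature dimension, dilation schedule, and the simple FIFO memory bank are illustrative assumptions for exposition only, not the dissertation's actual implementation.

```python
import torch
import torch.nn as nn

class TrackerSketch(nn.Module):
    """Illustrative sketch only; names, sizes, and the FIFO memory are assumptions."""

    def __init__(self, feat_dim=256, mem_size=20):
        super().__init__()
        # TCN branch: dilated 1D convolutions over per-frame target embeddings,
        # modelling how the target's appearance evolves over time.
        self.tcn = nn.Sequential(
            nn.Conv1d(feat_dim, feat_dim, kernel_size=3, padding=1, dilation=1),
            nn.ReLU(),
            nn.Conv1d(feat_dim, feat_dim, kernel_size=3, padding=2, dilation=2),
            nn.ReLU(),
        )
        # Attention: the current frame attends over the stored memory so the
        # tracker focuses on the cues most relevant to the target.
        self.attn = nn.MultiheadAttention(embed_dim=feat_dim, num_heads=8,
                                          batch_first=True)
        self.mem_size = mem_size
        self.memory = []  # FIFO of past target embeddings, reset per sequence

    def forward(self, frame_feats):
        # frame_feats: (B, T, feat_dim) embeddings of the target region per frame
        x = self.tcn(frame_feats.transpose(1, 2)).transpose(1, 2)  # (B, T, feat_dim)
        query = x[:, -1:, :]  # most recent frame acts as the attention query
        if self.memory:
            mem = torch.stack(self.memory, dim=1)  # (B, M, feat_dim)
            fused, _ = self.attn(query, mem, mem)  # read from long-term memory
        else:
            fused = query
        # Write the current embedding into memory, keeping a bounded history.
        self.memory.append(query.squeeze(1).detach())
        if len(self.memory) > self.mem_size:
            self.memory.pop(0)
        return fused.squeeze(1)  # fused feature for a downstream localization head
```

    In the full architecture described in the abstract, a representation of this kind would feed a localization head that predicts the target box; the sketch only illustrates how temporal convolution, attention, and memory could interact per frame.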
    The effectiveness of our approach is validated through extensive evaluations on several benchmark datasets, including GOT-10K, OTB2015, UAV123, and VOT2018. Our model achieves a state-of-the-art average overlap (AO) of 67.5% on the GOT-10K dataset, a 72.1% success score (AUC) on OTB2015, a 65.8% success score (AUC) on UAV123, and a 59.0% accuracy on the VOT2018 dataset.
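    For context on the reported numbers: average overlap (AO) is the mean intersection-over-union (IoU) between predicted and ground-truth boxes over all frames, and the OTB2015/UAV123 success score (AUC) is the area under the curve of the fraction of frames whose IoU exceeds a sweep of thresholds. The sketch below shows only the basic IoU and AO computation; the official GOT-10K, OTB, UAV123, and VOT toolkits apply additional protocol rules (e.g., tracker restarts in VOT), so this is illustrative rather than the benchmarks' evaluation code.

```python
def iou(box_a, box_b):
    # Boxes as [x, y, w, h]; returns intersection-over-union in [0, 1].
    ax1, ay1, aw, ah = box_a
    bx1, by1, bw, bh = box_b
    inter_w = max(0.0, min(ax1 + aw, bx1 + bw) - max(ax1, bx1))
    inter_h = max(0.0, min(ay1 + ah, by1 + bh) - max(ay1, by1))
    inter = inter_w * inter_h
    union = aw * ah + bw * bh - inter
    return inter / union if union > 0 else 0.0

def average_overlap(pred_boxes, gt_boxes):
    # AO: mean per-frame IoU between predicted and ground-truth boxes.
    overlaps = [iou(p, g) for p, g in zip(pred_boxes, gt_boxes)]
    return sum(overlaps) / len(overlaps)
```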
    The results highlight the superior tracking capabilities of our proposed approach in single object tracking tasks, demonstrating its potential to address the challenges posed by appearance variations and prolonged tracking scenarios. This research contributes to the advancement of tracking systems by offering a robust and adaptive solution that combines attention and memory dynamics to enhance tracking accuracy and robustness in complex real-world scenarios.
    Appears in Collections: [Graduate Institute of Computer Science and Information Engineering] Theses & Dissertations

    Files in this item:

    File          Description    Size    Format    Views
    index.html    -              0Kb     HTML      14


    All items in NCUIR are protected by copyright, with all rights reserved.

