

    Please use this permanent URL to cite or link to this item: http://ir.lib.ncu.edu.tw/handle/987654321/89906


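    Title: Visual odometry based on transformer method (基於transformer方法的視覺里程計)
    Author: Lan, Zi-Xiu (藍子修)
    Contributor: Department of Computer Science and Information Engineering
    Keywords: Visual odometry
    Date: 2022-08-08
    Upload time: 2022-10-04 12:04:18 (UTC+8)
    Publisher: National Central University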
    Abstract: Visual odometry (VO) receives relatively little attention compared with object detection, classification, or object tracking. It is nevertheless one of the most important modules in Simultaneous Localization and Mapping (SLAM). Its purpose is to incrementally estimate camera motion between adjacent frames. It is mainly used in autonomous mobile robots, drones, and related fields.
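As an illustration of what "incrementally estimating camera motion between adjacent frames" means, the following minimal sketch chains frame-to-frame motions into a global trajectory. It assumes each relative motion is already available as a 4x4 homogeneous transform; the helper names `relative_pose` and `accumulate_trajectory` are hypothetical and are not taken from the thesis.

```python
import numpy as np

def relative_pose(yaw_rad, translation):
    """Build a 4x4 homogeneous transform from a yaw angle and a 3D translation.

    Hypothetical helper: stands in for the motion a VO system would estimate
    between two adjacent frames.
    """
    c, s = np.cos(yaw_rad), np.sin(yaw_rad)
    T = np.eye(4)
    T[:3, :3] = np.array([[c, -s, 0.0],
                          [s,  c, 0.0],
                          [0.0, 0.0, 1.0]])
    T[:3, 3] = translation
    return T

def accumulate_trajectory(relative_poses):
    """Chain frame-to-frame motions into global camera poses."""
    pose = np.eye(4)              # pose of the first frame (world origin)
    trajectory = [pose.copy()]
    for T_rel in relative_poses:
        pose = pose @ T_rel       # compose the new motion onto the current pose
        trajectory.append(pose.copy())
    return trajectory

# Four identical steps: advance 1 unit along the local x axis, then turn 90 degrees.
steps = [relative_pose(np.pi / 2, [1.0, 0.0, 0.0]) for _ in range(4)]
trajectory = accumulate_trajectory(steps)
final_position = trajectory[-1][:3, 3]
```

Because every pose is the product of all earlier relative motions, an error in any single estimate propagates to all later poses, which is why the per-frame estimate matters so much in VO.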
    Traditional VO methods require each module to be carefully designed and tightly coupled with the others to perform well. With the development of machine learning, however, many vision tasks have achieved major breakthroughs. Earlier work on sequence-to-sequence problems usually relied on long short-term memory (LSTM) models. The more recently proposed Transformer has gone further: its self-attention removes the RNN's inability to compute in parallel, which has made it a very popular machine learning model that has been applied in many different fields.
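The parallelism referred to here can be seen in a minimal sketch of single-head scaled dot-product self-attention applied to a short sequence of per-frame feature vectors. The feature sizes and the random projection matrices are assumptions made for illustration only; this is not the architecture used in the thesis.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(frames, w_q, w_k, w_v):
    """Single-head scaled dot-product self-attention.

    frames: (num_frames, feature_dim) array, one feature vector per frame.
    Returns the attended features and the attention weights.
    """
    q = frames @ w_q                          # queries for all frames at once
    k = frames @ w_k                          # keys for all frames at once
    v = frames @ w_v                          # values for all frames at once
    scores = q @ k.T / np.sqrt(k.shape[-1])   # every frame scored against every frame
    weights = softmax(scores, axis=-1)        # each row sums to 1
    return weights @ v, weights

# Illustrative sizes and random weights (assumptions, not values from the thesis).
rng = np.random.default_rng(0)
num_frames, feature_dim, head_dim = 5, 16, 8
frames = rng.normal(size=(num_frames, feature_dim))
w_q = rng.normal(size=(feature_dim, head_dim))
w_k = rng.normal(size=(feature_dim, head_dim))
w_v = rng.normal(size=(feature_dim, head_dim))

context, weights = self_attention(frames, w_q, w_k, w_v)
```

Unlike an RNN, which must finish frame t before starting frame t+1, nothing here loops over time: the whole sequence is handled by a few matrix products.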
    This thesis focuses on how to use the characteristics of the Transformer to improve VO. We exploit the parallelism of self-attention to process stacked consecutive images and thereby obtain the contextual relationship between preceding and following frames. In addition, we use conditional positional encoding to address the fixed-length limitation of absolute and relative positional encodings. Finally, experiments show how our method yields improvements.
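To show why a conditional positional encoding is not tied to a fixed sequence length, here is a minimal one-dimensional sketch: a zero-padded depthwise convolution over the token sequence whose output is added back to the tokens. The 1-D formulation, the kernel size, and the function name `conditional_positional_encoding` are assumptions for illustration; the abstract does not state how the thesis implements it.

```python
import numpy as np

def conditional_positional_encoding(tokens, kernel):
    """Add a position signal computed from each token's local neighbourhood.

    tokens: (seq_len, dim) array of token features.
    kernel: (kernel_size, dim) depthwise weights, kernel_size odd.
    The same kernel works for any seq_len, so no length is baked in.
    """
    kernel_size = kernel.shape[0]
    pad = kernel_size // 2
    # Zero padding along the sequence axis; borders see zeros, which is
    # what lets the encoding distinguish edge tokens from interior ones.
    padded = np.pad(tokens, ((pad, pad), (0, 0)))
    encoding = np.zeros_like(tokens)
    for offset in range(kernel_size):
        encoding += padded[offset:offset + tokens.shape[0]] * kernel[offset]
    return tokens + encoding

kernel = np.ones((3, 2))                      # illustrative weights
short_seq = conditional_positional_encoding(np.ones((4, 2)), kernel)
long_seq = conditional_positional_encoding(np.ones((9, 2)), kernel)
```

A learned absolute position table, by contrast, has one row per position and therefore needs interpolation or retraining when the sequence grows beyond the length seen in training.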
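    Appears in collections: [Graduate Institute of Computer Science and Information Engineering] Master's and doctoral theses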

    Files in this item:

    File        Description  Size  Format  Views
    index.html               0Kb   HTML    68


    All items in NCUIR are protected by the original copyright.
