中大機構典藏-NCU Institutional Repository-提供博碩士論文、考古題、期刊論文、研究計畫等下載:Item 987654321/89906


    Please use this identifier to cite or link to this item: http://ir.lib.ncu.edu.tw/handle/987654321/89906


    Title: Visual odometry based on the Transformer method
    Authors: Lan, Zi-Xiu
    Contributors: Department of Computer Science and Information Engineering
    Keywords: Visual odometry
    Date: 2022-08-08
    Issue Date: 2022-10-04 12:04:18 (UTC+8)
    Publisher: National Central University
    Abstract: Visual odometry (VO) receives comparatively little attention next to object detection, classification, and object tracking, yet it is one of the most important modules in Simultaneous Localization and Mapping (SLAM). Its purpose is to incrementally estimate camera motion between adjacent frames, and it is mainly applied in autonomous mobile robots, drones, and related fields.
    Traditional VO pipelines require each module to be carefully designed and tightly coupled with the others to perform well. With the development of machine learning, however, many vision tasks have achieved major breakthroughs. Earlier work on sequence-to-sequence problems typically relied on long short-term memory (LSTM) models, but the Transformer, proposed in recent years, has gone further: its self-attention removes the RNN constraint that computation cannot be parallelized, and the architecture has quickly become popular and has been applied across many fields.
    This thesis focuses on how the properties of the Transformer can be used to improve VO. We exploit the parallelism of self-attention to process stacked consecutive images and capture the contextual relationship between preceding and subsequent frames. In addition, we use conditional positional encoding to overcome the fixed-length limitation of absolute and relative positional encodings. Finally, our experiments show how these choices yield improvements.
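    The abstract names two ingredients: parallel self-attention over a stack of frame tokens, and a conditional positional encoding computed from each token's local neighbourhood (so it is not tied to a fixed sequence length, unlike an absolute/relative lookup table). The following NumPy sketch is a minimal toy illustration of those two ideas only; all sizes, weights, and function names are invented for the example, and the thesis's actual model is not reproduced here.

    ```python
    import numpy as np

    def softmax(x, axis=-1):
        e = np.exp(x - x.max(axis=axis, keepdims=True))
        return e / e.sum(axis=axis, keepdims=True)

    def conditional_pos_encoding(x, kernel):
        """Add a position signal generated from each token's local neighbourhood.

        Because the encoding comes from a sliding window rather than a fixed
        lookup table, it applies to sequences of any length.
        x: (T, d) frame tokens; kernel: (k, d) depthwise weights, k odd.
        """
        k = kernel.shape[0]
        pad = k // 2
        xp = np.pad(x, ((pad, pad), (0, 0)))  # zero-pad the time axis
        pe = np.stack([(xp[t:t + k] * kernel).sum(axis=0)
                       for t in range(x.shape[0])])
        return x + pe  # residual connection

    def self_attention(x, wq, wk, wv):
        """Scaled dot-product self-attention: every frame attends to every
        other frame in one matrix product, with no sequential recurrence."""
        q, k, v = x @ wq, x @ wk, x @ wv
        scores = q @ k.T / np.sqrt(k.shape[-1])  # (T, T) pairwise affinities
        return softmax(scores) @ v

    rng = np.random.default_rng(0)
    T, d = 5, 16                       # 5 stacked consecutive frames, 16-dim tokens
    x = rng.normal(size=(T, d))        # stand-in for per-frame image features
    kernel = rng.normal(size=(3, d)) * 0.1
    wq, wk, wv = (rng.normal(size=(d, d)) / np.sqrt(d) for _ in range(3))

    ctx = self_attention(conditional_pos_encoding(x, kernel), wq, wk, wv)
    print(ctx.shape)  # (5, 16): one context vector per frame
    ```

    Note that `conditional_pos_encoding` accepts a 9-frame stack just as readily as a 5-frame one, which is the property the abstract contrasts with fixed-length absolute/relative encodings.
    
    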
    Appears in Collections:[Graduate Institute of Computer Science and Information Engineering] Electronic Thesis & Dissertation

    Files in This Item:

    File: index.html (0 Kb, HTML)


    All items in NCUIR are protected by copyright, with all rights reserved.

