Master's/Doctoral Thesis 109522117: Complete Metadata Record

DC Field: Value (Language)
dc.contributor: Department of Computer Science and Information Engineering (zh_TW)
dc.creator: 藍子修 (zh_TW)
dc.creator: Zi-Xiu Lan (en_US)
dc.date.accessioned: 2022-08-08T07:39:07Z
dc.date.available: 2022-08-08T07:39:07Z
dc.date.issued: 2022
dc.identifier.uri: http://ir.lib.ncu.edu.tw:88/thesis/view_etd.asp?URN=109522117
dc.contributor.department: Department of Computer Science and Information Engineering (zh_TW)
dc.description: National Central University (en_US)
dc.description.abstract: Compared with object detection, classification, or object tracking, visual odometry (VO) is a relatively under-explored field. It is one of the most important modules in Simultaneous Localization and Mapping (SLAM): its purpose is to incrementally estimate the camera motion between adjacent frames, and it is used mainly in autonomous mobile robots, drones, and related fields. Traditional VO methods require each module to be carefully designed and tightly coupled with the others to perform well. With the development of machine learning, however, many vision tasks have achieved major breakthroughs. Earlier work on sequence-to-sequence problems typically used long short-term memory (LSTM) models, but the Transformer, proposed in recent years, has gone further: its self-attention removes the RNN's sequential-computation constraint, which has made it an extremely popular model, now applied in many different fields. This thesis focuses on how the Transformer's properties can be used to improve VO. We exploit the parallelism of self-attention to process stacked consecutive images and thereby capture the contextual relationship between preceding and following frames. In addition, we use conditional positional encoding to overcome the fixed-length limitation of absolute/relative positional encodings. Finally, our experiments show how these choices lead to improvements. (en_US)
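To make the abstract's two ingredients concrete (self-attention applied in parallel over tokens from stacked consecutive frames, and a conditional positional encoding that is not tied to a fixed sequence length), the following is a minimal PyTorch sketch. It is only an illustration under assumptions, not the thesis's actual architecture: all module names, layer sizes, and the CPVT-style depthwise-convolution realization of the conditional encoding are hypothetical choices.

import torch
import torch.nn as nn

class ConditionalPositionalEncoding(nn.Module):
    # Hypothetical CPE: the positional signal is generated from the tokens
    # themselves by a depthwise 3x3 convolution over the token grid, so it
    # works for any grid size, unlike a learned absolute encoding of fixed
    # length.
    def __init__(self, dim):
        super().__init__()
        self.proj = nn.Conv2d(dim, dim, kernel_size=3, padding=1, groups=dim)

    def forward(self, x, h, w):
        # x: (batch, h*w, dim) tokens laid out on an h x w grid
        b, n, c = x.shape
        grid = x.transpose(1, 2).reshape(b, c, h, w)
        return x + self.proj(grid).flatten(2).transpose(1, 2)

class TransformerVO(nn.Module):
    # Hypothetical VO model: two consecutive RGB frames stacked channel-wise
    # (6 input channels) are embedded into patch tokens, attended to in
    # parallel by a Transformer encoder, and regressed to a 6-DoF pose.
    def __init__(self, dim=256, heads=8, layers=4):
        super().__init__()
        self.patch_embed = nn.Conv2d(6, dim, kernel_size=16, stride=16)
        self.cpe = ConditionalPositionalEncoding(dim)
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=heads,
                                           batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=layers)
        self.pose_head = nn.Linear(dim, 6)  # 3 translation + 3 rotation

    def forward(self, pair):
        # pair: (batch, 6, H, W), a stacked pair of consecutive frames
        feat = self.patch_embed(pair)               # (batch, dim, H/16, W/16)
        b, c, h, w = feat.shape
        tokens = feat.flatten(2).transpose(1, 2)    # (batch, h*w, dim)
        tokens = self.cpe(tokens, h, w)             # length-agnostic positions
        tokens = self.encoder(tokens)               # parallel self-attention
        return self.pose_head(tokens.mean(dim=1))   # (batch, 6) relative pose

model = TransformerVO()
pose = model(torch.randn(2, 6, 224, 224))  # -> shape (2, 6)

Mean-pooling the encoded tokens before the pose head is one simple readout; a learned pose token would be an equally plausible choice.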
dc.subject: Visual odometry (en_US)
dc.title: Visual odometry based on transformer method (en_US)
dc.language.iso: zh-TW (zh-TW)
dc.type: thesis (en_US)
dc.publisher: National Central University (en_US)
