Name Zi-Xiu Lan (藍子修)   Department Computer Science and Information Engineering
Thesis Title Visual Odometry Based on Transformer Method
(基於transformer方法的視覺里程計)
Abstract Visual odometry (VO) receives comparatively little attention next to object detection, classification, or object tracking. It is one of the most important modules in Simultaneous Localization and Mapping (SLAM). Its purpose is to incrementally estimate the camera motion between adjacent frames. It is mainly applied in autonomous mobile robots, drones, and related fields.
Traditional VO methods require each module to be carefully designed and tightly coupled with the others to perform well. With the development of machine learning, however, many vision tasks have achieved major breakthroughs. Earlier work on sequence-to-sequence problems usually relied on long short-term memory (LSTM) models. The more recently proposed Transformer model has brought a larger breakthrough: its self-attention removes the limitation that a recurrent neural network (RNN) cannot be computed in parallel, and it has quickly become a very popular machine learning model applied in many different fields.
This thesis focuses on how to use the characteristics of the Transformer to improve VO. We exploit the parallel processing of self-attention to process stacked consecutive images and capture the contextual relationship between preceding and subsequent frames. In addition, we use conditional positional encoding to overcome the fixed-length limitation of absolute/relative positional encoding. Finally, we show experimentally that our method yields improvements.
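The two ideas named in the abstract can be illustrated with a small NumPy sketch. It is not the thesis's architecture, which this record does not describe. It assumes each frame has already been reduced to one feature vector, models conditional positional encoding as a zero-padded depthwise 1D convolution over the frame axis, and uses a single attention head. All names (`conditional_positional_encoding`, `self_attention`, `kernel`, `w_q`, `w_k`, `w_v`) and all sizes are hypothetical.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # subtract the max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def conditional_positional_encoding(tokens, kernel):
    """Add a position signal computed from each token's neighbours.

    Assumed form: depthwise 1D convolution over the frame axis with zero padding.
    tokens: (T, D), one feature vector per frame
    kernel: (K, D), one K-tap filter per channel, K odd
    """
    T, _ = tokens.shape
    K = kernel.shape[0]
    pad = K // 2
    padded = np.pad(tokens, ((pad, pad), (0, 0)))  # zeros before and after the sequence
    pe = np.zeros_like(tokens)
    for k in range(K):
        pe += padded[k:k + T] * kernel[k]  # per-channel (depthwise) multiply-accumulate
    return tokens + pe

def self_attention(tokens, w_q, w_k, w_v):
    """Single-head scaled dot-product attention over all frames at once."""
    q, k, v = tokens @ w_q, tokens @ w_k, tokens @ w_v
    scores = q @ k.T / np.sqrt(q.shape[-1])  # (T, T): every frame scored against every frame
    weights = softmax(scores, axis=-1)
    return weights @ v, weights

rng = np.random.default_rng(0)
D = 8  # hypothetical feature size
w_q, w_k, w_v = (rng.standard_normal((D, D)) / np.sqrt(D) for _ in range(3))
kernel = rng.standard_normal((3, D)) * 0.1

# The same parameters handle sequences of different lengths.
for T in (5, 9):
    frames = rng.standard_normal((T, D))
    encoded = conditional_positional_encoding(frames, kernel)
    out, weights = self_attention(encoded, w_q, w_k, w_v)
    print(T, out.shape, weights.shape)
```

The attention scores for all frame pairs come from one matrix product, with no step-by-step recurrence. The positional term is computed from the tokens themselves, so no position table of fixed length is needed when the number of frames changes.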
Keywords ★ Visual odometry
Table of Contents 1 Introduction
2 Related Work
2.1 Simultaneous Localization and Mapping
2.1.1 SLAM Methods
2.1.2 Visual Odometry Methods
2.2 Methods based on Geometry
2.2.1 Feature-based Method
2.2.2 Optical Flow/Direct Method
2.2.3 Comparison of Feature-based and Direct Method
2.3 Methods based on Learning
2.3.1 Supervised Learning
2.3.2 Unsupervised Learning
2.3.3 Comparison of Supervised and Unsupervised Learning
3 Proposed Method
3.1 Overview
3.2 Feature Extraction
3.3 Attention based Contextual Enhancement
3.4 Sequential Modeling
3.5 Loss Function
4 Experiments
4.1 KITTI Datasets
4.2 Training and Data Pre-processing
4.3 Experimental Results
5 Conclusion
6 Reference
Advisor 施國琛   Date of Approval 2022-8-8
