A Learning-Based Monocular Positioning with Variational Bayesian Extended Kalman Filter Integration

Name

Jian-Yu Chen (陳建宇)

Department

Department of Communication Engineering

Thesis Title

A Learning-Based Monocular Positioning with Variational Bayesian Extended Kalman Filter Integration

Files

The thesis can be browsed in the system (open after 2026-8-31).

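Abstract (Chinese)

This thesis proposes a novel integration method to address the challenges of monocular positioning based on neural network learning, combining the strengths of Absolute Pose Regression (APR) and Relative Pose Regression (RPR). We introduce a theoretically consistent strategy, VKFPos (A Learning-Based Monocular Positioning with Variational Bayesian Extended Kalman Filter Integration), that uses the Extended Kalman Filter (EKF) within a variational Bayesian inference framework to integrate the predicted absolute and relative poses. An important mechanism of the method is that pose covariance is taken into account during training, which allows the model to effectively capture the uncertainty associated with each predicted pose. Experimental results on indoor and outdoor datasets, including 7-Scenes and Oxford RobotCar, show that our single-image positioning method is comparable in accuracy to state-of-the-art methods. Furthermore, for positioning that takes the temporal order of the trajectory into account, VKFPos achieves higher accuracy than existing methods, improving by at least 10% on the indoor datasets and by at least 42% on the challenging outdoor dataset. In summary, VKFPos provides a robust and reliable solution and demonstrates its effectiveness across a variety of environments and scenarios.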

Abstract (English)

This paper addresses the challenges in learning-based monocular positioning by proposing a novel integration approach that combines the strengths of Absolute Pose Regression (APR) and Relative Pose Regression (RPR).
We introduce a theoretically consistent strategy for integrating predicted absolute and relative poses using the Extended Kalman Filter (EKF) within the framework of variational Bayesian inference, called VKFPos.
An essential aspect of our method is the consideration of pose covariance during training, enabling our branches to effectively model the uncertainty associated with each predicted pose.
Experimental results on both indoor and outdoor datasets, namely 7-Scenes and Oxford RobotCar, demonstrate that our single-shot method achieves accuracy comparable to state-of-the-art methods.
Moreover, in temporal positioning, VKFPos is more accurate than existing methods, with an improvement of at least 10% across indoor datasets and at least 42% on challenging outdoor datasets.
In summary, VKFPos offers a robust and reliable solution, demonstrating its effectiveness across diverse environments and scenarios.
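
The integration idea described in the abstract can be illustrated with a minimal sketch. It is not the thesis implementation: it assumes a position-only state (no orientation), so the motion and measurement Jacobians are identity matrices and the EKF reduces to a linear Kalman filter. The function names `predict` and `update`, and all numeric values, are hypothetical. The relative displacement `delta` with covariance `Q` stands in for an RPR output, and the absolute position `z` with covariance `R` stands in for an APR output.

```python
import numpy as np

def predict(x, P, delta, Q):
    """Propagate the position with a relative displacement (RPR-style input)."""
    x_pred = x + delta   # motion model: previous position plus displacement
    P_pred = P + Q       # identity Jacobian, so covariances simply add
    return x_pred, P_pred

def update(x_pred, P_pred, z, R):
    """Correct the prediction with an absolute position (APR-style input)."""
    S = P_pred + R                      # innovation covariance (H = I)
    K = P_pred @ np.linalg.inv(S)       # Kalman gain
    x_new = x_pred + K @ (z - x_pred)   # blend prediction and measurement
    P_new = (np.eye(len(x_pred)) - K) @ P_pred
    return x_new, P_new

# Hypothetical values for one filter step.
x, P = np.zeros(3), np.eye(3) * 0.01
delta, Q = np.array([1.0, 0.0, 0.0]), np.eye(3) * 0.04
z, R = np.array([1.2, 0.1, 0.0]), np.eye(3) * 0.05

x_pred, P_pred = predict(x, P, delta, Q)
x_new, P_new = update(x_pred, P_pred, z, R)
```

In this sketch the gain weights each source by its predicted covariance, which is why the abstract stresses learning a covariance for every predicted pose: a confident absolute estimate pulls the fused position toward itself, while an uncertain one leaves the motion-based prediction mostly unchanged.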

Keywords (Chinese)

★ 視覺定位 (Visual Positioning)
★ 機器學習 (Machine Learning)

Keywords (English)

★ Visual Positioning
★ Machine Learning

Table of Contents

1 Introduction
1.1 Background
1.2 Motivation
1.3 Contribution
1.4 Framework
2 Related Works
2.1 Absolute Pose Regression
2.2 Relative Pose Regression
2.3 Trajectory Considering Approaches
3 Learning-Based Monocular Positioning with EKF Integration
3.1 Absolute Pose Estimator
3.2 Relative Pose Estimator
3.3 Extended Kalman Filter Integration
4 Training Strategy
4.1 Maximum Likelihood and Prior
4.2 Loss Design
5 Experimental Results
5.1 Implementation Details
5.2 Dataset
5.2.1 7-Scenes
5.2.2 Oxford RobotCar
5.3 Baselines
5.4 Single-Shot Positioning
5.4.1 Results on 7Scenes Dataset
5.4.2 Results on Oxford RobotCar Dataset
5.5 Temporal Positioning
5.5.1 Results on 7Scenes Dataset
5.5.2 Compare with Model-Based Integration
5.5.3 Results on Oxford RobotCar Dataset
6 Conclusion and Future Work
6.1 Conclusion
6.2 Future Work
Bibliography

Advisor

Chih-Wei Huang (黃志煒)

Approval Date

2024-5-3
