Abstract (English) |
We propose a method that combines data augmentation with semantic information to address the increased positioning errors that dynamic environments cause in visual localization. Visual localization is crucial in applications such as autonomous vehicles, robotics, and augmented reality (AR) / virtual reality (VR). In dynamic environments, however, especially where people move frequently, localization accuracy and stability often decline significantly.
To address this problem, we adopted Random Erasing, a data augmentation technique that simulates object movement or occlusion by randomly masking parts of the image, allowing the model to learn more diverse features and improving its robustness in dynamic environments. However, random masking alone does not direct the model toward the most informative features. We therefore further integrated semantic segmentation to extract the human regions in each image and applied dedicated processing to those areas.
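As a rough illustration of the two ideas above, the following Python sketch applies Random Erasing and, when a person mask from a semantic segmentation model is available, centers the erased patch on the human region instead of a uniformly random location. The function name, patch-size ranges, and mask format are illustrative assumptions, not the thesis implementation.

```python
import random

import numpy as np


def random_erase(image, mask=None, scale=(0.02, 0.2), p=0.5):
    """Randomly mask a rectangular patch of `image` (H x W x C uint8 array).

    If a boolean person `mask` (H x W) is given, the patch is centered on a
    randomly chosen person pixel, so the augmentation targets the dynamic
    (human) region; otherwise the patch location is uniformly random, as in
    standard Random Erasing.
    """
    if random.random() > p:
        return image
    h, w = image.shape[:2]
    # Sample patch area and aspect ratio (ranges follow common practice).
    area = h * w * random.uniform(*scale)
    aspect = random.uniform(0.3, 3.3)
    eh = min(h, max(1, int(round((area * aspect) ** 0.5))))
    ew = min(w, max(1, int(round((area / aspect) ** 0.5))))
    if mask is not None and mask.any():
        # Semantic variant: center the patch on a random person pixel.
        ys, xs = np.nonzero(mask)
        i = random.randrange(len(ys))
        top = int(np.clip(ys[i] - eh // 2, 0, h - eh))
        left = int(np.clip(xs[i] - ew // 2, 0, w - ew))
    else:
        # Plain Random Erasing: uniformly random patch location.
        top = random.randint(0, h - eh)
        left = random.randint(0, w - ew)
    # Fill the erased patch with random noise, as in Random Erasing.
    out = image.copy()
    out[top:top + eh, left:left + ew] = np.random.randint(
        0, 256, size=(eh, ew) + image.shape[2:], dtype=image.dtype)
    return out
```

In practice the mask would come from a segmentation network (e.g. one trained on ADE20K) thresholded to the "person" class; the sketch only shows how such a mask steers the erased region.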
This combined approach aims to enhance the model's adaptability in dynamic environments, ensuring localization accuracy in practical applications.
We conducted experiments on our own datasets with varying dynamic characteristics, collected in indoor environments such as a factory. Experimental results show that the method reduces localization errors caused by human movement: in areas with human movement, it lowers translation error by at least 35.8% and improves system stability. In static environments, the method maintains high accuracy, demonstrating its adaptability across various settings. |
References |
1. Alex Kendall, Matthew Grimes, and Roberto Cipolla. PoseNet: A convolutional
network for real-time 6-DoF camera relocalization. In Proceedings of the IEEE International Conference on Computer Vision, pages 2938–2946, 2015.
2. Noha Radwan, Abhinav Valada, and Wolfram Burgard. VLocNet++: Deep multitask
learning for semantic visual localization and odometry. IEEE Robotics and Automation Letters, 3(4):4407–4414, 2018.
3. Weicai Ye, Xinyue Lan, Shuo Chen, Yuhang Ming, Xingyuan Yu, Hujun Bao,
Zhaopeng Cui, and Guofeng Zhang. PVO: Panoptic visual odometry. In Proceedings
of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages
9579–9589, 2023.
4. Zhun Zhong, Liang Zheng, Guoliang Kang, Shaozi Li, and Yi Yang. Random erasing
data augmentation. In Proceedings of the AAAI Conference on Artificial Intelligence,
volume 34, pages 13001–13008, 2020.
5. Jonathan Long, Evan Shelhamer, and Trevor Darrell. Fully convolutional networks
for semantic segmentation. In Proceedings of the IEEE Conference on Computer
Vision and Pattern Recognition, pages 3431–3440, 2015.
6. Hyeonwoo Noh, Seunghoon Hong, and Bohyung Han. Learning deconvolution network for semantic segmentation, 2015.
7. Olaf Ronneberger, Philipp Fischer, and Thomas Brox. U-Net: Convolutional networks for biomedical image segmentation. In Medical Image Computing and
Computer-Assisted Intervention – MICCAI 2015: 18th International Conference, Munich, Germany, October 5–9, 2015, Proceedings, Part III, pages 234–241.
Springer, 2015.
8. Terrance DeVries and Graham W Taylor. Improved regularization of convolutional
neural networks with cutout. arXiv preprint arXiv:1708.04552, 2017.
9. Krishna Kumar Singh, Hao Yu, Aron Sarmasi, Gautam Pradeep, and Yong Jae Lee.
Hide-and-seek: A data augmentation technique for weakly-supervised localization
and beyond. arXiv preprint arXiv:1811.02545, 2018.
10. Piotr Wozniak and Dominik Ozog. Cross-domain indoor visual place recognition
for mobile robot via generalization using style augmentation. Sensors, 23(13):6134,
2023.
11. Gabriele Berton, Carlo Masone, and Barbara Caputo. Rethinking visual geo-localization for large-scale applications. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 4878–4888, 2022.
12. Suji Jang and Ue-Hwan Kim. On the study of data augmentation for visual place
recognition. IEEE Robotics and Automation Letters, 2023.
13. Carl Toft, Erik Stenborg, Lars Hammarstrand, Lucas Brynte, Marc Pollefeys,
Torsten Sattler, and Fredrik Kahl. Semantic match consistency for long-term visual localization. In Proceedings of the European Conference on Computer Vision
(ECCV), pages 383–399, 2018.
14. Johannes L. Schönberger, Marc Pollefeys, Andreas Geiger, and Torsten Sattler. Semantic visual localization. In Proceedings of the IEEE Conference on Computer
Vision and Pattern Recognition, pages 6896–6906, 2018.
15. Bolei Zhou, Hang Zhao, Xavier Puig, Tete Xiao, Sanja Fidler, Adela Barriuso, and
Antonio Torralba. Semantic understanding of scenes through the ADE20K dataset.
International Journal of Computer Vision, 2018.
16. Bolei Zhou, Hang Zhao, Xavier Puig, Sanja Fidler, Adela Barriuso, and Antonio
Torralba. Scene parsing through ADE20K dataset. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2017.
17. Carlos Campos, Richard Elvira, Juan J. Gómez Rodríguez, José M. M. Montiel, and Juan D. Tardós. ORB-SLAM3: An accurate open-source library for visual, visual-
inertial and multi-map SLAM. CoRR, abs/2007.11898, 2020. |