Abstract (English) |
We propose a method that combines data augmentation with semantic information to address the increased positioning errors that dynamic environments cause in visual localization. Visual localization is crucial in applications such as autonomous vehicles, robotics, and augmented reality (AR) / virtual reality (VR). In dynamic environments, however, especially where people move frequently, localization accuracy and stability often decline significantly.
To address this problem, we adopted Random Erasing, a data augmentation technique that simulates object movement or occlusion by randomly masking parts of the image, allowing the model to learn more diverse features and improving its robustness in dynamic environments. However, random masking alone does not direct the model toward the most informative features. We therefore further integrated semantic segmentation to extract the human regions in each image and applied dedicated processing to those areas.
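As a rough illustration of the two ideas above, the following Python sketch applies Random Erasing and, when a person mask from a semantic segmentation model is available, centers the erased patch on the human region instead of a uniformly random location. The function name, patch-size ranges, and mask format are illustrative assumptions, not the thesis implementation.

```python
import random

import numpy as np


def random_erase(image, mask=None, scale=(0.02, 0.2), p=0.5):
    """Randomly mask a rectangular patch of `image` (H x W x C uint8 array).

    If a boolean person `mask` (H x W) is given, the patch is centered on a
    randomly chosen person pixel, so the augmentation targets the dynamic
    (human) region; otherwise the patch location is uniformly random, as in
    standard Random Erasing.
    """
    if random.random() > p:
        return image
    h, w = image.shape[:2]
    # Sample patch area and aspect ratio (ranges follow common practice).
    area = h * w * random.uniform(*scale)
    aspect = random.uniform(0.3, 3.3)
    eh = min(h, max(1, int(round((area * aspect) ** 0.5))))
    ew = min(w, max(1, int(round((area / aspect) ** 0.5))))
    if mask is not None and mask.any():
        # Semantic variant: center the patch on a random person pixel.
        ys, xs = np.nonzero(mask)
        i = random.randrange(len(ys))
        top = int(np.clip(ys[i] - eh // 2, 0, h - eh))
        left = int(np.clip(xs[i] - ew // 2, 0, w - ew))
    else:
        # Plain Random Erasing: uniformly random patch location.
        top = random.randint(0, h - eh)
        left = random.randint(0, w - ew)
    # Fill the erased patch with random noise, as in Random Erasing.
    out = image.copy()
    out[top:top + eh, left:left + ew] = np.random.randint(
        0, 256, size=(eh, ew) + image.shape[2:], dtype=image.dtype)
    return out
```

In practice the mask would come from a segmentation network (e.g. one trained on ADE20K) thresholded to the "person" class; the sketch only shows how such a mask steers the erased region.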
This combined approach aims to enhance the model's adaptability in dynamic environments, ensuring localization accuracy in practical applications.
We conducted experiments on our own datasets with varying dynamic characteristics, collected in indoor environments such as a factory. Experimental results show that the method reduces localization errors caused by human movement: in areas with human movement, it lowers translation error by at least 35.8% and improves system stability. In static environments, the method maintains high accuracy, demonstrating its adaptability across various settings. |
References |
1. Alex Kendall, Matthew Grimes, and Roberto Cipolla. PoseNet: A convolutional
network for real-time 6-DoF camera relocalization. In Proceedings of the IEEE International Conference on Computer Vision, pages 2938–2946, 2015.
2. Noha Radwan, Abhinav Valada, and Wolfram Burgard. VLocNet++: Deep multitask
learning for semantic visual localization and odometry. IEEE Robotics and Automation Letters, 3(4):4407–4414, 2018.
3. Weicai Ye, Xinyue Lan, Shuo Chen, Yuhang Ming, Xingyuan Yu, Hujun Bao,
Zhaopeng Cui, and Guofeng Zhang. PVO: Panoptic visual odometry. In Proceedings
of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages
9579–9589, 2023.
4. Zhun Zhong, Liang Zheng, Guoliang Kang, Shaozi Li, and Yi Yang. Random erasing
data augmentation. In Proceedings of the AAAI Conference on Artificial Intelligence,
volume 34, pages 13001–13008, 2020.
5. Jonathan Long, Evan Shelhamer, and Trevor Darrell. Fully convolutional networks
for semantic segmentation. In Proceedings of the IEEE Conference on Computer
Vision and Pattern Recognition, pages 3431–3440, 2015.
6. Hyeonwoo Noh, Seunghoon Hong, and Bohyung Han. Learning deconvolution network for semantic segmentation, 2015.
7. Olaf Ronneberger, Philipp Fischer, and Thomas Brox. U-Net: Convolutional networks for biomedical image segmentation. In Medical Image Computing and
Computer-Assisted Intervention – MICCAI 2015: 18th International Conference, Munich, Germany, October 5–9, 2015, Proceedings, Part III, pages 234–241.
Springer, 2015.
8. Terrance DeVries and Graham W Taylor. Improved regularization of convolutional
neural networks with cutout. arXiv preprint arXiv:1708.04552, 2017.
9. Krishna Kumar Singh, Hao Yu, Aron Sarmasi, Gautam Pradeep, and Yong Jae Lee.
Hide-and-seek: A data augmentation technique for weakly-supervised localization
and beyond. arXiv preprint arXiv:1811.02545, 2018.
10. Piotr Wozniak and Dominik Ozog. Cross-domain indoor visual place recognition
for mobile robot via generalization using style augmentation. Sensors, 23(13):6134,
2023.
11. Gabriele Berton, Carlo Masone, and Barbara Caputo. Rethinking visual geo-localization for large-scale applications. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 4878–4888, 2022.
12. Suji Jang and Ue-Hwan Kim. On the study of data augmentation for visual place
recognition. IEEE Robotics and Automation Letters, 2023.
13. Carl Toft, Erik Stenborg, Lars Hammarstrand, Lucas Brynte, Marc Pollefeys,
Torsten Sattler, and Fredrik Kahl. Semantic match consistency for long-term visual localization. In Proceedings of the European Conference on Computer Vision
(ECCV), pages 383–399, 2018.
14. Johannes L. Schönberger, Marc Pollefeys, Andreas Geiger, and Torsten Sattler. Semantic visual localization. In Proceedings of the IEEE Conference on Computer
Vision and Pattern Recognition, pages 6896–6906, 2018.
15. Bolei Zhou, Hang Zhao, Xavier Puig, Tete Xiao, Sanja Fidler, Adela Barriuso, and
Antonio Torralba. Semantic understanding of scenes through the ADE20K dataset.
International Journal of Computer Vision, 2018.
16. Bolei Zhou, Hang Zhao, Xavier Puig, Sanja Fidler, Adela Barriuso, and Antonio
Torralba. Scene parsing through ADE20K dataset. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2017.
17. Carlos Campos, Richard Elvira, Juan J. Gómez Rodríguez, José M. M. Montiel, and Juan D. Tardós. ORB-SLAM3: An accurate open-source library for visual, visual-
inertial and multi-map SLAM. CoRR, abs/2007.11898, 2020. |