Master's/Doctoral Thesis 110522056: Detailed Record




Name: Tzu-Yun Kung (龔姿紜)   Department: Computer Science and Information Engineering
Thesis Title: Combination of Parallel Residual Bi-Fusion Feature Pyramid Network and Self-Attention Mechanism for Traffic Light Recognition
Related Theses
★ Biometric Authentication Based on Lip Image Sequences
Files: Full text available in the system after 2028-7-13
Abstract (Chinese) In this thesis, we mainly investigate how to enable self-driving cars to operate in today's complex environments. At present, the autonomous vehicles that can drive on the road do so almost exclusively in relatively simple road environments. To allow autonomous vehicles to drive safely in relatively complex road environments, object detection and recognition technologies must be upgraded.
In the past, most techniques in this field were dominated by compute-intensive convolutional neural networks (CNNs). In recent years, however, as technology has advanced, many researchers have applied methods originally developed for natural language processing (NLP) to this field and obtained better results. In view of this, we propose an object detection model that combines the Parallel Residual Bi-Fusion Feature Pyramid Network with a self-attention mechanism to recognize traffic lights from a vehicle in simulated driving.
In our proposed architecture, we use the backbone of a mainstream one-stage object detector, adopt a multi-scale feature-fusion pyramid and several attention modules, and combine architectural tuning with optimizer selection. Experimental results show that the proposed method yields significant improvements on all evaluation metrics, indicating that it indeed achieves better traffic light detection and recognition.
Abstract (English) In this thesis, we mainly discuss how to make self-driving cars drive in the complicated environments of the modern era. Currently, the self-driving vehicles that can operate on the road do so almost exclusively in relatively simple road environments. To enable self-driving vehicles to drive safely in relatively complex road environments, object detection and recognition technologies need to be upgraded.
In the past, most techniques employed in this field were dominated by compute-intensive Convolutional Neural Networks (CNNs). Recently, however, with the progress of technology, many researchers have applied methods originally developed for Natural Language Processing (NLP) to this field to achieve better results. In view of this, we propose an object detection model that combines the Parallel Residual Bi-Fusion Feature Pyramid Network with a self-attention mechanism to realize traffic light recognition during simulated vehicle maneuvering.
In our proposed architecture, we use the backbone of a mainstream one-stage object detection model with a multi-scale feature-fusion pyramid approach and different attention-mechanism modules, coupled with architectural tuning and optimizer selection. Experimental results reveal that the proposed method exhibits noticeable improvement on all evaluation metrics, indicating that it indeed achieves better results on traffic light detection and recognition.
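The self-attention component mentioned in the abstract can be illustrated with a minimal sketch of scaled dot-product self-attention. This is not the thesis implementation; all shapes, names, and weights below are illustrative assumptions, with flattened feature-map patches standing in for the detector's token sequence.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(x, wq, wk, wv):
    """x: (tokens, dim) — e.g. feature-map patches flattened into a sequence."""
    q, k, v = x @ wq, x @ wk, x @ wv
    # Similarity between every pair of tokens, scaled by sqrt(d_k).
    scores = q @ k.T / np.sqrt(k.shape[-1])
    # Each output token is a weighted sum of all value vectors.
    return softmax(scores) @ v

rng = np.random.default_rng(0)
x = rng.standard_normal((16, 32))                      # 16 tokens, 32-dim features
wq, wk, wv = (rng.standard_normal((32, 32)) for _ in range(3))
out = self_attention(x, wq, wk, wv)
print(out.shape)  # (16, 32)
```

The output keeps the input's sequence shape, which is what lets an attention block be dropped into a detection neck between convolutional stages.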
Keywords ★ Object detection
★ Attention mechanism
★ Feature pyramid
★ Self-driving car
★ Traffic light
Table of Contents
Abstract (Chinese)
Abstract (English)
Acknowledgements
Table of Contents
List of Figures
List of Tables
1. Introduction
1-1 Research Background and Motivation
1-2 Research Objectives
1-3 Research Methods
1-4 Thesis Overview
2. Literature Review
2-1 Object Detection
2-1-1 Object Detection
2-1-2 YOLO (You Only Look Once)
2-2 Feature Pyramids
2-2-1 Feature Pyramid Network
2-2-2 Path Aggregation Network
2-2-3 Parallel Residual Bi-Fusion Feature Pyramid (PRB-FPN)
2-3 Attention Mechanisms
2-3-1 Self-Attention
2-3-2 Vision Transformer (ViT)
2-3-3 Coordinate Attention (CA)
2-4 Analysis of Visual Reasoning
3. Research Methods
3-1 Model Architecture
3-1-1 Overall Architecture and Pipeline
3-1-2 PRB-FPN
3-1-3 Transformer Block
3-1-4 Coordinate Attention
3-1-5 Architecture Strategy and Implementation
3-2 Adam Optimizer
4. Experimental Results
4-1 Experimental Environment
4-2 Datasets
4-2-1 Training Dataset
4-2-2 Test Dataset
4-3 Evaluation Metrics
4-3-1 Precision
4-3-2 Recall
4-3-3 F1 Score
4-3-4 Mean Average Precision
4-4 Experimental Results
4-5 Ablation Experiments
5. Conclusions and Future Work
5-1 Conclusions
5-2 Future Work
References
Appendix 1
Appendix 2
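Sections 4-3-1 through 4-3-3 cover precision, recall, and F1 score; a toy computation shows how they relate. The counts below are invented purely for illustration and are not results from the thesis.

```python
# Hypothetical detection outcome: counts of true positives, false positives,
# and false negatives (illustrative numbers only).
tp, fp, fn = 90, 10, 20

precision = tp / (tp + fp)  # fraction of predicted boxes that are correct
recall = tp / (tp + fn)     # fraction of ground-truth boxes that were found
f1 = 2 * precision * recall / (precision + recall)  # harmonic mean of the two

print(f"P={precision:.3f} R={recall:.3f} F1={f1:.3f}")
# → P=0.900 R=0.818 F1=0.857
```

Mean Average Precision (4-3-4) extends this idea by averaging precision over recall levels and over classes, so it summarizes the whole precision-recall trade-off rather than a single operating point.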
References [1] J. Shuttleworth, "SAE Standards News: J3016 automated-driving graphic update," SAE International. https://www.sae.org/news/2019/01/sae-updates-j3016-automated-driving-graphic.
[2] WoWtchout, "WoWtchout - 地圖型行車影像分享平台 (map-based dashcam video sharing platform)." https://www.youtube.com/@WoWtchout.
[3] A. Bochkovskiy, C.-Y. Wang, and H.-Y. M. Liao, "YOLOv4: Optimal Speed and Accuracy of Object Detection," arXiv:2004.10934, 2020.
[4] C.-Y. Wang, A. Bochkovskiy, and H.-Y. M. Liao, "YOLOv7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors," arXiv:2207.02696, 2022.
[5] W. Liu et al., "SSD: Single Shot MultiBox Detector," in ECCV, Springer International Publishing, 2016, pp. 21-37.
[6] S. Ren, K. He, R. Girshick, and J. Sun, "Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks," arXiv:1506.01497, 2016.
[7] K. He, G. Gkioxari, P. Dollár, and R. Girshick, "Mask R-CNN," arXiv:1703.06870, 2018.
[8] J. Redmon, S. Divvala, R. Girshick, and A. Farhadi, "You Only Look Once: Unified, Real-Time Object Detection," arXiv:1506.02640, 2016.
[9] J. Redmon and A. Farhadi, "YOLO9000: Better, Faster, Stronger," arXiv:1612.08242, 2016.
[10] J. Redmon and A. Farhadi, "YOLOv3: An Incremental Improvement," arXiv:1804.02767, 2018.
[11] G. Jocher, "yolov5." https://github.com/ultralytics/yolov5.
[12] C. Li et al., "YOLOv6: A Single-Stage Object Detection Framework for Industrial Applications," arXiv:2209.02976, 2022.
[13] X. Ding, X. Zhang, N. Ma, J. Han, G. Ding, and J. Sun, "RepVGG: Making VGG-style ConvNets Great Again," arXiv:2101.03697, 2021.
[14] K. Weng, X. Chu, X. Xu, J. Huang, and X. Wei, "EfficientRep: An Efficient RepVGG-style ConvNets with Hardware-aware Neural Network Design," arXiv:2302.00386, 2023.
[15] D. Wu et al., "Detection of Camellia oleifera Fruit in Complex Scenes by Using YOLOv7 and Data Augmentation," Applied Sciences, vol. 12, no. 22, p. 11318, 2022, doi: 10.3390/app122211318.
[16] K. Simonyan and A. Zisserman, "Very Deep Convolutional Networks for Large-Scale Image Recognition," arXiv:1409.1556, 2015.
[17] K. He, X. Zhang, S. Ren, and J. Sun, "Deep Residual Learning for Image Recognition," arXiv:1512.03385, 2015.
[18] G. Huang, Z. Liu, L. van der Maaten, and K. Q. Weinberger, "Densely Connected Convolutional Networks," arXiv:1608.06993, 2018.
[19] G. Jocher, "ultralytics." https://github.com/ultralytics/ultralytics.
[20] T.-Y. Lin, P. Dollár, R. Girshick, K. He, B. Hariharan, and S. Belongie, "Feature Pyramid Networks for Object Detection," arXiv:1612.03144, 2017.
[21] S. Liu, L. Qi, H. Qin, J. Shi, and J. Jia, "Path Aggregation Network for Instance Segmentation," arXiv:1803.01534, 2018.
[22] P.-Y. Chen, M.-C. Chang, J.-W. Hsieh, and Y.-S. Chen, "Parallel Residual Bi-Fusion Feature Pyramid Network for Accurate Single-Shot Object Detection," IEEE Transactions on Image Processing, vol. 30, pp. 9099-9111, 2021, doi: 10.1109/tip.2021.3118953.
[23] A. Vaswani et al., "Attention is all you need," in Advances in Neural Information Processing Systems, vol. 30, 2017.
[24] N. Carion, F. Massa, G. Synnaeve, N. Usunier, A. Kirillov, and S. Zagoruyko, "End-to-End Object Detection with Transformers," arXiv:2005.12872, 2020.
[25] Y. Lee, J.-W. Hwang, S. Lee, Y. Bae, and J. Park, "An Energy and GPU-Computation Efficient Backbone Network for Real-Time Object Detection," arXiv:1904.09730, 2019.
[26] A. Dosovitskiy et al., "An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale," arXiv:2010.11929, 2020.
[27] Q. Hou, D. Zhou, and J. Feng, "Coordinate Attention for Efficient Mobile Network Design," arXiv:2103.02907, 2021.
[28] J. Hu, L. Shen, and G. Sun, "Squeeze-and-Excitation Networks," in CVPR, IEEE, 2018, doi: 10.1109/cvpr.2018.00745.
[29] S. Woo, J. Park, J.-Y. Lee, and I. S. Kweon, "CBAM: Convolutional Block Attention Module," arXiv:1807.06521, 2018.
[30] T. Aksoy and U. Halici, "Analysis of visual reasoning on one-stage object detection," arXiv:2202.13115, 2022.
[31] D. P. Kingma and J. Ba, "Adam: A Method for Stochastic Optimization," arXiv:1412.6980, 2017.
[32] S. Ruder, "An overview of gradient descent optimization algorithms," arXiv:1609.04747, 2017.
[33] K. Behrendt, L. Novak, and R. Botros, "A deep learning approach to traffic lights: Detection, tracking, and classification," in 2017 IEEE International Conference on Robotics and Automation (ICRA), 2017, pp. 1370-1377, doi: 10.1109/ICRA.2017.7989163.
[34] K. Behrendt and L. Novak, "Bosch Small Traffic Lights Dataset." https://hci.iwr.uni-heidelberg.de/content/bosch-small-traffic-lights-dataset.
[35] karstenBehrendt, "bosch-ros-pkg/bstld." https://github.com/bosch-ros-pkg/bstld.
[36] A. D. Pon, O. Andrienko, A. Harakeh, and S. L. Waslander, "A Hierarchical Deep Architecture and Mini-Batch Selection Method For Joint Traffic Sign and Light Detection," arXiv:1806.07987, 2018.
Advisors: Kuo-Chin Fan (范國清), Chi-Hong Chuang (莊啟宏)   Date of Approval: 2023-7-18
