MFNet: 3D Vehicle Detection Based on Multilevel Fusion Network of Point Cloud and RGB Images

Name

吳丞鎬 (Cheng-Haw Wu)

Department

Computer Science and Information Engineering

Thesis title

MFNet: 3D Vehicle Detection Based on Multilevel Fusion Network of Point Cloud and RGB Images
(original title: MFNet：基於點雲與RGB影像的多層級特徵融合神經網路之3D車輛偵測)

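This electronic thesis is authorized for immediate open access.
Once released, the electronic full text is licensed to users only for personal, non-profit searching, reading, and printing for the purpose of academic research.
Please comply with the Copyright Act of the Republic of China. Do not reproduce, distribute, adapt, repost, or broadcast this work without authorization, to avoid violating the law.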

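Abstract (Chinese, translated)

As vehicles become increasingly widespread, autonomous driving is expected to ease traffic congestion and improve safety, and it has become a key technology under intense study across many fields, for example in advanced driver assistance systems (ADAS). The core software functions of autonomous driving fall roughly into three categories: perception, planning, and control. Perception is the ability of the system to collect information from the environment and extract relevant knowledge from it. This thesis focuses on vehicle detection and localization within environmental perception.
In traditional computer vision, most object detection problems have been studied in two dimensions. In recent years, as the limitations of 2D data have become better understood and the cost of 3D sensors such as stereo cameras and LiDAR has fallen, 3D object detection has begun to receive attention. 3D object detection yields the distance and 3D coordinates of an object, and sensor data can help overcome the lighting, viewing-angle, and color-variation problems of image recognition. The goal of this thesis is a 3D object (vehicle) detection model based on LiDAR data and RGB images.
For high-precision 3D vehicle detection in autonomous driving scenarios, we propose the Multilevel Fusion Network (MFNet), a deep learning model that repeatedly reuses and fuses cross-layer features of the neural network. It takes LiDAR point clouds and RGB images as input, extracts high-resolution feature maps with an Encoder-Decoder network, and uses them in an initial fusion network built on an RPN (Region Proposal Network) and in a high-level fusion network in the latter half of the model, finally predicting multi-class (vehicle and pedestrian) probabilities and 3D bounding boxes.
Experiments on the well-known autonomous driving dataset KITTI show that our method performs well in both the 3D object detection and bird's-eye-view evaluations, with an especially strong mean AP (mAP) at the Hard difficulty level, which contains heavily occluded objects. It runs at about 11 FPS, close to real time, and is faster than recent 3D vehicle detection models.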

Abstract (English)

In an age when vehicles are becoming ever more common, automated driving is expected to ease traffic congestion and improve safety, and it has become a key technology under active research, for example in advanced driver assistance systems (ADAS). The core functions of automated driving can be roughly divided into three categories: perception, planning, and control. Perception refers to the ability of the automated driving system to collect various types of information from the environment and extract relevant knowledge from it. This thesis focuses on vehicle detection and localization within environmental perception.
In the field of computer vision, most object detection problems have been studied with two-dimensional methods. In recent years, as the limitations of two-dimensional data have become better understood and the cost of three-dimensional sensors such as stereo cameras and LiDAR has fallen, 3D object detection has begun to receive attention. 3D object detection provides the distance and 3D coordinates of an object, and sensor data helps overcome the lighting, viewing-angle, and color-difference problems of image recognition. The research goal of this thesis is a 3D object (vehicle) detection model based on LiDAR data and RGB images.
For high-precision 3D vehicle detection in the context of automated driving, we propose MFNet (Multilevel Fusion Network). MFNet is a deep learning model that reuses and fuses cross-layer features of neural networks. It takes LiDAR point clouds and RGB images as input and extracts high-resolution feature maps through an Encoder-Decoder network. These feature maps are fed to an Initial Fusion Network built on an RPN (Region Proposal Network) and to a High-level Fusion Network in the latter half of the model, which finally predicts the class probabilities (vehicles and pedestrians) and 3D bounding boxes.
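As a rough illustration of how a LiDAR point cloud can be prepared as network input, the following sketch groups points into voxels and derives a bird's-eye-view height map. It is not the thesis's implementation: the function names, the detection ranges, and the voxel size are all assumptions chosen for illustration.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Point = Tuple[float, float, float]   # (x, y, z) in metres, LiDAR frame
VoxelIndex = Tuple[int, int, int]


def voxelize(
    points: Iterable[Point],
    voxel_size: Tuple[float, float, float] = (0.1, 0.1, 0.2),  # assumed
    x_range: Tuple[float, float] = (0.0, 70.0),                # assumed
    y_range: Tuple[float, float] = (-40.0, 40.0),              # assumed
    z_range: Tuple[float, float] = (-3.0, 1.0),                # assumed
) -> Dict[VoxelIndex, List[Point]]:
    """Group points by the voxel they fall in; drop points outside the ranges."""
    mins = (x_range[0], y_range[0], z_range[0])
    maxs = (x_range[1], y_range[1], z_range[1])
    grid: Dict[VoxelIndex, List[Point]] = defaultdict(list)
    for p in points:
        # Skip points outside the half-open box [min, max) on any axis.
        if any(c < lo or c >= hi for c, lo, hi in zip(p, mins, maxs)):
            continue
        idx = tuple(int((c - lo) // s) for c, lo, s in zip(p, mins, voxel_size))
        grid[idx].append(p)
    return dict(grid)


def bev_height_map(
    grid: Dict[VoxelIndex, List[Point]], z_min: float = -3.0
) -> Dict[Tuple[int, int], float]:
    """Collapse the z axis: each (ix, iy) cell keeps its tallest point's height above z_min."""
    bev: Dict[Tuple[int, int], float] = {}
    for (ix, iy, _), pts in grid.items():
        top = max(p[2] for p in pts) - z_min
        bev[(ix, iy)] = max(bev.get((ix, iy), 0.0), top)
    return bev
```

Calling `bev_height_map(voxelize(points))` gives a sparse 2D map that could be rasterized into an image-like tensor; real pipelines usually add further channels such as point density or reflectance.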
Experimental results on the well-known autonomous driving dataset KITTI show that our method performs well in both the 3D object detection and bird's-eye-view evaluations, with particularly strong mean AP (mAP) at the Hard level, which contains heavily occluded objects. MFNet runs at about 11 FPS, which is close to real time and faster than recent 3D vehicle detection models.
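To make the reported metrics concrete, the sketch below computes an interpolated average precision from a precision-recall curve, averages AP values into an mAP, and converts a frame rate into per-frame latency. It assumes the 11-point interpolation popularized by PASCAL VOC; the abstract does not state the exact evaluation protocol, and the function names and any sample values are hypothetical.

```python
from typing import Sequence


def interpolated_ap(
    recalls: Sequence[float],
    precisions: Sequence[float],
    num_points: int = 11,  # assumed: 11 evenly spaced recall thresholds
) -> float:
    """Average, over evenly spaced recall thresholds, of the best
    precision achieved at any recall >= that threshold."""
    total = 0.0
    for i in range(num_points):
        threshold = i / (num_points - 1)
        best = max(
            (p for r, p in zip(recalls, precisions) if r >= threshold),
            default=0.0,  # no operating point reaches this recall
        )
        total += best
    return total / num_points


def mean_ap(aps: Sequence[float]) -> float:
    """Plain mean of a non-empty list of AP values (e.g. one per difficulty level)."""
    return sum(aps) / len(aps)


def frame_latency_ms(fps: float) -> float:
    """Per-frame processing time in milliseconds for a given frame rate."""
    return 1000.0 / fps
```

At the reported speed, `frame_latency_ms(11)` comes to roughly 91 ms per frame.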

Keywords (Chinese, translated)

★ Autonomous driving
★ 3D object detection
★ LiDAR
★ Deep learning
★ ADAS

Keywords (English)

★ Autopilot
★ 3D Object Detection
★ LiDAR
★ Deep Learning
★ ADAS

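Table of contents

Abstract (Chinese)
Abstract
Table of Contents
List of Figures
List of Tables
Chapter 1 Introduction
1.1 Research Motivation
1.2 Related Literature
1.3 System Flow and Thesis Organization
Chapter 2 Related Techniques
2.1 R-CNN-Based Object Detection
2.2 Point-Cloud-Based Deep Learning Methods
Chapter 3 Data Preprocessing and Feature Extraction Network
3.1 Data Preprocessing
3.1.1 LiDAR Point Cloud Voxelization
3.1.2 MFNet-aug and Data Augmentation
3.2 Feature Extraction Network
3.2.1 Seq2Seq Model
3.2.2 Encoder-Decoder Network
Chapter 4 Multilevel Feature Fusion Network
4.1 Initial Fusion Network
4.1.1 Bottleneck Layer
4.1.2 K-means and ROI Pooling Layer
4.1.3 Feature Fusion
4.1.4 Regression-Classification Network
4.2 High-Level Fusion Network
4.2.1 Projecting ROIs onto Feature Maps
4.2.2 Feature Fusion and Deep Fusion Network
Chapter 5 Experimental Results and Discussion
5.1 Experimental Equipment and the KITTI Dataset
5.1.1 Experimental Equipment
5.1.2 The KITTI Vision Benchmark
5.2 Experimental Results
5.2.1 3D Vehicle Detection
5.2.2 Vehicle and Pedestrian Classification
5.3 Analysis and Comparison of Experimental Data
5.3.1 Evaluation Metrics
5.3.2 mAP-Based Analysis and Comparison
5.3.3 Computation Speed and Model Parameters
Chapter 6 Conclusion and Future Work
References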

Advisors

范國清, 韓欽銓 (Kuo-Chin Fan, Chin-Chuan Han)

Approval date

2018-7-26
