As vehicles become increasingly widespread, autonomous driving is expected to ease traffic congestion and improve safety, and it has become a key technology under active research, as seen in advanced driver assistance systems (ADAS). The core software functions of autonomous driving can be roughly divided into three categories: perception, planning, and control. Perception refers to the ability of the autonomous driving system to collect various kinds of information from the environment and to extract relevant knowledge from it. This paper focuses on vehicle detection and localization within environmental perception.

In traditional computer vision, most object detection problems have been studied with two-dimensional methods. In recent years, as the limitations of two-dimensional data have become better understood and the cost of three-dimensional sensors such as stereo cameras and LiDAR has fallen, 3D object detection has begun to receive attention. 3D object detection provides the distance and 3D coordinates of an object, and the sensor data helps overcome problems that affect image recognition, such as lighting, viewing angle, and color variation.
The goal of this paper is a 3D object (vehicle) detection model based on LiDAR data and RGB images.

For high-precision 3D vehicle detection in autonomous driving scenarios, we propose the Multilevel Fusion Network (MFNet), a deep learning model that reuses and fuses cross-layer features of the neural network. MFNet takes LiDAR point clouds and RGB images as input and extracts high-resolution feature maps through an Encoder-Decoder network. These feature maps are used both in the Initial Fusion Network, which is built on an RPN (Region Proposal Network), and in the High-level Fusion Network in the latter half of the model. Finally, the model predicts class probabilities for multiple categories (vehicles and pedestrians) together with 3D bounding boxes.

Experimental results on the well-known autonomous driving dataset KITTI show that our method performs well in both the 3D object detection and the Bird's-Eye View evaluations. It does particularly well at the Hard difficulty level, which covers heavily occluded objects, where it achieves a notably high mean average precision (mAP). MFNet runs at up to about 11 FPS, which is close to real-time and faster than recent 3D vehicle detection models.
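
To make the relation between a predicted 3D bounding box and the Bird's-Eye View evaluation concrete, the following is a minimal sketch of how a box footprint could be projected onto the ground plane. The `Box3D` class, the `bev_corners` function, and the box parameterization (centre, size, and a yaw angle about the vertical axis, with x forward and y to the left) are assumptions made for illustration only; they are not taken from MFNet, and KITTI's own label format uses a different axis convention.

```python
import math
from dataclasses import dataclass


@dataclass
class Box3D:
    """Hypothetical 3D box: centre, size, and heading (not MFNet's encoding)."""
    x: float       # centre, forward axis (metres)
    y: float       # centre, left axis (metres)
    z: float       # centre, vertical axis (metres)
    length: float  # extent along the box's heading direction
    width: float   # extent perpendicular to the heading, in the ground plane
    height: float  # vertical extent (unused in the bird's-eye view)
    yaw: float     # rotation about the vertical axis, in radians


def bev_corners(box):
    """Return the four (x, y) ground-plane corners of the box footprint.

    Corners are ordered front-left, front-right, rear-right, rear-left
    in the box's own frame, then rotated by yaw and shifted to the centre.
    """
    c, s = math.cos(box.yaw), math.sin(box.yaw)
    half_l, half_w = box.length / 2.0, box.width / 2.0
    offsets = [
        (half_l, half_w),
        (half_l, -half_w),
        (-half_l, -half_w),
        (-half_l, half_w),
    ]
    # Standard 2D rotation of each offset, followed by translation.
    return [(box.x + dx * c - dy * s, box.y + dx * s + dy * c)
            for dx, dy in offsets]


if __name__ == "__main__":
    car = Box3D(x=10.0, y=2.0, z=0.8, length=4.0, width=2.0, height=1.5,
                yaw=math.pi / 6)
    for corner in bev_corners(car):
        print(corner)
```

The height and vertical position are dropped in this view, which is why a Bird's-Eye View evaluation is generally less strict than a full 3D one: two boxes can overlap well on the ground plane while disagreeing in height.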