dc.description.abstract | As transportation becomes ever more prevalent, automated driving is expected to alleviate traffic congestion and improve safety. It has therefore become a key technology attracting intense research, for example in advanced driver assistance systems (ADAS). The core functions of automated driving can be roughly divided into three categories: perception, planning, and control. Perception refers to the ability of the automated driving system to collect various types of information from the environment and extract relevant knowledge from it. This paper focuses on vehicle detection and localization within environmental perception.
In the field of computer vision, most object detection work has been based on two-dimensional methods. In recent years, as the limitations of two-dimensional data have become clearer and the cost of three-dimensional sensors such as stereo cameras and LiDAR has fallen, 3D object detection has begun to receive attention. The purpose of 3D object detection is to obtain an object's distance and 3D coordinates, and to use sensor data to overcome the lighting, viewing-angle, and color variations that hinder image-based recognition. The goal of this paper is a 3D object (vehicle) detection model based on LiDAR data and RGB images.
For high-precision 3D vehicle detection in the context of automated driving, we propose MFNet (Multilevel Fusion Network). MFNet is a deep learning model that reuses and fuses cross-layer features of neural networks. It takes LiDAR point clouds and RGB images as input and extracts high-resolution feature maps through an encoder-decoder network. These features are fed to an initial fusion network and high-level fusion networks built around an RPN (Region Proposal Network), which finally predict class probabilities for multiple categories (vehicles and pedestrians) and 3D bounding boxes.
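To make the two-branch fusion idea concrete, the following is a minimal PyTorch sketch of an image/LiDAR-BEV detector with an encoder-decoder per branch, an initial fusion stage with RPN-style heads, and a high-level head predicting class scores and 3D box parameters. All module names, channel sizes, and input shapes are illustrative assumptions for exposition, not the thesis's actual MFNet implementation.

```python
# Hypothetical sketch of a two-branch multilevel fusion detector (not the thesis code).
import torch
import torch.nn as nn

class EncoderDecoder(nn.Module):
    """Downsample then upsample to recover a high-resolution feature map."""
    def __init__(self, in_ch, feat_ch=64):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(in_ch, feat_ch, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(feat_ch, feat_ch * 2, 3, stride=2, padding=1), nn.ReLU(),
        )
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(feat_ch * 2, feat_ch, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(feat_ch, feat_ch, 4, stride=2, padding=1), nn.ReLU(),
        )

    def forward(self, x):
        return self.decoder(self.encoder(x))

class FusionDetector(nn.Module):
    """Fuses image and LiDAR BEV features, then predicts classes and 3D boxes."""
    def __init__(self, num_classes=2, box_params=7):
        super().__init__()
        self.img_branch = EncoderDecoder(in_ch=3)   # RGB image
        self.bev_branch = EncoderDecoder(in_ch=6)   # LiDAR bird's-eye-view maps (assumed 6 height/density channels)
        # Initial fusion: concatenate branch features and mix channels.
        self.initial_fusion = nn.Sequential(nn.Conv2d(64 * 2, 128, 1), nn.ReLU())
        # RPN-style heads on the fused map: objectness + proposal regression.
        self.rpn_cls = nn.Conv2d(128, 1, 1)
        self.rpn_reg = nn.Conv2d(128, box_params, 1)
        # High-level head: pooled fused features -> class scores + 3D box (x, y, z, w, l, h, yaw).
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.cls_head = nn.Linear(128, num_classes)
        self.box_head = nn.Linear(128, box_params)

    def forward(self, image, bev):
        fused = self.initial_fusion(
            torch.cat([self.img_branch(image), self.bev_branch(bev)], dim=1))
        proposals = (self.rpn_cls(fused), self.rpn_reg(fused))
        pooled = self.pool(fused).flatten(1)
        return proposals, self.cls_head(pooled), self.box_head(pooled)

# Usage with dummy inputs (both branches assumed to share a 128x128 grid here).
model = FusionDetector()
proposals, cls_logits, boxes = model(torch.rand(1, 3, 128, 128), torch.rand(1, 6, 128, 128))
```

In practice the high-level stage would pool features per proposal (e.g., ROI-wise) rather than globally as in this simplified sketch; the global pooling here only keeps the example short.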
Experimental results on the well-known autonomous driving dataset KITTI show that our method performs well in both the 3D object detection and bird's-eye view evaluations, especially at the Hard difficulty level (heavily occluded objects), where it achieves outstanding mean average precision (mAP). MFNet runs at approximately 11 FPS, which is close to real time and faster than recent 3D vehicle detection models. | en_US |