Please use this persistent URL to cite or link to this item:
http://ir.lib.ncu.edu.tw/handle/987654321/89923
Title: | 3D Object Detection, Recognition, and Position Estimation using A Deep Learning System with Point Cloud Data
Author: | Wang, Wei-Chi (王偉齊)
Contributor: | Department of Computer Science and Information Engineering
Keywords: | 3D object detection; point cloud; attention; GIoU loss; focal loss
Date: | 2022-08-10
Upload time: | 2022-10-04 12:04:55 (UTC+8)
Publisher: | National Central University
Abstract: | With the rapid rise of deep learning in recent years, its application to object detection and recognition has matured, and detection technology has expanded from 2D to 3D applications such as autonomous driving, security surveillance, human-computer interaction, and traffic control. Current 3D object detection is heavily influenced by 2D detectors: to leverage 2D architectures, most methods convert 3D data to regular grids (voxel grids or bird's-eye-view images) or rely on detections in 2D images to propose 3D boxes, and few attempt to detect objects directly in 3D point clouds. Because point cloud data is sparse, predicting bounding-box parameters directly from scene points poses a major challenge: a 3D object centroid can be far from any surface point and is therefore hard to regress accurately. In this research, we propose a neural network that directly estimates the position, orientation, and size of 3D objects: given an input point cloud, the network extracts features, predicts each object's class, location, and heading angle, and outputs 3D bounding boxes.

Our model is revised from the 3D detection network VoteNet, with two main improvements. First, we add attention to the PointNet++ backbone to strengthen feature extraction, and use these features for detection and recognition. Second, we revise the loss function by adding a GIoU (generalized intersection over union) loss and a focal loss, which makes the network easier to optimize.

In the experiments, we used the modified VoteNet for 3D bounding-box estimation and recognition on the 7,870 images of the SUN RGB-D dataset, about 80% of which serve as training samples and the rest as testing samples; training was done on an NVIDIA GeForce RTX 2080 Ti. The original VoteNet detection and recognition system runs at an average of 11.20 frames per second, with 66.13% mAP at an IoU threshold of 0.25 and 43.87% mAP at a threshold of 0.5. After a series of modifications and experimental analysis, our final network architecture runs at an average of 10.94 frames per second and, compared with the original VoteNet, reaches 67.33% mAP at an IoU threshold of 0.25 (a relative improvement of about 1.81%) and 48.19% mAP at a threshold of 0.5 (a relative improvement of about 9.84%).
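For reference, the two loss terms named in the abstract have standard published definitions (GIoU loss from Rezatofighi et al., CVPR 2019; focal loss from Lin et al., ICCV 2017). The abstract does not state the thesis's exact weighting or hyperparameter choices, so the following is only the general form:

    \mathrm{GIoU}(A,B) = \mathrm{IoU}(A,B) - \frac{|C \setminus (A \cup B)|}{|C|}, \qquad \mathcal{L}_{\mathrm{GIoU}} = 1 - \mathrm{GIoU}(A,B),

where $C$ is the smallest axis-aligned box enclosing the predicted box $A$ and the ground-truth box $B$; and

    \mathrm{FL}(p_t) = -\alpha_t \, (1 - p_t)^{\gamma} \log(p_t),

where $p_t$ is the predicted probability of the true class and $\alpha_t$, $\gamma$ are balancing hyperparameters.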
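The abstract also does not specify which attention mechanism is inserted into the PointNet++ backbone. As a minimal sketch, assuming a squeeze-and-excitation style channel attention over per-point features (a common choice, not necessarily the one used in the thesis), it could look like the following; the ChannelAttention module and its placement are illustrative only:

    import torch
    import torch.nn as nn

    class ChannelAttention(nn.Module):
        """Squeeze-and-excitation style channel attention for point features.

        Hypothetical sketch: the thesis adds attention inside PointNet++,
        but the exact attention form is not given in the abstract.
        """
        def __init__(self, channels: int, reduction: int = 4):
            super().__init__()
            self.fc = nn.Sequential(
                nn.Linear(channels, channels // reduction),
                nn.ReLU(inplace=True),
                nn.Linear(channels // reduction, channels),
                nn.Sigmoid(),
            )

        def forward(self, x: torch.Tensor) -> torch.Tensor:
            # x: (B, C, N) per-point features from a set-abstraction layer
            w = self.fc(x.mean(dim=2))   # squeeze: average over points -> (B, C)
            return x * w.unsqueeze(-1)   # excite: reweight each feature channel

    # Usage: reweight 256-channel features for 1024 points
    feats = torch.randn(2, 256, 1024)
    out = ChannelAttention(256)(feats)   # same shape: (2, 256, 1024)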
Appears in Collections: | [Graduate Institute of Computer Science and Information Engineering] Master's and Doctoral Theses
Files in This Item:

File | Description | Size | Format | Views
index.html | | 0Kb | HTML | 32
All items in NCUIR are protected by copyright, with all rights reserved.