Thesis 107522117: Full Metadata Record

DC field | Value | Language
DC.contributor | 資訊工程學系 | zh_TW
DC.creator | 陳世翔 | zh_TW
DC.creator | Shi-Xiang Chen | en_US
dc.date.accessioned | 2020-7-28T07:39:07Z
dc.date.available | 2020-7-28T07:39:07Z
dc.date.issued | 2020
dc.identifier.uri | http://ir.lib.ncu.edu.tw:444/thesis/view_etd.asp?URN=107522117
dc.contributor.department | 資訊工程學系 | zh_TW
DC.description | 國立中央大學 | zh_TW
DC.description | National Central University | en_US
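dc.description.abstract | In recent years, the rapid rise of deep learning has brought its application to object detection and recognition toward maturity, and object detection is gradually extending to 3D applications such as self-driving cars, virtual reality, augmented reality, and robotic arms. 3D detection uses 3D images, which carry depth information that 2D images lack; yet the added depth data makes 3D object detection harder, for example in extracting depth-image features effectively, handling more complex high-dimensional data, dealing with clutter and occlusion among objects, and coping with more complex scenes. In this study, we propose a convolutional neural network (CNN) that directly estimates the position, orientation, and size of 3D objects; taking RGB and depth images as input, the network extracts features, predicts each object's class, pose, and position, and finally outputs a 3D bounding box. The network used in this study is adapted from the well-known 2D detection network YOLOv3. Our main improvements fall into two parts. First, we modify the input of YOLOv3 to take RGB and depth images, add channel attention to the Darknet-53 architecture in YOLOv3 to strengthen feature extraction, and use these features for multi-scale detection and recognition. Second, the 3D translation of an object is estimated from the distance between the object center and the camera, and the loss function is modified to include a quaternion for estimating the object's 3D rotation; the network finally predicts multi-class object probabilities together with 3D coordinates, orientation, and size, and outputs a 3D bounding box. In the experiments, we modified YOLOv3 into 6DoF YOLO so that the network predicts 3D bounding boxes. On the Falling Things dataset, we used 20,854 images, 90% as training samples and the rest as test samples, and the detection system reached an mAP of 89.33%. After a series of modifications and experimental analyses, we settled on the 6DoF SE-YOLO architecture, which increases the parameter count to about 1.014 times and the computation to about 1.002 times; tested on images at 416×416 resolution, it runs at an average of 35 images per second and reaches an mAP of 93.59%. | zh_TW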
dc.description.abstract | With the rapid rise of deep learning in recent years, its application to object detection and recognition has gradually matured, and object detection has been extending to 3D applications such as self-driving cars, virtual reality, augmented reality, and robotic arms. 3D images carry depth information that 2D images lack, but the added depth data makes 3D object detection more difficult: depth-image features must be extracted effectively, complex high-dimensional data must be handled, objects occlude one another, and scenes are cluttered. In this research, we propose a convolutional neural network (CNN) that directly estimates the position, orientation, and size of 3D objects. Given RGB and depth images as input, the network extracts features and outputs 3D bounding boxes. The model is adapted from the well-known 2D detection network YOLOv3, with two main improvements. First, we modify the input to take RGB and depth images and add channel attention to strengthen feature extraction; the resulting features are used for multi-scale detection and recognition. Second, we estimate the 3D translation by localizing the object center in the image and estimating the object's distance from the camera, and we add a quaternion term to the loss function to estimate the 3D rotation. The model predicts 3D bounding boxes together with the object class, 3D coordinates, orientation, and size. In the experiments, we modified YOLOv3 into 6DoF YOLO, which predicts 3D bounding boxes. We used 20,854 images from the Falling Things dataset, 90% as training data and the rest as test data; 6DoF YOLO achieves 89.33% mAP. After further experimental analysis, we adopted the 6DoF SE-YOLO architecture, which increases the parameter count to about 1.014 times and the computation to about 1.002 times. This model reaches 93.59% mAP, with an average execution speed of 35 frames per second on 416×416 images. | en_US
DC.subject | 3D object detection | zh_TW
DC.subject | pose estimation | zh_TW
DC.subject | quaternion | zh_TW
DC.subject | object detection | zh_TW
DC.subject | 6 degrees of freedom | zh_TW
DC.subject | 3D Object detection | en_US
DC.subject | position estimation | en_US
DC.subject | quaternion | en_US
DC.subject | Object detection | en_US
DC.subject | 6 degrees of freedom | en_US
DC.title | 3D object detection, recognition, and pose estimation with deep learning | zh_TW
dc.language.iso | zh-TW | zh-TW
DC.title | 3D Object detection, recognition, and position estimation using CNN | en_US
DC.type | master's/doctoral thesis | zh_TW
DC.type | thesis | en_US
DC.publisher | National Central University | en_US
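
The pose decoding described in the abstract (3D translation from the object center in the image plus the camera distance, 3D rotation from a quaternion, then a 3D bounding box) can be sketched as below. This is a minimal NumPy illustration under stated assumptions, not the thesis implementation: the pinhole intrinsics fx, fy, cx, cy, the function names, the (w, x, y, z) quaternion ordering, and the reading of "distance" as Euclidean distance along the viewing ray are all hypothetical choices that the record does not specify.

```python
import numpy as np


def backproject_center(u, v, distance, fx, fy, cx, cy):
    """Turn a pixel center (u, v) and a camera-to-object distance into a 3D translation.

    Assumption: `distance` is measured along the viewing ray through (u, v),
    and the camera follows a pinhole model with intrinsics fx, fy, cx, cy.
    """
    ray = np.array([(u - cx) / fx, (v - cy) / fy, 1.0])
    ray /= np.linalg.norm(ray)  # unit direction of the viewing ray
    return distance * ray


def quaternion_to_matrix(q):
    """Convert a quaternion (w, x, y, z) into a 3x3 rotation matrix."""
    q = np.asarray(q, dtype=float)
    w, x, y, z = q / np.linalg.norm(q)  # normalize so the result is a proper rotation
    return np.array([
        [1 - 2 * (y * y + z * z), 2 * (x * y - w * z), 2 * (x * z + w * y)],
        [2 * (x * y + w * z), 1 - 2 * (x * x + z * z), 2 * (y * z - w * x)],
        [2 * (x * z - w * y), 2 * (y * z + w * x), 1 - 2 * (x * x + y * y)],
    ])


def box_corners(translation, quaternion, size):
    """Return the 8 corners (8x3 array) of an oriented 3D bounding box in camera coordinates."""
    signs = np.array(
        [[sx, sy, sz] for sx in (-1, 1) for sy in (-1, 1) for sz in (-1, 1)],
        dtype=float,
    )
    local = 0.5 * signs * np.asarray(size, dtype=float)  # corners in the object frame
    rotation = quaternion_to_matrix(quaternion)
    return local @ rotation.T + np.asarray(translation, dtype=float)


# Example with made-up values: an object centered on the principal point, 2 units away.
t = backproject_center(320.0, 240.0, 2.0, fx=600.0, fy=600.0, cx=320.0, cy=240.0)
corners = box_corners(t, [1.0, 0.0, 0.0, 0.0], (1.0, 1.0, 1.0))
print(t)
print(corners)
```

Normalizing the quaternion before conversion keeps the matrix orthonormal even when a raw network output is not exactly unit length, which is one common reason to handle rotation this way.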
