Enhancing Feature Alignment for 3D Object Detection, Recognition, and Position Estimation using Deep Learning

Name

Hsiang-Te Lai (賴湘锝)

Department

Graduate Institute of Software Engineering

Thesis title

Enhancing Feature Alignment for 3D Object Detection, Recognition, and Position Estimation using Deep Learning
(強化特徵對齊的深度學習之3D物件偵測、辨識、與方位估計)

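Related theses

★ Video error concealment for large-area losses and scene changes
★ Force feedback correction and rendering in virtual haptic systems
★ Multispectral satellite image fusion and infrared image synthesis
★ A laparoscopic cholecystectomy surgery simulation system
★ Dynamically loaded multiresolution terrain modeling in flight simulation systems
★ Wavelet-based multiresolution terrain modeling and texture mapping
★ Multiresolution optical flow analysis and depth computation
★ Volume-preserving deformable modeling applied to laparoscopic surgery simulation
★ Interactive multiresolution model editing techniques
★ Wavelet-based multiresolution edge tracking for edge detection
★ Multiresolution modeling based on quadric error and attribute criteria
★ Progressive image compression based on the integer wavelet transform and grey theory
★ Tactical simulation built on dynamically loaded multiresolution terrain modeling
★ Face detection and feature extraction using spatial relations from multi-level segmentation
★ Wavelet-transform-based image watermarking and compression
★ Appearance-preserving and view-dependent multiresolution modeling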

Files

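This electronic thesis is authorized for immediate open access.
Once open, the electronic full text is licensed to users only for personal, non-profit searching, reading, and printing for the purpose of academic research.
Please comply with the Copyright Act of the Republic of China; do not reproduce, distribute, adapt, repost, or broadcast the work without authorization, to avoid violating the law.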

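Abstract (translated from Chinese)

In recent years, a large investment of research effort has brought convolutional neural networks (CNNs) for object detection and recognition close to maturity. In 3D object detection, CNNs also support many automation tasks, such as self-driving cars and automated production with factory robot arms. In these 3D applications, precise three-dimensional spatial information about the target object matters most, yet the accuracy of current CNNs in 3D pose estimation still leaves room for improvement. In this study, we therefore use region-of-interest convolution (RoI convolution), together with a further refinement of the raw depth data, to regress more accurate object classes, 3D positions, 3D sizes, and 3D rotation angles.
The network in this study is modified from the 9DoF SE-YOLO detection network developed in our laboratory and is called 9DoF ADM-YOLO. The main improvements are: (i) an align detection module (ADM) is added so that the network can adjust the size and dimensions of the anchor boxes and precisely extract the features inside each box, which makes the final regression more accurate; (ii) a raw depth data branch is added, which extracts features from the raw depth image and preserves more accurate spatial information, making spatial inference more precise.
In the experiments, we tested on the NVIDIA "Falling Things" dataset. Each image set in the dataset contains an RGB color image and a depth (D) image. There are 20 object classes with 1,000 image sets per class, for 20,000 image sets in total; 90% are used as training samples and the rest as test samples. The original 9DoF SE-YOLO detection and recognition system has an mAP of 93.59%. After a series of analyses and improvements, the final 9DoF ADM-YOLO, tested at 416×416 image resolution, runs at an average of 33 images per second and reaches an mAP of 96.84%. Compared with the spatial inference results of the original architecture, 3D position estimation improves by 14%, 3D size estimation by 4%, and 3D rotation angle estimation by 20%.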
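The dataset and performance figures quoted in the abstract imply a few derived quantities, and the short calculation below makes them explicit. It uses only numbers stated in the abstract. The variable names are illustrative, and treating the 90% split as an exact 18,000/2,000 partition is an assumption, since the abstract does not say whether the split is made per class or over the pooled data.

```python
# Figures quoted in the abstract.
NUM_CLASSES = 20
SETS_PER_CLASS = 1_000
TRAIN_PERCENT = 90
MAP_SE_YOLO = 93.59   # mAP (%) of the earlier 9DoF SE-YOLO
MAP_ADM_YOLO = 96.84  # mAP (%) of the proposed 9DoF ADM-YOLO
FPS = 33              # average speed at 416x416 input

total_sets = NUM_CLASSES * SETS_PER_CLASS
# Assumption: the 90% split is exact and taken over the whole dataset.
train_sets = total_sets * TRAIN_PERCENT // 100
test_sets = total_sets - train_sets

# Absolute mAP gain in percentage points, and average time per image.
map_gain_points = round(MAP_ADM_YOLO - MAP_SE_YOLO, 2)
ms_per_image = round(1000 / FPS, 1)

print(total_sets, train_sets, test_sets)
print(map_gain_points, ms_per_image)
```

Under these assumptions the split is 18,000 training and 2,000 test image sets, the mAP gain is 3.25 percentage points, and 33 images per second corresponds to roughly 30 ms per image.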

Abstract (English)

In recent years, sustained research effort has brought convolutional neural networks (CNNs) for object detection and recognition close to maturity. CNN techniques have been applied to many automated tasks, such as autonomous driving and automated production. In these 3D applications, precise three-dimensional spatial information about the target objects matters most. However, CNNs still need improvement in three-dimensional spatial estimation. In this study, we therefore develop a 3D detection CNN that estimates an object's class, 3D position, 3D size, and 3D rotation angle more accurately by using RoI convolution and additional low-level features extracted from the original depth data.
The proposed CNN model is modified from our previous 9DoF SE-YOLO detection network. The key improvements are (i) adding an align detection module (ADM), which enables the single-stage detector to generate region proposals and then extract precise features within those regions, so that the final regression yields more accurate results, and (ii) adding a raw depth image branch, which extracts lower-level features from the raw depth image to preserve more precise spatial information for spatial inference.
In the experiments, we used the “Falling Things” dataset presented by NVIDIA to validate the proposed CNN model. Every image pair in the dataset consists of an RGB color image and a depth (D) image. There are 20 object classes with 1,000 image pairs each, so we used 20,000 image pairs in total. Of these, 90% are used as the training set and the remainder for validation. The mAP of the previous 9DoF SE-YOLO model is 93.59%. After a series of analyses and modifications, the proposed 9DoF ADM-YOLO model reaches an mAP of 96.84% on 416×416 images at an average speed of 33 fps. Compared with the spatial inference results of the previous network, it improves 3D position estimation by 14%, 3D size estimation by 4%, and 3D rotation angle estimation by 20%.
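Improvement (ii) in the abstract can be pictured as two steps: a strided convolution turns the raw depth image into a lower-resolution feature map, and that map is concatenated channel-wise with features from the main network. The sketch below is a minimal pure-Python illustration under stated assumptions. The 3×3 averaging kernel, the stride of 2, the 9×9 input, the fusion point, and the helper names `conv3x3_stride2` and `concat_channels` are all hypothetical; the actual layer configuration of the thesis is not given in this record.

```python
# Illustrative sketch only. Kernel, stride, shapes, and function names are
# assumptions made for this example, not the configuration used in the thesis.

def conv3x3_stride2(image, kernel):
    """Valid 3x3 convolution with stride 2 over a 2D list of numbers.

    Computed as a cross-correlation, as in most deep learning frameworks.
    """
    rows, cols = len(image), len(image[0])
    out = []
    for r in range(0, rows - 2, 2):          # top-left row of each window
        out_row = []
        for c in range(0, cols - 2, 2):      # top-left column of each window
            acc = 0.0
            for dr in range(3):
                for dc in range(3):
                    acc += image[r + dr][c + dc] * kernel[dr][dc]
            out_row.append(acc)
        out.append(out_row)
    return out


def concat_channels(features_a, features_b):
    """Concatenate two lists of 2D feature maps along the channel axis."""
    shape_a = (len(features_a[0]), len(features_a[0][0]))
    shape_b = (len(features_b[0]), len(features_b[0][0]))
    if shape_a != shape_b:
        raise ValueError(f"spatial sizes differ: {shape_a} vs {shape_b}")
    return features_a + features_b


# Hypothetical raw depth image: a 9x9 ramp whose value equals the column index.
depth = [[float(c) for c in range(9)] for _ in range(9)]
mean_kernel = [[1.0 / 9.0] * 3 for _ in range(3)]

# Depth branch: one strided convolution gives a single 4x4 depth feature map.
depth_features = [conv3x3_stride2(depth, mean_kernel)]

# Stand-in for two 4x4 feature channels from the RGB backbone.
rgb_features = [[[0.0] * 4 for _ in range(4)] for _ in range(2)]

fused = concat_channels(rgb_features, depth_features)
print(len(fused), len(fused[0]), len(fused[0][0]))  # channels, height, width
```

In a deep learning framework the same two steps would typically be a strided 2D convolution layer followed by a channel-wise concatenation. The section title "depth_conv/2_concat" in the table of contents suggests a similar arrangement, though this record does not spell it out.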

Keywords (translated from Chinese)

★ deep learning
★ convolutional neural network
★ object detection
★ feature alignment
★ pose estimation

Keywords (English)

★ deep learning
★ convolution neural network
★ object detection
★ feature alignment
★ pose estimation
★ YOL

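Table of contents

Abstract (Chinese)
Abstract (English)
Contents
List of figures
List of tables
Chapter 1 Introduction
1.1 Research motivation
1.2 System overview
1.3 Features of the thesis
1.4 Thesis organization
Chapter 2 Related work
2.1 Development of CNN-based detection systems
2.2 Rotated object detection and recognition
2.3 Feature alignment techniques in detection systems
Chapter 3 Modifications to the 9DoF network architecture
3.1 9DoF SE-YOLO architecture
3.2 Network modifications based on the 9DoF SE-YOLO architecture
Chapter 4 Experimental results and discussion
4.1 Experimental equipment and development environment
4.2 Training the convolutional neural network
4.3 Evaluation criteria
4.4 Comparison of convolutional neural network architectures
4.5 Results of 9DoF ADM77-YOLO + depth_conv/2_concat
Chapter 5 Conclusions and future work
References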

Advisor

Din-Chang Tseng (曾定章)

Approval date

2021-7-28
