Master's/Doctoral Thesis 105522125: Detailed Record




Name: Kai-Nan Hsieh (謝鎧楠)    Department: Computer Science and Information Engineering
Thesis Title: Rear Obstacle Detection Using a Deep Convolutional Neural Network with RGB-D Images
(使用深度與彩色影像的卷積神經網路做倒車障礙物偵測)
Related Theses
★ Video error concealment for large damaged areas and scene changes
★ Force feedback correction and rendering in a virtual haptic system
★ Multispectral satellite image fusion and infrared image synthesis
★ A laparoscopic cholecystectomy surgery simulation system
★ Dynamically loaded multiresolution terrain modeling in a flight simulation system
★ Wavelet-based multiresolution terrain modeling and texture mapping
★ Multiresolution optical flow analysis and depth computation
★ Volume-preserving deformation modeling for laparoscopic surgery simulation
★ Interactive multiresolution model editing techniques
★ Wavelet-based multiresolution edge tracking for edge detection
★ Multiresolution modeling based on quadric error and attribute criteria
★ Progressive image compression based on integer wavelet transform and grey theory
★ Tactical simulation built on dynamically loaded multiresolution terrain modeling
★ Face detection and feature extraction using spatial relations of multilevel segmentation
★ Wavelet-based image watermarking and compression
★ Appearance-preserving and view-dependent multiresolution modeling
Files
  1. This electronic thesis has been approved for immediate open access.
  2. The open-access electronic full text is licensed to users solely for personal, non-profit retrieval, reading, and printing for the purpose of academic research.
  3. Please comply with the relevant provisions of the Copyright Act of the Republic of China (Taiwan); do not reproduce, distribute, adapt, repost, or broadcast the work without authorization, to avoid infringement.

Abstract (Chinese): After the automobile became a means of transportation that people depend on, many vehicle-related accidents followed. Collisions while reversing, caused by the driver failing to notice the area behind the vehicle, are among the most frequent. To reduce such accidents, computer vision detection and recognition techniques can be used to understand the scene behind the vehicle and remind the driver to watch for hazards there. Thanks to recent developments in convolutional neural networks (CNN), detection and recognition in computer vision have become more accurate and stable than before. We use deep learning to train a vision system that finds objects likely to pose a danger while reversing, and we use depth information, acquired with a 3D camera, to measure the distance between each object and the vehicle, judge whether a collision may occur, and warn the driver. The depth information also serves as auxiliary evidence for deciding whether a solid object that could cause a reversing accident is present in the image.
Because the color camera module and the depth camera module of the Kinect 3D camera differ in position and field of view (FOV), we first align the captured color and depth images with the Kinect SDK; otherwise, the bounding boxes drawn when preparing the training data would be misaligned and introduce training errors. After the training data are prepared, we modify the input stage of Faster R-CNN (Faster Regions with Convolutional Neural Networks) so that the network accepts a four-channel (RGB-D) input combining the depth image with the color image. Our experiments compare obstacle detection across different inputs: color images, depth images, and four-channel RGB-D images, as well as two network architectures that extract features from the color and depth images in different ways. Once an obstacle is found, its distance from the vehicle is computed from the depth image. The final results show that the most effective four-channel scheme extracts feature maps from the color image and the depth image through separate convolutional layers, concatenates the two feature maps, and feeds the concatenated result into the fully connected layers for the final detection and recognition.
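To make the two input schemes concrete, below is a minimal sketch of both backbones, written in PyTorch purely for illustration; the class names, layer widths, and depths are assumptions of this sketch, not the network actually trained in the thesis.

    # Minimal illustrative sketch (assumed PyTorch implementation); names and
    # layer widths are this sketch's assumptions, not the thesis network.
    import torch
    import torch.nn as nn

    def make_branch(in_ch):
        # Small convolutional feature extractor; a real system would put a
        # pretrained backbone (e.g., VGG or ZF) in this role.
        return nn.Sequential(
            nn.Conv2d(in_ch, 64, kernel_size=7, stride=2, padding=3),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=3, stride=2, padding=1),
            nn.Conv2d(64, 256, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
        )

    class FourChannelBackbone(nn.Module):
        """4D-input scheme: one extractor fed a stacked RGB-D tensor."""
        def __init__(self):
            super().__init__()
            self.features = make_branch(4)      # 4 channels: R, G, B, D

        def forward(self, rgbd):                # rgbd: (N, 4, H, W)
            return self.features(rgbd)          # one shared feature map

    class TwoStreamBackbone(nn.Module):
        """2-input scheme: separate color/depth streams, concatenated maps."""
        def __init__(self):
            super().__init__()
            self.color_branch = make_branch(3)  # RGB stream
            self.depth_branch = make_branch(1)  # depth stream

        def forward(self, rgb, depth):          # (N,3,H,W) and (N,1,H,W)
            f_rgb = self.color_branch(rgb)
            f_d = self.depth_branch(depth)
            # Channel-wise concatenation feeds the RPN and the FC head
            return torch.cat([f_rgb, f_d], dim=1)

    # Toy forward pass at the Kinect v2 depth resolution (512 x 424)
    feats = TwoStreamBackbone()(torch.randn(1, 3, 424, 512),
                                torch.randn(1, 1, 424, 512))

Keeping the streams separate until concatenation lets the color and depth modalities learn their own low-level filters before fusion, which is consistent with the finding above that the two-extractor design detected best.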
Abstract (English): Car accidents happen frequently now that the automobile has become the most popular means of transportation in daily life, and driver negligence costs lives and property. Many motor manufacturers have therefore invested in developing driver assistance systems to improve driving safety. Computer vision (CV) has been adopted for such systems because of its object detection and recognition capabilities, and the rapid development of convolutional neural networks (CNN) in recent years has made computer vision considerably more reliable.
We train our rear obstacle detection and recognition system with a deep learning model, using color and depth images captured by a Microsoft Kinect v2. Because the fields of view (FOV) of the Kinect v2 color and depth cameras differ, we calibrate the color and depth images with the Kinect SDK to reduce the disparity in pixel positions. The detection and recognition system is based on Faster R-CNN. The input consists of two images, and we experiment with two convolutional architectures for extracting feature maps from the input: one with a single feature extractor and a single classifier, and one with two feature extractors and a single classifier. The two-feature-extractor design produces the best detection results. We also run experiments with only a color image or only a depth image as input and compare them with the two methods above. Finally, after detecting an obstacle, we use the depth image to estimate the distance between the vehicle and the obstacle.
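Neither abstract spells out how the depth image becomes a distance figure. One plausible, minimal reading is to take a robust statistic of the valid depth pixels inside the detected bounding box, sketched below; the function name and the choice of the median are assumptions of this sketch, not the author's method.

    # Hypothetical distance estimate: median of the valid depth pixels inside
    # a detected box; one plausible reading, not the thesis's stated formula.
    import numpy as np

    def obstacle_distance_m(depth_mm, box):
        """depth_mm: (H, W) aligned depth image in millimeters (the Kinect v2
        reports depth in mm); box: (x1, y1, x2, y2) detection rectangle."""
        x1, y1, x2, y2 = box
        roi = depth_mm[y1:y2, x1:x2]
        valid = roi[roi > 0]               # 0 marks pixels with no reading
        if valid.size == 0:
            return None                    # no reliable depth in the box
        return float(np.median(valid)) / 1000.0   # millimeters -> meters

    # Example on a synthetic 424 x 512 frame filled with 2.5 m readings
    depth = np.full((424, 512), 2500, dtype=np.uint16)
    print(obstacle_distance_m(depth, (100, 120, 200, 300)))   # 2.5

The median is used here because Kinect depth maps contain holes (zero readings) and speckle outliers that would skew a plain average.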
Keywords (Chinese) ★ Convolutional neural networks
★ Depth and color images
★ Obstacle detection
Keywords (English) ★ Faster R-CNN
★ Rear Obstacle Detection
Table of Contents
Abstract (in Chinese) ii
Abstract (in English) iii
Acknowledgments iv
Table of Contents v
List of Figures vii
List of Tables x
Chapter 1  Introduction 1
1.1 Research Motivation 1
1.2 System Architecture 2
1.3 System Features 6
1.4 Thesis Organization 7
Chapter 2  Related Work 8
2.1 Obstacle Detection 8
2.1.1 Monocular Motion Information 8
2.1.2 Machine Learning on Static Information 10
2.1.3 Binocular Stereo Vision 11
2.2 Object Detection with Convolutional Neural Networks 13
2.2.1 Convolutional Neural Networks 13
2.2.2 Region-based Convolutional Neural Networks (R-CNN) 14
2.2.3 Fast R-CNN 14
2.3 Convolutional Neural Networks with Four-Channel Input Images 15
2.3.1 Depth Image Acquisition 15
2.3.2 Four-Channel-Input CNN Architectures 16
Chapter 3  Faster R-CNN 20
3.1 Overview of Faster R-CNN 20
3.2 Region Proposal Network 21
3.3 Region-of-Interest Pooling 24
3.4 Loss Function 26
Chapter 4  Faster R-CNN with Depth and Color Images 28
4.1 Network Architecture Variants 28
4.1.1 4D-input Faster R-CNN 29
4.1.2 2-input Faster R-CNN 30
4.2 Obstacle Detection 32
4.3 Obstacle Distance Estimation 32
Chapter 5  Experimental Results 34
5.1 Experimental Environment and Equipment 34
5.2 Faster R-CNN Experimental Results 35
5.2.1 Modifications to Faster R-CNN 36
5.2.2 Training Data Collection 36
5.2.3 Training Method 39
5.2.4 Results and Analysis 40
Chapter 6  Conclusions and Future Work 57
References 58
Advisor: Din-Chang Tseng (曾定章)    Date of Approval: 2018-08-01