Master's/Doctoral Thesis 107522601: Detailed Record




Name: Chanwit Loakhajorn (羅昌威)    Department: Computer Science and Information Engineering
Thesis Title: Automatic Door Detection based on Graph Convolution Network
(Chinese title: 基於圖卷積網路的自動門檢測)
Related Theses:
★ An Attention-Based Semantic Segmentation Method for Object Localization
★ Classifying Dynamic Patterns in Images with a Deep Learning Method for Multimodal Spatio-Temporal Modeling
★ A Method for Generating and Summarizing Learning Content Based on Vocational-Skill and Educational Videos
Files
  1. This electronic thesis is approved for immediate open access.
  2. The open-access full text is licensed to users only for personal, non-commercial retrieval, reading, and printing for the purpose of academic research.
  3. Please comply with the relevant provisions of the Copyright Act of the Republic of China (Taiwan); do not reproduce, distribute, adapt, repost, or broadcast this work without authorization.

Abstract (Chinese)
This research can guide a robot to find an entrance or automatic door, and can likewise guide a blind person from a distance to the vicinity of an automatic door. Today's object detection models are very powerful, with high accuracy and high frame rates, but the main problem is that they have difficulty distinguishing the automatic doors of convenience stores from glass walls, so we use a Graph Convolutional Network (GCN) to improve accuracy. Our idea is to use a GCN model to recognize automatic doors through the surrounding objects. The system is divided into two parts: object detection and object association. For object detection, we use an existing model with high accuracy and a high frame rate. YOLOv4 is the best object detection model proposed this year, but YOLOv4 alone still produces a large number of false detections, so our proposed method is needed to improve it. For object association, we propose fully connected layers combined with a GCN. Before association begins, the object detection results must first be converted into a graph structure. We train on a GTX 1080 and test the model on an AGX. Our dataset is self-made: we collected Google Street View imagery and recorded street-view videos in Taiwan. The dataset consists of more than 100 convenience stores, covering both indoor and outdoor environments, and the GCN allows us to reduce false detections. We achieved 86% accuracy on our test set. Tested on video and in real environments, our model runs at around 5 FPS, showing that the proposed model can find automatic doors. To confirm that our model solves the problem, we present the experimental results and show how the model works.
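The abstract notes that the object detection results are converted into a graph structure before association, without giving details here. As a minimal illustrative sketch only (the function name detections_to_graph, the feature layout, and the dist_thresh threshold are assumptions, not the thesis's actual code), one plausible conversion treats each detection as a node, uses a one-hot class vector plus box geometry as node features, and links detections whose box centers lie close together:

    import numpy as np

    # Hypothetical detection format: (class_id, x_center, y_center, width, height),
    # with coordinates normalized to [0, 1] as in YOLO-style output.
    def detections_to_graph(detections, num_classes, dist_thresh=0.35):
        """Build node features and an adjacency matrix from detector output.

        Nodes are detected objects; an edge links two objects whose box
        centers are within dist_thresh (an assumed hyperparameter).
        """
        n = len(detections)
        feats = np.zeros((n, num_classes + 4), dtype=np.float32)
        adj = np.eye(n, dtype=np.float32)          # self-loops, as in Kipf & Welling
        for i, (cls, x, y, w, h) in enumerate(detections):
            feats[i, cls] = 1.0                    # one-hot class label
            feats[i, num_classes:] = [x, y, w, h]  # box geometry as extra features
        for i in range(n):
            for j in range(i + 1, n):
                dx = detections[i][1] - detections[j][1]
                dy = detections[i][2] - detections[j][2]
                if (dx * dx + dy * dy) ** 0.5 < dist_thresh:
                    adj[i, j] = adj[j, i] = 1.0
        return feats, adj

The self-loops anticipate the symmetric normalization used in the GCN sketch after the English abstract below.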
Abstract (English)
The purpose of this research is to help robots navigate to doors and to help blind people find entrances and exits, even from across a room. Modern object detection models are very powerful, with high accuracy and high frame rates, but the main problem is that they have difficulty distinguishing glass doors from glass walls in convenience stores. To solve this problem, we use a Graph Convolutional Network (GCN) to improve accuracy. The idea is to use the GCN model to identify the entrance from the surrounding objects. The system consists of two parts: an object detector and an association part. For the object detector, we take advantage of public models that offer high accuracy and high frame rates. YOLOv4, proposed this year, is the state of the art compared with previous models, but it still produces false detections, so our proposed model is needed to correct them. The association part is our proposed model: fully connected (FC) layers combined with a GCN. The output of the object detector must first be converted into a graph structure before it is passed to the association part. We train on a GTX 1080 and test the models in real time on an AGX board. Our dataset is custom-built, collected from Google Street View and from videos recorded in Taiwan. It consists of more than 100 convenience stores, covering both indoor and outdoor environments, and with it our model reduces the object detector's false detections. We achieve 86% accuracy on our test set. Running on demo videos, the system reaches around 5 FPS, showing that the proposed model can find doors. To confirm that the model solves the problem, we present the experiments and show how the model works.
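The association part (FC layers combined with a GCN) is described above only at a high level. The following PyTorch sketch shows the general shape such a model could take, using the propagation rule of Kipf and Welling [18], which the thesis cites; the two-layer depth, hidden size, and per-node door/not-door output are illustrative assumptions rather than the thesis's actual architecture:

    import torch
    import torch.nn as nn

    class GCNLayer(nn.Module):
        """One graph convolution: H' = ReLU(D^-1/2 (A+I) D^-1/2 H W)."""
        def __init__(self, in_dim, out_dim):
            super().__init__()
            self.linear = nn.Linear(in_dim, out_dim)

        def forward(self, feats, adj):
            # Symmetric normalization; assumes self-loops are already in adj,
            # so every degree is at least 1.
            d_inv_sqrt = adj.sum(dim=1).pow(-0.5)
            norm_adj = adj * d_inv_sqrt.unsqueeze(0) * d_inv_sqrt.unsqueeze(1)
            return torch.relu(self.linear(norm_adj @ feats))

    class DoorAssociationNet(nn.Module):
        """Hypothetical association model: two GCN layers followed by FC
        layers that classify each node (detected object) as door / not door."""
        def __init__(self, in_dim, hidden=64, num_classes=2):
            super().__init__()
            self.gcn1 = GCNLayer(in_dim, hidden)
            self.gcn2 = GCNLayer(hidden, hidden)
            self.fc = nn.Sequential(nn.Linear(hidden, hidden), nn.ReLU(),
                                    nn.Linear(hidden, num_classes))

        def forward(self, feats, adj):
            h = self.gcn1(feats, adj)
            h = self.gcn2(h, adj)
            return self.fc(h)  # per-node logits

Under these assumptions the two sketches compose directly: feats, adj = detections_to_graph(dets, num_classes) builds the inputs, and DoorAssociationNet(in_dim=num_classes + 4)(torch.from_numpy(feats), torch.from_numpy(adj)) returns per-object logits.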
Keywords (Chinese)
★ Deep Learning (深度學習)
★ Computer Vision (電腦視覺)
★ Graph Convolutional Network (圖卷積網路)
★ Automatic Door Detection (自動門偵測)
Keywords (English)
★ Deep Learning
★ Computer Vision
★ Graph Convolution Network
★ Door Detection
Table of Contents
Abstract (Chinese)
Abstract (English)
Acknowledgments
Table of Contents
List of Figures
List of Tables
Chapter 1 Introduction
  1.1 Overview
  1.2 Problem Definition
  1.3 Scope and Limitations
  1.4 Thesis Structure
Chapter 2 Related Work
  2.1 Deep Learning
    2.1.1 Convolutional Neural Networks (CNN)
    2.1.2 Fully Connected Layer (FC)
    2.1.3 Activation Functions
    2.1.4 Dropout Layer
  2.2 Object Detection Models
    2.2.1 YOLOv4: Optimal Speed and Accuracy of Object Detection
    2.2.2 The YOLOv3 Model
    2.2.3 Spatial Relation Recognition (SRR)
  2.3 Finding Relations Using a Graph Convolutional Network (GCN)
Chapter 3 Proposed Approach
  3.1 Object Detection Part
  3.2 Association Part
  3.3 Dataset
Chapter 4 Experimental Results
  4.1 Analysis of the Object Detector Part
  4.2 Analysis of the Association Part
  4.3 Analysis of the Combined System
  4.4 Analysis of Results
Chapter 5 Conclusion and Future Work
  5.1 Conclusion
  5.2 Future Work
References
References
[1] R. Girshick, "Fast R-CNN," 2015 IEEE International Conference on Computer Vision (ICCV), Santiago, 2015, pp. 1440-1448.
[2] K. He, G. Gkioxari, P. Dollár and R. Girshick, "Mask R-CNN," 2017 IEEE International Conference on Computer Vision (ICCV), Venice, 2017, pp. 2980-2988.
[3] J. Redmon, S. Divvala, R. Girshick and A. Farhadi, "You Only Look Once: Unified, Real-Time Object Detection," 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, 2016, pp. 779-788.
[4] J. Redmon and A. Farhadi, "YOLO9000: Better, Faster, Stronger," 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, 2017, pp. 6517-6525.
[5] J. Redmon and A. Farhadi, "YOLOv3: An Incremental Improvement," 2018, arXiv preprint arXiv:1804.02767.
[6] S. Liu, L. Qi, H. Qin, J. Shi and J. Jia, "Path Aggregation Network for Instance Segmentation," 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Salt Lake City, UT, 2018, pp. 8759-8768.
[7] V. Nair and G. E. Hinton, "Rectified Linear Units Improve Restricted Boltzmann Machines," Proceedings of the International Conference on Machine Learning (ICML), 2010, pp. 807-814.
[8] S. Jadon, "Survey on Activation Functions for Deep Learning," retrieved 9 July 2020 from https://medium.com/@shrutijadon10104776/survey-on-activation-functions-for-deep-learning-9689331ba092
[9] N. Srivastava, G. Hinton, A. Krizhevsky, I. Sutskever and R. Salakhutdinov, "Dropout: A Simple Way to Prevent Neural Networks from Overfitting," Journal of Machine Learning Research, vol. 15, no. 1, 2014, pp. 1929-1958.
[10] Educative.io, "What is dropout in neural networks?," retrieved 9 July 2020 from https://www.educative.io/edpresso/what-is-dropout-in-neural-networks
[11] S. Ren, K. He, R. Girshick and J. Sun, "Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 39, no. 6, pp. 1137-1149, June 2017.
[12] C.-Y. Wang, H.-Y. M. Liao, Y.-H. Wu, P.-Y. Chen, J.-W. Hsieh and I-H. Yeh, "CSPNet: A New Backbone That Can Enhance Learning Capability of CNN," Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPR Workshops), 2020, arXiv preprint arXiv:1911.11929.
[13] K. He, X. Zhang, S. Ren and J. Sun, "Spatial Pyramid Pooling in Deep Convolutional Networks for Visual Recognition," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 37, no. 9, pp. 1904-1916, Sept. 2015.
[14] A. Kathuria, "What's new in YOLO v3?," retrieved 9 July 2020 from https://towardsdatascience.com/yolo-v3-object-detection-53fb7d3bfe6b
[15] K. Yang, O. Russakovsky and J. Deng, "SpatialSense: An Adversarially Crowdsourced Benchmark for Spatial Relation Recognition," 2019, arXiv preprint arXiv:1908.02660.
[16] D. Duvenaud, D. Maclaurin, J. Aguilera-Iparraguirre, R. Gómez-Bombarelli, T. Hirzel, A. Aspuru-Guzik and R. P. Adams, "Convolutional Networks on Graphs for Learning Molecular Fingerprints," Advances in Neural Information Processing Systems 28, 2015, arXiv preprint arXiv:1509.09292.
[17] T. Kipf, "Graph Convolutional Networks," retrieved 9 July 2020 from https://tkipf.github.io/graph-convolutional-networks/
[18] T. N. Kipf and M. Welling, "Semi-Supervised Classification with Graph Convolutional Networks," International Conference on Learning Representations (ICLR), 2017, arXiv preprint arXiv:1609.02907.
[19] S. Ioffe and C. Szegedy, "Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift," 2015, arXiv preprint arXiv:1502.03167.
[20] W. He, Z. Huang, Z. Wei, C. Li and B. Gu, "TF-YOLO: An Improved Incremental Network for Real-Time Object Detection," Applied Sciences, vol. 9, no. 16, 2019, art. 3225; doi:10.3390/app9163225.
Advisor: Prof. Timothy K. Shih (施國琛)    Date of Approval: 2020-07-14