Master's/Doctoral Thesis 106522088: Detailed Record




Name: Chen-Ya Hong (洪晨雅)    Department: Computer Science and Information Engineering
Thesis Title: Object Detection for Signboard Recognition and Semi-Automatic Ground Truth Generator
Related Theses
★ A grouping mechanism based on social relationships in edX online discussion boards
★ A 3D-visualized Facebook interaction system built with Kinect
★ An assessment system for smart classrooms built with Kinect
★ An intelligent metropolitan route planning mechanism for mobile device applications
★ Dynamic texture transfer based on analyzing the correlation of key momenta
★ A seam carving system that preserves straight-line structures in images
★ A community recommendation mechanism based on an open online community learning environment
★ System design of an interactive situated learning environment for English as a foreign language
★ An emotional color transfer mechanism based on skin color preservation
★ A gesture recognition framework for virtual keyboards
★ Error analysis of fractional-power grey generating prediction models and the development of a computer toolbox
★ Real-time human skeleton motion construction using inertial sensors
★ Real-time 3D modeling based on multiple cameras
★ A grouping mechanism for genetic algorithms based on complementarity and social network analysis
★ A virtual musical instrument performance system with real-time hand tracking
★ A real-time virtual musical instrument performance system based on neural networks
  1. The author has agreed to make the electronic full text of this thesis openly available immediately.
  2. The open-access electronic full text is licensed only for personal, non-profit retrieval, reading, and printing for the purpose of academic research.
  3. Please comply with the Copyright Act of the Republic of China (Taiwan); do not reproduce, distribute, adapt, repost, or broadcast the work without authorization.

Abstract: Data-driven object detection techniques are widely applied in a variety of practical areas. Nowadays, many research projects aim to improve the accuracy of computer vision applications. In this thesis, we propose an automatic signboard detection method and a semi-automatic ground truth generation method to help visually impaired people walk on the streets of Taiwan. We consider that when visually impaired people walk down the street, they may be interested in certain stores. However, there is no sufficiently large public dataset of Taiwanese store signboards. Therefore, we collect images of 14 kinds of stores that are most common in people's daily lives. The collected street images number over 9 million, gathered from several major cities in Taiwan; however, only about 1% of them (roughly 90,000 images) contain a signboard. We therefore propose an object detection model that pre-labels uncertain samples, and based on this model we design a process by which semi-automatic ground truth generation can be achieved.
Our proposed object detection network is based on Darknet-19, and we improve it by introducing several techniques: the dilated block, the non-local block, and channel attention. The dilated block and the non-local block are introduced to enlarge the receptive field so that the network captures more information and its accuracy improves. We also introduce a channel attention mechanism that assigns different weights to the feature maps of different channels, which improves accuracy further. The proposed object detection network achieves an accuracy of 91% at a speed of 21 FPS.
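The record contains no source code; the following PyTorch modules are only a minimal sketch of what the three named blocks typically look like, with channel counts, residual wiring, activation choices, and the reduction ratio assumed for illustration rather than taken from the thesis.

# Illustrative sketches of the three blocks named in the abstract.
# All sizes and wiring here are assumptions, not the thesis's definitions.
import torch
import torch.nn as nn

class DilatedBlock(nn.Module):
    # A 3x3 convolution with dilation > 1 enlarges the receptive field
    # without downsampling the feature map (cf. [22]).
    def __init__(self, channels, dilation=2):
        super().__init__()
        self.conv = nn.Conv2d(channels, channels, kernel_size=3,
                              padding=dilation, dilation=dilation)
        self.act = nn.LeakyReLU(0.1)  # Darknet-style activation

    def forward(self, x):
        return self.act(self.conv(x)) + x  # residual wiring is an assumption

class NonLocalBlock(nn.Module):
    # Embedded-Gaussian non-local operation (cf. [23]): every spatial
    # position attends to all positions, giving a global receptive field.
    def __init__(self, channels):
        super().__init__()
        inter = channels // 2
        self.theta = nn.Conv2d(channels, inter, 1)
        self.phi = nn.Conv2d(channels, inter, 1)
        self.g = nn.Conv2d(channels, inter, 1)
        self.out = nn.Conv2d(inter, channels, 1)

    def forward(self, x):
        b, _, h, w = x.shape
        q = self.theta(x).view(b, -1, h * w)                 # B x C/2 x HW
        k = self.phi(x).view(b, -1, h * w)                   # B x C/2 x HW
        v = self.g(x).view(b, -1, h * w)                     # B x C/2 x HW
        attn = torch.softmax(q.transpose(1, 2) @ k, dim=-1)  # B x HW x HW
        y = (v @ attn.transpose(1, 2)).view(b, -1, h, w)
        return x + self.out(y)  # residual sum, as in the non-local paper

class ChannelAttention(nn.Module):
    # Squeeze-and-excitation-style channel weighting, in the spirit of
    # CBAM's channel branch [24]: each channel's feature map is rescaled.
    def __init__(self, channels, reduction=16):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction), nn.ReLU(),
            nn.Linear(channels // reduction, channels), nn.Sigmoid())

    def forward(self, x):
        b, c, _, _ = x.shape
        weights = self.fc(self.pool(x).view(b, c)).view(b, c, 1, 1)
        return x * weights  # a different weight per channel, as described

In such a design the blocks would usually be appended to the later stages of the Darknet-19 backbone, where the feature maps are small enough for the HW x HW attention matrix of the non-local block to be affordable.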
The semi-automatic ground truth generation method comprises several applications: a Google Maps tool, the proposed detection network, and an editing tool. The Google Maps tool collects street images as raw data; the detection network filters out the images that contain signboards; and the editing tool is used to verify the correctness of the filtered images.
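As a rough sketch of the filtering step just described (not the thesis's actual tooling), a detector with a confidence threshold can decide which raw street images are worth sending to the human editing tool; the detector object, its detect method, the file layout, and the 0.5 threshold are all hypothetical.

from pathlib import Path
import shutil

def filter_candidates(detector, raw_dir, out_dir, threshold=0.5):
    # Copy every street image with at least one confident signboard
    # detection into out_dir, where the editing tool queues it for a
    # human to verify; everything else leaves the pipeline.
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    kept = 0
    for img_path in sorted(Path(raw_dir).glob("*.jpg")):
        detections = detector.detect(str(img_path))  # assumed API:
        # returns a list of (label, score, bbox) tuples
        if any(score >= threshold for _, score, _ in detections):
            shutil.copy(img_path, out / img_path.name)
            kept += 1
    return kept  # expected to be roughly 1% of the raw images

Since only about 1% of the raw images contain a signboard, this automatic pass removes the bulk of the manual inspection work before a human ever sees an image.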
The purpose of this thesis is to propose a method of ground truth data collection that substantially reduces the effort required in terms of time and human resources.
Keywords ★ deep learning
★ object detection
★ signboard recognition
Table of Contents
1. Introduction
2. Related Work
2.1 Feature Extractor
2.1.1 AlexNet
2.1.2 VGG
2.1.3 ResNet
2.1.4 MobileNets and SqueezeNet
2.2 Object Detection Networks
2.2.1 The R-CNN Series (Region-based Convolutional Neural Networks)
2.2.2 Single Shot MultiBox Detector
2.2.3 The YOLO Series
2.3 Feature Pyramid Network
2.4 DetNet
2.5 LabelImg
3. Proposed Method
3.1 Proposed Object Detection Network
3.2 Dilated Block
3.3 Non-local Block
3.4 Channel Attention
4. Efficient Data Collection
4.1 Preprocessing for Training Data
4.2 Google Maps Tool
4.3 Detection Method
4.4 Editing Tool
5. Experimental Results
5.1 Implementation
5.1.1 Training
5.1.2 Testing
5.2 Comparison of Basic Networks
5.3 Concatenation and Addition
5.4 Max Pooling and Space to Depth
5.5 Experiments on the Dilated Block
5.6 Experiments on the Non-local Block
5.7 Channel Attention
5.8 Problems
6. Conclusion
7. Future Work
8. References
References
[1] Tzutalin, “LabelImg,” Git code, 2015. Available: https://github.com/tzutalin/labelImg
[2] A. Krizhevsky, I. Sutskever and G. E. Hinton, “ImageNet classification with deep convolutional neural networks,” The 25th International Conference on Neural Information Processing Systems (NIPS'12), Lake Tahoe, U.S.A., 2012, vol. 1, pp. 1097-1105.
[3] K. Simonyan and A. Zisserman, “Very deep convolutional networks for large-scale image recognition,” arXiv preprint, arXiv:1409.1556, 2014.
[4] C. Szegedy, W. Liu, Y. Jia, P. Sermanet, S. E. Reed, D. Anguelov, D. Erhan, V. Vanhoucke and A. Rabinovich, “Going deeper with convolutions,” 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Boston, U.S.A., 2015, pp. 1-9.
[5] K. He, X. Zhang, S. Ren and J. Sun, "Deep residual learning for image recognition," 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, U.S.A., 2016, pp. 770-778.
[6] A. G. Howard, M. Zhu, B. Chen, D. Kalenichenko, W. Wang, T. Weyand, M. Andreetto and H. Adam, “MobileNets: Efficient convolutional neural networks for mobile vision applications,” arXiv preprint, arXiv:1704.04861, 2017.
[7] F. N. Iandola, S. Han, M. W. Moskewicz, K. Ashraf, W. J. Dally and K. Keutzer, “SqueezeNet: AlexNet-level accuracy with 50x fewer parameters and <0.5MB model size,” arXiv preprint, arXiv:1602.07360, 2016.
[8] A. Wong, M. J. Shafiee, F. Li and B. Chwyl, “Tiny SSD: A tiny single-shot detection deep convolutional neural network for real-time embedded object detection,” arXiv preprint, arXiv:1802.06488, 2018.
[9] W. Liu, D. Anguelov, D. Erhan, C. Szegedy, S. Reed, C. Fu and A. C. Berg, “SSD: Single shot multibox detector,” The 14th European Conference on Computer Vision (ECCV), Amsterdam, the Netherlands, 2016, pp. 21-37.
[10] J. Redmon, S. Divvala, R. Girshick and A. Farhadi, "You only look once: unified, real-time object detection," 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, U.S.A., 2016, pp. 779-788.
[11] R. Girshick, J. Donahue, T. Darrell and J. Malik, "Rich feature hierarchies for accurate object detection and semantic segmentation," 2014 IEEE Conference on Computer Vision and Pattern Recognition, Columbus, U.S.A., 2014, pp. 580-587.
[12] J. R. R. Uijlings, K. E. A. van de Sande, T. Gevers and A. W. M. Smeulders, “Selective search for object recognition,” International Journal of Computer Vision, vol. 104, no. 2, pp. 154-171, September 2013.
[13] R. Girshick, “Fast R-CNN,” 2015 IEEE International Conference on Computer Vision (ICCV), Santiago, Chile, 2015, pp. 1440-1448.
[14] S. Ren, K. He, R. Girshick and J. Sun, "Faster R-CNN: Towards real-time object detection with region proposal networks," in IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 39, no. 6, pp. 1137-1149, 1 June 2017.
[15] T. Lin, P. Dollár, R. Girshick, K. He, B. Hariharan and S. Belongie, “Feature pyramid networks for object detection,” 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, U.S.A., 2017, pp. 2117-2125.
[16] J. Redmon and A. Farhadi, "YOLO9000: Better, faster, stronger," 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, U.S.A., 2017, pp. 6517-6525.
[17] M. Lin, Q. Chen and S. Yan, “Network In Network,” arXiv preprint, arXiv:1312.4400, 2014.
[18] J. Redmon and A. Farhadi, “YOLOv3: An Incremental Improvement,” arXiv preprint, arXiv:1804.02767, 2018.
[19] T. Lin, P. Goyal, R. Girshick, K. He and P. Dollár, “Focal loss for dense object detection,” 2017 IEEE International Conference on Computer Vision (ICCV), Venice, Italy, 2017, pp. 2980-2988.
[20] K. He, X. Zhang, S. Ren and J. Sun, “Spatial pyramid pooling in deep convolutional networks for visual recognition,” The 13th European Conference on Computer Vision (ECCV), Zurich, Switzerland, 2014, pp. 346-361.
[21] Z. Li, C. Peng, G. Yu, X. Zhang, Y. Deng and J. Sun, “DetNet: Design backbone for object detection,” The European Conference on Computer Vision (ECCV), 2018, pp. 334-350.
[22] F. Yu and V. Koltun, “Multi-scale context aggregation by dilated convolutions,” arXiv preprint, arXiv:1511.07122, 2015.
[23] X. Wang, R. Girshick, A. Gupta and K. He, “Non-local neural networks,” The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2018, pp. 7794-7803.
[24] S. Woo, J. Park, J. Lee and I. S. Kweon, “CBAM: Convolutional block attention module,” The European Conference on Computer Vision (ECCV), 2018, pp. 3-19.
Advisor: Timothy K. Shih (施國琛)    Date of Approval: 2019-7-15
