Signboard Detection and Recognition Deep Learning Modeling Based on Multiscale Prediction and CycleGAN


Name

Kuan-Hung Lin (林冠宏)

Department

Department of Computer Science and Information Engineering

Thesis title

Signboard detection and recognition deep learning modeling based on multiscale prediction and CycleGAN
(基於多尺度預測和循環對抗網路的招牌檢測與識別方法之研製)



Access rights for this electronic thesis: approved for immediate open access.

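Abstract (translated from the Chinese)

Object detection is a popular area of computer vision, and the technique is used in many fields. Raising prediction accuracy while preserving execution speed is a major challenge in object detection. Many experts and scholars have worked on this task and proposed numerous methods, so object detection methods have steadily matured.
Most object detection datasets have quite complex backgrounds, which cause models to miss target objects or to make false detections. Many methods have been proposed to address missed detections, such as feature pyramid networks, multi-scale prediction, and attention modules, but very few address the misclassification of background as a target object. In this thesis we propose an object detection model with a two-phase training scheme for use on the Taiwan street-view signboard dataset. The method adds some semantic segmentation techniques without requiring pixel-level labels, and it resolves the misjudgments caused by the highly similar shapes of most signboards. We further refine the method into a one-stage object detection model, making its predictions more stable and the model easier to train.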

Abstract (English)

Object detection is a popular computer vision task in deep learning, and the technique is widely used in many fields. Improving the precision of a model while keeping inference time low is a major challenge. Many experts and scholars have worked on this problem and proposed numerous methods, making object detection increasingly mature.
The scenes in most object detection datasets are complicated, so a model may fail to detect objects or may mistake background for an object. To reduce missed detections, many methods have been proposed, such as Feature Pyramid Networks, multi-scale prediction, and attention modules. However, few methods prevent models from misjudging non-objects as objects. In this thesis, we propose a two-phase training method for the Taiwan Street View Signboard Dataset. The model incorporates techniques from segmentation without requiring pixel-to-pixel labeling, resolving misjudgments caused by the similar shapes of various signboards. We further improve the method into a one-stage detection model, making it more stable and easier to train.
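
The abstract separates two kinds of detection error: missed objects (false negatives) and background mistaken for an object (false positives). The Python sketch below illustrates how the two are commonly counted by matching predicted boxes to ground-truth boxes with intersection over union (IoU). It is a minimal illustration and not the thesis's evaluation code: the function names, the greedy matching order, and the 0.5 IoU threshold are all assumptions made for this example.

```python
from typing import List, Tuple

# A box is (x1, y1, x2, y2) with x1 < x2 and y1 < y2 (assumed convention).
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def count_errors(preds: List[Box], truths: List[Box],
                 thresh: float = 0.5) -> Tuple[int, int, int]:
    """Return (true positives, false positives, false negatives).

    Each prediction is greedily matched to the unmatched ground-truth box
    with the highest IoU. The threshold of 0.5 is an assumed default.
    """
    matched = set()
    tp = fp = 0
    for p in preds:
        best_j, best_iou = -1, 0.0
        for j, t in enumerate(truths):
            if j in matched:
                continue
            v = iou(p, t)
            if v > best_iou:
                best_j, best_iou = j, v
        if best_j >= 0 and best_iou >= thresh:
            matched.add(best_j)   # prediction hits a real object
            tp += 1
        else:
            fp += 1               # background reported as an object
    fn = len(truths) - len(matched)  # real objects that were missed
    return tp, fp, fn


# Hypothetical boxes: one good detection, one detection on background,
# and one signboard that no prediction covers.
truths = [(0, 0, 10, 10), (20, 20, 30, 30)]
preds = [(1, 1, 10, 10), (50, 50, 60, 60)]
tp, fp, fn = count_errors(preds, truths)
precision = tp / (tp + fp) if tp + fp else 0.0
recall = tp / (tp + fn) if tp + fn else 0.0
```

In these terms, methods aimed at missed detections mainly raise recall, while reducing background misjudgments mainly raises precision.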

Keywords (Chinese)

★ deep learning (深度學習)
★ object detection (物件檢測)
★ signboard recognition (招牌辨識)

Keywords (English)

★ deep learning
★ object detection
★ signboard recognition

Table of contents

1 Introduction
2 Related work
2.1 Features Extraction Methods
2.1.1 AlexNet
2.1.2 VGGNet
2.1.3 Residual Neural Network
2.2 Object Detection
2.2.1 Two-Stage Detector
2.2.2 One-Stage Detector
2.3 Segmentation
2.3.1 Fully Convolutional Networks
2.3.2 U-Net
3 Proposed Method
3.1 Two-Phase Training Methods
3.1.1 Bounding Boxes Proposal
3.1.2 CycleGAN
3.1.3 Region Category Checking
3.2 Proposed One-Stage Detector
3.2.1 Multi-Scales Prediction
3.2.2 Ground Truth for Segmentation
3.2.3 Segmentation Approach
4 Experimental Results
4.1 Datasets
4.2 Training
4.3 Testing
5 Conclusion
6 Reference


Advisor

Timothy K Shih (施國琛)

Review date

2021-1-15
