Street Signboard Detection and Retrieval Combining a Deep Neural Network and a VP Decision Tree

Name

莊雲博 (Yun-Bo Jhuang)

Department

Department of Computer Science and Information Engineering, In-Service Master's Program

Thesis Title

結合深度神經網路和VP決策樹的街道招牌偵測和檢索
(Detection and Retrieval of Combining Deep Neural Network and VP Decision Tree in Street Signboard)

Full text available for browsing in the system after 2026-1-14.

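Abstract (Chinese version)

Sensory limitations make independent living and outdoor mobility hard to achieve for visually impaired people. Visual aids and accessibility measures help with mobility, but signboards use distinctive logo designs that traditional machine vision does not readily recognize. This study therefore proposes a novel signboard detection and retrieval system for street-view images of storefronts. A neural network detects the positions of multiple signboards in a street scene, and the signboard images are used to build an index structure keyed on image data. Average hashing, perceptual hashing, and difference hashing are used to handle the distinctive commercial logos and the virtually unlimited variety of store signboards, and are combined with the fast search of a VP decision tree to retrieve the most similar feature node. The street-view signboard dataset consists of images collected by the author. The signboard localization module achieves a recall of 84%, the retrieval module retrieves hits at both Rank-1 and Rank-5, and the combined detection and retrieval experiment achieves an average accuracy of 86%. The system is developed as a visual aid for visually impaired users: it reports the type of the store in front of them so that they can perceive their surroundings and make mobility decisions as sighted people do.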
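The hashing step named in the abstract can be illustrated with a minimal sketch of average hashing and difference hashing. This is not the thesis implementation: it assumes the signboard crop has already been converted to grayscale and resized to a small grid (commonly 8x8 for average hash and 9x8 for difference hash), and the function names `average_hash`, `difference_hash`, and `hamming` are hypothetical. Perceptual hashing, which adds a DCT step, is omitted.

```python
from typing import Sequence

# Assumption: an image is a small grid of grayscale intensities,
# already resized (for example 8x8 or 9x8) before hashing.
Gray = Sequence[Sequence[float]]


def average_hash(img: Gray) -> int:
    """One bit per pixel: 1 if the pixel is brighter than the image mean."""
    pixels = [p for row in img for p in row]
    mean = sum(pixels) / len(pixels)
    bits = 0
    for p in pixels:
        bits = (bits << 1) | (1 if p > mean else 0)
    return bits


def difference_hash(img: Gray) -> int:
    """One bit per horizontal neighbor pair: 1 if the left pixel is brighter."""
    bits = 0
    for row in img:
        for left, right in zip(row, row[1:]):
            bits = (bits << 1) | (1 if left > right else 0)
    return bits


def hamming(a: int, b: int) -> int:
    """Number of differing bits; smaller means more similar images."""
    return bin(a ^ b).count("1")
```

Two signboard crops would then be compared by the Hamming distance between their hashes, which is the kind of metric a VP-tree can index.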

Abstract (English)

Limited sensory perception makes it difficult for visually impaired people to live independently and to move about outdoors. Even with visual aids and barrier-free measures to help them, the distinctive logo designs of signboards are hard for traditional machine vision to recognize. In this thesis we therefore propose a novel signboard detection and retrieval system for street-view store images. We use a neural network to detect the positions of multiple signboards in a street view, and build an index structure keyed on image data from the signboard images. Average hashing, perceptual hashing, and difference hashing handle the recognition of distinctive commercial logos across countless types of store signs, and are combined with the fast search of a VP decision tree to find the most similar feature matrix by retrieval. The street-view signboard dataset consists of images we collected ourselves. The signboard localization module reaches a recall of 84%, and the retrieval module successfully retrieves hits at both Rank-1 and Rank-5. Finally, the combined detection and retrieval experiment reaches an average accuracy of 86%. The system is developed as a visual aid for visually impaired users: it reports the type of the store currently in view, so that they can perceive their surroundings and make decisions as sighted people do.
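The retrieval step named in the abstract relies on a vantage-point (VP) tree for fast nearest-neighbor search. The following is a minimal sketch under stated assumptions, not the thesis implementation: signboards are represented by integer image hashes, the metric is Hamming distance, and the names `Node`, `build`, and `nearest` are hypothetical.

```python
import random
from dataclasses import dataclass
from typing import List, Optional, Tuple


def hamming(a: int, b: int) -> int:
    """Number of differing bits between two integer hashes."""
    return bin(a ^ b).count("1")


@dataclass
class Node:
    point: int                       # vantage point (an image hash)
    radius: int = 0                  # median distance from the vantage point
    inside: Optional["Node"] = None  # points with distance <= radius
    outside: Optional["Node"] = None # points with distance > radius


def build(points: List[int], rng: random.Random) -> Optional[Node]:
    """Recursively split the points around a randomly chosen vantage point."""
    if not points:
        return None
    points = list(points)
    vp = points.pop(rng.randrange(len(points)))
    node = Node(vp)
    if not points:
        return node
    dists = sorted(hamming(vp, p) for p in points)
    node.radius = dists[len(dists) // 2]
    node.inside = build([p for p in points if hamming(vp, p) <= node.radius], rng)
    node.outside = build([p for p in points if hamming(vp, p) > node.radius], rng)
    return node


def nearest(node: Optional[Node], query: int,
            best: Optional[Tuple[int, int]] = None) -> Optional[Tuple[int, int]]:
    """Return (distance, hash) of the stored hash closest to the query."""
    if node is None:
        return best
    d = hamming(query, node.point)
    if best is None or d < best[0]:
        best = (d, node.point)
    # Search the side that contains the query first.
    if d <= node.radius:
        first, second = node.inside, node.outside
    else:
        first, second = node.outside, node.inside
    best = nearest(first, query, best)
    # Triangle inequality: the other side can only hold a closer point
    # if the current best ball reaches across the radius boundary.
    if abs(d - node.radius) <= best[0]:
        best = nearest(second, query, best)
    return best
```

Usage would look like `tree = build(hashes, random.Random(0))` followed by `nearest(tree, query_hash)`. A Rank-5 result, as reported in the abstract, would need a k-nearest variant that keeps the five best candidates rather than one.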

Keywords

★ Deep learning
★ Street-view object detection
★ Signboard retrieval

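Table of Contents

Abstract (Chinese)
ABSTRACT
Acknowledgments
Chapter 1 Introduction
1.1 Research Motivation
1.2 Research Objectives
1.3 Thesis Organization
Chapter 2 Review of Methods
2.1 Development of Deep Learning
2.1.1 Convolutional Neural Networks
2.2 Image Object Detection
2.2.1 Introduction to YOLO and Its Evolution
2.3 Hash-Based Image Similarity Computation
2.3.1 Average Hash Function
2.3.2 Perceptual Hash Function
2.3.3 Difference Hash Function
2.4 Decision Trees
2.4.1 Nearest-Neighbor Retrieval
2.4.2 VP-Tree
Chapter 3 Design of the Signboard Detection and Retrieval System
3.1 MIAT System Design Methodology
3.1.1 IDEF0 Hierarchical Architecture
3.1.2 Grafcet Discrete-Event Modeling
3.1.3 Detection and Retrieval System Design
3.2 YOLO Signboard Localization Module Design
3.3 Image Hash Encoding Module Design
3.4 VP Decision Tree and Retrieval Module Design
3.5 System Program Synthesis
Chapter 4 System Integration and Experiments
4.1 Preliminaries
4.1.1 Development Environment
4.1.2 Street-View Dataset
4.1.3 Annotation Data
4.2 Image Localization Module
4.2.1 Evaluation Method
4.2.2 Localization Model Training
4.3 Image Retrieval Module
4.3.1 Similarity Comparison of Signboard Images at Different Scales
4.3.2 Similarity Differences Across Signboard Categories
4.4 Signboard Detection and Retrieval Module
4.4.1 Signboard Retrieval Experiment
4.4.2 Integrated Detection and Retrieval Experiment
4.4.3 Expanded Signboard Retrieval Experiment
Chapter 5 Conclusions and Future Work
5.1 Conclusions
5.2 Future Work
References
Appendix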

Advisor

陳慶瀚

Approval Date

2021-1-26
