Street-View Signboard Text Recognition and Its Application to Guiding the Blind

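Name

呂紹賓 (LYU, SHAO-BIN)

Department

Department of Communication Engineering, In-Service Master's Program

Thesis Title

Street-View Signboard Text Recognition and Its Application to Guiding the Blind (街景招牌文字辨識與導盲應用)
(Application of Character Recognition and Guide Blindness of Street View Signboard)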

Files

Full text available for browsing in the system from 2026-1-14 onward.

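Abstract (Chinese, translated)

Guidance systems for the blind are often built around visual detection paired with a leading robot, which together handle obstacle detection and route planning. This study proposes a signboard text recognition system that broadens what the visual data can be used for, so that the collected images serve not only to detect obstacles but also to reveal the surrounding environment through the text they contain. The signboard is first cropped out of the street-view image, which narrows the region that must be searched for text. The CRAFT text detection model is then applied to the signboard to locate the regions that contain text pixels, and the Tesseract text recognition model reads the content of each text region. The resulting text can be passed to a computer or microcontroller and combined with a guidance system, so that blind users know clearly which shops are around them.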

Abstract (English)

Guidance systems for the blind often combine visual detection with a leading robot to perform obstacle detection and route planning. This study proposes a signboard text recognition system that extends the usefulness of the visual data: the collected images are used not only to detect obstacles but also to identify the surrounding environment from the text that appears in them. First, the signboard is cropped out of the street-view image so that the region searched for text is reduced. Then, the CRAFT text detection model is used to find the regions of the signboard that contain text pixels. Given these text regions, the Tesseract text recognition model is applied to read the text. The resulting text can be used by a computer or microcontroller and combined with a guidance system, so that blind users can clearly know which shops are around them.
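The three stages described in the abstract (crop the signboard, detect text regions, read each region) can be sketched as a small pipeline. This is only an illustrative outline, not the thesis implementation: the names `crop`, `read_signboards`, and `SignboardText`, the box layout, and the toy list-of-lists image are all assumptions made for the example, and the detector and recognizer are passed in as plain callables rather than real YOLO, CRAFT, or Tesseract models.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple

# Hypothetical types for this sketch only.
Box = Tuple[int, int, int, int]   # (left, top, right, bottom), right/bottom exclusive
Image = List[List[int]]           # toy grayscale image: rows of pixel values


def crop(image: Image, box: Box) -> Image:
    """Return the sub-image covered by box."""
    left, top, right, bottom = box
    return [row[left:right] for row in image[top:bottom]]


@dataclass
class SignboardText:
    signboard_box: Box  # signboard location in the street-view image
    text_box: Box       # text location relative to the cropped signboard
    text: str           # recognized text with surrounding whitespace removed


def read_signboards(
    image: Image,
    find_signboards: Callable[[Image], Sequence[Box]],
    find_text: Callable[[Image], Sequence[Box]],
    read_text: Callable[[Image], str],
) -> List[SignboardText]:
    """Crop each signboard, detect text regions in it, and read each region."""
    results: List[SignboardText] = []
    for sign_box in find_signboards(image):
        sign = crop(image, sign_box)              # stage 1: shrink the search area
        for text_box in find_text(sign):          # stage 2: locate text regions
            text = read_text(crop(sign, text_box)).strip()  # stage 3: recognize
            if text:                              # skip regions that yield no text
                results.append(SignboardText(sign_box, text_box, text))
    return results


if __name__ == "__main__":
    # Stub stages stand in for the real models so the sketch runs on its own.
    street = [[0] * 8 for _ in range(6)]
    found = read_signboards(
        street,
        find_signboards=lambda img: [(1, 1, 7, 5)],
        find_text=lambda sign: [(0, 0, 4, 2)],
        read_text=lambda patch: " CAFE \n",
    )
    for item in found:
        print(item.signboard_box, item.text)
```

Keeping the stages as injected callables means each model can be swapped or tested separately. In a real setup the images would more likely be NumPy arrays, and `read_text` could wrap a call such as `pytesseract.image_to_string`; the list of `SignboardText` results is the kind of plain text output that could then be handed to a computer or microcontroller.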

Keywords (Chinese)

★ Text recognition (文字辨識)

Keywords (English)

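Table of Contents

Abstract (Chinese)
Abstract
Acknowledgements
Table of Contents
List of Figures
List of Tables
Chapter 1. Introduction
1.1 Research Background and Motivation
1.2 Research Objectives
1.3 Thesis Organization
Chapter 2. Review of Methods
2.1 YOLO Object Detection
2.2 CRAFT Text Detection
2.2.1 Heatmap Scores
2.2.2 CRAFT Model Architecture
2.2.3 Weakly Supervised Learning
2.2.4 Generating Text Boxes
2.3 Tesseract OCR Text Recognition
2.3.1 Tesseract 4.0
2.3.2 Pytesseract
Chapter 3. Design of the Street-View Signboard Text Recognition and Blind-Guidance System
3.1 MIAT System Design Methodology
3.2 Architecture of the Signboard Text Recognition and Blind-Guidance System
3.3 Architecture of the Signboard Text Recognition System
3.4 Architecture of the Text Recognition System
3.4.1 CRAFT Text Detection
3.4.2 Tesseract Text Recognition
3.5 Blind-Guidance Glasses
3.5.1 Blind-Guidance System Design
3.5.2 Smart Guide Cane
Chapter 4. Experiments
4.1 Experimental Environment
4.2 Evaluation Methods
4.2.1 Detection Rate Evaluation Method
4.2.2 Recognition Rate Evaluation Method
4.3 Recognition Performance Experiments
4.3.1 Text Detection Rate
4.3.2 Text Recognition Rate
Chapter 5. Conclusions and Future Work
5.1 Conclusions
5.2 Future Work
References
Appendix 1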

Advisors

陳永芳, 陳慶瀚

Approval Date

2021-1-20
