Text Detection in Street View Images with Hierarchical Fully Convolution Neural Networks

Name

張博崴 (Po-Wei Chang)

Department

Department of Computer Science and Information Engineering

Thesis Title

使用階層式全卷積神經網路偵測街景文字
(Text Detection in Street View Images with Hierarchical Fully Convolution Neural Networks)

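Related Theses

★ Implementation of a Qt-based cross-platform wireless heart rate analysis system
★ An additional-message transmission mechanism for VoIP
★ Detection of transition effects associated with sports game highlights
★ Video/image content authentication based on vector quantization
★ A baseball game highlight extraction system based on transition effect detection and content analysis
★ Image and video content authentication based on visual feature extraction
★ Detecting and tracking foreground objects in moving surveillance footage using dynamic background compensation
★ Adaptive digital watermarking for H.264/AVC video content authentication
★ A system for extracting and classifying baseball game highlight segments
★ A real-time multi-camera tracking system using H.264/AVC features
★ A preceding-vehicle detection mechanism for highways using the implicit shape model
★ A video copy detection mechanism based on temporal and spatial feature extraction
★ Driving video coding combining digital watermarking with region-of-interest bit-rate control
★ H.264/AVC video encryption/decryption and digital watermarking for digital rights management
★ A news video analysis system based on text and anchorperson detection
★ An H.264/AVC video content verification mechanism based on digital watermarking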

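The author has agreed to make this electronic thesis openly available immediately.
Once the full text is open, users are licensed only to search, read, and print it for personal, non-profit academic research.
Please comply with the Copyright Act of the Republic of China: do not reproduce, distribute, adapt, repost, or broadcast the work without authorization.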

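Abstract (translated from the Chinese)

Traffic signs and shop signs in street view images convey important information about the scene, so this research proposes a sign detection mechanism for street view images that locates the text and graphic regions within them. The challenge is that street view images often contain cluttered backgrounds whose texture resembles text, and the signs in the frame may be occluded by other objects; weather, lighting, and shooting angle add further difficulty. In addition, Chinese can be written vertically or horizontally, so the detector must find text in both directions and tell the two apart. The proposed mechanism has two parts. The first part locates the image regions that belong to traffic signs and shop signs: a fully convolutional network (FCN) is trained as a street-view sign detection model, and the detected signs are treated as regions of interest (ROIs). The second part extracts text and logos within the ROIs: a region proposal network (RPN) is trained as a text detection model that detects horizontal and vertical text lines separately, and the ROIs from the first part are used to reduce the RPN's false text detections. Finally, post-processing combines the horizontal and vertical text lines, removes false detections, and resolves complex overlaps between text lines; valid regions are determined from the text-line aspect ratio, area, overlap, and sign background color, among other cues. Experimental results show that the method effectively finds signs in complex street scenes and detects their text and graphic regions. The study also discusses how two deep learning networks with different architectures can be used in this application.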
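The ROI-based filtering described in this abstract can be illustrated with a minimal sketch. It is not the thesis implementation: the function names, the box format, and every threshold (`min_coverage`, `min_area`, `max_aspect`) are assumptions chosen for illustration, and the sign mask stands in for the thresholded output of the sign detection network.

```python
from typing import List, Tuple

Box = Tuple[int, int, int, int]  # (x1, y1, x2, y2); x2 and y2 are exclusive


def roi_coverage(box: Box, mask: List[List[int]]) -> float:
    """Fraction of the box's pixels that fall on sign pixels (mask value 1)."""
    x1, y1, x2, y2 = box
    area = (x2 - x1) * (y2 - y1)
    if area <= 0:
        return 0.0
    inside = sum(mask[y][x] for y in range(y1, y2) for x in range(x1, x2))
    return inside / area


def keep_text_boxes(boxes: List[Box], mask: List[List[int]],
                    min_coverage: float = 0.5,   # hypothetical threshold
                    min_area: int = 20,          # hypothetical threshold
                    max_aspect: float = 15.0) -> List[Box]:
    """Keep candidate text boxes that are large enough, not extremely
    elongated, and mostly inside a detected sign region."""
    kept = []
    for box in boxes:
        x1, y1, x2, y2 = box
        w, h = x2 - x1, y2 - y1
        if w <= 0 or h <= 0:
            continue
        aspect = max(w, h) / min(w, h)  # orientation-independent aspect ratio
        if (w * h >= min_area and aspect <= max_aspect
                and roi_coverage(box, mask) >= min_coverage):
            kept.append(box)
    return kept


# Toy 40x20 mask with one rectangular sign region.
mask = [[1 if 5 <= x < 35 and 5 <= y < 15 else 0 for x in range(40)]
        for y in range(20)]
candidates = [(8, 7, 30, 12), (0, 0, 4, 4), (30, 2, 40, 8)]
print(keep_text_boxes(candidates, mask))
```

In this toy case only the first candidate survives: the second is too small and lies outside the sign, and the third overlaps the sign over only a quarter of its area.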

Abstract (English)

Traffic signs and shop signs in street view images carry important visual information, such as the location of a scene, the effect of billboard advertising, and details about a store. This research therefore proposes a mechanism for detecting text and graphics in street view images. Many of these objects are hard to extract with a fixed template. In addition, street view images often contain cluttered backgrounds, and objects such as buildings or trees may block parts of the signs, which complicates detection. Weather, lighting conditions, and shooting angle add further challenges. Another issue is specific to Chinese writing: characters can be written vertically or horizontally. Detecting text lines in both directions is one of the contributions of this research. The proposed detection mechanism is divided into two parts. First, a fully convolutional network (FCN) is trained as a detection model that locates signs in street view images; the detected signs are treated as regions of interest. Second, a region proposal network (RPN) extracts the text lines and graphics within the sign regions. Finally, post-processing distinguishes horizontal from vertical text lines and eliminates false detections. Experimental results show the feasibility of the proposed scheme, especially for complex street views.
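As a rough illustration of the final post-processing step, the sketch below labels each detected text line as horizontal or vertical and resolves overlapping detections. It substitutes standard greedy non-maximum suppression for the thesis's own overlap rules, which this record does not specify; the `TextLine` type, the function names, and the `iou_thresh` value are all hypothetical.

```python
from typing import List, NamedTuple


class TextLine(NamedTuple):
    x1: float
    y1: float
    x2: float
    y2: float
    score: float  # detector confidence (assumed to be available)


def orientation(t: TextLine) -> str:
    """Label a text line by its longer side."""
    return "horizontal" if (t.x2 - t.x1) >= (t.y2 - t.y1) else "vertical"


def iou(a: TextLine, b: TextLine) -> float:
    """Intersection over union of two axis-aligned boxes."""
    iw = min(a.x2, b.x2) - max(a.x1, b.x1)
    ih = min(a.y2, b.y2) - max(a.y1, b.y1)
    if iw <= 0 or ih <= 0:
        return 0.0
    inter = iw * ih
    area_a = (a.x2 - a.x1) * (a.y2 - a.y1)
    area_b = (b.x2 - b.x1) * (b.y2 - b.y1)
    return inter / (area_a + area_b - inter)


def merge_text_lines(horizontal: List[TextLine], vertical: List[TextLine],
                     iou_thresh: float = 0.3) -> List[TextLine]:
    """Greedy NMS over both detection sets: visit lines from highest to
    lowest score and drop any line that overlaps an already kept one."""
    kept: List[TextLine] = []
    for t in sorted(horizontal + vertical, key=lambda t: t.score, reverse=True):
        if all(iou(t, k) < iou_thresh for k in kept):
            kept.append(t)
    return kept


h_lines = [TextLine(10, 10, 110, 30, 0.9)]
v_lines = [TextLine(10, 10, 100, 35, 0.6), TextLine(200, 10, 220, 150, 0.8)]
for line in merge_text_lines(h_lines, v_lines):
    print(orientation(line), line)
```

Here the lower-scoring detection that overlaps the horizontal line is suppressed, while the separate vertical line is kept. A fuller version would also apply the aspect ratio, area, and background color checks named in the Chinese abstract.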

Keywords

★ text detection (文字偵測)
★ sign detection (招牌偵測)
★ street view (街景)
★ fully convolutional network (全卷積神經網路)
★ region proposal network (區域候選網絡)

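Table of Contents

Abstract (Chinese)
Abstract
Acknowledgments
Table of Contents
List of Figures
List of Tables
Chapter 1 Introduction
1.1 Research Motivation
1.2 Contributions
1.3 Thesis Organization
Chapter 2 Related Work on Text Detection and Deep Learning
2.1 Traditional Image Processing Methods
2.2 Deep Learning and Its Common Models
2.2.1 Overview of Network Architectures
2.2.2 Applications of Network Architectures
2.3 Comparison and Applications of Deep Learning Text Detection
2.3.1 Advantages and Comparison of Text Detection Methods
2.3.2 Applications of Text Detection
Chapter 3 Proposed Method
3.1 Shop Sign and Traffic Sign Detection
3.1.1 Introduction to Fully Convolutional Networks
3.1.2 Introduction to Dilated Convolution
3.1.3 Network Model Construction and Training Procedure
3.2 Sign Text Detection and Localization
3.2.1 Introduction to the Region Proposal Network Architecture
3.2.2 Network Model Construction and Training Procedure
3.2.3 Applying the Network Model and Post-processing
3.2.4 Correcting and Removing False Text-Block Detections
Chapter 4 Experimental Results
4.1 Training and Result Analysis of the Sign Detection Network
4.2 Comparison of Text Detection Network Results
4.3 Detection Results in Different Scenarios
Chapter 5 Conclusion and Future Work
References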

Advisor

蘇柏齊 (Po-Chyi Su)

Approval Date

2018-08-17
