A Fully Convolutional Text-Line Extraction Network with Connectionist Refined Proposals

Name

曾冠鑫 (Guan-Xin Zeng)

Department

Department of Computer Science and Information Engineering

Thesis Title

一個結合連接區域精修之全卷積文字串擷取網路
(A Fully Convolutional Text-Line Extraction Network with Connectionist Refined Proposals)

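Related Theses

★ Implementation of a Qt-based cross-platform wireless heart-rate analysis system
★ An auxiliary message transmission mechanism for VoIP
★ Detection of transition effects associated with sports-game highlights
★ Video/image content authentication based on vector quantization
★ A baseball highlight extraction system based on transition-effect detection and content analysis
★ Image and video content authentication based on visual feature extraction
★ Detecting and tracking foreground objects in moving surveillance footage using dynamic background compensation
★ Adaptive digital watermarking for H.264/AVC video content authentication
★ A system for extracting and classifying baseball highlight clips
★ A real-time multi-camera tracking system using H.264/AVC features
★ Highway preceding-vehicle detection using implicit shape models
★ Video copy detection based on temporal- and spatial-domain feature extraction
★ In-vehicle video coding combining digital watermarking and region-of-interest rate control
★ H.264/AVC video encryption/decryption and digital watermarking for digital rights management
★ A news video analysis system based on text and anchorperson detection
★ H.264/AVC video content authentication based on digital watermarking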

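Access permission for this electronic thesis: immediate open access granted.
Electronic full texts that have reached their open-access date are licensed to users solely for academic research, that is, for personal, non-commercial searching, reading, and printing.
Please comply with the relevant provisions of the Copyright Act of the Republic of China (Taiwan); do not reproduce, distribute, adapt, repost, or broadcast the work without authorization.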

Abstract (translated from Chinese)

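Text in images constitutes important regions of interest. Localizing text in an image for subsequent processing helps extract information related to that image and supports the development of many interesting applications. In recent years, semantic segmentation and general object detection frameworks have been widely adopted for text detection, and each has its own advantages and drawbacks in practice. This study proposes a text detection scheme that combines the strengths of both, consisting of a main text-line detection network supplemented by a text refinement network. The main network performs semantic segmentation and uses FPN (Feature Pyramid Network) and ASPP (Atrous Spatial Pyramid Pooling) to strengthen feature extraction; it detects text regions and their borders, and its output is treated as the primary result, which has high recall. A refinement network built on a region-based detection framework then re-analyzes candidate text regions and helps decide the regions of the primary result that are less certain. Finally, Non-Maximum Suppression (NMS) is applied to obtain the final text-region detections. Experimental results show that the proposed approach detects text effectively in complex scenes, and the study also examines how these two different deep-learning architectures can be used together in the target application.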
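The final NMS step mentioned above can be illustrated with a minimal greedy NMS sketch. It assumes axis-aligned boxes in `(x1, y1, x2, y2)` form and a 0.5 IoU threshold; the helper names `iou` and `nms` are hypothetical, and the thesis may instead operate on rotated or quadrilateral text boxes with different thresholds.

```python
from typing import List, Sequence, Tuple

# Assumed box format: axis-aligned (x1, y1, x2, y2) with x2 > x1 and y2 > y1.
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def nms(boxes: Sequence[Box], scores: Sequence[float],
        iou_thresh: float = 0.5) -> List[int]:
    """Greedy NMS (hypothetical helper): repeatedly keep the highest-scoring
    remaining box and discard every box whose IoU with it exceeds iou_thresh.
    Returns the indices of the kept boxes in descending score order."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep: List[int] = []
    while order:
        best = order.pop(0)
        keep.append(best)
        # Survivors are boxes that do not overlap the kept box too much.
        order = [i for i in order if iou(boxes[best], boxes[i]) <= iou_thresh]
    return keep
```

For example, `nms([(0, 0, 100, 20), (2, 1, 102, 21), (0, 50, 80, 70)], [0.9, 0.8, 0.7])` returns `[0, 2]`: the second box overlaps the first with IoU of about 0.87 and is suppressed, while the third box does not overlap either. In a PyTorch pipeline, `torchvision.ops.nms(boxes, scores, iou_threshold)` performs the same greedy suppression on tensors.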

Abstract (English)

Text appearing in images is often a region of interest, and locating such areas for further analysis may help extract image-related information and facilitate many interesting applications. Pixel-based segmentation and region-based object classification are two methodologies for locating text areas in images, each with its own pros and cons. In this research, we propose a text detection scheme consisting of a main pixel-based classification network and a supplementary region proposal network. The main network is a Fully Convolutional Network (FCN) that employs a Feature Pyramid Network (FPN) and Atrous Spatial Pyramid Pooling (ASPP) to identify text areas and borders with high recall. Areas where the main network's result is less certain are further processed by the supplementary refinement network, a simplified Connectionist Text Proposal Network (CTPN) that offers higher precision. Non-Maximum Suppression (NMS) is then applied to form suitable text lines. The experimental results show the feasibility of the proposed text detection scheme.
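The ASPP component of the main network relies on atrous (dilated) convolution: a k-tap kernel with dilation rate r samples inputs r positions apart, so it spans k + (k − 1)(r − 1) positions without adding weights. The following is a minimal 1-D sketch in pure Python under stated assumptions: the names `effective_kernel_size`, `dilated_conv1d`, and `aspp_1d` are hypothetical, the kernels and rates are illustrative, and real ASPP works on 2-D multi-channel feature maps with learned weights, normalization, an image-pooling branch, and a 1×1 fusion convolution, none of which is modeled here.

```python
from typing import List, Sequence


def effective_kernel_size(k: int, rate: int) -> int:
    """Span covered by a k-tap kernel with dilation `rate`: k + (k - 1)(rate - 1)."""
    return k + (k - 1) * (rate - 1)


def dilated_conv1d(x: Sequence[float], w: Sequence[float], rate: int) -> List[float]:
    """'Same'-padded 1-D atrous convolution, written as cross-correlation
    as in CNN layers. Taps are spaced `rate` samples apart and inputs
    outside the signal are treated as zero. Assumes an odd kernel length
    so the kernel can be centred on each output position."""
    k = len(w)
    half = (effective_kernel_size(k, rate) - 1) // 2  # offset of the centre tap
    out: List[float] = []
    for i in range(len(x)):
        acc = 0.0
        for j in range(k):
            idx = i - half + j * rate  # input position read by tap j
            if 0 <= idx < len(x):
                acc += w[j] * x[idx]
        out.append(acc)
    return out


def aspp_1d(x: Sequence[float], kernels: Sequence[Sequence[float]],
            rates: Sequence[int]) -> List[List[float]]:
    """Toy ASPP (hypothetical): run one kernel per dilation rate in parallel
    and concatenate the branch outputs per position (channels-last)."""
    if len(kernels) != len(rates):
        raise ValueError("need exactly one kernel per dilation rate")
    branches = [dilated_conv1d(x, w, r) for w, r in zip(kernels, rates)]
    return [list(vals) for vals in zip(*branches)]
```

With an impulse input, the dilation is easy to see: `dilated_conv1d([0, 0, 0, 1, 0, 0, 0], [1, 2, 3], 2)` gives `[0.0, 3.0, 0.0, 2.0, 0.0, 1.0, 0.0]`, so the three taps land two samples apart, whereas rate 1 gives `[0.0, 0.0, 3.0, 2.0, 1.0, 0.0, 0.0]`. Running several rates side by side is what lets ASPP gather context at multiple scales from the same feature map.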

Keywords (translated from Chinese)

★ text detection
★ fully convolutional neural network
★ region proposal network

Keywords (English)

★ text detection
★ fully convolutional network
★ region proposal network

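Table of Contents

Abstract (Chinese)
Abstract (English)
Table of Contents
List of Figures
List of Tables
Chapter 1 Introduction
1.1 Research Motivation
1.2 Thesis Organization
Chapter 2 Related Work on Text Detection and Deep Learning
2.1 Traditional Image Processing Methods
2.2 Deep Learning and Common Models
2.2.1 Overview of Network Architectures
  • VGGNet
  • ResNet
  • Inception v3
  • DenseNet
2.2.2 Applications of Network Architectures
2.3 Comparison and Applications of Deep-Learning Text Detectors
  • EAST
  • TextBoxes++
Chapter 3 Proposed Method
3.1 Design Concept
3.2 Detection by the Main Network
3.2.1 Overview of Pixel-based Network Architectures
  • Fully Convolutional Networks for Semantic Segmentation
  • Feature Pyramid Network (FPN) [34]
  • Dilated/Atrous Convolution
  • Depthwise Separable Convolution
3.2.2 Model Construction and Training Procedure
3.3 Detection by the Refinement Network
3.3.1 Overview of Region-based Network Architectures
  • Faster R-CNN and Feature Extraction Networks
  • OHEM (Online Hard Example Mining) [42]
3.3.2 Model Construction and Training Procedure
3.3.3 Use and Processing of the Network Model
3.4 Result-Merging Algorithm
3.4.1 Step 1 of the Result-Merging Algorithm
3.4.2 Step 2 of the Result-Merging Algorithm
3.4.3 Step 3 of the Result-Merging Algorithm
Chapter 4 Experimental Results
4.1 Training and Result Analysis of the Main Network
4.2 Training and Result Analysis of the Refinement Network
4.3 Correction via NMS
4.4 Detection Results in Different Scenes
Chapter 5 Conclusions and Future Work
References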

References

[1] N. Dalal and B. Triggs, “Histograms of oriented gradients for human detection,” in IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2005.
[2] D. Silver, A. Huang, C. J. Maddison, A. Guez, L. Sifre, G. van den Driessche, J. Schrittwieser, I. Antonoglou, V. Panneershelvam, M. Lanctot, S. Dieleman, D. Grewe, J. Nham, N. Kalchbrenner, I. Sutskever, T. Lillicrap, M. Leach, K. Kavukcuoglu, T. Graepel, and D. Hassabis, “Mastering the game of Go with deep neural networks and tree search,” Nature, vol. 529, no. 7587, pp. 484–489, 2016.
[3] M. Schuster and K. K. Paliwal, “Bidirectional recurrent neural networks,” IEEE Transactions on Signal Processing, vol. 45, no. 11, pp. 2673–2681, 1997.
[4] D. G. Lowe, “Object recognition from local scale-invariant features,” in Proceedings of the International Conference on Computer Vision (ICCV), pp. 1150–1157, 1999.
[5] P. Viola and M. Jones, “Rapid object detection using a boosted cascade of simple features,” in Proceedings of the 2001 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR), 2001.
[6] C. Cortes and V. Vapnik, “Support-vector networks,” Machine Learning, vol. 20, no. 3, pp. 273–297, 1995.
[7] Y. Freund and R. Schapire, “Experiments with a new boosting algorithm,” in Machine Learning: Proceedings of the Thirteenth International Conference, 1996.
[8] Y.-F. Pan, X. Hou, and C.-L. Liu, “Hybrid approach to detect and localize texts in natural scene images,” IEEE Transactions on Image Processing (TIP), vol. 20, pp. 800–813, 2011.
[9] K. Wang, B. Babenko, and S. Belongie, “End-to-end scene text recognition,” in IEEE International Conference on Computer Vision (ICCV), 2011.
[10] B. Epshtein, E. Ofek, and Y. Wexler, “Detecting text in natural scenes with stroke width transform,” in IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2010.
[11] C. Yao, X. Bai, W. Liu, Y. Ma, and Z. Tu, “Detecting texts of arbitrary orientations in natural images,” in IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2012.
[12] W. Huang, Z. Lin, J. Yang, and J. Wang, “Text localization in natural images using stroke feature transform and text covariance descriptors,” in IEEE International Conference on Computer Vision (ICCV), 2013.
[13] J. Matas, O. Chum, M. Urban, and T. Pajdla, “Robust wide-baseline stereo from maximally stable extremal regions,” Image and Vision Computing (IVC), vol. 22, pp. 761–767, 2004.
[14] L. Neumann and J. Matas, “Text localization in real-world images using efficiently pruned exhaustive search,” in International Conference on Document Analysis and Recognition (ICDAR), 2011.
[15] L. Neumann and J. Matas, “Real-time scene text localization and recognition,” in IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2012.
[16] W. Huang, Y. Qiao, and X. Tang, “Robust scene text detection with convolution neural network induced MSER trees,” in European Conference on Computer Vision (ECCV), 2014.
[17] W. Huang, Z. Lin, J. Yang, and J. Wang, “Text localization in natural images using stroke feature transform and text covariance descriptors,” in IEEE International Conference on Computer Vision (ICCV), 2013.
[18] C. L. Zitnick and P. Dollár, “Edge boxes: Locating object proposals from edges,” in European Conference on Computer Vision (ECCV), 2014.
[19] T. He, W. Huang, Y. Qiao, and J. Yao, “Text-attentional convolutional neural network for scene text detection,” IEEE Transactions on Image Processing (TIP), vol. 25, pp. 2529–2541, 2016.
[20] L. Sun, Q. Huo, W. Jia, and K. Chen, “A robust approach for text detection from natural scene images,” Pattern Recognition, vol. 48, pp. 2906–2920, 2015.
[21] K. Simonyan and A. Zisserman, “Very deep convolutional networks for large-scale image recognition,” in International Conference on Learning Representations (ICLR), 2015.
[22] K. He, X. Zhang, S. Ren, and J. Sun, “Deep residual learning for image recognition,” 2015.
[23] C. Szegedy, V. Vanhoucke, S. Ioffe, J. Shlens, and Z. Wojna, “Rethinking the Inception architecture for computer vision,” in IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016.
[24] G. Huang, Z. Liu, L. van der Maaten, and K. Q. Weinberger, “Densely connected convolutional networks,” in IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2017.
[25] J. Long, E. Shelhamer, and T. Darrell, “Fully convolutional networks for semantic segmentation,” in IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2015.
[26] S. Ren, K. He, R. Girshick, and J. Sun, “Faster R-CNN: Towards real-time object detection with region proposal networks,” in Advances in Neural Information Processing Systems (NIPS), 2015.
[27] X. Zhou, C. Yao, H. Wen, Y. Wang, S. Zhou, W. He, and J. Liang, “EAST: An efficient and accurate scene text detector,” in IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2017.
[28] O. Ronneberger, P. Fischer, and T. Brox, “U-Net: Convolutional networks for biomedical image segmentation,” in International Conference on Medical Image Computing and Computer-Assisted Intervention (MICCAI), pp. 234–241, Springer, 2015.
[29] K.-H. Kim, S. Hong, B. Roh, Y. Cheon, and M. Park, “PVANET: Deep but lightweight neural networks for real-time object detection,” arXiv preprint arXiv:1608.08021, 2016.
[30] L. Huang, Y. Yang, Y. Deng, and Y. Yu, “DenseBox: Unifying landmark localization with end to end object detection,” arXiv preprint arXiv:1509.04874, 2015.
[31] M. Liao, B. Shi, and X. Bai, “TextBoxes++: A single-shot oriented scene text detector,” arXiv preprint arXiv:1801.02765, 2018.
[32] W. Liu, D. Anguelov, D. Erhan, C. Szegedy, S. Reed, C.-Y. Fu, and A. C. Berg, “SSD: Single shot multibox detector,” arXiv preprint arXiv:1512.02325.
[33] L.-C. Chen, G. Papandreou, I. Kokkinos, K. Murphy, and A. L. Yuille, “DeepLab: Semantic image segmentation with deep convolutional nets, atrous convolution, and fully connected CRFs,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 40, no. 4, Apr. 2018.
[34] T.-Y. Lin, P. Dollár, R. Girshick, K. He, B. Hariharan, and S. Belongie, “Feature pyramid networks for object detection,” in IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2017.
[35] Z. Tian, W. Huang, T. He, P. He, and Y. Qiao, “Detecting text in natural image with connectionist text proposal network,” in European Conference on Computer Vision (ECCV), 2016.
[36] F. Yu and V. Koltun, “Multi-scale context aggregation by dilated convolutions,” arXiv preprint arXiv:1511.07122, 2015.
[37] A. G. Howard, M. Zhu, B. Chen, D. Kalenichenko, W. Wang, T. Weyand, M. Andreetto, and H. Adam, “MobileNets: Efficient convolutional neural networks for mobile vision applications,” arXiv preprint arXiv:1704.04861, 2017.
[38] D. Deng, H. Liu, X. Li, and D. Cai, “PixelLink: Detecting scene text via instance segmentation,” in AAAI Conference on Artificial Intelligence (AAAI), 2018.
[39] G. J. Brostow, J. Shotton, J. Fauqueur, and R. Cipolla, “Segmentation and recognition using structure from motion point clouds,” in European Conference on Computer Vision (ECCV), pp. 44–57, 2008.
[40] D. Karatzas, F. Shafait, S. Uchida, M. Iwamura, L. Gomez, S. Robles, J. Mas, D. Fernandez, J. Almazan, and L. P. de las Heras, “ICDAR 2013 robust reading competition,” in Proceedings of the 12th International Conference on Document Analysis and Recognition (ICDAR), IEEE CPS, pp. 1115–1124, 2013.
[41] “ICDAR2017 competition on multi-lingual scene text detection and script identification,” https://rrc.cvc.uab.es/?ch=8
[42] A. Shrivastava, A. Gupta, and R. Girshick, “Training region-based object detectors with online hard example mining,” arXiv preprint arXiv:1604.03540, 2016.
[43] P. He, W. Huang, T. He, Q. Zhu, Y. Qiao, and X. Li, “Single shot text detector with regional attention,” in IEEE International Conference on Computer Vision (ICCV), 2017.
[44] L. Deng, Y. Gong, Y. Lin, J. Shuai, X. Tu, Y. Zhang, Z. Ma, and M. Xie, “Detecting multi-oriented text with corner-based region proposals,” arXiv preprint arXiv:1804.02690, 2018.
[45] Image source: https://yinguobing.com/separable-convolution/#fn2
[46] Y. Baek, B. Lee, D. Han, S. Yun, and H. Lee, “Character region awareness for text detection,” arXiv preprint arXiv:1904.01941, 2019.
[47] X. Liu, D. Liang, S. Yan, D. Chen, Y. Qiao, and J. Yan, “FOTS: Fast oriented text spotting with a unified network,” arXiv preprint arXiv:1801.01671v2, 2018.
[48] P. Lyu, M. Liao, C. Yao, W. Wu, and X. Bai, “Mask TextSpotter: An end-to-end trainable neural network for spotting text with arbitrary shapes,” arXiv preprint arXiv:1807.02242, 2018.
[49] J. Ma, W. Shao, H. Ye, L. Wang, H. Wang, Y. Zheng, and X. Xue, “Arbitrary-oriented scene text detection via rotation proposals,” arXiv preprint arXiv:1703.01086, 2017.

Advisor

蘇柏齊 (Po-Chyi Su)

Approval Date

2019-8-12
