Thesis Record 108522039: Details




Author: 侯昱宏 (Yu-Hong Hou)    Department: Computer Science and Information Engineering
Title: 利用邊界距離改進裁切式場景文字偵測
(Exploiting Distance to Boundary for Segmentation-based Scene-Text Spotting)
  1. The author has agreed to make this electronic thesis openly available immediately.
  2. The open-access full text is licensed only for personal, non-commercial retrieval, reading, and printing for the purpose of academic research.
  3. Please comply with the Copyright Act of the Republic of China (Taiwan); do not reproduce, distribute, adapt, repost, or broadcast the work without authorization.

Abstract (Chinese): Because text in images provides rich information, scene text spotting helps to locate regions of interest. Many current scene text spotting methods adopt segmentation-based per-pixel prediction: each pixel is classified into a specific type, usually text or background, and the text pixels are then grouped into the text regions to be detected. The advantages of pixel prediction include ease of implementation, good performance, and flexibility of application. However, text in natural scenes varies in size, shape, and color, so correctly separating words remains a challenging issue. This research proposes using the distance to the boundary to assist in segmenting text pixels, achieving more accurate scene text spotting. Our method can be used to extract single characters, words, text lines, or patterns with similar textures, and it is also applicable to detecting text bounded by rectangles, quadrilaterals, or arbitrarily shaped boxes. In addition, the labeling process is simpler than in other methods. We examine the issues of network architecture, class imbalance, and post-processing. Experimental results demonstrate the feasibility of the design and confirm that it helps to improve segmentation-based scene text spotting methods.
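The abstracts name categorical (class) imbalance as one of the issues examined: text pixels are typically far rarer than background pixels. As a hedged illustration only (the thesis's actual loss function is not given in this record), one common remedy in per-pixel classification is to up-weight the rare class in a binary cross-entropy loss; `weighted_bce` and its `pos_weight` parameter below are hypothetical names, not the thesis's API.

```python
# Hedged sketch of one common way to handle class imbalance in per-pixel
# text/background classification: up-weighting the rare text class in a
# binary cross-entropy loss. The thesis may use a different scheme;
# `pos_weight` is a hypothetical parameter name.
import math

def weighted_bce(probs, labels, pos_weight=1.0):
    """Mean binary cross-entropy, with positive (text) pixels weighted."""
    total = 0.0
    for p, y in zip(probs, labels):
        p = min(max(p, 1e-7), 1.0 - 1e-7)  # clamp for numerical stability
        w = pos_weight if y == 1 else 1.0
        total += -w * (y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(probs)

# A heavily imbalanced "image": 1 text pixel among 7 background pixels,
# all predicted at 0.5 (maximally uncertain).
probs = [0.5] * 8
labels = [1, 0, 0, 0, 0, 0, 0, 0]
plain = weighted_bce(probs, labels)                     # all pixels equal
balanced = weighted_bce(probs, labels, pos_weight=7.0)  # text errors cost 7x
```

With equal weights, a classifier can score well by favoring the majority background class; the weighted variant makes errors on the single text pixel as costly as errors on all seven background pixels combined.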
Abstract (English): Scene text spotting helps to locate regions of interest in images, as text inside pictures often provides abundant information. Many existing schemes adopt the segmentation-based methodology, which classifies each pixel as a specific type, usually text or background. Major advantages of pixel prediction include ease of implementation, good performance, and flexibility. However, appropriately separating words in such schemes remains a challenging issue.
This research investigates the use of the distance to the boundary for partitioning text to achieve more accurate scene text spotting. The proposed scheme can be used to extract single characters, words, text lines, or objects with similar textures. It is also applicable to detecting text bounded by rectangles, quadrilaterals, or boxes with arbitrary shapes. The labeling process is relatively efficient. The issues of network architecture, categorical imbalance, and post-processing are discussed. The experimental results demonstrate the feasibility of the proposed design, which can help to improve segmentation-based scene-text spotting approaches.
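To make the distance-to-boundary idea concrete, here is a minimal sketch, not the thesis's actual pipeline: when two words are merged by a thin connection in a binary text mask, plain connected-component labeling yields a single region, while thresholding a distance-to-boundary map keeps only the "core" of each instance, and the cores separate cleanly. The BFS distance transform and flood-fill labeling below are generic stand-ins for whatever post-processing the thesis actually uses.

```python
# Minimal sketch (not the thesis's exact algorithm): separating two text
# instances that a plain connected-component pass would merge, by keeping
# only pixels whose distance to the region boundary exceeds a threshold.
from collections import deque

def distance_to_boundary(mask):
    """4-connected (Manhattan) distance from each foreground pixel to the
    nearest background pixel, via multi-source BFS."""
    h, w = len(mask), len(mask[0])
    dist = [[0 if mask[y][x] == 0 else h * w for x in range(w)] for y in range(h)]
    queue = deque((y, x) for y in range(h) for x in range(w) if mask[y][x] == 0)
    while queue:
        y, x = queue.popleft()
        for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            ny, nx = y + dy, x + dx
            if 0 <= ny < h and 0 <= nx < w and dist[ny][nx] > dist[y][x] + 1:
                dist[ny][nx] = dist[y][x] + 1
                queue.append((ny, nx))
    return dist

def components(cells):
    """Group (y, x) pixels into 4-connected components via flood fill."""
    remaining, comps = set(cells), []
    while remaining:
        frontier = deque([remaining.pop()])
        comp = set(frontier)
        while frontier:
            y, x = frontier.popleft()
            for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                n = (y + dy, x + dx)
                if n in remaining:
                    remaining.remove(n)
                    comp.add(n)
                    frontier.append(n)
        comps.append(comp)
    return comps

# Two "words" joined by a thin one-pixel bridge: a typical failure case
# for naive pixel grouping.
rows = ["1111100011111",
        "1111100011111",
        "1111111111111",
        "1111100011111",
        "1111100011111"]
mask = [[int(c) for c in r] for r in rows]
dist = distance_to_boundary(mask)
text_pixels = [(y, x) for y in range(5) for x in range(13) if mask[y][x]]
merged = components(text_pixels)                                       # one blob
cores = components([p for p in text_pixels if dist[p[0]][p[1]] >= 2])  # two cores
print(len(merged), len(cores))  # 1 2
```

In the full method a network would predict the distance map directly, so instances remain separable even where they touch; the core threshold of 2 here is arbitrary and chosen only for this toy mask.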
Keywords (Chinese):
★ 深度學習 (deep learning)
★ 街景文字定位 (scene text spotting)
★ 語義分割 (semantic segmentation)
Keywords (English):
★ Deep learning
★ scene text spotting
★ semantic segmentation
Table of Contents:
Abstract (Chinese)
Abstract (English)
Table of Contents
List of Figures
List of Tables
Chapter 1: Introduction
  1.1 Motivation and Contributions
  1.2 Thesis Organization
Chapter 2: Related Work
  2.1 Traditional Image Processing Methods
    Stroke Width Transform
    Maximally Stable Extremal Regions
    Sliding-Window Text Detection
  2.2 Deep Learning Methods
    Semantic Segmentation
    Object Detection
Chapter 3: Proposed Method
  3.1 Data Labeling
    Datasets
    Comparison of Labeling Schemes
    Label Generation
  3.2 Network Architecture
    HRNet
    ResNeXt
    Architecture Pipeline
    Loss Function
  3.3 Training Details
  3.4 Post-Processing
Chapter 4: Experimental Results
  4.1 Evaluation Methods
  4.2 Ablation Study
  4.3 Post-Processing Experiments
  4.4 ICDAR Benchmarks
    ICDAR2013
    ICDAR2017
    ICDAR2019_ArT
    Comparison of Different Models
Chapter 5: Conclusions and Future Work
  5.1 Conclusions
  5.2 Future Work
References
References:
[1] K. He, G. Gkioxari, P. Dollár, R. Girshick, “Mask R-CNN”, IEEE International Conference on Computer Vision (ICCV), 2017.
[2] B. Epshtein, E. Ofek, Y. Wexler, “Detecting text in natural scenes with stroke width transform”, IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2010.
[3] W. Huang, Z. Lin, J. Yang, J. Wang, “Text localization in natural images using stroke feature transform and text covariance descriptors”, IEEE International Conference on Computer Vision (ICCV), 2013.
[4] C. Yao, X. Bai, W. Liu, Y. Ma, Z. Tu, “Detecting texts of arbitrary orientations in natural images”, IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2012.
[5] L. Neumann, J. Matas, “Text localization in real-world images using efficiently pruned exhaustive search”, International Conference on Document Analysis and Recognition (ICDAR), 2011.
[6] J. Matas, O. Chum, M. Urban, T. Pajdla, “Robust wide-baseline stereo from maximally stable extremal regions”, Image and Vision Computing, vol. 22, pp. 761–767, 2004.
[7] W. Huang, Q. Yu, X. Tang, “Robust scene text detection with convolutional neural network induced MSER trees”, European Conference on Computer Vision (ECCV), 2014.
[8] L. Neumann, J. Matas, “Real-time scene text localization and recognition”, IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2012.
[9] W. Huang, Z. Lin, J. Yang, J. Wang, “Text localization in natural images using stroke feature transform and text covariance descriptors”, IEEE International Conference on Computer Vision (ICCV), 2013.
[10] C. L. Zitnick, P. Dollár, “Edge boxes: Locating object proposals from edges”, European Conference on Computer Vision (ECCV), 2014.
[11] D. G. Lowe, “Object recognition from local scale-invariant features”, International Conference on Computer Vision (ICCV), pp. 1150–1157, 1999.
[12] P. Viola, M. Jones, “Rapid object detection using a boosted cascade of simple features”, IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2001.
[13] N. Dalal, B. Triggs, “Histograms of oriented gradients for human detection”, IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2005.
[14] C. Cortes, V. Vapnik, “Support-vector networks”, Machine Learning, vol. 20, no. 3, pp. 273–297, 1995.
[15] Y. Freund, R. Schapire, “Experiments with a new boosting algorithm”, Proceedings of the Thirteenth International Conference on Machine Learning, 1996.
[16] J. Long, E. Shelhamer, T. Darrell, “Fully convolutional networks for semantic segmentation”, IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2015.
[17] T.-Y. Lin, P. Dollár, R. Girshick, K. He, B. Hariharan, S. Belongie, “Feature Pyramid Networks for Object Detection”, IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2017.
[18] X. Zhou, C. Yao, H. Wen, Y. Wang, S. Zhou, W. He, J. Liang, “EAST: An Efficient and Accurate Scene Text Detector”, IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2017.
[19] Y. Li, Z. Wu, S. Zhao, X. Wu, “PSENet: Psoriasis Severity Evaluation Network”, AAAI Conference on Artificial Intelligence, 2020.
[20] Y. Baek, B. Lee, D. Han, S. Yun, H. Lee, “Character Region Awareness for Text Detection”, IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2019.
[21] M. Liao, Z. Wan, C. Yao, K. Chen, X. Bai, “Real-time Scene Text Detection with Differentiable Binarization”, AAAI Conference on Artificial Intelligence, 2020.
[22] S. Ren, K. He, R. Girshick, J. Sun, “Faster R-CNN: Towards real-time object detection with region proposal networks”, Advances in Neural Information Processing Systems (NIPS), 2015.
[23] Z. Tian, W. Huang, T. He, P. He, Y. Qiao, “Detecting Text in Natural Image with Connectionist Text Proposal Network”, European Conference on Computer Vision (ECCV), Lecture Notes in Computer Science, 2016.
[24] M. Liao, B. Shi, X. Bai, “TextBoxes++: A Single-Shot Oriented Scene Text Detector”, IEEE Transactions on Image Processing, 2018.
[25] W. Liu, D. Anguelov, D. Erhan, C. Szegedy, S. Reed, C.-Y. Fu, A. C. Berg, “SSD: Single Shot MultiBox Detector”, arXiv preprint arXiv:1512.02325, 2015.
[26] M. Liao, Z. Zhu, B. Shi, G.-S. Xia, X. Bai, “Rotation-Sensitive Regression for Oriented Scene Text Detection”, IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2018.
[27] Y. Fu, N. Brown, S. Saeed, A. Casamitjana, Z. Baum, R. Delaunay, “DeepReg: a deep learning toolkit for medical image registration”, Journal of Open Source Software, 2020.
[28] B. Shi, X. Bai, S. Belongie, “Detecting Oriented Text in Natural Images by Linking Segments”, IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2017.
[29] L. Xie, Y. Liu, L. Jin, Z. Xie, “DeRPN: Taking a Further Step toward More General Object Detection”, AAAI Conference on Artificial Intelligence, 2019.
[30] Y. Liu, S. Zhang, L. Jin, L. Xie, Y. Wu, Z. Wang, “Omnidirectional Scene Text Detection with Sequential-free Box Discretization”, Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence (IJCAI), 2019.
[31] D. Karatzas, F. Shafait, S. Uchida, M. Iwamura, L. Bigorda, S. Mestre, J. Mas, D. Mota, J. Almazan, L. Heras, “ICDAR 2013 Robust Reading Competition”, 12th International Conference on Document Analysis and Recognition (ICDAR), 2013.
[32] N. Nayef, F. Yin, I. Bizid, H. Choi, Y. Feng, “ICDAR2017 Robust Reading Challenge on Multi-lingual Scene Text Detection and Script Identification”, 14th IAPR International Conference on Document Analysis and Recognition (ICDAR), 2017.
[33] C. Chng, E. Ding, J. Liu, D. Karatzas, “ICDAR2019 Robust Reading Challenge on Arbitrary-Shaped Text”, International Conference on Document Analysis and Recognition (ICDAR), 2019.
[34] K. Sun, B. Xiao, D. Liu, J. Wang, “Deep High-Resolution Representation Learning for Human Pose Estimation”, IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2019.
[35] S. Xie, R. Girshick, P. Dollár, Z. Tu, K. He, “Aggregated Residual Transformations for Deep Neural Networks”, IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2017.
[36] C. Szegedy, V. Vanhoucke, S. Ioffe, J. Shlens, Z. Wojna, “Rethinking the Inception Architecture for Computer Vision”, IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016.
Advisor: 蘇柏齊 (Po-Chyi Su)    Date of Approval: 2021-07-30
