Character Spotting and Language Recognition for Multilingual Scene Texts based on Image Segmentation

Name

林佳穎 (Chia-Yin Lin)

Department

Department of Computer Science and Information Engineering

Thesis Title

Character Spotting and Language Recognition for Multilingual Scene Texts based on Image Segmentation
(基於影像分割之多語言場景文字字元偵測與語言辨識)

Files

Full text can be browsed in the system (open after 2024-9-30)

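Abstract (Chinese)

Research on deep-learning-based analysis of text in natural scenes has
flourished in recent years, and text region detection is one of its key steps.
Most current text detectors annotate at the string level; however, a string may
contain text in different languages, which makes it hard to determine the
language of the string at annotation time. This study proposes character-level
detection, which not only labels the language accurately but also lets
recognition use the corresponding language model for better results. For
recognition models, strings require handling irregular text orientations, and
string recognition models usually need large amounts of training data and
training time. Character recognition, by contrast, hardly needs to consider
text orientation, its models are comparatively simple and quick to train, and
for multilingual natural scene text it offers more flexibility in choosing the
recognition unit and method suited to each language. This study uses a
high-resolution network architecture with the character as the detection unit:
it labels character regions, marks character centers, and uses multiple
channels for language classification. Because real datasets lack
character-level annotations, we propose a weakly supervised learning method for
characters, so that the network's character detection improves markedly even
without such annotations. For multilingual classification, both applying
separate classifiers after detection and performing language recognition during
detection are effective to a certain degree, which verifies the feasibility of
character recognition. Our experiments use Latin (English letters and digits),
Chinese, Japanese, and Korean as examples to analyze the feasibility and
soundness of the design.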
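
As an illustration of how weak supervision can proceed when only string-level labels exist, the sketch below scores a pseudo label by comparing the known character count of a string with the number of characters the network currently detects in it. The formula is the word-length confidence used by CRAFT (Character Region Awareness for Text Detection); the abstract does not state whether the thesis uses the same rule, and the function name `pseudo_label_confidence` is hypothetical.

```python
def pseudo_label_confidence(true_length: int, predicted_length: int) -> float:
    """CRAFT-style confidence of a character-level pseudo label.

    true_length: number of characters in the string-level annotation.
    predicted_length: number of characters the detector found in that string.
    Returns a value in [0, 1]; 1.0 means the counts agree exactly.
    """
    if true_length <= 0:
        raise ValueError("true_length must be positive")
    # The count error is capped at true_length so the score never goes below 0.
    error = min(true_length, abs(true_length - predicted_length))
    return (true_length - error) / true_length
```

A score like this could be used to weight the loss on pseudo-labelled pixels, so that strings whose detected character count disagrees with the annotation contribute less during training.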

Abstract (English)

In recent years, scene text analysis based on deep learning techniques has drawn
a lot of research attention. Text detection in natural scenes is an important step of
scene text analysis, and most existing text detection designs are based on
string detection. However, a string may contain words of different languages, so it
is not easy to accurately mark the language to which the string belongs. Scene text
recognition using string-level annotations needs to consider the effect of irregular
orientations and requires a lot of training data and training time. Conversely,
character-based recognition methodologies do not need to consider orientations,
which simplifies the training process. Multilingual natural scene text
recognition may benefit from the flexibility of selecting suitable recognition
models according to different language characteristics. In this research, we use a
high-resolution network architecture to label word regions and point out the
centers of characters, and also employ multiple channels for substring language
classification. Due to the lack of character-level annotations in real datasets, we
propose a weakly supervised learning approach for characters, enabling the
network to improve the detection of characters significantly. The performance of
multi-language recognition is verified by using individual classifiers after
detection or by performing language recognition at the same time as detection.
The feasibility of the proposed design is verified by showing the character
detection of different languages, including Latin, Chinese, Japanese, and Korean,
as examples.
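
The kind of output the abstract describes (a character-center map plus one channel per language) can be decoded along the following lines. This is a minimal sketch, not the thesis's post-processing: the function name `decode_characters`, the 0.5 threshold, the 3x3 peak test, and the channel order in `LANGUAGES` are all assumptions made for illustration.

```python
import numpy as np

# Assumed channel order; the thesis does not specify it in the abstract.
LANGUAGES = ["Latin", "Chinese", "Japanese", "Korean"]


def decode_characters(center_map, language_maps, threshold=0.5):
    """Return (row, col, language) for each local peak in center_map.

    center_map: (H, W) array of character-center scores.
    language_maps: (len(LANGUAGES), H, W) array of per-language scores.
    """
    center_map = np.asarray(center_map, dtype=float)
    language_maps = np.asarray(language_maps, dtype=float)
    h, w = center_map.shape
    padded = np.pad(center_map, 1, mode="constant", constant_values=-np.inf)
    # Stack the 3x3 neighbourhood of every pixel (including the pixel itself).
    neighbours = np.stack([
        padded[dr:dr + h, dc:dc + w]
        for dr in range(3) for dc in range(3)
    ])
    # A pixel is a center if it is the maximum of its neighbourhood and
    # exceeds the threshold. Ties on a flat plateau yield several peaks.
    is_peak = (center_map >= neighbours.max(axis=0)) & (center_map > threshold)
    results = []
    for r, c in zip(*np.nonzero(is_peak)):
        lang_idx = int(np.argmax(language_maps[:, r, c]))
        results.append((int(r), int(c), LANGUAGES[lang_idx]))
    return results
```

Reading the language at the detected center keeps the decision per character, which is what allows a mixed-language string to receive different labels for different characters.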

Keywords (Chinese)

★ Deep learning
★ Street-scene text localization
★ Multilingual text recognition
★ Weakly supervised learning

Keywords (English)

★ Deep learning
★ Scene text spotting
★ Semantic segmentation
★ Weakly supervised learning

Table of Contents

Abstract (Chinese)
Abstract (English)
Table of Contents
Chapter 1 Introduction
1.1 Research Motivation
1.2 Contributions
1.3 Thesis Organization
Chapter 2 Related Work
2.1 Deep-Learning-Based Text Detection
2.1.1 Object Detection
2.1.2 Semantic Segmentation
2.2 Character Detection
2.3 Weakly Supervised Learning
2.4 Datasets
Chapter 3 Proposed Method
3.1 Datasets and Annotation
3.1.1 Synthetic Dataset Generation
3.1.2 Data Annotation
3.2 Network Architecture
3.2.1 Backbone and Output Classifier
3.3 Weakly Supervised Learning Method for Characters
3.3.1 Pseudo Label Update Process
3.3.2 Progressing
3.4 Loss Function
Chapter 4 Experimental Results
4.1 Training Details
4.2 Evaluation Methods
4.2.1 Character-Count Evaluation
4.2.2 Character Classifier Evaluation
4.3 Validation Set
4.4 Post-processing
4.5 Results
4.5.1 Ablation Study of Our Weakly Supervised Learning
4.5.2 Performance of Character Detection and Classification
Chapter 5 Conclusion and Future Work
5.1 Conclusion
5.2 Future Work
References

Advisor

蘇柏齊 (Po-Chyi Su)

Approval Date

2022-9-19
