Traditional Chinese Scene Text Recognition based on Attention-Residual Network

DC field	Value	Language
DC.contributor	軟體工程研究所	zh_TW
DC.creator	蘇冠宇	zh_TW
DC.creator	Kung-Yu Su	en_US
dc.date.accessioned	2020-07-29T07:39:07Z
dc.date.available	2020-07-29T07:39:07Z
dc.date.issued	2020
dc.identifier.uri	http://ir.lib.ncu.edu.tw:88/thesis/view_etd.asp?URN=107525010
dc.contributor.department	軟體工程研究所	zh_TW
DC.description	國立中央大學	zh_TW
DC.description	National Central University	en_US
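dc.description.abstract	Text on street signboards often conveys rich information, and recognizing the text in such images with computer vision techniques would benefit the development of many related applications. Although optical character recognition of documents is a fairly mature area of computer vision, text recognition in natural scenes remains a very challenging task. Beyond factors such as more varied fonts, text sizes, and shooting angles, training data for Traditional Chinese characters is still scarce, and with so many Chinese characters it is hard to collect a balanced set of corresponding photos; even when enough data is collected, the data imbalance problem remains. This research therefore uses several Traditional Chinese fonts to generate high-quality training images and labels, simulating the complex variations of text in street scenes while avoiding the errors that manual labeling may introduce. It also examines how to make the synthetic Traditional Chinese text images resemble real street-scene text more closely, producing diverse training data by adjusting brightness, applying geometric transformations, and adding outlines, in order to make the model more reliable. For text detection and recognition, we adopt a two-stage algorithm. First, the DeepLab model detects the regions of single characters and text lines in street scenes by semantic segmentation; an STN (Spatial Transformer Network) then rectifies the skewed text boxed in the detection stage to aid feature extraction in the recognition stage. We modify the ResNet50 model, using an attention mechanism to improve its accuracy on this large-scale classification task. Finally, we cross-check the user's GPS information against place information from the Google Place API to verify and correct the model's output text, strengthening the recognition of street-scene text. Experimental results show that the approach effectively detects and recognizes Traditional Chinese street-scene text, and outperforms Line OCR and Google Vision in tests on complex street scenes.	zh_TW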
dc.description.abstract	Text in natural scenes, especially street views, usually contains rich information related to the images. Although recognition of scanned documents has been well studied, scene text recognition is still a challenging task because of variable fonts, inconsistent lighting conditions, different text orientations, background noise, camera angles, and possible image distortions. This research aims to develop an effective Traditional Chinese text recognition scheme for street scenes based on deep learning techniques. Constructing a suitable training dataset is an essential step and significantly affects recognition performance. However, the large number of Chinese characters is an issue, because it may cause the so-called data imbalance problem when collecting corresponding images. In the proposed scheme, a synthetic dataset with automatic labeling is constructed using several fonts and data augmentation. In an image under investigation, the potential regions of characters and text lines are located. For the located single characters, possibly skewed images are rectified by a spatial transformer network to improve performance. Next, the proposed attention-residual network improves recognition accuracy in this large-scale classification task. Finally, the recognized characters are combined using the detected text lines and corrected with place information from the Google Place API together with location information. The experimental results show that the proposed scheme can correctly extract the text from the selected areas in the images under investigation. Its recognition performance is superior to that of Line OCR and Google Vision in complex street scenes.	en_US
DC.subject	computer vision	zh_TW
DC.subject	deep learning	zh_TW
DC.subject	street-view text detection	zh_TW
DC.subject	Traditional Chinese character recognition	zh_TW
DC.subject	scene text recognition	en_US
DC.subject	scene text detection	en_US
DC.subject	synthetic data	en_US
DC.title	基於注意力殘差網路之繁體中文街景文字辨識	zh_TW
dc.language.iso	zh-TW	zh-TW
DC.title	Traditional Chinese Scene Text Recognition based on Attention-Residual Network	en_US
DC.type	master's/doctoral thesis	zh_TW
DC.type	thesis	en_US
DC.publisher	National Central University	en_US

Thesis 107525010: full metadata record
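
The last step described in the abstract, correcting recognized text against nearby place names, could be sketched roughly as follows. The record does not say how candidates are matched, so the similarity measure (Python's `difflib`), the threshold value, the function name `correct_with_places`, and the hard-coded candidate list are all assumptions for illustration; no actual Google Place API call is made here.

```python
from difflib import SequenceMatcher


def correct_with_places(recognized, place_names, threshold=0.6):
    """Return the candidate place name most similar to `recognized`.

    Hypothetical helper: falls back to the recognized text unchanged
    when no candidate reaches the (assumed) similarity threshold.
    """
    best_name, best_score = recognized, 0.0
    for name in place_names:
        # ratio() is 2 * matches / total length, in the range [0, 1]
        score = SequenceMatcher(None, recognized, name).ratio()
        if score > best_score:
            best_name, best_score = name, score
    return best_name if best_score >= threshold else recognized


# Assumed stand-in for place names returned near the user's GPS position.
nearby = ["中央大學", "中原大學", "全家便利商店"]

# One misrecognized character (太 instead of 大) is corrected.
fixed = correct_with_places("中央太學", nearby)

# Text unrelated to every candidate is left as recognized.
unchanged = correct_with_places("麥當勞", nearby)
```

In this sketch a higher threshold makes the correction more conservative, which may matter when several nearby places have similar names.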