Thesis 107525010: Full Metadata Record

DC Field | Value | Language
DC.contributor | Institute of Software Engineering | zh_TW
DC.creator | 蘇冠宇 | zh_TW
DC.creator | Kung-Yu Su | en_US
dc.date.accessioned | 2020-7-29T07:39:07Z |
dc.date.available | 2020-7-29T07:39:07Z |
dc.date.issued | 2020 |
dc.identifier.uri | http://ir.lib.ncu.edu.tw:88/thesis/view_etd.asp?URN=107525010 |
dc.contributor.department | Institute of Software Engineering | zh_TW
DC.description | National Central University | zh_TW
DC.description | National Central University | en_US
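dc.description.abstract | Signboard text in street views often conveys rich information, and recognizing the text in such images with vision techniques would benefit the development of many related applications. Although computer vision offers fairly mature techniques for optical character recognition of documents, text recognition in natural scenes remains a very challenging task. Besides the greater variety of fonts, text sizes, and shooting angles, Traditional Chinese training data is still scarce. With so many Chinese characters, it is hard to collect corresponding photos evenly, and even when enough data is collected, the data imbalance problem remains. This research therefore uses several Traditional Chinese fonts to generate high-quality training images and labels, simulating the complex text variations found in street scenes while avoiding the errors that manual labeling may introduce. In addition, this thesis examines how to make synthetically generated Traditional Chinese text images closer to real street-scene text, producing diverse training data by adjusting brightness, applying geometric transformations, and adding outline borders in order to strengthen the model's reliability. For text detection and recognition, we adopt a two-stage algorithm. We first use the DeepLab model to detect, by semantic segmentation, the regions of single characters and text lines in street scenes, and then use an STN (Spatial Transformer Network) to rectify the skewed text boxed in the detection stage, which aids feature extraction in the subsequent recognition stage. We modify the ResNet50 model, using an attention mechanism to improve its accuracy in this large-scale classification task. Finally, we cross-check the user's GPS information against place information from the Google Place API to verify and correct the model's output text, improving street-scene text recognition. Experimental results show that this research can effectively detect and recognize Traditional Chinese street-scene text and outperforms Line OCR and Google Vision in tests on complex street scenes. | zh_TW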
dc.description.abstract | Text in natural scenes, especially street views, usually carries rich information related to the image. Although recognition of scanned documents has been well studied, scene text recognition is still a challenging task because of variable fonts, inconsistent lighting conditions, different text orientations, background noise, camera shooting angles, and possible image distortions. This research aims to develop an effective Traditional Chinese text recognition scheme for street scenes based on deep learning techniques. Constructing a suitable training dataset is an essential step and significantly affects recognition performance. However, the large number of Chinese characters is an issue, since it can cause the so-called data imbalance problem when collecting corresponding images. In the proposed scheme, a synthetic dataset with automatic labeling is constructed using several fonts and data augmentation. In an image under investigation, the potential regions of characters and text lines are located. For the located single characters, possibly skewed images are rectified by a spatial transformer network to improve performance. Next, the proposed attention-residual network improves recognition accuracy in this large-scale classification task. Finally, the recognized characters are combined using the detected text lines and corrected with information from the Google Place API together with the location information. The experimental results show that the proposed scheme can correctly extract the text from the selected areas of the investigated images, and that its recognition performance is superior to Line OCR and Google Vision in complex street scenes. | en_US
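DC.subject | computer vision | zh_TW
DC.subject | deep learning | zh_TW
DC.subject | street-scene text detection | zh_TW
DC.subject | Traditional Chinese character recognition | zh_TW
DC.subject | scene text recognition | en_US
DC.subject | scene text detection | en_US
DC.subject | synthetic data | en_US
DC.title | Traditional Chinese Street-Scene Text Recognition Based on an Attention-Residual Network | zh_TW
dc.language.iso | zh-TW | zh-TW
DC.title | Traditional Chinese Scene Text Recognition based on Attention-Residual Network | en_US
DC.type | master's/doctoral thesis | zh_TW
DC.type | thesis | en_US
DC.publisher | National Central University | en_US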
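
The abstract's final step, correcting recognized text against nearby place names, can be illustrated with a minimal sketch. This record does not say how the thesis performs the comparison, so the similarity measure (Python's standard `difflib.SequenceMatcher`), the function name `correct_with_place_names`, and the `threshold` value are all assumptions made for illustration. The candidate names are assumed to have been fetched already from a place lookup near the user's GPS position; no Google API call is made here.

```python
from difflib import SequenceMatcher


def correct_with_place_names(recognized, candidates, threshold=0.6):
    """Return the candidate most similar to `recognized`.

    Hypothetical helper: if no candidate reaches `threshold`
    (an assumed cut-off, not a value from the thesis), the
    recognized text is returned unchanged.
    """
    best_name, best_score = recognized, 0.0
    for name in candidates:
        # ratio() is 2*M/T: M matched characters, T total characters.
        score = SequenceMatcher(None, recognized, name).ratio()
        if score > best_score:
            best_name, best_score = name, score
    return best_name if best_score >= threshold else recognized


# Illustrative, made-up place names standing in for a nearby-places result.
nearby = ["星巴克咖啡", "全家便利商店", "麥當勞"]

# One misrecognized character out of five: similarity is 0.8, so it is corrected.
print(correct_with_place_names("星巴克咖非", nearby))  # 星巴克咖啡

# No candidate shares any character: the model output is kept as-is.
print(correct_with_place_names("停車場", nearby))  # 停車場
```

Keeping the original output when no candidate is close enough prevents a correct reading from being overwritten by an unrelated place name. The cut-off would need tuning on real data.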

For questions about this thesis, please contact the Extension Services Section of the National Central University Library, TEL: (03) 422-7151 ext. 57407, or by e-mail.