摘要: | 自然場景文字包含街景路標、商店招牌、告示牌以及商品包裝等,可靠地偵測與辨識這些文字有助於實現多種具潛力的應用。自然場景文字可能出現於複雜街景或非平整背景,易受到光線變化、反光、角度扭曲或其他遮蔽物影響,於自然場景影像中準確偵測與辨識文字並不容易。現今常見的研究方法是利用深度學習模型,並以字詞為單位進行標記以利後續的字詞分割、文字偵測及辨識,通常需要較多資料與較大型的深度學習模型來因應存在於字詞的多樣性。此外,經常出現的不同語種文字會增加標記與辨識的困難。考量模型訓練成本與多語種文字偵測的需求,本研究提出以字元間隙為標的之文字偵測模型來協助定位自然場景中的多語種字元,透過字元間隙決定字元中心,再使用近鄰演算法畫出字元框區域,可與其中以較輕量的模型進行字元辨識。然而,偵測字元間隙的挑戰在於現今大部分資料集的標記都是針對字詞,在缺乏字元或字元間隙標記的情況下,本研究先產生接近自然場景的人工資料集,該資料集包含字元標記框以及字元間隙標記框,再搭配弱監督式學習以含有字詞標記的真實資料集進行模型調整,使得模型在微調以及迭代更新下能更準確地定位字元間隙,進而找出字元位置。實驗結果顯示,對於包含多國語種的文字資料集,我們所提出的偵測字元間隙方法以定位字元中心位置是可行的。;Scene text indicates text appearing in street signs, shop signs, notices, and product packaging, etc. and reliably detecting and recognizing scene text is beneficial for a variety of potential applications. Text in natural scenes may appear in complex street views or on uneven backgrounds, and its detection and recognition are easily affected by changes in lighting, reflections, angle distortions, or other obstructions. Nowadays, common research methods adopt deep learning models, with words labeled as units to facilitate subsequent word segmentation, text detection, and recognition. These methods usually require more data and larger deep learning models to handle the diversity of text words. Besides, multilingual text appears quite often and labeling in a unified manner is not a trivial task. Considering the cost of model training and the detection of multilingual text, this study proposes using character gaps or spacings as detection targets to assist in the segmentation of multilingual characters. By detecting character gaps to locate character centers, and then using a nearest neighbor algorithm to draw character bounding boxes, a lighter model can be used for single-character recognition. However, the challenge of detecting character gaps or spacings lies in the fact that most current datasets are labeled for words, lacking labels for characters or character gaps. We form an synthetic image dataset that mimics natural scenes, containing character bounding boxes and character gap boxes. Combined with weakly supervised learning on real datasets with word labels, this approach allows the model to be fine-tuned and iteratively updated to more accurately locate character gaps. Experimental results show that the proposed method is feasible for detecting character gaps or spacings to locate characters in the multilingual datasets. |