dc.description.abstract | Scene text refers to text appearing on street signs, shop signs, notices, product packaging, and the like, and reliably detecting and recognizing it benefits a variety of potential applications. Text in natural scenes may appear in complex street views or on uneven backgrounds, and its detection and recognition are easily affected by changes in lighting, reflections, angle distortions, or other obstructions. Current research methods commonly adopt deep learning models with words labeled as the basic unit, which facilitates subsequent word segmentation, text detection, and recognition. These methods usually require more training data and larger deep learning models to handle the diversity of words. Moreover, multilingual text appears frequently, and labeling it in a unified manner is not a trivial task.
Considering the cost of model training and the need to detect multilingual text, this study proposes using character gaps or spacings as detection targets to assist in segmenting multilingual characters. By detecting character gaps to locate character centers, and then using a nearest neighbor algorithm to draw character bounding boxes, a lighter model can be used for single-character recognition. However, the challenge in detecting character gaps or spacings is that most current datasets are labeled at the word level and lack labels for characters or character gaps. We construct a synthetic image dataset that mimics natural scenes and contains both character bounding boxes and character gap boxes. Combined with weakly supervised learning on real datasets with word labels, this approach allows the model to be fine-tuned and iteratively updated to locate character gaps more accurately. Experimental results show that the proposed method is feasible for detecting character gaps or spacings to locate characters in multilingual datasets. | en_US |
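The step of going from detected gaps to per-character boxes via a nearest neighbor assignment can be illustrated with a minimal sketch. This is not the thesis implementation; it assumes horizontal text, an already-detected word box, and gap centers given as x-coordinates, and the helper names (character_boxes_from_gaps, assign_pixels_to_characters) are hypothetical.

```python
# Hypothetical sketch: derive character boxes from detected gap centers inside a
# word box, then assign text pixels to characters by nearest character center.
import numpy as np

def character_boxes_from_gaps(word_box, gap_xs):
    """word_box: (x_min, y_min, x_max, y_max); gap_xs: x-coords of detected gap centers."""
    x_min, y_min, x_max, y_max = word_box
    # Split positions: the word's left/right edges plus the sorted gap centers between them.
    splits = [x_min] + sorted(gap_xs) + [x_max]
    # Each pair of consecutive splits bounds one character.
    return [(splits[i], y_min, splits[i + 1], y_max) for i in range(len(splits) - 1)]

def assign_pixels_to_characters(text_pixels, char_boxes):
    """Nearest-neighbor step: label each text pixel with the index of the closest character center."""
    centers = np.array([[(x0 + x1) / 2, (y0 + y1) / 2] for x0, y0, x1, y1 in char_boxes])
    pixels = np.asarray(text_pixels, dtype=float)                     # shape (N, 2)
    dists = np.linalg.norm(pixels[:, None, :] - centers[None, :, :], axis=2)
    return dists.argmin(axis=1)                                       # nearest character per pixel

if __name__ == "__main__":
    boxes = character_boxes_from_gaps((0, 0, 90, 30), gap_xs=[28, 61])
    print(boxes)   # three character boxes split at the two detected gaps
    labels = assign_pixels_to_characters([(5, 10), (45, 12), (80, 20)], boxes)
    print(labels)  # [0 1 2]
```

In practice, the detector would output gap boxes rather than exact x-coordinates, and rotated or curved text would require splitting along the text baseline instead of the image x-axis; the sketch only shows the geometric idea of converting gap locations into character regions.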