dc.description.abstract | Scene text recognition has quickly become an active research topic because of its wide range of applications. Unlike general text, scene text often involves complex backgrounds, irregular orientations, occluded characters, and blurred images. Scene text recognition must therefore cope with greater image diversity and image quality degradation than general text recognition.
In recent years, advances in deep learning have produced many methods for scene text recognition. Humans, however, do not recognize text from visual input alone; they also draw on semantic knowledge to arrive at more plausible readings. To bring deep learning models closer to human reading, a growing number of methods focus on helping the model learn richer semantic information. Most existing studies, however, are conducted on English datasets, and their methods may not transfer directly to Chinese datasets. In view of this, this paper proposes a deep learning model better suited to Chinese scene text recognition. We add a language model and train it for spelling error correction on additional text data, which gives our scene text recognition model stronger semantic reasoning capabilities. We also propose a progressive rectification network to replace the rectification network most commonly used in the existing literature [1], enabling the model to better handle text with irregular orientations.
In the experiments, we show that the proposed method outperforms two classic scene text recognition methods [1, 2] that are frequently used for comparison in the literature, as well as two methods proposed in recent years [3, 4]. In the ablation study, we also examine the effectiveness of each part of the model. We believe the proposed method is better suited to Chinese scene text recognition. | en_US |