dc.description.abstract | Reading and writing are the most basic skills in human communication, and characters are one of their essential components. For automated handwriting recognition, Chinese characters are more complex than English letters and numerals, and the set of commonly used characters is far larger. Chinese characters also differ from one another in structure and stroke order. In in-air handwriting, each Chinese character contains both real strokes and virtual strokes, so it looks different from the same character written on paper. Handwriting on paper shows only the real strokes; the virtual strokes do not appear on paper or on a screen because the pen is lifted. When users write in the air, however, the motion is continuous and the whole character is completed in a single stroke. In-air handwriting therefore has two distinguishing characteristics: a mix of real and virtual strokes, and a temporal sequence.
Based on these characteristics, this paper proposes using a Long Short-Term Memory (LSTM) model as the core recognition model. Deep learning requires a large amount of training data. Although many institutions in China have worked to build Simplified Chinese datasets, these do not match writing habits in Taiwan. We therefore collected more than 20,000 samples covering 492 Traditional Chinese characters. Preprocessing extracts the turning points of each stroke. To match the fixed-length time sequence used by the LSTM, each stroke is cut into a fixed number of segments, and shape context features, which describe the statistical spatial distribution of the points, serve as the input to the recognition model. We test accuracy and stability by increasing and decreasing the number of strokes and by varying the dimensionality of the shape context. In our experiments, the recognition accuracy for Chinese characters reaches 98.6%.
| en_US |