dc.description.abstract | This dissertation presents a novel method for classifying machine-printed Chinese characters by matching code-string features generated from pseudo skeletons. Instead of the skeletons produced by traditional thinning algorithms, the proposed pseudo skeletons (p-skeletons) of Chinese characters are extracted. The pseudo-skeleton features of the input character and of a template character are each encoded into a code string. An edit-distance-based matching algorithm then computes the similarity of the two characters from their encoded strings.
The work consists of three main modules: preprocessing, feature extraction, and fuzzy matching. First, the p-skeletons of an input character and the pixel projection histograms are generated in the preprocessing module. Three kinds of virtual strokes (called v-strokes) are defined using fuzzy membership functions. In the feature extraction module, these features are encoded and represented by three kinds of fuzzy variables. With the encoded strings, the OCR classification problem is transformed from a 2-D image matching problem into a 1-D string matching problem. At the training stage, the extracted features are stored in the reference database; at the classification stage, the fuzzy edit-distance matching algorithm measures the similarity between an unknown pattern and the patterns in the reference database. Finally, a candidate list is generated as the classification result. Experiments were conducted on 5401 commonly used Chinese characters in various fonts and sizes. The experimental results demonstrate the validity and efficiency of the proposed method. The main contribution of this dissertation is the effective classification of multi-font Chinese characters using a single-font reference database.
In addition, this dissertation proposes a new method for rotational character classification. As in p-skeleton generation, the pseudo contour of a character is generated first. Using the pseudo contour instead of the original image greatly reduces the processing time. A new clustering method, called the circular fuzzy C-mean algorithm, is devised to obtain the rotation-invariant feature. At the classification stage, the Hamming distance is applied to measure the similarity of the characters. Experiments show that the fuzzy ring feature is effective for rotational character classification. | en_US |