dc.description.abstract | With the rapid development of technology, electronic products have become smaller while gaining more powerful functions. Helping users fully exploit modern high-tech electronic products to store and retrieve data, while saving considerable human effort and operation time, is an important issue. The purpose of this research is to use a commercial digital camera to capture images of name cards or A4-size documents and to segment the English character images in those documents before Optical Character Recognition (OCR) is performed. A good segmentation method that effectively solves the problem of touching characters is essential for obtaining good recognition results.
Although digital cameras are portable and easy to use, they suffer from problems caused by non-uniform lighting. Moreover, images captured by digital cameras are often slanted or blurred because of hand vibration or shaking while the picture is taken. For these reasons, touching characters are much more likely to appear after binarization than in images captured with traditional scanners.
In this thesis, we present an effective method for touching character segmentation. First, image preprocessing is performed, including global binarization, connected-component labeling, and local binarization, to extract the image information for later analysis. Next, a filtering mechanism is devised to extract the correctly separated characters. For the touching characters, a segmentation method based on analyzing the peripheral features of the characters effectively resolves the problem and produces correct segmentation results.
In the experiments, 50 name cards containing a total of 10600 characters are tested. Of these, 9550 are normal characters, and the remaining 1050 characters form 419 groups of touching characters. The average filter accuracy rate is 92.14%, the segmentation accuracy rate is 98.57%, and the character segmentation accuracy rate is 99.71%. The results demonstrate that the proposed method can effectively segment touching characters. | en_US |