A Study of Peripheral Features for Segmenting Touching English Characters; Camera Based Touching Character Segmentation using Peripheral Feature


Please use this permanent URL to cite or link to this item: http://ir.lib.ncu.edu.tw/handle/987654321/9368

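Title:	A Study of Peripheral Features for Segmenting Touching English Characters; Camera Based Touching Character Segmentation using Peripheral Feature
Author:	林志瑋 (Chih-Wei Lin)
Contributor:	Graduate Institute of Computer Science and Information Engineering
Keywords:	Peripheral Feature
Date:	2007-07-12
Uploaded:	2009-09-22 11:46:18 (UTC+8)
Publisher:	National Central University Library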
Abstract:	Electronic products keep getting smaller while their functions grow more powerful. This research asks how such devices can help users capture document images and digitize them, saving the labor and time spent organizing paper records. The goal is to use a commercial digital camera, which is portable, convenient, and now offers good resolution, to capture images of name cards or A4-size documents, and then to segment the English character images in those documents before Optical Character Recognition (OCR) is performed. Good recognition depends on a segmentation method that correctly separates touching characters.

Although a digital camera can capture images at any time, it lacks the stable light source of a scanner, so its images suffer from non-uniform illumination. Because most pictures are taken with a hand-held camera, hand shake also makes the images slightly skewed or blurred. For these reasons, touching characters appear far more often after binarization than in images captured with a traditional scanner.

This thesis presents an effective method for touching character segmentation. First, image preprocessing, consisting of global binarization, connected-component labeling of text blocks, and local binarization, extracts the image information for later analysis. Next, a proposed filtering mechanism separates out the correct, complete characters. The components it rejects are treated as touching characters and are split by a segmentation method based on the peripheral features of the characters; the resulting segments can then be passed to a subsequent recognition system.

In the experiments, 50 name cards containing about 10,600 characters were tested: about 9,550 normal characters and 419 groups of touching characters comprising about 1,050 characters. The average filtering accuracy is 92.14%, the segmentation accuracy is 98.57%, and the character segmentation accuracy is 99.71%. These results show that the proposed method segments touching characters effectively.
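The abstract lists global binarization, connected-component (text block) labeling, and local binarization as the preprocessing steps, but it does not state the thresholding rule or the connectivity used. The pure-Python sketch below is only an illustration under stated assumptions: it uses Otsu's method for both the global threshold and the per-block local threshold, and 4-connected breadth-first labeling. The function names are hypothetical, and 1 marks a dark (ink) pixel.

```python
from collections import deque


def otsu_threshold(pixels):
    """Assumed global rule: Otsu threshold t for 8-bit values; values <= t form the dark class."""
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1
    total = len(pixels)
    sum_all = sum(i * h for i, h in enumerate(hist))
    sum_dark, w_dark = 0, 0
    best_t, best_var = 0, -1.0
    for t in range(256):
        w_dark += hist[t]
        if w_dark == 0:
            continue
        w_light = total - w_dark
        if w_light == 0:
            break
        sum_dark += t * hist[t]
        mean_dark = sum_dark / w_dark
        mean_light = (sum_all - sum_dark) / w_light
        # Between-class variance, up to a constant factor of 1/total^2.
        var_between = w_dark * w_light * (mean_dark - mean_light) ** 2
        if var_between > best_var:
            best_var, best_t = var_between, t
    return best_t


def binarize(gray, t):
    """1 = ink (gray value <= t), 0 = background."""
    return [[1 if v <= t else 0 for v in row] for row in gray]


def label_components(binary):
    """Assumed 4-connected labeling; returns bounding boxes (top, left, bottom, right)."""
    h, w = len(binary), len(binary[0])
    seen = [[False] * w for _ in range(h)]
    boxes = []
    for y in range(h):
        for x in range(w):
            if binary[y][x] and not seen[y][x]:
                seen[y][x] = True
                queue = deque([(y, x)])
                top, left, bottom, right = y, x, y, x
                while queue:
                    cy, cx = queue.popleft()
                    top, bottom = min(top, cy), max(bottom, cy)
                    left, right = min(left, cx), max(right, cx)
                    for ny, nx in ((cy - 1, cx), (cy + 1, cx), (cy, cx - 1), (cy, cx + 1)):
                        if 0 <= ny < h and 0 <= nx < w and binary[ny][nx] and not seen[ny][nx]:
                            seen[ny][nx] = True
                            queue.append((ny, nx))
                boxes.append((top, left, bottom, right))
    return boxes


def local_binarize(gray, box):
    """Assumed local rule: re-threshold one bounding box with its own Otsu threshold."""
    top, left, bottom, right = box
    patch = [row[left:right + 1] for row in gray[top:bottom + 1]]
    values = [v for row in patch for v in row]
    if min(values) == max(values):
        # A uniform box around an ink component is entirely ink.
        return [[1] * len(row) for row in patch]
    return binarize(patch, otsu_threshold(values))


gray = [
    [200, 200, 200, 200, 200, 200, 200],
    [200,  30,  30, 200,  30, 200, 200],
    [200,  30, 200, 200,  30, 200, 200],
    [200, 200, 200, 200,  30,  30, 200],
]
t = otsu_threshold([v for row in gray for v in row])
boxes = label_components(binarize(gray, t))
print(t, boxes)  # 30 [(1, 1, 2, 2), (1, 4, 3, 5)]
print(local_binarize(gray, boxes[0]))  # [[1, 1], [1, 0]]
```

Re-thresholding inside each bounding box lets a block lying under a shadow get its own threshold, which is one plausible way a local step can compensate for the non-uniform camera lighting the abstract describes; the thesis may use a different local method.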
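The abstract does not define the filtering rule or how peripheral features become a cut position. As a hedged illustration only, the sketch below assumes (a) a hypothetical aspect-ratio filter that flags a component as touching characters when its width exceeds 1.2 times its height, and (b) a peripheral feature defined per column as the distance from the top and bottom edges of the bounding box to the nearest ink pixel, with the cut placed at the interior column where the sum of the two distances peaks. The names `is_touching`, `peripheral_profiles`, `choose_cut`, and `split_at`, and the 1.2 threshold, are assumptions, not taken from the thesis.

```python
def is_touching(box, max_aspect=1.2):
    """Hypothetical filter: flag a component too wide to be one character.
    The 1.2 width/height threshold is an assumption."""
    top, left, bottom, right = box
    width, height = right - left + 1, bottom - top + 1
    return width / height > max_aspect


def peripheral_profiles(patch):
    """Per column: distance from the top edge to the first ink pixel and from
    the bottom edge to the last ink pixel (1 = ink). Empty columns get the
    full height so that gaps score as high as possible."""
    height, width = len(patch), len(patch[0])
    top, bottom = [], []
    for x in range(width):
        col = [patch[y][x] for y in range(height)]
        if 1 in col:
            top.append(col.index(1))
            bottom.append(col[::-1].index(1))
        else:
            top.append(height)
            bottom.append(height)
    return top, bottom


def choose_cut(patch, margin=1):
    """Assumed cut rule: the interior column where the outline dips deepest
    from the top and bottom combined. Requires width > 2 * margin."""
    top, bottom = peripheral_profiles(patch)
    candidates = range(margin, len(top) - margin)
    # max() keeps the first column when several share the best score.
    return max(candidates, key=lambda x: top[x] + bottom[x])


def split_at(patch, x):
    """Split into left and right parts, discarding the cut column itself."""
    left = [row[:x] for row in patch]
    right = [row[x + 1:] for row in patch]
    return left, right


# Two box-like glyphs joined by a thin stroke along the bottom row.
patch = [
    [1, 1, 1, 0, 1, 1, 1],
    [1, 0, 1, 0, 1, 0, 1],
    [1, 0, 1, 0, 1, 0, 1],
    [1, 1, 1, 1, 1, 1, 1],
]
box = (0, 0, 3, 6)
touching = is_touching(box)
cut = choose_cut(patch)
left, right = split_at(patch, cut)
print(touching, cut, left == right)  # True 3 True
```

The score peaks at a thin joining stroke because the outline dips far from one side there. A practical system would also have to handle groups of more than two touching characters (for example by recursing on each half) and wide single letters such as 'm' or 'w', which a pure aspect-ratio filter would wrongly flag.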
