筆劃特徵用於離線中文字的辨認; Off-Line Chinese Character Recognition Based on Stroke Features


Please use this identifier to cite or link to this item: http://ir.lib.ncu.edu.tw/handle/987654321/8396

Title:	筆劃特徵用於離線中文字的辨認;Off-Line Chinese Character Recognition Based on Stroke Features
Authors:	吳偉賢;Wei-Hsien Wu
Contributors:	Graduate Institute of Computer Science and Information Engineering
Keywords:	attributed string matching;symmetry test;coarse classification;stroke extraction;Chinese character recognition;rearrangement of stroke sequence;stroke window;OCCR
Date:	2000-07-15
Issue Date:	2009-09-22 11:25:52 (UTC+8)
Publisher:	National Central University Library
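Abstract:	Research on off-line Chinese character recognition currently follows two main directions: one based on statistical features and the other based on structural features. The former analyzes the character image to extract useful feature information; the latter concentrates on the line segments (or strokes) of the character and on how they connect to one another, and derives feature information from them. This dissertation takes the latter direction: the strokes of the Chinese character are extracted first, and recognition follows. In general, a structure-feature-based off-line Chinese character recognition system comprises four parts: preprocessing, stroke extraction, coarse classification, and recognition. Preprocessing includes noise removal and skeletonization. Stroke extraction obtains the feature values of the input character, and recognition is then performed on those feature values. Because there are as many as 5401 commonly used Chinese characters, matching the input character against every character in the database one by one is inefficient and unnecessary. The characters are therefore usually coarsely classified first, so that each class contains fewer characters. The input character is classified by the same rule and is then compared only with the characters in the classes it is assigned to, which reduces the number of comparisons and speeds up recognition.
This dissertation proposes solutions for the three topics above. First, a run-length-coding technique is used to generate the skeleton of the input character. By exploiting the relationships among the runs, a skeleton containing no fork points can be produced. This special skeleton avoids the distortion that conventional thinning produces at fork points, which benefits the subsequent stroke extraction. Second, we propose a hierarchical coarse classification method. Chinese characters are first divided into ten classes according to their shape, and the "radical" is then extracted from the characters of each class. Four kinds of symmetry are analyzed for each radical, which allows a further subclassification and effectively reduces the number of characters in each class. Finally, we match the input character against the characters in the database by incrementally enlarging a "stroke window". The extracted strokes are first classified as horizontal, vertical, 45-degree, or 135-degree, and the stroke sequence is then rearranged in that order; the rearranged sequence still preserves the geometric relationships of the original sequence. Before matching, the relationships between strokes of the same type are determined, including the relative distance, the angle, and the length ratio between two strokes. Matching is judged by the similarity among the strokes inside the stroke window: if the similarity exceeds a preset value, strokes are added to the window step by step and the comparison is repeated until a result (match success or failure) is obtained. Experimental results show that the proposed stroke extraction method has higher noise tolerance and reliability, that the coarse classification method effectively reduces the number of characters in each class, and that the proposed recognition method is feasible and effective.
There are two kinds of off-line Chinese character recognition systems: one is based on statistical features, and the other is based on structural features. In this dissertation, we focus on the subjects relevant to structure-feature-based off-line Chinese character recognition. Such a system is usually composed of four main modules: preprocessing, stroke extraction, coarse classification, and recognition. In the preprocessing module, the scanned image is denoised and skeletonized to facilitate stroke extraction. For this stage, we propose a novel run-length-based skeletonization approach that is more tolerant of noise. The generated skeleton contains no fork points. This forkless skeleton simplifies stroke extraction and makes its result more reliable.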
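To make the run-length idea concrete, here is a minimal sketch of how horizontal runs can be extracted from a binary character image. It is not the skeletonization algorithm of the dissertation, whose run-relationship rules are not given in the abstract. The names `horizontal_runs` and `run_midpoints` are hypothetical, and taking the midpoint of each run is only a crude stand-in for a skeleton point.

```python
from typing import List, Tuple

# A run is (row, first_col, last_col), with last_col inclusive.
Run = Tuple[int, int, int]


def horizontal_runs(image: List[List[int]]) -> List[Run]:
    """Return the maximal horizontal runs of foreground (non-zero) pixels."""
    runs: List[Run] = []
    for r, row in enumerate(image):
        start = None  # column where the current run began, if any
        for c, pixel in enumerate(row):
            if pixel and start is None:
                start = c                      # a run opens here
            elif not pixel and start is not None:
                runs.append((r, start, c - 1))  # the run closed on the previous pixel
                start = None
        if start is not None:                  # run reaches the right border
            runs.append((r, start, len(row) - 1))
    return runs


def run_midpoints(runs: List[Run]) -> List[Tuple[int, int]]:
    """Assumption for illustration: one candidate skeleton pixel per run."""
    return [(r, (first + last) // 2) for r, first, last in runs]


if __name__ == "__main__":
    image = [
        [0, 1, 1, 1, 0],
        [0, 1, 1, 1, 0],
        [1, 1, 0, 0, 1],
    ]
    runs = horizontal_runs(image)
    print(runs)
    print(run_midpoints(runs))
```

Working on runs rather than on individual pixels means that small boundary irregularities change a run's end columns only slightly, which is one plausible reason a run-based method could tolerate noise better than pixel-wise thinning.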
Once the strokes of a character have been extracted, several structural features can be computed for each stroke: its end points, center point, orientation, and length. Furthermore, relationships between two strokes can be determined, including their fork points, the distance between them, their orientation difference, and their length ratio. These features are used in the subsequent recognition steps. Because the Chinese character set is very large, it is inefficient to match an input character against every character in the database, so the characters must be preclassified. In this dissertation, we also propose an effective preclassification scheme that divides the whole character set into subclasses, each containing fewer characters. The classifier has two layers. The first layer classifies Chinese characters into ten subclasses according to the pattern of the character; in this layer, the radicals embedded in the character are also extracted. The second layer further divides the ten subclasses by analyzing four symmetry features of the extracted radical. Finally, an off-line Chinese character recognition methodology is proposed. The extracted strokes are rearranged to form a 1-D stroke string in which strokes of the same type are grouped together. The reordered stroke string facilitates building the intra-character relationships between strokes. When an input character is matched against the characters in the database, the difference between the intra-character relationships of the two characters is assessed. The output is a list of candidate characters sorted in descending order of matching score.
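The per-stroke features and pairwise relationships above can be sketched as follows. This is an illustration under stated assumptions, not the dissertation's implementation. The `Stroke` class, the type labels, and the function names are hypothetical. Each stroke is treated as a straight segment of non-zero length, angles use the mathematical convention with the y axis pointing up, and the four stroke types and their order (horizontal, vertical, 45 degrees, 135 degrees) are taken from the abstract.

```python
import math
from dataclasses import dataclass
from typing import Dict, List, Tuple

Point = Tuple[float, float]


@dataclass(frozen=True)
class Stroke:
    start: Point
    end: Point

    @property
    def center(self) -> Point:
        return ((self.start[0] + self.end[0]) / 2, (self.start[1] + self.end[1]) / 2)

    @property
    def length(self) -> float:
        return math.dist(self.start, self.end)

    @property
    def orientation(self) -> float:
        """Undirected angle of the segment in degrees, folded into [0, 180)."""
        dx = self.end[0] - self.start[0]
        dy = self.end[1] - self.start[1]
        return math.degrees(math.atan2(dy, dx)) % 180.0


# Hypothetical labels; the order follows the abstract.
TYPE_ORDER = ("H", "V", "D45", "D135")
TYPE_ANGLE = {"H": 0.0, "V": 90.0, "D45": 45.0, "D135": 135.0}


def angle_diff(a: float, b: float) -> float:
    """Smallest difference between two undirected angles, in [0, 90]."""
    d = abs(a - b) % 180.0
    return min(d, 180.0 - d)


def stroke_type(s: Stroke) -> str:
    """Assign the stroke to the nearest of the four reference directions."""
    return min(TYPE_ORDER, key=lambda t: angle_diff(s.orientation, TYPE_ANGLE[t]))


def reorder(strokes: List[Stroke]) -> List[Stroke]:
    """Group strokes by type; sorted() is stable, so the original order is kept within a type."""
    return sorted(strokes, key=lambda s: TYPE_ORDER.index(stroke_type(s)))


def pair_relation(a: Stroke, b: Stroke) -> Dict[str, float]:
    """Distance between centers, orientation difference, and length ratio of two strokes."""
    return {
        "distance": math.dist(a.center, b.center),
        "orientation_diff": angle_diff(a.orientation, b.orientation),
        "length_ratio": min(a.length, b.length) / max(a.length, b.length),
    }


if __name__ == "__main__":
    h = Stroke((0, 0), (4, 0))
    v = Stroke((2, -2), (2, 2))
    d = Stroke((0, 0), (3, 3))
    print([stroke_type(s) for s in reorder([d, v, h])])
    print(pair_relation(h, Stroke((0, 2), (2, 2))))
```

Relying on a stable sort is one simple way to group strokes by type while keeping their original relative order inside each group; whether this matches the rearrangement rule of the dissertation is an assumption.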
Experimental results show that the proposed stroke extraction method tolerates noise well and yields more reliable extraction results, and that the proposed preclassifier effectively reduces the number of characters in each subclass. They also show that the proposed recognition scheme is feasible.
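The stroke-window matching named in the abstract and keywords can be illustrated with a simplified loop. The abstract does not give the similarity measure, the threshold, or the window growth rule, so everything below is an assumption. `window_match` and `type_agreement` are hypothetical names, the two stroke strings are required to have equal length, and similarity is reduced to the fraction of positions whose stroke types agree.

```python
from typing import Callable, Sequence, TypeVar

S = TypeVar("S")


def window_match(
    input_strokes: Sequence[S],
    model_strokes: Sequence[S],
    similarity: Callable[[Sequence[S], Sequence[S]], float],
    threshold: float = 0.8,   # assumed value, not from the source
    initial: int = 2,         # assumed initial window size
    step: int = 1,            # assumed growth per iteration
) -> bool:
    """Grow a window over both stroke strings; fail as soon as similarity drops below threshold."""
    n = len(input_strokes)
    # Simplifying assumption: only strings with the same, non-zero stroke count can match.
    if n == 0 or n != len(model_strokes):
        return False
    size = min(initial, n)
    while True:
        if similarity(input_strokes[:size], model_strokes[:size]) < threshold:
            return False      # matching failed inside the current window
        if size == n:
            return True       # the window covers the whole string: success
        size = min(size + step, n)


def type_agreement(a: Sequence[str], b: Sequence[str]) -> float:
    """Toy similarity: fraction of positions with the same stroke-type label."""
    return sum(x == y for x, y in zip(a, b)) / len(a)


if __name__ == "__main__":
    print(window_match("HHV", "HHV", type_agreement))
    print(window_match("HHV", "HVV", type_agreement))
```

Because the comparison stops at the first window that falls below the threshold, dissimilar database characters are rejected after only a few strokes, which is the practical appeal of growing the window incrementally.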
Appears in Collections:	[Graduate Institute of Computer Science and Information Engineering] Electronic Thesis & Dissertation
