利用欄位群聚特徵和四個方向相鄰樹作表格文件分類; Table-Form Classification Using Field Clustering Features and Four Directional Adjacency Trees

Please use this identifier to cite or link to this item: http://ir.lib.ncu.edu.tw/handle/987654321/8448

Title:	利用欄位群聚特徵和四個方向相鄰樹作表格文件分類;Table-Form Classification Using Field Clustering Features and Four Directional Adjacency Trees
Authors:	王培儀;Pei-Yi Wang
Contributors:	Graduate Institute of Computer Science and Information Engineering
Keywords:	four directional adjacency trees;field extraction;line extraction;table-form classification;clustering
Date:	2000-07-14
Issue Date:	2009-09-22 11:27:14 (UTC+8)
Publisher:	National Central University Library
Abstract:	Office automation has become a major trend in recent years, and many techniques have been proposed to achieve it. Among them, automatic document processing is one of the most important. Offices handle many kinds of documents; most are table-form documents, which are used extensively in different applications. Table-form classification therefore plays an important role in an automatic document processing system. This thesis presents a novel method for classifying table-form documents that uses the fields of a table-form document as the primary feature. The system first extracts all table lines, then uses the line-crossing relation matrix and a corner-pair searching algorithm, which pairs upper-left corners with lower-right corners, to extract every field in the document. From the extracted fields it derives two features that represent the interrelationships between fields: the field clustering feature and the four directional adjacency trees (FDAT). A table-form is then classified by matching these two features against a stored library of sample table-forms. Experimental results demonstrate the feasibility and validity of the proposed method.
Appears in Collections:	[Graduate Institute of Computer Science and Information Engineering] Electronic Thesis & Dissertation
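
The field-extraction step named in the abstract (a line-crossing relation matrix followed by corner pairing) can be illustrated with a minimal sketch. This is not the thesis's algorithm, whose details are not given in this record. It assumes table lines have already been extracted as axis-aligned segments, and the names `crossing_matrix`, `pair_corner`, `extract_fields`, and the `tol` parameter are hypothetical.

```python
from typing import List, Optional, Tuple

HLine = Tuple[int, int, int]        # (y, x_start, x_end), assumed representation
VLine = Tuple[int, int, int]        # (x, y_start, y_end), assumed representation
Field = Tuple[int, int, int, int]   # (left, top, right, bottom)


def crossing_matrix(h: List[HLine], v: List[VLine], tol: int = 0) -> List[List[bool]]:
    """c[i][j] is True when horizontal line i meets vertical line j.

    T-junctions and corners count as crossings; `tol` absorbs small gaps.
    """
    return [
        [(x1 - tol <= x <= x2 + tol) and (y1 - tol <= y <= y2 + tol)
         for (x, y1, y2) in v]
        for (y, x1, x2) in h
    ]


def pair_corner(h, v, c, i: int, j: int) -> Optional[Field]:
    """Treat crossing (i, j) as an upper-left corner and look for the
    nearest lower-right crossing (k, l) whose other two corners also exist."""
    for k in range(i + 1, len(h)):
        # The candidate bottom line must lie below and meet the left line.
        if h[k][0] <= h[i][0] or not c[k][j]:
            continue
        for l in range(j + 1, len(v)):
            # The candidate right line must meet both top and bottom lines.
            if v[l][0] > v[j][0] and c[i][l] and c[k][l]:
                return (v[j][0], h[i][0], v[l][0], h[k][0])
    return None


def extract_fields(h_lines: List[HLine], v_lines: List[VLine], tol: int = 0) -> List[Field]:
    """Return one rectangle per upper-left corner that can be paired."""
    h, v = sorted(h_lines), sorted(v_lines)
    c = crossing_matrix(h, v, tol)
    fields = []
    for i in range(len(h)):
        for j in range(len(v)):
            if c[i][j]:
                field = pair_corner(h, v, c, i, j)
                if field is not None:
                    fields.append(field)
    return fields


# A form with two stacked cells on the left and one tall cell on the right.
h_lines = [(0, 0, 20), (10, 0, 10), (20, 0, 20)]
v_lines = [(0, 0, 20), (10, 0, 20), (20, 0, 20)]
fields = extract_fields(h_lines, v_lines)
```

The sketch stores each line as a single segment, so an edge between two paired corners is continuous by construction. Real scanned forms would need broken or skewed lines handled before this step, and the clustering and FDAT features would then be computed from the resulting field rectangles.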
