利用欄位群聚特徵和四個方向相鄰樹作表格文件分類; Table-Form Classification Using Field Clustering Features and Four Directional Adjacency Trees

Please use this permanent URL to cite or link to this item: http://ir.lib.ncu.edu.tw/handle/987654321/8448

Title:	利用欄位群聚特徵和四個方向相鄰樹作表格文件分類; Table-Form Classification Using Field Clustering Features and Four Directional Adjacency Trees
Author:	王培儀; Pei-Yi Wang
Contributor:	資訊工程研究所
Keywords:	four directional adjacency trees; field extraction; line extraction; table-form classification; clustering
Date:	2000-07-14
Uploaded:	2009-09-22 11:27:14 (UTC+8)
Publisher:	國立中央大學圖書館 (National Central University Library)
Abstract:	Office automation has become a trend in recent years, and automatic document processing is one of its most important enabling techniques. Offices handle many kinds of documents; most are table-form documents, which are used extensively across applications. Table-form classification therefore plays an important role in automatic document processing systems. This thesis presents a novel method for classifying table-form documents that uses the fields of the table as its primary feature. The system first extracts all table lines, then uses the line-crossing relation matrix and the corner-pair searching algorithm (pairing upper-left with lower-right corners) to extract every field in the document. From the extracted fields it derives two features that represent the interrelationships between fields: the field clustering feature and the four directional adjacency trees (FDAT). A table form is then classified by matching these two features against a stored library of sample table forms. Experimental results demonstrate the feasibility and validity of the proposed method.
Appears in Collections:	[資訊工程研究所] Theses and Dissertations
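
The field-extraction step named in the abstract (table lines first, then corner pairing) can be illustrated with a small sketch. This is not the thesis's algorithm, whose details the record does not give; it is one simplified reading, assuming table lines are already available as axis-aligned segments with exact integer coordinates. The names `covers` and `extract_fields` and the tuple layouts are hypothetical, and a real scanned form would need a coordinate tolerance.

```python
from itertools import product

def covers(segments, fixed, lo, hi):
    """True if some segment lying at coordinate `fixed` spans the interval [lo, hi]."""
    return any(f == fixed and a <= lo and hi <= b for f, a, b in segments)

def extract_fields(h_lines, v_lines):
    """Pair each upper-left corner with its nearest valid lower-right corner.

    h_lines: horizontal segments as (y, x_start, x_end)   (assumed layout)
    v_lines: vertical segments as (x, y_start, y_end)     (assumed layout)
    Returns fields as (left, top, right, bottom) tuples.
    """
    xs = sorted({x for x, _, _ in v_lines})
    ys = sorted({y for y, _, _ in h_lines})
    fields = []
    for top, left in product(ys, xs):
        found = None
        # Try the nearest bottom edge first, then the nearest right edge.
        for bottom in (y for y in ys if y > top):
            for right in (x for x in xs if x > left):
                # All four sides of the candidate rectangle must be drawn.
                if (covers(h_lines, top, left, right)
                        and covers(h_lines, bottom, left, right)
                        and covers(v_lines, left, top, bottom)
                        and covers(v_lines, right, top, bottom)):
                    found = (left, top, right, bottom)
                    break
            if found:
                break
        if found:
            fields.append(found)
    return fields

# A 2x2 grid whose top row is merged into one wide field.
h_lines = [(0, 0, 2), (1, 0, 2), (2, 0, 2)]
v_lines = [(0, 0, 2), (2, 0, 2), (1, 1, 2)]
fields = extract_fields(h_lines, v_lines)
print(fields)  # [(0, 0, 2, 1), (0, 1, 1, 2), (1, 1, 2, 2)]
```

The sketch checks line coverage directly instead of building an explicit line-crossing relation matrix, which keeps it short at the cost of repeated scans over the segment lists.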

All items in NCUIR are protected by original copyright.
