On-Line Extraction Rule Analysis

DC field	Value	Language
DC.contributor	資訊工程學系	zh_TW
DC.creator	郭釋謙	zh_TW
DC.creator	Shih-Chien Kuo	en_US
dc.date.accessioned	2003-07-15T07:39:07Z
dc.date.available	2003-07-15T07:39:07Z
dc.date.issued	2003
dc.identifier.uri	http://ir.lib.ncu.edu.tw:444/thesis/view_etd.asp?URN=90522059
dc.contributor.department	資訊工程學系	zh_TW
DC.description	國立中央大學	zh_TW
DC.description	National Central University	en_US
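dc.description.abstract	With the growth of the Internet, more and more information is presented in HTML, where useful and useless content are mixed together, and users often spend a great deal of time searching for the data they need. An information extraction system addresses this by presenting input data in structured form, which makes it possible to integrate data and build richer search engines. The most direct way to build such a system is to hand-write a wrapper (an extraction program) for each website, but because a site's format can change at any time, generating extraction programs quickly and automatically is the main challenge in designing an extraction system. Wrapper induction methods, proposed beginning in 1997, have the user label example pages to tell the system which information to extract; the system then generates extraction rules and applies them to the site. Although this labeling approach achieves good extraction rates, rules can be generated only after a laborious labeling process, which is inconvenient for users, so reducing the labeling required of users is a major design challenge. Existing systems that need no user labeling, such as IEPAD, handle only pages containing multiple records and offer no solution for single-record pages. This thesis therefore proposes an effective method for building an automated information extraction system that lets users extract data completely without laborious labeling and that handles both single-record and multiple-record pages.	zh_TW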
dc.description.abstract	The vast amount of information available online has led to renewed interest in information extraction (IE) systems, which analyze input documents to produce a structured representation of selected information. However, the design of an IE system differs greatly according to its input, which ranges from unrestricted free text to semi-structured Web documents. This thesis extends an automatic pattern discovery approach called IEPAD to the rapid generation of IE systems that can extract structured data from semi-structured Web documents. In this framework, extraction rules can be trained not only from a multiple-record Web page but also from multiple single-record Web pages (called singular pages). Most importantly, the framework requires none of the annotation labor that many machine-learning-based approaches depend on. Evaluation results show a high level of system performance.	en_US
DC.subject	資訊整合	zh_TW
DC.subject	資料檢索	zh_TW
DC.subject	資訊擷取	zh_TW
DC.subject	Information Integration	en_US
DC.subject	Information Extraction	en_US
DC.subject	Information Retrieval	en_US
DC.title	線上擷取規則分析	zh_TW
dc.language.iso	zh-TW	zh-TW
DC.title	On-Line Extraction Rule Analysis	en_US
DC.type	博碩士論文	zh_TW
DC.type	thesis	en_US
DC.publisher	National Central University	en_US

Thesis 90522059: full metadata record