

    Please use this identifier to cite or link to this item: http://ir.lib.ncu.edu.tw/handle/987654321/8987


    Title: A Novel Wrapper with the On-Line Learning Capability (具線上學習功能之新型擷取程式)
    Authors: Chen-Ko Huang (黃陳科)
    Contributors: Graduate Institute of Computer Science and Information Engineering
    Keywords: extraction rule; wrapper; extraction program
    Date: 2005-07-05
    Issue Date: 2009-09-22 11:38:54 (UTC+8)
    Publisher: National Central University Library
    Abstract: With the growth of the Internet, a great deal of information is now stored in databases and presented through web pages. Such pages are typically generated by Common Gateway Interface (CGI) programs, and pages generated by the same CGI program follow a fixed set of rules. These rules can be applied in reverse to extract the underlying data record by record; they are called extraction rules. A program that uses extraction rules to recover the information behind a web page is called a wrapper. A wrapper extracts information from a web source and stores it in a user-defined format, so that the processed data can be integrated further. Because the volume of information on the Internet is overwhelming, a learnable information extraction system that generates wrappers automatically makes it easier to integrate web information and spares the user from laborious labeling. In other words, the system must derive extraction rules from the content to be extracted in the training pages and pass them to the wrapper. With these considerations in mind, this thesis develops a new signal-based method, called the "histogram and boundary tag-based correlation coefficient", which finds correlation features between the user-labeled examples and the web page. The method is used to implement the complete extraction system: a wrapper with on-line learning capability that generates extraction rules able to cope with the diversity of web pages.
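
To make the idea of an extraction rule concrete, the sketch below shows a wrapper that applies hand-written left/right delimiter rules to a generated page. It is an illustration only and not the thesis's method: the histogram and boundary tag-based correlation coefficient, and the learning step that would produce the rules, are not described in this record. The names `ExtractionRule` and `extract_records` and the sample page are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ExtractionRule:
    """Hypothetical rule: a field is the text between a left and a right boundary tag."""
    field: str
    left: str
    right: str


def extract_records(page: str, rules: list[ExtractionRule]) -> list[dict[str, str]]:
    """Scan the page left to right, applying the rules in order to build each record.

    Stops at the first delimiter that cannot be found; an incomplete
    trailing record is discarded.
    """
    records: list[dict[str, str]] = []
    if not rules:
        return records
    pos = 0
    while True:
        record: dict[str, str] = {}
        for rule in rules:
            start = page.find(rule.left, pos)
            if start == -1:
                return records
            start += len(rule.left)
            end = page.find(rule.right, start)
            if end == -1:
                return records
            record[rule.field] = page[start:end].strip()
            pos = end + len(rule.right)
        records.append(record)


# Sample page standing in for CGI-generated output (assumed for illustration).
page = (
    "<tr><td><b>Ann</b></td><td><i>30</i></td></tr>"
    "<tr><td><b>Bob</b></td><td><i>41</i></td></tr>"
)
rules = [
    ExtractionRule("name", "<b>", "</b>"),
    ExtractionRule("age", "<i>", "</i>"),
]
print(extract_records(page, rules))
```

In a learnable system such as the one the abstract describes, the rules would be derived from user-labeled training pages rather than written by hand.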
    Appears in Collections: [Graduate Institute of Computer Science and Information Engineering] Theses and Dissertations



    All items in NCUIR are protected by copyright, with all rights reserved.
