    Please use this persistent URL to cite or link to this item: http://ir.lib.ncu.edu.tw/handle/987654321/83977


    Title: Toward Efficient Unsupervised Web Data Extraction: From Unsupervised to Self-Trained Wrappers (Chinese title: 朝向有效率的非監督式網頁資料擷取:從非監督到自我訓練Wrapper)
    Author: 時福仁; Said, Naufal
    Contributor: Department of Computer Science and Information Engineering
    Keywords: Information Systems; Data Extraction and Integration; Deep web; Wrappers (data mining); ETL; Data exchange
    Date: 2020-07-23
    Uploaded: 2020-09-02 17:49:28 (UTC+8)
    Publisher: National Central University
    Abstract: Web data extraction is a key component of many business intelligence tasks, such as data transformation, exchange, analysis, and interpretation. Many approaches to wrapper induction have been proposed, whether manual, supervised, or unsupervised. However, most research focuses on extraction effectiveness, and little attention has been paid to extraction efficiency. In this thesis, we argue that wrapper generation for unsupervised web data extraction is as important as supervised wrapper induction, because generated wrappers can complete the extraction task more efficiently without sophisticated analysis. We therefore treat unsupervised data extraction as an oracle machine that generates annotated training examples, and consider two methods of wrapper generation: schema-guided finite-state machine (FSM) approaches and data-driven machine learning (ML) approaches. The experimental results show that FSM wrappers perform well even with little training data, while ML-based wrappers are more efficient at test time but require more training pages to reach the same effectiveness. Furthermore, FSM wrappers can act as a filter that reduces the number of training pages needed and improves the learning curve of ML-based wrappers.
    Appears in Collections: [Graduate Institute of Computer Science and Information Engineering] Theses & Dissertations
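    The abstract's core idea, running an unsupervised extractor once as an oracle to label training pages and then generating a cheaper wrapper that extracts from new pages without that analysis, can be illustrated with a minimal sketch. This is not the thesis's implementation. It assumes pages are flat lists of (tag path, text) leaves; `oracle_extract` is a hypothetical stand-in that treats text repeated on every page as template; and a simple path-lookup wrapper replaces the FSM and ML wrappers actually studied in the thesis.

```python
from collections import Counter
from typing import List, Set, Tuple

# Assumption for this sketch: a page is a flat list of (tag_path, text) leaves.
Page = List[Tuple[str, str]]


def oracle_extract(pages: List[Page]) -> List[Set[int]]:
    """Hypothetical stand-in for an unsupervised extractor used as an oracle.

    A leaf whose (tag_path, text) pair appears on every page is treated as
    template; any other leaf is labeled as data. Returns, per page, the
    indices of the leaves labeled as data.
    """
    # Count on how many pages each (tag_path, text) pair occurs.
    page_freq = Counter(node for page in pages for node in set(page))
    n = len(pages)
    return [{i for i, node in enumerate(page) if page_freq[node] < n}
            for page in pages]


def induce_wrapper(pages: List[Page], labels: List[Set[int]]) -> Set[str]:
    """Generate a wrapper: the set of tag paths the oracle labeled as data."""
    return {page[i][0] for page, idx in zip(pages, labels) for i in idx}


def apply_wrapper(wrapper: Set[str], page: Page) -> List[str]:
    """Extract from an unseen page by path lookup alone, without the oracle."""
    return [text for path, text in page if path in wrapper]


if __name__ == "__main__":
    train = [
        [("html/body/h1", "Products"), ("html/body/div/span", "Pen"),
         ("html/body/div/b", "$1")],
        [("html/body/h1", "Products"), ("html/body/div/span", "Cup"),
         ("html/body/div/b", "$3")],
    ]
    wrapper = induce_wrapper(train, oracle_extract(train))
    new_page = [("html/body/h1", "Products"), ("html/body/div/span", "Hat"),
                ("html/body/div/b", "$5")]
    print(apply_wrapper(wrapper, new_page))  # ['Hat', '$5']
```

    The split mirrors the efficiency argument in a simplified form: the cross-page comparison runs only once, when the wrapper is generated, while each new page needs nothing more than a path lookup. The crude frequency rule used here would misclassify a data value that happens to repeat on every training page as template, which a real unsupervised extractor has to handle.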

    Files in This Item: index.html (HTML, 0 KB)


    All items in NCUIR are protected by original copyright.
