NCU Institutional Repository (中大機構典藏): theses and dissertations, past exam papers, journal articles, and research projects available for download: Item 987654321/11254


    Please use this permanent URL to cite or link to this document: http://ir.lib.ncu.edu.tw/handle/987654321/11254


    Title: 以資料挖礦法則預測網頁更新規則之研究; Discovering Web Page Modification Pattern with Data Mining
    Author: 張維捷; Wei-Chieh Chang
    Contributor: Graduate Institute of Business Administration (企業管理研究所)
    Keywords: web page update; data mining; pattern; web mining; association rules; pattern discovery; WWW
    Date: 2000-06-22
    Upload time: 2009-09-22 14:17:28 (UTC+8)
    Publisher: National Central University Library (國立中央大學圖書館)
    Abstract: A search engine must frequently refresh the web pages it has collected. The refresh interval is usually a fixed period set by the user. If the interval is poorly chosen, the fetched pages may be identical to the previous copies (interval too short), or the pages may already have been updated several times (interval too long), and either case wastes network resources. This thesis therefore uses the data mining technique of generating sequential association rules to discover the pattern of each web page's update times (its update pattern), and uses that pattern to predict when the page will be updated. The prediction mechanism designed in this study can help search engine administrators reduce the network usage incurred in managing their pages. This study also proposes an incremental method for maintaining the prediction rules; the incremental algorithm reduces the number of database scans and produces new, reasonable rules in a timely manner.
    In the e-commerce era, many agents roam the Internet to find the best prices, cluster related merchandise information, and so on. Agents have to visit targeted web pages periodically to update their information. If agents visit pages too frequently, they end up reloading many unchanged pages. On the other hand, if agents visit web pages too infrequently, the collected data may be out of date. To minimize out-of-date errors, agents tend to visit a site as soon as possible. However, to minimize network traffic and database update cost, system administrators tend to reduce visits as much as possible. To the best of our knowledge, no research has been directed at finding a scientific approach to resolve this dilemma. In this paper, we propose visiting web pages according to their past update patterns. That is, a page should be visited as soon as it is expected to have changed, and not at any other time. To discover the update patterns, we propose using the sequential association rules of data mining. Association rules can find patterns implicit in the data, which here are the update times of each web page. Each web page is associated with a sequence of binary digits, each denoting whether the page was updated in the corresponding agent fetching slot. We designed an algorithm to mine patterns from this sequence of binary digits. The patterns are composed of large item sequences and the related association rules. Each rule states that, under some precondition, the web page will change in the next time slot. If a precondition matches the current situation, an agent is sent to fetch the page. Besides computing patterns for existing pages, the system also updates its database dynamically to account for newly inserted and deleted pages.
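    Illustrative sketch (not part of the thesis record): the rule idea in the abstract can be pictured with a small Python example. It assumes the update history of one page is a string of '0'/'1' characters, one per fetch slot, and that a precondition is simply the last few slots. The function names, the window length, and the support and confidence thresholds are hypothetical choices made for this illustration; the thesis's actual algorithm, including its large item sequences and incremental rule maintenance, is not reproduced here.

```python
from collections import defaultdict


def mine_update_rules(history, window=3, min_support=2, min_confidence=0.6):
    """Mine rules of the form 'last `window` slots == precondition -> page changes next slot'.

    history: string of '0'/'1', one character per fetch slot ('1' = page had changed).
    Returns {precondition: confidence} for rules that meet both thresholds.
    """
    seen = defaultdict(int)     # how often each precondition was followed by another slot
    changed = defaultdict(int)  # how often that following slot was an update
    for i in range(len(history) - window):
        pre = history[i:i + window]
        seen[pre] += 1
        if history[i + window] == "1":
            changed[pre] += 1

    rules = {}
    for pre, count in seen.items():
        confidence = changed[pre] / count
        if changed[pre] >= min_support and confidence >= min_confidence:
            rules[pre] = confidence
    return rules


def should_fetch(history, rules, window=3):
    """Send an agent only if the most recent slots match a mined precondition."""
    return history[-window:] in rules


# A page that changes in every third slot.
history = "001001001001"
rules = mine_update_rules(history)
print(rules)                               # {'100': 1.0}
print(should_fetch(history, rules))        # False: last slots are '001'
print(should_fetch(history + "00", rules)) # True: last slots are '100'
```

    In this toy history the only rule that survives is "after the pattern 1, 0, 0 the page changes in the next slot", so an agent would be dispatched once every three slots instead of every slot.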
    Appears in collections: [Graduate Institute of Business Administration] Theses and Dissertations



    All items in NCUIR are protected by their original copyright.
