A Study on Predicting Web Page Update Rules with Data Mining

Name

張維捷 (Wei-Chieh Chang)

Department

Department of Business Administration

Thesis Title

A Study on Predicting Web Page Update Rules with Data Mining
(Discovering Web Page Modification Pattern with Data Mining)

Related Theses

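★ Interactive Recommendation on Social Networking Sites and the Influence of User Behavior on Its Effectiveness
★ Using AHP to Determine the Weights of Supplier Selection Criteria for Major Server Brands
★ Using AHP to Identify Key Factors in Location Selection for Smartphone Industry Operation Centers
★ Performance Evaluation of the Solar Photovoltaic Industry: An Application of Data Envelopment Analysis
★ Building a Model for Comparing the Competitiveness of National Solar Cell Industries
★ Using Sequential Mining to Explore the Association between Business Cycle Indicators and Import/Export Values
★ The Effect of ERP Project Team Composition on Performance
★ Recommending Journal Articles to Suitable Subject Categories
★ Brand Story Analysis and Comparison: The Case of the Traditional-Flavor Food Industry
★ Using Means-End Chains to Compare the Factors That Attract Consumers to Starbucks and Cama
★ The Entrepreneurial Value of Creative Shops: The Cases of Chifeng Street and Minsheng Community
★ Predicting Changes in Corporate Long- and Short-Term Borrowing with Leading Indicators
★ Key Factors in Selecting Keyboard Suppliers for Gaming Laptops Using the Analytic Hierarchy Process
★ The Influence of Trust on Knowledge Sharing from the Perspective of Reciprocity and Altruism
★ Recommending Movies with Neural Networks by Combining Personality Traits and Poster Dominant Colors
★ The Association between Data Visualization Charts and Topics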

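Access permission for this electronic thesis: immediate open access.
Once open, the full electronic text is licensed to users only for personal, non-commercial searching, reading, and printing for academic research purposes.
Please observe the relevant provisions of the Copyright Act of the Republic of China (Taiwan). Do not reproduce, distribute, adapt, repost, or broadcast the work without authorization, to avoid legal liability.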

Abstract (Chinese)

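A search engine system frequently has to refresh the web pages it has collected. The refresh interval is usually a fixed period set by the user. If the interval is poorly chosen, two problems can arise. If it is too short, the fetched pages may be identical to the previous copies. If it is too long, the pages may already have been updated more than once since the last fetch. Either way, network resources may be wasted. This thesis therefore applies the data mining technique of generating sequential association rules to discover each web page's update-time pattern (updated pattern), and uses that pattern to predict when the page will be updated. The prediction-based update mechanism designed in this study can help search engine administrators reduce network usage when managing web pages. The study also proposes an incremental method for updating the prediction rules. This incremental algorithm reduces the number of database scans and produces new, reasonable rules in a timely manner.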
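
The cost of a poorly chosen fixed interval, described above, can be made concrete with a small count. The following is a minimal sketch built on assumptions that do not come from the thesis. Update times are assumed to be known integer timestamps, and the agent fetches every `interval` units. The function name `fixed_interval_cost` is hypothetical. A fetch is counted as wasted when the page has not changed since the previous fetch. An update is counted as missed when a later update overwrites it before any fetch sees it.

```python
from typing import List, Tuple


def fixed_interval_cost(update_times: List[int], interval: int, horizon: int) -> Tuple[int, int]:
    """Return (wasted_fetches, missed_updates) for a hypothetical fixed revisit interval.

    wasted_fetches: fetches that found the page unchanged since the previous fetch.
    missed_updates: versions overwritten by a later update before any fetch saw them.
    Assumes fetches happen at interval, 2*interval, ... up to horizon.
    """
    updates = sorted(update_times)
    wasted = 0
    missed = 0
    previous_fetch = 0
    for fetch in range(interval, horizon + 1, interval):
        # Updates that happened after the previous fetch, up to and including this one.
        n = sum(1 for u in updates if previous_fetch < u <= fetch)
        if n == 0:
            wasted += 1  # interval too short: nothing new was retrieved
        else:
            missed += n - 1  # interval too long: only the latest version is retrieved
        previous_fetch = fetch
    return wasted, missed
```

Take updates at times 5, 6, 30, 31, and 32 over a horizon of 40. An interval of 1 returns `(35, 0)`, meaning many wasted fetches and no missed versions. An interval of 20 returns `(0, 3)`, meaning no wasted fetches but three versions never seen. This is the trade-off that an update-pattern-driven schedule is meant to avoid.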

Abstract (English)

In the e-commerce era, many agents roam the Internet to find the best prices, cluster related merchandise information, and so on. Agents have to revisit targeted web pages periodically to keep this information up to date. If agents visit pages too frequently, they end up reloading many unchanged pages. On the other hand, if agents visit pages too infrequently, the collected data may be out of date. To minimize out-of-date errors, agents tend to revisit a site as soon as possible. However, to minimize network traffic and database update costs, system administrators tend to reduce visits as much as possible. To the best of our knowledge, no research has proposed a scientific approach to resolving this dilemma.
In this paper, we propose visiting web pages according to their past update patterns. That is, a page should be visited as soon as it is expected to change, and not at any other time. To discover the update patterns, we propose using the sequential association rules of data mining. Association rules can uncover patterns implicit in data, which here are the update times of each web page. Each web page is associated with a sequence of binary digits. Each digit denotes whether the page was updated during the corresponding agent fetching slot. We designed an algorithm to mine patterns from these binary sequences. The patterns consist of large item sequences and related association rules. Each rule states that, under certain preconditions, the web page will change in the next time slot. If a precondition matches the current situation, an agent is sent to fetch the page. Besides computing patterns for existing pages, the system also updates its database dynamically to account for newly inserted and deleted pages.
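
The abstract describes three steps: mining rules from each page's sequence of binary digits, fetching a page only when a rule's precondition matches, and keeping the database current as pages are inserted or deleted. The thesis's own Binary Large Sequence algorithm is not reproduced here. This sketch substitutes a deliberately simple assumption: a precondition is the last `k` contiguous digits. Support is how often that window occurred with a following slot. Confidence is the fraction of those occurrences followed by a change. The class and method names, and the parameters `k`, `min_support`, and `min_conf`, are all hypothetical. Counts are updated one window at a time, in the spirit of the incremental rule maintenance the thesis mentions. Inserted or deleted pages are handled by adding or dropping their per-page tables.

```python
from collections import Counter, deque
from typing import Deque, Dict, Tuple

Pattern = Tuple[int, ...]


class UpdatePatternMiner:
    """Hypothetical sketch: per-page rules "last k slots == pattern => change in next slot".

    Each fetch result is one binary digit (1 = page changed since the previous fetch).
    Counts are updated incrementally, so a page's full history is never rescanned.
    """

    def __init__(self, k: int, min_support: int, min_conf: float) -> None:
        self.k = k
        self.min_support = min_support
        self.min_conf = min_conf
        self.recent: Dict[str, Deque[int]] = {}    # last k digits per page
        self.occurrences: Dict[str, Counter] = {}  # pattern -> times seen with a successor
        self.changes: Dict[str, Counter] = {}      # pattern -> times followed by a 1

    def add_page(self, url: str) -> None:
        if url not in self.recent:
            self.recent[url] = deque(maxlen=self.k)
            self.occurrences[url] = Counter()
            self.changes[url] = Counter()

    def remove_page(self, url: str) -> None:
        # A deleted page simply drops out of every table.
        for table in (self.recent, self.occurrences, self.changes):
            table.pop(url, None)

    def record_fetch(self, url: str, changed: bool) -> None:
        self.add_page(url)
        window = self.recent[url]
        if len(window) == self.k:
            # Count only the one new window: previous k digits -> this digit.
            pattern = tuple(window)
            self.occurrences[url][pattern] += 1
            if changed:
                self.changes[url][pattern] += 1
        window.append(1 if changed else 0)  # deque(maxlen=k) drops the oldest digit

    def rules(self, url: str) -> Dict[Pattern, float]:
        """Return {precondition: confidence} for patterns passing both thresholds."""
        result: Dict[Pattern, float] = {}
        for pattern, n in self.occurrences.get(url, Counter()).items():
            conf = self.changes[url][pattern] / n
            if n >= self.min_support and conf >= self.min_conf:
                result[pattern] = conf
        return result

    def should_fetch(self, url: str) -> bool:
        """Send an agent in the next slot only if the last k digits match a rule."""
        window = self.recent.get(url)
        if window is None or len(window) < self.k:
            return False
        return tuple(window) in self.rules(url)
```

As a usage example, take a page that changes every third slot, with digits `0,0,1,0,0,1,0,0,1,0,0,1`, and settings `k=2`, `min_support=2`, `min_conf=0.8`. This yields the single rule `{(0, 0): 1.0}`, so the agent is sent only after two consecutive unchanged slots. A real implementation following the thesis would mine non-contiguous large sequences rather than fixed windows.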

Keywords (Chinese)

★ Web page update
★ Data mining
★ Pattern
★ Web mining
★ Association rules

Keywords (English)

★ web page update
★ data mining
★ pattern discovery
★ WWW
★ web mining

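Table of Contents

Chapter 1 Introduction
Section 1 Research Motivation
Section 2 Research Objectives
Section 3 Thesis Structure
Chapter 2 Literature Review
Section 1 Overview of Data Mining
Section 2 Association Rules for Sequential Structures
Section 3 Web Mining
Section 4 Prediction
Chapter 3 Algorithm
Section 1 Data Structures
Section 2 Problem Description
Section 3 Generating Binary Large Sequences
Section 4 Patterns for Prediction
Chapter 4 Complete Processing Flow
Section 1 Maintaining the Web Page Database
Section 2 Predicting Web Page Updates
Section 3 Updating the Prediction Rules (Incremental updated pattern)
Section 4 Research Limitations
Chapter 5 Experiments
Section 1 Data Sources
Section 2 Experimental Results
Chapter 6 Conclusions and Suggestions
Section 1 Results and Contributions
Section 2 Future Research Directions
References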

References

[1] A. Z. Broder, S. C. Glassman, M. S. Manasse, and G. Zweig, "Syntactic clustering of the Web," in Proc. 6th International World Wide Web Conference, 1997.
[2] A. Caglayan and C. Harrison, Agent Sourcebook, John Wiley & Sons, Canada, 1997.
[4] C. M. Bowman, P. B. Danzig, D. Hardy, U. Manber, and M. F. Schwartz, "The Harvest information discovery and access system," in Proc. 2nd International World Wide Web Conference, 1994.
[5] C.S. Li, P.S. Yu, and V. Castelli, "HierarchyScan: A Hierarchical Similarity Search Algorithm for Databases of Long Sequences," Proc. 12th Int’l Conf. Data Eng., Feb. 1996.
[6] D. Konopnicki and O. Shmueli, "W3QS: A query system for the World Wide Web," in Proc. 21st VLDB Conference, Zurich, 1995, pp. 54-65.
[8] Software Inc. Webtrends. http://www.webtrends.com,1995
[10] J. R. Quinlan, "Induction of Decision Trees", Machine Learning, vol. 1, pp. 81-106, 1986.
[11] J. S. Park, M.-S. Chen, and P. S. Yu, "An effective hash-based algorithm for mining association rules," in Proc. ACM SIGMOD, 1995, pp. 175-186.
[13] K. A. Oostendorp, W. F. Punch, and R. W. Wiggins, "A tool for individualizing the Web," in Proc. 2nd International World Wide Web Conference, 1994.
[16] M.-S. Chen, J.S. Park, and P.S. Yu, "Efficient Data Mining for Path Traversal Patterns", IEEE Transactions on Knowledge and Data Engineering , vol. 10 , no.2, pp. 209-221, 1998.
[17] M.-S. Chen, J. Han, and P. S. Yu, "Data mining: An overview from a database perspective," IEEE Transactions on Knowledge and Data Engineering, 1996.
[19] net.Genesis, net.Analysis Desktop, http://www.netgen.com, 1996.
[20] Open Market Inc., Open Market Web Reporter, http://www.openmarket.com, 1996.
[22] P. Atzeni, G. Mecca, and P. Merialdo, "Semistructured and structured data in the Web: Going back and forth," in Proc. Workshop on the Management of Semistructured Data, 1997.
[23] P.-Y. Hsu, "WebLattice: Modeling Web documents with lattices," Business Administration Department, National Central University, Taiwan, Aug. 7, 1998.
[24] R. Agrawal and R. Srikant, "Fast algorithms for mining association rules," in Proc. 20th VLDB Conference, Santiago, Chile, 1994.
[25] R. Agrawal and R. Srikant, "Mining Sequential Patterns," Proc. of the Int'l Conference on Data Engineering (ICDE), Taipei, Taiwan, March 1995.
[28] R. Cooley, B. Mobasher, and J. Srivastava, "Web mining: Information and pattern discovery on the World Wide Web," IEEE, pp. 558-567, 1997.
[29] R. Cooley, B. Mobasher, and J. Srivastava, "Grouping Web page references into transactions for mining World Wide Web browsing patterns," IEEE, pp. 2-9, 1997.
[35] W. B. Frakes and R. Baeza-Yates. "Information Retrieval Data Structures and Algorithms." Prentice Hall, Englewood Cliffs,NJ,1992.
[36] Y. Aumann, O. Etzioni, R. Feldman, M. Perkowitz, and T. Shmiel, "Predicting event sequences: Data mining for prefetching Web-pages," in Proc. KDD'98.
[37] Y. S. Maarek and I. Z. Ben Shaul, "Automatically organizing bookmarks per content," in Proc. 5th International World Wide Web Conference, 1996.
[38] 陳仕昇, 陳彥良, and 許秉瑜, "A study on mining Web browsing rules with repeatable sequences" (以可重複序列挖掘網路瀏覽規則之研究), 資管評論, no. 9, 1999 (ROC year 88).

Advisor

許秉瑜(Ping-yu Hsu)

Approval Date

2000-6-22