博碩士論文 89522051 詳細資訊




以作者查詢圖書館館藏 以作者查詢臺灣博碩士 以作者查詢全國書目 勘誤回報 、線上人數:25 、訪客IP:3.22.70.9
姓名 楊士賢(Shi-Hsan Yang)  查詢紙本館藏   畢業系所 資訊工程學系
論文名稱 遞增資料關聯式規則探勘之改進
(Extending SWF for Incremental Association Mining by Incorporating Previously Discovered Information)
相關論文
★ 行程邀約郵件的辨識與不規則時間擷取之研究★ NCUFree校園無線網路平台設計及應用服務開發
★ 網際網路半結構性資料擷取系統之設計與實作★ 非簡單瀏覽路徑之探勘與應用
★ 應用卡方獨立性檢定於關連式分類問題★ 中文資料擷取系統之設計與研究
★ 非數值型資料視覺化與兼具主客觀的分群★ 關聯性字組在文件摘要上的探討
★ 淨化網頁:網頁區塊化以及資料區域擷取★ 問題答覆系統使用語句分類排序方式之設計與研究
★ 時序資料庫中緊密頻繁連續事件型樣之有效探勘★ 星狀座標之軸排列於群聚視覺化之應用
★ 由瀏覽歷程自動產生網頁抓取程式之研究★ 動態網頁之樣版與資料分析研究
★ 同性質網頁資料整合之自動化研究★ 時序性資料庫中未知週期之非同步週期性樣板的探勘
檔案 [Endnote RIS 格式]    [Bibtex 格式]    [相關文章]   [文章引用]   [完整記錄]   [館藏目錄]   [檢視]  [下載]
  1. 本電子論文使用權限為同意立即開放。
  2. 已達開放權限電子全文僅授權使用者為學術研究之目的,進行個人非營利性質之檢索、閱讀、列印。
  3. 請遵守中華民國著作權法之相關規定,切勿任意重製、散佈、改作、轉貼、播送,以免觸法。

摘要(中) 資料探勘在實際的應用上,已經從傳統的針對靜態的資料庫做探勘,演變成針對動態的資料庫做探勘,關聯規則的遞增探勘是其中較早為大家所重視的課題。近期對於關聯式法則遞增探勘提出的演算法有 FUP2、MAAP、PELICAN、SWF等,其中 SWF 在效能上優於其他同型的演算法。而在本篇論文中,我們提出了二個改進 SWF 的演算法-FI_SWF和CI_SWF,我們藉著儲存前一次探勘的頻繁項目集和支持度,對於目前探勘,我們只需要掃描資料庫變動的部分,即可得儲存的項目集的新支持度,不僅降低了在 SWF 中最後一次掃瞄資料庫的時間,也加速候選項目集的產生。在實驗中證明,改良後的 SWF 演算法確實能加快執行時間。雖然我們的演算法須要較多的硬體空間來儲存前一次的頻繁項目集或是侯選項目集,但是在最大記憶體的使用上是相當於SWF演算法。在實際的應用上,當資料探勘變成是一個重複而頻繁的工作時,執行時間更形重要,利用本篇論文提出的演算法來做資料探勘,是一個有效並簡單的好方法。
摘要(英) Incremental mining of association rules from dynamic databases refers to the maintenance
and utilization of the knowledge discovered in the previous mining operations.Sliding-
window-filtering (SWF)is a technique proposed to filtering false candidate 2-itemsets by
segmenting a transaction database into several partitions.SWF computes a set of candidate
2-itemsets that is close to frequent 2-itemsets.Therefore,it is possible to generate several candidate k -itemsets for one database scan.Such a database scan reduction technique greatly increase the performance for frequent itemsets discovery.In this paper,we extend SWF by incorporating previously discovered information and propose two algorithms to boost the
performance for incremental mining.The first algorithm FI SWF (SWF with Frequent
Itemset)reuse the frequent itemsets (and the counts)of previous mining task as FUP2 to
reduce the number of new candidate itemsets that have to be checked.The second algorithm
CI SWF (SWF with Candidate Itemset)reuse the candidate itemsets (and the counts)from the previously mining task.Experimental studies are performed to evaluate performance of the new algorithms.The study shows that the new incremental algorithm is signi ficantly faster than SWF.More importantly,the need for more disk space to store the previously discovered knowledge does not increase the maximum memory required during the execution time.
關鍵字(中) ★ 關聯式規則
★ 資料探勘
關鍵字(英) ★ Data Mining
★ Association Rules
論文目次 中文部分:
第一章 導論...............01
第二章 知識背景...............05
2.1準備工作...............07
2.2 SWF 演算法...............08
第三章 演算法...............08
3.1範例...............08
3.1.1 Preprocessing Procedure 的範例...............11
3.1.2 Incremental Procedure 的範例...............13
3.2虛擬碼...............15
3.2.1 Preprocessing Procedure 的虛擬碼...............15
3.2.2 Incremental Procedure 的虛擬碼...............17
第四章 實驗...............19
4.1資料產生方法...............19
4.2執行速度...............19
4.3記憶體使用...............21
4.4 I/O Cost...............22
4.5 處理大型資料的能力...............23
4.6發生頻頸時的效能頻比...............24
第五章 比較...............26
第六章 結論...............28
第七章 參考文獻...............29
英文部分:
1 Introduction ...............1
2 PRELIMINARIES ...............4
2.1 TheSWFalgorithm...............................5
3 Algorithm ...............7
3.1 Anexample....................................7
3.1.1 Preprocessingprocedure .........................7
3.1.2 Incrementalprocedure ..........................9
3.1.3 Extending the Incremental Procedure ..................9
3.2 SWFwithKnownItemsets............................11
3.2.1 Preprocessing Procedure of KI SWF ..................13
3.2.2 Incremental Procedure of KI SWF ...................13
4 Experiments............... 16
4.1 Generationofsyntheticworkload ........................16
4.2 Running Time ...................................17
4.3 MemoryRequiredandI/Ocost .........................18
4.4 ScaleupPerformance...............................19
4.5 BottleneckDiscussion...............................20
5 Comparison............... 23
6 Conclusion............... 25
參考文獻 [1] R. Agarwal, C. Aggarwal, and V.V.V. Prasad. A Tree Projection Algorithm for Generation of Frequent Itemsets. Jornal of Parallel and Distributed Computing (Special Issue on High Performance Data Mining), 2000.
[2] R. Agrawal, T. Imielinski, and A. Swami. Mining Association Rules between Sets of Items in Large Databases. Proc.of ACMSIGMOD, pages 207—216, May 1993.
[3] R. Agrawal and R. Srikant. Fast Algorithms for Mining Association Rules in Large Databases. Proc. Of the 20th International Conference on Very Large Data Bases, pages 478—499, September 1994.
[4] N.F. Ayan, A.U. Tansel, and E. Arkun. An Ecient Algorithm to Update Large Itemsets with Early Pruning. Proc. of 1999 Int. Conf. on Knowledge Discovery and Data Mining, 1999.
[5] S. Brin, R. Motwani, J. D. Ullman, and S. Tsur. Dynamic Itemset Counting and Implication Rules for Market Basket Data. ACM SIGMOD Record, 26(2):255—264, May 1997.
[6] M.-S. Chen, J. Han, and P.S. Yu. Data Mining: An Overview from Database Perspective. IEEE Transactions on Knowledge and Data Engineering, 8(6):866—883, December 1996.
[7] M.-S. Chen, J.-S. Park, and P. S. Yu. Efficient Data Mining for Path Traversal Patterns. IEEE Transactions on Knowledge and Data Engineering, 10(2):209—221, April 1998.
[8] D. Cheung, J. Han, V. Ng, and C.Y. Wong. Maintenance of Discovered Association Rules in Large Databases: An Incremental Updating Technique. Proc. of 1996 Int’l Conf. on Data Engineering, pages 106—114, February 1996.
[9] D. Cheung, S.D. Lee, and B. Kao. A General Incremental Technique for Updating Discovered Association Rules. Proc. International Conference On Database Systems For Advanced Applications, April 1997.
[10] J. Han, L. V. S. Lakshmanan, and R. T. Ng. Constraint-Based, Multidimensional Data Mining. COMPUTER (special issues on Data Mining), pages 46—50, 1999.
[11] J. Han and J. Pei. Mining Frequent Patterns by Pattern-Growth: Methodology and Implications. ACM SIGKDD Explorations (Special Issue on Scaleble Data Mining Algorithms), December 2000.
[12] J. Han, J. Pei, B. Mortazavi-Asl, Q. Chen, U. Dayal, and M.-C. Hsu. FreeSpan: Frequent pattern-projected sequential pattern mining. Proc.of 2000 Int.Conf.on Knowledge Discovery and Data Mining, pages 355—359, August 2000.
[13] J. Hipp, U. Güntzer, and G. Nakhaeizadeh. Algorithms for association rule mining —a general survey and comparison. SIGKDD Explorations, 2(1):58—64, July 2000.
[14] L. V. S. Lakshmanan, R. Ng, J. Han, and A. Pang. Optimization of Constrained Frequent Set Queries with 2-Variable Constraints. Proc. of 1999 ACM-SIGMOD Conf. on Management of Data, pages 157—168, June 1999.
[15] C.-H. Lee, C.-R. Lin and M.-S. Chen. Sliding-Window Filtering: An Efficient Algorithm for Incremental Mining. Proc. of the ACM 10th Intern’’l Conf. on Information and Knowledge Management (CIKM-01), November 5-10, 2001.
[16] J.-L. Lin and M.H. Dunham. Mining Association Rules: Anti-Skew Algorithms. Proc.of 1998 Int’l Conf. on Data Engineering, pages 486—493, 1998.
[17] J.-S. Park, M.-S.Chen, and P.S.Yu. Using a Hash-Based Method with Transaction Trimming for Mining Association Rules. IEEE Transactions on Knowledge and Data Engineering, 9(5):8 3—825, October 1997.
[18] J. Pei and J. Han. Can We Push More Constraints into Frequent Pattern Mining? Proc. of 2000 Int. Conf. on Knowledge Discovery and Data Mining, August 2000.
[19] J. Pei, J. Han, and L.V.S. Lakshmanan. Mining Frequent Itemsets with Convertible Constraints. Proc. of the Intl. Conf. on Data Engineering, 2001.
[20] A. Savasere, E. Omiecinski, and S. Navathe. An Efficient Algorithm for Mining Association Rules in Large Databases. Proc. of the 21th International Conference on Very Large Data Bases, pages 432—444, September 1995.
[21] R. Srikant and R. Agrawal. Mining Generalized Association Rules. Proc. of the 21th International Conference on Very Large Data Bases, pages 407—49, September 1995.
[22] S. Thomas, S. Bodagala, K. Alsabti, and S. Ranka. An Efficient Algorithm for the Incremental Updating of Association Rules in Large Databases. Proc. of 1997 Intl. Conf. on Knowledge Discovery and Data Mining, 1997.
[23] H. Toivonen. Sampling Large Databases for Association Rules. Proc. of the 22th VLDB Conference,pages 34—1 45, September 1996.
[24] A.K.H. Tung, J. Han, L.V.S. Lakshmanan, and R. T. Ng. Constraint-Based Clustering in Large Databases. Proc. of 2001 Int. Conf. on Database Theory, January 2000.
[25] A. Veloso, B. Possas, W. Meira Jr., M. B. de Carvalho. Knowledge Management in Association Rule Mining, Integrating Data Mining and Knowledge Management, ICDM’’01: The 2001 IEEE International Conference on Data Mining, California, USA. [26] K. Wang,Y. He and J. Han. Mining Frequent Itemsets Using Support Constraints. Proc.of 2000 Int. Conf. on Very Large Data Bases, September 2000.
[27] M. J. Zaki, S. Parthasarathy and W. Li. New Algorithm for Fast Discovery of Association Rules. In Proc. Of the 3rd Intl. Con. On Knowledge Discovery and Data Mining, 1997.
[28] Zhou Zequn and C.I. Ezeife. A Low-Scan Incremental Association Rule Maintenance Method, Proceedings of the fourteenth Canadian Conference on Artificial Intelligence, AI 2001, holding June 7 to June 9, 2001, Ottawa, Canada.
指導教授 張嘉惠(Chia-Hui Chang) 審核日期 2002-7-4
推文 facebook   plurk   twitter   funp   google   live   udn   HD   myshare   reddit   netvibes   friend   youpush   delicious   baidu   
網路書籤 Google bookmarks   del.icio.us   hemidemi   myshare   

若有論文相關問題,請聯絡國立中央大學圖書館推廣服務組 TEL:(03)422-7151轉57407,或E-mail聯絡  - 隱私權政策聲明