An Efficient Algorithm for Discovery and Maintenance of Frequent Patterns with Multiple Minimum Supports

Author

Ya-Han Hu (胡雅涵)

Department

Department of Information Management

Thesis title

An Efficient Algorithm for Discovery and Maintenance of Frequent Patterns with Multiple Minimum Supports
(在多重支持度下有效率的挖掘與維護關聯規則)

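This electronic thesis is authorized for immediate open access.
Once open, the electronic full text is licensed to users solely for personal, non-profit searching, reading, and printing for the purpose of academic research.
Please comply with the Copyright Act of the Republic of China. Do not reproduce, distribute, adapt, repost, or broadcast the work without authorization.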

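Abstract (translated from Chinese)

Mining association rules with multiple thresholds is an important data mining approach that matches real-world conditions. Unlike traditional association rule mining with a single threshold, it lets users set a different threshold for each item, reflecting the fact that different items are purchased at different frequencies. Liu previously proposed the MSapriori algorithm to mine frequent itemsets under multiple thresholds, but its Apriori-based approach makes it inefficient. In this thesis, we propose an FP-tree-like structure and method, called the MIS-tree, for mining frequent itemsets under multiple thresholds. Experimental results show that it is considerably more efficient than the MSapriori algorithm. In addition, because in practice users must adjust the per-item thresholds many times before they find the frequent itemsets they need, we also propose a method for maintaining the MIS-tree: after the thresholds are adjusted, the existing MIS-tree is updated directly without rescanning the database, which saves a great deal of execution time.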

Abstract (English)

Mining association rules with multiple minimum supports is an important generalization of the association rule mining problem, which was recently proposed by Liu et al. Instead of setting a single minimum support threshold for all items, they allow users to specify multiple minimum supports to reflect the nature of the items and their varied frequencies in the database. In Liu’s paper, an Apriori-based algorithm, named MSapriori, is developed to mine all frequent item sets. In this paper, we study the same problem but with two additional improvements. First, we propose an FP-tree-like structure, the MIS-tree, to store the crucial information about frequent patterns. Accordingly, an efficient MIS-tree-based algorithm, called the CFP-growth algorithm, is developed for mining all frequent item sets. We evaluate the performance of the algorithm using both synthetic and real datasets, and the results show that the CFP-growth algorithm is much more efficient and scalable than the MSapriori algorithm. Second, since each item can have its own minimum support, it is very difficult for users to set appropriate thresholds for all items at once. In practice, users need to tune the items’ supports and run the mining algorithm repeatedly until a satisfactory result is reached. To speed up this time-consuming tuning process, we propose an efficient algorithm that maintains the MIS-tree structure without rescanning the database. Experiments on both synthetic and real-life datasets show that our MIS-tree maintenance algorithm achieves dramatic savings in computation when tuning supports.
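
To make the problem setting concrete, the following is a minimal brute-force sketch in Python. It is not the MIS-tree or CFP-growth algorithm from the thesis, and it is not MSapriori. It assumes the formulation of Liu et al., in which each item has its own minimum item support (MIS) and an itemset counts as frequent when its support reaches the lowest MIS among its items. The function name `frequent_itemsets`, the `max_size` parameter, and the toy data are hypothetical and chosen only for illustration.

```python
from itertools import combinations


def frequent_itemsets(transactions, mis, max_size=3):
    """Enumerate itemsets that are frequent under per-item minimum supports.

    Assumption (following Liu et al.): the threshold for an itemset is the
    smallest MIS value among the items it contains.
    """
    n = len(transactions)
    items = sorted({item for t in transactions for item in t})
    result = {}
    for k in range(1, max_size + 1):
        for candidate in combinations(items, k):
            # Count the transactions that contain every item of the candidate.
            count = sum(1 for t in transactions if set(candidate) <= t)
            support = count / n
            threshold = min(mis[item] for item in candidate)
            if support >= threshold:
                result[candidate] = support
    return result


# Hypothetical toy database: "caviar" is bought rarely, so it gets a low MIS.
transactions = [
    {"bread", "milk"},
    {"bread", "milk", "caviar"},
    {"bread"},
    {"milk"},
    {"bread", "milk"},
]
mis = {"bread": 0.5, "milk": 0.5, "caviar": 0.1}

for itemset, support in frequent_itemsets(transactions, mis).items():
    print(itemset, support)
```

With a single threshold of 0.5 for every item, every itemset containing "caviar" would be discarded. With the per-item thresholds above, those itemsets are kept because their threshold drops to 0.1. The exhaustive enumeration is exponential in the number of items, which is the cost that tree-based approaches such as the one described in the abstract aim to avoid.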

Keywords (translated from Chinese)

★ Frequent itemsets
★ Association rule maintenance
★ Multiple thresholds
★ Multiple minimum supports
★ FP-tree

Keywords (English)

★ Incremental mechanism
★ FP-tree
★ Multiple Minimum Supports
★ Association Rules
★ Data Mining

Table of Contents
List of Illustrations
List of Tables
1. Introduction
2. Related work
2.1 MSApriori algorithm
Definition 2.1.1
2.2 FP-tree and FP-growth algorithm
3. Multiple Item Support Tree (MIS-tree): design and construction
Lemma 3.1
Definition 3.1 (MIS-tree)
4. Mining Frequent Patterns using MIS-tree
Definition 4.1 Conditional pattern
Definition 4.2 Conditional frequent pattern
Property 4.1 (Node-link property)
Property 4.2 (Prefix path property)
Lemma 4.1 Fragment growth
Corollary 4.1 Pattern growth
Corollary 4.2 Pattern growth
5. Support tuning
6. Experimental evaluation
6.1 Experimental evaluation on four algorithms
6.2 Experiments for support tuning
7. Conclusion
8. References

References

[1] Agrawal, R. and Srikant, R. “Fast algorithms for mining association rules.” VLDB-94, 1994.
[2] Bing Liu, Wynne Hsu, Yiming Ma. Mining Association Rules with Multiple Minimum Supports. Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery & Data Mining (KDD-99, poster), August 15-18, 1999, San Diego, CA, USA.
[3] C. Aggarwal and P. Yu. Online generation of association rules. In Proc. of 14th ICDE, 1998.
[4] D. W. Cheung, V. T. Ng, and B. W. Tam, “Maintenance of discovered association rules in large databases: An incremental update technique”, in Proceedings of the 12th IEEE International Conference on Data Engineering (ICDE-96), New Orleans, Louisiana, U.S.A., March 1996, pp. 106-114.
[5] Han, J. and Fu, Y. “Discovery of multiple-level association rules from large databases.” VLDB-95.
[6] J. Han, J. Pei, Y. Yin, “Mining frequent patterns without candidate generation”, Proc. 2000 ACM-SIGMOD Int. Conf. on Management of Data, Dallas, TX, 2000.
[7] Kohavi, R., Brodley, C.E., Frasca, B., Mason, L., and Zheng, Z. KDD-cup 2000 Organizers’ Report: Peeling the Onion, SIGKDD Explorations 2(2), 2000, 86-93.
[8] K. K. Loo, Chi Lap Yip, Ben Kao, and David Cheung. A lattice-based approach for I/O efficient association rule mining. Information Systems, Volume 27, Issue 1, Pages 41-74, March 2002.
[9] Lee, W., Stolfo, S. J., and Mok, K. W. “Mining audit data to build intrusion detection models.” KDD-98.
[10] Mannila, H. “Database methods for data mining.” KDD-98 tutorial, 1998.
[11] Ming-Cheng Tseng, Wen-Yang Lin. Mining Generalized Association Rules with Multiple Minimum Supports. DaWaK 2001: 11-20.
[12] M. Klemettinen, H. Mannila, P. Ronkainen, H. Toivonen, and A. I. Verkamo. “Finding interesting rules from large sets of discovered association rules”. In CIKM’94, pp. 401-408.
[13] R. Feldman, Y. Aumann, A. Amir, and H. Mannila. Efficient algorithm for discovering frequent sets in incremental databases. In 2nd SIGKDD workshop DMKD, 1997.
[14] S. Thomas, S. Bodagala, K. Alsabti, and S. Ranka. An efficient algorithm for the incremental updation of association rules in large database. In Proc. KDD, 1997.

Advisor

Yen-Liang Chen (陳彥良)

Approval date

2003-6-23