交易型資料庫之跨交易關聯規則探勘之研究

以作者查詢圖書館館藏

、以作者查詢臺灣博碩士

、以作者查詢全國書目

、勘誤回報

、線上人數：143

、訪客IP：52.14.209.115

姓名

黃國瑜(Kuo-Yu Huang) 查詢紙本館藏

畢業系所

資訊工程學系

論文名稱

交易型資料庫之跨交易關聯規則探勘之研究
(Mining Inter-Transaction Association Rules in Transactional Databases)

相關論文

★ 行程邀約郵件的辨識與不規則時間擷取之研究	★ NCUFree校園無線網路平台設計及應用服務開發
★ 網際網路半結構性資料擷取系統之設計與實作	★ 非簡單瀏覽路徑之探勘與應用
★ 遞增資料關聯式規則探勘之改進	★ 應用卡方獨立性檢定於關連式分類問題
★ 中文資料擷取系統之設計與研究	★ 非數值型資料視覺化與兼具主客觀的分群
★ 關聯性字組在文件摘要上的探討	★ 淨化網頁：網頁區塊化以及資料區域擷取
★ 問題答覆系統使用語句分類排序方式之設計與研究	★ 時序資料庫中緊密頻繁連續事件型樣之有效探勘
★ 星狀座標之軸排列於群聚視覺化之應用	★ 由瀏覽歷程自動產生網頁抓取程式之研究
★ 動態網頁之樣版與資料分析研究	★ 同性質網頁資料整合之自動化研究

檔案

[Endnote RIS 格式]

[Bibtex 格式]

[相關文章]

[文章引用]

[完整記錄]

[館藏目錄]

[檢視]

[下載]

本電子論文使用權限為同意立即開放。
已達開放權限電子全文僅授權使用者為學術研究之目的，進行個人非營利性質之檢索、閱讀、列印。
請遵守中華民國著作權法之相關規定，切勿任意重製、散佈、改作、轉貼、播送，以免觸法。

摘要(中)

在本論文中，我們著重於如何設計一個有效率的演算法在跨交易關聯規則上，例如：週期型樣（Periodic Patterns）、頻繁事件序（Frequent Episodes）、頻繁連續事件（Frequent Continuities）及序列型樣（Sequential Pattern）。首先，我們提出了一個三個步驟的FITS模組用於跨交易關聯規則的探勘上。並且，我們結合了垂直和水平資料格式的優點來改善探勘的效能，我們稱之為「雙格式表示法」。此外，根據我們觀察，我們發現「若一個交易內型樣不為緊密的型樣，亦不可能為一個跨交易的緊密的型樣」。因此，我們運用這個策略於FITS模組，在第一步驟中先探勘緊密的交易內頻繁型樣，然後再進行緊密跨交易的頻繁型樣之探勘工作。我們稱此概念為「雙壓縮策略」。從實驗中，我們發現這策略結合FITS模組在跨交易的緊密型樣探勘上更可減少型樣列舉的個數。此外，FITS模組只要經由些徵的修改即可用於其他的跨交易型樣探勘上。在一系列的實驗中，我們證明了我們所提出的模組無論在通用性上及效能上皆優於先前的研究。雖然在記憶體用量上，我們可能會比先前的方法來的多。但是，只要透過適當的資料切割方式，不僅可減少記憶體的用量，同時在效能上仍然優於先前的研究。
雖然FITS模組可運用於週期型樣的探勘上。但由於週期型樣有特定的週期限制，因此FITS模組運用於週期型樣效能上並不理想。基於這個理由，我們為週期型樣設計一個嶄新的SMCA模組。這模組包含了四個子模組，分別為SPMiner（單一週期型樣探勘）、MPMiner（多事件週期型樣探勘）、CPMiner（複雜週期型樣探勘）及APMiner（非同步週期型樣探勘）。SPMiner主要的概念是利用雜湊表快速的計算出有效週期片斷的資訊。而其餘的三個子模組則是利用一個「以週期片斷為基礎」的組合方式來進行型樣的列舉過程。在相關的時間及空間複雜度分析中，皆顯示我們的SMCA模組在週期探勘上優於先前的方法。

摘要(英)

In this dissertation, we focus on how to devise an efficient and effective algorithm for discovering inter-transaction associations such as, periodic patterns, frequent continuities, frequent episodes and sequential pattern. Firstly, we propose a 3-phase FITS model in inter-transaction association mining. We adopt both horizontal and vertical formats to increase the mining efficiency. Furthermore, we focus on the application of FITS to closed pattern mining to reduce the number of patterns to be enumerated. The insight is “If an intra-transaction pattern is not a closed pattern, it will not be a closed frequent inter-transaction pattern”. The bi-format and bi-phase reduction are applied to overcome the problem of the duplicate item extensions especially for closed pattern mining. We have applied the FITS model to all inter-transaction mining tasks with a little modification.
Although the FITS model can be used for periodic pattern mining, it is not efficient enough since the constraints on periodicy are not fully utilized. Therefore, we propose a more general model, SMCA, to mine asynchronous periodic patterns from a complex sequence and correct some problem of the previous works. A 4-phase algorithm, including SPMiner, MPMiner, CPMiner and APMiner, is devised to discover periodic patterns from a transactional database presented in vertical format. The essential idea of SPMiner is to trace the possible segments for period p by a hash table. Besides, to avoid additional scans over the transactional database, we propose a segment-based combination to reduce redundant generation and testing. The experiments have demonstrated good performance of the proposed model on several inter-transaction patterns. Although the efficiency improvement is based on the requirement of additional memory cost, the memory cost can be further reduced by disk-based or partition-based approaches, which in turn also prove to be better than state-of-the-art algorithms. In summary, the proposed model can be orders of magnitude faster than previous works with a modest memory cost.

關鍵字(中)

★ 關聯規則
★ 交易型資料庫
★ 時序型樣
★ 資料探勘

關鍵字(英)

★ Pattern Mining
★ Temporal Pattern
★ Transactional Databases
★ Association Rules
★ Data Mining

論文目次

Table of Contents i
List of Tables iii
List of Figures iv
Abstract vi
1 Introduction 1
1.1 Motivation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
1.2 OurModels . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
1.3 Organization of this Dissertation . . . . . . . . . . . . . . . . . . . . 4
2 Problem Definition 6
2.1 Frequent Continuities . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
2.2 Frequent Episodes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10
2.3 Periodic Patterns . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11
2.4 Close Sequential Patterns . . . . . . . . . . . . . . . . . . . . . . . . 13
2.5 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14
3 Related Work 15
3.1 Mining Intra-transaction Associations . . . . . . . . . . . . . . . . . . 15
3.1.1 Breadth First Enumeration . . . . . . . . . . . . . . . . . . . 16
3.1.2 Depth First Enumeration. . . . . . . . . . . . . . . . . . . . . 18
3.1.3 Comparison . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19
3.2 Mining Inter-transaction Associations . . . . . . . . . . . . . . . . . . 20
3.2.1 Sequential Patterns . . . . . . . . . . . . . . . . . . . . . . . . 20
3.2.2 Periodic Patterns . . . . . . . . . . . . . . . . . . . . . . . . . 22
3.2.3 Frequent Episodes . . . . . . . . . . . . . . . . . . . . . . . . 24
3.2.4 Frequent Continuities . . . . . . . . . . . . . . . . . . . . . . . 24
3.2.5 Comparison . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25
4 FITSModel 27
4.1 Themain idea of the FITSModel . . . . . . . . . . . . . . . . . . . . 28
4.2 Extended Algorithms . . . . . . . . . . . . . . . . . . . . . . . . . . . 30
4.2.1 Frequent Continuities . . . . . . . . . . . . . . . . . . . . . . . 30
4.2.2 Frequent Episodes . . . . . . . . . . . . . . . . . . . . . . . . 41
4.2.3 Closed Sequential Patterns . . . . . . . . . . . . . . . . . . . . 42
4.3 Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 51
4.4 Performance Study . . . . . . . . . . . . . . . . . . . . . . . . . . . . 52
4.4.1 Frequent Continuities . . . . . . . . . . . . . . . . . . . . . . . 53
4.4.2 Frequent Episodes . . . . . . . . . . . . . . . . . . . . . . . . 57
4.4.3 Closed Sequential Patterns . . . . . . . . . . . . . . . . . . . . 61
4.5 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 67
5 SMCAModel 68
5.1 SMCAModel . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 71
5.1.1 SPMiner: Segment Mining for Single Event Pattern . . . . . . 72
5.1.2 MPMiner: Segment Mining for Multiple Event Pattern . . . . 76
5.1.3 CPMiner: Segment Mining for Complex Pattern . . . . . . . . 78
5.1.4 APMiner: Sequence Mining for Asynchronous Pattern . . . . . 81
5.2 Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 82
5.2.1 Comparison . . . . . . . . . . . . . . . . . . . . . . . . . . . . 82
5.2.2 Extra long/large sequence . . . . . . . . . . . . . . . . . . . . 84
5.3 Performance Studies . . . . . . . . . . . . . . . . . . . . . . . . . . . 84
5.3.1 Synthetic Data . . . . . . . . . . . . . . . . . . . . . . . . . . 85
5.3.2 Real Data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 88
5.4 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 89
6 Contribution and Future Work 90
6.1 Contribution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 90
6.2 FutureWork . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 91
Bibliography 93

參考文獻

[1] R. Agarwal, C. Aggarwal, and V. Parsad, A tree projection algorithm for generation of frequent itemsets, Journal of Parallel and Distributed Computing 61 (2001), no. 3, 350–371.
[2] R. Agarwal, C. Aggarwal, and V. Prasad, Depth first generation of long patterns, Proceedings of the sixth ACM SIGKDD international conference on Knowledge Discovery and Data Mining (SIGKDD’00), 2000.
[3] R. Agrawal and R. Srikant, Fast algorithms for mining association rules, Proceedings of the 20th International Conference on Very Large Data Bases (VLDB’94), 1994, pp. 487–499.
[4] , Mining sequential patterns, Proceedings of the 11th International Conference on Data Engineering (ICDE’95), 1995, pp. 3–14.
[5] J. Ayres, J. Flannick, J. Gehrke, and T. Yiu, Sequential pattern mining using a bitmap representation, Proceedings of the eighth ACM SIGKDD international conference on Knowledge discovery and data mining (KDD’02), 2002, pp. 429–435.
[6] R. Bayardo and R. Agrawal, Mining the most interesting rules, Proceedings of the fifth ACM SIGKDD international conference on Knowledge discovery and
data mining, 1999, pp. 145–154.
[7] C. Berberidis, I. Vlahavas, W. Aref, M. Atallah, and A. Elmagarmid, The discovery of weak periodicities in large time series, Proceedings of the European Conference on Principles and Practice of Knowledge Discovery in Databases (PKDD’02), 2002.
[8] D. Burdick, M. Calimlim, and J. Gehrke, Mafia: A maximal frequent itemset
algorithm for transactional databases, Proceedings of the 17th International
Conference on Data Engineering (ICDE’01), 2001.
[9] D. Chakrabarti, Y. Zhan, and C. Faloutsos, R-mat: A recursive model for graph mining, Proceedings of the Fourth SIAM International Conference on Data Mining (SDM’04), 2004.
[10] F. Coenen, G. Goulbourne, and P. Leng, Tree structures for mining association rules., Journal of Data Mining and Knowledge Discovery 8 (2004), no. 1, 25–51.
[11] K. Gouda and M.J. Zaki, Efficiently mining maximal frequent itemsets, Proceedings of the International Conference on Data Mining (ICDM’01), 2001.
[12] D.J. Hand, H. Mannila, and P. Smyth, Principles of data mining, MIT Press,
2001.
[13] J. Han, G. Dong, and Y. Yin, Efficient mining paritial periodic patterns in time series database, Proceedings of the 15th International Conference on Data Engineering (ICDE’99), 1999, pp. 106–115.
[14] J. Han and Y. Fu, Discovery of multiple-level association rules from large
databases, Proceedings of the 1995 International Conference on Very Large Data
Bases (VLDB’95), 1995, pp. 420–431.
[15] J. Han, W. Gong, and Y. Yin, Mining segment-wise periodic patterns in timerelated databases, Proceedings of the 4th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD’98), 1998, pp. 214–218.
[16] J. Han, J. Pei, Y. Yin, and R. Mao, Mining frequent patterns without candidate generation: A frequent-pattern tree approach, Data Mining and Knowledge Discovery: An International Journal (DMKD) 8 (2004), no. 1, 53–87.
[17] J. Han, J. Pei, and Y. Yin, Mining frequent patterns without candidate generation, Proceeding of the ACM SIGMOD International Conference on Management of Data (SIGMOD’00), 2000.
[18] J. Han and J. Pei, Mining frequent patterns by pattern-growth: Methodology
and implications, ACM SIGKDD Explorations (Special Issue on Scalable Data
Mining Algorithms) 2 (2000), no. 2, 14–20.
[19] J. Han, J. Wang, Y. Lu, and P. Tzvetkov, Mining top-k frequent closed patterns without minimum support, Proceedings of the 2002 IEEE International Conference on Data Mining (ICDM’02), 2002.
[20] J. Hipp, U. Guntzer, and G. Nakhaeizadeh, Algorithms for association rule mining - a general survey and comparison, SIGKDD Explorations 2 (2000), no. 2,
58–64.
[21] T. Horvth, T. Grtner, and S. Wrobel, Cyclic pattern kernels for predictive graph mining, Proceedings of the 2004 ACM SIGKDD international conference on Knowledge discovery and data mining (KDD’04), 2004, pp. 158–167.
[22] K.Y. Huang, C.H. Chang, and K.Z. Lin, Prowl: An efficient frequent continuity mining algorithm on event sequences, Proceedings of 6th International Conference on DataWarehousing and Knowledge Discovery (DaWak’04), 2004, pp. 351–360.
[23] J. Huan, W. Wang, J. Prins, and J. Yang, Spin: mining maximal frequent subgraphs from graph databases, Proceedings of the 2004 ACM SIGKDD international conference on Knowledge discovery and data mining (KDD’04), 2004,pp. 581–586.
[24] J. Huan, W. Wang, and J. Prins, Efficient mining of frequent subgraphs in the presence of isomorphism, Proceedings of the Third IEEE International Conference on Data Mining (ICDM’03), 2003, p. 549.
[25] A. Inokuchi and H. Kashima, Mining significant pairs of patterns from graph structures with class labels, Proceedings of the Third IEEE International Conference on Data Mining (ICDM’03), 2003, pp. 83–90.
[26] I. Jonassen, J.F. Collins, and D.G. Higgins, Finding flexible patterns in unaligned protein sequences, Protein Science 4 (1995), no. 8, 1587–1595.
[27] R.J. Bayardo Jr., Efficiently mining long patterns from databases., Proceedings of the international conference on Management of data (SIGMOD’98), 1998.
[28] H. Kashima and Y. Tsuboi, Kernel-based discriminative learning algorithms for labeling sequences, trees, and graphs, Proceedings of the Twenty-first international conference on Machine learning (ICML’04), 2004.
[29] M.V. Katti, R.S. Subbu, P.K. Ranjekar, and V.S. Gupta, Amino acid repeat patterns in protein sequences: Their diversity and structural-function implications, Protein Science 9 (2000), 1203–1209.
[30] R. Kohavi, C. Brodley, B. Frasca, L. Mason, and Z. Zheng, Kdd-cup 2000 organizers’s report: Peeling the onion, Proceedings of the SIGKDD Explorations 2 (2000), 86–98.
[31] B. Lan, B.C. Ooi, and K.L. Tan, Bbs: An efficient indexing structure for mining frequent patterns, Proceedings of the 18th International Conference on Data Engineering (ICDE’02), 2002, pp. 453–462.
[32] D.I Lin and Z.M. Kedem, Pincer-search: An efficient algorithm for discovering the maximum frequent set, IEEE Transactions on Knowledge and Data Engineering (TKDE) 14 (2002), no. 3, 553–566.
[33] H. Mannila, H. Toivonen, and A. I. Verkamo, Discovering frequent episodes in sequences, Proceedings of the First International Conference on Knowledge Discovery and Data Mining (KDD’95), 1995, pp. 210–215.
[34] , Discovering frequent episodes in event sequences, Data Mining and
Knowledge Discovery (DMKD) 1 (1997), no. 3, 259–289.
[35] H. Mannila and H. Toivonen, Discovering generalized episodes using minimal
occurrences, Proceedings of the Second International Conference on Knowledge
Discovery and Data Mining (KDD’96), 1996, pp. 146–151.
[36] S. Ma and J. Hellerstein, Mining partially periodic event patterns with unknown periods, Proceedings of the International Conference on Data Engineering (ICDE’01), 2001, pp. 205–214.
[37] A. Nanopoulos and Y. Manolopoulos, Mining patterns from graph traversals,
Journal of Data and Knowledge Engineering 37 (2001), no. 3, 243–266.
[38] B. Ozden, S. Ramaswamy, and A. Silberschatz, Cyclic association rules, Proceedings of the 14th International Conference on Data Engineering (ICDE’98), 1998, pp. 412–421.
[39] J.S. Park, M.S. Chen, and P.S. Yu, An effective hash based algorithm for mining association rules, Proceedings of the 1995 ACM SIGMOD International Conference on Management of Data (SIGMOD’95), 1995, pp. 175–186.
[40] N. Pasquier, Y. Bastide, R. Taouil, and L. Lakhal, Discovering frequent closed itemsets for association rules, Proceedings of 7th International Conference on Database Theory (ICDT’99), 1999.
[41] J. Pei, G. Dong, W. Zou, and J. Han, On computing condensed frequent pattern bases, Proceedings of International Conference on Data Mining (ICDM’02), 2002.
[42] J. Pei, J. Han, H. Lu, S. Nishio, S. Tang, and D. Yang, H-mine: Hyper-structure mining of frequent patterns in large databases, Proceedings of the IEEE International Conference on Data Mining (ICDM’01), 2001.
[43] J. Pei, J. Han, and R. Mao, Closet: An efficient algorithm for mining frequent closed itemsets, Proceedings of the ACM SIGMOD Int. Workshop Data Mining and Knowledge Discovery (SIGMOD’00), 2000.
[44] J. Pei, J. Han, B. Mortazavi-Asl, J. Wang, H. Pinto, Q. Chen, U. Dayal, and M. Hsu, Mining sequential patterns by pattern-growth: The prefixspan approach, IEEE Transaction on Knowledge Data Engineering 16 (2004), no. 11, 1424–1440.
[45] A. Savasere, E. Omiecinski, and S.B. Navathe, An efficient algorithm for mining association rules in large databases, Proceedings of the 21th International Conference on Very Large Data Bases (VLDB’95), 1995, pp. 432–444.
[46] D. Shasha, J.T.L. Wang, and S. Zhang, Unordered tree mining with applications to phylogeny., Proceedings of the 20th International Conference on Data Engineering (ICDE’04), 2004, pp. 708–719.
[47] R. Srikant and R. Agrawal, Mining sequential patterns: Generalizations and
performance improvements, Proceedings of the 5th International Conference on
Extending Database Technology (EDBT’96), 1996, pp. 3–17.
[48] H. Toivonen, Sampling large databases for association rules, In Proc. 1996 Int. Conf. Very Large Data Bases, 1996, pp. 134–145.
[49] A.K.H. Tung, H. Lu, J. Han, and L. Feng, Efficient mining of intertransaction association rules, IEEE Transactions on Knowledge and Data Engineering (TKDE) 15 (2003), no. 1, 43–56.
[50] N. Vanetik, E. Gudes, and S.E. Shimony, Computing frequent graph patterns from semistructured data, Proceedings of the 2002 IEEE International Conference on Data Mining (ICDM’02), 2002, p. 458.
[51] C. Wang, M. Hong, J. Pei, H. Zhou, W. Wang, and B. Shi, Efficient patterngrowth methods for frequent tree pattern mining., Proceedings of the 8th
Pacific-Asia Conference on Advances in Knowledge Discovery and Data Mining
(PAKDD’04), 2004, pp. 441–451.
[52] C.Wang,W.Wang, J. Pei, Y. Zhu, and B. Shi, Scalable mining of large disk-based graph databases, Proceedings of the 2004 ACM SIGKDD international conference on Knowledge discovery and data mining (KDD’04), 2004, pp. 316–325.
[53] J.Wang, J. Han, and J. Pei, Closet+: Searching for the best strategies for mining, Proceedings of the Ninth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (SIGKDD’03), 2003.
[54] J. Wang and J. Han, Bide: Efficient mining of frequent closed sequences., Proceedings of the 20th International Conference on Data Engineering (ICDE’04), 2004, pp. 79–90.
[55] K. Wang and L. Lakshmanan Y. Jiang, Mining unexpected rules by pushing user dynamics, Proceeding of the ninth ACM SIGKDD international conference on
Knowledge discovery and data mining (SIGKDD’03), 2003.
[56] W. Wang, J. Yang, and P.S. Yu, Mining patterns in long sequential data with noise, Proceedings of the ACM SIGKDD International Conference on Knowledge
Discovery and Data Mining (KDD’00), 2000, pp. 28–33.
[57] T. Washio and H. Motoda, State of the art of graph-based data mining, ACM
SIGKDD Explorations Newsletter 5 (2003), no. 1, 59–68.
[58] J. Yang, W. Wang, and P.S. Yu, Mining asynchronous periodic patterns in time series data, IEEE Transaction on Knowledge and Data Engineering (TKDE) 15
(2003), no. 3, 613–628.
[59] X. Yan and J. Han, Closegraph: mining closed frequent graph patterns, Proceedings of the ninth ACM SIGKDD international conference on Knowledge discovery and data mining (KDD’03), 2003, pp. 286–295.
[60] M.J. Zaki and K. Gouda, Fast vertical mining using diffsets, In Proceedings of the ninth ACM SIGKDD international conference on Knowledge discovery and data mining (SIGKDD’03), 2003.
[61] M.J. Zaki and C.J. Hsiao, Charm: An efficient algorithm for closed itemset
mining, Proceedings of the 2nd SIAM International Conference on Data Mining
(SDM’02), 2002.
[62] M.J. Zaki, Scalable algorithms for association mining, IEEE Transactions on Knowledge and Data Engineering (TKDE) 12 (2000), no. 3, 372–390.
[63] , Spade: An efficient algorithm for mining frequent sequences, Machine
Learning 42 (2001), no. 1/2, 31–60.
[64] , Efficiently mining frequent trees in a forest, Proceedings of the eighth
ACMSIGKDD international conference on Knowledge discovery and data mining
(KDD’02), 2002, pp. 71–80.
[65] B. Zhou, S.C. Hui, and A.C.M. Fong, Cs-mine: An efficient wap-tree mining
for web access patterns., Proceedings of the 6th Asia-Pacific Web Conference on
Advanced Web Technologies and Applications (APWeb’04), 2004, pp. 523–532.

指導教授

張嘉惠(Chia-Hui Chang)

審核日期

2006-1-12

推文