摘要(英) |
Mining association rules with multiple minimum supports is an important generalization of the association rule mining problem, which was recently proposed by Liu et al. Instead of setting a single minimum support threshold for all items, they allow users to specify multiple minimum supports to reflect the natures of the items and their varied frequencies in the database. In Liu’s paper, an Apriori-based algorithm, named MSapriori, is developed to mine all frequent item sets. In this paper, we study the same problem but with two additional improvements. First, we propose a FP-tree-like structure, MIS-tree, to store the crucial information about frequent patterns. Accordingly, an efficient MIS-tree-based algorithm, called the CFP-growth algorithm, is developed for mining all frequent item sets. We evaluate the performance of the algorithm using both synthetic datasets and real datasets, and the results show that the CFP-growth algorithm is much more efficient and scalable than the MSapriori algorithm. Second, since each item can have its own minimum support, it is very difficult for users to set the appropriate thresholds for all items at a time. In practice, users need to tune items’ supports and run the mining algorithm repeatedly until a satisfactory end is reached. To speed up this time-consuming tuning process, an efficient algorithm which can maintain the MIS-tree structure without rescanning database is proposed. Experiments on both synthetic and real-life datasets show that our MIS-tree maintenance algorithm achieves dramatic saving in computation when tuning supports. |
參考文獻 |
[1] Agrawal, R. and Srikant, R. “Fast algorithms for mining association rules.” VLDB-94, 1994.
[2] Bing Liu, Wynne Hsu, Yiming Ma. Mining Association Rules with Multiple Minimum Supports. Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery & Data Mining (KDD-99, poster), August 15-18, 1999, San Diego, CA, USA.
[3] C. Aggarwal and P. Yu. Online generation of association rules. In Proc. Of 14th ICDE, 1998
[4] D. W. Cheung, V. T. Ng, and B. W. Tam, “Maintenance of discovered association rules in large databases: An incremental update technique”, in Proceeding of the 12th IEEE International Conference on Data Engineering(ICDE-96), New Orleans, Louisana, U.S.A., March 1996, pp.106-114.
[5] Han, J. and Fu, Y. “Discovery of multiple-level association rules from large
databases.” VLDB-95.
[6] J. Han, J. Pei, Y. Yin, “Mining frequent patterns without candidate generation”, Proc. 2000 ACM-SIGMOD Int. Conf. on Management of Data, Dallas, TX, 2000.
[7] Kohavi, R., Brodley, C.E., Frasca, B., Mason, L., and Zheng, Z. KDD-cup 2000 Organizers’ Report: Peeling the Onion, SIGKDD Explosion 2(2), 2000, 86-93.
[8] K. K. Loo, Chi Lap Yip and Ben Kao and David Cheung. A lattice-based approach for I/O efficient association rule mining. Information Systems, Volume 27, Issue 1, Pages 41-74, March, 2002.
[9] Lee, W., Stolfo, S. J., and Mok, K. W. “Mining audit data to build intrusion detection models.” KDD-98.
[10] Mannila, H. “Database methods for data mining.” KDD-98 tutorial, 1998.
[11] Ming-Cheng Tseng, Wen-Yang Lin: Mining Generalized Association Rules with Multiple Minimum Supports. DaWaK 2001: 11-20
[12] M.Klemettinen, H.Mannila, P. Ronkainen, H.Toivinen, and A.I. Verkamo. “Finding interesting rules form large sets of discovered association rules”. In CIKM’94, pp.401-408.
[13] R.Feldman, Y. Aumann, A. Amir, and H. Manila. Efficient algorithm for discovering frequent sets in incremental databases. In 2nd SIGKDD workshop DMKD, 1997
[14] S. Thomas, S.Bodagala, K. Alsabti, and S. Ranka. An efficient algorithm for the incremental updation of association rules in large database. In Proc. KDD, 1997. |