
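Name: 吳郁瑩 (Yu-Ying Wu)
Department: Information Management (資訊管理學系)
Thesis title: 關聯式資料庫之廣義知識探勘 (Generalized Knowledge Discovery from Relational Database)
Full text: never open to the public

Abstract (Chinese, translated): With the rapid growth and accumulation of data, data mining has been widely applied in many fields, such as decision support, fraud detection, market analysis, and financial forecasting. Many methods and techniques have been proposed for different data characteristics and research problems to generalize useful information from large volumes of data, and attribute-oriented induction (AOI) is one of the important ones. However, existing AOI approaches have two problems. First, they generalize according to only two key thresholds, so the generalized knowledge they provide is just one fragment of the knowledge in the database; obtaining the complete generalized knowledge requires repeating the induction many times. Second, existing methods attend only to positive data and lack any treatment of negative data. To address these two shortcomings, this study proposes two new induction methods that generate all interesting multiple-level positive and negative generalized knowledge in a single induction. In addition, the real world contains many kinds of knowledge. Besides positive and negative knowledge, a database may also contain rare data with abnormal deviations. Traditional data mining methods can only detect anomalous objects and cannot explain which attributes of an object are actually anomalous. This study therefore proposes a third method that mines the minimal attribute combinations that actually make an object anomalous, called suspicious patterns. Tests and evaluations on real datasets show that the proposed methods are feasible and effective at finding useful knowledge.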

Abstract (English): Data mining has attracted a great deal of attention in the information industry and in society because of its wide applicability in many areas. Many approaches have been proposed to generalize valuable information patterns, and attribute-oriented induction (AOI) is one of the most important. However, existing AOI approaches encounter two problems. First, AOI provides only a snapshot of the generalized knowledge, not a global picture. Second, it mines knowledge only from positive facts in databases. In this study, we propose two novel methods that generate all interesting multiple-level positive and negative generalized knowledge at one time. Moreover, the real world contains various types of knowledge. In addition to positive and negative knowledge, a dataset may include very rare, suspicious values or abnormal deviations. Existing research has focused on identifying outliers that share the same dimensional space; the explicit anomalous knowledge hidden in the mined outliers is rarely addressed. This study proposes a third approach to discover such suspicious knowledge. The proposed methods have been verified for efficiency and effectiveness on real datasets.
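For readers unfamiliar with AOI, the following is a minimal sketch of the classic threshold-driven form of the technique, which is the baseline the abstract criticizes. It is not the thesis's FGT or NGT algorithm. It assumes the two thresholds are an attribute threshold (maximum distinct values per attribute) and a tuple threshold (maximum number of generalized tuples); the concept hierarchies, the sample rows, and all function names are hypothetical.

```python
from collections import Counter

# Hypothetical concept hierarchies: each maps a value to its parent concept.
HIERARCHIES = {
    "city": {"Taipei": "Taiwan", "Kaohsiung": "Taiwan", "Tokyo": "Japan",
             "Osaka": "Japan", "Taiwan": "Asia", "Japan": "Asia"},
    "major": {"CS": "Engineering", "EE": "Engineering",
              "Physics": "Science", "Math": "Science",
              "Engineering": "ANY", "Science": "ANY"},
}

# Hypothetical task-relevant tuples.
ROWS = [
    {"city": "Taipei", "major": "CS"},
    {"city": "Kaohsiung", "major": "EE"},
    {"city": "Tokyo", "major": "Physics"},
    {"city": "Osaka", "major": "Math"},
    {"city": "Taipei", "major": "Math"},
]


def climb(attr, value):
    """Return the parent concept of value, or value itself at the top."""
    return HIERARCHIES[attr].get(value, value)


def generalize(counts, attrs, i):
    """Lift attribute i one level and merge tuples that become identical."""
    merged = Counter()
    for tup, n in counts.items():
        merged[tup[:i] + (climb(attrs[i], tup[i]),) + tup[i + 1:]] += n
    return merged


def distinct(counts, i):
    """Number of distinct values currently held by attribute i."""
    return len({tup[i] for tup in counts})


def aoi(rows, attrs, attr_threshold, tuple_threshold):
    """Generalize rows until both thresholds are met or no lift is possible."""
    counts = Counter(tuple(r[a] for a in attrs) for r in rows)

    # Step 1: enforce the attribute threshold on each attribute in turn.
    for i in range(len(attrs)):
        while distinct(counts, i) > attr_threshold:
            lifted = generalize(counts, attrs, i)
            if lifted == counts:  # already at the top of the hierarchy
                break
            counts = lifted

    # Step 2: enforce the tuple threshold, lifting the widest attribute first.
    while len(counts) > tuple_threshold:
        order = sorted(range(len(attrs)), key=lambda j: -distinct(counts, j))
        for i in order:
            lifted = generalize(counts, attrs, i)
            if lifted != counts:
                counts = lifted
                break
        else:
            break  # nothing can be generalized further
    return counts


if __name__ == "__main__":
    for tup, n in aoi(ROWS, ["city", "major"], 2, 3).items():
        print(tup, n)
```

Each call returns one generalized relation for one pair of threshold values, so changing either threshold means running the induction again. This is the "snapshot" limitation the abstract describes.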

Keywords (Chinese, translated):
★ anomaly detection (異常偵測)
★ negative pattern (負相關樣式)
★ attribute-oriented induction (屬性導向歸納法)
★ multiple-level knowledge mining (多階層知識探勘)
★ data mining (資料探勘)

Keywords (English):
★ anomaly detection
★ attribute-oriented induction
★ knowledge discovery
★ multiple-level mining
★ negative pattern
★ data mining

Table of Contents:
Abstract
Chinese Abstract
Acknowledgements
Contents
List of Figures
List of Tables
Chapter 1. Introduction
1.1. Research Problem in Positive Generalized Knowledge
1.2. Research Problem in Negative Generalized Knowledge
1.3. Research Problem in Suspicious Knowledge
1.4. Organization of the Dissertation
Chapter 2. Related Works
2.1. Data Mining Researches
2.2. Attribute-Oriented Induction Researches
2.3. Negative Association Rules Mining Researches
2.4. Anomaly Detection Researches
Chapter 3. Discovering Positive Generalized Knowledge
3.1. Brief Description
3.2. Problem Definition
3.3. Global Attribute-Oriented Induction
3.3.1 Collect and Encode the Task-Relevant Tuples
3.3.2 The FGT Algorithm
3.3.3 Pruning and Transformation
3.4. Experiments
3.4.1 Real Dataset
3.4.2 Generalized Tuples
3.4.3 Performance Evaluations
3.5. Summary
Chapter 4. Discovering Negative Generalized Knowledge
4.1. Brief Description
4.2. Problem Definition
4.3. Global Negative Attribute-Oriented Induction
4.3.1 The NGT Algorithm
4.3.2 Pruning and Transformation
4.4. Experiments
4.4.1 Negative Generalized Tuples
4.4.2 Performance Evaluations
4.5. Summary
Chapter 5. Discovering Suspicious Knowledge
5.1. Brief Description
5.2. Problem Definition
5.3. Approach for Suspicious Pattern Mining
5.3.1 Preprocess
5.3.2 Clustering-Based Outlier Detection
5.3.3 Suspicious Patterns Mining
5.4. Experiments
5.4.1 Experiment Data Sets
5.4.2 Preprocess and Distance Measurement
5.4.3 Performance Evaluations
5.4.4 Suspicious Patterns Discussion
5.5. Summary
Chapter 6. Conclusions and Future Works
References
Appendix

Advisors: 陳彥良 (Yen-Liang Chen), 張瑞益 (Ray-I Chang)
Approval date: 2009-11-11