The Research of Typical Pattern Mining

Name

Hui-Ling Hu (胡蕙玲)

Department

Department of Information Management

Thesis title

The Research of Typical Pattern Mining
(典型資料模式挖掘研究)

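This electronic thesis is authorized for immediate open access.
Open-access full text is licensed to users solely for personal, non-profit searching, reading, and printing for academic research purposes.
Please comply with the Copyright Act of the Republic of China; do not reproduce, distribute, adapt, repost, or broadcast the work without authorization.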

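Abstract (Chinese)

In recent years, advances in information technology have produced many techniques and methods for mining useful and interesting information patterns, including concept description, association rules, classification and prediction, clustering, and evolution analysis. This thesis proposes a new kind of information pattern, called the typical pattern, that gives decision makers a better understanding of a given dataset. Suppose we are given a dataset of n objects, each described by a set of attribute values. Typical pattern mining selects from the dataset a compact and suitable subset of k objects to represent the whole dataset. Based on this definition, this research proposes typical pattern mining methods and applies them to several real datasets to find useful typical patterns. In addition, because automated typical pattern mining cannot draw on users' expertise and experience, this research also proposes a dynamic, user-interactive typical pattern mining method that lets users adjust parameters according to their experience and domain knowledge to obtain better mining results. Based on the proposed interactive model, this thesis develops a user-interactive typical pattern mining system and uses it to mine typical journals in the information systems field, providing a more effective approach than static typical pattern mining.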

Abstract (English)

Many approaches have been proposed to discover useful information patterns from databases, such as concept description, associations, sequential patterns, classification, clustering, and deviation detection. This research proposes a new type of information pattern, called typical patterns, which can give decision makers a better understanding of a given dataset. Suppose we are given a dataset containing n objects, each of which is described by a set of attribute values. Mining typical patterns means selecting a small subset of k objects from these n objects so that the k chosen objects form a compact and suitable representation of the original dataset. Accordingly, Typical Pattern Mining (TPM) algorithms have been developed to mine typical patterns from databases, and extensive experiments on real datasets demonstrate the usefulness of typical patterns in practical situations. Although TPM is a good method for determining typical patterns automatically, it lacks the ability to accommodate users' experience and domain knowledge, which are crucial for making decisions in a dynamic business environment. Therefore, this research also develops a dynamic and interactive approach to typical pattern mining, called interactive Typical Pattern Mining (iTPM). This approach accommodates users' experience and knowledge by allowing them to adjust the parameters iteratively during the interactive process. An iTPM system is then developed to mine typical journals in the IS field. The experimental results indicate that iTPM is more effective than the static approach.
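
The selection problem stated in the abstract (choose k of n objects as a compact representation) can be illustrated with a small sketch. The abstract does not describe how TPM I, TPM II, or iTPM work internally, so the code below is not the thesis's algorithm. It substitutes a greedy, k-medoids-style selection and assumes numeric attributes, Euclidean distance, and a cost equal to the summed distance from each object to its nearest chosen representative. The names `representation_cost` and `greedy_typical_objects` are hypothetical.

```python
from math import dist  # Euclidean distance between two points (Python 3.8+)


def representation_cost(points, chosen):
    """Sum, over all objects, of the distance to the nearest chosen object.

    ASSUMPTION: this cost function is illustrative; the thesis abstract
    does not define how "compact and suitable" is measured.
    """
    return sum(min(dist(p, points[c]) for c in chosen) for p in points)


def greedy_typical_objects(points, k):
    """Greedily pick k object indices that represent the dataset well.

    This is a stand-in (greedy k-medoids-style selection), NOT TPM I/II.
    """
    if not 0 < k <= len(points):
        raise ValueError("k must be between 1 and the number of objects")
    chosen = []
    for _ in range(k):
        # Try every unchosen object and keep the one that lowers the cost most.
        candidates = [i for i in range(len(points)) if i not in chosen]
        best = min(
            candidates,
            key=lambda i: representation_cost(points, chosen + [i]),
        )
        chosen.append(best)
    return chosen


# Two well-separated groups of objects, each described by two attribute values.
data = [(1.0, 1.0), (1.2, 0.9), (0.9, 1.1), (8.0, 8.0), (8.1, 7.9), (7.9, 8.2)]
typical = greedy_typical_objects(data, k=2)
```

With this toy dataset the two selected indices fall in different groups, one representative per group. An interactive variant in the spirit of iTPM could let a user change k or the distance measure and rerun the selection, but the parameters that iTPM actually exposes are not given in the abstract.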

Keywords (Chinese)

★ Data mining (資料挖掘)
★ Typical pattern mining (典型資料模式挖掘)
★ Clustering (叢集)

Keywords (English)

★ Data mining
★ Typical patterns mining
★ Clustering

Table of Contents

Contents
List of Figures
List of Tables
Chapter 1 Introduction
Chapter 2 Related Works
2.1 Data Mining
2.2.1 Partitioning Clustering Methods
2.3 Memory-Based Reasoning (MBR)
Chapter 3 Typical Pattern Mining Problem
3.1 Data types
3.1.1 Numeric data types
3.1.2 Nominal data type
3.2 Notation definition
Chapter 4 Automatic Approach
4.1 Typical Pattern Mining I (TPM I) Algorithm
4.2 Typical Pattern Mining II (TPM II) Algorithm
Chapter 5 User Interactive Approach
5.1 Parameters
5.2 User interactive Typical Pattern Mining (iTPM)
Chapter 6 Implementation
6.1 The synthetic data sets
6.2 The real data sets
6.2.1 TPM I
6.2.2 TPM II
6.2.3 iTPM
Chapter 7 Conclusion
References
Appendix A. Journals in computer information system catalog of JCR
Appendix B. Journals in MIS journals average ranking of AIS web site

Advisor

Yen-Liang Chen (陳彥良)

Approval date

2006-6-6
