A Classification Using Time-Sequential Attributes


Name

Er-Hsuan Lin (林兒萱)

Department

Department of Information Management

Thesis Title

A Classification Using Time-Sequential Attributes
(Original Chinese title: 屬性值隨時間改變的資料分類方法研究)



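The author has agreed to make this electronic thesis openly available immediately.
Once open, the electronic full text is licensed to users only for personal, non-profit searching, reading, and printing for the purpose of academic research.
Please comply with the Copyright Act of the Republic of China. Do not reproduce, distribute, adapt, repost, or broadcast this work without authorization, to avoid breaking the law.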

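Abstract (Chinese)

With advances in information technology and the trend toward e-business and enterprise integration, extracting valuable information from massive data has become an important issue in both research and practice. Data mining is an iterative process of data analysis and decision support that explores and analyzes large volumes of data, automatically or semi-automatically, to discover meaningful rules and organize them into valuable knowledge.
This study focuses on classification. Existing classification methods cannot classify on attributes whose values change over time, so we develop a new method that, following the idea of sequential rules, finds all qualifying rules and builds a classifier from them, which can then predict the class of test data. Moreover, conventional sequential rule mining uses a single threshold: a threshold that is too high causes rare attributes or classes to be ignored, while one that is too low generates too many rules. To avoid both problems, we set multiple thresholds, one for each attribute and each class, so that rare but important attributes and classes are retained. Finally, because multiple thresholds are used, we also define a new prediction procedure for test data once the classifier is built; it is more effective and more accurate than the traditional prediction approach.
This thesis can be applied to banking or the stock market. In banking, time-ordered data accumulated over many years, such as a customer's transaction figures, payment history, and credit repayment record, can be analyzed together with data that has no time order, such as the customer's basic profile, to assign each customer the most appropriate class. The result can help determine whether a customer is suitable for future lending, or otherwise help bank decision makers judge customers correctly, thereby reducing the bank's bad-debt costs. Likewise, time-ordered data such as stock trading information collected over a long period can be fed to the algorithm developed in this study to classify a given stock, and the classification results can support analysis or prediction of the stock market. These examples show that the study has a wide range of applications.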
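The combination of time-ordered and non-time-ordered customer data described above can be pictured with a small sketch. This is only an illustration, not the thesis's actual data model: the `Record` and `Rule` classes, the attribute names, and the choice to match a sequence pattern as an ordered subsequence with gaps allowed are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class Record:
    # Hypothetical layout: attributes without time order, e.g. {"occupation": "engineer"}
    static: dict
    # Hypothetical layout: time-ordered attributes, e.g. {"payment": ["on_time", "late"]}
    sequences: dict
    label: str = ""


@dataclass
class Rule:
    static_conditions: dict    # attribute -> required value
    sequence_conditions: dict  # attribute -> pattern that must occur in order
    label: str                 # class predicted when the rule matches


def is_subsequence(pattern, sequence):
    """True if the pattern's values appear in sequence in the same order.

    Gaps between matched values are allowed (an assumption of this sketch).
    """
    it = iter(sequence)
    # `value in it` advances the iterator, so order is enforced.
    return all(value in it for value in pattern)


def matches(rule, record):
    """A rule matches when all static and all sequence conditions hold."""
    static_ok = all(
        record.static.get(attr) == value
        for attr, value in rule.static_conditions.items()
    )
    sequence_ok = all(
        is_subsequence(pattern, record.sequences.get(attr, []))
        for attr, pattern in rule.sequence_conditions.items()
    )
    return static_ok and sequence_ok


customer = Record(
    static={"occupation": "engineer"},
    sequences={"payment": ["on_time", "late", "on_time", "late"]},
)
rule = Rule(
    static_conditions={"occupation": "engineer"},
    sequence_conditions={"payment": ["late", "late"]},
    label="high_risk",
)
print(matches(rule, customer))
```

Here the rule fires because the customer has the required static value and two late payments occur in order somewhere in the payment history.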

Abstract (English)

Classification is an important method for predicting class labels from databases. Most existing methods, however, assume that all attribute values are constant. In many real-life applications, attribute values change over time, such as daily stock prices or blood pressure measured at different times. We call these attributes time-sequential attributes. In this paper, we first extend the traditional classification problem to deal with time-sequential attributes. Next, we present an algorithm, called MutipleMIS-SP, that generates all classification rules used to build the classifier. Our approach also adopts the concept of multiple minimum supports, since attributes and attribute-value pairs do not occur with similar frequency in the database. Using a single minimum support may lead to the rare item problem and ultimately to low classification accuracy. Finally, two classification criteria are proposed to predict the class label using the generated classification rules.
Detailed experiments are also presented. Seven synthetic datasets and a real-life dataset, BA-CUSTOMER, were used in our performance analyses, and scalability tests are also reported. The results show that MutipleMIS-SP is more accurate than the traditional C4.5 classification algorithm on both the synthetic datasets and the real dataset.
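The rare item problem mentioned in the abstract can be illustrated with a minimal sketch. It follows the general multiple-minimum-support idea of Liu, Hsu, and Ma, in which each item carries its own minimum item support (MIS) and a pattern is judged against the lowest MIS among its items. The abstract does not give MutipleMIS-SP's exact definitions, so the function names, the toy transactions, and the threshold values below are assumptions for illustration only.

```python
def support(pattern, transactions):
    """Fraction of transactions that contain every item in the pattern."""
    pattern = set(pattern)
    return sum(pattern <= set(t) for t in transactions) / len(transactions)


def is_frequent(pattern, transactions, mis):
    """Pattern is frequent if its support reaches the lowest MIS of its items."""
    threshold = min(mis[item] for item in pattern)
    return support(pattern, transactions) >= threshold


# Toy data (hypothetical): defaults are rare but important.
transactions = [["on_time", "approved"]] * 9 + [["late", "default"]]

# A single threshold of 0.3 for every item hides the rare pattern.
single = {"on_time": 0.3, "approved": 0.3, "late": 0.3, "default": 0.3}
# Item-specific thresholds keep it without lowering the bar for common items.
multiple = {"on_time": 0.3, "approved": 0.3, "late": 0.05, "default": 0.05}

rare = ["late", "default"]
print(is_frequent(rare, transactions, single))    # rare pattern is dropped
print(is_frequent(rare, transactions, multiple))  # rare pattern is kept
```

Lowering the single threshold to 0.05 would also keep the rare pattern, but it would admit many more patterns among the common items, which is the trade-off that per-item thresholds are meant to avoid.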

Keywords (Chinese)

★ Data mining (資料挖礦)
★ Classification (分類)
★ Time series (時間序列)
★ Multiple thresholds (多重門檻值)

Keywords (English)

★ Data mining
★ Classification
★ Time-sequential data
★ Multiple minimum support

Table of Contents

CONTENTS
CHINESE ABSTRACT
ABSTRACT
CONTENTS
LIST OF TABLES
LIST OF GRAPHS
CHAPTER 1. INTRODUCTION
CHAPTER 2. RELATED WORK
2.1 PREVIOUS RESEARCH
2.1.1 An overview of classification
2.1.1.1 ID3
2.1.1.2 C4.5
2.1.1.3 ANN
2.1.1.4 CBA
2.1.2 Sequential pattern mining
2.1.3 Multiple Minimum Support
2.2 POTENTIAL APPLICATIONS
2.3 DISCUSSION
CHAPTER 3. PROBLEM DEFINITION
CHAPTER 4. ALGORITHM
4.1 CLASSIFIER GENERATION
4.2 CLASSIFICATION CRITERIA
CHAPTER 5. EXPERIMENTAL EVALUATION
5.1 SYNTHETIC DATA GENERATION AND REAL-LIFE DATASET
5.2 PERFORMANCE EVALUATION
CHAPTER 6. CONCLUSIONS AND FUTURE WORKS
REFERENCE


Advisor

Yen-Liang Chen (陳彥良)

Approval Date

2006-6-22
