Master's/Doctoral Thesis 944403002 — Detailed Record




Name  Hsiao-Wei Hu (胡筱薇)    Department  Information Management
Thesis Title  Constructing Decision Trees from Data with Various Label-Driven Inductions
(不同標籤屬性變化下的決策樹建構系統)
Related Theses
★ A Study of Business Intelligence in the Retail Industry
★ Building an Anomaly Detection System for Landline Telephone Calls
★ Applying Data Mining to the Analysis of School Grades and College Entrance Exam Results: The Case of a Vocational High School Food and Beverage Management Program
★ Using Data Mining Techniques to Improve Wealth Management Performance: A Bank Case Study
★ Evaluation and Analysis of Wafer Fabrication Yield Models: A Case Study of a Domestic DRAM Fab
★ A Study of Business Intelligence Analytics Applied to Student Grades
★ Using Data Mining to Build a Predictive Model of Academic Achievement for Upper-Grade Elementary School Students
★ Applying Data Mining to Build a Motorcycle Loan Risk Assessment Model: The Case of Company A
★ Applying Performance Indicator Evaluation to Improve Quality Assurance in R&D Design
★ Improving Hiring Quality with Machine Learning Based on Text Résumés and Personality Traits
★ A General Framework Based on Relational Genetic Algorithms for Set Partitioning Problems with Constraint Handling
★ Generalized Knowledge Mining from Relational Databases
★ Decision Tree Construction with Attribute-Value Acquisition Delays Considered
★ A Method for Finding Preference Graphs from Sequence Data, with Application to Group Ranking Problems
★ Using Partitional Clustering Algorithms to Find Consensus Groups for Group Decision Problems
★ A Novel Method for Ordered Consensus Groups Applied to Group Decision Problems
  1. This electronic thesis has been approved for immediate open access.
  2. The open-access full text is licensed only for personal, non-profit retrieval, reading, and printing for the purpose of academic research.
  3. Please observe the relevant provisions of the Copyright Act of the Republic of China; do not reproduce, distribute, adapt, repost, or broadcast it without authorization.

Abstract (Chinese)  Among the many data mining techniques, the decision tree is a highly popular classification method, chiefly because the rules mined by decision tree classifiers are more readable than those of other methods. Most of the literature on decision tree classification, however, assumes that the label is a categorical attribute. This assumption often fails to reflect real-world situations, because the class label itself may be data with a concept hierarchy, continuous numeric data, or continuous numeric data organized under a concept hierarchy. To narrow this research gap and handle classification under different label types, this research designs and develops a dedicated decision tree classifier for each of three label types, named (1) HLC (Hierarchical Label Classifier), (2) CLC (Continuous Label Classifier), and (3) HCC (Hierarchical Continuous-label Classifier).
The three proposed classifiers differ from traditional decision tree classifiers in several major functions, including how to control tree growth, how to select appropriate test attributes, how to decide the label that best represents a leaf node, and how to predict new data. The development strategy of HLC designs a similarity measure between hierarchically organized labels by considering how the data are distributed over the concept hierarchy (a simplified sketch of this idea follows below). In CLC, this research develops an innovative dynamic discretization method to support a similarity measure between numeric labels. Finally, HCC combines the treatments of HLC and CLC to design its own similarity measure.
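To make this concrete, below is a minimal sketch, assuming the concept hierarchy is stored as a simple parent map. It illustrates the general idea of a hierarchy-aware impurity measure, not the dissertation's exact hierarchical-entropy formula; all names and the toy hierarchy are invented.

```python
from collections import Counter
from math import log2

# Toy concept hierarchy (invented for illustration): each label maps to
# its parent concept; the root ("beverage") maps to None.
parent = {
    "espresso": "coffee", "latte": "coffee",
    "green_tea": "tea", "black_tea": "tea",
    "coffee": "beverage", "tea": "beverage",
    "beverage": None,
}

def path_from_root(label):
    """Concepts from the root down to the given label."""
    chain = []
    while label is not None:
        chain.append(label)
        label = parent[label]
    return chain[::-1]

def level_entropy(labels, depth):
    """Shannon entropy after generalizing every label to its ancestor
    `depth` steps below the root (capped at the label itself)."""
    generalized = [path_from_root(l)[min(depth, len(path_from_root(l)) - 1)]
                   for l in labels]
    n = len(generalized)
    return -sum(c / n * log2(c / n) for c in Counter(generalized).values())

def hierarchical_entropy(labels, max_depth=2):
    """Average the per-level entropies, so labels that disagree near the
    root (coffee vs. tea) count as less similar than sibling labels."""
    return sum(level_entropy(labels, d)
               for d in range(1, max_depth + 1)) / max_depth

# A node mixing coffee and tea labels scores higher impurity than a node
# whose labels at least share the "coffee" ancestor.
print(hierarchical_entropy(["espresso", "latte", "green_tea"]))  # ~1.25
print(hierarchical_entropy(["espresso", "latte", "latte"]))      # ~0.46
```

The intuition this sketch captures is that a node whose labels already agree at a high level of the hierarchy is purer than one whose labels diverge near the root, even when neither node is pure at the leaf level.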
Experimental results show that HLC, CLC, and HCC not only mine rules from data sets with a wide variety of label types, but also achieve convincing accuracy and precision.
Abstract (English)  Presently, decision tree classifiers are designed to classify data with categorical or Boolean labels. In many practical situations, however, there are more complex classification scenarios, where the labels to be predicted are not just nominal variables with a flat structure. For example, the predicted labels can be (1) hierarchically related, (2) continuous variables, or (3) hierarchical continuous variables. Unfortunately, existing research has paid little attention to the issue of constructing a decision tree (DT) from data with such varied label types. To remedy this research gap, this research has developed three innovative label-driven DT algorithms, named (1) HLC (Hierarchical Label Classifier), (2) CLC (Continuous Label Classifier), and (3) HCC (Hierarchical Continuous-label Classifier).
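To make the three scenarios concrete, here is a hypothetical illustration, written as plain Python data, of how a single training record's label might look in each case; the attribute names and values are invented, not taken from the dissertation's data sets.

```python
# One invented training record; only the label changes across scenarios.
record = {"age": 34, "income": 52000}

# (1) Hierarchical label: a path in a predefined concept hierarchy,
#     e.g. "espresso" specializes "coffee", which specializes "beverage".
hierarchical_label = ("beverage", "coffee", "espresso")

# (2) Continuous label: a numeric value rather than a class symbol,
#     e.g. the amount a customer spends.
continuous_label = 137.5

# (3) Hierarchical continuous label: numeric values whose meaningful
#     ranges are themselves organized hierarchically, e.g. a coarse
#     class "high" refined by a numeric interval at the lower level.
hierarchical_continuous_label = ("high", (100.0, 200.0))
```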
HLC, CLC, and HCC differ from traditional decision tree classifiers in several major functions, including growing a decision tree, selecting attributes, assigning labels to represent a leaf, and making a prediction for new data. The development strategy of the proposed algorithms is mainly based on measuring similarity among labels, by considering the data distribution over the predefined concept hierarchy and by applying a proposed dynamic discretization to the continuous label at each node during the tree-induction process (a simplified sketch of the discretization step follows below).
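As a rough illustration of the per-node dynamic discretization idea, the sketch below re-derives non-overlapping intervals from whatever label values actually reach a node, instead of fixing the intervals once before training. Cutting at the largest gaps is only a simple stand-in for the dissertation's actual method, and the function name is invented.

```python
def discretize_by_gaps(values, k):
    """Partition the sorted label values into k non-overlapping intervals
    by cutting at the k-1 largest gaps between consecutive values."""
    v = sorted(values)
    # Indices of gaps between neighbors, largest gap first.
    gaps = sorted(range(len(v) - 1),
                  key=lambda i: v[i + 1] - v[i], reverse=True)
    cuts = sorted(gaps[:k - 1])
    intervals, start = [], 0
    for c in cuts:
        intervals.append((v[start], v[c]))
        start = c + 1
    intervals.append((v[start], v[-1]))
    return intervals

# Example: continuous label values reaching one node; ask for 3 intervals.
print(discretize_by_gaps([1.0, 1.2, 1.1, 5.0, 5.3, 9.7, 10.1], k=3))
# -> [(1.0, 1.2), (5.0, 5.3), (9.7, 10.1)]
```

Because each node repeats this step on its own subset of the data, the interval boundaries adapt as the tree grows, which is the property the abstract attributes to the proposed dynamic discretization.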
The experimental results show that the proposed algorithms can not only mine classification rules from data with a variety of label types, but also achieve convincing accuracy and precision.
Keywords (Chinese)  ★ Data Discretization
★ Decision Tree
★ Data Mining
★ Concept Hierarchy
Keywords (English)  ★ Decision Tree
★ Data Discretization
★ Concept Hierarchy
★ Data Mining
Table of Contents
-----------------------------------------------------
ABSTRACT I
中文摘要 II
ACKNOWLEDGMENT III
LIST OF TABLES VI
LIST OF ILLUSTRATIONS VII
CHAPTER 1 INTRODUCTION 1
1.1. DECISION TREE AND CURRENT LIMITATIONS 2
1.2. RESEARCH OBJECTIVES AND SCOPE 4
1.3. SIGNIFICANCE OF THE RESEARCH 5
1.4. ORGANIZATION OF THE DISSERTATION 6
CHAPTER 2 LITERATURE REVIEW 7
2.1. DECISION TREE INDUCTION 7
2.2. CONTINUOUS DATA DISCRETIZATION 9
2.3. CONCEPT HIERARCHY 10
CHAPTER 3 CONSTRUCTING A DECISION TREE FROM DATA WITH HIERARCHICAL CLASS LABELS 13
3.1. RESEARCH PROBLEM 14
3.2. PROBLEM DEFINITIONS 16
3.2.1. TRAINING DATA AND DECISION TREE 16
3.2.2. CLASS HIERARCHICAL TREE 18
3.2.3. PARTIAL HIERARCHICAL TREE 19
3.3. THE HLC ALGORITHM: HIERARCHICAL LABEL CLASSIFIER 21
3.3.1. ATTRIBUTE SELECTION MEASURE: HIERARCHICAL-ENTROPY VALUE 25
3.3.2. STOP CRITERIA FOR HLC 28
3.3.3. LABEL ASSIGNMENT FOR HLC 29
3.4. PERFORMANCE EVALUATION 31
3.5. DISCUSSION 36
CHAPTER 4 CONSTRUCTING A DECISION TREE FROM DATA WITH CONTINUOUS LABELS 37
4.1. RESEARCH PROBLEM 38
4.2. PROBLEM DEFINITION 42
4.3. THE CLC ALGORITHM: CONTINUOUS LABEL CLASSIFIER 44
4.3.1. DETERMINING NON-OVERLAPPING INTERVALS 45
4.3.2. ATTRIBUTE SELECTION MEASURE: COMPUTING THE GOODNESS VALUE 48
4.3.3. STOP CRITERIA FOR CLC 50
4.3.4. LABEL ASSIGNMENT FOR CLC 50
4.4. PERFORMANCE EVALUATION 51
4.4.1. FIRST EXPERIMENT: COMPARING CLC AND C4.5 WITH FOUR DISCRETIZATION METHODS 53
4.4.2. SECOND EXPERIMENT: CLC AND REGRESSION TREES 56
4.4.3. THIRD EXPERIMENT: SUPPLEMENTARY COMPARISONS 57
4.5. DISCUSSION 60
CHAPTER 5 CONSTRUCTING A DECISION TREE FROM DATA WITH HIERARCHICAL CONTINUOUS-LABELS 62
5.1. RESEARCH PROBLEM 63
5.2. PROBLEM DEFINITION 66
5.2.1. LABELS AND INTERVALS 68
5.2.2. CLASS HIERARCHICAL TREE (CHT) 71
5.2.3. PARTIAL HIERARCHICAL TREE (PHT) 72
5.3. THE HCC ALGORITHM: HIERARCHICAL CONTINUOUS-LABEL CLASSIFIER 74
5.3.1. ATTRIBUTE SELECTION MEASURE: COMPUTING THE OVERALL GOODNESS VALUE 76
5.3.2. STOP CRITERIA FOR HCC 81
5.3.3. LABEL ASSIGNMENT FOR HCC 82
5.4. PERFORMANCE EVALUATION 83
CHAPTER 6 CONCLUSION AND FUTURE RESEARCH 92
6.1. SUMMARY AND CONCLUSION 92
6.2. FUTURE RESEARCH 94
LIST OF REFERENCES 96
APPENDIX 102
List of References
----------------------------------------------------------
[1] G. Adomavicius and A. Tuzhilin, “Using data mining methods to build customer profiles,” IEEE Computer (February), pp.74–82, 2001.
[2] R. Agrawal, S. Ghosh, T. Imielinski, B. Iyer, and A. Swami, “An interval classifier for database mining applications,” Proceedings of the 18th International Conference on Very Large Databases, pp.560–573, Vancouver, BC, 1992.
[3] R. Agrawal, T. Imielinski and A. Swami, “Database mining: a performance perspective,” IEEE Transactions on Knowledge and Data Engineering, vol.5, no.6, pp.914–925, 1993.
[4] M. Ankerst, M. Ester and H.P. Kriegel, "Towards an effective cooperation of the computer and the user for classification," Proc. 6th Int. Conf. on Knowledge Discovery and Data Mining, pp.178-188, 2000.
[5] J.B. Ayers, Handbook of Supply Chain Management, 2nd Edition, Auerbach Publications, Boca Raton, FL., pp. 426, 2006.
[6] Z. Barutcuoglu, R.E. Schapire and O.G. Troyanskaya, “Hierarchical multi-label prediction of gene function,” Bioinformatics, vol.22, no.7, pp.830–836, 2006.
[7] J. Bogaert, R. Ceulemans and E.D. Salvador-Van, “Decision tree algorithm for detection of spatial processes in landscape transformation,” Environmental Management, vol.33, no.1, pp.62-73, 2004.
[8] F. Bonchi, F. Giannotti, G. Mainetto and D. Pedreschi, “A classification-based methodology for planning audit strategies in fraud detection,” In: Proceedings of the fifth ACM SIGKDD international conference on knowledge discovery and data mining, pp. 175-184, 1999.
[9] L. Borzemski, “The Use of Data Mining to Predict Web Performance,” Cybernetics & Systems, vol.37, no.6, pp.587-608, 2006.
[10] I. Bose and R.K. Mahapatra, “Business data mining — a machine learning perspective,” Information & Management, vol.39, no.3, pp.211-225, 2001.
[11] L. Breiman, J.H. Friedman, R.A. Olshen and C.J. Stone, “Classification and Regression Trees,” New York, Chapman & Hall, 1993.
[12] C.E. Brodley and P.E. Utgoff, “Multivariate Decision Trees,” Machine Learning, vol.19, pp.45-77, 1995.
[13] J.R. Cano, F. Herrera and M. Lozano, "Evolutionary stratified training set selection for extracting classification rules with trade off precision-interpretability," Data & Knowledge Engineering, vol.60, no.1, pp. 90-108, 2007.
[14] J. Catlett, “Megainduction: machine learning on very large databases,” PhD thesis, University of Sydney, 1991.
[15] J. Cerquides and R.L. Mantaras, “Proposal and empirical comparison of a parallelizable distance-based discretization method,” In Third International Conference on Knowledge Discovery and Data Mining, pp.139–142, 1997.
[16] Y.L. Chen, C.L. Hsu and S.C. Chou, “Constructing a multi-valued and multi-labeled decision tree,” Expert Systems with Applications, vol.25, pp.199-209, 2003.
[17] M.R. Chmielewski and J.W. Grzymala-Busse, “Global discretization of continuous attributes as preprocessing for machine learning”, International Journal of Approximate Reasoning, vol.15, pp.319-331, 1996.
[18] Y.H. Cho, J.K. Kim and S.H. Kim, “A Personalized Recommender System Based on Web Usage Mining and Decision Tree Induction,” Expert Systems with Applications, vol.23, pp.329-342, 2002.
[19] D. Wu, “Detecting information technology impact on firm performance using DEA and decision tree,” International Journal of Information Technology & Management, vol.5, no.2/3, 2006.
[20] P. Domingos, "The role of Occam's razor in knowledge discovery," Data Mining and Knowledge Discovery, vol.3, no.4, pp.409-425, 1999.
[21] J. Dougherty, R. Kohavi and M. Sahami, “Supervised and unsupervised discretization of continuous features,” In: Proceedings of international conference on machine learning, pp.194-202, 1995.
[22] R. Duda, P. Hart and D. Stork, “Pattern Classification,” second ed., New York: Wiley, 2001.
[23] J. Durkin, “Induction via ID3,” AI Expert, vol.7, pp.48-53, 1992.
[24] U.M. Fayyad and K.B. Irani, “On the Handling of Continuous-valued Attributes in Decision Tree Generation,” Machine Learning, vol.8, pp.87-102, 1992.
[25] U. Fayyad and K. Irani, “Multi-interval discretization of continuous-valued attributes for classification learning,” In: Proceedings of the 13th International Joint Conference on Artificial Intelligence, pp.1022–1029, 1993.
[26] S.A. Gaddam, V. V. Phoha and K. S. Balagani, “K-Means+ID3: A Novel Method for Supervised Anomaly Detection by Cascading K-Means Clustering and ID3 Decision Tree Learning Methods,” IEEE Transactions on Knowledge & Data Engineering, vol.19, no.3, pp.345-354, 2007.
[27] J. Han and M. Kamber, Data mining: concepts and techniques, San Francisco, CA, Morgan Kaufmann, 2001.
[28] J. Han and Y. Fu, “Dynamic generation and refinement of concept hierarchies for knowledge discovery in database,” AAAI Workshops Knowledge Discovery in Database (WS-94-03), pp.157–168, 1994.
[29] J. Han, Y.Cai and N. Cercone, "Data-driven discovery of quantitative rules in relational databases," IEEE Trans. on Knowledge and Data Engineering, vol.5, no.1, pp.29-40, 1993.
[30] J. Han and Y. Fu, "Exploration of the power of attribute-oriented induction on data mining," In U. Fayyad, G. Piatetsky-Shapiro, P. Smyth and R. Uthurusamy, editors, Advances in Knowledge Discovery and Data Mining, AAAI/MIT Press, pp.399-421, 1996.
[31] J. Han and Y.Fu, "Discovery of multiple-level association rules from large databases," In Proc. 1995 Int. Conf. Very Large Data Bases (VLDB'95), pp.420-431, 1995.
[32] C.-W. Hsu, C.-C. Chang and C.-J. Lin, A Practical Guide to Support Vector Classification, Available: http://www.csie.ntu.edu.tw/~cjlin/papers/guide/guide.pdf, 2003.
[33] H.W. Hu, Y.L. Chen and K. Tang, "A Dynamic Discretization Approach for Constructing Decision Trees with a Continuous Label," IEEE Transactions on Knowledge and Data Engineering, 2009.
[34] G. Jagannathan, R.N. Wright, "Privacy-preserving imputation of missing data," Data & Knowledge Engineering, vol.65, no.1, pp.40-56, 2008.
[35] R. Jin and G. Agrawal, “Efficient decision tree construction on streaming data,” In proceedings of the ninth ACM SIGKDD international conference on Knowledge discovery and data mining, pp.571-576, 2003.
[36] M. Kamber, L. Winstone, W. Gong, S. Cheng and J. Han, "Generalization and Decision Tree Induction: Efficient Classification in Data Mining," In Proc. of 1997 Int'l Workshop on Research Issues on Data Engineering, pp.111-120, 1997.
[37] G.V. Kass, “An exploratory technique for investigating large quantities of categorical data,” Applied Statistics, vol.29, pp.119-127, 1980.
[38] L. Kaufman and P.J. Rousseeuw, “Finding Groups in Data: An Introduction to Cluster Analysis,” John Wiley & Sons, New York, 1990.
[39] D.A. Keim, M.C. Hao and U. Dayal, "Hierarchical pixel bar charts," IEEE Transactions on Visualization and Computer Graphics, vol.8, no.3, pp.255-269, 2002.
[40] R. Kerber, “Discretization of numeric attributes,” In: Proceedings of the Tenth National Conference on Artificial Intelligence (AAAI-92), pp.123–128, 1992.
[41] S. Kramer, “Structural regression trees,” In Proceedings of the Thirteenth National Conference on Artificial Intelligence, MIT Press, Cambridge, pp.812-819, 1996.
[42] N. Kushmerick, “Learning to remove Internet advertisements,” Third International Conference on Autonomous Agents, 1999.
[43] M. Last, “Online classification of nonstationary data streams,” Intelligent Data Analysis, vol.6, no.2, pp.129-147, 2002.
[44] M. Last, M. Friedman and A. Kandel, “Using Data Mining for Automated Software Testing,” International Journal of Software Engineering and Knowledge Engineering, vol.14, no.4, pp.369-393, 2004.
[45] X.B. Li, J. Sweigart, J. Teng, J. Donohue and L. Thombs, “A dynamic programming based pruning method for decision trees,” INFORMS Journal on Computing, vol.13, pp.332-344, 2001.
[46] W.Y. Loh and Y. S. Shih, “Split selection methods for classification trees,” Statistica Sinica, vol.7, pp.815-840, 1997.
[47] J. MacQueen, “Some methods for classification and analysis of multivariate observations,” In: Proceedings of the 5th Berkeley Symposium on Mathematical Statistics and Probability, vol.1, pp.281-297, 1967.
[48] M. Mehta, R. Agrawal and J. Rissanen, “SLIQ: a fast scalable classifier for data mining,” In: Proc of the Fifth International Conference on Extending Database Technology, pp.18–32, 1996.
[49] B. Michael and L. Gordon, “Mastering Data Mining: The Art & Science of Customer Relationship Management,” New York, Wiley, 2000.
[50] P.M. Murphy and D.W. Aha, “UCI repository of machine learning databases,” for information contact ml-repository@ics.uci.edu, 1994.
[51] S. Piramuthu, “Feature Selection for Financial Credit-Risk Evaluation Decisions,” INFORMS Journal on Computing, vol.11, pp.258-266, 1999.
[52] J.R. Quinlan, “Induction on decision trees,” Machine Learning, vol.1, pp.81–106, 1986.
[53] J.R. Quinlan, “C4.5: Programs for Machine Learning,” Morgan Kaufmann Series in Machine Learning, Kluwer Academic Publishers, 1993.
[54] J.R. Quinlan, “Improved Use of Continuous Attributes in C4.5,” Artificial Intelligence, vol.4, pp.77-90, 1996.
[55] J.C. Shafer, R. Agrawal and M. Mehta, “SPRINT: A scalable parallel classifier for data mining,” Proceedings of the 22nd International Conference on Very Large Databases, pp.544–555, 1996.
[56] M. Shaw, C. Subramaniam, G.W. Tan and M.E. Welge, “Knowledge management and data mining for marketing,” Decision Support Systems, vol.31, pp.127–137, 2001.
[57] C.C. Shen and Y.L. Chen, “A dynamic-programming algorithm for hierarchical discretization of continuous attributes,” European Journal of Operational Research, vol.184, no.2, pp.636-651, 2008.
[58] S.K. Shevade, S.S. Keerthi, C. Bhattacharyya and K.R.K. Murthy, “Improvements to the SMO Algorithm for SVM Regression,” IEEE Transactions on Neural Networks, vol.11, no.5, pp.1188-1193, 1999.
[59] B. Stenger, A. Thayananthan, P. Torr and R. Cipolla, “Estimating 3D hand pose using hierarchical multi-label classification,” Image and Vision Computing, vol.25, no.12, pp.1885–1894, 2007.
[60] K. Thearling, “Data mining and CRM: zeroing in on your best customers,” DM Direct (December), vol.20, 1999.
[61] M. Umano, H. Okamoto, I. Hatono, H. Tamura, F. Kawachi, S. Umedzu and J. Kinoshita, “Fuzzy decision trees by fuzzy ID3 algorithm and its application to diagnosis systems,” Proceedings of the Third IEEE International Conference on Fuzzy Systems, vol.3, pp.2113–2118, Orlando, FL, 1994.
[62] P. van der Putten, “Data mining in direct marketing databases,” in W. Baets (Ed.), Complexity and Management: A Collection of Essays, World Scientific Publishers, Singapore, 1999.
[63] T. Van de Merckt, “Decision trees in numerical attribute spaces,” in Proceedings of the 13th International Joint Conference on Artificial Intelligence, pp.1016-1021, 1993.
[64] H. Wang and C. Zaniolo, “CMP: A fast decision tree classifier using multivariate predictions,” Proceedings of the 16th International Conference on Data Engineering, pp.449–460, 2000.
[65] F. Wu, J. Zhang and V. Honavar, “Learning Classifiers Using Hierarchically Structured Class Taxonomies,” Proceedings of the Symposium on Abstraction, Reformulation, and Approximation, pp.313-320, 2005.
[66] Y. Yang and J.O. Pedersen, “A Comparative Study on Feature Selection in Text Categorization,” Proc. 14th Int'l Conf. Machine Learning, 1997.
[67] J. Yang and J. Stenzel, “Short-term load forecasting with increment regression tree,” Electric Power Systems Research, vol.76, no.9-10, pp.880-888, June 2006.
[68] O. Zamir and O. Etzioni, “Web Document Clustering: A Feasibility Demonstration,” Research and Development in Information Retrieval, pp.46-54, 1998.
Advisor  Yen-Liang Chen (陳彥良)    Approval Date  2009-11-03