Thesis Record 944403006: Detailed Information




Author: Chia-Chi Wu (吳家齊)   Department: Information Management
Title: Resource-Constrained Decision Tree Induction (資源有限下的決策樹建構)
Related Theses
★ A Study of Business Intelligence in the Retail Industry
★ Building an Anomaly Detection System for Landline Telephone Calls
★ Applying Data Mining Techniques to the Analysis of School Grades and College Entrance Exam Results: A Case Study of a Vocational High School Food and Beverage Management Program
★ Using Data Mining Techniques to Improve Wealth Management Performance: A Case Study of a Bank
★ Evaluation and Analysis of Wafer Fabrication Yield Models: A Case Study of a Domestic DRAM Fab
★ A Study of Business Intelligence Analysis Applied to Student Grades
★ Building a Predictive Model of Academic Achievement for Upper-Grade Elementary School Students Using Data Mining Techniques
★ Applying Data Mining Techniques to Build a Motorcycle Loan Risk Assessment Model: A Case Study of Company A
★ Applying Performance Indicator Evaluation to Improve R&D Design Quality Assurance
★ Improving Hiring Quality with Machine Learning Based on Textual Résumés and Personality Traits
★ A General Framework Based on Relational Genetic Algorithms for Solving Set Partitioning Problems with Constraint Handling
★ Generalized Knowledge Mining over Relational Databases
★ Decision Tree Induction Considering Attribute-Value Acquisition Delay
★ A Method for Finding Preference Graphs from Sequence Data, Applied to Group Ranking Problems
★ Using Partitional Clustering Algorithms to Find Consensus Groups for Group Decision Problems
★ A Novel Ordered Consensus Group Approach to Group Decision Problems
  1. The electronic full text of this thesis is authorized for immediate open access.
  2. The open-access electronic full text is licensed to users solely for personal, non-profit retrieval, reading, and printing for the purpose of academic research.
  3. Please comply with the relevant provisions of the Copyright Act of the Republic of China; do not reproduce, distribute, adapt, repost, or broadcast this work without authorization.

Abstract: Classification is one of the most important research domains in data mining. Among existing classifiers, decision trees are probably the most popular and commonly used classification models. Most decision tree algorithms aim to maximize classification accuracy and minimize classification error. In many real-world applications, however, various types of cost or resource consumption are involved both in inducing a decision tree and in classifying future instances, and the problem at hand may require us to complete the classification task with limited resources. How to build the most suitable decision tree under resource constraints is therefore an important issue. In this study, we first propose two algorithms that improve on traditional TDIDT (Top-Down Induction of Decision Trees) algorithms. We then adopt a new approach to deal with multiple resource constraints: it first extracts association classification rules from the training dataset, and then builds a decision tree from the extracted rules. Empirical evaluations were carried out on real datasets, and the results indicate that the proposed methods achieve satisfactory performance under different resource constraints.
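To make the general approach concrete, below is a minimal Python sketch of budget-aware, cost-sensitive attribute selection for a TDIDT-style tree builder. This is an illustration written for this record, not the dissertation's algorithm: the function names, the toy data, and the simple gain-per-cost heuristic (a crude simplification of criteria such as Núñez's ICF [20] or the minimal-cost splits of Ling et al. [12]) are all assumptions.

# A minimal sketch (not the dissertation's algorithm) of budget-aware,
# cost-sensitive attribute selection for a TDIDT-style decision tree.
from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy of a sequence of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def info_gain(rows, attr, target):
    """Information gain of splitting rows (a list of dicts) on attr."""
    base = entropy([r[target] for r in rows])
    remainder = 0.0
    for value in {r[attr] for r in rows}:
        subset = [r[target] for r in rows if r[attr] == value]
        remainder += len(subset) / len(rows) * entropy(subset)
    return base - remainder

def pick_split(rows, attrs, cost, budget, target="class"):
    """Return the affordable attribute with the best gain-to-cost ratio.

    cost[a] is the resource consumed by testing attribute a (e.g. money
    or time). Attributes whose cost exceeds the remaining budget are
    skipped; None means no test fits and the node must become a leaf.
    """
    affordable = [a for a in attrs if cost[a] <= budget]
    if not affordable:
        return None
    return max(affordable, key=lambda a: info_gain(rows, a, target) / cost[a])

# Hypothetical usage: a cheap test and an expensive test under a tight budget.
rows = [
    {"fever": "high", "x_ray": "clear", "class": "sick"},
    {"fever": "none", "x_ray": "clear", "class": "healthy"},
    {"fever": "high", "x_ray": "shadow", "class": "sick"},
]
print(pick_split(rows, ["fever", "x_ray"], {"fever": 1.0, "x_ray": 50.0}, budget=10.0))
# Output: fever (the x_ray test exceeds the budget and is skipped)

In a complete recursive builder, each child call would receive budget - cost[chosen], so the cumulative test cost along any root-to-leaf path never exceeds the original resource limit; by contrast, the rule-based approach described in the abstract enforces the constraints on whole rules before assembling the tree.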
Keywords: ★ data mining
★ cost-sensitive learning
★ decision tree
★ classification
Table of Contents
CHINESE ABSTRACT I
ABSTRACT II
ACKNOWLEDGEMENTS III
1. INTRODUCTION 1
1.1. MOTIVATIONS AND RESEARCH OBJECTIVES 2
1.2. ORGANIZATION OF THE DISSERTATION 4
2. LITERATURE REVIEW 5
2.1. MISCLASSIFICATION-COST SENSITIVE DECISION TREES 5
2.1.1. Instance Weighting 6
2.1.2. Threshold Adjusting 7
2.2. TEST-COST SENSITIVE DECISION TREES 7
2.3. MISCLASSIFICATION-AND-TEST COST SENSITIVE DECISION TREES 8
2.4. COST-CONSTRAINED DECISION TREES 9
3. COST-EFFECTIVE CLASSIFICATION TREES UNDER A TIME CONSTRAINT 11
3.1. RESEARCH PROBLEM 11
3.2. PROBLEM DEFINITION 12
3.2.1. Training Data and Decision Tree 12
3.2.2. Attribute Cost, Misclassification Cost, and Attribute Time 13
3.3. ALGORITHM 14
3.4. EXAMPLE 17
3.5. PERFORMANCE EVALUATION 22
3.6. SUMMARY 27
4. BUILDING A COST-CONSTRAINED DECISION TREE WITH MULTIPLE CONDITION ATTRIBUTES 29
4.1. RESEARCH PROBLEM 29
4.2. PROBLEM DEFINITION 31
4.3. THE ALGORITHM 34
4.3.1. Multi-Dimensional Attribute (MDA) 35
4.3.2. Information Need and Distance 37
4.3.3. Multi-Dimensional Information Gain 39
4.3.4. Splitting Criterion 40
4.3.5. Label Assignment Test 42
4.3.6. Termination Condition 42
4.3.7. Updating the Bottom Nodes 43
4.4. PERFORMANCE EVALUATION 44
4.4.1. Performance Evaluation under Different Cost Constraints 46
4.4.2. Performance Evaluation with Different Numbers of Condition Attributes 51
4.5. SUMMARY 54
5. COST-SENSITIVE DECISION TREE WITH MULTIPLE RESOURCE CONSTRAINTS 55
5.1. RESEARCH PROBLEM 55
5.2. PROBLEM DEFINITION 56
5.3. ALGORITHM 60
5.3.1. Phase 1: Extract All Classification Rules 62
5.3.2. Phase 2: Build a Decision Tree from the Extracted Rules 64
5.3.3. Phase 3: Adjust to Produce the Final Decision Tree 70
5.3.4. Time Complexity Analysis 73
5.4. PERFORMANCE EVALUATION 74
5.4.1. Performance Evaluation with Single Constrained Resource 77
5.4.2. Performance Evaluation with Two Constrained Resources 79
5.4.3. Performance Evaluation with Multiple Constrained Resources 80
5.5. SUMMARY 82
6. CONCLUSION 83
REFERENCES 85
References
[1] A. Arnt and S. Zilberstein, “Learning Policies for Sequential Time and Cost Sensitive Classification”, Proceedings of the 1st International Workshop on Utility-Based Data Mining, pp. 39-45, 2005.
[2] L. Breiman, J. H. Friedman, R. A. Olshen, and C. J. Stone, Classification and Regression Trees, Wadsworth, Belmont, CA, 1984.
[3] X. Chai, L. Deng, Q. Yang, and C. X. Ling, “Test-Cost Sensitive Naïve Bayesian Classification”, Proceedings of the 2004 IEEE International Conference on Data Mining, 2004.
[4] P. Chan and S. Stolfo, “Toward Scalable Learning with Non-Uniform Class and Cost Distributions”, Proceedings of the 4th International Conference on Knowledge Discovery and Data Mining, pp. 164-168, New York, 1998.
[5] P. Domingos, “MetaCost: A General Method for Making Classifiers Cost-Sensitive”, Proceedings of the Fifth International Conference on Knowledge Discovery and Data Mining, pp. 155-164, 1999.
[6] C. Elkan, “The Foundations of Cost-Sensitive Learning”, Proceedings of the Seventeenth International Joint Conference on Artificial Intelligence, pp. 973-978, Seattle, 2001.
[7] J. Gehrke, R. Ramakrishnan, and V. Ganti, “RainForest: A Framework for Fast Decision Tree Construction of Large Datasets”, Proceedings of the 1998 International Conference on Very Large Data Bases, pp. 416-427, 1998.
[8] J. Gehrke, V. Ganti, R. Ramakrishnan, and W. Y. Loh, “BOAT: Optimistic Decision Tree Construction”, Proceedings of the 1999 ACM SIGMOD International Conference on Management of Data, pp. 311-323, 1999.
[9] J. Han and M. Kamber, Data Mining: Concepts and Techniques, Morgan Kaufmann, 2006.
[10] K. M. Ting, “Inducing Cost-Sensitive Trees via Instance Weighting”, Principles of Data Mining and Knowledge Discovery, Second European Symposium, pp. 23-26, Springer-Verlag, 1998.
[11] X. B. Li, “A scalable decision tree system and its application in pattern recognition and intrusion detection”, Decision Support Systems, 41, pp. 112-130, 2005.
[12] C. X. Ling, Q. Yang, J. Wang, and S. Zhang, “Decision Trees with Minimal Costs”, Proceedings of the 2004 International Conference on Machine Learning, p. 69, 2004.
[13] C. X. Ling, V. S. Sheng, and Q. Yang, “Test Strategies for Cost-Sensitive Decision Tree”, IEEE Transactions on Knowledge and Data Engineering, 18(8), pp. 1055-1067, 2006.
[14] C. X. Ling and V. S. Sheng, “Cost-Sensitive Learning and the Class Imbalance Problem”, Encyclopedia of Machine Learning, Springer, 2008.
[15] W. Y. Loh and Y. S. Shih, “Split selection methods for classification trees”, Statistica Sinica, 7(4), pp. 815-840, 1997.
[16] M. Mehta, J. Rissanen, and R. Agrawal, “MDL-Based Decision Tree Pruning”, Proceedings of the 1995 International Conference on Knowledge Discovery and Data Mining (KDD’95), pp. 216-221, 1995.
[17] M. Mehta, R. Agrawal, and J. Rissanen, “SLIQ: A Fast Scalable Classifier for Data Mining”, Proceedings of the 5th International Conference on Extending Database Technology, 1996.
[18] A. Ni, X. Zhu, and C. Zhang, “Any-Cost Discovery: Learning Optimal Classification Rules”, Australian Joint Conference on Artificial Intelligence, 3809, pp. 123-132, Sydney, Australia, 2005.
[19] S. W. Norton, “Generating Better Decision Trees”, Proceedings of the Eleventh International Joint Conference on Artificial Intelligence, pp. 800-805, Detroit, Michigan, 1989.
[20] M. Núñez, “The Use of Background Knowledge in Decision Tree Induction”, Machine Learning, 6, pp. 231-250, 1991.
[21] A. Papagelis and D. Kalles, “GATree: Genetically Evolved Decision Trees”, 12th IEEE International Conference on Tools with Artificial Intelligence (ICTAI’00), pp. 203-206, 2000.
[22] F. Provost, T. Fawcett, and R. Kohavi, “The Case Against Accuracy Estimation for Comparing Induction Algorithms”, Proceedings of the 15th International Conference on Machine Learning, pp. 445-453, Madison, WI, 1998.
[23] Z. Qin, S. Zhang, and C. Zhang, “Cost-Sensitive Decision Trees with Multiple Cost Scales”, Australian Joint Conference on Artificial Intelligence, 3339, pp. 380-390, Cairns, Australia, 2004.
[24] J. R. Quinlan, “Induction of decision trees”, Machine Learning, 1, pp. 81-106, 1986.
[25] J. R. Quinlan, “Simplifying decision trees”, International Journal of Man-Machine Studies, 27, pp. 221-234, 1987.
[26] J. R. Quinlan, C4.5: Programs for Machine Learning, San Mateo, Morgan Kaufmann, 1993.
[27] R. Rastogi and K. Shim, “Public: A decision tree classifier that integrates building and pruning”, Proceedings of the 1998 International Conference on Very Large Data Bases, pp. 404-415, 1998.
[28] J. Shafer, R. Agrawal, and M. Mehta. “SPRINT: A scalable parallel classifier for data mining”, Proceedings of 1996 International Conference on Very Large Data Bases, pp. 544-555, 1996.
[29] V. S. Sheng and C. X. Ling, “Feature Value Acquisition in Testing: A Sequential Batch Test Algorithm”, Proceedings of the 23rd International Conference on Machine Learning, pp. 809-816, 2006.
[30] M. Tan and J. Schlimmer, “Cost-Sensitive Concept Learning of Sensor Use in Approach and Recognition”, Proceedings of the Sixth International Workshop on Machine Learning, pp. 392-395, Ithaca, New York, 1989.
[31] M. Tan, “Cost-Sensitive Learning of Classification Knowledge and Its Applications in Robotics”, Machine Learning, 13, pp. 7-33, 1993.
[32] K. M. Ting, “An Instance-Weighting Method to Induce Cost-Sensitive Trees”, IEEE Transactions on Knowledge and Data Engineering, 14(3), pp. 659-665, 2002.
[33] P. D. Turney, “Cost-Sensitive Classification: Empirical Evaluation of A Hybrid Genetic Decision Tree Induction Algorithm”, Journal of Artificial Intelligence Research, 2, pp. 369-409, 1995.
[34] P. D. Turney, “Types of Cost in Inductive Concept Learning”, Workshop on Cost-Sensitive Learning at the Seventeenth International Conference on Machine Learning, pp. 15-21, 2000.
[35] Q. Yang, C. Ling, X. Chai, and R. Pan, “Test-Cost Sensitive Classification on Data with Missing Values”, IEEE Transactions on Knowledge and Data Engineering, 18(5), pp. 626-638, 2006.
[36] S. Zhang, Z. Qin, C. X. Ling, and S. Sheng, “‘Missing is Useful’: Missing Values in Cost-Sensitive Decision Trees”, IEEE Transactions on Knowledge and Data Engineering, 17(12), pp. 1689-1693, 2005.
[37] S. Zhang, X. Zhu, J. Zhang, and C. Zhang, “Cost-Time Sensitive Decision Tree with Missing Values”, KSEM 2007, pp. 447-459, 2007.
[38] H. Zhao, “A Multi-Objective Genetic Programming Approach to Developing Pareto Optimal Decision Trees”, Decision Support Systems, 43(3), pp. 809-826, 2007.
[39] H. Zhao, “Instance Weighting Versus Threshold Adjusting for Cost-Sensitive Classification”, Knowledge and Information Systems, 15(3), pp. 321-334, 2008.
Advisor: Yen-Liang Chen (陳彥良)   Approval Date: 2010-7-21
