Multi-valued and Multi-labeled Decision Tree Classifiers for Data Mining


Name

Chang-Ling Hsu (許昌齡)

Department

Department of Information Management

Thesis Title

Multi-valued and Multi-labeled Decision Tree Classifiers for Data Mining
(資料挖掘的多值及多標籤決策樹分類法)



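This electronic thesis is authorized for immediate open access.
Once open, the electronic full text is licensed to users only for personal, non-profit searching, reading, and printing for the purpose of academic research.
Please comply with the Copyright Act of the Republic of China; do not reproduce, distribute, adapt, repost, or broadcast the work without authorization.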

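Abstract (Chinese)

At present, decision tree classifiers require both attributes and class labels to be single-valued. Real-world data, however, can be multi-valued and multi-labeled. To classify such data, this research first designs a decision tree classifier named MMC (Multi-valued and Multi-labeled Classifier); it then redesigns the algorithm to develop a second classifier, MMDT (Multi-valued and Multi-labeled Decision Tree), which improves on the accuracy of MMC.
MMC and MMDT differ from traditional decision tree classifiers in several major functions, including growing the decision tree, selecting attributes, assigning labels to represent leaf nodes, and predicting new data. The development strategy of MMC is based mainly on measuring similarity among multiple labels, whereas that of MMDT is based mainly on both similarity measurement and scoring among multiple labels.
The experimental results show that MMC and MMDT not only mine rules from large multi-valued and multi-labeled data sets but also achieve convincing accuracy and rule goodness.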
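
To make the shape of such data concrete, here is a minimal Python sketch of multi-valued, multi-labeled records. The attribute names, values, and labels are invented for illustration and are not taken from the thesis. The `partition` helper is also hypothetical: it only shows how a record can fall under more than one attribute value, and it is not the branching rule used by MMC or MMDT.

```python
from collections import defaultdict

# Hypothetical records: each attribute maps to a set of values (multi-valued),
# and each record carries a set of class labels (multi-labeled).
records = [
    {"attrs": {"hobby": {"hiking", "reading"}, "income": {"high"}},
     "labels": {"tour_A", "tour_B"}},
    {"attrs": {"hobby": {"reading"}, "income": {"low"}},
     "labels": {"tour_B"}},
    {"attrs": {"hobby": {"hiking"}, "income": {"high"}},
     "labels": {"tour_A", "tour_C"}},
]

def partition(records, attribute):
    """Group record indices by each value of `attribute`.

    A record holding several values for the attribute appears in several
    groups, which a single-valued split cannot express.
    """
    groups = defaultdict(list)
    for index, record in enumerate(records):
        for value in record["attrs"][attribute]:
            groups[value].append(index)
    return dict(groups)

by_hobby = partition(records, "hobby")
```

In this toy data the first record is listed under both `hiking` and `reading`, while its label set still contains two labels that a leaf would somehow have to represent.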

Abstract (English)

Presently, decision tree classifiers require the attributes and class labels of a data set to be single-valued. However, there exist classification problems with multi-valued and multi-labeled data. To handle such data, this research first developed a decision tree classifier named MMC (Multi-valued and Multi-labeled Classifier). Then, by redesigning the algorithm, it developed another classifier named MMDT (Multi-valued and Multi-labeled Decision Tree) to improve on the accuracy of MMC.
MMC and MMDT differ from traditional decision tree classifiers in several major functions, including growing a decision tree, selecting attributes, assigning labels to represent a leaf, and making predictions for new data. The development strategy of MMC is based mainly on measuring similarity among multiple labels; the development strategy of MMDT is based mainly on both measuring similarity and scoring among multiple labels.
The experimental results show that MMC and MMDT can not only mine classification rules from a large multi-valued and multi-labeled data set, but also achieve convincing accuracy and goodness of rules.
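
The abstract does not state how similarity among multiple labels is computed. As a stand-in only, the sketch below uses the Jaccard coefficient between two label sets and averages it over all pairs of records at a node. Both `jaccard` and `mean_pairwise_similarity` are hypothetical helpers, and neither should be read as the measure that MMC or MMDT actually uses.

```python
from itertools import combinations

def jaccard(labels_a, labels_b):
    """Jaccard coefficient |A & B| / |A | B| of two label sets (assumed measure)."""
    union = labels_a | labels_b
    if not union:
        return 1.0  # two empty label sets are treated as identical
    return len(labels_a & labels_b) / len(union)

def mean_pairwise_similarity(label_sets):
    """Average Jaccard similarity over all pairs of label sets at a node."""
    pairs = list(combinations(label_sets, 2))
    if not pairs:
        return 1.0  # zero or one record: nothing to disagree with
    return sum(jaccard(a, b) for a, b in pairs) / len(pairs)

# Hypothetical label sets of three records that reached the same node.
node = [{"A", "B"}, {"A", "B"}, {"A"}]
score = mean_pairwise_similarity(node)
```

Under this assumed measure, a higher score means the records at a node carry more similar label sets, so an attribute-selection step could prefer splits whose child nodes score higher.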

Keywords (Chinese)

★ data mining
★ multi-valued attribute
★ multi-label
★ classification
★ decision tree

Keywords (English)

★ multiple labels
★ classification
★ decision tree
★ data mining
★ multi-valued attribute

Table of Contents

Chinese Abstract
English Abstract
Contents
List of Figures
List of Tables
1. Introduction
1.1 Background and Motivation
1.2 Statements of Problem
1.3 Purposes of the Study
2. Literature Review
2.1 Clarification for the Confusion among the Multi-labeled Data, Two-classed Data and Multi-classed Data
2.2 Difficulties in Handling the Multi-valued and Multi-labeled Data by Traditional Classifiers
3. The Algorithms
3.1 MMC and MMDT Related Affairs and Symbols
3.1.1 MMC and MMDT Related Affairs
3.1.2 Symbols of MMC and MMDT
3.2 The Algorithms of MMC and MMDT
3.2.1 Measuring the Label Similarity
3.2.2 The MMC Algorithm
3.2.2.1 To determine the internal node and its branches
3.2.2.2 To determine the leaf node
3.2.3 Label Ratio and the MMDT Algorithm
3.2.3.1 Label Ratio
3.2.3.1.1 Label Similarity
3.2.3.2 The MMDT Algorithm
3.2.3.2.1 Function next_attribute of MMDT
3.2.3.2.1.1 Function weighted-labelRatio
3.2.4 Label Prediction for New Data and Evaluation on the Label Prediction
4. Experiments
4.1 Experimental Design
4.2 Experimental Results
4.2.1 Comparisons between MMC and MMDT
4.2.2 Examination on the Behavior of MMC and MMDT
5. Summary and Conclusion
References


Advisor

Shih-Chieh Chou (周世傑)

Approval Date

2004-7-1
