Summarizing Data Based on Attribute-Oriented Induction under Cost Constraints


Name

余美儒 (YU, MEI-RU)

Department

Department of Information Management

Thesis Title

Summarizing Data Based on Attribute-Oriented Induction under Cost Constraints

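Related Theses

★ A Study of Business Intelligence in the Retail Industry
★ Building an Anomaly Detection System for Landline Telephone Calls
★ Applying Data Mining Techniques to the Analysis of School Grades and General Scholastic Ability Test Results: The Case of a Vocational High School Food and Beverage Management Program
★ Using Data Mining Techniques to Improve Wealth Management Effectiveness: A Case Study of a Bank
★ Evaluation and Analysis of Wafer Fabrication Yield Models: The Case of a Domestic DRAM Manufacturer
★ A Study of Business Intelligence Analysis Applied to Student Grades
★ Using Data Mining Techniques to Build a Prediction Model of Academic Achievement for Upper-Grade Elementary School Students
★ Applying Data Mining Techniques to Build a Risk Assessment Model for Motorcycle Loans: The Case of Company A
★ A Study of Performance Indicator Evaluation Applied to Improving R&D Design Quality Assurance
★ Applying Machine Learning to Text Résumés and Personality Traits to Improve Hiring Quality
★ A General Framework Based on a Relation-Based Genetic Algorithm for Solving Set Partitioning Problems with Constraint Handling
★ Mining Generalized Knowledge from Relational Databases
★ Decision Tree Construction Considering Delays in Acquiring Attribute Values
★ A Method for Finding Preference Graphs from Sequence Data: An Application to the Group Ranking Problem
★ Using a Partitional Clustering Algorithm to Find Consensus Groups for Group Decision-Making Problems
★ A Novel Approach to Ordered Consensus Groups Applied to Group Decision-Making Problems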


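This electronic thesis is authorized for immediate open access.
The open-access full text is licensed to users solely for personal, non-profit searching, reading, and printing for academic research purposes.
Please comply with the Copyright Act of the Republic of China. Do not reproduce, distribute, adapt, repost, or broadcast this work without authorization, to avoid violating the law.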

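Abstract (Chinese)

As technology advances, data is increasingly valued, and many researchers have entered the field of data mining in the hope of uncovering the value hidden in large volumes of data. Attribute-Oriented Induction (AOI) is an induction-based data analysis technique and one of the important data mining methods; it was first proposed in the 1990s. It mines knowledge by generalizing each attribute in a relational database, summarizing the data according to concept trees built from the user's background knowledge.
Attribute-oriented induction has three main problems. First, when summarizing large amounts of data, it is driven chiefly by a limit on the final number of generalized tuples, so it cannot guarantee that every generalized result has a certain degree of clarity. Second, in traditional AOI the user sets the number of generalized tuples, but because this setting affects the clarity of the result, an appropriate final threshold is hard to choose. Third, large data sets hide a great deal of noise, yet AOI has no noise-filtering capability; during induction it mixes the information and the noise in the data, which makes the generalized data imprecise and vague.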
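The traditional procedure described above can be illustrated with a minimal Python sketch. It is not the thesis's implementation: the concept trees, the sample rows, the function names (`generalize`, `aoi`), and the rule of always generalizing the attribute with the most distinct values are all assumptions made for illustration.

```python
from collections import Counter

# Hypothetical concept trees, stored as child -> parent.
# The values are illustrative only and do not come from the thesis.
CITY_TREE = {
    "Taipei": "North", "Taoyuan": "North",
    "Tainan": "South", "Kaohsiung": "South",
    "North": "ANY", "South": "ANY",
}
JOB_TREE = {"Student": "ANY", "Engineer": "ANY"}
TREES = [CITY_TREE, JOB_TREE]

ROWS = [
    ("Taipei", "Student"), ("Taoyuan", "Student"),
    ("Tainan", "Engineer"), ("Kaohsiung", "Engineer"),
]

def generalize(value, tree):
    """Climb one level in the concept tree; the root maps to itself."""
    return tree.get(value, value)

def aoi(rows, trees, max_tuples):
    """Generalize attributes until at most max_tuples distinct tuples remain."""
    table = Counter(rows)  # distinct tuple -> number of original rows covered
    while len(table) > max_tuples:
        # Assumed heuristic: generalize the attribute with the most distinct values.
        attr = max(range(len(trees)), key=lambda i: len({t[i] for t in table}))
        merged = Counter()
        for tup, count in table.items():
            new = tup[:attr] + (generalize(tup[attr], trees[attr]),) + tup[attr + 1:]
            merged[new] += count  # identical generalized tuples are merged
        if merged == table:  # nothing changed, so no further generalization is possible
            break
        table = merged
    return table

summary = aoi(ROWS, TREES, max_tuples=2)
for tup, count in summary.items():
    print(tup, count)
```

Because the only stopping rule in this sketch is the tuple count, a small `max_tuples` forces every tuple toward the root concept regardless of how much detail is lost, which is the first problem listed above.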
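This study therefore introduces the concept of cost: the detail lost when an attribute is generalized is quantified as a cost, and a cost constraint ensures that every generalized result retains a certain degree of clarity. Building on agglomerative hierarchical clustering, it proposes two algorithms that differ in how they select the data to be generalized. The two methods are evaluated under different cost constraints, and without any limit on the final number of generalized tuples, in terms of the clarity of the generalized results and how well they represent the original data. The study addresses the coarse results and lack of noise filtering of traditional AOI, producing tables that are clearer and that distinguish information from noise in the original data.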
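A cost-constrained agglomerative merge of this kind might look like the following Python sketch. The abstract does not give the cost formula, so the one used here (fraction of the concept-tree height climbed, averaged over attributes) is an assumption, as are the trees, the sample rows, and every function name. Pairs are merged in minimum-cost order; the thesis's actual algorithms may differ.

```python
from itertools import combinations

# Hypothetical concept trees (child -> parent) and their heights.
CITY_TREE = {
    "Taipei": "North", "Taoyuan": "North",
    "Tainan": "South", "Kaohsiung": "South",
    "North": "ANY", "South": "ANY",
}
JOB_TREE = {"Student": "ANY", "Engineer": "ANY"}
TREES = [CITY_TREE, JOB_TREE]
HEIGHTS = [2, 1]  # edges from leaf level to root, per tree

ROWS = [
    ("Taipei", "Student"), ("Taoyuan", "Student"),
    ("Tainan", "Engineer"), ("Kaohsiung", "Student"),
]

def path_to_root(value, tree):
    """Return [value, parent, ..., root]."""
    path = [value]
    while path[-1] in tree:
        path.append(tree[path[-1]])
    return path

def common_ancestor(a, b, tree):
    """Lowest concept that generalizes both a and b (assumes a single root)."""
    seen = set(path_to_root(a, tree))
    return next(node for node in path_to_root(b, tree) if node in seen)

def tuple_cost(tup, trees, heights):
    """Assumed cost: mean fraction of tree height climbed (leaf = 0, root = 1)."""
    total = 0.0
    for value, tree, height in zip(tup, trees, heights):
        depth = len(path_to_root(value, tree)) - 1  # edges between value and root
        total += 1 - depth / height
    return total / len(tup)

def cost_limited_merge(rows, trees, heights, cost_limit):
    """Merge the cheapest pair repeatedly while the merged tuple stays within cost_limit."""
    clusters = [(row, 1) for row in rows]  # (generalized tuple, rows covered)
    while len(clusters) > 1:
        best = None
        for i, j in combinations(range(len(clusters)), 2):
            merged = tuple(common_ancestor(a, b, t)
                           for a, b, t in zip(clusters[i][0], clusters[j][0], trees))
            cost = tuple_cost(merged, trees, heights)
            if best is None or cost < best[0]:
                best = (cost, i, j, merged)
        cost, i, j, merged = best
        if cost > cost_limit:  # every remaining merge would lose too much detail
            break
        count = clusters[i][1] + clusters[j][1]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append((merged, count))
    return clusters

for tup, count in cost_limited_merge(ROWS, TREES, HEIGHTS, cost_limit=0.3):
    print(tup, count)
```

In this sketch the stopping rule is the cost limit rather than a target tuple count, so rows that cannot be merged cheaply stay separate instead of being forced into a vague generalization. Choosing the pair at random instead of by minimum cost would give a second variant of the same loop.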

Abstract (English)

With the advance of technology, data has become increasingly valued. Researchers in many fields have therefore moved into data mining and developed many solutions, hoping to mine the information hidden in large databases. One important method, Attribute-Oriented Induction (AOI), was proposed in 1990.
AOI mines knowledge by generalizing each attribute in a relational database through concept-tree ascension, summarizing the data according to concept trees built from the user's background knowledge. However, AOI has three main problems. First, when summarizing a large amount of data, it relies on a threshold on the number of summarized tuples, so there is no guarantee that each summarized result has a certain degree of clarity. Second, the setting of the traditional AOI threshold affects the clarity of the induction result, which makes an appropriate threshold hard to choose. Third, AOI cannot filter noise: it mixes information and noise during summarization, making the summarized data unclear and blurred.
This study introduces cost to quantify the detail lost when attributes are generalized, and uses a cost constraint so that each generalized tuple retains a certain degree of clarity. Based on two data selection methods (minimum cost and random), we propose two algorithms built on agglomerative hierarchical clustering. Finally, we find that one of our methods outperforms traditional AOI and provides more useful information.

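Keywords (Chinese)

★ Attribute-oriented induction
★ Agglomerative hierarchical clustering
★ Cost constraint
★ Noise filtering
★ Information mining

Keywords (English)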

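Table of Contents

Abstract (Chinese)
Abstract
Acknowledgments
Contents
List of Figures
List of Tables
Chapter 1. Introduction
1-1 Research Background
1-2 Research Motivation
1-3 Research Objectives
Chapter 2. Literature Review
2-1 Attribute-Oriented Induction
2-1-1 Research on Improving the Efficiency of Attribute-Oriented Induction
2-1-2 Methods for Addressing the Problems of Traditional Attribute-Oriented Induction
2-1-3 Applications Based on Traditional Attribute-Oriented Induction
2-2 Clustering Methods
2-2-1 Hierarchical Clustering
2-2-2 Non-Hierarchical Clustering
Chapter 3. Research Method
3-1 Research Overview
3-2 Problem Definition
3-2-1 Definition of Concept Trees
3-2-2 Definition of the Input Table
3-2-3 Definition of the Output Table
3-2-4 Definition of the Cost Calculation
3-2-5 Definition of Common Ancestors
3-2-6 Data Merging
3-2-7 Definition of Information and Noise in AOI
3-3 Algorithm Flow
3-3-1 Algorithm 1
3-3-2 Algorithm 2
Chapter 4. Experiments
4-1 Data Set
4-2 Experimental Design
4-3 Evaluation Metrics
4-4 Experimental Results
4-4-1 Effects of the Number of Generalized Tuples, Table Cost, and Original Data Volume
4-4-2 Effect of the Cost Constraint on the Proportion of Information in the Generalized Results
4-4-3 Comparison of This Study with Traditional AOI Results
Chapter 5. Conclusions and Suggestions
5-1 Research Findings
5-2 Research Contributions
5-3 Research Limitations and Future Work
References
Appendix 1: Attribute Concept Trees
Appendix 2: Proportion of Information in Ten Runs of Both Algorithms with a Cost Constraint of 0.7
Appendix 3: Proportion of Information in Ten Runs of Both Algorithms with a Cost Constraint of 0.8
Appendix 4: Information in the Generalized Results of Algorithm 1
Appendix 5: Information in the Generalized Results of Algorithm 2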


Advisor

陳彥良 (Yen-Liang Chen)

Approval Date

2018-7-2
