Thesis/Dissertation 89443007: Detailed Record


Author  Ching-Cheng Shen (沈清正)   Department  Department of Information Management
Title  Mining generalized knowledge from ordered data through attribute-oriented induction techniques
(運用屬性導向歸納法的技術挖掘序列資料的廣義知識)
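  1. This electronic thesis is authorized for immediate open access.
  2. Electronic full texts that have reached their open-access date are licensed to users only for personal, non-profit searching, reading, and printing for the purpose of academic research.
  3. Please comply with the Copyright Act of the Republic of China. Do not reproduce, distribute, adapt, repost, or broadcast the work without authorization, so as to avoid breaking the law.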

Abstract  Attribute-oriented induction (AOI) is one of the most important data mining methods. Its input is a relational table together with a concept tree (concept hierarchy) for each attribute, and its output is a small relation that summarizes the general characteristics of the task-relevant data. Although AOI is very useful for inducing general characteristics, it applies only to relational data, in which the data items have no order. When the data are ordered, existing AOI methods cannot find the generalized knowledge. To address this weakness, this thesis proposes a dynamic programming algorithm, based on AOI techniques, that finds generalized knowledge in an ordered list of data. The algorithm discovers a sequence of K generalized tuples that describe the general characteristics of K consecutive segments of data along the list, where K is a parameter specified by the user.
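The segmentation idea described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the thesis's OAOI algorithm: it handles a single attribute, the concept hierarchy `PARENT` and its place names are hypothetical, and the cost function (total number of hierarchy levels climbed when a segment is generalized to its lowest common concept) is an assumed stand-in for whatever measure the thesis actually uses.

```python
from functools import lru_cache

# Hypothetical concept hierarchy for one attribute: child -> parent.
# "ANY" is the root; none of these names come from the thesis.
PARENT = {
    "Taipei": "North", "Taoyuan": "North",
    "Tainan": "South", "Kaohsiung": "South",
    "North": "Taiwan", "South": "Taiwan",
    "Taiwan": "ANY",
}


def ancestors(value):
    """Return [value, parent, ..., root] along the hierarchy."""
    path = [value]
    while path[-1] in PARENT:
        path.append(PARENT[path[-1]])
    return path


def generalize(values):
    """Return the lowest concept covering all values and an assumed cost:
    the total number of levels the values must climb to reach it."""
    common = ancestors(values[0])
    for v in values[1:]:
        anc = set(ancestors(v))
        common = [c for c in common if c in anc]
    concept = common[0]
    cost = sum(ancestors(v).index(concept) for v in values)
    return concept, cost


def segment(data, k):
    """Split the ordered list `data` into k contiguous segments that
    minimize the total generalization cost, via dynamic programming."""
    n = len(data)
    if not 1 <= k <= n:
        raise ValueError("k must be between 1 and len(data)")

    @lru_cache(maxsize=None)
    def best(j, s):
        """Minimum cost of covering data[:j] with s segments, together
        with the start index of each segment."""
        if s == 1:
            return generalize(data[:j])[1], (0,)
        candidates = []
        for i in range(s - 1, j):  # the last segment is data[i:j]
            prev_cost, cuts = best(i, s - 1)
            candidates.append((prev_cost + generalize(data[i:j])[1], cuts + (i,)))
        return min(candidates)

    cost, cuts = best(n, k)
    bounds = list(cuts) + [n]
    # One generalized value per segment, with its [start, end) range.
    return cost, [(generalize(data[a:b])[0], a, b) for a, b in zip(bounds, bounds[1:])]
```

For example, `segment(["Taipei", "Taoyuan", "Taipei", "Tainan", "Kaohsiung", "Tainan"], 2)` cuts the list after the third item and describes the two segments as `"North"` and `"South"`. Plain AOI, which ignores order, would have to summarize all six rows with `"Taiwan"`.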
Keywords ★ Attribute-Oriented Induction
★ Concept Hierarchy
★ Data Mining
★ Relational Data
★ Ordered Data
★ Dynamic Programming
Thesis contents  Table of Contents
Abstract (Chinese)
Abstract (English)
Table of Contents
List of Figures
List of Tables
Chapter 1 Introduction
Chapter 2 The AOI Algorithm and the Definition of the Ordered Induction Problem
2.1. Introduction to the AOI Algorithm
2.2. Definition of the Ordered Induction Problem
Chapter 3 The Ordered AOI Algorithm
3.1. Overview of the Ordered AOI Algorithm
3.2. Phase 1 of the OAOI Algorithm
3.3. Phase 2 of the OAOI Algorithm
3.4. Phase 3 of the OAOI Algorithm
3.5. Phase 4 of the OAOI Algorithm
3.6. Phase 5 of the OAOI Algorithm
3.7. Time Complexity of the OAOI Algorithm
3.8. Space Complexity of the OAOI Algorithm
Chapter 4 The Extended Ordered AOI Algorithm
4.1. Constraints on the Amount of Data
4.2. Common-Child Tree
4.3. Data Preprocessing
4.4. Performance Evaluation of the EOAOI Algorithm
4.4.1. Data Generation
4.4.2. Execution Performance: Running Time
4.4.3. Output Quality: The Smoothing Degree of the Window
Chapter 5 An Optimal Algorithm for Building Concept Hierarchies from Numerical Data
5.1. Problem Definition for Numerical Concept Hierarchies
5.2. Description of the ONCH Algorithm
5.3. Performance Evaluation of the ONCH Algorithm
5.4. Quality Evaluation of EOAOI Using Optimal Numerical Concept Hierarchies
Chapter 6 Conclusion
References
List of Figures
Figure 1. Running time versus the amount of original ordered data
Figure 2. Running time versus the number of generalized tuples
Figure 3. Running time versus the ub/lb ratio
Figure 4. An example of a concept hierarchy
Figure 5. The difference between QD and QDM
Figure 6. Data distribution of the attribute "credit card spending amount"
Figure 7. Running times of the different algorithms
Figure 8. Tree-building distance of the different algorithms for the attribute "credit card spending amount"
Figure 9. Data distribution of the attribute "customer's monthly spending amount"
Figure 10. Tree-building distance of the different algorithms for the attribute "customer's monthly spending amount"
Figure 11. Data distribution of attribute s3
Figure A.1. Concept tree of the attribute "Location of Manufacturer"
Figure A.2. Concept tree of the attribute "Light Vehicle Model"
Figure A.3. Concept tree of the attribute "Engine Displacement"
Figure A.4. Concept tree of the attribute "Price"
Figure B. A common-child tree over the range 0 to 100
Figure C. Concept hierarchy of attribute s3
Figure E.1. Concept hierarchy of the attribute "frequency"
Figure E.2. Concept hierarchy of the attribute "age"
Figure E.4. Concept hierarchy of the attribute "personal monthly income"
Figure E.5. Concept hierarchy of the attribute "average household monthly income"
Figure E.6. Concept hierarchy of the attribute "number of persons"
Figure E.7. Concept hierarchy of the attribute "household finances"
Figure E.3. Concept hierarchy of the attribute "occupation"
List of Tables
Table 1. A sample data table with 10 tuples and 4 attributes
Table 2. Result of generalizing the data in Table 1 with the AOI method
Table 3. Result of applying our algorithm to the data in Table 1
Table 4. F(i, r) values computed from the data in Table 1
Table 5. E(i, j, r) values computed from the data in Table 4
Table 6. DI(i, j) values computed from the data in Table 5
Table 7. The D(i, j, s) matrix computed from the data in Table 6
Table 8. The B(i, j, s) matrix computed from the data in Table 6
Table 9. Means and standard deviations of the scores in 6 subjects
Table 10(a). Output with window smoothing degree R = 1
Table 10(b). Output with window smoothing degree R = 2
Table 10(c). Output with window smoothing degree R = 10
Table 10(d). Output with window smoothing degree R = 50
Table 11(a). Ordered characteristics obtained by the EOAOI2 algorithm using the optimal concept hierarchy
Table 11(b). Ordered characteristics obtained by the EOAOI2 algorithm using the equal-width partitioning concept hierarchy
Table D.1. Ordered induction results for the original data
Table D.2. Average percentage of segment data accounted for by the most significant leaf-node value
Table D.3. Ordered induction results for male cardholders
Table D.4. Ordered induction results for female cardholders
Table D.5. Ordered induction results for unmarried cardholders
Table D.6. Ordered induction results for married cardholders
Table D.7. Ordered induction results for normal cardholders
Table D.8. Ordered induction results for abnormal cardholders
Advisor  Yen-Liang Chen (陳彥良)   Approval date  2005-6-27

For questions about this thesis, please contact the Outreach Services Section of the National Central University Library, TEL: (03)422-7151 ext. 57407.