Categorical Data Visualization and Clustering with Subjective Factor (非數值型資料視覺化與兼具主客觀的分群)

Name

Zhi-Kai Ding (丁智凱)

Department

Department of Computer Science and Information Engineering (資訊工程學系)

Thesis Title

Categorical Data Visualization and Clustering with Subjective Factor
(非數值型資料視覺化與兼具主客觀的分群)

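This electronic thesis is approved for immediate open access.
Full texts that have reached their open-access date are licensed to users only for personal, non-profit searching, reading, and printing for the purpose of academic research.
Please comply with the Copyright Act of the Republic of China. Do not reproduce, distribute, adapt, repost, or broadcast the work without authorization.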

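Abstract (translated from Chinese)

A long-standing problem in data clustering is how to determine an appropriate number of clusters. The answer involves subjective human judgment, and different people may see the same data differently, so using visualization to help decide the number of clusters is one way to bring subjective factors into the process. Data visualization, however, has long taken numeric data as its object of analysis, and visualizing categorical (non-numeric) data remains difficult. Starting from these two observations, this thesis proposes a method for exploring categorical data that performs clustering and visualization together. The main idea is to present categorical data through interactive visualization, which improves the user's understanding of the preliminary clustering result, and to account for subjective factors by capturing the user's macroscopic view of the data, finally arriving at a clustering of the categorical data.
The clustering method in this thesis does not require the number of clusters to be given in advance. Its framework has three steps. The first step combines a dynamic clustering method with a similarity measure derived from classification concepts to cluster the categorical data. A relatively high threshold is normally used in this step so that only very similar records are placed in the same cluster, which means the preliminary result contains more clusters than the correct number. The second step captures the user's subjective factors: with the interactive visual analysis of categorical clusters that we propose, the user can see how the clusters from the first step differ and judge the conditions under which two clusters should count as similar. The third step uses the merge conditions obtained in the second step to merge the existing clusters, yielding a clustering that reflects both subjective and objective factors. Throughout this process, visualization is carried out through the clustering framework: the data are converted into graphics that represent clusters and then displayed. In short, the main contribution of this thesis is a clustering algorithm that combines subjective and objective factors and increases the user's confidence in the clustering result. The contributions can be divided into three points:
1. Capturing subjective human factors to help explore the cluster structure of the data
2. Providing visualization of clustered categorical data
3. Providing a parameter-setting method that applies uniformly to all kinds of categorical data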
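The first step described above, a strict single-pass clustering that deliberately over-segments the data, can be sketched roughly as follows. This is only an illustration under stated assumptions: the abstract does not define the classification-derived similarity measure, so a simple fraction-of-matching-attributes score against each cluster's per-attribute mode is swapped in, and the names `cluster_mode`, `matching_similarity`, and `single_pass_cluster` are hypothetical.

```python
from collections import Counter


def cluster_mode(members):
    """Return the most common value of each attribute among a cluster's records."""
    return tuple(Counter(column).most_common(1)[0][0] for column in zip(*members))


def matching_similarity(record, mode):
    """Fraction of attributes on which the record agrees with the cluster mode.

    Assumption: this stands in for the thesis's classification-based
    similarity function, which is not specified in the abstract.
    """
    return sum(a == b for a, b in zip(record, mode)) / len(record)


def single_pass_cluster(records, threshold):
    """Assign each record, in one pass, to its most similar cluster.

    A record opens a new cluster when no existing cluster reaches the
    threshold, so the number of clusters is not fixed in advance.
    """
    clusters = []
    for record in records:
        best_index, best_sim = None, 0.0
        for index, members in enumerate(clusters):
            sim = matching_similarity(record, cluster_mode(members))
            if sim > best_sim:
                best_index, best_sim = index, sim
        if best_index is not None and best_sim >= threshold:
            clusters[best_index].append(record)
        else:
            clusters.append([record])
    return clusters
```

With a high threshold most records open their own small cluster, which matches the over-segmentation the abstract describes for the preliminary result. Lowering the threshold lets more records join existing clusters.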

Abstract (English)

Clustering is a useful method for exploring the structure of complex data sets. However, how to determine the appropriate number of clusters remains a problem. It involves a human factor, because different people may have different points of view. Integrating visualization with subjective factors to help users explore a data set is therefore a practical approach. Unfortunately, most visualization methods concern only numeric data, and categorical data visualization is still an unresolved issue. In this paper, a new clustering approach called CDCS is introduced. Its central idea is a strategy for extracting subjective factors.
In the first step, CDCS employs a single-pass clustering approach with a classification-based similarity function to cluster the data strictly. The clusters discovered in this step are called s-clusters because they are usually small. Users can then use our interactive visualization tool to observe how the s-clusters are grouped under a given merge threshold. Finally, users choose an appropriate merge threshold and merge the s-clusters. This approach can increase the reliability of the clustering result by extracting subjective factors, and our experiments also show that CDCS generates better-quality clusters than other typical algorithms.
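The merge stage can be illustrated with a small sketch in which a user tries several merge thresholds and sees how many groups result from each. It rests on assumptions not stated in the abstract: similarity between two s-clusters is taken to be the fraction of attributes on which their per-attribute modes agree, grouping is transitive (done here with a union-find), and the names `mode_of` and `group_s_clusters` are hypothetical.

```python
from collections import Counter
from itertools import combinations


def mode_of(members):
    """Per-attribute most common value, used as a summary of one s-cluster."""
    return tuple(Counter(column).most_common(1)[0][0] for column in zip(*members))


def group_s_clusters(s_clusters, merge_threshold):
    """Merge s-clusters whose summaries are at least merge_threshold similar.

    Assumption: pairwise similarity is the share of matching mode values,
    and s-clusters linked through a chain of similar pairs end up together.
    """
    modes = [mode_of(members) for members in s_clusters]
    parent = list(range(len(s_clusters)))

    def find(i):
        # Follow parent links to the group representative, compressing the path.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in combinations(range(len(s_clusters)), 2):
        sim = sum(a == b for a, b in zip(modes[i], modes[j])) / len(modes[i])
        if sim >= merge_threshold:
            parent[find(i)] = find(j)

    merged = {}
    for i, members in enumerate(s_clusters):
        merged.setdefault(find(i), []).extend(members)
    return list(merged.values())
```

Calling `group_s_clusters` with a range of thresholds and comparing the resulting group counts is a text-only stand-in for what the interactive tool lets the user judge visually before committing to a merge.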

Keywords

★ cluster analysis
★ categorical data
★ visualization

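Table of Contents

List of Figures
List of Tables
1. Introduction
1.1 Contributions of This Thesis
1.2 Organization of This Thesis
2. Related Work
2.1 Related Work on Categorical Clustering Algorithms
2.1.1 Extending Existing Clustering Algorithms with Modified Similarity Measures
2.1.2 Clustering Based on Association Rules
2.1.3 Clustering Based on Analysis of Attribute-Value Relationships
2.2 Related Work on Data Visualization
2.2.1 Linear Transformation Mapping
2.2.2 Sammon's Projection Algorithm
2.2.3 Self-Organizing Maps
2.2.4 Data-Analysis-Oriented Methods
2.2.4.1 Mosaic Displays
2.2.4.2 Tree Transformation
2.2.4.3 Ordering Attribute Values
2.3 Summary
3. A Categorical Data Clustering Algorithm with Subjective Factors (CDCS)
3.1 Similarity Computation for Categorical Data and Dynamic Clustering
3.2 The Macroscopic View of the Data
3.3 Grouping and Merging Clusters
3.4 Summary
4.1 Cluster Visualization
4.1.1 Visualization Principle
4.1.2 Presentation Method
4.2 Interactive Visual Analysis
4.2.1 System Overview
4.2.2 Observing Similarity
4.3 Complex Data Distributions
4.4 Summary
5. Experimental Results and Comparison
5.1 Number of Clusters and Quality of the Clustering Result
5.2 Relationship Between the Number of Clusters and the Merge Parameter
5.3 Relationship Between Clustering Accuracy and the Number of Clusters
5.4 Summary
6. Conclusions and Future Work
6.1 Conclusions
6.2 Future Work
7. References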

Advisor

Chia-Hui Chang (張嘉惠)

Approval Date

2003-07-09
