Categorical Data Visualization and Clustering with Subjective Factor


Please use this identifier to cite or link to this item: http://ir.lib.ncu.edu.tw/handle/987654321/8817

Title:	Categorical Data Visualization and Clustering with Subjective Factor
Authors:	Zhi-Kai Ding (丁智凱)
Contributors:	Graduate Institute of Computer Science and Information Engineering
Keywords:	cluster analysis; categorical data; visualization
Date:	2003-06-25
Issue Date:	2009-09-22 11:35:19 (UTC+8)
Publisher:	National Central University Library
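Abstract:	A persistent problem in data clustering is how to determine the appropriate number of clusters. The decision involves subjective human judgment, since different people may hold different views, so using visualization to help decide the number of clusters is one way to incorporate the subjective factor. Data visualization, however, has long focused on numeric data, and visualizing categorical (non-numeric) data remains difficult. Starting from these two observations, this thesis proposes a method for exploring categorical data that performs clustering and visualization together: the categorical data are presented through interactive visualization, which improves the user's understanding of the preliminary clustering result, and the user's subjective, high-level view of the data is captured to produce the final clustering. The method does not require the number of clusters to be specified in advance. It consists of three steps. The first step combines a dynamic clustering method with a similarity measure derived from classification concepts to cluster the categorical data. A high similarity threshold is usually set in this step so that only very similar records are placed in the same cluster; the preliminary result therefore contains more clusters than the correct number. The second step captures the user's subjective factor: with the proposed interactive visualization of clustered categorical data, the user can see the differences between the clusters produced in the first step and judge the condition under which two clusters should be considered similar. The third step uses the merge condition obtained in the second step to merge the existing clusters, yielding a clustering that reflects both subjective and objective factors and arriving at the most correct clusters. Throughout this process, visualization is carried out through the clustering framework: the data are converted into graphics that represent the clusters. In short, the main contribution of this thesis is a clustering algorithm that combines subjective and objective factors and raises the user's confidence in the clustering result. The contributions are threefold: (1) capturing the human subjective factor to help explore the cluster structure of the data; (2) providing visualization for clustered categorical data; (3) providing a parameter-setting method that applies uniformly to different kinds of categorical data.
Clustering is a useful method for exploring the structure of complex data sets. However, how to determine the appropriate number of clusters is still an open problem. It involves a human factor, because different people may have different points of view. Integrating visualization with a subjective factor to help users explore the data set is therefore a practical approach. Unfortunately, most visualization methods handle only numeric data; categorical data visualization is still an unresolved issue. In this paper, a new clustering approach called CDCS is introduced. Its central idea is a strategy for extracting the subjective factor. In the first step, CDCS employs a single-pass clustering approach with a classification-based similarity function to cluster the data strictly. The clusters discovered in the first step are called s-clusters because they are usually small. Users can then use our interactive visualization tool to observe how the s-clusters are grouped under a given merge threshold. Finally, users choose an appropriate merge threshold to merge them. This approach can increase the reliability of the clustering result by extracting the subjective factor, and our experiments also show that CDCS generates better-quality clusters than other typical algorithms.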
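The two-stage idea in the abstract (strict single-pass clustering into small s-clusters, followed by merging under a chosen threshold) can be illustrated with a minimal Python sketch. This is not the thesis's implementation: the abstract does not specify the classification-based similarity function, so the two similarity measures below are simple frequency-based stand-ins, and the names `single_pass`, `merge_clusters`, and both threshold values are hypothetical. In CDCS the merge threshold is chosen by the user through the interactive visualization; here it is just a parameter.

```python
from collections import Counter

def record_similarity(record, cluster):
    """Stand-in measure (assumption): mean relative frequency, within the
    cluster, of each attribute value carried by the record."""
    n = len(cluster["members"])
    return sum(cluster["counts"][i][v] / n for i, v in enumerate(record)) / len(record)

def single_pass(records, threshold):
    """Step 1: scan the records once; a record joins its most similar cluster
    only if the similarity reaches the (high) threshold, else it starts a new one."""
    clusters = []
    for rec in records:
        best, best_sim = None, 0.0
        for c in clusters:
            s = record_similarity(rec, c)
            if s > best_sim:
                best, best_sim = c, s
        if best is None or best_sim < threshold:
            best = {"members": [], "counts": [Counter() for _ in rec]}
            clusters.append(best)
        best["members"].append(rec)
        for i, v in enumerate(rec):
            best["counts"][i][v] += 1
    return clusters

def cluster_similarity(a, b):
    """Stand-in measure (assumption): per-attribute overlap of the two
    clusters' value distributions, averaged over attributes."""
    na, nb = len(a["members"]), len(b["members"])
    total = 0.0
    for ca, cb in zip(a["counts"], b["counts"]):
        total += sum(min(ca[v] / na, cb[v] / nb) for v in ca.keys() & cb.keys())
    return total / len(a["counts"])

def merge_clusters(s_clusters, merge_threshold):
    """Step 3: merge s-clusters whose similarity reaches the merge threshold."""
    groups = []
    for c in s_clusters:
        merged = {"members": list(c["members"]),
                  "counts": [Counter(x) for x in c["counts"]]}
        kept = []
        for g in groups:
            if cluster_similarity(merged, g) >= merge_threshold:
                merged["members"] += g["members"]
                for mc, gc in zip(merged["counts"], g["counts"]):
                    mc.update(gc)
            else:
                kept.append(g)
        kept.append(merged)
        groups = kept
    return groups

# Toy data: three categorical attributes per record (made up for illustration).
records = [
    ("red", "round", "small"), ("red", "round", "small"), ("red", "round", "large"),
    ("blue", "square", "large"), ("blue", "square", "large"), ("blue", "square", "small"),
]
s_clusters = single_pass(records, threshold=0.9)        # strict first pass
final = merge_clusters(s_clusters, merge_threshold=0.6)  # looser merge
print(len(s_clusters), len(final))
```

On this toy data the strict first pass yields more clusters than the merged result, which mirrors the abstract's point that step 1 deliberately over-segments and leaves the final granularity to the merge threshold.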
Appears in Collections:	[Graduate Institute of Computer Science and Information Engineering] Electronic Thesis & Dissertation

