非數值型資料視覺化與兼具主客觀的分群; Categorical Data Visualization and Clustering with Subjective Factor


Please use this permanent URL to cite or link to this item: http://ir.lib.ncu.edu.tw/handle/987654321/8817

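Title:	Categorical Data Visualization and Clustering with Subjective Factor (非數值型資料視覺化與兼具主客觀的分群)
Author:	Zhi-Kai Ding (丁智凱)
Contributor:	Graduate Institute of Computer Science and Information Engineering (資訊工程研究所)
Keywords:	cluster analysis; categorical data; visualization
Date:	2003-06-25
Upload time:	2009-09-22 11:35:19 (UTC+8)
Publisher:	National Central University Library (國立中央大學圖書館)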
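Abstract:	A long-standing problem in data clustering is how to determine the appropriate number of clusters. The answer involves subjective human factors, since different people may hold different views, so using visualization to help decide the number of clusters is one way to incorporate the subjective factor. Data visualization, however, has long focused on numeric data, and visualizing categorical data remains a major difficulty. Starting from these two observations, this thesis proposes a method for exploring categorical data that performs clustering and visualization together: it presents categorical data through interactive visualization to improve the user's understanding of the preliminary clustering result, takes the subjective human factor into account by capturing the user's macroscopic view of the data, and finally produces a clustering of the categorical data.

The clustering method does not require the number of clusters to be specified in advance. Its framework has three steps. The first step combines a dynamic clustering method with a similarity measure derived from classification concepts to cluster the categorical data. A high threshold is usually set in this step so that only very similar records are placed in the same cluster, so the preliminary result contains more clusters than the correct number. The second step captures the user's subjective factor: the proposed interactive visualization of clustered categorical data lets the user see how the clusters from the first step differ and judge the condition under which two clusters should be considered similar. The third step merges the existing clusters using the merge condition obtained in the second step, achieving a clustering that combines subjective and objective factors and yields the most accurate clusters. Throughout this process, visualization is achieved through the clustering framework by converting the data into graphics that represent the clusters.

In short, the main contribution of this thesis is a clustering algorithm that combines subjective and objective factors and increases the user's confidence in the clustering result. The contributions fall into three points: 1. capturing the subjective human factor to help explore the cluster structure of the data; 2. providing visualization of clustered categorical data; 3. providing a parameter-setting method applicable to all kinds of categorical data.

Clustering is a useful method for exploring the structure of complex data sets. However, determining the appropriate number of clusters is still a problem. It involves a human factor, because different people may have different points of view. Integrating visualization and the subjective factor to help users explore the data set is therefore a practical approach. Unfortunately, most visualization methods concern only numeric data, and categorical data visualization is still an unresolved issue. In this paper, a new clustering approach called CDCS is introduced. Its central idea is a strategy for extracting the subjective factor. In the first step, CDCS employs a single-pass clustering approach with a classification-based similarity function to cluster the data strictly. The clusters discovered in the first step are called s-clusters because they are usually small. Users can then use our interactive visualization tool to observe how the s-clusters group under a given merge threshold. Finally, users choose an appropriate merge threshold to merge them. This approach can increase the reliability of the clustering result by extracting the subjective factor, and our experiment also shows that CDCS generates better-quality clusters than other typical algorithms.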
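
The flow described in the abstract (strict single-pass clustering into s-clusters, followed by a merge under a user-chosen threshold) can be illustrated with a minimal Python sketch. This is not the CDCS implementation. The abstract does not specify the classification-based similarity function, so simple attribute matching against each cluster's most frequent values is used here as a stand-in. The names `SCluster`, `matching`, `single_pass`, and `merge_groups` are hypothetical, the transitive merge rule is an assumption, and trying several merge thresholds stands in for the interactive visualization step.

```python
from collections import Counter


class SCluster:
    """A small cluster that tracks per-attribute value counts."""

    def __init__(self, record):
        self.members = [record]
        self.counts = [Counter([value]) for value in record]

    def add(self, record):
        self.members.append(record)
        for counter, value in zip(self.counts, record):
            counter[value] += 1

    def profile(self):
        """Most frequent value of each attribute."""
        return tuple(c.most_common(1)[0][0] for c in self.counts)


def matching(a, b):
    """Assumed similarity: fraction of attributes on which a and b agree."""
    return sum(x == y for x, y in zip(a, b)) / len(a)


def single_pass(records, strict_threshold):
    """Step 1: each record joins its most similar s-cluster if the
    similarity reaches strict_threshold; otherwise it starts a new one."""
    clusters = []
    for record in records:
        best, best_sim = None, 0.0
        for cluster in clusters:
            sim = matching(record, cluster.profile())
            if sim > best_sim:
                best, best_sim = cluster, sim
        if best is not None and best_sim >= strict_threshold:
            best.add(record)
        else:
            clusters.append(SCluster(record))
    return clusters


def merge_groups(clusters, merge_threshold):
    """Steps 2-3: merge s-clusters whose profiles are at least
    merge_threshold similar, transitively, using union-find."""
    parent = list(range(len(clusters)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    profiles = [c.profile() for c in clusters]
    for i in range(len(clusters)):
        for j in range(i + 1, len(clusters)):
            if matching(profiles[i], profiles[j]) >= merge_threshold:
                parent[find(j)] = find(i)

    groups = {}
    for i, cluster in enumerate(clusters):
        groups.setdefault(find(i), []).extend(cluster.members)
    return list(groups.values())


# Toy data (invented for illustration only).
records = [
    ("red", "round", "small"),
    ("red", "round", "small"),
    ("red", "round", "large"),
    ("red", "round", "large"),
    ("blue", "square", "large"),
    ("blue", "square", "large"),
]

s_clusters = single_pass(records, strict_threshold=1.0)
for threshold in (0.9, 0.6, 0.3):
    merged = merge_groups(s_clusters, threshold)
    print(threshold, len(merged))
```

With a strict first-pass threshold the toy data yields more s-clusters than a coarser view would, and lowering the merge threshold progressively combines them. In the approach the abstract describes, that choice of merge threshold is left to the user, guided by the visualization.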
Appears in collections:	[Graduate Institute of Computer Science and Information Engineering] Master's and Doctoral Theses


All items in NCUIR are protected by their original copyright.
