複合式群聚演算法; A Hybrid Approach to Clustering Algorithms

Please use this identifier to cite or link to this item: http://ir.lib.ncu.edu.tw/handle/987654321/8650

Title:	複合式群聚演算法; A Hybrid Approach to Clustering Algorithms
Authors:	劉逸群; YiChun Liu
Contributors:	Graduate Institute of Computer Science and Information Engineering
Keywords:	Neural Network; Clustering Analysis; Hierarchica
Date:	2002-06-30
Issue Date:	2009-09-22 11:32:22 (UTC+8)
Publisher:	National Central University Library
Abstract:	Cluster analysis is a highly effective tool for examining the complex structure of data, and it is therefore widely applied. Every clustering method, however, must solve two problems: (1) determining the correct number of clusters, and (2) choosing an appropriate similarity measure. The conventional solution to the first problem is to increase the number of clusters and then merge them, using a specific validity function and repeated splitting to arrive at the ideal number of clusters; the drawback is that this consumes a large amount of computation time. The second problem arises because different data sets have different geometric structures, so no single similarity measure suits all of them, and the choice of similarity measure affects the clustering result. Conventional cluster analysis almost always uses Euclidean distance as the similarity measure, yet for many complex structures Euclidean distance does not yield satisfactory results. This thesis proposes a new algorithm that combines the strengths of hierarchical clustering and Adaptive Resonance Theory (ART). It not only provides a reasonable way to determine the number of clusters but also performs well on clusters of arbitrary shape. We tested the proposed method on many data sets, demonstrating its advantages over other algorithms.

Clustering algorithms are effective tools for exploring the structure of complex data sets and are therefore of great value in many applications. For most clustering algorithms, two crucial problems must be solved: (1) determining the optimal number of clusters, and (2) determining the similarity measure by which patterns are assigned to their clusters. Estimating the number of clusters in a data set is the so-called cluster validity problem. Conventional approaches to this problem usually involve increasing the number of clusters and/or merging existing clusters, computing a cluster validity measure in each run, until a partition into the optimal number of clusters is obtained. Because most validity measures assume a particular geometric structure in cluster shapes, these approaches fail to estimate the correct number of clusters in real data with a wide variety of distributions within and between clusters. The second problem is in a similar situation. While it is easy to think of a data cluster informally, it is very difficult to give a formal and universal definition of one. Most conventional clustering methods assume that patterns with similar locations or constant density form a single cluster.
To identify clusters in a data set mathematically, it is usually necessary first to define a measure of similarity or proximity that establishes a rule for assigning patterns to the domain of a particular cluster center. As is to be expected, the appropriate similarity measure is problem dependent: different similarity measures lead to different clustering results. In this thesis, we propose a hierarchical approach to an ART-like (Adaptive Resonance Theory) clustering algorithm that can handle data consisting of clusters of arbitrary geometric shape. Combining hierarchical and ART-like clustering is suggested as a natural and feasible solution to both problems: determining the number of clusters and clustering the data.
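The abstract describes the proposed method only at a high level, so the following is a minimal Python sketch of the general idea of pairing an ART-like pass with hierarchical merging; it is not the thesis's algorithm. Assumptions: the ART-like step is simplified to a leader-style pass with a Euclidean vigilance radius and running-mean prototype updates (not ART1, ART2, or Fuzzy ART), the hierarchical step is single-linkage merging of the prototypes cut at a distance threshold, and the names `art_like_pass`, `single_link_merge`, `hybrid_cluster`, `vigilance`, and `link_threshold` are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Point = Sequence[float]


def art_like_pass(points: Sequence[Point], vigilance: float) -> Tuple[List[List[float]], List[int]]:
    """Simplified ART-like pass (hypothetical): each point joins its nearest
    prototype if the Euclidean distance is within `vigilance`; otherwise the
    vigilance test fails and the point becomes a new prototype."""
    prototypes: List[List[float]] = []
    counts: List[int] = []
    labels: List[int] = []
    for p in points:
        best, best_d = -1, math.inf
        for i, proto in enumerate(prototypes):
            d = math.dist(p, proto)
            if d < best_d:
                best, best_d = i, d
        if best >= 0 and best_d <= vigilance:
            # Vigilance passed: move the winning prototype toward p (running mean).
            counts[best] += 1
            proto = prototypes[best]
            for j in range(len(proto)):
                proto[j] += (p[j] - proto[j]) / counts[best]
            labels.append(best)
        else:
            # Vigilance failed (or no prototypes yet): recruit a new prototype at p.
            prototypes.append(list(p))
            counts.append(1)
            labels.append(len(prototypes) - 1)
    return prototypes, labels


def single_link_merge(prototypes: Sequence[Point], threshold: float) -> List[int]:
    """Single-linkage merging of prototypes cut at `threshold`: prototypes joined
    by a chain of pairwise distances <= threshold end up in the same group."""
    parent = list(range(len(prototypes)))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(len(prototypes)):
        for j in range(i + 1, len(prototypes)):
            if math.dist(prototypes[i], prototypes[j]) <= threshold:
                parent[find(i)] = find(j)
    # Relabel the union-find roots as 0, 1, 2, ... in order of first appearance.
    relabel: dict = {}
    return [relabel.setdefault(find(i), len(relabel)) for i in range(len(prototypes))]


def hybrid_cluster(points: Sequence[Point], vigilance: float, link_threshold: float) -> Tuple[List[int], int]:
    """ART-like pass to get many small prototypes, then hierarchical merging of
    the prototypes; the number of clusters is an output, not an input."""
    prototypes, proto_labels = art_like_pass(points, vigilance)
    groups = single_link_merge(prototypes, link_threshold)
    labels = [groups[k] for k in proto_labels]
    return labels, len(set(groups))


if __name__ == "__main__":
    # A ring of radius 5 (24 points) around a small blob at the origin.
    ring = [(5 * math.cos(2 * math.pi * k / 24), 5 * math.sin(2 * math.pi * k / 24)) for k in range(24)]
    blob = [(0.0, 0.0), (0.3, 0.0), (-0.3, 0.0), (0.0, 0.3), (0.0, -0.3)]
    labels, n_clusters = hybrid_cluster(ring + blob, vigilance=1.0, link_threshold=2.0)
    print(n_clusters)  # 2
```

In the usage example the ring's centroid coincides with the blob, so a method that represents each cluster by a single Euclidean centroid cannot cleanly separate the two. Chaining many small prototypes lets the ring emerge as one cluster, and the cluster count falls out of the two thresholds rather than a sweep over candidate numbers of clusters. The thresholds themselves remain data-dependent choices in this sketch.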
Appears in Collections:	[Graduate Institute of Computer Science and Information Engineering] Electronic Thesis & Dissertation
