A New Method for Estimating the Number of Clusters

Name

Wen-Hsiang Fan (范文翔)

Department

Graduate Institute of Statistics

Thesis title

一個估計資料群數的新方法
(A new method for estimating the number of clusters)

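This electronic thesis is authorized for immediate open access.
The open-access electronic full text is licensed to users only for personal, non-profit searching, reading, and printing for academic research purposes.
Please comply with the Copyright Act of the Republic of China. Do not reproduce, distribute, adapt, repost, or broadcast the work without authorization, to avoid violating the law.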

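Abstract (translated from the Chinese)

Estimating the number of clusters is an important problem in cluster analysis. In this thesis, we try the Bayesian information criterion (BIC), the most widely used criterion in model selection, as the standard for choosing the number of clusters in clustering problems. However, when the data are one-dimensional, we find that BIC overestimates the true number of clusters; even after trying a variety of penalty terms, we did not find an effective consistent information criterion. This thesis therefore proposes a new method for estimating the number of clusters and uses simulations to illustrate its accuracy.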

Abstract (English)

A major problem in cluster analysis is determining the number of clusters. In this thesis, we try to use the Bayesian information criterion (BIC), a widely used criterion in model selection, to estimate the number of clusters. However, we found that BIC overestimates the true number of clusters in the one-dimensional case, and we could not find a consistent information criterion for estimating the number of clusters. We therefore propose a new method for estimating the number of clusters and demonstrate its accuracy through a simulation study.
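
The overestimation described in the abstract can be illustrated with a small sketch. The thesis's exact criterion is not given on this page, so the form used below, n·log(W_k/n) + k·log(n) with W_k the minimal within-cluster sum of squares for k clusters, is an assumption, as are the function names `within_ss_table` and `bic_type`. Because the optimal clusters of sorted one-dimensional data are contiguous runs, the sketch computes the exact k-means optimum by dynamic programming instead of running the iterative algorithm.

```python
import math
import random


def within_ss_table(xs, k_max):
    """Return [W_1, ..., W_k_max]: the minimal within-cluster sum of squares
    for 1..k_max clusters of one-dimensional data (exact, via dynamic programming)."""
    xs = sorted(xs)
    n = len(xs)
    # Prefix sums give the sum of squares of any sorted segment in O(1).
    s1 = [0.0] * (n + 1)
    s2 = [0.0] * (n + 1)
    for i, x in enumerate(xs):
        s1[i + 1] = s1[i] + x
        s2[i + 1] = s2[i] + x * x

    def seg_cost(i, j):
        # Sum of squared deviations from the mean for xs[i:j], with j > i.
        s = s1[j] - s1[i]
        return max(0.0, (s2[j] - s2[i]) - s * s / (j - i))

    # prev[j] = best cost of splitting xs[:j] into (k - 1) clusters.
    prev = [0.0] + [seg_cost(0, j) for j in range(1, n + 1)]
    result = [prev[n]]
    for k in range(2, k_max + 1):
        cur = [math.inf] * (n + 1)
        for j in range(k, n + 1):
            # The last cluster is xs[i:j]; the first i points form k - 1 clusters.
            cur[j] = min(prev[i] + seg_cost(i, j) for i in range(k - 1, j))
        prev = cur
        result.append(prev[n])
    return result


def bic_type(wss, n):
    """Assumed BIC-type score for each k: n*log(W_k/n) + k*log(n)."""
    return [n * math.log(w / n) + k * math.log(n)
            for k, w in enumerate(wss, start=1)]


rng = random.Random(0)
sample = [rng.gauss(0.0, 1.0) for _ in range(300)]  # one true cluster
wss = within_ss_table(sample, k_max=6)
scores = bic_type(wss, len(sample))
best_k = 1 + scores.index(min(scores))
print("selected number of clusters:", best_k)
```

On this single-cluster sample the selected k is larger than 1. Splitting a normal sample at its mean already cuts the within-cluster sum of squares to roughly 1 − 2/π ≈ 36% of its original value, so the first term falls by an amount proportional to n, which a log n penalty cannot offset.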

Keywords (Chinese)

★ K-means clustering algorithm (K平均值分群演算法)
★ Information criterion (訊息準則)

Keywords (English)

★ Information criterion
★ K-means clustering algorithm

Table of contents

1. Introduction
1.1 Research background
1.2 Research motivation
2. Literature review
2.1 Gap statistic
2.2 Calinski-Harabasz index
2.3 Krzanowski-Lai index
2.4 Hartigan statistic
3. An investigation of consistent information criteria in cluster analysis
3.1 How overestimation of the number of clusters arises
3.2 How underestimation of the number of clusters arises
3.3 Simulation results for estimating the number of clusters with consistent information criteria
4. A new method for estimating the number of clusters
4.1 Minimum-variate method (最小變量法)
4.2 Simulation study
5. Conclusions and future directions
References

References

[1] Calinski, T. and Harabasz, J. (1974). A dendrite method for cluster analysis. Communications in Statistics 3, 1-27.
[2] Hartigan, J. A. (1975). Clustering Algorithms. New York: Wiley.
[3] Kaufman, L. and Rousseeuw, P. (1990). Finding Groups in Data: An Introduction to Cluster Analysis. New York: Wiley.
[4] Krzanowski, W. J. and Lai, Y. T. (1988). A criterion for determining the number of groups in a data set using sum-of-squares clustering. Biometrics 44, 23-34.
[5] Milligan, G. W. and Cooper, M. C. (1985). An examination of procedures for determining the number of clusters in a data set. Psychometrika 50, 159-179.
[6] Sugar, C. A. and James, G. M. (2003). Finding the number of clusters in a data set: An information theoretic approach. Journal of the American Statistical Association 98, 750-763.
[7] Tibshirani, R., Walther, G., and Hastie, T. (2001). Estimating the number of clusters in a data set via the gap statistic. Journal of the Royal Statistical Society, Series B 63, 411-423.

Advisor

Ching-Kang Ing (銀慶剛)

Approval date

2008-7-17
