Two-Staged Clustering Algorithm for Two-Attributes-Set Problem

Name

Ya-Chun Hsiao (蕭雅君)

Department

Department of Information Management

Thesis title

Two-Staged Clustering Algorithm for Two-Attributes-Set Problem
(針對雙屬性集合問題的兩階段分群演算法)

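This electronic thesis is approved for immediate open access.
The open-access full text is licensed to users solely for personal, non-profit searching, reading, and printing for the purpose of academic research.
Please comply with the Copyright Act of the Republic of China; do not reproduce, distribute, adapt, repost, or broadcast the work without authorization.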

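Abstract (translated from Chinese)

Clustering is widely studied and applied in many fields, and it is an important area within data mining. Clustering groups similar records together so that a small number of clusters can stand in for a large data set. In conventional algorithms, however, the attributes used to compute similarity during clustering must be the same as the attributes used afterwards to describe the characteristics of each cluster, although in practice the two can be distinguished. For example, when a bank wants to understand the consumption behavior of credit card users with different backgrounds, it will usually want to segment and describe the clusters by personal data such as age and salary, while expecting consumption behavior to be similar within each cluster. Two different sets of attributes are therefore needed: personal data, used to segment the clusters and describe their characteristics, and consumption behavior, used to compute similarity during clustering. We call the attributes used to compute similarity the clustering distance attributes, and the attributes used to describe clusters the user expression attributes. Because conventional algorithms require the expression attributes and the distance attributes to be the same, they cannot produce good clustering results for this kind of problem. We therefore propose a two-staged clustering algorithm that handles the case where the distance attributes and the expression attributes differ, so that the resulting clusters can be segmented and described by the expression attributes while the distance attributes remain similar within each cluster.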

Abstract (English)

Cluster analysis has recently become a highly active topic in data mining research. However, existing clustering algorithms share a common limitation in practical applications: they use a single set of attributes both for partitioning the data space and for measuring similarity between objects. In some practical situations, two different sets of attributes are required for these two procedures. For example, a bank may need to cluster its customers to learn about the consumption behavior of customers with different backgrounds. The customers should then be clustered by the attributes describing consumption behavior, while the bank still needs to characterize every cluster by the customers' personal information, such as age and income. Two different sets of attributes are therefore required: one for measuring similarity, called the similarity-measuring attributes, and one for partitioning the data set and describing the resulting clusters, called the dataset-partitioning attributes. Traditional algorithms do not distinguish between the two sets, which leads to low-quality clustering results in such a two-attributes-set problem. We propose the Two-Staged Clustering Algorithm to solve the two-attributes-set problem. It generates clusters that can be segmented and described by the dataset-partitioning attributes, while the objects in each cluster remain similar in their similarity-measuring attributes.
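
The two-attributes-set setting described in the abstract can be illustrated with a small sketch. This is not the thesis's Two-Staged Clustering Algorithm, whose steps are not given in this record. It is only a toy, assumed setup: groups are defined by a threshold on one dataset-partitioning attribute, and the quality of the grouping is scored by the within-group squared error on the similarity-measuring attributes. The field names (age, monthly_spend), the sample data, and the helper functions are all hypothetical.

```python
from statistics import mean

def within_group_sse(rows, sim_keys):
    """Sum of squared deviations from the group centroid,
    measured only on the similarity-measuring attributes."""
    if not rows:
        return 0.0
    centroid = {k: mean(r[k] for r in rows) for k in sim_keys}
    return sum((r[k] - centroid[k]) ** 2 for r in rows for k in sim_keys)

def best_split(rows, part_key, sim_keys):
    """Try every midpoint between consecutive distinct values of the
    dataset-partitioning attribute and return (threshold, total_sse)
    for the split with the lowest total within-group error."""
    values = sorted({r[part_key] for r in rows})
    best = None
    for lo, hi in zip(values, values[1:]):
        t = (lo + hi) / 2
        left = [r for r in rows if r[part_key] < t]
        right = [r for r in rows if r[part_key] >= t]
        sse = within_group_sse(left, sim_keys) + within_group_sse(right, sim_keys)
        if best is None or sse < best[1]:
            best = (t, sse)
    return best

# Hypothetical bank customers: age is the dataset-partitioning attribute,
# monthly_spend is the similarity-measuring attribute.
customers = [
    {"age": 22, "monthly_spend": 300},
    {"age": 25, "monthly_spend": 320},
    {"age": 28, "monthly_spend": 310},
    {"age": 45, "monthly_spend": 900},
    {"age": 50, "monthly_spend": 950},
    {"age": 58, "monthly_spend": 920},
]

threshold, sse = best_split(customers, "age", ["monthly_spend"])
print(f"split customers at age < {threshold}, within-group SSE = {sse:.1f}")
```

In this toy data the resulting groups can be described purely by age, while spending is similar inside each group, which is the property the abstract asks of a solution to the two-attributes-set problem.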

Keywords

★ Clustering
★ Cluster analysis
★ Data mining

Table of Contents

List of Figures
List of Tables
Chapter 1 Introduction
1.1 Background
1.2 Motivation
1.3 Research Objectives
1.4 Thesis Framework
Chapter 2 Related Works
Chapter 3 The Problem and the Definitions
3.1 Research Problem
3.2 Definitions
Chapter 4 Two-Staged Clustering Algorithm
4.1 Overview of the Algorithm
4.2 The Clustering Algorithm
4.3 Example
Chapter 5 Experiments
5.1 Content of Experiments
5.2 Performance Evaluation
Chapter 6 Conclusions and Future Works
References

Advisor

Yen-Liang Chen (陳彥良)

Approval date

2009-11-11
