Center-Based Clustering with Tree-Structured Data

Name

林政延 (Zheng-Yan Lin)

Department

Graduate Institute of Industrial Management

Thesis title

Center-Based Clustering with Tree-Structured Data
(original title: 樹狀資料之中心集群分析)

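Access rights: the author has approved immediate open access to this electronic thesis.
The open-access full text is licensed to users only for personal, non-profit searching, reading, and printing for academic research purposes.
Please comply with the Copyright Act of the Republic of China; do not reproduce, distribute, adapt, repost, or broadcast this work without authorization.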

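Abstract (translated from the Chinese)

Cluster analysis is a very popular application area in data mining; its main purpose is to group data according to the characteristics the data share. Traditionally, cluster analysis has been applied mostly to numerical and categorical data and only rarely to tree-structured data. Yet tree-structured data appear in many forms in everyday life, such as bills of materials and file directory structures. This study therefore focuses on how to apply cluster analysis to tree-structured data.
Cluster analysis takes two main forms: hierarchical clustering and non-hierarchical clustering. Most existing research on clustering tree-structured data is hierarchical. However, non-hierarchical clustering has advantages over hierarchical clustering in its results, so this study concentrates on non-hierarchical clustering of tree-structured data. Another key consideration in cluster analysis is how similarity is defined. This study adopts the tree edit distance, the most widely used similarity measure for tree-structured data: the distance between two trees is the minimum number of edit steps needed to transform one tree into the other.
Previously, Syu (2014) proposed a method for applying non-hierarchical clustering to string data. Compared with string data, however, tree-structured data require attention not only to the order of the data but also to the properties of nodes and branching structure. Taking these properties into account, we combine features of the K-means and K-modes algorithms as the basis for constructing cluster centers. With these centers, non-hierarchical clustering can be performed on tree-structured data.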
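
The tree edit distance described in the abstract can be illustrated with a small sketch. This is not the thesis's implementation: it assumes ordered, labeled trees, unit cost for every insert, delete, and relabel operation, and a nested-tuple representation built by a hypothetical helper `tree`. It uses the plain memoized forest recurrence, which is adequate for small trees; the optimized algorithm of Zhang and Shasha (1989) in the reference list computes the same quantity more efficiently.

```python
from functools import lru_cache


def tree(label, *children):
    """Build an ordered, labeled tree as a hashable nested tuple (hypothetical helper)."""
    return (label, tuple(children))


@lru_cache(maxsize=None)
def forest_distance(f, g):
    """Unit-cost edit distance between two ordered forests (tuples of trees)."""
    if not f and not g:
        return 0
    if not g:
        # Delete the rightmost root of f; its children are promoted into the forest.
        _, kids = f[-1]
        return forest_distance(f[:-1] + kids, g) + 1
    if not f:
        # Insert the rightmost root of g (mirror image of the deletion case).
        _, kids = g[-1]
        return forest_distance(f, g[:-1] + kids) + 1
    (f_label, f_kids), (g_label, g_kids) = f[-1], g[-1]
    delete = forest_distance(f[:-1] + f_kids, g) + 1
    insert = forest_distance(f, g[:-1] + g_kids) + 1
    # Match the two rightmost roots: compare their child forests and the
    # remaining forests, paying 1 only if the labels differ.
    match = (forest_distance(f_kids, g_kids)
             + forest_distance(f[:-1], g[:-1])
             + (0 if f_label == g_label else 1))
    return min(delete, insert, match)


def tree_edit_distance(a, b):
    """Minimum number of node edits that turn tree a into tree b."""
    return forest_distance((a,), (b,))


# Two toy bills of materials (labels are made up for illustration).
bom_a = tree("bike", tree("frame"), tree("wheel", tree("rim"), tree("tire")))
bom_b = tree("bike", tree("frame"), tree("wheel", tree("rim")))
print(tree_edit_distance(bom_a, bom_b))
```

Deleting an internal node promotes its children to the deleted node's parent, which is why removing a wrapper node costs a single edit rather than one edit per descendant.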

Abstract (English)

Cluster analysis is a very popular topic in the field of data mining. The main purpose of clustering is to group objects according to their characteristics. However, very little research has addressed cluster analysis of tree-structured data, even though such data appear everywhere in daily life in various forms, such as bills of materials (BOM) and XML structures.
Most studies of clustering on tree-structured data are hierarchical. However, non-hierarchical clustering has its own advantages compared with hierarchical clustering. Therefore, we focus on applying non-hierarchical clustering to tree-structured data. Another important consideration in cluster analysis is the similarity measure. We adopt the tree edit distance, the most popular similarity measure for tree-structured data.
In the past, Syu (2014) proposed a method for applying non-hierarchical clustering to string data. However, tree-structured data have more characteristics to consider, such as levels, nodes, and arcs. We propose a method that combines the concepts of K-means and K-modes to determine the center of a cluster. With the centers so determined, non-hierarchical clustering can be applied to tree-structured data.

Keywords

★ Data mining
★ Tree-structured data
★ Tree edit distance
★ Non-hierarchical clustering

Table of Contents

Abstract (Chinese)
Abstract
Contents
Contents of Figures
Contents of Tables
Chapter 1 Introduction
1-1 Background and motivation
1-2 Research objective and framework
Chapter 2 Literature Review
2-1 Tree-structured data
2-2 Cluster analysis
2-2-1 K-means algorithm
2-2-2 K-modes algorithm
2-2-3 Parameter setting
2-3 Similarity measure
2-3-1 Similarity measure of string data
2-3-2 Similarity measure of tree-structured data
Chapter 3 Methodology
3-1 Step 1: Select K initial points as K clusters
3-2 Step 2: Assign objects into K clusters
3-3 Step 3: Compute the center of new clusters
3-3-1 Decide the node
3-3-2 Decide the amount of arcs
3-3-3 Standardize all the trees into the same dimension
3-4 Step 4: Calculate distance between objects and clusters
3-5 Step 5: Check whether any objects moved to other clusters
3-6 Step 6: Output final result
Chapter 4 Numerical Example
4-1 Hierarchical clustering
4-2 Non-hierarchical clustering
Chapter 5 Conclusion
References
Appendix: Complete hierarchical clustering results with three clusters
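
The six steps listed under Chapter 3 follow the usual shape of a center-based clustering loop, which can be sketched as below. This is only an illustration under stated assumptions, not the thesis's method: this record does not describe how the thesis builds its K-means/K-modes style center (sections 3-3-1 to 3-3-3), so the sketch substitutes a medoid, the cluster member with the smallest total distance to the other members. The function names, the choice of the first k objects as initial centers, and the toy distance (trees encoded as sets of parent-child arcs, compared by symmetric difference) are all hypothetical stand-ins; a tree edit distance would be passed as `distance` in practice.

```python
def medoid(members, distance):
    """Stand-in center: the member with the smallest total distance to its cluster."""
    return min(members, key=lambda c: sum(distance(c, m) for m in members))


def center_based_clustering(objects, k, distance, max_iter=100):
    """Generic center-based loop; returns (cluster index per object, centers)."""
    centers = list(objects[:k])  # Step 1: first k objects as initial centers (assumption)
    assignment = None
    for _ in range(max_iter):
        # Steps 2 and 4: compute distances and assign each object to its nearest center.
        new_assignment = [
            min(range(k), key=lambda j: distance(obj, centers[j]))
            for obj in objects
        ]
        # Step 5: stop once no object has moved to another cluster.
        if new_assignment == assignment:
            break
        assignment = new_assignment
        # Step 3: recompute the center of every non-empty cluster.
        for j in range(k):
            members = [o for o, a in zip(objects, assignment) if a == j]
            if members:
                centers[j] = medoid(members, distance)
    return assignment, centers  # Step 6: output the final result


def arc_distance(a, b):
    """Toy dissimilarity: number of parent-child arcs present in only one tree."""
    return len(a ^ b)


# Four toy trees, each encoded as a frozenset of (parent, child) arcs.
t1 = frozenset({("A", "B"), ("A", "C")})
t2 = frozenset({("A", "B"), ("A", "C"), ("C", "D")})
t3 = frozenset({("X", "Y"), ("X", "Z")})
t4 = frozenset({("X", "Y"), ("X", "Z"), ("Z", "W")})

labels, centers = center_based_clustering([t1, t3, t2, t4], k=2, distance=arc_distance)
print(labels)
```

A medoid is always an actual member of the cluster, so it needs only pairwise distances; a constructed center, as the outline suggests the thesis uses, would instead have to be assembled from node and arc statistics of the cluster's trees.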

References

1. Ahmad, A., Dey, L., “A k-mean clustering algorithm for mixed numeric and categorical data”, Data & Knowledge Engineering, vol.63, no.2, pp.503-527, 2007.

2. Cheng, Y., et al., “Pattern matching of alarm flood sequences by a modified Smith–Waterman algorithm”, Chemical Engineering Research and Design, vol.91, no.6, pp.1085-1094, 2013.

3. Cohen, W. W., et al., “A comparison of string distance metrics for Name-matching tasks”, American Association Artificial Intelligence, 2003.

4. Cao, F., et al., “A dissimilarity measure for the k-Modes clustering algorithm”, Knowledge-Based Systems, vol.26, pp.120-127, 2012.

5. Erisoglu, M., et al., “A new algorithm for initial cluster centers in k-means algorithm”, Pattern Recognition Letters, vol.32, no.14, pp.1701-1705, 2011.

6. Fu, K. S., Lu, S. Y., “A Clustering Procedure for Syntactic Patterns”, IEEE Transactions on Systems, Man, and Cybernetics, vol.7, no.10, pp.734-742, 1977.

7. Gamero, F. I., et al., “Process diagnosis based on qualitative trend similarities using a sequence matching algorithm”, Journal of Process Control, vol.24, no.9, pp.1412-1424, 2014.

8. Huang, Z., “Extensions to the k-means algorithm for clustering large data sets with categorical values”, Data Mining and Knowledge Discovery, vol.2, pp.283-304, 1998.

9. Huang, W., et al., “A simple method to analyze the similarity of biological sequences based on the fuzzy theory”, Journal of Theoretical Biology, vol.265, no.3, pp.323-328, 2010.

10. Jain, A. K., Dubes, R. C., Algorithms for clustering data, Prentice-Hall, Inc., 1988.

11. Jain, A. K., “Data clustering: 50 years beyond K-means”, Pattern recognition letters, vol.31, pp.651-666, 2010.

12. Khan, S. S., Ahmad, A., “Cluster center initialization algorithm for K-means clustering”, Pattern Recognition Letters, vol.25, no.11, pp.1293-1302, 2004.

13. Liao, C. M., “Agglomerative Hierarchical clustering with the string data”, National Central University, 2014.

14. Liu, N., Wang, T., “A relative similarity measure for the similarity analysis of DNA sequences”, Chemical Physics Letters, vol.408, no.4-6, pp.307-311, 2005.

15. Mardia, K. V., et al., Multivariate Analysis, Academic Press, 1979.

16. Pandi, M. H., et al., “A novel similarity measure for sequence data”, Journal of Information Processing Systems, vol.7, no.3, pp.413-424, 2011.

17. Pham, D. T., et al., “Selection of K in K-means clustering”, Mechanical Engineering Science, pp.103-119, 2004.

18. Syu, J. W., “Center-based algorithm with the string data”, National Central University, 2014.

19. Wu, D., et al., “Similarity measure models and algorithms for hierarchical cases”, Expert Systems with Applications, vol.38, no.12, pp.15049-15056, 2011.

20. Wang, W. J., “New similarity measures on fuzzy sets and on elements”, Fuzzy Sets and Systems, vol.85, no.3, pp.305-309, 1997.

21. Yeung, D. S., Wang, X. Z., “Improving performance of similarity-based clustering by feature weight learning”, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol.24, no.4, pp.556-561, 2002.

22. Zhang, K., Shasha, D., “Simple fast algorithms for the editing distance between trees and related problems”, SIAM Journal on Computing, vol.18, no.6, pp.1245-1262, 1989.

Advisor

曾富祥

Date of approval

2015-07-06
