Thesis 101426027: Detailed Record

Name 廖秋閔 (Chiu-Min Liao)   Department Graduate Institute of Industrial Management
Thesis title 使用凝聚型階層式分群法對流成行資料分群
(Agglomerative Hierarchical Clustering with the String Data)
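  1. This electronic thesis is authorized for immediate open access.
  2. Electronic full texts that have reached their open-access date are licensed only for users' personal, non-profit searching, reading, and printing for the purpose of academic research.
  3. Please comply with the Copyright Act of the Republic of China. Do not reproduce, distribute, adapt, repost, or broadcast the work without authorization, to avoid violating the law.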

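Abstract (Chinese) Advances in technology have caused the volume of data to grow rapidly. Data mining is an effective way to organize very large numbers of records, so that managers can extract relevant information from the data and make appropriate decisions. Cluster analysis is one of the most commonly used data mining methods, and it groups records according to their features. The data types most often used in cluster analysis are categorical (qualitative) data and numerical (quantitative) data, whereas flow-type data, or string data, have rarely been discussed. This study therefore proposes feasible clustering methods for flow-type (string) data.
To measure similarity we adopt two measures, Jaro similarity and edit distance, where a larger distance indicates a smaller similarity. From the chosen similarity or distance we build a similarity matrix and use it to cluster the data. We cluster with agglomerative hierarchical methods, including single linkage (shortest distance), complete linkage (longest distance), and average linkage (average distance). In agglomerative hierarchical clustering, each record starts in its own cluster; the most similar clusters are merged one at a time until all records belong to a single cluster. The advantages of hierarchical clustering are that the analyst can choose the number of clusters and that the dendrogram makes the merging steps easy to follow.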
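The two measures named above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the thesis's own implementation: the function names are hypothetical, the Jaro matching window follows the commonly cited definition (half the longer length, minus one), and dividing the edit distance by the longer string's length is only one common normalization, since this record does not state which normalization the thesis uses.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: the minimum number of single-symbol
    insertions, deletions, and substitutions that turn a into b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def normalized_edit_distance(a: str, b: str) -> float:
    """Edit distance scaled to [0, 1] by the longer length (an assumption)."""
    longest = max(len(a), len(b))
    return edit_distance(a, b) / longest if longest else 0.0


def jaro_similarity(a: str, b: str) -> float:
    """Jaro similarity in [0, 1]; 1.0 means the strings are identical."""
    if a == b:
        return 1.0
    if not a or not b:
        return 0.0
    # Symbols match only if they are equal and at most `window` positions apart.
    window = max(max(len(a), len(b)) // 2 - 1, 0)
    a_matched = [False] * len(a)
    b_matched = [False] * len(b)
    matches = 0
    for i, ca in enumerate(a):
        lo = max(0, i - window)
        hi = min(len(b), i + window + 1)
        for j in range(lo, hi):
            if not b_matched[j] and b[j] == ca:
                a_matched[i] = b_matched[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    # Transpositions: matched symbols that appear in a different order, halved.
    a_seq = [c for c, m in zip(a, a_matched) if m]
    b_seq = [c for c, m in zip(b, b_matched) if m]
    transpositions = sum(x != y for x, y in zip(a_seq, b_seq)) // 2
    return (matches / len(a) + matches / len(b)
            + (matches - transpositions) / matches) / 3.0
```

Jaro similarity grows as strings become more alike while edit distance shrinks, so a similarity can be turned into a distance for clustering with `1 - jaro_similarity(a, b)`.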
Abstract (English) Owing to advances in science and technology, the volume of data is growing rapidly. Data mining helps us organize large volumes of data efficiently, so that managers can uncover information they did not know before and make appropriate decisions. Cluster analysis, which groups data according to their features, is one of the most widely used methods in data mining. Most data used in cluster analysis are qualitative or quantitative, and string data (flow data) are seldom discussed. In this research we therefore propose clustering methods that can handle string data.
For the similarity measure we adopt two measurements: Jaro similarity and edit distance. The larger the distance, the smaller the similarity. From the similarity or distance we define, we obtain a similarity matrix, and clustering is based on this matrix. We consider agglomerative hierarchical clustering with single linkage, complete linkage, and average linkage to group string data. At the start of agglomerative clustering, each string is in its own cluster, so every cluster contains exactly one string. The most similar clusters are then merged, and after a series of merge operations all strings end up in the same cluster. The advantages of hierarchical clustering are that we can choose the number of groups and that the hierarchical tree shows the clustering steps clearly.
We use three examples to present our methodology, all of them string data: two benchmark examples and an engine parts dataset. Because different parts pass through different repair workstations, every part has its own repair procedure. Our study focuses on measuring the similarity between strings. We cluster the string data so that the clustering result can help the workstations work efficiently.
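The merge procedure described in the abstract can be sketched as follows. This is an illustrative sketch only: `agglomerative`, `LINKAGES`, and the sample routes are hypothetical names and data, and encoding each repair route as a string with one letter per workstation is an assumption about how such data might be represented, not the thesis's dataset.

```python
from itertools import combinations


def edit_distance(a, b):
    """Levenshtein distance between two strings (dynamic programming)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1, curr[j - 1] + 1,
                            prev[j - 1] + (ca != cb)))
        prev = curr
    return prev[-1]


# How the distance between two clusters is derived from member distances.
LINKAGES = {
    "single": min,                              # closest pair of members
    "complete": max,                            # farthest pair of members
    "average": lambda ds: sum(ds) / len(ds),    # mean over all pairs
}


def agglomerative(items, dist, linkage="single", n_clusters=1):
    """Start with one cluster per item and repeatedly merge the two closest
    clusters until n_clusters remain. Returns (clusters, merge_history)."""
    combine = LINKAGES[linkage]
    # Pairwise distance matrix, stored for index pairs (i, j) with i < j.
    d = {(i, j): dist(items[i], items[j])
         for i, j in combinations(range(len(items)), 2)}

    def cluster_distance(c1, c2):
        return combine([d[min(i, j), max(i, j)] for i in c1 for j in c2])

    clusters = [[i] for i in range(len(items))]
    history = []
    while len(clusters) > max(n_clusters, 1):
        # Find the pair of clusters with the smallest linkage distance.
        p, q = min(combinations(range(len(clusters)), 2),
                   key=lambda pq: cluster_distance(clusters[pq[0]],
                                                   clusters[pq[1]]))
        height = cluster_distance(clusters[p], clusters[q])
        history.append((clusters[p], clusters[q], height))
        merged = clusters[p] + clusters[q]
        clusters = [c for k, c in enumerate(clusters) if k not in (p, q)]
        clusters.append(merged)
    return [[items[i] for i in c] for c in clusters], history


if __name__ == "__main__":
    # Hypothetical repair routes: each letter stands for one workstation.
    routes = ["ABCD", "ABCE", "ABD", "XYZ", "XYW"]
    groups, merges = agglomerative(routes, edit_distance, "average", 2)
    print(groups)
    for left, right, height in merges:
        print(left, right, height)
```

The merge history plays the role of the hierarchical tree: each entry records which two clusters were joined and at what distance, so stopping at a different `n_clusters` corresponds to cutting the tree at a different height. On the sample routes, parts that share most of their workstation sequence end up in the same group.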
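Keywords (Chinese) ★ 資料採礦 (data mining)
★ 群集分析 (cluster analysis)
★ 相似度測量 (similarity measure)
★ 字串型資料 (string data)
★ 凝聚型階層式分群 (agglomerative hierarchical clustering)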
Keywords (English) ★ Data mining
★ Cluster analysis
★ Similarity measure
★ String data
★ Agglomerative clustering
Table of Contents
Chinese Abstract
Abstract
1. Introduction
1.1 Background/Motivation
1.2 Research Objectives
1.3 Research Methodology
2. Literature Review
2.1 Cluster Analysis
2.2 Group Technology
2.3 Similarity Measure
2.3.1 Euclidean distance
2.3.2 Jaro similarity
2.3.3 Jaro-Winkler similarity
2.3.4 Edit distance
2.4 Clustering Techniques
2.4.1 Hierarchical
2.4.2 Non-Hierarchical
2.5 Clustering Validity Assessment
3. Methodology
3.1 Jaro similarity
3.2 Normalized Edit Distance
3.2.1 Edit distance
3.2.2 Normalized Edit Distance
4. Numerical Example
4.1 Example 1
4.2 Example 2
4.3 Example 3
5. Conclusion and Future Research
5.1 Conclusion
5.2 Future Research
References
Appendix: The Procedure of Clustering
Advisor 曾富祥 (Fu-Shiang Tseng)   Approval date 2014-7-2