Clustering Flow-Type Data Using Agglomerative Hierarchical Clustering

DC field	Value	Language
DC.contributor	工業管理研究所 (Graduate Institute of Industrial Management)	zh_TW
DC.creator	廖秋閔	zh_TW
DC.creator	Chiu-Min Liao	en_US
dc.date.accessioned	2014-07-02T07:39:07Z
dc.date.available	2014-07-02T07:39:07Z
dc.date.issued	2014
dc.identifier.uri	http://ir.lib.ncu.edu.tw:88/thesis/view_etd.asp?URN=101426027
dc.contributor.department	工業管理研究所 (Graduate Institute of Industrial Management)	zh_TW
DC.description	國立中央大學 (National Central University)	zh_TW
DC.description	National Central University	en_US
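dc.description.abstract	With advances in technology, the volume of data has grown rapidly. Data mining is a method that can effectively help us organize thousands upon thousands of records, so that managers can extract relevant information from the data and make appropriate decisions. Cluster analysis is one of the methods most often used in data mining, and the basis for clustering comes from the features of the data. The data types most commonly used in cluster analysis are qualitative data and quantitative data, whereas flow data and string data have received little attention in the past; this study therefore proposes feasible clustering methods for flow data (string data). To measure similarity, we adopt two methods, Jaro similarity and edit distance, where a larger distance indicates a smaller similarity. From the defined similarity or distance we construct a similarity matrix and use it to cluster the data. We adopt agglomerative hierarchical clustering, including the single linkage (shortest distance), complete linkage (longest distance), and average linkage methods. In agglomerative hierarchical clustering, each record initially forms its own cluster; the most similar clusters are merged one at a time until all records belong to a single cluster. The advantages of hierarchical clustering are that the user can decide the number of clusters and that the clustering steps can be seen clearly in the dendrogram. All case data examined in this study are flow data (string data). Three examples are used: two are benchmark datasets widely used by many researchers, and the third consists of the parts awaiting repair that are produced when engines undergo overhaul. Because different repair parts pass through different repair stations, each part has its own repair process. This study mainly addresses the problem of measuring similarity between flow data (string data), so that the data can be clustered by similarity and managers can use the clustering results to schedule appropriate repair work or make other decisions.	zh_TW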
dc.description.abstract	Due to advances in science and technology, data volumes are growing rapidly. Data mining helps us organize thousands of records efficiently, so that managers can discover information they did not know before and make appropriate decisions. Cluster analysis, which groups data according to their features, is one of the most widely used methods in data mining. Most data applied to cluster analysis are qualitative or quantitative, and string data (flow data) are seldom discussed in cluster analysis. Therefore, in this research we propose possible clustering methods for string data. To measure similarity, we adopt two measures: Jaro similarity and edit distance. The larger the distance, the smaller the similarity. From the similarity or distance we define, we obtain a similarity matrix, and the data are clustered based on this matrix. In our study, we use agglomerative hierarchical clustering, namely single linkage, complete linkage, and average linkage, to group string data. At the start of agglomerative clustering, each string forms its own cluster; that is, every cluster contains exactly one string. The most similar clusters are then merged, and after a series of merge operations all strings end up in the same cluster. The advantages of hierarchical clustering are that we can decide the number of groups and that the clustering steps can be read clearly from the hierarchical tree. We use three examples, all consisting of string data, to present our methodology: two benchmark examples and an engine parts dataset. Because different parts pass through different repair workstations, every part has its own repair procedure. Our study focuses on the problem of computing similarity between strings. We cluster the string data so that the clustering results can help the workstations work efficiently.	en_US
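DC.subject	資料採礦 (Data mining)	zh_TW
DC.subject	群集分析 (Cluster analysis)	zh_TW
DC.subject	相似度測量 (Similarity measure)	zh_TW
DC.subject	字串型資料 (String data)	zh_TW
DC.subject	凝聚型階層式分群 (Agglomerative hierarchical clustering)	zh_TW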
DC.subject	Data mining	en_US
DC.subject	Cluster analysis	en_US
DC.subject	Similarity Measure	en_US
DC.subject	String data	en_US
DC.subject	Agglomerative clustering	en_US
DC.title	使用凝聚型階層式分群法對流程型資料分群 (Clustering Flow-Type Data Using Agglomerative Hierarchical Clustering)	zh_TW
dc.language.iso	zh-TW	zh-TW
DC.title	Agglomerative Hierarchical clustering with the string data	en_US
DC.type	博碩士論文 (master's or doctoral thesis)	zh_TW
DC.type	thesis	en_US
DC.publisher	National Central University	en_US
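The abstract names two string measures, Jaro similarity and edit distance, and builds a similarity (or distance) matrix from them. The sketch below is a minimal Python illustration assuming the textbook Jaro definition and unit-cost Levenshtein edit distance; the thesis's exact cost settings are not stated in this record, and converting Jaro similarity into a distance as 1 - Jaro is an assumption. The function names `jaro_similarity`, `edit_distance`, `distance_matrix` and the example routes are hypothetical.

```python
from typing import List


def jaro_similarity(s1: str, s2: str) -> float:
    """Standard Jaro similarity in [0, 1]; 1.0 means identical strings."""
    if s1 == s2:
        return 1.0
    len1, len2 = len(s1), len(s2)
    if len1 == 0 or len2 == 0:
        return 0.0
    # Characters only count as matching if they lie within this window.
    window = max(max(len1, len2) // 2 - 1, 0)
    matched1 = [False] * len1
    matched2 = [False] * len2
    matches = 0
    for i, ch in enumerate(s1):
        for j in range(max(0, i - window), min(i + window + 1, len2)):
            if not matched2[j] and s2[j] == ch:
                matched1[i] = matched2[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    # Transpositions: matched characters that appear in a different order.
    seq1 = [c for c, m in zip(s1, matched1) if m]
    seq2 = [c for c, m in zip(s2, matched2) if m]
    t = sum(a != b for a, b in zip(seq1, seq2)) / 2
    return (matches / len1 + matches / len2 + (matches - t) / matches) / 3


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance with unit cost for insert, delete and substitute."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute (or keep)
        prev = curr
    return prev[-1]


def distance_matrix(strings: List[str], measure: str = "edit") -> List[List[float]]:
    """Symmetric pairwise distances; 'jaro' uses 1 - Jaro similarity (assumption)."""
    n = len(strings)
    d = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if measure == "edit":
                v = float(edit_distance(strings[i], strings[j]))
            elif measure == "jaro":
                v = 1.0 - jaro_similarity(strings[i], strings[j])
            else:
                raise ValueError(f"unknown measure: {measure}")
            d[i][j] = d[j][i] = v
    return d


# Hypothetical repair routes: each letter stands for one workstation.
routes = ["ABCD", "ABDC", "XYZ"]
print(distance_matrix(routes, "edit"))  # [[0.0, 2.0, 4.0], [2.0, 0.0, 4.0], [4.0, 4.0, 0.0]]
print(distance_matrix(routes, "jaro"))
```

Note how the two measures treat the swapped stations in "ABCD" and "ABDC" differently: edit distance charges two substitutions, while Jaro counts the swap as a single transposition and stays close to 1.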

Thesis 101426027: full metadata record
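The agglomerative procedure described in the abstract (every string starts as its own cluster, and the closest clusters are merged repeatedly) can be sketched over any precomputed distance matrix. This is a naive O(n^3) illustration, not the thesis's implementation: the function name `agglomerative`, the rule of stopping once `k` clusters remain, and the toy matrix are assumptions. Single, complete and average linkage use the shortest, longest and average pairwise distance between two clusters.

```python
from typing import List, Optional, Tuple

Cluster = List[int]
Merge = Tuple[Cluster, Cluster, float]


def agglomerative(dist: List[List[float]], k: int,
                  linkage: str = "single") -> Tuple[List[Cluster], List[Merge]]:
    """Merge the closest pair of clusters until k clusters remain.

    Returns the final clusters (sorted item indices) and the merge history,
    i.e. the information a dendrogram would display.
    """
    n = len(dist)
    if not 1 <= k <= n:
        raise ValueError("k must be between 1 and the number of items")
    if linkage not in ("single", "complete", "average"):
        raise ValueError(f"unknown linkage: {linkage}")

    def cluster_distance(a: Cluster, b: Cluster) -> float:
        ds = [dist[i][j] for i in a for j in b]
        if linkage == "single":      # shortest pairwise distance
            return min(ds)
        if linkage == "complete":    # longest pairwise distance
            return max(ds)
        return sum(ds) / len(ds)     # average pairwise distance

    clusters: List[Cluster] = [[i] for i in range(n)]  # every item starts alone
    merges: List[Merge] = []
    while len(clusters) > k:
        best: Optional[Tuple[float, int, int]] = None
        for x in range(len(clusters)):
            for y in range(x + 1, len(clusters)):
                d = cluster_distance(clusters[x], clusters[y])
                if best is None or d < best[0]:
                    best = (d, x, y)
        d, x, y = best
        merges.append((clusters[x], clusters[y], d))
        merged = sorted(clusters[x] + clusters[y])
        clusters = [c for idx, c in enumerate(clusters) if idx not in (x, y)]
        clusters.append(merged)
    return sorted(clusters), merges


# Hypothetical 4x4 distance matrix: items 0,1 are close, items 2,3 are close.
D = [[0, 1, 4, 5],
     [1, 0, 5, 4],
     [4, 5, 0, 1],
     [5, 4, 1, 0]]
print(agglomerative(D, k=2, linkage="single")[0])  # [[0, 1], [2, 3]]
for method in ("single", "complete", "average"):
    # Height of the final merge differs by linkage method.
    print(method, agglomerative(D, k=1, linkage=method)[1][-1][2])
```

Stopping at `k` clusters mirrors the abstract's point that the analyst chooses the number of groups, and the returned merge history holds the same information a dendrogram draws. For larger data, SciPy's `scipy.cluster.hierarchy.linkage` with `method="single"`, `"complete"` or `"average"` computes the full hierarchy more efficiently.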