Master's/Doctoral Thesis 101426027: Full Metadata Record

DC field | Value | Language
DC.contributor | Graduate Institute of Industrial Management | zh_TW
DC.creator | 廖秋閔 | zh_TW
DC.creator | Chiu-Min Liao | en_US
dc.date.accessioned | 2014-7-2T07:39:07Z
dc.date.available | 2014-7-2T07:39:07Z
dc.date.issued | 2014
dc.identifier.uri | http://ir.lib.ncu.edu.tw:88/thesis/view_etd.asp?URN=101426027
dc.contributor.department | Graduate Institute of Industrial Management | zh_TW
DC.description | 國立中央大學 | zh_TW
DC.description | National Central University | en_US
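dc.description.abstract | Advances in technology have caused data volumes to grow rapidly. Data mining is an effective way to organize very large amounts of data, allowing managers to extract relevant information and make appropriate decisions. Cluster analysis is one of the most commonly used data mining methods, and it groups data according to their features. The data types most often used in cluster analysis are qualitative (categorical) data and quantitative (numerical) data, whereas process-type data, or string data, have seldom been discussed. This study therefore proposes feasible clustering methods for process-type (string) data. To measure similarity, we adopt two measures, Jaro similarity and edit distance, where a larger distance indicates a smaller similarity. From the defined similarity or distance we construct a similarity matrix and use it to cluster the data. We use agglomerative hierarchical clustering, including the single-linkage (shortest-distance), complete-linkage (longest-distance), and average-linkage methods. In agglomerative hierarchical clustering, each record starts as its own cluster; the most similar clusters are merged one at a time until all records belong to a single cluster. The advantages of hierarchical clustering are that the number of clusters can be chosen freely and that the dendrogram shows the clustering steps clearly. All case data examined in this study are process-type (string) data. Three examples are used: two are benchmark datasets widely used by other researchers, and the third consists of the parts awaiting repair that are generated when an engine is overhauled. Because different parts pass through different repair stations, each part has its own repair process. This study mainly addresses the problem of measuring similarity between process-type (string) data so that the data can be clustered by similarity, allowing managers to schedule repair work appropriately or make other decisions based on the clustering results. | zh_TW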
dc.description.abstract | Owing to advances in science and technology, data volumes are growing rapidly. Data mining helps us organize large amounts of data efficiently, so that managers can discover information they did not previously know and make appropriate decisions. Cluster analysis, which groups data according to their features, is one of the most widely used data mining methods. Most data used in cluster analysis are qualitative or quantitative, and string data (flow data) are seldom discussed. In this research we therefore propose clustering methods for string data. We adopt two similarity measures: Jaro similarity and edit distance. The larger the distance, the smaller the similarity. From the similarity or distance that we define, we obtain a similarity matrix, and the data are clustered on the basis of this matrix. We consider agglomerative hierarchical clustering with single linkage, complete linkage, and average linkage to group string data. At the start of agglomerative clustering, each string is in its own cluster, so every cluster contains exactly one string. The most similar clusters are then merged, and after a series of merge operations all strings end up in the same cluster. The advantages of hierarchical clustering are that we can choose the number of groups and that the hierarchical tree shows the clustering steps clearly. We use three examples of string data to present our methodology: two benchmark examples and an engine parts dataset. Because different parts pass through different repair workstations, every part has its own repair procedure. Our study focuses on the problem of computing the similarity between strings. We want to cluster the string data so that the clustering result can help the workstations work efficiently. | en_US
DC.subject | Data mining | zh_TW
DC.subject | Cluster analysis | zh_TW
DC.subject | Similarity measure | zh_TW
DC.subject | String data | zh_TW
DC.subject | Agglomerative hierarchical clustering | zh_TW
DC.subject | Data mining | en_US
DC.subject | Cluster analysis | en_US
DC.subject | Similarity Measure | en_US
DC.subject | String data | en_US
DC.subject | Agglomerative clustering | en_US
DC.title | Clustering Process-Type Data Using Agglomerative Hierarchical Clustering | zh_TW
dc.language.iso | zh-TW | zh-TW
DC.title | Agglomerative Hierarchical clustering with the string data | en_US
DC.type | Master's/doctoral thesis | zh_TW
DC.type | thesis | en_US
DC.publisher | National Central University | en_US
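
The procedure summarized in the abstract (pairwise string distances, then repeated merging of the closest clusters) can be illustrated with a minimal Python sketch. It is not the thesis's implementation: the function names `edit_distance` and `agglomerate`, the station-code strings in the usage note, and the choice to stop at a requested number of clusters `k` are all assumptions made for illustration. Only edit distance is shown; Jaro similarity is omitted.

```python
from itertools import combinations


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: the minimum number of single-character
    insertions, deletions, and substitutions that turn a into b."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute, or free match
            ))
        prev = curr
    return prev[-1]


# How the distance between two clusters is derived from the pairwise
# distances between their members (hypothetical lookup table).
LINKAGES = {
    "single": min,                            # shortest-distance method
    "complete": max,                          # longest-distance method
    "average": lambda ds: sum(ds) / len(ds),  # average-distance method
}


def agglomerate(strings, k, linkage="single"):
    """Start with one cluster per string and merge the two closest
    clusters repeatedly until only k clusters remain."""
    combine = LINKAGES[linkage]
    clusters = [[s] for s in strings]

    def cluster_distance(pair):
        i, j = pair
        return combine([edit_distance(x, y)
                        for x in clusters[i] for y in clusters[j]])

    while len(clusters) > k:
        # Pick the pair of clusters with the smallest linkage distance.
        i, j = min(combinations(range(len(clusters)), 2), key=cluster_distance)
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]  # j > i, so index i is unaffected
    return clusters
```

As a usage example, treating each letter as a made-up repair-station code, `agglomerate(["ABCD", "ABCE", "XYZ", "XYW"], 2)` groups the two routes that differ in one station together and returns `[["ABCD", "ABCE"], ["XYZ", "XYW"]]`. This naive version recomputes every pairwise distance on each merge, which is fine for a small illustration but would be replaced by a cached distance matrix for real data.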
