Semantic Tree II: A Clustering Algorithm with Ability of Semantic Description


Please use this persistent URL to cite or link to this item: http://ir.lib.ncu.edu.tw/handle/987654321/13118

Title:	Semantic Tree II: A Clustering Algorithm with Ability of Semantic Description
Author:	饒祐安; Yu-An Jao
Contributor:	Graduate Institute of Information Management
Keywords:	Data Mining; Clustering
Date:	2004-05-31
Uploaded:	2009-09-22 15:24:05 (UTC+8)
Publisher:	National Central University Library
Abstract:	Clustering is an important topic in data mining. Existing clustering methods have two shortcomings: (1) they cannot predict which cluster a new data item belongs to, and (2) the resulting clusters lack semantic descriptions. [Liu et al, 3] proposed the CLTree clustering method, which clusters data by constructing a decision tree. The leaf nodes of the final decision tree form the clustering result, so each cluster obtains a unique semantic description from the tree. CLTree nevertheless has weaknesses. Semantic Tree [李育璇] points out that the clustering attributes CLTree uses in the clustering space are exactly the same set as the classification attributes used to build the decision tree. This restriction limits many practical applications: for example, a bank may want to cluster credit-card customers by their spending behavior while classifying the resulting consumer groups by their characteristics. Although Semantic Tree overcomes this weakness of CLTree and performs well in simulation experiments, it is, like CLTree, a density-based clustering method; that is, both cluster only numerical data and cannot handle nominal (categorical) data. Real-world data, however, often mixes numerical and nominal attributes, and a clustering method limited to numerical data cannot meet practical needs. This study therefore proposes the Semantic Tree II clustering algorithm, which handles numerical and nominal data simultaneously and provides semantic descriptions of the clustering results. Simulation experiments show that Semantic Tree II can indeed handle real-world data.

Clustering analysis is an important task in data mining. Because clustering techniques retain the grouping they produce, they make the clustered objects easier to manage and use. Yet they share two common shortcomings: (1) they cannot predict the cluster of a new object, and (2) they have difficulty giving a clear semantic description of each cluster. In [Liu et al, 3], a decision tree called CLTree, built with the decision-tree techniques used in classification, represents a clustering result. That technique uses the same attribute set both for partitioning the dataset and for constructing the decision tree. In practice, however, the two attribute sets may differ. [Lee, 1] proposed an improved technique, Semantic Tree, which allows different attribute sets for clustering and partitioning and thus broadens the range of applications. A drawback of both techniques is that they are density-based, i.e., they apply only to numerical attributes, which rules them out for categorical datasets. In this thesis, we develop a new technique based on a k-nearest-neighbor graph that supports both numerical and categorical attributes. The technique combines the convenience of unsupervised learning with the predictive ability of decision trees.
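The abstract states only that Semantic Tree II builds on a k-nearest-neighbor graph that accepts both numerical and categorical attributes. It does not specify the distance measure or how clusters are read off the graph. The sketch below is therefore an illustration under explicit assumptions, not the thesis's algorithm. It assumes a Gower-style mixed distance (range-scaled absolute difference for numeric attributes, 0/1 mismatch for categorical ones) and treats the connected components of the k-NN graph, with edges taken as undirected, as clusters. The names `gower_distance`, `attribute_ranges`, `knn_graph`, and `graph_components` are hypothetical.

```python
from typing import Dict, List, Sequence, Union

Value = Union[float, str]


def gower_distance(a: Sequence[Value], b: Sequence[Value], ranges: Sequence[float]) -> float:
    """Assumed Gower-style distance: mean of per-attribute dissimilarities in [0, 1]."""
    total = 0.0
    for x, y, r in zip(a, b, ranges):
        if isinstance(x, str) or isinstance(y, str):
            total += 0.0 if x == y else 1.0          # categorical: simple mismatch
        else:
            total += abs(x - y) / r if r > 0 else 0.0  # numeric: scaled by attribute range
    return total / len(a)


def attribute_ranges(data: Sequence[Sequence[Value]]) -> List[float]:
    """Range of each numeric column; 0.0 placeholder for categorical columns.
    Assumes every column is homogeneous (all numbers or all strings)."""
    ranges = []
    for j in range(len(data[0])):
        col = [row[j] for row in data]
        ranges.append(0.0 if isinstance(col[0], str) else max(col) - min(col))
    return ranges


def knn_graph(data: Sequence[Sequence[Value]], k: int) -> Dict[int, List[int]]:
    """Map each record index to the indices of its k nearest other records."""
    ranges = attribute_ranges(data)
    graph = {}
    for i, row in enumerate(data):
        dists = sorted(
            (gower_distance(row, other, ranges), j)
            for j, other in enumerate(data) if j != i
        )
        graph[i] = [j for _, j in dists[:k]]
    return graph


def graph_components(graph: Dict[int, List[int]]) -> List[List[int]]:
    """Hypothetical cluster extraction: connected components with k-NN edges made undirected."""
    adj = {i: set(nbrs) for i, nbrs in graph.items()}
    for i, nbrs in graph.items():
        for j in nbrs:
            adj[j].add(i)
    seen, comps = set(), []
    for start in sorted(adj):
        if start in seen:
            continue
        seen.add(start)
        stack, comp = [start], []
        while stack:  # iterative depth-first search
            node = stack.pop()
            comp.append(node)
            for nb in adj[node]:
                if nb not in seen:
                    seen.add(nb)
                    stack.append(nb)
        comps.append(sorted(comp))
    return comps


if __name__ == "__main__":
    # Toy records: (monthly spending, card tier), mixing a numeric and a categorical attribute.
    data = [(25.0, "gold"), (27.0, "gold"), (60.0, "basic"), (62.0, "basic")]
    print(graph_components(knn_graph(data, k=1)))  # [[0, 1], [2, 3]]
```

A graph-based neighborhood suits mixed data because it needs only pairwise distances, not the numeric density estimates that restrict CLTree and Semantic Tree to numerical attributes. Any distance that handles categorical values could replace the Gower-style one assumed here.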
Appears in Collections:	[Graduate Institute of Information Management] Theses & Dissertations

All items in NCUIR are protected by their original copyright.
