Clustering Algorithms with Ability of Semantic Description

Author

李育璇 (Yu-Hsuan Lee)

Department

Department of Information Management

Thesis Title

Clustering algorithms with ability of semantic description

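Related Theses

★ A study of business intelligence in the retail industry
★ Construction of an anomaly detection system for landline telephone calls
★ Applying data mining techniques to the analysis of school grades and General Scholastic Ability Test results: the case of a vocational high school food and beverage management department
★ Using data mining techniques to improve wealth management effectiveness: the case of a bank
★ Evaluation and analysis of wafer manufacturing yield models: the case of a domestic DRAM fab
★ A study of applying business intelligence analysis to student grades
★ Using data mining techniques to build a prediction model of academic achievement for upper-grade elementary school students
★ Applying data mining techniques to build a risk assessment model for motorcycle loans: the case of Company A
★ Applying performance indicator evaluation to improve quality assurance in R&D design
★ Applying machine learning to text résumés and personality traits to improve hiring quality
★ A general framework based on relational genetic algorithms for solving set partitioning problems with constraint handling
★ Generalized knowledge mining in relational databases
★ Decision tree construction considering delays in obtaining attribute values
★ A method for finding preference graphs in sequence data, applied to group ranking problems
★ Using partitional clustering algorithms to find consensus groups for group decision problems
★ A novel method for ordered consensus groups applied to group decision problems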

Files

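The author has authorized immediate open access to this electronic thesis.
Electronic full texts that have reached their open-access date are licensed to users only for personal, non-profit searching, reading, and printing for the purpose of academic research.
Please comply with the relevant provisions of the Copyright Act of the Republic of China; do not reproduce, distribute, adapt, repost, or broadcast the work without authorization, to avoid breaking the law.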

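Abstract (translated from Chinese)

Clustering is an important area of data mining, but in previous research most clustering methods could not provide a semantic description of their clustering results. CLTree overcomes this shortcoming, but it still has a weakness: the attributes used to build the semantic description are exactly the same as the attributes used for clustering. In practice, the attributes used to build the semantic description may differ partly, or even entirely, from the clustering attributes. For example, when a web designer redesigns a site's architecture, pages that are frequently accessed together have similar characteristics, so the records of pages accessed together can serve as the clustering attributes for grouping the pages. The classification directory used to build the site architecture, however, cannot also be based on these co-access records, because such a directory would be hard to understand and would carry no semantic meaning. Instead, we usually choose other meaningful attributes for classification, such as the topic of the page content, its keywords, or the time it was written. In this study, the attributes used for semantic description are called classification attributes.
Therefore, this study extends the concept of CLTree and develops three clustering algorithms with the ability of semantic description, in which the classification attributes and the clustering attributes need not be identical.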

Abstract (English)

In previous work, most clustering methods cannot give each cluster a semantic description. CLTree, a novel clustering method, solves this problem; however, the attributes it uses for clustering are the same as those it uses to establish the semantic descriptions. In practice, the attributes used in clustering may differ from those used to establish semantic descriptions. For example, when redesigning the architecture of a web site, we may use the browsing log as the clustering attributes, since pages that are often accessed together are likely to have similar properties. At the same time, we would choose other attributes, such as the subjects, keywords, or last-modified times of the pages, to build the hierarchical directory, because the browsing log is meaningless for interpreting the site's architecture. We use the term classification attributes for the attributes used to establish semantic descriptions.
In this thesis, we extend the concept of CLTree and develop three clustering algorithms with the ability of semantic description. These algorithms allow the classification attributes to differ from the clustering attributes.
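The abstract's central distinction, clustering on one set of attributes and describing the resulting clusters with another, can be illustrated with a short Python sketch. It is not any of the three algorithms developed in the thesis; it is a simple two-stage baseline under stated assumptions. Plain k-means (a swapped-in technique) groups pages by hypothetical co-access counts (the clustering attributes), and then a single entropy-minimizing binary threshold, similar in spirit to the splits used in C4.5 [5], is searched over hypothetical classification attributes (`modified_day`, `keyword_count`). All function names and data are illustrative.

```python
import math
from collections import Counter


def kmeans(points, k, iters=10):
    """Plain Lloyd's k-means on the *clustering* attributes.
    Seeds with the first k points (a simplification chosen for determinism)."""
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iters):
        # Assign each point to its nearest centroid (squared Euclidean distance).
        labels = [
            min(range(k), key=lambda j: sum((a - b) ** 2 for a, b in zip(p, centroids[j])))
            for p in points
        ]
        # Recompute each centroid as the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def entropy(labels):
    """Shannon entropy (in bits) of a list of cluster labels."""
    n = len(labels)
    return sum((c / n) * math.log2(n / c) for c in Counter(labels).values())


def best_split(rows, labels, attr):
    """Best binary threshold on one *classification* attribute: the midpoint
    between adjacent distinct values that minimizes weighted child entropy."""
    values = sorted({r[attr] for r in rows})
    n = len(rows)
    best = None
    for lo, hi in zip(values, values[1:]):
        t = (lo + hi) / 2
        left = [lab for r, lab in zip(rows, labels) if r[attr] <= t]
        right = [lab for r, lab in zip(rows, labels) if r[attr] > t]
        w = len(left) / n * entropy(left) + len(right) / n * entropy(right)
        if best is None or w < best[1]:
            best = (t, w)
    return best


def best_description_split(rows, labels, attrs):
    """Return (weighted entropy, attribute, threshold) with the lowest entropy."""
    candidates = []
    for a in attrs:
        s = best_split(rows, labels, a)
        if s is not None:
            candidates.append((s[1], a, s[0]))
    return min(candidates) if candidates else None


# Hypothetical data: six web pages.
# Clustering attributes: how often each page was accessed in three sessions.
co_access = [[5, 4, 0], [0, 1, 5], [4, 5, 0], [1, 0, 4], [5, 5, 1], [0, 0, 5]]
# Classification attributes: hypothetical, human-readable page properties.
page_info = [
    {"modified_day": 3, "keyword_count": 2},
    {"modified_day": 20, "keyword_count": 3},
    {"modified_day": 5, "keyword_count": 3},
    {"modified_day": 18, "keyword_count": 2},
    {"modified_day": 2, "keyword_count": 3},
    {"modified_day": 25, "keyword_count": 2},
]

labels = kmeans(co_access, k=2)
result = best_description_split(page_info, labels, ["modified_day", "keyword_count"])
print(labels)  # [0, 1, 0, 1, 0, 1]
print(result)  # (0.0, 'modified_day', 11.5)
```

Here the rule `modified_day <= 11.5` separates the two co-access clusters perfectly, so it can serve as a readable description of cluster 0 even though `modified_day` was never used for clustering. A complete method would recurse on each child node and apply stopping criteria; this sketch stops after one split.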

Keywords

★ Data Mining
★ Clustering

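Table of Contents

1. Introduction
1.1. An Introduction to Clustering
1.2. Shortcomings of Traditional Clustering Methods
1.3. CLTree: Addressing the Shortcomings of Traditional Algorithms
1.4. Shortcomings of CLTree
2. Problem Description and Definitions
3. Semantic Clustering Methods
3.1. Decision Tree
3.2. The Entropy Method
3.2.1. Measure: Entropy
3.2.2. Splitting Method
3.3. The Equal-Width Binary Method
3.3.1. Measure: Diversity
3.3.2. Splitting Method, Step 1: Splitting Nodes
3.3.3. Splitting Method, Step 2: Merging Nodes
3.4. The Equal-Frequency Binary Method
3.5. Stopping Criteria
3.6. Framework of the Semantic Clustering Algorithm
3.7. Worked Example
4. Simulation Results
4.1. Simulation Environment
4.2. Generation of Simulated Data
4.3. Simulation Procedure
4.4. Parameter Tuning
4.4.1. Equal-Width Binary Method
4.4.2. Equal-Frequency Binary Method
4.4.3. Entropy Method
4.4.4. CLTree
4.5. Simulation Results
4.5.1. Efficiency
4.5.2. Accuracy
4.5.3. Accuracy of the Number of Clusters
4.5.4. Simulation Results on Credit Card Data
4.6. Summary
5. Conclusion
References
Appendix A
Appendix B
Appendix C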

References

[1] A.K. Jain, M.N. Murty, and P.J. Flynn, Data clustering: a review, ACM Computing Surveys, 31(3):264-323, 1999.
[2] B. Liu, Y. Xia, and P. Yu, Clustering through decision tree construction, In SIGMOD-00, 2000.
[3] C.H. Cheng, A.W. Fu, and Y. Zhang, Entropy-based subspace clustering for mining numerical data, In KDD-99, pp. 84-93, 1999.
[4] J. Han and M. Kamber, Data Mining: Concepts and Techniques, Morgan Kaufmann, 2000.
[5] J.R. Quinlan, C4.5: Programs for Machine Learning, Morgan Kaufmann, 1993.
[6] J.R. Quinlan, Improved use of continuous attributes in C4.5, Journal of Artificial Intelligence Research, 4:77-90, 1996.
[7] M. Halkidi, Y. Batistakis, and M. Vazirgiannis, Clustering algorithms and validity measures, Tutorial paper, In Proc. of the SSDBM Conference, pp. 3-22, Virginia, USA, 2001.
[8] M. Kantardzic, Data Mining: Concepts, Models, Methods, and Algorithms, IEEE, 2002.
[9] N. Ye and X. Li, A scalable, incremental learning algorithm for classification problems, Computers & Industrial Engineering Journal, 43(4):677-692, 2002.
[10] R. Agrawal, J. Gehrke, D. Gunopulos, and P. Raghavan, Automatic subspace clustering of high dimensional data for data mining applications, In Proc. of the ACM SIGMOD, 1999.

Advisor

陳彥良 (Yen-Liang Chen)

Approval Date

2003-06-18
