Author |
Kang-Huan Peng (彭康桓) |
Department |
Department of Computer Science and Information Engineering |
Thesis title |
A Decision Generation Algorithm with Hyper-Rectangle Covers Based on Granular Computing (基於粒化計算結合超立方體覆蓋之決策產生演算法)
|
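Files |
- This electronic thesis is authorized for immediate open access.
- The open-access full text is licensed to users only for personal, non-profit searching, reading, and printing for the purpose of academic research.
- Please comply with the Copyright Act of the Republic of China. Do not reproduce, distribute, adapt, repost, or broadcast the work without authorization, to avoid breaking the law.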
|
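Abstract (Chinese) |
In knowledge discovery and data mining, discretization and feature selection are essential data preprocessing techniques. Discretization transforms continuous attributes into discrete ones, converting quantitative data into qualitative data by grouping continuous values. Feature selection chooses the relevant attributes during model construction and excludes unnecessary or unsuitable data in order to improve the effectiveness or efficiency of the results. These methods yield a more precise and more compact representation of the data, which many classification algorithms can use to achieve better performance or classification accuracy.
Among classification algorithms designed to learn from categorical data, the decision generation algorithm based on distributed granular computing is highly scalable and achieves a high recognition rate. For such an algorithm to handle numeric data, extracting knowledge from the continuous attributes in advance is the key step. Most earlier classification algorithms for categorical data map the numeric data into a nonlinear space, to avoid the problem that numeric data are far more scattered than non-numeric data.
This thesis therefore attempts to model numeric data. We find that the heuristic simulated-crystallization method based on density-based hyper-rectangle covers can supply high-quality rules to the decision generation algorithm based on distributed granular computing. We add the modeled hyper-rectangle cover rules to that algorithm to improve its handling of numeric data. We also add feature selection from a granular computing perspective to the preprocessing stage, to improve classification accuracy on numeric and mixed data.
We run the experiments in a supervised learning setting and compare the results with popular conventional algorithms. The results confirm that the algorithm performs relatively well on both numeric and mixed data. |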
Abstract (English) |
Discretization and feature selection are essential preprocessing techniques in many data mining and knowledge discovery tasks. The main goal of discretization is to transform a set of quantitative data into qualitative data. The main goal of feature selection is to select the relevant attributes for model reduction, obtaining an optimal attribute subset under a chosen measure, in order to achieve better accuracy or efficiency. With these preprocessing techniques, the preprocessed data become a simplified and concise representation of the information, which can be applied to many classification algorithms.
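As a concrete illustration of turning quantitative data into qualitative data, the following is a minimal sketch of equal-width binning. It is a generic textbook technique and is not the discretization method proposed in this thesis; the function name `equal_width_bins`, the sample values, and the bin labels are all hypothetical.

```python
from typing import List, Sequence


def equal_width_bins(values: Sequence[float], n_bins: int) -> List[int]:
    """Map each numeric value to a bin index in the range [0, n_bins - 1]."""
    if n_bins < 1:
        raise ValueError("n_bins must be at least 1")
    lo, hi = min(values), max(values)
    if hi == lo:
        # All values are identical, so a single bin is enough.
        return [0 for _ in values]
    width = (hi - lo) / n_bins
    # The maximum value would fall just past the last bin, so clamp it.
    return [min(int((v - lo) / width), n_bins - 1) for v in values]


# Hypothetical quantitative attribute and qualitative labels.
ages = [18.0, 22.0, 35.0, 47.0, 60.0]
labels = ["low", "mid", "high"]
binned = [labels[i] for i in equal_width_bins(ages, 3)]
print(binned)
```

Equal-width binning ignores the class labels. Supervised discretizers use the labels to place the cut points, which is closer in spirit to the approach the abstract describes.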
Distributed Decision Generation Based on Granular Computing (DGAGC), a classification algorithm recently developed at National Central University, Taiwan, has an excellent recognition rate on categorical datasets. However, it has a relatively low recognition rate on numerical data. To improve DGAGC, we propose a new algorithm for the discretization of numerical data. First, the proposed algorithm decides whether an attribute should be treated as categorical or numerical in DGAGC. Second, the numerical data are preprocessed by the SC algorithm, a classification algorithm for numerical data recently developed at National Central University, Taiwan. Third, the SC algorithm transforms the numerical data into the corresponding categorical data. Finally, the transformed data, together with the untransformed categorical data, are handled by DGAGC for data classification. We compare the proposed classification algorithm with other well-known classification algorithms using datasets from the UCI and KEEL databases. The results show that the proposed algorithm improves DGAGC in handling numerical data. The results also show that the proposed algorithm achieves a relatively good recognition rate when compared with existing popular classification algorithms. |
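The four steps in the English abstract can be sketched as a small pipeline. This is only an illustration under loud assumptions: the attribute test (a distinct-value threshold), the equal-width `discretize` function, and the majority-vote `learn_rules` function are simple stand-ins, not the confidence-based attribute classification, the SC algorithm, or DGAGC themselves, whose details are not given in this record. All names and data below are hypothetical.

```python
from collections import Counter, defaultdict
from typing import Any, Dict, List, Tuple

Row = Dict[str, Any]
Key = Tuple[Tuple[str, Any], ...]


def is_numeric_attribute(rows: List[Row], name: str, max_distinct: int = 3) -> bool:
    """Step 1 (assumed heuristic): numeric values with many distinct levels."""
    values = [r[name] for r in rows]
    all_numbers = all(
        isinstance(v, (int, float)) and not isinstance(v, bool) for v in values
    )
    return all_numbers and len(set(values)) > max_distinct


def discretize(rows: List[Row], name: str, n_bins: int = 2) -> List[Row]:
    """Steps 2-3 (stand-in for SC): replace numbers with equal-width bin labels."""
    values = [float(r[name]) for r in rows]
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins or 1.0  # avoid a zero width for constant columns
    out = []
    for r, v in zip(rows, values):
        idx = min(int((v - lo) / width), n_bins - 1)
        out.append({**r, name: f"bin{idx}"})
    return out


def learn_rules(rows: List[Row], target: str) -> Dict[Key, Any]:
    """Step 4 (stand-in for DGAGC): majority class per attribute-value combination."""
    votes: Dict[Key, Counter] = defaultdict(Counter)
    for r in rows:
        key = tuple(sorted((k, v) for k, v in r.items() if k != target))
        votes[key][r[target]] += 1
    return {key: c.most_common(1)[0][0] for key, c in votes.items()}


def run_pipeline(rows: List[Row], target: str) -> Tuple[List[Row], Dict[Key, Any]]:
    """Discretize every attribute judged numeric, then learn categorical rules."""
    for name in [k for k in rows[0] if k != target]:
        if is_numeric_attribute(rows, name):
            rows = discretize(rows, name)
    return rows, learn_rules(rows, target)


# Hypothetical mixed dataset: one numeric and one categorical attribute.
data = [
    {"temp": 36.5, "cough": "no", "label": "healthy"},
    {"temp": 36.8, "cough": "no", "label": "healthy"},
    {"temp": 38.9, "cough": "yes", "label": "sick"},
    {"temp": 39.4, "cough": "yes", "label": "sick"},
    {"temp": 37.0, "cough": "yes", "label": "healthy"},
]
transformed, rules = run_pipeline(data, "label")
for condition, decision in rules.items():
    print(condition, "->", decision)
```

On this toy data only `temp` is discretized, while `cough` passes through unchanged, and three rules are learned. The point of the sketch is the data flow: numeric attributes become categorical before a rule learner designed for categorical data sees them.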
Keywords |
★ Discretization ★ Feature Selection ★ Granular Computing ★ Distributed Computing ★ Data Mining ★ Classification Algorithm |
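Table of contents |
Abstract (Chinese)
Abstract (English)
Table of Contents
List of Figures
List of Tables
Chapter 1 Introduction
1-1 Preface
1-2 Background
1-3 Problem Definition and Implementation Goals
1-4 Contributions
1-5 Thesis Organization
Chapter 2 Related Work
2-1 Distributed Decision Generation Algorithm Based on Granular Computing
2-2 Heuristic Algorithm Based on Density-Based Hyper-Rectangle Covers
2-3 Popular Classifiers
Chapter 3 A Decision Generation Algorithm with Hyper-Rectangle Covers Based on Granular Computing
3-1 Algorithm Execution Flow
3-2 Numerical and Categorical Attributes
3-3 Confidence-Based Attribute Classification
3-4 Preprocessing and Prediction in the DGAHGC Algorithm
3-5 The Decision Generation Algorithm with Hyper-Rectangle Covers Based on Granular Computing
3-6 An Example of the Decision Generation Algorithm with Hyper-Rectangle Covers Based on Granular Computing
3-7 Time Complexity of the Decision Generation Algorithm with Hyper-Rectangle Covers Based on Granular Computing
Chapter 4 Experimental Results and Analysis
4-1 Experimental Environment
4-2 Parameter Analysis of DGAHGC
4-3 Experimental Accuracy
Chapter 5 Conclusions and Future Directions
References |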
Advisor |
Wei-Jen Wang (王尉任)
|
Approval date |
2013-8-22 |