Master's/Doctoral Thesis 104453017 Full Metadata Record

DC Field / Language
DC.contributor Department of Information Management, In-service Master Program zh_TW
DC.creator 歐先弘 zh_TW
DC.creator Hsien-Hung Leo en_US
dc.date.accessioned 2017-08-21T07:39:07Z
dc.date.available 2017-08-21T07:39:07Z
dc.date.issued 2017
dc.identifier.uri http://ir.lib.ncu.edu.tw:88/thesis/view_etd.asp?URN=104453017
dc.contributor.department Department of Information Management, In-service Master Program zh_TW
DC.description 國立中央大學 zh_TW
DC.description National Central University en_US
dc.description.abstract Feature selection is an important data-preprocessing step in data mining. Given a dataset, its aim is to remove irrelevant or redundant features through feature selection techniques. In the existing literature, no study has compared each class of feature selection method across the three data types (numerical, categorical, and mixed). This study therefore selects three feature selection techniques, Information Gain (IG), Genetic Algorithm (GA), and Decision Tree (DT), and examines classification performance on the three data types both without and with feature selection. Forty real-world datasets from different domains were obtained from UCI, and the experimental results are validated on six classifiers: Support Vector Machines (SVM), K-Nearest Neighbor (KNN), Decision Tree (DT), Artificial Neural Network (ANN), AdaBoost, and Bagging. By comparing accuracy, we aim to identify which feature selection method, on which kind of dataset, improves the performance of which classification algorithm, as a reference for analysts. According to the results, categorical data achieves its best accuracy at the baseline with any single classifier or with AdaBoost, so no further feature selection step is recommended; for categorical data under the Bagging ensemble with KNN as the base classifier, accuracy after DT-based feature selection is better than with the other algorithms. For mixed-type data, feature selection by GA or DT, though not by IG, yields better accuracy than the baseline; likewise, for numerical data, GA or DT, though not IG, outperforms the baseline. Numerical data achieves its best baseline accuracy with MLP, in which case no further feature selection step is needed. For a given data type, once a classifier has been chosen, the feature selection method with the best accuracy in this study can be tried first. zh_TW
dc.description.abstract Feature selection is an important process for pattern recognition applications. Its purpose is to avoid degrading the classifier's performance: the removed feature(s) should be redundant, irrelevant, or of the least possible use. There is no related study comparing different feature selection methods across different data types, such as categorical, numerical, and mixed-type datasets, in terms of classification performance. Therefore, in this thesis, three major feature selection methods are chosen, namely Information Gain (IG), Genetic Algorithm (GA), and Decision Tree (DT), and the research aim is to compare the classification accuracy obtained with these feature selection methods over different types of datasets. We demonstrate this through extensive experiments on 40 real-world datasets from UCI. In addition, six classification techniques are compared, including Support Vector Machines (SVM), K-Nearest Neighbor (KNN), Decision Tree (DT), Artificial Neural Network (ANN), AdaBoost, and Bagging. The experimental results show that the need for feature selection over categorical datasets is not strong; however, Bagging-based KNN and DT can increase performance. For mixed-type and numerical datasets, feature selection with GA and DT performs better. In particular, if MLP is used, there is no need to perform feature selection for numerical datasets. We demonstrate that different feature selection methods can increase the accuracy of some classification models. en_US
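The abstracts above rank features by Information Gain before classification. As a minimal, standalone sketch using only the Python standard library, the following shows the textbook definition of information gain for a single discrete feature (this is the standard formula, not the thesis's actual implementation or parameter settings):

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy H(Y) of a sequence of class labels, in bits."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(feature_values, labels):
    """IG(Y; X) = H(Y) - sum_v p(X=v) * H(Y | X=v) for a discrete feature X."""
    n = len(labels)
    conditional = 0.0
    for v in set(feature_values):
        subset = [y for x, y in zip(feature_values, labels) if x == v]
        conditional += (len(subset) / n) * entropy(subset)
    return entropy(labels) - conditional

# Toy illustration: a feature that perfectly separates the classes has IG
# equal to the dataset entropy; an uninformative feature has IG near 0.
labels  = ['yes', 'yes', 'no', 'no']
perfect = ['a', 'a', 'b', 'b']
useless = ['a', 'b', 'a', 'b']
print(information_gain(perfect, labels))  # 1.0
print(information_gain(useless, labels))  # 0.0
```

An IG-based filter then simply keeps the top-k features by this score; GA- and DT-based selection, by contrast, are wrapper/embedded methods that evaluate feature subsets through a model rather than scoring features independently.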
DC.subject Data Mining zh_TW
DC.subject Feature Selection zh_TW
DC.subject Classification Algorithm zh_TW
DC.subject Data Mining en_US
DC.subject Feature Selection en_US
DC.subject Classification Algorithm en_US
DC.title The Impact of Feature Selection on Different Data Types zh_TW
dc.language.iso zh-TW zh-TW
DC.type Thesis/Dissertation zh_TW
DC.type thesis en_US
DC.publisher National Central University en_US
