Master's/Doctoral Thesis 984203045: Full Metadata Record

DC Field Value Language
DC.contributor 資訊管理學系 (Department of Information Management) zh_TW
DC.creator 朱啟源 zh_TW
DC.creator Chi-yuan Chu en_US
dc.date.accessioned 2011-7-20T07:39:07Z
dc.date.available 2011-7-20T07:39:07Z
dc.date.issued 2011
dc.identifier.uri http://ir.lib.ncu.edu.tw:88/thesis/view_etd.asp?URN=984203045
dc.contributor.department 資訊管理學系 (Department of Information Management) zh_TW
DC.description 國立中央大學 zh_TW
DC.description National Central University en_US
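dc.description.abstract Feature selection and instance selection are two important data preprocessing techniques in data mining. Given a dataset, feature selection removes irrelevant or redundant features, and instance selection eliminates duplicate and erroneous data. Genetic algorithms have been the most widely applied algorithms for these preprocessing tasks. However, the two methods have usually been studied separately, so it remains unclear how performing feature selection and instance selection together differs, in performance and results, from performing each one alone. This study therefore uses genetic algorithms to perform feature selection and instance selection, and examines how the order of the two preprocessing methods affects classification performance on datasets from different domains. The experimental results come from the performance of classifiers (e.g., support vector machines and k-nearest neighbor) on four large and four small datasets from different domains. The eight datasets differ in their numbers of features and samples, so that the approach can be applied not only to datasets from different domains but also to datasets that differ widely from one another. Beyond identifying different preprocessing modes, the study also analyzes the characteristics of the datasets, considering both accuracy and time efficiency, to determine which preprocessing method suits datasets with which characteristics. By identifying such regularities and guidelines, it aims to let datasets from different domains achieve better classifier performance or better experimental time efficiency. zh_TW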
dc.description.abstract Feature selection and instance selection are two important data preprocessing steps in data mining: the former aims to remove irrelevant and/or redundant features from a given dataset, and the latter to discard faulty data. Genetic algorithms in particular have been widely used for these tasks in related studies. However, the two tasks are generally considered separately in the literature, so the performance difference between performing feature and instance selection together and performing either one alone is unknown. Therefore, the aim of this study is to perform feature selection and instance selection based on genetic algorithms, applied in different orders, and to examine the resulting classification performance over datasets from different domains. Experimental results based on four small-scale and four large-scale datasets containing various numbers of features and data samples show that performing both feature and instance selection usually makes the classifiers (i.e., support vector machines and k-nearest neighbor) perform slightly worse than performing feature selection or instance selection alone. However, while there is no significant difference in classification accuracy between these data preprocessing methods, combining feature and instance selection greatly reduces the computational effort of training the classifiers compared with performing feature or instance selection alone. Considering both classification effectiveness and efficiency, performing both feature and instance selection is the optimal solution for data preprocessing in data mining. en_US
DC.subject 資料探勘 zh_TW
DC.subject 特徵選取 zh_TW
DC.subject 基因演算法 zh_TW
DC.subject 樣本選取 zh_TW
DC.subject data mining en_US
DC.subject feature selection en_US
DC.subject instance selection en_US
DC.subject genetic algorithms en_US
DC.title 資料前處理之研究:以基因演算法為例 (A Study of Data Preprocessing: The Case of Genetic Algorithms) zh_TW
dc.language.iso zh-TW zh-TW
DC.title Feature and Instance Selection Using Genetic Algorithms: An Empirical Study en_US
DC.type 博碩士論文 (master's/doctoral thesis) zh_TW
DC.type thesis en_US
DC.publisher National Central University en_US
