DC field | Value | Language |
DC.contributor | 資訊管理學系 (Department of Information Management) | zh_TW |
DC.creator | 邱子安 | zh_TW |
DC.creator | Tzu-An Chiu | en_US |
dc.date.accessioned | 2018-01-18T07:39:07Z | |
dc.date.available | 2018-01-18T07:39:07Z | |
dc.date.issued | 2018 | |
dc.identifier.uri | http://ir.lib.ncu.edu.tw:444/thesis/view_etd.asp?URN=104423035 | |
dc.contributor.department | 資訊管理學系 (Department of Information Management) | zh_TW |
DC.description | 國立中央大學 | zh_TW |
DC.description | National Central University | en_US |
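dc.description.abstract | With advances in storage media technology, enterprises no longer need to worry about capacity when storing data, as they did in the past, and they keep all of their data for later analysis. This, however, makes the data overly complex, so data pre-processing has become an important issue in data mining. Feature selection and instance selection are the two major pre-processing techniques. Past studies have usually focused on only one of them, and studies that address both at once are uncommon. The few that do have used only a genetic algorithm for both feature and instance selection, without combining or comparing other methods. It is therefore unclear whether other combinations of feature and instance selection methods would outperform the genetic-algorithm combination, and whether, when other methods are used for both feature and instance selection, the order in which they are applied affects performance. The purpose of this study is to combine several representative feature and instance selection methods in order to examine their relative merits and the effect of their order, and to determine whether the results differ between datasets from two domains, credit scoring and bankruptcy prediction. Each domain uses datasets that differ in the number of variables and in the class ratio, in order to find out whether differences in dataset characteristics also affect the choice of selection method. Several representative classifiers are compared in the experiments; in addition to identifying the best order and combination of selection methods, the goal is to find the best classifier or classifier ensemble as a reference for subsequent experiments. | zh_TW (translated to English) |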
dc.description.abstract | With advances in storage media technology, many companies no longer have to consider capacity when storing data, as they did in the past. They now save all of their data for further analysis, but this makes the data too complex for practical use. Consequently, data pre-processing has become an important issue in data mining. Feature selection and instance selection are two important tasks in data pre-processing, but the literature has usually focused on only one of them. The few studies that address both tasks at the same time use only a genetic algorithm for feature and instance selection. As a result, it is not known whether other combinations of pre-processing methods perform differently from the genetic algorithm.
Therefore, the aim of this research is to perform feature selection and instance selection with several representative feature and instance selection methods, applied in different orders, and to examine classification performance in two different domains, namely bankruptcy prediction and credit scoring.
We use datasets with different numbers of features and different class ratios to determine whether the characteristics of a dataset affect the performance of feature or instance selection. We also use several representative classifiers to identify which classifier or classifier ensemble performs best for further use.
| en_US |
DC.subject | 資料探勘 (data mining) | zh_TW |
DC.subject | 特徵選取 (feature selection) | zh_TW |
DC.subject | 樣本選取 (instance selection) | zh_TW |
DC.subject | 分類器組合 (classifier ensembles) | zh_TW |
DC.subject | 基因演算法 (genetic algorithm) | zh_TW |
DC.title | 在破產預測與信用評估領域對前處理方式與分類器組合的比較分析 | zh_TW |
dc.language.iso | zh-TW | zh-TW |
DC.title | Comparative analysis of pre-processing methods and classifier ensembles for bankruptcy prediction and credit scoring | en_US |
DC.type | 博碩士論文 | zh_TW |
DC.type | thesis | en_US |
DC.publisher | National Central University | en_US |
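The abstract's central question is whether the order of feature selection (FS) and instance selection (IS) changes classification performance. Below is a minimal Python sketch of the two orderings. It is built on stated assumptions and is not the thesis's method. Apart from the genetic algorithm used in prior work, this record does not name the specific methods, so the sketch swaps in simpler techniques: a correlation-based filter for FS, Wilson's Edited Nearest Neighbour (ENN) for IS, and a 1-nearest-neighbour classifier, applied to synthetic data. All function names (`select_features`, `enn`, `fs_then_is`, `is_then_fs`, `make_data`) are hypothetical.

```python
import math
import random
from collections import Counter


def pearson(a, b):
    """Pearson correlation; returns 0.0 when either input is constant."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    if va == 0 or vb == 0:
        return 0.0
    return cov / math.sqrt(va * vb)


def select_features(X, y, k):
    """Assumed filter-style FS: keep the k features most correlated with the label."""
    scores = [abs(pearson([row[j] for row in X], y)) for j in range(len(X[0]))]
    ranked = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
    return sorted(ranked[:k])


def project(X, cols):
    """Keep only the selected feature columns."""
    return [[row[j] for j in cols] for row in X]


def knn_predict(X_train, y_train, x, k):
    """Majority vote among the k nearest training instances (Euclidean distance)."""
    order = sorted(range(len(X_train)), key=lambda i: math.dist(X_train[i], x))
    votes = Counter(y_train[i] for i in order[:k])
    return votes.most_common(1)[0][0]


def enn(X, y, k=3):
    """Wilson's ENN as IS: drop instances misclassified by their k nearest neighbours."""
    keep = []
    for i in range(len(X)):
        others_X = X[:i] + X[i + 1:]
        others_y = y[:i] + y[i + 1:]
        if knn_predict(others_X, others_y, X[i], k) == y[i]:
            keep.append(i)
    return [X[i] for i in keep], [y[i] for i in keep]


def fs_then_is(X, y, k_feat):
    """Order 1: select features on the full training set, then edit instances."""
    cols = select_features(X, y, k_feat)
    X_edit, y_edit = enn(project(X, cols), y)
    return cols, X_edit, y_edit


def is_then_fs(X, y, k_feat):
    """Order 2: edit instances using all features, then select features on what remains."""
    X_edit, y_edit = enn(X, y)
    cols = select_features(X_edit, y_edit, k_feat)
    return cols, project(X_edit, cols), y_edit


def accuracy(X_train, y_train, X_test, y_test, cols):
    """1-NN test accuracy using the reduced training set and the selected columns."""
    X_t = project(X_test, cols)
    hits = sum(knn_predict(X_train, y_train, x, 1) == t for x, t in zip(X_t, y_test))
    return hits / len(y_test)


def make_data(n, seed):
    """Hypothetical synthetic binary data: 2 informative features, then 3 noise features."""
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n):
        label = rng.randint(0, 1)
        informative = [rng.gauss(1.5 * label, 1.0) for _ in range(2)]
        noise = [rng.gauss(0.0, 1.0) for _ in range(3)]
        X.append(informative + noise)
        y.append(label)
    return X, y


if __name__ == "__main__":
    X_train, y_train = make_data(200, seed=0)
    X_test, y_test = make_data(100, seed=1)
    for name, pipeline in [("FS -> IS", fs_then_is), ("IS -> FS", is_then_fs)]:
        cols, X_red, y_red = pipeline(X_train, y_train, k_feat=2)
        acc = accuracy(X_red, y_red, X_test, y_test, cols)
        print(f"{name}: features={cols}, kept={len(y_red)}/{len(y_train)}, accuracy={acc:.2f}")
```

For each ordering, the script prints three things: the selected feature indices, the number of training instances ENN kept, and the 1-NN test accuracy. These numbers depend entirely on the synthetic data and say nothing about the thesis's findings. Swapping in other selectors, such as a genetic algorithm, would leave the two pipeline functions unchanged.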