Homogeneous Ensemble and Two-Phase Feature Selection Based on Relevance and Autoencoders


Please use this identifier to cite or link to this item: http://ir.lib.ncu.edu.tw/handle/987654321/93260

Title:	Homogeneous Ensemble and Two-Phase Feature Selection Based on Relevance and Autoencoders (基於相關性與自動編碼器的同質集成與二階段特徵選擇)
Authors:	謝鎮安;Hsieh, Chen-An
Contributors:	Department of Information Management
Keywords:	Feature selection; High-dimensional dataset; Autoencoder; Ensemble learning; Stability
Date:	2023-07-24
Issue Date:	2024-09-19 16:50:57 (UTC+8)
Publisher:	National Central University (國立中央大學)
Abstract:	This study applies autoencoder feature selection to supervised tasks, compares its prediction performance and stability with those of relevance feature selection, and analyzes how a homogeneous ensemble and the proposed two-phase combination affect feature selection effectiveness, with the goal of establishing a better feature selection method. We constructed an autoencoder feature selection method based on the Gedeon method and compared it with four relevance feature selection methods: Impurity, ANOVA, ReliefF, and Mutual Information. The experimental results showed that autoencoder feature selection performed poorly without architectural improvements. In the homogeneous ensemble experiment, relevance feature selection traded a small amount of prediction performance for improved stability, which gave it a better overall evaluation. With the homogeneous ensemble, autoencoder feature selection improved in both stability and prediction performance and outperformed relevance feature selection in prediction performance. In the two-phase experiment, using autoencoder feature selection in the first phase was the best combination order. Combining two feature selection methods with different evaluation criteria in this order outperformed all non-ensemble and homogeneous ensemble feature selection methods in prediction performance.
Based on these results, this study recommends choosing between a homogeneous ensemble and the two-phase combination according to the application scenario in order to improve overall feature selection effectiveness. The homogeneous ensemble mainly improves stability, whereas the two-phase combination effectively improves prediction performance and maintains good stability by applying a homogeneous ensemble to the feature selection in both phases.
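The abstract names its building blocks without showing how they fit together. The Python sketch below is a minimal, dependency-free illustration under stated assumptions, not the thesis's implementation. `gedeon_importance` applies Gedeon's weight-proportion measure to the weights of an already trained one-hidden-layer network (the autoencoder training step is omitted). `anova_scores` stands in for the relevance methods. The homogeneous ensemble is assumed to be bootstrap resampling with mean-rank aggregation, and stability is assumed to be the average pairwise Jaccard similarity of the selected subsets. All function names are hypothetical, and the demo uses ANOVA in both phases only for brevity.

```python
import random
from itertools import combinations
from statistics import mean


def gedeon_importance(w1, w2):
    """Gedeon-style input importance from a trained one-hidden-layer network
    (assumed here: autoencoder encoder weights w1, decoder weights w2).
    w1[i][j]: input i -> hidden j;  w2[j][k]: hidden j -> output k."""
    n_in, n_hid, n_out = len(w1), len(w2), len(w2[0])
    # p[i][j]: share of hidden unit j's total |incoming weight| coming from input i
    hid_tot = [sum(abs(w1[i][j]) for i in range(n_in)) for j in range(n_hid)]
    p = [[abs(w1[i][j]) / hid_tot[j] for j in range(n_hid)] for i in range(n_in)]
    # q[j][k]: share of output k's total |incoming weight| coming from hidden unit j
    out_tot = [sum(abs(w2[j][k]) for j in range(n_hid)) for k in range(n_out)]
    q = [[abs(w2[j][k]) / out_tot[k] for k in range(n_out)] for j in range(n_hid)]
    # Contribution of input i to output k is sum_j p[i][j] * q[j][k]; summed over k
    return [sum(p[i][j] * q[j][k] for j in range(n_hid) for k in range(n_out))
            for i in range(n_in)]


def anova_scores(X, y):
    """One-way ANOVA F-statistic per feature (a relevance criterion)."""
    scores = []
    for f in range(len(X[0])):
        groups = {}
        for row, label in zip(X, y):
            groups.setdefault(label, []).append(row[f])
        n, k = len(X), len(groups)
        if k < 2 or n <= k:  # F is undefined with one class
            scores.append(0.0)
            continue
        grand = mean(row[f] for row in X)
        means = {lab: mean(g) for lab, g in groups.items()}
        between = sum(len(g) * (means[lab] - grand) ** 2 for lab, g in groups.items())
        within = sum((v - means[lab]) ** 2 for lab, g in groups.items() for v in g)
        if within == 0:
            scores.append(float("inf") if between > 0 else 0.0)
        else:
            scores.append((between / (k - 1)) / (within / (n - k)))
    return scores


def ranking(scores):
    """Feature indices ordered from highest to lowest score."""
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)


def homogeneous_ensemble(X, y, scorer, k, n_bags=10, seed=0):
    """Assumed scheme: run one selector on bootstrap samples, aggregate by mean rank."""
    rng = random.Random(seed)
    n, d = len(X), len(X[0])
    rank_sum = [0.0] * d
    subsets = []
    for _ in range(n_bags):
        idx = [rng.randrange(n) for _ in range(n)]  # bootstrap sample
        order = ranking(scorer([X[i] for i in idx], [y[i] for i in idx]))
        for rank, f in enumerate(order):
            rank_sum[f] += rank
        subsets.append(set(order[:k]))  # per-bag subset, kept for stability
    # Lowest summed rank = most consistently important across bags
    return sorted(range(d), key=lambda f: rank_sum[f])[:k], subsets


def jaccard_stability(subsets):
    """Assumed stability measure: mean pairwise Jaccard similarity."""
    pairs = list(combinations(subsets, 2))
    return mean(len(a & b) / len(a | b) for a, b in pairs) if pairs else 1.0


def two_phase(X, y, scorer1, k1, scorer2, k2, n_bags=10):
    """Phase 1 keeps k1 features; phase 2 picks k2 of those. Both phases
    use the homogeneous ensemble, as the abstract describes."""
    keep, _ = homogeneous_ensemble(X, y, scorer1, k1, n_bags, seed=0)
    X1 = [[row[f] for f in keep] for row in X]
    inner, _ = homogeneous_ensemble(X1, y, scorer2, k2, n_bags, seed=1)
    return [keep[i] for i in inner]  # map back to original feature indices


if __name__ == "__main__":
    rng = random.Random(42)
    y = [i % 2 for i in range(60)]
    # Features 0 and 1 depend on the label; features 2..7 are pure noise
    X = [[3 * lab + rng.gauss(0, 1), -2 * lab + rng.gauss(0, 1)]
         + [rng.gauss(0, 1) for _ in range(6)] for lab in y]
    selected, subsets = homogeneous_ensemble(X, y, anova_scores, k=2)
    print("ensemble:", sorted(selected), "stability:", jaccard_stability(subsets))
    print("two-phase:", two_phase(X, y, anova_scores, 4, anova_scores, 2))
    print("gedeon:", gedeon_importance([[1, 0], [1, 1], [0, 1]], [[1, 1, 0], [0, 1, 1]]))
```

To mirror the order the abstract reports as best, phase 1 would use autoencoder-derived Gedeon scores instead of `anova_scores`, which requires a wrapper that trains one autoencoder per bootstrap sample; that training is left out so the sketch stays self-contained.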
Appears in Collections:	[Graduate Institute of Information Management] Electronic Thesis & Dissertation
