Abstract: | Real-world data often has problems such as noise, irrelevant features, and excessive volume, so it must be preprocessed before use. Dimensionality reduction is a common preprocessing method that retains the important features while reducing the number of dimensions. Ensemble dimensionality reduction applies several different dimensionality reduction algorithms and fuses the feature subsets they select in different ways, which can improve the robustness of dimensionality reduction and the resulting classification accuracy. Deep learning has received considerable attention in recent years, but most related research targets unstructured data; few studies apply deep learning to high-dimensional structured data, and comprehensive comparisons of machine learning and deep learning techniques in this setting are scarce. This study therefore investigates machine learning and deep learning based dimensionality reduction and classification techniques on high-dimensional structured datasets. 
It also examines whether deep learning can outperform traditional machine learning, and compares single and ensemble dimensionality reduction in order to identify better combinations of dimensionality reduction methods. The experiments cover twenty high-dimensional structured datasets with between 44 and 22,283 dimensions. Machine learning and deep learning based dimensionality reduction and classification techniques are applied, and the concepts of ensemble learning and feature fusion are used for dimensionality reduction. The experiments use five-fold cross-validation and record the average accuracy, the average area under the ROC curve (AUC), and the average CPU time. The results are then analyzed to assess the strengths and weaknesses of dimensionality reduction at different data dimensionalities and to recommend how to use it. In this study's experiments, deep learning methods outperform machine learning methods for both dimensionality reduction and classification, and ensemble dimensionality reduction outperforms single dimensionality reduction, with parallel ensemble dimensionality reduction performing best. The best single dimensionality reduction method is SAE+MLP, the best sequential ensemble method is IG+SAE+MLP, and the best parallel ensemble method is AE+SAE(SFC)+MLP. |