References
[1] Y. Li, T. Beaubouef, et al., “Data Mining: Concepts, Background and Methods of Integrating Uncertainty in Data Mining,” CCSC SC Stud EJ, vol. 3, pp. 2–7, 2010.
[2] C. Dobre and F. Xhafa, “Intelligent services for Big Data science,” Future Gener. Comput. Syst., vol. 37, pp. 267–281, Jul. 2014, doi: 10.1016/j.future.2013.07.014.
[3] U. Fayyad, G. Piatetsky-Shapiro, and P. Smyth, “From Data Mining to Knowledge Discovery in Databases,” AI Mag., vol. 17, no. 3, Art. no. 3, Mar. 1996, doi: 10.1609/aimag.v17i3.1230.
[4] M. Munson, “A study on the importance of and time spent on different modeling steps,” SIGKDD Explor., vol. 13, pp. 65–71, May 2012, doi: 10.1145/2207243.2207253.
[5] K. Cios and L. Kurgan, “Trends in Data Mining and Knowledge Discovery,” Adv. Tech. Knowl. Discov. Data Min., Nov. 2003, doi: 10.1007/1-84628-183-0_1.
[6] Y. Sun, M. S. Kamel, A. K. C. Wong, and Y. Wang, “Cost-sensitive boosting for classification of imbalanced data,” Pattern Recognit., vol. 40, no. 12, pp. 3358–3378, Dec. 2007, doi: 10.1016/j.patcog.2007.04.009.
[7] C. Zhang, J. H. Sun, and K. C. Tan, “Deep Belief Networks Ensemble with Multi-objective Optimization for Failure Diagnosis,” in 2015 IEEE International Conference on Systems, Man, and Cybernetics, Kowloon Tong, Hong Kong, Oct. 2015, pp. 32–37. doi: 10.1109/SMC.2015.19.
[8] T. Fawcett and F. Provost, “Adaptive Fraud Detection,” Data Min. Knowl. Discov., vol. 1, no. 3, pp. 291–316, Sep. 1997, doi: 10.1023/A:1009700419189.
[9] R. M. Valdovinos and J. S. Sanchez, “Class-dependant resampling for medical applications,” in Fourth International Conference on Machine Learning and Applications (ICMLA’05), Dec. 2005, 6 pp. doi: 10.1109/ICMLA.2005.15.
[10] K. Ezawa, M. Singh, and S. W. Norton, “Learning Goal Oriented Bayesian Networks for Telecommunications Risk Management,” in Proceedings of the 13th International Conference on Machine Learning, 1996, pp. 139–147.
[11] J. Sun, M. Rahman, Y. S. Wong, and G. S. Hong, “Multiclassification of tool wear with support vector machine by manufacturing loss consideration,” Int. J. Mach. Tools Manuf., vol. 44, no. 11, pp. 1179–1187, Sep. 2004, doi: 10.1016/j.ijmachtools.2004.04.003.
[12] S. Kotsiantis, D. Kanellopoulos, and P. Pintelas, “Data Preprocessing for Supervised Learning,” Int. J. Comput. Sci., vol. 1, pp. 111–117, Jan. 2006.
[13] F. Famili, W.-M. Shen, R. Weber, and E. Simoudis, “Data Preprocessing and Intelligent Data Analysis,” Intell. Data Anal., 1997, doi: 10.1016/S1088-467X(98)00007-9.
[14] O. E. de Noord, “The influence of data preprocessing on the robustness and parsimony of multivariate calibration models,” Chemom. Intell. Lab. Syst., vol. 23, no. 1, pp. 65–70, Apr. 1994, doi: 10.1016/0169-7439(93)E0065-C.
[15] A. B. Patel, M. Birla, and U. Nair, “Addressing big data problem using Hadoop and Map Reduce,” in 2012 Nirma University International Conference on Engineering (NUiCONE), Dec. 2012, pp. 1–5. doi: 10.1109/NUICONE.2012.6493198.
[16] L. Yu and H. Liu, “Feature Selection for High-Dimensional Data: A Fast Correlation-Based Filter Solution,” in Proceedings of the 20th International Conference on Machine Learning (ICML-03), 2003, vol. 2, pp. 856–863.
[17] Y. Zhai, Y.-S. Ong, and I. W. Tsang, “The Emerging ‘Big Dimensionality,’” IEEE Comput. Intell. Mag., vol. 9, no. 3, pp. 14–26, Aug. 2014, doi: 10.1109/MCI.2014.2326099.
[18] H. Liu and R. Setiono, “A Probabilistic Approach to Feature Selection - A Filter Solution,” 1996. Accessed: Jun. 18, 2022. [Online]. Available: https://www.semanticscholar.org/paper/A-Probabilistic-Approach-to-Feature-Selection-A-Liu-Setiono/7285ee82aa0cde847fafb8b1109dd19dbdc04e35
[19] V. Fonti and E. Belitser, “Feature Selection using LASSO,” Research Paper in Business Analytics, 2017. Accessed: Jun. 18, 2022. [Online]. Available: https://www.semanticscholar.org/paper/Paper-in-Business-Analytics-Feature-Selection-using-Fonti-Belitser/24acd159910658223209433cf4cbe3414264de39
[20] L. Du, Y. Xu, and H. Zhu, “Feature Selection for Multi-Class Imbalanced Data Sets Based on Genetic Algorithm,” Ann. Data Sci., vol. 2, no. 3, pp. 293–300, Sep. 2015, doi: 10.1007/s40745-015-0060-x.
[21] H. Zhao, S. Wang, and Z. Wang, “Multiclass Classification and Feature Selection Based on Least Squares Regression with Large Margin,” Neural Comput., vol. 30, no. 10, pp. 2781–2804, Oct. 2018, doi: 10.1162/neco_a_01116.
[22] J. Izetta, P. F. Verdes, and P. M. Granitto, “Improved multiclass feature selection via list combination,” Expert Syst. Appl., vol. 88, pp. 205–216, Dec. 2017, doi: 10.1016/j.eswa.2017.06.043.
[23] R. J. Cascaro, B. D. Gerardo, and R. P. Medina, “Filter Selection Methods for Multiclass Classification,” in Proceedings of the 2nd International Conference on Computing and Big Data, New York, NY, USA, Oct. 2019, pp. 27–31. doi: 10.1145/3366650.3366655.
[24] L. Yijing, G. Haixiang, L. Xiao, L. Yanan, and L. Jinling, “Adapted ensemble classification algorithm based on multiple classifier system and feature selection for classifying multi-class imbalanced data,” Knowl.-Based Syst., vol. 94, pp. 88–104, Feb. 2016, doi: 10.1016/j.knosys.2015.11.013.
[25] C.-F. Tsai and Y.-T. Sung, “Ensemble feature selection in high dimension, low sample size datasets: Parallel and serial combination approaches,” Knowl.-Based Syst., vol. 203, p. 106097, Sep. 2020, doi: 10.1016/j.knosys.2020.106097.
[26] B. Krawczyk, M. Koziarski, and M. Woźniak, “Radial-Based Oversampling for Multiclass Imbalanced Data Classification,” IEEE Trans. Neural Netw. Learn. Syst., vol. 31, no. 8, pp. 2818–2831, Aug. 2020, doi: 10.1109/TNNLS.2019.2913673.
[27] M. L. Bermingham et al., “Application of high-dimensional feature selection: evaluation for genomic prediction in man,” Sci. Rep., vol. 5, no. 1, Art. no. 1, May 2015, doi: 10.1038/srep10312.
[28] G. Chandrashekar and F. Sahin, “A survey on feature selection methods,” Comput. Electr. Eng., vol. 40, no. 1, pp. 16–28, Jan. 2014, doi: 10.1016/j.compeleceng.2013.11.024.
[29] M. Dash and H. Liu, “Feature selection for classification,” Intell. Data Anal., vol. 1, no. 1, pp. 131–156, Jan. 1997, doi: 10.1016/S1088-467X(97)00008-5.
[30] S. Wang et al., “Pathological Brain Detection by Artificial Intelligence in Magnetic Resonance Imaging Scanning (Invited Review),” Prog. Electromagn. Res., vol. 156, pp. 105–133, 2016, doi: 10.2528/PIER16070801.
[31] Y. Saeys, I. Inza, and P. Larrañaga, “A review of feature selection techniques in bioinformatics,” Bioinformatics, vol. 23, no. 19, pp. 2507–2517, Oct. 2007, doi: 10.1093/bioinformatics/btm344.
[32] Y. Lu, I. Cohen, X. S. Zhou, and Q. Tian, “Feature selection using principal feature analysis,” in Proceedings of the 15th ACM international conference on Multimedia, New York, NY, USA, Sep. 2007, pp. 301–304. doi: 10.1145/1291233.1291297.
[33] J. Shlens, “A Tutorial on Principal Component Analysis.” arXiv, Apr. 03, 2014. doi: 10.48550/arXiv.1404.1100.
[34] K. Kira and L. A. Rendell, “A Practical Approach to Feature Selection,” in Machine Learning Proceedings 1992, D. Sleeman and P. Edwards, Eds. San Francisco (CA): Morgan Kaufmann, 1992, pp. 249–256. doi: 10.1016/B978-1-55860-247-2.50037-1.
[35] R. Kohavi and G. H. John, “Wrappers for feature subset selection,” Artif. Intell., vol. 97, no. 1, pp. 273–324, Dec. 1997, doi: 10.1016/S0004-3702(97)00043-X.
[36] S.-U. Guan, Y. Qi, and C. Bao, “An Incremental Approach to MSE-Based Feature Selection,” Int. J. Comput. Intell. Appl., vol. 6, pp. 451–471, Dec. 2006, doi: 10.1142/S1469026806002064.
[37] G. S. Krishnan and S. K. S., “A novel GA-ELM model for patient-specific mortality prediction over large-scale lab event data,” Appl. Soft Comput., vol. 80, pp. 525–533, Jul. 2019, doi: 10.1016/j.asoc.2019.04.019.
[38] J. H. Holland, “Genetic Algorithms and Adaptation,” in Adaptive Control of Ill-Defined Systems, O. G. Selfridge, E. L. Rissland, and M. A. Arbib, Eds. Boston, MA: Springer US, 1984, pp. 317–333. doi: 10.1007/978-1-4684-8941-5_21.
[39] S. Cateni, M. Vannucci, M. Vannocci, and V. Colla, “Variable Selection and Feature Extraction Through Artificial Intelligence Techniques,” 2013. doi: 10.5772/53862.
[40] Y. Chtioui, D. Bertrand, and D. Barba, “Feature selection by a genetic algorithm. Application to seed discrimination by artificial vision,” J. Sci. Food Agric., vol. 76, no. 1, Jan. 1998, doi: 10.1002/(SICI)1097-0010(199801)76:1<77::AID-JSFA948>3.0.CO;2-9.
[41] J. Kennedy and R. Eberhart, “Particle swarm optimization,” in Proceedings of ICNN’95 - International Conference on Neural Networks, Nov. 1995, vol. 4, pp. 1942–1948. doi: 10.1109/ICNN.1995.488968.
[42] S. Mirjalili and A. Lewis, “The Whale Optimization Algorithm,” Adv. Eng. Softw., vol. 95, pp. 51–67, May 2016, doi: 10.1016/j.advengsoft.2016.01.008.
[43] T. Thaher et al., “An Enhanced Evolutionary Student Performance Prediction Model Using Whale Optimization Algorithm Boosted with Sine-Cosine Mechanism,” Appl. Sci., vol. 11, no. 21, Art. no. 21, Jan. 2021, doi: 10.3390/app112110237.
[44] S. Mirjalili, S. M. Mirjalili, and A. Lewis, “Grey Wolf Optimizer,” Adv. Eng. Softw., vol. 69, pp. 46–61, Mar. 2014, doi: 10.1016/j.advengsoft.2013.12.007.
[45] E. Rashedi, H. Nezamabadi-pour, and S. Saryazdi, “GSA: A Gravitational Search Algorithm,” Inf. Sci., vol. 179, no. 13, pp. 2232–2248, Jun. 2009, doi: 10.1016/j.ins.2009.03.004.
[46] M. Zhu and J. Song, “An Embedded Backward Feature Selection Method for MCLP Classification Algorithm,” Procedia Comput. Sci., vol. 17, pp. 1047–1054, Jan. 2013, doi: 10.1016/j.procs.2013.05.133.
[47] M. P, “Feature Selection Methods: A Study,” vol. 12, pp. 371–377, May 2021.
[48] R. Tibshirani, “Regression Shrinkage and Selection via the Lasso,” J. R. Stat. Soc. Ser. B Methodol., vol. 58, no. 1, pp. 267–288, 1996.
[49] R. Muthukrishnan and R. Rohini, “LASSO: A feature selection technique in predictive modeling for machine learning,” in 2016 IEEE International Conference on Advances in Computer Applications (ICACA), Oct. 2016, pp. 18–20. doi: 10.1109/ICACA.2016.7887916.
[50] A. E. Hoerl and R. W. Kennard, “Ridge Regression: Biased Estimation for Nonorthogonal Problems,” Technometrics, vol. 42, no. 1, pp. 80–86, 2000, doi: 10.2307/1271436.
[51] L. Breiman, “Random Forests,” Mach. Learn., vol. 45, no. 1, pp. 5–32, Oct. 2001, doi: 10.1023/A:1010933404324.
[52] J. Ali, R. Khan, N. Ahmad, and I. Maqsood, “Random Forests and Decision Trees,” Int. J. Comput. Sci. Issues (IJCSI), vol. 9, Sep. 2012.
[53] T. Chen and C. Guestrin, “XGBoost: A Scalable Tree Boosting System,” in Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Aug. 2016, pp. 785–794. doi: 10.1145/2939672.2939785.
[54] C. Chen et al., “Improving protein-protein interactions prediction accuracy using XGBoost feature selection and stacked ensemble classifier,” Comput. Biol. Med., vol. 123, p. 103899, 2020, doi: 10.1016/j.compbiomed.2020.103899.
[55] T. G. Dietterich, “Ensemble Methods in Machine Learning,” in Multiple Classifier Systems, Berlin, Heidelberg, 2000, pp. 1–15. doi: 10.1007/3-540-45014-9_1.
[56] Z.-H. Zhou, “Ensemble Learning,” in Encyclopedia of Biometrics, S. Z. Li and A. K. Jain, Eds. Boston, MA: Springer US, 2015, pp. 411–416. doi: 10.1007/978-1-4899-7488-4_293.
[57] S. Rayana, W. Zhong, and L. Akoglu, “Sequential Ensemble Learning for Outlier Detection: A Bias-Variance Perspective,” in 2016 IEEE 16th International Conference on Data Mining (ICDM), Dec. 2016, pp. 1167–1172. doi: 10.1109/ICDM.2016.0154.
[58] P. Bühlmann, “Bagging, Boosting and Ensemble Methods,” in Handbook of Computational Statistics: Concepts and Methods, J. E. Gentle, W. K. Härdle, and Y. Mori, Eds. Berlin, Heidelberg: Springer, 2012, pp. 985–1022. doi: 10.1007/978-3-642-21551-3_33.
[59] A. Ben Brahim and M. Limam, “Ensemble feature selection for high dimensional data: a new method and a comparative study,” Adv. Data Anal. Classif., vol. 12, Apr. 2017, doi: 10.1007/s11634-017-0285-y.
[60] C.-F. Tsai and Y.-C. Hsiao, “Combining multiple feature selection methods for stock prediction: Union, intersection, and multi-intersection approaches,” Decis. Support Syst., vol. 50, no. 1, pp. 258–269, Dec. 2010, doi: 10.1016/j.dss.2010.08.028.
[61] P. Y. Pawar and S. Gawande, “A Comparative Study on Different Types of Approaches to Text Categorization,” Int. J. Mach. Learn. Comput., vol. 2, 2012, doi: 10.7763/IJMLC.2012.V2.158.
[62] C. Cortes and V. Vapnik, “Support-vector networks,” Mach. Learn., vol. 20, no. 3, pp. 273–297, Sep. 1995, doi: 10.1007/BF00994018.
[63] H. Zhang, “The Optimality of Naive Bayes,” in Proceedings of the 17th International Florida Artificial Intelligence Research Society Conference (FLAIRS), 2004, vol. 2.
[64] A. Sharma and S. Dey, “A comparative study of feature selection and machine learning techniques for sentiment analysis,” in Proceedings of the 2012 ACM Research in Applied Computation Symposium (RACS), 2012, doi: 10.1145/2401603.2401605.
[65] M. Galar, A. Fernandez, E. Barrenechea, H. Bustince, and F. Herrera, “A Review on Ensembles for the Class Imbalance Problem: Bagging-, Boosting-, and Hybrid-Based Approaches,” IEEE Trans. Syst. Man Cybern. Part C Appl. Rev., vol. 42, no. 4, pp. 463–484, Jul. 2012, doi: 10.1109/TSMCC.2011.2161285.
[66] N. V. Chawla, K. W. Bowyer, L. O. Hall, and W. P. Kegelmeyer, “SMOTE: Synthetic Minority Over-sampling Technique,” J. Artif. Intell. Res., vol. 16, pp. 321–357, Jun. 2002, doi: 10.1613/jair.953.