References
[1] X. Wu, X. Zhu, G. Q. Wu, and W. Ding, "Data mining with big data," IEEE Transactions on Knowledge and Data Engineering, vol. 26, no. 1, pp. 97-107, 2014, doi: 10.1109/TKDE.2013.109.
[2] M. Galar, A. Fernandez, E. Barrenechea, H. Bustince, and F. Herrera, "A Review on Ensembles for the Class Imbalance Problem: Bagging-, Boosting-, and Hybrid-Based Approaches," IEEE Transactions on Systems, Man, and Cybernetics, Part C (Applications and Reviews), vol. 42, no. 4, pp. 463-484, 2012, doi: 10.1109/TSMCC.2011.2161285.
[3] W.-C. Lin and C.-F. Tsai, "Missing value imputation: a review and analysis of the literature (2006–2017)," Artificial Intelligence Review, vol. 53, no. 2, pp. 1487-1509, 2020, doi: 10.1007/s10462-019-09709-4.
[4] P. Vuttipittayamongkol, E. Elyan, and A. Petrovski, "On the class overlap problem in imbalanced data classification," Knowledge-Based Systems, vol. 212, p. 106631, 2021, doi: 10.1016/j.knosys.2020.106631.
[5] M. Alibeigi, S. Hashemi, and A. Hamzeh, "DBFS: An effective Density Based Feature Selection scheme for small sample size and high dimensional imbalanced data sets," Data & Knowledge Engineering, vol. 81-82, pp. 67-103, 2012, doi: 10.1016/j.datak.2012.08.001.
[6] R. C. Prati, G. E. A. P. A. Batista, and M. C. Monard, "Learning with Class Skews and Small Disjuncts," in Advances in Artificial Intelligence – SBIA 2004, A. L. C. Bazzan and S. Labidi, Eds. Berlin, Heidelberg: Springer Berlin Heidelberg, 2004, pp. 296-306.
[7] Y.-C. Wang and C.-H. Cheng, "A multiple combined method for rebalancing medical data with class imbalances," Computers in Biology and Medicine, vol. 134, p. 104527, 2021, doi: 10.1016/j.compbiomed.2021.104527.
[8] Y. Xiao, J. Wu, and Z. Lin, "Cancer diagnosis using generative adversarial networks based on deep learning from imbalanced data," Computers in Biology and Medicine, vol. 135, p. 104540, 2021, doi: 10.1016/j.compbiomed.2021.104540.
[9] X.-G. Chen, S. Liu, and W. Zhang, "Predicting Coding Potential of RNA Sequences by Solving Local Data Imbalance," IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol. 19, no. 2, pp. 1075-1083, 2022, doi: 10.1109/TCBB.2020.3021800.
[10] J. Jani, J. Doshi, I. Kheria, K. Mehta, C. Bhadane, and R. Karani, "LayNet—A multi-layer architecture to handle imbalance in medical imaging data," Computers in Biology and Medicine, vol. 163, p. 107179, 2023, doi: 10.1016/j.compbiomed.2023.107179.
[11] X. Zhou, Y. Hu, W. Liang, J. Ma, and Q. Jin, "Variational LSTM Enhanced Anomaly Detection for Industrial Big Data," IEEE Transactions on Industrial Informatics, vol. 17, no. 5, pp. 3469-3477, 2021, doi: 10.1109/TII.2020.3022432.
[12] B. Gao et al., "Enhancing anomaly detection accuracy and interpretability in low-quality and class imbalanced data: A comprehensive approach," Applied Energy, vol. 353, p. 122157, 2024, doi: 10.1016/j.apenergy.2023.122157.
[13] Z. Li, M. Huang, G. Liu, and C. Jiang, "A hybrid method with dynamic weighted entropy for handling the problem of class imbalance with overlap in credit card fraud detection," Expert Systems with Applications, vol. 175, p. 114750, 2021, doi: 10.1016/j.eswa.2021.114750.
[14] V. García, A. I. Marqués, and J. S. Sánchez, "Exploring the synergetic effects of sample types on the performance of ensembles for credit risk and corporate bankruptcy prediction," Information Fusion, vol. 47, pp. 88-101, 2019, doi: 10.1016/j.inffus.2018.07.004.
[15] D. Veganzones and E. Séverin, "An investigation of bankruptcy prediction in imbalanced datasets," Decision Support Systems, vol. 112, pp. 111-124, 2018, doi: 10.1016/j.dss.2018.06.011.
[16] A. Islam, S. B. Belhaouari, A. U. Rehman, and H. Bensmail, "KNNOR: An oversampling technique for imbalanced datasets," Applied Soft Computing, vol. 115, p. 108288, 2022, doi: 10.1016/j.asoc.2021.108288.
[17] C.-F. Tsai, W.-C. Lin, Y.-H. Hu, and G.-T. Yao, "Under-sampling class imbalanced datasets by combining clustering analysis and instance selection," Information Sciences, vol. 477, pp. 47-54, 2019, doi: 10.1016/j.ins.2018.10.029.
[18] W.-C. Lin, C.-F. Tsai, Y.-H. Hu, and J.-S. Jhang, "Clustering-based undersampling in class-imbalanced data," Information Sciences, vol. 409-410, pp. 17-26, 2017, doi: 10.1016/j.ins.2017.05.008.
[19] R. A. Sowah et al., "HCBST: An Efficient Hybrid Sampling Technique for Class Imbalance Problems," ACM Trans. Knowl. Discov. Data, vol. 16, no. 3, Art. no. 57, 2021, doi: 10.1145/3488280.
[20] N. V. Chawla, K. W. Bowyer, L. O. Hall, and W. P. Kegelmeyer, "SMOTE: synthetic minority over-sampling technique," J. Artif. Int. Res., vol. 16, no. 1, pp. 321-357, 2002.
[21] M. Khushi et al., "A Comparative Performance Analysis of Data Resampling Methods on Imbalance Medical Data," IEEE Access, vol. 9, pp. 109960-109975, 2021, doi: 10.1109/ACCESS.2021.3102399.
[22] G. E. A. P. A. Batista, R. C. Prati, and M. C. Monard, "A study of the behavior of several methods for balancing machine learning training data," SIGKDD Explor. Newsl., vol. 6, no. 1, pp. 20-29, 2004, doi: 10.1145/1007730.1007735.
[23] Y. Sun, L. Cai, B. Liao, W. Zhu, and J. Xu, "A Robust Oversampling Approach for Class Imbalance Problem With Small Disjuncts," IEEE Transactions on Knowledge and Data Engineering, vol. 35, no. 6, pp. 5550-5562, 2023, doi: 10.1109/TKDE.2022.3161291.
[24] D. Devi, S. K. Biswas, and B. Purkayastha, "Learning in presence of class imbalance and class overlapping by using one-class SVM and undersampling technique," Connection Science, vol. 31, no. 2, pp. 105-142, 2019, doi: 10.1080/09540091.2018.1560394.
[25] P. Soltanzadeh, M. R. Feizi-Derakhshi, and M. Hashemzadeh, "Addressing the class-imbalance and class-overlap problems by a metaheuristic-based under-sampling approach," Pattern Recognition, vol. 143, p. 109721, 2023, doi: 10.1016/j.patcog.2023.109721.
[26] B. J. Frey and D. Dueck, "Clustering by Passing Messages Between Data Points," Science, vol. 315, no. 5814, pp. 972-976, 2007, doi: 10.1126/science.1136800.
[27] J. MacQueen, "Some methods for classification and analysis of multivariate observations," in Proceedings of the Fifth Berkeley Symposium on Mathematical Statistics and Probability, Oakland, CA, USA, 1967, vol. 1, no. 14, pp. 281-297.
[28] O. Sagi and L. Rokach, "Ensemble learning: A survey," WIREs Data Mining and Knowledge Discovery, vol. 8, no. 4, p. e1249, 2018, doi: 10.1002/widm.1249.
[29] H. Guan, Y. Zhang, M. Xian, H. D. Cheng, and X. Tang, "SMOTE-WENN: Solving class imbalance and small sample problems by oversampling and distance scaling," Applied Intelligence, vol. 51, no. 3, pp. 1394-1409, 2021, doi: 10.1007/s10489-020-01852-8.
[30] D. A. Cieslak, N. V. Chawla, and A. Striegel, "Combating imbalance in network intrusion datasets," in 2006 IEEE International Conference on Granular Computing, 10-12 May 2006, pp. 732-737, doi: 10.1109/GRC.2006.1635905.
[31] J.-H. Seo and Y.-H. Kim, "Machine-Learning Approach to Optimize SMOTE Ratio in Class Imbalance Dataset for Intrusion Detection," Computational Intelligence and Neuroscience, vol. 2018, p. 9704672, 2018, doi: 10.1155/2018/9704672.
[32] Q. Liu et al., "Application of KM-SMOTE for rockburst intelligent prediction," Tunnelling and Underground Space Technology, vol. 138, p. 105180, 2023, doi: 10.1016/j.tust.2023.105180.
[33] H. Karamti et al., "Improving Prediction of Cervical Cancer Using KNN Imputed SMOTE Features and Multi-Model Ensemble Learning Approach," Cancers, vol. 15, no. 17, p. 4412, 2023. [Online]. Available: https://www.mdpi.com/2072-6694/15/17/4412.
[34] V. S. Spelmen and R. Porkodi, "A Review on Handling Imbalanced Data," in 2018 International Conference on Current Trends towards Converging Technologies (ICCTCT), 1-3 March 2018, pp. 1-11, doi: 10.1109/ICCTCT.2018.8551020.
[35] L. Wang, M. Han, X. Li, N. Zhang, and H. Cheng, "Review of Classification Methods on Unbalanced Data Sets," IEEE Access, vol. 9, pp. 64606-64628, 2021, doi: 10.1109/ACCESS.2021.3074243.
[36] S. Kotsiantis, D. Kanellopoulos, and P. Pintelas, "Handling imbalanced datasets: A review," GESTS International Transactions on Computer Science and Engineering, vol. 30, pp. 25-36, 2005.
[37] S. Rayana, W. Zhong, and L. Akoglu, "Sequential Ensemble Learning for Outlier Detection: A Bias-Variance Perspective," in 2016 IEEE 16th International Conference on Data Mining (ICDM), 12-15 Dec. 2016, pp. 1167-1172, doi: 10.1109/ICDM.2016.0154.
[38] L. Breiman, "Random Forests," Machine Learning, vol. 45, no. 1, pp. 5-32, 2001, doi: 10.1023/A:1010933404324.
[39] X. Dong, Z. Yu, W. Cao, Y. Shi, and Q. Ma, "A survey on ensemble learning," Frontiers of Computer Science, vol. 14, no. 2, pp. 241-258, 2020, doi: 10.1007/s11704-019-8208-z.
[40] I. D. Mienye and Y. Sun, "A Survey of Ensemble Learning: Concepts, Algorithms, Applications, and Prospects," IEEE Access, vol. 10, pp. 99129-99149, 2022, doi: 10.1109/ACCESS.2022.3207287.
[41] Y. Freund and R. E. Schapire, "A Decision-Theoretic Generalization of On-Line Learning and an Application to Boosting," Journal of Computer and System Sciences, vol. 55, no. 1, pp. 119-139, 1997, doi: 10.1006/jcss.1997.1504.
[42] I. D. Mienye and Y. Sun, "Performance analysis of cost-sensitive learning methods with application to imbalanced medical data," Informatics in Medicine Unlocked, vol. 25, p. 100690, 2021, doi: 10.1016/j.imu.2021.100690.
[43] J. Tanha, Y. Abdi, N. Samadi, N. Razzaghi, and M. Asadpour, "Boosting methods for multi-class imbalanced data classification: an experimental review," Journal of Big Data, vol. 7, no. 1, p. 70, 2020, doi: 10.1186/s40537-020-00349-y.
[44] R. Longadge and S. Dongre, "Class Imbalance Problem in Data Mining Review," International Journal of Computer Science and Network, vol. 2, no. 1, 2013.
[45] S. Lloyd, "Least squares quantization in PCM," IEEE Transactions on Information Theory, vol. 28, no. 2, pp. 129-137, 1982, doi: 10.1109/TIT.1982.1056489.
[46] E. W. Forgy, "Cluster analysis of multivariate data: efficiency versus interpretability of classifications," Biometrics, vol. 21, pp. 768-769, 1965.
[47] H. P. Friedman and J. Rubin, "On Some Invariant Criteria for Grouping Data," Journal of the American Statistical Association, vol. 62, no. 320, pp. 1159-1178, 1967, doi: 10.2307/2283767.
[48] X. Wu et al., "Top 10 algorithms in data mining," Knowledge and Information Systems, vol. 14, no. 1, pp. 1-37, 2008, doi: 10.1007/s10115-007-0114-2.
[49] J. Alcala-Fdez et al., "KEEL Data-Mining Software Tool: Data Set Repository, Integration of Algorithms and Experimental Analysis Framework," Journal of Multiple-Valued Logic and Soft Computing, vol. 17, pp. 255-287, 2010.
[50] S. Boonamnuay, N. Kerdprasop, and K. Kerdprasop, "Classification and regression tree with resampling for classifying imbalanced data," International Journal of Machine Learning and Computing, vol. 8, no. 4, pp. 336-340, 2018.
[51] N. Cristianini and E. Ricci, "Support Vector Machines," in Encyclopedia of Algorithms, M.-Y. Kao Ed. Boston, MA: Springer US, 2008, pp. 928-932.
[52] M. P. Sesmero, J. A. Iglesias, E. Magán, A. Ledezma, and A. Sanchis, "Impact of the learners diversity and combination method on the generation of heterogeneous classifier ensembles," Applied Soft Computing, vol. 111, p. 107689, 2021, doi: 10.1016/j.asoc.2021.107689.
[53] F. Pedregosa et al., "Scikit-learn: Machine learning in Python," Journal of Machine Learning Research, vol. 12, pp. 2825-2830, 2011.
[54] G. Lemaître, F. Nogueira, and C. K. Aridas, "Imbalanced-learn: A python toolbox to tackle the curse of imbalanced datasets in machine learning," Journal of Machine Learning Research, vol. 18, no. 17, pp. 1-5, 2017.
[55] P. J. Rousseeuw, "Silhouettes: A graphical aid to the interpretation and validation of cluster analysis," Journal of Computational and Applied Mathematics, vol. 20, pp. 53-65, 1987, doi: 10.1016/0377-0427(87)90125-7.
[56] T. Fawcett, "An introduction to ROC analysis," Pattern Recognit. Lett., vol. 27, no. 8, pp. 861-874, 2006, doi: 10.1016/j.patrec.2005.10.010.
[57] J. N. Mandrekar, "Receiver Operating Characteristic Curve in Diagnostic Test Assessment," Journal of Thoracic Oncology, vol. 5, no. 9, pp. 1315-1316, 2010, doi: 10.1097/JTO.0b013e3181ec173d.
[58] J. Grzyb and M. Woźniak, "SVM ensemble training for imbalanced data classification using multi-objective optimization techniques," Applied Intelligence, vol. 53, no. 12, pp. 15424-15441, 2023, doi: 10.1007/s10489-022-04291-9.
[59] I. Borg and P. J. Groenen, Modern multidimensional scaling: Theory and applications. Springer Science & Business Media, 2005.
[60] K. Napierala and J. Stefanowski, "Types of minority class examples and their influence on learning classifiers from imbalanced data," Journal of Intelligent Information Systems, vol. 46, no. 3, pp. 563-597, 2016, doi: 10.1007/s10844-015-0368-1.