References
[1] Y. Sun, A. K. C. Wong, and M. S. Kamel, “Classification of imbalanced data: a review,” Int. J. Pattern Recognit. Artif. Intell., vol. 23, no. 4, pp. 687–719, Jun. 2009, doi: 10.1142/S0218001409007326.
[2] Q. Zou, S. Xie, Z. Lin, M. Wu, and Y. Ju, “Finding the Best Classification Threshold in Imbalanced Classification,” Big Data Res., vol. 5, pp. 2–8, Sep. 2016, doi: 10.1016/j.bdr.2015.12.001.
[3] V. S. Spelmen and R. Porkodi, “A Review on Handling Imbalanced Data,” in 2018 International Conference on Current Trends towards Converging Technologies (ICCTCT), Mar. 2018, pp. 1–11. doi: 10.1109/ICCTCT.2018.8551020.
[4] G. Kovács, “An empirical comparison and evaluation of minority oversampling techniques on a large number of imbalanced datasets,” Appl. Soft Comput., vol. 83, p. 105662, Oct. 2019, doi: 10.1016/j.asoc.2019.105662.
[5] V. Ganganwar, “An overview of classification algorithms for imbalanced datasets,” Int. J. Emerg. Technol. Adv. Eng., vol. 2, no. 4, pp. 42–47, 2012.
[6] G. Douzas, F. Bacao, and F. Last, “Improving imbalanced learning through a heuristic oversampling method based on k-means and SMOTE,” Inf. Sci., vol. 465, pp. 1–20, Oct. 2018, doi: 10.1016/j.ins.2018.06.056.
[7] R. Mohammed, J. Rawashdeh, and M. Abdullah, “Machine Learning with Oversampling and Undersampling Techniques: Overview Study and Experimental Results,” in 2020 11th International Conference on Information and Communication Systems (ICICS), Apr. 2020, pp. 243–248. doi: 10.1109/ICICS49469.2020.239556.
[8] S. Vellamcheti and P. Singh, “Class Imbalance Deep Learning for Bankruptcy Prediction,” in 2020 First International Conference on Power, Control and Computing Technologies (ICPC2T), Jan. 2020, pp. 421–425. doi: 10.1109/ICPC2T48082.2020.9071460.
[9] H. Sain and S. W. Purnami, “Combine Sampling Support Vector Machine for Imbalanced Data Classification,” Procedia Comput. Sci., vol. 72, pp. 59–66, Jan. 2015, doi: 10.1016/j.procs.2015.12.105.
[10] G. E. A. P. A. Batista, R. C. Prati, and M. C. Monard, “A study of the behavior of several methods for balancing machine learning training data,” SIGKDD Explor. Newsl., vol. 6, no. 1, pp. 20–29, Jun. 2004, doi: 10.1145/1007730.1007735.
[11] P. Gnip, L. Vokorokos, and P. Drotár, “Selective oversampling approach for strongly imbalanced data,” PeerJ Comput. Sci., vol. 7, p. e604, Jun. 2021, doi: 10.7717/peerj-cs.604.
[12] V. Chandola, A. Banerjee, and V. Kumar, “Anomaly detection: A survey,” ACM Comput. Surv., vol. 41, no. 3, pp. 1–58, 2009, doi: 10.1145/1541880.1541882.
[13] N. V. Chawla, K. W. Bowyer, L. O. Hall, and W. P. Kegelmeyer, “SMOTE: Synthetic Minority Over-sampling Technique,” J. Artif. Intell. Res., vol. 16, pp. 321–357, Jun. 2002, doi: 10.1613/jair.953.
[14] A. Fernandez, S. Garcia, F. Herrera, and N. V. Chawla, “SMOTE for Learning from Imbalanced Data: Progress and Challenges, Marking the 15-year Anniversary,” J. Artif. Intell. Res., vol. 61, pp. 863–905, Apr. 2018, doi: 10.1613/jair.1.11192.
[15] Asniar, N. U. Maulidevi, and K. Surendro, “SMOTE-LOF for noise identification in imbalanced data classification,” J. King Saud Univ. - Comput. Inf. Sci., vol. 34, no. 6, Part B, pp. 3413–3423, Jun. 2022, doi: 10.1016/j.jksuci.2021.01.014.
[16] X. W. Liang, A. P. Jiang, T. Li, Y. Y. Xue, and G. T. Wang, “LR-SMOTE — An improved unbalanced data set oversampling based on K-means and SVM,” Knowl.-Based Syst., vol. 196, p. 105845, May 2020, doi: 10.1016/j.knosys.2020.105845.
[17] M. H. Ibrahim, “ODBOT: Outlier detection-based oversampling technique for imbalanced datasets learning,” Neural Comput. Appl., vol. 33, no. 22, pp. 15781–15806, Nov. 2021, doi: 10.1007/s00521-021-06198-x.
[18] W. Dan and L. Yian, “Denoise-Based Over-Sampling for Imbalanced Data Classification,” in 2020 19th International Symposium on Distributed Computing and Applications for Business Engineering and Science (DCABES), Oct. 2020, pp. 1–4. doi: 10.1109/DCABES50732.2020.00078.
[19] A. Arafa, N. El-Fishawy, M. Badawy, and M. Radad, “RN-SMOTE: Reduced Noise SMOTE based on DBSCAN for enhancing imbalanced data classification,” J. King Saud Univ. - Comput. Inf. Sci., vol. 34, no. 8, Part A, pp. 5059–5074, Sep. 2022, doi: 10.1016/j.jksuci.2022.06.005.
[20] D. M. Hawkins, Identification of Outliers. Dordrecht: Springer Netherlands, 1980. doi: 10.1007/978-94-015-3994-4.
[21] A. Boukerche, L. Zheng, and O. Alfandi, “Outlier detection: Methods, models, and classification,” ACM Comput. Surv., vol. 53, no. 3, pp. 1–37, 2020.
[22] I. Souiden, Z. Brahmi, and H. Toumi, “A Survey on Outlier Detection in the Context of Stream Mining: Review of Existing Approaches and Recommendations,” in Intelligent Systems Design and Applications, A. M. Madureira, A. Abraham, D. Gamboa, and P. Novais, Eds., Cham: Springer International Publishing, 2017, pp. 372–383. doi: 10.1007/978-3-319-53480-0_37.
[23] O. Alghushairy, R. Alsini, T. Soule, and X. Ma, “A Review of Local Outlier Factor Algorithms for Outlier Detection in Big Data Streams,” Big Data Cogn. Comput., vol. 5, no. 1, Art. no. 1, Mar. 2021, doi: 10.3390/bdcc5010001.
[24] A. Smiti, “A critical overview of outlier detection methods,” Comput. Sci. Rev., vol. 38, p. 100306, Nov. 2020, doi: 10.1016/j.cosrev.2020.100306.
[25] H. Wang, M. J. Bah, and M. Hammad, “Progress in Outlier Detection Techniques: A Survey,” IEEE Access, vol. 7, pp. 107964–108000, 2019, doi: 10.1109/ACCESS.2019.2932769.
[26] M. M. Breunig, H.-P. Kriegel, R. T. Ng, and J. Sander, “LOF: identifying density-based local outliers,” in Proceedings of the 2000 ACM SIGMOD International Conference on Management of Data, in SIGMOD ’00. New York, NY, USA: Association for Computing Machinery, May 2000, pp. 93–104. doi: 10.1145/342009.335388.
[27] S. Ramaswamy, R. Rastogi, and K. Shim, “Efficient algorithms for mining outliers from large data sets,” in Proceedings of the 2000 ACM SIGMOD international conference on Management of data, in SIGMOD ’00. New York, NY, USA: Association for Computing Machinery, May 2000, pp. 427–438. doi: 10.1145/342009.335437.
[28] E. M. Knorr and R. T. Ng, “Algorithms for mining distance-based outliers in large datasets,” in Proceedings of the 24th International Conference on Very Large Data Bases (VLDB ’98), 1998, pp. 392–403.
[29] F. T. Liu, K. M. Ting, and Z.-H. Zhou, “Isolation Forest,” in 2008 Eighth IEEE International Conference on Data Mining, Dec. 2008, pp. 413–422. doi: 10.1109/ICDM.2008.17.
[30] Y. Chabchoub, M. U. Togbe, A. Boly, and R. Chiky, “An In-Depth Study and Improvement of Isolation Forest,” IEEE Access, vol. 10, pp. 10219–10237, 2022, doi: 10.1109/ACCESS.2022.3144425.
[31] P. J. Rousseeuw and K. Van Driessen, “A Fast Algorithm for the Minimum Covariance Determinant Estimator,” Technometrics, vol. 41, no. 3, pp. 212–223, 1999, doi: 10.1080/00401706.1999.10485670.
[32] M. Hubert and M. Debruyne, “Minimum covariance determinant,” WIREs Comput. Stat., vol. 2, no. 1, pp. 36–43, 2010, doi: 10.1002/wics.61.
[33] M. Hubert, M. Debruyne, and P. J. Rousseeuw, “Minimum covariance determinant and extensions,” WIREs Comput. Stat., vol. 10, no. 3, p. e1421, 2018, doi: 10.1002/wics.1421.
[34] B. Schölkopf, R. C. Williamson, A. Smola, J. Shawe-Taylor, and J. Platt, “Support Vector Method for Novelty Detection,” in Advances in Neural Information Processing Systems, MIT Press, 1999. Accessed: Mar. 14, 2024. [Online]. Available: https://proceedings.neurips.cc/paper_files/paper/1999/hash/8725fb777f25776ffa9076e44fcfd776-Abstract.html
[35] H. J. Shin, D.-H. Eom, and S.-S. Kim, “One-class support vector machines—an application in machine fault detection and classification,” Comput. Ind. Eng., vol. 48, no. 2, pp. 395–408, Mar. 2005, doi: 10.1016/j.cie.2005.01.009.
[36] A. Gosain and S. Sardana, “Handling class imbalance problem using oversampling techniques: A review,” in 2017 International Conference on Advances in Computing, Communications and Informatics (ICACCI), Udupi: IEEE, Sep. 2017, pp. 79–85. doi: 10.1109/ICACCI.2017.8125820.
[37] I. Dey and V. Pratap, “A Comparative Study of SMOTE, Borderline-SMOTE, and ADASYN Oversampling Techniques using Different Classifiers,” in 2023 3rd International Conference on Smart Data Intelligence (ICSMDI), Trichy, India: IEEE, Mar. 2023, pp. 294–302. doi: 10.1109/ICSMDI57622.2023.00060.
[38] X. Dong, Z. Yu, W. Cao, Y. Shi, and Q. Ma, “A survey on ensemble learning,” Front. Comput. Sci., vol. 14, no. 2, pp. 241–258, Apr. 2020, doi: 10.1007/s11704-019-8208-z.
[39] H. Parvin, M. MirnabiBaboli, and H. Alinejad-Rokny, “Proposing a classifier ensemble framework based on classifier selection and decision tree,” Eng. Appl. Artif. Intell., vol. 37, pp. 34–42, Jan. 2015, doi: 10.1016/j.engappai.2014.08.005.
[40] X. Feng, Z. Xiao, B. Zhong, J. Qiu, and Y. Dong, “Dynamic ensemble classification for credit scoring using soft probability,” Appl. Soft Comput., vol. 65, pp. 139–151, Apr. 2018, doi: 10.1016/j.asoc.2018.01.021.
[41] Z. Liu and Y. Zhang, “Credit evaluation with a data mining approach based on gradient boosting decision tree,” J. Phys. Conf. Ser., vol. 1848, no. 1, p. 012034, Apr. 2021, doi: 10.1088/1742-6596/1848/1/012034.
[42] M. A. Hearst, S. T. Dumais, E. Osuna, J. Platt, and B. Scholkopf, “Support vector machines,” IEEE Intell. Syst. Their Appl., vol. 13, no. 4, pp. 18–28, Jul. 1998, doi: 10.1109/5254.708428.
[43] V. Vapnik, The Nature of Statistical Learning Theory. Springer Science & Business Media, 2013.
[44] L. Breiman, “Random Forests,” Mach. Learn., vol. 45, no. 1, pp. 5–32, Oct. 2001, doi: 10.1023/A:1010933404324.
[45] G. Ke et al., “LightGBM: A highly efficient gradient boosting decision tree,” Adv. Neural Inf. Process. Syst., vol. 30, 2017.
[46] J. Derrac, S. García, L. Sánchez, and F. Herrera, “KEEL data-mining software tool: Data set repository, integration of algorithms and experimental analysis framework,” J. Mult.-Valued Log. Soft Comput., vol. 17, no. 2–3, pp. 255–287, 2011.
[47] A. Puri and M. Kumar Gupta, “Improved Hybrid Bag-Boost Ensemble With K-Means-SMOTE–ENN Technique for Handling Noisy Class Imbalanced Data,” Comput. J., vol. 65, no. 1, pp. 124–138, Jan. 2022, doi: 10.1093/comjnl/bxab039.
[48] Z. Ali, R. Ahmad, M. N. Akhtar, Z. H. Chuhan, H. M. Kiran, and W. Shahzad, “Empirical Study of Associative Classifiers on Imbalanced Datasets in KEEL,” in 2018 9th International Conference on Information, Intelligence, Systems and Applications (IISA), Jul. 2018, pp. 1–7. doi: 10.1109/IISA.2018.8633612.
[49] Q. Liu, W. Luo, and T. Shi, “Classification method for imbalanced data set based on EKCStacking algorithm,” in Proceedings of the 2019 8th International Conference on Networks, Communication and Computing, in ICNCC ’19. New York, NY, USA: Association for Computing Machinery, Jan. 2020, pp. 51–56. doi: 10.1145/3375998.3376002.
[50] N. Rout, D. Mishra, and M. K. Mallick, “Handling Imbalanced Data: A Survey,” in International Proceedings on Advances in Soft Computing, Intelligent Systems and Applications, M. S. Reddy, K. Viswanath, and S. P. K.M., Eds., Singapore: Springer, 2018, pp. 431–443. doi: 10.1007/978-981-10-5272-9_39.
[51] J. Kong, W. Kowalczyk, S. Menzel, and T. Bäck, “Improving Imbalanced Classification by Anomaly Detection,” in Parallel Problem Solving from Nature – PPSN XVI, T. Bäck, M. Preuss, A. Deutz, H. Wang, C. Doerr, M. Emmerich, and H. Trautmann, Eds., Cham: Springer International Publishing, 2020, pp. 512–523. doi: 10.1007/978-3-030-58112-1_35.
[52] J. Wang, J. Xu, C. Zhao, Y. Peng, and H. Wang, “An ensemble feature selection method for high-dimensional data based on sort aggregation,” Syst. Sci. Control Eng., vol. 7, no. 2, pp. 32–39, Nov. 2019, doi: 10.1080/21642583.2019.1620658.
[53] H. Qian, S. Zhang, B. Wang, L. Peng, S. Gao, and Y. Song, “A comparative study on machine learning models combining with outlier detection and balanced sampling methods for credit scoring,” arXiv preprint arXiv:2112.13196, Dec. 2021, doi: 10.48550/arXiv.2112.13196.
[54] L. Cleofas-Sánchez, J. S. Sánchez, V. García, and R. M. Valdovinos, “Associative learning on imbalanced environments: An empirical study,” Expert Syst. Appl., vol. 54, pp. 387–397, Jul. 2016, doi: 10.1016/j.eswa.2015.10.001.
[55] A. I. Marqués, V. García, and J. S. Sánchez, “On the suitability of resampling techniques for the class imbalance problem in credit scoring,” J. Oper. Res. Soc., vol. 64, no. 7, pp. 1060–1070, Jul. 2013, doi: 10.1057/jors.2012.120.