References
Aittokallio, T. (2010). Dealing with missing values in large-scale studies: Microarray data imputation and beyond. Briefings in Bioinformatics, 11(2), 253–264. https://doi.org/10.1093/bib/bbp059
Awan, S. E., Bennamoun, M., Sohel, F., Sanfilippo, F., & Dwivedi, G. (2021). Imputation of missing data with class imbalance using conditional generative adversarial networks. Neurocomputing, 453, 164–171. https://doi.org/10.1016/j.neucom.2021.04.010
Batista, G. E., & Monard, M. C. (2003). An analysis of four missing data treatment methods for supervised learning. Applied artificial intelligence, 17(5-6), 519-533. https://doi.org/10.1080/713827181
Bertsimas, D., Pawlowski, C., & Zhuo, Y. D. (2017). From predictive methods to missing data imputation: an optimization approach. The Journal of Machine Learning Research, 18(1), 7133-7171.
Breiman, L. (2001a). Statistical Modeling: The Two Cultures (with comments and a rejoinder by the author). Statistical Science, 16(3), 199–231. https://doi.org/10.1214/ss/1009213726
Breiman, L. (2001b). Random Forests. Machine Learning, 45(1), 5–32. https://doi.org/10.1023/A:1010933404324
Breiman, L., Friedman, J., Stone, C. J., & Olshen, R. A. (1984). Classification and Regression Trees. Taylor & Francis.
Burgette, L. F., & Reiter, J. P. (2010). Multiple Imputation for Missing Data via Sequential Regression Trees. American Journal of Epidemiology, 172(9), 1070–1076. https://doi.org/10.1093/aje/kwq260
Buuren, S. van. (2018). Flexible Imputation of Missing Data, Second Edition. CRC Press.
Buuren, S. van, & Groothuis-Oudshoorn, K. (2011). mice: Multivariate Imputation by Chained Equations in R. Journal of Statistical Software, 45, 1–67. https://doi.org/10.18637/jss.v045.i03
Cao, K., Wei, C., Gaidon, A., Arechiga, N., & Ma, T. (2019). Learning Imbalanced Datasets with Label-Distribution-Aware Margin Loss. Advances in Neural Information Processing Systems, 32. https://doi.org/10.48550/arXiv.1906.07413
Cevallos Valdiviezo, H., & Van Aelst, S. (2015). Tree-based prediction on incomplete data using imputation or surrogate decisions. Information Sciences, 311, 163–181. https://doi.org/10.1016/j.ins.2015.03.018
Cheng, C.-H., Kao, Y.-F., & Lin, H.-P. (2021). A financial statement fraud model based on synthesized attribute selection and a dataset with missing values and imbalanced classes. Applied Soft Computing, 108, 107487. https://doi.org/10.1016/j.asoc.2021.107487
Das, S., Datta, S., & Chaudhuri, B. B. (2018). Handling data irregularities in classification: Foundations, trends, and future challenges. Pattern Recognition, 81, 674–693. https://doi.org/10.1016/j.patcog.2018.03.008
Datta, S., Misra, D., & Das, S. (2016). A feature weighted penalty based dissimilarity measure for k-nearest neighbor classification with missing features. Pattern Recognition Letters, 80, 231–237. https://doi.org/10.1016/j.patrec.2016.06.023
Denil, M., & Trappenberg, T. (2010, May). Overlap versus imbalance. In Canadian conference on artificial intelligence (pp. 220-231). Springer, Berlin, Heidelberg.
Elreedy, D., & Atiya, A. F. (2019). A Comprehensive Analysis of Synthetic Minority Oversampling Technique (SMOTE) for handling class imbalance. Information Sciences, 505, 32–64. https://doi.org/10.1016/j.ins.2019.07.070
Enders, C. K. (2010). Applied Missing Data Analysis. Guilford Press.
Farajzadeh-Zanjani, M., Razavi-Far, R., & Saif, M. (2016, December). Efficient sampling techniques for ensemble learning and diagnosing bearing defects under class imbalanced condition. In 2016 IEEE symposium series on computational intelligence (SSCI) (pp. 1-7). IEEE. https://doi.org/10.1109/SSCI.2016.7849879
Farhangfar, A., Kurgan, L., & Dy, J. (2008). Impact of imputation of missing values on classification error for discrete data. Pattern Recognition, 41(12), 3692–3705. https://doi.org/10.1016/j.patcog.2008.05.019
Fernández, A., García, S., Galar, M., Prati, R. C., Krawczyk, B., & Herrera, F. (2018). Learning from imbalanced data sets. Berlin: Springer. https://doi.org/10.1007/978-3-319-98074-4
Fix, E., & Hodges Jr, J. L. (1952). Discriminatory analysis-nonparametric discrimination: Small sample performance. California Univ Berkeley. https://apps.dtic.mil/sti/citations/ADA800276
Fotouhi, S., Asadi, S., & Kattan, M. W. (2019). A comprehensive data level analysis for cancer diagnosis on imbalanced data. Journal of Biomedical Informatics, 90, 103089. https://doi.org/10.1016/j.jbi.2018.12.003
Ganganwar, V. (2012). An overview of classification algorithms for imbalanced datasets. International Journal of Emerging Technology and Advanced Engineering, 2(4), 42-47.
García, S., Zhang, Z.-L., Altalhi, A., Alshomrani, S., & Herrera, F. (2018). Dynamic ensemble selection for multi-class imbalanced datasets. Information Sciences, 445–446, 22–37. https://doi.org/10.1016/j.ins.2018.03.002
García, S., Ramírez-Gallego, S., Luengo, J., Benítez, J. M., & Herrera, F. (2016). Big data preprocessing: methods and prospects. Big Data Analytics, 1(1), 1–22.
García, V., Mollineda, R. A., & Sánchez, J. S. (2008). On the k-NN performance in a challenging scenario of imbalance and overlapping. Pattern Analysis and Applications, 11(3), 269–280. https://doi.org/10.1007/s10044-007-0087-5
García-Laencina, P. J., Sancho-Gómez, J.-L., & Figueiras-Vidal, A. R. (2010). Pattern classification with missing data: A review. Neural Computing and Applications, 19(2), 263–282. https://doi.org/10.1007/s00521-009-0295-6
Geurts, P., Ernst, D., & Wehenkel, L. (2006). Extremely randomized trees. Machine Learning, 63(1), 3–42. https://doi.org/10.1007/s10994-006-6226-1
Halder, B., Ahmed, M. M., Amagasa, T., Isa, N. A. M., Faisal, R. H., & Rahman, M. (2022). Missing information in imbalanced data stream: fuzzy adaptive imputation approach. Applied Intelligence, 52(5), 5561-5583. https://doi.org/10.1007/s10489-021-02741-4
Han, H., Wang, W. Y., & Mao, B. H. (2005, August). Borderline-SMOTE: a new over-sampling method in imbalanced data sets learning. In International conference on intelligent computing (pp. 878-887). Springer, Berlin, Heidelberg.
Hand, D. J. (2007). Principles of Data Mining. Drug Safety, 30(7), 621–622. https://doi.org/10.2165/00002018-200730070-00010
Hasib, K. M., Iqbal, M. S., Shah, F. M., Mahmud, J. A., Popel, M. H., Showrov, M. I. H., Ahmed, S., & Rahman, O. (2020). A Survey of Methods for Managing the Classification and Solution of Data Imbalance Problem. Journal of Computer Science, 16(11), 1546–1557. https://doi.org/10.3844/jcssp.2020.1546.1557
Haykin, S. (1999). Self-organizing maps. Neural networks-A comprehensive foundation, 443-483.
Huang, C., Li, Y., Loy, C. C., & Tang, X. (2020). Deep Imbalanced Learning for Face Recognition and Attribute Prediction. IEEE Transactions on Pattern Analysis and Machine Intelligence, 42(11), 2781–2794. https://doi.org/10.1109/TPAMI.2019.2914680
Jerez, J. M., Molina, I., García-Laencina, P. J., Alba, E., Ribelles, N., Martín, M., & Franco, L. (2010). Missing data imputation using statistical and machine learning methods in a real breast cancer problem. Artificial intelligence in medicine, 50(2), 105-115. https://doi.org/10.1016/j.artmed.2010.05.002
Kaur, H., Pannu, H. S., & Malhi, A. K. (2019). A Systematic Review on Imbalanced Data Challenges in Machine Learning: Applications and Solutions. ACM Computing Surveys, 52(4), 79:1-79:36. https://doi.org/10.1145/3343440
Kaur, P., & Gosain, A. (2018). Comparing the behavior of oversampling and undersampling approach of class imbalance learning by combining class imbalance problem with noise. In ICT Based Innovations (pp. 23-30). Springer, Singapore. https://doi.org/10.1007/978-981-10-6602-3_3
Kim, H., Golub, G. H., & Park, H. (2005). Missing value estimation for DNA microarray gene expression data: Local least squares imputation. Bioinformatics (Oxford, England), 21(2), 187–198. https://doi.org/10.1093/bioinformatics/bth499
Kubat, M., Holte, R. C., & Matwin, S. (1998). Machine Learning for the Detection of Oil Spills in Satellite Radar Images. Machine Learning, 30(2), 195–215. https://doi.org/10.1023/A:1007452223027
Lee, D.-H., Yang, J.-K., Lee, C.-H., & Kim, K.-J. (2019). A data-driven approach to selection of critical process steps in the semiconductor manufacturing process considering missing and imbalanced data. Journal of Manufacturing Systems, 52, 146–156. https://doi.org/10.1016/j.jmsy.2019.07.001
Lee, W., Jun, C.-H., & Lee, J.-S. (2017). Instance categorization by support vector machines to adjust weights in AdaBoost for imbalanced data classification. Information Sciences, 381, 92–103. https://doi.org/10.1016/j.ins.2016.11.014
Lin, W.-C., & Tsai, C.-F. (2020). Missing value imputation: A review and analysis of the literature (2006–2017). Artificial Intelligence Review, 53(2), 1487–1509. https://doi.org/10.1007/s10462-019-09709-4
Little, R. J. A., & Rubin, D. B. (1989). The Analysis of Social Science Data with Missing Values. Sociological Methods and Research, 18(2–3). https://doi.org/10.1177/0049124189018002004
Liu, T., Fan, W., & Wu, C. (2019). A hybrid machine learning approach to cerebral stroke prediction based on imbalanced medical dataset. Artificial Intelligence in Medicine, 101, 101723. https://doi.org/10.1016/j.artmed.2019.101723
López, V., Fernández, A., García, S., Palade, V., & Herrera, F. (2013). An insight into classification with imbalanced data: Empirical results and current trends on using data intrinsic characteristics. Information sciences, 250, 113–141. https://doi.org/10.1016/j.ins.2013.07.007
Makki, S., Assaghir, Z., Taher, Y., Haque, R., Hacid, M.-S., & Zeineddine, H. (2019). An Experimental Study With Imbalanced Classification Approaches for Credit Card Fraud Detection. IEEE Access, 7, 93010–93022. https://doi.org/10.1109/ACCESS.2019.2927266
Maloof, M. A. (2003). Learning when data sets are imbalanced and when costs are unequal and unknown. ICML-2003 Workshop on Learning from Imbalanced Data Sets II.
Murti, D. M. P., Pujianto, U., Wibawa, A. P., & Akbar, M. I. (2019). K-Nearest Neighbor (K-NN) based Missing Data Imputation. 2019 5th International Conference on Science in Information Technology (ICSITech), 83–88. https://doi.org/10.1109/ICSITech46713.2019.8987530
Puri, A., & Kumar Gupta, M. (2021). Knowledge discovery from noisy imbalanced and incomplete binary class data. Expert Systems with Applications, 181, 115179. https://doi.org/10.1016/j.eswa.2021.115179
Purwar, A., & Singh, S. K. (2015). Hybrid prediction model with missing value imputation for medical data. Expert Systems with Applications, 42(13), 5621–5631. https://doi.org/10.1016/j.eswa.2015.02.050
Quinlan, J. R. (1993). C4.5: Programs for machine learning.
Razavi-Far, R., Farajzadeh-Zanjani, M., Wang, B., Saif, M., & Chakrabarti, S. (2021). Imputation-Based Ensemble Techniques for Class Imbalance Learning. IEEE Transactions on Knowledge and Data Engineering, 33(5), 1988–2001. https://doi.org/10.1109/TKDE.2019.2951556
Rubin, D. B. (1976). Inference and missing data. Biometrika, 63(3), 581–592. https://doi.org/10.1093/biomet/63.3.581
Rubin, D. B. (1988, August). An overview of multiple imputation. In Proceedings of the survey research methods section of the American statistical association (pp. 79-84). Princeton, NJ, USA: Citeseer.
Salem, M., Taheri, S., & Yuan, J.-S. (2018). An Experimental Evaluation of Fault Diagnosis from Imbalanced and Incomplete Data for Smart Semiconductor Manufacturing. Big Data and Cognitive Computing, 2(4), 30. https://doi.org/10.3390/bdcc2040030
Santos, M. S., Pereira, R. C., Costa, A. F., Soares, J. P., Santos, J., & Abreu, P. H. (2019). Generating Synthetic Missing Data: A Review by Missing Mechanism. IEEE Access, 7, 11651–11667. https://doi.org/10.1109/ACCESS.2019.2891360
Schafer, J. L. (1997). Analysis of incomplete multivariate data. CRC press.
Seiffert, C., Khoshgoftaar, T. M., Van Hulse, J., & Napolitano, A. (2009). RUSBoost: A hybrid approach to alleviating class imbalance. IEEE Transactions on Systems, Man, and Cybernetics-Part A: Systems and Humans, 40(1), 185–197. https://doi.org/10.1109/TSMCA.2009.2029559
Shin, K., Han, J., & Kang, S. (2021). MI-MOTE: Multiple imputation-based minority oversampling technique for imbalanced and incomplete data classification. Information Sciences, 575, 80–89. https://doi.org/10.1016/j.ins.2021.06.043
Thomas, T., & Rajabi, E. (2021). A systematic review of machine learning-based missing value imputation techniques. Data Technologies and Applications, 55(4), 558–585. https://doi.org/10.1108/DTA-12-2020-0298
Twala, B. (2009). An Empirical Comparison of Techniques for Handling Incomplete Data Using Decision Trees. Applied Artificial Intelligence, 23(5), 373–405. https://doi.org/10.1080/08839510902872223
Wang, X., Li, A., Jiang, Z., & Feng, H. (2006). Missing value estimation for DNA microarray gene expression data by Support Vector Regression imputation and orthogonal coding scheme. BMC Bioinformatics, 7(1), 32. https://doi.org/10.1186/1471-2105-7-32
Wasikowski, M., & Chen, X. (2010). Combating the Small Sample Class Imbalance Problem Using Feature Selection. IEEE Transactions on Knowledge and Data Engineering, 22(10), 1388–1400. https://doi.org/10.1109/TKDE.2009.187
Wei, R., Wang, J., Su, M., Jia, E., Chen, S., Chen, T., & Ni, Y. (2018). Missing Value Imputation Approach for Mass Spectrometry-based Metabolomics Data. Scientific Reports, 8(1), 663. https://doi.org/10.1038/s41598-017-19120-0
Zhang, Y., Li, X., Gao, L., Wang, L., & Wen, L. (2018). Imbalanced data fault diagnosis of rotating machinery using synthetic oversampling and feature learning. Journal of Manufacturing Systems, 48, 34–50. https://doi.org/10.1016/j.jmsy.2018.04.005