References
Altman, N.S. (1992) An Introduction to Kernel and Nearest-Neighbor Nonparametric Regression. The American Statistician. [Online] 46 (3), 175–185. Available from: doi:10.1080/00031305.1992.10475879.
Asmaa, M., Houda, B. & Ilham, B. (2012) Addressing the Problem of Unbalanced Data Sets in Sentiment Analysis. In: Proceedings of the International Conference on Knowledge Discovery and Information Retrieval. [Online]. 2012 Barcelona, Spain, SciTePress - Science and Technology Publications. pp. 306–311. Available from: doi:10.5220/0004142603060311 [Accessed: 16 October 2019].
Barandela, R., Valdovinos, R.M., Sánchez, J.S. & Ferri, F.J. (2004) The Imbalanced Training Sample Problem: Under or over Sampling? In: Ana Fred, Terry M. Caelli, Robert P. W. Duin, Aurélio C. Campilho, et al. (eds.). Structural, Syntactic, and Statistical Pattern Recognition. Lecture Notes in Computer Science. [Online]. Berlin, Heidelberg, Springer Berlin Heidelberg. pp. 806–814. Available from: doi:10.1007/978-3-540-27868-9_88 [Accessed: 14 March 2020].
Bria, A., Marrocco, C. & Tortorella, F. (2020) Addressing class imbalance in deep learning for small lesion detection on medical images. Computers in Biology and Medicine. [Online] 120, 103735. Available from: doi:10.1016/j.compbiomed.2020.103735.
Buda, M., Maki, A. & Mazurowski, M.A. (2018) A systematic study of the class imbalance problem in convolutional neural networks. Neural Networks. [Online] 106, 249–259. Available from: doi:10.1016/j.neunet.2018.07.011.
Bunkhumpornpat, C., Sinapiromsaran, K. & Lursinsap, C. (2009) Safe-Level-SMOTE: Safe-Level-Synthetic Minority Over-Sampling TEchnique for Handling the Class Imbalanced Problem. In: Thanaruk Theeramunkong, Boonserm Kijsirikul, Nick Cercone, & Tu-Bao Ho (eds.). Advances in Knowledge Discovery and Data Mining. [Online]. Berlin, Heidelberg, Springer Berlin Heidelberg. pp. 475–482. Available from: doi:10.1007/978-3-642-01307-2_43 [Accessed: 13 November 2019].
Burez, J. & Van den Poel, D. (2009) Handling class imbalance in customer churn prediction. Expert Systems with Applications. [Online] 36 (3), 4626–4636. Available from: doi:10.1016/j.eswa.2008.05.027.
Chawla, N.V., Bowyer, K.W., Hall, L.O. & Kegelmeyer, W.P. (2002) SMOTE: Synthetic Minority Over-sampling Technique. Journal of Artificial Intelligence Research. [Online] 16, 321–357. Available from: doi:10.1613/jair.953.
Chen, T., Xu, R., Liu, B., Lu, Q., et al. (2014) WEMOTE - Word Embedding based Minority Oversampling Technique for Imbalanced Emotion and Sentiment Classification. In Workshop on Issues of Sentiment Discovery and Opinion Mining. 12.
Cortes, C. & Vapnik, V. (1995) Support-vector networks. Machine Learning. [Online] 20 (3), 273–297. Available from: doi:10.1007/BF00994018.
Dai, H.J. & Wang, C.K. (2019) Classifying adverse drug reactions from imbalanced twitter data. International Journal of Medical Informatics. 129, 122–132.
Devlin, J., Chang, M.-W., Lee, K. & Toutanova, K. (2019) BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. In: Proceedings of the 2019 Conference of the North. [Online]. 2019 Minneapolis, Minnesota, Association for Computational Linguistics. pp. 4171–4186. Available from: doi:10.18653/v1/N19-1423 [Accessed: 14 December 2019].
Domingos, P. & Pazzani, M. (1996) Beyond independence: Conditions for the optimality of the simple Bayesian classifier. In: Proc. 13th Intl. Conf. Machine Learning. 105–112.
Fernandes, E., De Carvalho, A.C.P. de L.F. & Yao, X. (2019a) Ensemble of Classifiers based on MultiObjective Genetic Sampling for Imbalanced Data. IEEE Transactions on Knowledge and Data Engineering. [Online] 1–1. Available from: doi:10.1109/TKDE.2019.2898861.
Fernandes, E., De Carvalho, A.C.P. de L.F. & Yao, X. (2019b) Ensemble of Classifiers based on MultiObjective Genetic Sampling for Imbalanced Data. IEEE Transactions on Knowledge and Data Engineering. [Online] 1–1. Available from: doi:10.1109/TKDE.2019.2898861.
Flores, A.C., Icoy, R.I., Pena, C.F. & Gorro, K.D. (2018) An Evaluation of SVM and Naive Bayes with SMOTE on Sentiment Analysis Data Set. In: 2018 International Conference on Engineering, Applied Sciences, and Technology (ICEAST). [Online]. July 2018 Phuket, IEEE. pp. 1–4. Available from: doi:10.1109/ICEAST.2018.8434401 [Accessed: 16 October 2019].
García, V., Sánchez, J.S., Marqués, A.I., Florencia, R., et al. (2019) Understanding the apparent superiority of over-sampling through an analysis of local information for class-imbalanced data. Expert Systems with Applications. [Online] 113026. Available from: doi:10.1016/j.eswa.2019.113026.
Haixiang, G., Yijing, L., Shang, J., Mingyun, G., et al. (2017) Learning from class-imbalanced data: Review of methods and applications. Expert Systems with Applications. [Online] 73, 220–239. Available from: doi:10.1016/j.eswa.2016.12.035.
Han, H., Wang, W.-Y. & Mao, B.-H. (2005) Borderline-SMOTE: A New Over-Sampling Method in Imbalanced Data Sets Learning. In: De-Shuang Huang, Xiao-Ping Zhang, & Guang-Bin Huang (eds.). Advances in Intelligent Computing. [Online]. Berlin, Heidelberg, Springer Berlin Heidelberg. pp. 878–887. Available from: doi:10.1007/11538059_91 [Accessed: 13 November 2019].
Hidalgo, J.M.G. (2002) Evaluating cost-sensitive Unsolicited Bulk Email categorization. In: Proceedings of the 2002 ACM symposium on Applied computing. SAC ’02. [Online]. 11 March 2002 Madrid, Spain, Association for Computing Machinery. pp. 615–620. Available from: doi:10.1145/508791.508911 [Accessed: 30 June 2020].
Hidalgo, J.M.G., López, M.M. & Sanz, E.P. (2000) Combining Text and Heuristics for Cost-Sensitive Spam Filtering. In: Fourth Conference on Computational Natural Language Learning and the Second Learning Language in Logic Workshop. [Online]. 2000. Available from: https://www.aclweb.org/anthology/W00-0719 [Accessed: 30 June 2020].
Japkowicz, N. (2000) The Class Imbalance Problem: Significance and Strategies. In: Proceedings of the 2000 International Conference on Artificial Intelligence (ICAI). 7.
Japkowicz, N. & Stephen, S. (2002) The class imbalance problem: A systematic study. Intelligent Data Analysis. [Online] 6 (5), 429–449. Available from: doi:10.3233/IDA-2002-6504.
Jo, T. & Japkowicz, N. (2004) Class imbalances versus small disjuncts. ACM SIGKDD Explorations Newsletter. [Online] 6 (1), 40. Available from: doi:10.1145/1007730.1007737.
Johnson, J.M. & Khoshgoftaar, T.M. (2019) Survey on deep learning with class imbalance. Journal of Big Data. [Online] 6 (1), 27. Available from: doi:10.1186/s40537-019-0192-5.
Karia, V., Zhang, W., Naeim, A. & Ramezani, R. (2019) GenSample: A Genetic Algorithm for Oversampling in Imbalanced Datasets.
Kingma, D.P. & Welling, M. (2014) Auto-Encoding Variational Bayes. arXiv:1312.6114 [cs, stat]. [Online] Available from: http://arxiv.org/abs/1312.6114 [Accessed: 24 January 2020].
Kovács, G. (2019) An empirical comparison and evaluation of minority oversampling techniques on a large number of imbalanced datasets. Applied Soft Computing. [Online] 83, 105662. Available from: doi:10.1016/j.asoc.2019.105662.
Krawczyk, B. (2016) Learning from imbalanced data: open challenges and future directions. Progress in Artificial Intelligence. [Online] 5 (4), 221–232. Available from: doi:10.1007/s13748-016-0094-0.
Kubat, M., Holte, R.C. & Matwin, S. (1998) Machine learning for the detection of oil spills in satellite radar images. Machine Learning. [Online] 30 (2/3), 195–215. Available from: doi:10.1023/A:1007452223027.
Kubat, M. & Matwin, S. (1997) Addressing the curse of imbalanced training sets: one-sided selection. In: Fourteenth international conference on machine learning.
Kusner, M.J., Sun, Y., Kolkin, N.I. & Weinberger, K.Q. (2015) From Word Embeddings To Document Distances. 10.
Li, C. & Liu, S. (2018) A comparative study of the class imbalance problem in Twitter spam detection. Concurrency and Computation: Practice and Experience. [Online] 30 (5), e4281. Available from: doi:10.1002/cpe.4281.
Li, J., Li, H. & Yu, J.-L. (2011) Application of Random-SMOTE on Imbalanced Data Mining. In: 2011 Fourth International Conference on Business Intelligence and Financial Engineering. [Online]. October 2011 Wuhan, Hubei, China, IEEE. pp. 130–133. Available from: doi:10.1109/BIFE.2011.25 [Accessed: 16 October 2019].
Li, Y., Sun, G. & Zhu, Y. (2010) Data Imbalance Problem in Text Classification. In: 2010 Third International Symposium on Information Processing. [Online]. October 2010 Qingdao, Shandong, China, IEEE. pp. 301–305. Available from: doi:10.1109/ISIP.2010.47 [Accessed: 18 December 2019].
Liu, X.-Y., Wu, J. & Zhou, Z.-H. (2009) Exploratory Undersampling for Class-Imbalance Learning. IEEE Transactions on Systems, Man, and Cybernetics, Part B (Cybernetics). [Online] 39 (2), 539–550. Available from: doi:10.1109/TSMCB.2008.2007853.
Liu, Y., Loh, H.T. & Sun, A. (2009) Imbalanced text classification: A term weighting approach. Expert Systems with Applications. [Online] 36 (1), 690–701. Available from: doi:10.1016/j.eswa.2007.10.042.
Mani, I. & Zhang, I. (2003) KNN Approach to Unbalanced Data Distributions: A Case Study Involving Information Extraction. In: Proceedings of the ICML’2003 workshop on learning from imbalanced datasets. 2003.
McCallum, A. & Nigam, K. (1998) A comparison of event models for naive Bayes text classification. In: AAAI-98 workshop on learning for text categorization. 752, 41–48.
Mikolov, T., Chen, K., Corrado, G. & Dean, J. (2013) Efficient Estimation of Word Representations in Vector Space. arXiv:1301.3781 [cs]. [Online] Available from: http://arxiv.org/abs/1301.3781 [Accessed: 14 December 2019].
Mikolov, T., Yih, W. & Zweig, G. (2013) Linguistic Regularities in Continuous Space Word Representations. In: Proceedings of the 2013 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. [Online]. June 2013 Atlanta, Georgia, Association for Computational Linguistics. pp. 746–751. Available from: https://www.aclweb.org/anthology/N13-1090 [Accessed: 21 April 2020].
Mishra, S. (2017) Handling Imbalanced Data: SMOTE vs. Random Undersampling. 04 (08), 4.
Mountassir, A., Benbrahim, H. & Berrada, I. (2012) An empirical study to address the problem of Unbalanced Data Sets in sentiment classification. In: 2012 IEEE International Conference on Systems, Man, and Cybernetics (SMC). [Online]. October 2012 Seoul, Korea (South), IEEE. pp. 3298–3303. Available from: doi:10.1109/ICSMC.2012.6378300 [Accessed: 21 October 2019].
Padurariu, C. & Breaban, M.E. (2019) Dealing with Data Imbalance in Text Classification. Procedia Computer Science. [Online] 159, 736–745. Available from: doi:10.1016/j.procs.2019.09.229.
Pawar, P.Y. & Gawande, S.H. (2012) A Comparative Study on Different Types of Approaches to Text Categorization. International Journal of Machine Learning and Computing. [Online] 423–426. Available from: doi:10.7763/IJMLC.2012.V2.158.
Peters, M., Neumann, M., Iyyer, M., Gardner, M., et al. (2018) Deep Contextualized Word Representations. In: Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers). [Online]. 2018 New Orleans, Louisiana, Association for Computational Linguistics. pp. 2227–2237. Available from: doi:10.18653/v1/N18-1202 [Accessed: 14 December 2019].
Radford, A., Narasimhan, K., Salimans, T. & Sutskever, I. (2018) Improving Language Understanding by Generative Pre-Training. Technical report, OpenAI. 12.
Rahman, M.M. & Davis, D.N. (2013) Addressing the Class Imbalance Problem in Medical Datasets. International Journal of Machine Learning and Computing. [Online] 224–228. Available from: doi:10.7763/IJMLC.2013.V3.307.
Rao, R.B., Krishnan, S. & Niculescu, R.S. (2006) Data mining for improved cardiac care. ACM SIGKDD Explorations Newsletter. [Online] 8 (1), 3–10. Available from: doi:10.1145/1147234.1147236.
Rennie, J.D., Shih, L., Teevan, J. & Karger, D.R. (2003) Tackling the poor assumptions of naive Bayes text classifiers. In: Proceedings of the 20th international conference on machine learning (ICML-03). 616–623.
Sahu, M., Mukhopadhyay, A., Szengel, A. & Zachow, S. (2017) Addressing multi-label imbalance problem of surgical tool detection using CNN. International Journal of Computer Assisted Radiology and Surgery. [Online] 12 (6), 1013–1020. Available from: doi:10.1007/s11548-017-1565-x.
Saladi, P.S.M. & Dash, T. (2019) Genetic Algorithm-Based Oversampling Technique to Learn from Imbalanced Data. In: Jagdish Chand Bansal, Kedar Nath Das, Atulya Nagar, Kusum Deep, et al. (eds.). Soft Computing for Problem Solving. [Online]. Singapore, Springer Singapore. pp. 387–397. Available from: doi:10.1007/978-981-13-1592-3_30 [Accessed: 16 October 2019].
Sarakit, P., Theeramunkong, T. & Haruechaiyasak, C. (2015) Improving emotion classification in imbalanced YouTube dataset using SMOTE algorithm. In: 2015 2nd International Conference on Advanced Informatics: Concepts, Theory and Applications (ICAICTA). [Online]. August 2015 Chonburi, Thailand, IEEE. pp. 1–5. Available from: doi:10.1109/ICAICTA.2015.7335373 [Accessed: 16 October 2019].
Satriaji, W. & Kusumaningrum, R. (2018) Effect of Synthetic Minority Oversampling Technique (SMOTE), Feature Representation, and Classification Algorithm on Imbalanced Sentiment Analysis. In: 2018 2nd International Conference on Informatics and Computational Sciences (ICICoS). [Online]. October 2018 pp. 1–5. Available from: doi:10.1109/ICICOS.2018.8621648.
Stamatatos, E. (2008) Author identification: Using text sampling to handle the class imbalance problem. Information Processing & Management. [Online] 44 (2), 790–799. Available from: doi:10.1016/j.ipm.2007.05.012.
Su, P., Liu, Y. & Song, X. (2018) Research on Intrusion Detection Method Based on Improved Smote and XGBoost. In: Proceedings of the 8th International Conference on Communication and Network Security - ICCNS 2018. [Online]. 2018 Qingdao, China, ACM Press. pp. 37–41. Available from: doi:10.1145/3290480.3290505 [Accessed: 28 February 2020].
Sun, Y., Kamel, M.S., Wong, A.K.C. & Wang, Y. (2007) Cost-sensitive boosting for classification of imbalanced data. Pattern Recognition. [Online] 40 (12), 3358–3378. Available from: doi:10.1016/j.patcog.2007.04.009.
Tallo, T.E. & Musdholifah, A. (2018) The Implementation of Genetic Algorithm in Smote (Synthetic Minority Oversampling Technique) for Handling Imbalanced Dataset Problem. In: 2018 4th International Conference on Science and Technology (ICST). [Online]. August 2018 Yogyakarta, IEEE. pp. 1–4. Available from: doi:10.1109/ICSTC.2018.8528591 [Accessed: 18 December 2019].
Thabtah, F., Hammoud, S., Kamalov, F. & Gonsalves, A. (2020) Data imbalance in classification: Experimental evaluation. Information Sciences. [Online] 513, 429–441. Available from: doi:10.1016/j.ins.2019.11.004.
Ting, S., Ip, W. & Tsang, A. (2011) Is Naïve Bayes a good classifier for document classification? International Journal of Software Engineering and Its Applications. 5 (3), 37–46.
Turénko, D., Khan, A., Hussain, R. & Imran Ali, S. (2020) Oversampling Versus Variational Autoencoders: Employing Synthetic Data for Detection of Heracleum Sosnowskyi in Satellite Images. In: Kuinam J. Kim & Hye-Young Kim (eds.). Information Science and Applications. [Online]. Singapore, Springer Singapore. pp. 399–409. Available from: doi:10.1007/978-981-15-1465-4_40 [Accessed: 16 February 2020].
Van Hulse, J., Khoshgoftaar, T.M. & Napolitano, A. (2007) Experimental perspectives on learning from imbalanced data. In: Proceedings of the 24th international conference on Machine learning - ICML ’07. [Online]. 2007 Corvallis, Oregon, ACM Press. pp. 935–942. Available from: doi:10.1145/1273496.1273614 [Accessed: 27 February 2020].
Wan, Z., Zhang, Y. & He, H. (2017) Variational autoencoder based synthetic data generation for imbalanced learning. In: 2017 IEEE Symposium Series on Computational Intelligence (SSCI). [Online]. November 2017 Honolulu, HI, IEEE. pp. 1–7. Available from: doi:10.1109/SSCI.2017.8285168 [Accessed: 16 February 2020].
Wei, W., Li, J., Cao, L., Ou, Y., et al. (2013) Effective detection of sophisticated online banking fraud on extremely imbalanced data. World Wide Web. [Online] 16 (4), 449–475. Available from: doi:10.1007/s11280-012-0178-0.
Xu, R., Chen, T., Xia, Y., Lu, Q., et al. (2015) Word Embedding Composition for Data Imbalances in Sentiment and Emotion Classification. Cognitive Computation. [Online] 7 (2), 226–240. Available from: doi:10.1007/s12559-015-9319-y.
Zhang, W., Yoshida, T. & Tang, X. (2011) A comparative study of TF*IDF, LSI and multi-words for text classification. Expert Systems with Applications. [Online] 38 (3), 2758–2765. Available from: doi:10.1016/j.eswa.2010.08.066.
Zhang, Y.-P., Zhang, L.-N. & Wang, Y.-C. (2010) Cluster-based majority under-sampling approaches for class imbalance learning. In: 2010 2nd IEEE International Conference on Information and Financial Engineering. [Online]. September 2010 pp. 400–404. Available from: doi:10.1109/ICIFE.2010.5609385.
Zheng, W. & Jin, M. (2020) The Effects of Class Imbalance and Training Data Size on Classifier Learning: An Empirical Study. SN Computer Science. [Online] 1 (2), 71. Available from: doi:10.1007/s42979-020-0074-0.
Zhi, W.M., Guo, H.P. & Fan, M. (2012) Discussion of Classification for Imbalanced Data Sets. Advanced Materials Research. [Online] 546–547, 622–627. Available from: doi:10.4028/www.scientific.net/AMR.546-547.622.