The Effectiveness of Multifaceted Approach to Class Imbalance Text Classification


Name

Peng-Yu Chen (陳芃諭)

Department

Department of Information Management

Thesis Title

The Effectiveness of Multifaceted Approach to Class Imbalance Text Classification
(探討使用多面向方法在文字不平衡資料集之分類問題影響)

Full text: available for viewing in the system after 2025-7-10

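Abstract (Chinese)

Class imbalance arises in many text-related settings and applications, such as spam detection and text classification tasks. Resampling techniques are the usual remedy, but handling class imbalance also requires considering the effect of the other facets involved. In this thesis, we examine how different facets affect classification on imbalanced text datasets: data representations (TF-IDF, Word2Vec, ELMo, and BERT), a resampling technique (SMOTE), and a generative technique (VAE), under different class imbalance ratios. We also combine several classifiers with these techniques and observe how the results differ.
Based on the experimental results, we can recommend a preferred combination for imbalanced text datasets. ELMo, SMOTE, and SVM are suitable for imbalanced text datasets; as the amount of data grows, however, TF-IDF, SMOTE, and SVM become the better combination.
We find that, on imbalanced text datasets, the data representation, synthetic method, generative method, classifier, class imbalance ratio, and data size all affect one another. In addition, when classifiers trained on synthetic data are compared with those trained on generated data, SMOTE performs better than VAE, and the combination of TF-IDF, SMOTE, and SVM can even surpass the results obtained with real data.
In this thesis, we adopt TF-IDF and other embedding methods, focus on SMOTE and VAE, and compare synthetic data, generated data, and original data. We also examine the effects of different class imbalance ratios and data sizes.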

Abstract (English)

Class imbalance is present in many text classification applications, such as text polarity classification, spam detection, and topic classification. Resampling techniques are commonly used to deal with class imbalance. However, addressing the problem effectively takes a multifaceted approach. In this study, we investigate the effectiveness of different text representations (TF-IDF, Word2Vec, ELMo, and BERT), a resampling technique (SMOTE), and a generative technique (VAE) under various class imbalance ratios. We also evaluate how different classifiers perform with these techniques.
From the experimental results, we devise a general recommendation for dealing with class imbalance in text classification. The combination of ELMo, SMOTE, and SVM is suitable for imbalanced datasets. However, as the training dataset grows, the combination of TF-IDF, SMOTE, and SVM can be more suitable.
We find that the facets involved in handling an imbalanced dataset affect one another: data representation, synthetic method, generative method, classifier, class imbalance ratio, and training data size. In addition, when classifiers trained on synthetic data are compared with those trained on generated data, SMOTE still outperforms VAE. The combination of TF-IDF, SMOTE, and SVM can even surpass the results obtained with the original data.
In our study, we use TF-IDF and embedding methods as the data representations, focus on SMOTE and VAE, and compare the results on synthetic and generated data with those on the original data. We also treat the class imbalance ratio and the training data size as facets of the study.
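The abstract relies on SMOTE to synthesize minority-class samples. As a rough illustration only, and not the thesis's actual implementation, the sketch below shows the core idea in plain Python: pick a minority sample, pick one of its k nearest minority neighbours, and interpolate between them. The function name `smote`, its parameters, the random choice of base sample, and the toy two-dimensional vectors are all assumptions made for this example; in the thesis setting the vectors would be text representations such as TF-IDF or ELMo features.

```python
import math
import random


def smote(minority, n_new, k=5, seed=0):
    """Create n_new synthetic points by interpolating between minority
    samples and their k nearest minority neighbours (SMOTE-style sketch)."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        # Simplification: draw the base sample at random (an assumption of this sketch).
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority samples by Euclidean distance to the base.
        others = [p for j, p in enumerate(minority) if j != i]
        others.sort(key=lambda p: math.dist(base, p))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


# Toy feature vectors standing in for document representations (hypothetical data).
majority = [[0.0, 0.1], [0.2, 0.0], [0.1, 0.3], [0.3, 0.2], [0.0, 0.4], [0.4, 0.1]]
minority = [[1.0, 1.0], [1.2, 0.9], [0.9, 1.3]]

# Generate enough synthetic minority points to match the majority class size.
new_points = smote(minority, n_new=len(majority) - len(minority), k=2)
balanced_minority = minority + new_points
print(len(majority), len(balanced_minority))
```

Because every synthetic point lies on a segment between two real minority samples, the new data stays inside the region the minority class already occupies. A generative model such as a VAE instead samples from a learned latent distribution, which is the contrast the abstract draws between synthetic and generated data.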

Keywords

★ class imbalance
★ text classification
★ SMOTE
★ machine learning
★ deep learning

Table of Contents

Chinese Abstract
Abstract
Acknowledgement
Table of Contents
List of Figures
List of Tables
1. Introduction
1.1. Overview
1.2. Motivation
1.3. Objectives
1.4. Thesis Organization
2. Related Works
2.1. Class Imbalance Issue
2.2. Dealing with Class Imbalance in Text
2.3. Data Representation for Text
2.3.1. Bag-of-Words Model (BoW)
2.3.2. Term Frequency-Inverse Document Frequency (TF-IDF)
2.3.3. Word2Vec
2.3.4. Embeddings from Language Models (ELMo)
2.3.5. Bidirectional Encoder Representations from Transformers (BERT)
2.3.6. Summary of Data Representation for Text
2.4. Data-Level Techniques
2.4.1. Under-sampling Methods
2.4.2. Over-sampling Methods
2.5. Generative Models
2.5.1. Variational Autoencoder (VAE)
2.6. Classifiers
2.6.1. Support Vector Machine
2.6.2. Naïve Bayes
2.6.3. K-Nearest Neighbors
2.6.4. Random Forest
2.6.5. Extreme Gradient Boosting
2.6.6. Multilayer Perceptron
2.7. Evaluation Metrics
2.7.1. Confusion Matrix
2.7.2. F1-Score
2.7.3. Geometric Mean
2.8. Discussion
3. Methodology
3.1. Datasets
3.2. Experimental Setups
3.2.1. Preprocessing
3.3. Hyperparameter Settings
3.3.1. Data Representation Hyperparameter Settings
3.3.2. Minority Synthesis Method Hyperparameter Settings
3.3.3. Classifier Hyperparameter Settings
4. Experiment Results
4.1. Experiment Objectives Analysis
4.1.1. First Experiment Objective
4.1.2. Second Experiment Objective
4.1.3. Third Experiment Objective
4.2. Summary of the Experiment Results
5. Conclusion
5.1. Summary
5.2. Contribution
5.3. Limitation
5.4. Future Works
6. Reference


Advisor

Shih-Wen Ke (柯士文)

Approval Date

2020-7-20
