dc.description.abstract | With the rapid growth of social media and Web 2.0 platforms in recent years, people increasingly share their thoughts and exchange opinions on the internet, and enterprises need to understand public opinion to improve their decision-making more than ever. However, conventional sentiment analysis fails to identify sarcasm accurately, and class imbalance poses a major challenge in sarcasm detection. To handle the class imbalance problem in sarcasm detection, this study proposes six ensemble oversampling methods (SEO) that exploit the strengths of multiple oversampling algorithms. By applying the concept of ensemble learning to oversampling, the proposed methods (random, center, uncentered, cluster random, cluster center, and cluster uncentered) offer distinct strategies for selecting the newly generated sarcastic samples. Five oversampling algorithms, SMOTE, ADASYN, polynom-fit-SMOTE, ProWSyn, and SMOTE_IPF, are adopted in the experiments, which use two imbalanced sarcasm detection datasets, iSarcasmEval and SARC-reduced, collected from Twitter and Reddit, respectively. After features are extracted from the text using Word2Vec, GloVe, and FastText, the oversampling and ensemble techniques are applied. SEO is then evaluated on the classification results of five classifiers: Support Vector Machine, Decision Tree, Random Forest, Extreme Gradient Boosting, and Logistic Regression. The results show that the proposed methods outperform single oversampling algorithms by 7% in AUC and 2% in F1-score on iSarcasmEval, and by 1.5% in AUC and 1% in F1-score on SARC-reduced. | en_US |