NCU Institutional Repository (theses and dissertations, past exam papers, journal articles, and research projects available for download): Item 987654321/93254

    Please use this permanent URL to cite or link to this document: http://ir.lib.ncu.edu.tw/handle/987654321/93254


    Title: 利用集成式過採樣方法解決諷刺偵測之類別不平衡問題;Handling Class Imbalanced Data in Sarcasm Detection with Ensemble Oversampling Techniques
    Author: 林鈺融;Lin, Yu-Jung
    Contributor: Department of Information Management
    Keywords: Sarcasm detection; Class imbalance; Oversampling; Ensemble learning
    Date: 2023-07-24
    Upload time: 2024-09-19 16:50:47 (UTC+8)
    Publisher: National Central University
    Abstract: With the rapid growth of social media and Web 2.0 platforms in recent years, people increasingly share their thoughts and exchange opinions online. The need for enterprises to understand public opinion in order to improve their decision making is greater than ever. However, conventional sentiment analysis fails to identify sarcasm accurately, and class imbalance poses a major challenge in sarcasm detection. To handle the class imbalance problem in sarcasm detection, this study proposes six ensemble oversampling methods (SEO) that exploit the advantages of different oversampling algorithms. By applying the concept of ensemble learning to oversampling techniques, the proposed methods (random, center, uncentered, cluster random, cluster center, and cluster uncentered) offer distinct approaches to selecting among the newly generated sarcastic data. SMOTE, ADASYN, polynom-fit-SMOTE, ProWSyn, and SMOTE-IPF are adopted as the oversampling algorithms in the experiments. Two class-imbalanced sarcasm detection datasets collected from Twitter and Reddit, namely iSarcasmEval and SARC-reduced, are used. After features are extracted from the text with Word2Vec, GloVe, and FastText, oversampling and ensembling are applied. The performance of SEO is evaluated on the classification results of five classifiers: Support Vector Machine, Decision Tree, Random Forest, Extreme Gradient Boosting, and Logistic Regression. The results show that, on iSarcasmEval, SEO outperforms a single oversampling algorithm by 7% in AUC and 2% in F1-score; on SARC-reduced, the improvements are 1.5% in AUC and 1% in F1-score.
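    The idea of pooling synthetic minority samples from several oversamplers and then selecting among them can be sketched roughly as follows. This is only an illustration under stated assumptions, not the thesis's implementation: `interpolate_oversampler` is a hypothetical stand-in for algorithms such as SMOTE or ADASYN, `ensemble_oversample` is a hypothetical name, and the "center" rule shown here (keep the candidates closest to the minority centroid) is one plausible reading of the method name, since the abstract does not define it.

```python
import math
import random


def interpolate_oversampler(minority, n_new, rng, max_gap=1.0):
    """Hypothetical SMOTE-style stand-in: interpolate between two random minority points."""
    synthetic = []
    for _ in range(n_new):
        a, b = rng.sample(minority, 2)
        gap = rng.uniform(0.0, max_gap)
        synthetic.append([x + gap * (y - x) for x, y in zip(a, b)])
    return synthetic


def ensemble_oversample(minority, n_majority, oversamplers, strategy="random", seed=0):
    """Pool candidates from every oversampler, then keep just enough to balance the classes.

    The selection rules below are assumptions made for illustration only.
    """
    rng = random.Random(seed)
    n_needed = n_majority - len(minority)
    if n_needed <= 0:
        return list(minority)

    # Each oversampler proposes a full set of candidates; the pool is their union.
    pool = []
    for oversample in oversamplers:
        pool.extend(oversample(minority, n_needed, rng))

    if strategy == "random":
        # Assumed reading of "random": draw uniformly from the pooled candidates.
        chosen = rng.sample(pool, n_needed)
    elif strategy == "center":
        # Assumed reading of "center": prefer candidates nearest the minority centroid.
        centroid = [sum(col) / len(minority) for col in zip(*minority)]
        chosen = sorted(pool, key=lambda p: math.dist(p, centroid))[:n_needed]
    else:
        raise ValueError(f"unsupported strategy: {strategy}")

    return list(minority) + chosen
```

    In practice the candidate generators would be real oversamplers applied to embedding vectors (for example Word2Vec, GloVe, or FastText features), and the balanced minority set would then be combined with the majority class before a classifier is trained.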
    Appears in collections: [Graduate Institute of Information Management] Theses and Dissertations

    Files in this item:

    File        Description  Size  Format  Views
    index.html               0Kb   HTML    15


    All items in NCUIR are protected by their original copyright.
