Sarcasm Detection in PTT BasketballTW Discussion Board: Using Ensemble Deep Learning Approach

Name

Wei-Chou Chen (陳韋州)

Department

Department of Information Management

Thesis Title

Sarcasm Detection in PTT BasketballTW Discussion Board: Using Ensemble Deep Learning Approach
(使用集成式深度學習方法偵測PTT BasketballTW討論版中諷刺言論之研究)

Files

View the thesis in the system (open after 2029-7-1)

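Abstract (translated from Chinese)

With the growth of online social platforms, sarcastic text has come to play an important role in online communication, but its implicit nature and distinctive modes of expression make automatic sarcasm detection a continuing challenge in natural language processing. This study explores automatic sarcasm detection methods suited to Traditional Chinese, combining several advanced pre-trained language models under an ensemble learning strategy to improve recognition accuracy. To study the sarcastic expressions common in Taiwan's current online environment, the study collected data from the basketball board (BasketballTW) of the Taiwanese online forum PTT and developed a Traditional Chinese sarcasm dataset. During dataset construction, suitable annotators were selected and labeling consistency was evaluated to ensure data quality. The experimental results show that ensemble learning strategies can effectively improve classification performance for sarcasm detection in Traditional Chinese. In particular, combining the prediction probabilities of multiple pre-trained language models significantly improves model performance, whereas combining the language models' last-hidden-layer embeddings, and combining the prediction probabilities of multiple pre-trained language models with hand-crafted features, yield relatively limited gains.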
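The abstract says labeling consistency was evaluated but does not name the statistic here. As a minimal sketch, assuming two annotators and binary labels, Cohen's kappa is one common choice; the function name `cohens_kappa` and the sample labels below are hypothetical and are not data from the thesis.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators who labelled the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("both annotators must label the same non-empty set of items")
    n = len(labels_a)
    # Observed agreement: share of items on which the annotators agree.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: product of each annotator's label frequencies, summed over labels.
    count_a = Counter(labels_a)
    count_b = Counter(labels_b)
    expected = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)
    if expected == 1.0:
        # Both annotators used a single identical label; treat as full agreement.
        return 1.0
    return (observed - expected) / (1.0 - expected)


# Hypothetical labels (assumption): 1 = sarcastic, 0 = not sarcastic.
annotator_1 = [1, 0, 0, 1, 0, 1, 0, 0]
annotator_2 = [1, 0, 0, 0, 0, 1, 0, 1]
kappa = cohens_kappa(annotator_1, annotator_2)
print(f"kappa = {kappa:.3f}")
```

Kappa corrects raw agreement for the agreement expected by chance, which matters when one class (here, non-sarcastic posts) is likely to dominate.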

Abstract (English)

With the development of online social platforms, sarcastic text plays an increasingly important role in online communication. However, because of its implicit nature and distinctive modes of expression, automatically detecting sarcastic text remains a challenge in natural language processing. This study explores methods for automatically detecting sarcastic text in Traditional Chinese, combining several advanced pre-trained language models under ensemble learning strategies to improve detection accuracy. Data were collected from the basketball board (BasketballTW) of PTT, one of Taiwan's largest online forums, to build a dataset of sarcastic text in Traditional Chinese. During dataset construction, suitable annotators were selected and annotation consistency was evaluated to ensure data quality. Experimental results indicate that ensemble learning strategies significantly improve classification performance in detecting sarcastic text in Traditional Chinese, especially when the prediction probabilities of multiple pre-trained language models are combined. By contrast, combining the last-hidden-layer embeddings of the language models, and combining hand-crafted features with the prediction probabilities of multiple pre-trained language models, yield relatively limited improvements.
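The abstract reports that combining the prediction probabilities of several pre-trained language models worked best, without stating the combination rule on this page. The sketch below shows one simple possibility, soft voting (a weighted average of each model's probability of the sarcastic class); the thesis may instead train a meta-learner on those probabilities. The function `soft_vote`, the threshold, and all scores are illustrative assumptions, not results from the thesis.

```python
def soft_vote(prob_by_model, weights=None, threshold=0.5):
    """Average per-model P(sarcastic) for each text, then threshold the mean.

    prob_by_model maps a model name to a list of probabilities, one per text.
    """
    names = sorted(prob_by_model)
    if not names:
        raise ValueError("need at least one base model")
    n_items = len(prob_by_model[names[0]])
    if any(len(prob_by_model[m]) != n_items for m in names):
        raise ValueError("every model must score the same texts")
    if weights is None:
        # Equal weights give a plain average.
        weights = {m: 1.0 for m in names}
    total = sum(weights[m] for m in names)
    averaged = [
        sum(weights[m] * prob_by_model[m][i] for m in names) / total
        for i in range(n_items)
    ]
    labels = [1 if p >= threshold else 0 for p in averaged]
    return averaged, labels


# Hypothetical P(sarcastic) scores for three texts (assumption, not thesis data).
probs = {
    "bert": [0.9, 0.4, 0.2],
    "roberta": [0.7, 0.6, 0.1],
    "macbert": [0.8, 0.3, 0.3],
}
averaged, labels = soft_vote(probs)
print(averaged, labels)
```

Averaging probabilities rather than hard labels lets a confident base learner outweigh two uncertain ones, which is one reason probability-level ensembles can outperform majority voting.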

Keywords

★ ensemble learning
★ sarcasm detection
★ pre-trained language models
★ natural language processing

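Table of Contents

Abstract
Acknowledgements
Contents
List of Figures
List of Tables
Chapter 1 Introduction
1.1 Research Background
1.2 Research Motivation
1.3 Research Objectives
Chapter 2 Literature Review
2.1 Related Work on Sarcasm Detection
2.1.1 Definition of Sarcasm
2.1.2 Related Work on Building Sarcasm Datasets
2.1.3 Related Work on Sarcasm Detection Methods
2.2 Ensemble Learning
2.3 Pre-trained Language Models
Chapter 3 Methodology
3.1 Data Source
3.2 Data Annotation and Annotation Quality Assessment
3.3 Dataset Overview
3.4 Handling Class Imbalance
3.5 Data Preprocessing
3.6 Feature Extraction
3.7 Ensemble Sarcasm Detection Model
3.7.1 Base Learners
3.7.1.1 BERT
3.7.1.2 BERT-CNN
3.7.1.3 RoBERTa
3.7.1.4 MacBERT
3.7.1.5 XLNet
3.8 Evaluation Metrics
Chapter 4 Experimental Evaluation
4.1 Experimental Design
4.1.1 Baseline Methods
4.1.2 Ensemble Learning
4.1.2.1 Combining Last-Hidden-Layer States
4.1.2.2 Combining Base-Learner Probabilities
4.1.2.3 Ensemble Learning Combined with Hand-Crafted Features
4.2 Experimental Results
4.2.1 Baseline Methods
4.2.2 Combining Last-Hidden-Layer States
4.2.3 Combining Base-Learner Probabilities
4.2.4 Ensemble Learning Combined with Hand-Crafted Features
4.3 Discussion
Chapter 5 Conclusions and Recommendations
5.1 Conclusions
5.2 Research Limitations
5.3 Future Research Directions and Recommendations
References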

Advisor

周惠文

Approval Date

2024-7-26
