Evaluating the Applicability of Sentiment Analysis Methods to COVID-19 Epidemic Prediction

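Name

Shao-Hang Huang (黃紹航)

Department

Department of Information Management

Thesis Title

Evaluating the Applicability of Sentiment Analysis Methods to COVID-19 Epidemic Prediction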

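Related Theses

★ An Agent-Based Mobile Anonymous Auction and Payment Mechanism
★ Security Analysis of Remote Connections to Network Cameras
★ A Hybrid Cell Handover Mechanism in HSDPA Environments
★ A Tree-Based IP Address Allocation Mechanism for Mobile Ad Hoc Networks
★ Target Region Detection in Planar Environments Using Mobile Sensor Network Technology
★ A P2P File-Sharing Mechanism on Bluetooth Scatternets
★ A Dynamic Rerouting Mechanism for Traffic Congestion Avoidance
★ Using UWB to Improve File-Sharing Performance on MANETs
★ Effects of a Collaborative Learning Platform on Groupthink and Learning Outcomes: The Case of English Vocabulary Learning
★ An RFID-Based Indoor Positioning Mechanism: A Heuristic Using Virtual Tags
★ A Mobile Product Price-Comparison System for In-Store Shopping Using Image Recognition
★ Security of Online Credit Card Payments
★ DEAP: An Efficient Dynamic Authentication Protocol for Mobile RFID Systems
★ A Comparative Analysis of Combinations of Preprocessing Methods and Classifiers in Bankruptcy Prediction and Credit Scoring
★ One-Class Classification on Imbalanced Datasets with Missing-Value Imputation and Instance Selection
★ A Study of the Applicability of Normalization and Feature Selection in the Bankruptcy Domain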

Files

Full text viewable in the system (open after 2025-8-1)

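Abstract (Chinese)

COVID-19 continues to threaten public health in countries around the world. Effectively predicting whether the numbers of confirmed cases and deaths will rise or fall can help researchers and policymakers steer the course of the pandemic in the right direction and so reduce mortality and infection rates. Existing COVID-19 prediction studies all rely on structured data, and no study has yet used unstructured data. In other fields, however, the rapid growth of social media has led many researchers to use social media text for prediction. This study therefore uses COVID-19-related social media text to predict epidemic trends.
This study applies different sentiment analysis methods to social media text to generate daily sentiment scores, then combines these scores with structured data to predict the epidemic, with the change in confirmed cases and the change in deaths as the prediction targets. It compares several sentiment analysis methods (a lexicon-based method, a sentiment analysis package, and static and dynamic word embedding methods) with three classifiers (SVM, LSTM, and Bi-GRU) to identify the combination that predicts epidemic trends most effectively. The experiments show that the dynamic word embedding method RoBERTa paired with Bi-GRU gives the best trend prediction; for confirmed cases, its precision reaches as high as 75.89%.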
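As a rough illustration of the pipeline the abstract describes, the sketch below turns posts into daily sentiment scores with a tiny lexicon and labels each day by whether the case count rises after a lag. The lexicon entries, the `lag` and `ratio` parameters, and all function names are assumptions made for illustration; the thesis's actual dictionary, labeling formula, and lag setting are not given in this record.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical toy lexicon; not the dictionary used in the thesis.
LEXICON = {"good": 1.0, "recover": 1.0, "safe": 0.5,
           "death": -1.0, "fear": -1.0, "sick": -0.5}


def post_score(text):
    """Average lexicon polarity of the words in one post (0.0 if none match)."""
    hits = [LEXICON[w] for w in text.lower().split() if w in LEXICON]
    return mean(hits) if hits else 0.0


def daily_sentiment(posts):
    """Map each date to the mean score of that day's posts.

    `posts` is an iterable of (date_string, text) pairs.
    """
    by_day = defaultdict(list)
    for date, text in posts:
        by_day[date].append(post_score(text))
    return {day: mean(scores) for day, scores in sorted(by_day.items())}


def trend_labels(counts, lag=1, ratio=1.0):
    """Label day t as 1 (rise) if counts[t + lag] > ratio * counts[t], else 0.

    `lag` and `ratio` are illustrative stand-ins for a lag day and a
    change ratio; their real values are not stated in this record.
    """
    return [1 if counts[t + lag] > ratio * counts[t] else 0
            for t in range(len(counts) - lag)]
```

Each daily score can then be placed alongside that day's structured features (for example, the case count) to form one input row for a classifier, with the corresponding trend label as the target.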

Abstract (English)

COVID-19 continues to threaten public health in countries around the world. An efficient way to predict the trend of the COVID-19 epidemic will help researchers and policymakers make the right decisions to reduce the mortality rate and the confirmed case rate. At present, all research on COVID-19 epidemic prediction is based on structured data. However, with the development of social media, using social media text for prediction has become common in various fields. Therefore, this research uses different sentiment analysis methods to generate daily sentiment scores from social media text and combines them with structured data for epidemic prediction.
This research selects different sentiment analysis methods (a dictionary method, a sentiment analysis package, and static and dynamic word embedding methods) and uses three classifiers (SVM, LSTM, and Bi-GRU) for epidemic prediction. We found that the dynamic word embedding method RoBERTa combined with the Bi-GRU classifier gives the best prediction of the COVID-19 epidemic trend. In predicting the number of confirmed cases, precision reaches up to 75.89%.
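The abstract reports precision as its evaluation metric. A minimal sketch of how precision is computed for a binary rise/fall label follows; treating "rise" (1) as the positive class is an assumption, and the function name is hypothetical.

```python
def precision(y_true, y_pred, positive=1):
    """Precision = TP / (TP + FP) for the chosen positive class.

    Returns 0.0 when the model never predicts the positive class.
    """
    tp = sum(1 for t, p in zip(y_true, y_pred) if p == positive and t == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if p == positive and t != positive)
    return tp / (tp + fp) if (tp + fp) else 0.0
```

Precision here answers the question "of the days predicted to rise, what fraction actually rose?", so it says nothing about rises the model missed.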

Keywords (Chinese)

★ Sentiment analysis
★ Word embedding
★ Epidemic prediction
★ Machine learning
★ Deep learning

Keywords (English)

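Table of Contents

Abstract (Chinese)
Abstract (English)
Acknowledgments
Table of Contents
List of Figures
List of Tables
1. Introduction
1-1 Research Background
1-2 Research Motivation
1-3 Research Objectives
2. Literature Review
2-1 Research on COVID-19 Sentiment Analysis Tasks on Social Media
2-2 Survey of Word Embedding Methods
2-2-1 Word2Vec
2-2-2 GloVe
2-2-3 BERT
2-2-4 GRUBERT
2-2-5 RoBERTa
2-3 Classifier Models for Predicting COVID-19 Confirmed Cases and Deaths
2-3-1 SVM
2-3-2 RNN
2-3-3 LSTM
2-3-4 Bi-GRU
3. Research Method
3-1 Data Collection
3-2 Data Preprocessing
3-2-1 Unstructured Data Preprocessing
3-2-2 Structured Data Preprocessing
3-3 Predictive Performance of Word Embedding Methods on the Sentiment Analysis Task
3-4 Labeling Formula and Lag Day
3-5 Evaluation Metrics
3-6 Applicability of Different Sentiment Analysis Methods and Classifiers to Epidemic Prediction
3-6-1 Day Forward-Chaining
3-6-2 Lexicon-Based Method
3-6-3 Sentiment Analysis Package VADER
3-6-4 Word Embedding Methods
3-6-5 Classifiers
3-7 Effect of Combining Text with Structured Data on Epidemic Trend Prediction
4. Experimental Results and Analysis
4-1 Performance of Word Embedding Methods on the Sentiment140 Dataset
4-2 Effect of Different Sentiment Analysis Methods and Classifiers on Epidemic Prediction
4-2-1 Finding the Best Count-Change Ratio and Lag Day
4-2-2 Comparing Sentiment Analysis Methods under Different Classifiers
4-2-2 Summary
4-3 Effect of Data from Different Countries on Epidemic Prediction
4-4 Effect of Data Collected with Different Keywords on Epidemic Prediction
4-4-1 Effect of Keyword-Specific Datasets on Accuracy
4-4-2 Effect of Merging Datasets on Trend Prediction Accuracy
4-5 Effect of Combining the Two Data Types on Epidemic Prediction
5. Conclusion
5-1 Conclusions and Contributions
5-2 Research Limitations
5-3 Future Research and Suggestions
References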

Advisor

蘇坤良

Approval Date

2022-9-7
