Abstract: | Since the outbreak of COVID-19, many countries have imposed lockdowns to contain the spread of the epidemic. People stayed at home and cut back on unnecessary outings to reduce the risk of infection, so in-person social activities moved to online social media, which also became a source of epidemic-related information. Much of that information was unverified, yet the characteristics of social media allowed it to spread easily, and COVID-19 fake news proliferated across the major social platforms. Most existing research on detecting COVID-19 fake news on social media uses only the text of the posts themselves; few studies use the comments under a post or the content of its reposts as features. In addition, the corpora used to train COVID-19 fake news detection models come mainly from English-language platforms such as Twitter, and Chinese-language corpora are scarce. This study therefore uses text mining techniques to extract features from epidemic-related posts on Sina Weibo, a well-known Chinese social media platform, together with features from the comments under those posts and from their reposts, and builds COVID-19 fake news detection models with machine learning methods including a Bayesian classifier, logistic regression, random forest, and support vector machine. The experimental results show that models trained on the combined post, comment, and repost features achieve better detection accuracy. |
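The idea of combining post, comment, and repost features can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the study: the character-bigram tokenizer, the source-prefixing scheme, the hand-written multinomial naive Bayes classifier (standing in for the Bayesian classifier the abstract mentions), and the made-up toy samples. It is not the study's implementation or data.

```python
import math
from collections import Counter, defaultdict


def char_bigrams(text):
    """Split text into overlapping character bigrams (an assumed stand-in tokenizer)."""
    text = "".join(text.split())  # drop whitespace
    return [text[i:i + 2] for i in range(len(text) - 1)]


def extract_features(sample, sources=("post", "comments", "reposts")):
    """Collect bigrams from the chosen sources, prefixed by source name
    so that post, comment, and repost features stay distinct."""
    feats = []
    for src in sources:
        value = sample.get(src, "")
        texts = [value] if isinstance(value, str) else value
        for t in texts:
            feats.extend(f"{src}:{bg}" for bg in char_bigrams(t))
    return feats


class NaiveBayes:
    """Multinomial naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, feature_lists, labels):
        self.class_counts = Counter(labels)
        self.token_counts = defaultdict(Counter)
        self.vocab = set()
        for feats, y in zip(feature_lists, labels):
            self.token_counts[y].update(feats)
            self.vocab.update(feats)
        return self

    def predict(self, feats):
        total = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for y, n in self.class_counts.items():
            denom = sum(self.token_counts[y].values()) + len(self.vocab)
            score = math.log(n / total)  # log prior
            for f in feats:
                if f in self.vocab:  # ignore features never seen in training
                    score += math.log((self.token_counts[y][f] + 1) / denom)
            if score > best_score:
                best_label, best_score = y, score
        return best_label


# Hypothetical toy samples, invented purely for illustration.
train = [
    ({"post": "garlic water cures the virus",
      "comments": ["fake, no evidence"],
      "reposts": ["rumor debunked"]}, "fake"),
    ({"post": "health ministry reports new cases today",
      "comments": ["thanks for the update"],
      "reposts": ["official data"]}, "real"),
]

model = NaiveBayes().fit(
    [extract_features(sample) for sample, _ in train],
    [label for _, label in train],
)

query = {"post": "garlic cures virus", "comments": ["no evidence"], "reposts": ["rumor"]}
prediction = model.predict(extract_features(query))
```

Passing `sources=("post",)` to `extract_features` restricts the model to post text only, which is one way a post-only baseline could be set against the combined feature set.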