基於 BERT 與 TF-IDF 特徵之假消息辨識模型—以繁體中文為例;Traditional Chinese Fake news Detection based on Bert Model Combined with TF-IDF (Term Frequency – Inverse Document Frequency)

Please use this permanent URL to cite or link to this item: http://ir.lib.ncu.edu.tw/handle/987654321/89808

Title:	基於 BERT 與 TF-IDF 特徵之假消息辨識模型—以繁體中文為例;Traditional Chinese Fake news Detection based on Bert Model Combined with TF-IDF (Term Frequency – Inverse Document Frequency)
Author:	林家琪;Lin, Chia-Chi
Contributor:	Department of Information Management
Keywords:	BERT;TF-IDF;Deep Learning;Fake News Detection
Date:	2022-07-18
Upload time:	2022-10-04 12:00:36 (UTC+8)
Publisher:	National Central University
Abstract:	Fake news is an increasingly serious problem in Taiwan and worldwide. Its effects are wide-ranging: in politics it can change election results, in public health it can cause panic, and in wartime it is used as a tool to confuse the audience's judgment. To counter it, many countries have enacted laws, and social media platforms have proposed their own countermeasures. However, fact-checking is labor-intensive and slow, and the volume of data is large. In recent years, researchers have therefore applied deep learning to fake news detection to cope with the data volume and reduce labor costs. Traditional Chinese has a large user base that faces the same problem, yet research on fake news detection in Traditional Chinese remains scarce. To meet the needs of Traditional Chinese users, this study proposes a deep learning model, BERT (Bidirectional Encoder Representations from Transformers)-TFIDF (Term Frequency – Inverse Document Frequency), for fake news detection and evaluates its performance on a Traditional Chinese fake news dataset. The dataset draws on messages from the fact-checking platform "Cofacts", 3 other private organizations, and 5 government agencies. After data processing, BERT-TFIDF is compared with other models and is also applied to the English dataset "Liar, liar pants on fire". The experimental results show that the proposed BERT-TFIDF model correctly identifies 90% of Traditional Chinese fake news and also improves Recall and F-measure on the English dataset. The study provides a model that predicts fake news well and validates prediction that combines TF-IDF semantic features with deep learning.
Appears in Collections:	[Graduate Institute of Information Management] Theses and Dissertations

Files in This Item:

File	Description	Size	Format	Views
index.html		0Kb	HTML	31

All items in NCUIR are protected by original copyright.
