Thesis/Dissertation 109423051: Detailed Record




Author Chia-Chi Lin (林家琪)   Department Department of Information Management
Thesis Title Traditional Chinese Fake News Detection Based on BERT Model Combined with TF-IDF (Term Frequency-Inverse Document Frequency)
(Original title: 基於 BERT 與 TF-IDF 特徵之假消息辨識模型—以繁體中文為例)
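Related Theses
★ A Data Mining Study of National Health Insurance Medical Resource Utilization among Taiwan's Elderly Population
★ Applying Geographic Information Systems and Data Mining to Site Selection Analysis for Primary Care Clinics: The Case of Taipei City
★ A Study of Building a Consultation Support System from the Physician's Perspective
★ Consumers' Intention to Use Customer Service Chatbots from an Innovation Resistance Perspective
★ The Effect of Online Auction Page Service Quality on Seller Business Performance
★ A Prediction Model for Online Repurchase Behavior across Multiple Product Categories
★ The Effect of Human-Computer Interfaces on Online Purchase Intention: A Uses and Gratifications Theory and Technology Acceptance Model Perspective
★ A Personalized Medical Institution Recommender System Integrating Online Word of Mouth: The Case of Dental Clinics
★ A Temporal Dynamics Analysis of the Effect of Online Word of Mouth on Smartphone Sales
★ Applying Data Mining Techniques to Build an Admissions Decision Support System
★ Evaluating the Effect of Clinical Decision Support Systems on Waiting Time and the Physician-Patient Relationship
★ Constructing an Admissions Decision Support System for Higher Education
★ Research Development of Big Data in Healthcare from a Social Network Analysis Perspective
★ The Effect of Human-Computer Interaction Design in Medical Apps on User Satisfaction
★ Managing Social Media Fan Pages: The Case of a Health Fan Page on Facebook
★ A Hybrid Recommender System Based on Online Word of Mouth and Health Care Utilization Theory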
Files Full text available for viewing in the system after 2027-7-6
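Abstract (Chinese) Fake news is an increasingly serious problem in Taiwan and internationally. Its effects are wide-ranging: in politics it can change election results, in public health it can cause panic, and in war it is used as a tool to confuse the audience's judgment. To counter fake news, various countries have enacted laws and social media platforms have proposed countermeasures. However, fact-checking is labor-intensive, the volume of data is large, and verification is slow. Recent studies have therefore applied deep learning to fake news detection in order to handle the large data volume and reduce labor costs.
Traditional Chinese has a large user base that faces the same fake news problem, yet research on fake news detection in Traditional Chinese remains scarce. To meet the needs of Traditional Chinese users, this study proposes a deep learning model, BERT (Bidirectional Encoder Representations from Transformers)-TFIDF (Term Frequency-Inverse Document Frequency), for fake news detection, and validates it on a Traditional Chinese fake news dataset to examine its performance. The dataset draws on messages from the fact-checking platform "Cofacts 真的假的", 3 other civil organizations, and 5 government agencies. After data processing, BERT-TFIDF is compared with other models and is also applied to the English dataset "Liar, liar pants on fire". The experimental results show that the proposed BERT-TFIDF model accurately identifies 90% of Traditional Chinese fake news, and that applying it to the English dataset also improves Recall and F-measure. This study provides a predictive model that identifies fake news well and validates prediction that combines TF-IDF semantic features with deep learning.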
Abstract (English) Fake news is an increasingly serious problem in Taiwan and around the world. It affects many aspects of daily life, including politics and public health: it can sway election results and cause disease-related panic among the public. Worse, fake news is used to confuse people's judgment in wartime. Many countries have enacted laws, and social network sites have proposed plans, to keep fake news from spreading. However, checking misinformation is labor-intensive and time-consuming because a huge amount of data must be verified. To cope with this volume, researchers in recent years have applied deep learning techniques to misinformation detection in an attempt to reduce manpower and costs.
Traditional Chinese users form a large group that also faces fake news, yet very few studies have addressed the detection of misinformation in Traditional Chinese. To meet the needs of these users, this paper introduces a deep learning model, BERT (Bidirectional Encoder Representations from Transformers)-TFIDF (Term Frequency-Inverse Document Frequency), to detect fake news, and evaluates its performance on a Traditional Chinese fake news dataset. The dataset is made up of information obtained from the fact-checking platform "Cofacts", 3 private organizations, and 5 government agencies. After data processing, BERT-TFIDF was compared with other models and then applied to the English dataset "Liar, liar pants on fire". According to the experimental results, BERT-TFIDF identified 90% of Traditional Chinese misinformation and improved Recall and F-measure on the English dataset. The results provide a predictive model with demonstrated ability to identify fake news and validate prediction that combines TF-IDF semantic features with deep learning techniques.
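The TF-IDF features named in the abstract can be illustrated with a minimal sketch. This record does not state how the thesis tokenizes text, which TF-IDF variant it uses, or how the features are joined with BERT, so everything below is an assumption for illustration: the function name `tfidf_vectors`, the per-character tokenization, the plain `log(N / df)` weighting, and the toy messages are all hypothetical.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return (vocabulary, vectors): one TF-IDF vector per tokenized document."""
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    vocab = sorted(df)
    # Plain IDF, log(N / df); a term present in every document gets weight 0.
    idf = {term: math.log(n_docs / df[term]) for term in vocab}
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        total = len(doc) or 1  # guard against empty documents
        # Term frequency (relative count) multiplied by the term's IDF.
        vectors.append([counts[t] / total * idf[t] for t in vocab])
    return vocab, vectors


# Hypothetical toy messages, split per character purely for illustration.
messages = ["喝水可以治病", "喝水有益健康", "選舉結果公布"]
docs = [list(m) for m in messages]
vocab, vectors = tfidf_vectors(docs)

# Assumption: in a BERT-TFIDF style model, a vector like vectors[i] could be
# concatenated with the BERT sentence embedding before the classifier layer.
print(len(vocab), len(vectors[0]))
```

Characters shared by several messages receive a lower weight than characters unique to one message, which is the property that makes TF-IDF useful as a complementary signal to a contextual embedding.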
Keywords (Chinese) ★ BERT
★ TF-IDF
★ Deep Learning (深度學習)
★ Fake News Detection (假消息辨識)
Keywords (English) ★ BERT
★ TF-IDF
★ Deep Learning
★ Fake News Detection
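Table of Contents Abstract (Chinese)
Abstract (English)
Acknowledgments
Contents
List of Figures
List of Tables
Chapter 1 Introduction
1.1 Research Background
1.2 Current State of Fake News in Taiwan
1.3 Research Motivation and Objectives
1.4 Thesis Structure
Chapter 2 Literature Review
2.1 Research Directions in Fake News Detection
2.2 Machine Learning and Deep Learning for Text-Based Fake News Detection
Chapter 3 Methodology
3.1 Research Process
3.2 Model Design
3.3 BERT-TFIDF Model
3.4 Experimental Procedure
Chapter 4 Experimental Results and Analysis
4.1 Experimental Environment
4.2 Experimental Data and Data Split
4.3 Evaluation and Validation
4.4 Parameter Settings
4.5 Experimental Results
Chapter 5 Conclusions and Recommendations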
Advisor 許文錦   Approval Date 2022-7-18

For questions about this thesis, contact the Extension Services Section of the National Central University Library, TEL: (03)422-7151 ext. 57407, or by e-mail.