A Study on Text Classification for Epidemic Prevention News (機器學習分類防疫新聞)

Name

Kuan-Lin Liu (劉冠麟)

Department

Department of Communication Engineering, In-Service Master's Program (通訊工程學系在職專班)

Thesis Title

A Study on Text Classification for Epidemic Prevention News
(機器學習分類防疫新聞)

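Related Theses

★ Simple Fingerprint Recognition Using a Binary Correlation Method
★ A Study of Filterbank OFDM Systems Using MMSE Equalizers
★ A Study of Kalman Filtering Applied to Adaptive Carrier Synchronization Systems
★ MIMO-OFDM System Design and Circuit Implementation for Wireless Local Area Networks
★ Design and Circuit Implementation of an IEEE 802.11a Receiver with Channel Tracking
★ OFDM Transmission System Design for Time-Varying Channels: Based on the IEEE 802.11a Standard
★ A Study of Carrier Frequency Offsets Between Antennas in MIMO-OFDM Systems and Transceiver Hardware Implementation
★ Channel Estimation Algorithm and Circuit Implementation for DVB-T Systems Using Scattered Pilot Signals
★ Design and Circuit Implementation of a Synchronization Module for Digital Terrestrial Video Broadcasting Systems
★ An Improved Time-Domain Channel Response Tracking Algorithm for Mobile OFDM Communication Systems
★ OFDM Channel Estimation Based on Adaptive Modeled Channel Parameter Estimation
★ A Method for Reducing Multiple Access Interference in OFDMA Systems Using Common Carrier Frequency Offset Compensation
★ Clipping Noise Cancellation for Data Signals in OFDM Systems
★ An Improved Decision-Feedback Kalman Filter Channel Estimator for OFDM Communication Systems
★ A Semi-Blind Channel Tracking Algorithm for OFDM Systems
★ A Common Carrier Frequency Offset Compensation Method for OFDMA Achieving Minimum Mean Square Error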

Files

Browse the thesis in the system (open access after 2025-7-15)

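Abstract (translated from Chinese)

In December 2019, a novel coronavirus was discovered in Wuhan, Hubei, in mainland China. In early 2020 it spread rapidly around the world and grew into a global pandemic, which many international organizations and news media have described as the most severe crisis the world has faced since the Second World War. As of May 2020, more than 220 countries and regions had reported a cumulative total of over 4.71 million confirmed cases and over 350,000 deaths.
This thesis is set against the background of the global COVID-19 pandemic, during which more than half of the daily news reports in Taiwan concern COVID-19 or epidemic-prevention knowledge. In this study we use decision tree, support vector machine, random forest, and naive Bayes classifiers to separate epidemic-prevention news from other news. With only two classes, noise is severe, which affects the accuracy of random forest and naive Bayes to some degree. In the experiments, the decision tree performed best (accuracy: 0.927).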

Abstract (English)

The COVID-19 pandemic, also known as the coronavirus pandemic, is an ongoing pandemic of coronavirus disease 2019 (COVID-19), caused by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2). The outbreak was first identified in Wuhan, China, in December 2019. The World Health Organization declared the outbreak a Public Health Emergency of International Concern on 30 January 2020, and a pandemic on 11 March 2020. As of May 2020, more than 4.71 million cases of COVID-19 had been reported in more than 188 countries and territories, resulting in more than 315,000 deaths. More than 1.73 million people had recovered from the virus. This thesis is set against the global COVID-19 pandemic. About half of the daily news reports in Taiwan are related to COVID-19 or epidemic prevention knowledge. The thesis studies different classification methods for COVID-19 epidemic prevention news. Based on practical news data collected from web pages, our experimental results show that the decision tree method achieves the best classification result, with an accuracy of 0.927.
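Naive Bayes is one of the classifiers the abstract compares. The sketch below is only an illustration of that technique on a two-class problem (epidemic-prevention news versus other news); it is not the thesis code, which according to the table of contents trained its classifiers in Weka. The function names, the toy documents, and the labels are all hypothetical, and the documents are assumed to be already segmented into tokens (English stand-ins are used here in place of segmented Chinese words).

```python
import math
from collections import Counter


def train_naive_bayes(docs, labels):
    """Fit a multinomial naive Bayes model on pre-tokenized documents."""
    classes = sorted(set(labels))
    vocab = {tok for doc in docs for tok in doc}
    doc_counts = Counter(labels)
    token_counts = {c: Counter() for c in classes}
    for doc, label in zip(docs, labels):
        token_counts[label].update(doc)

    model = {"classes": classes, "vocab": vocab, "log_prior": {}, "log_like": {}}
    for c in classes:
        # Class prior: share of training documents carrying this label.
        model["log_prior"][c] = math.log(doc_counts[c] / len(docs))
        total = sum(token_counts[c].values())
        # Laplace (add-one) smoothing keeps unseen class/token pairs non-zero.
        model["log_like"][c] = {
            tok: math.log((token_counts[c][tok] + 1) / (total + len(vocab)))
            for tok in vocab
        }
    return model


def predict(model, doc):
    """Return the class with the highest log-posterior score for one document."""
    scores = {}
    for c in model["classes"]:
        score = model["log_prior"][c]
        for tok in doc:
            if tok in model["vocab"]:  # tokens never seen in training are skipped
                score += model["log_like"][c][tok]
        scores[c] = score
    return max(scores, key=scores.get)


def accuracy(model, docs, labels):
    """Fraction of documents whose predicted label matches the true label."""
    correct = sum(predict(model, d) == y for d, y in zip(docs, labels))
    return correct / len(labels)


# Hypothetical toy data, invented for illustration; not taken from the thesis.
TRAIN_DOCS = [
    ["outbreak", "mask", "confirmed"],
    ["prevention", "quarantine", "confirmed"],
    ["stocks", "rally", "earnings"],
    ["match", "champion", "player"],
]
TRAIN_LABELS = ["epidemic", "epidemic", "other", "other"]

model = train_naive_bayes(TRAIN_DOCS, TRAIN_LABELS)
print(predict(model, ["mask", "confirmed"]))
print(accuracy(model, TRAIN_DOCS, TRAIN_LABELS))
```

Accuracy here is measured on the training documents only, which says nothing about generalization; a figure such as the 0.927 reported in the abstract would normally come from held-out data or cross-validation.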

Keywords (Chinese)

★ 機器學習 (machine learning)
★ 文本分類 (text classification)
★ 新聞分類 (news classification)

Keywords (English)

★ Machine Learning
★ Text Classification
★ News Classification

Table of Contents

Abstract (Chinese)
Abstract (English)
Acknowledgments
List of Figures
List of Tables
1. Introduction
1.1. Research Background
1.2. Literature Review
1.3. Chapter Organization
1.4. COVID-19
2. Background
2.1. Weka
2.2. Visual Studio Code
2.3. Python
2.4. SQLite
2.5. Jieba
2.6. CKIP Word Segmentation System
2.7. TF-IDF (Term Frequency-Inverse Document Frequency)
2.8. Word Embedding
2.9. Decision Tree
2.10. Random Forest
2.11. Naïve Bayes Classifier
2.12. Support Vector Machine (SVM)
3. Research Content and Methods
3.1. Research Framework
3.2. Web Crawler
3.3. Jieba Word Segmentation
3.4. CKIP Word Segmentation
3.5. Text Pre-processing
3.6. Stop Word Removal
3.7. Word Embedding
3.8. Decision Tree
3.9. Random Forest
3.10. Naïve Bayes Classifier
3.11. Support Vector Machine (SVM)
4. Experimental Results
4.1. Evaluation Method
4.2. Experimental Data
4.3. Decision Tree Classification Results
4.4. Random Forest Classification Results
4.5. Naïve Bayes Classification Results
4.6. Support Vector Machine (SVM) Classification Results
4.7. Comparison of Classification Results with CKIP and Jieba Word Segmentation
4.8. Experimental Conclusions
5. Conclusion
5.1. Summary
5.2. Future Work
6. References
7. Appendix
7.1. Training Classifiers in WEKA
7.2. Decision Tree Parameter Settings
7.3. Decision Tree ROC Area and PRC Area
7.4. Random Forest Parameter Settings
7.5. Random Forest ROC Curve and PR Curve
7.6. Naïve Bayes Parameter Settings
7.7. Naïve Bayes ROC Curve and PR Curve
7.8. Support Vector Machine (SVM) Parameter Settings
7.9. Support Vector Machine (SVM) ROC Curve and PR Curve

Advisor

Dah-Chung Chang (張大中)

Approval Date

2020-7-23
