PTT災害事件擷取系統;PTT Disaster Events Extraction System

Please use this identifier to cite or link to this item: http://ir.lib.ncu.edu.tw/handle/987654321/74786

Title:	PTT災害事件擷取系統;PTT Disaster Events Extraction System
Authors:	蔣佳峰;Chiang, Chia-Feng
Contributors:	Department of Computer Science and Information Engineering
Keywords:	Named Entity Extraction (NER);Disaster Events;Information Extraction
Date:	2017-08-24
Issue Date:	2017-10-27 14:39:20 (UTC+8)
Publisher:	National Central University
Abstract:	Taiwan is frequently struck by natural disasters: typhoons in summer and earthquakes at unpredictable intervals both disrupt daily life severely. When a disaster reaches alert level, disaster relief units must grasp the situation quickly and dispatch resources to the affected areas. Reducing post-disaster losses also depends on residents of those areas actively reporting urgent damage information to the relief units. Such reports are usually made by telephone. The number of reports, however, tends to surge right after a disaster, and if too few staff are available to answer calls, this becomes a bottleneck in grasping the situation quickly. With the growth of Internet communication and the rising penetration of 3C products, exchanging information online has become more convenient, and disaster information may also circulate on social networks while a disaster is unfolding.
We therefore open another channel for obtaining disaster information: social media. This task involves information extraction, which extracts specific information from unstructured text and stores it in a database. In this thesis, we build a PTT disaster event extraction system that uses PTT (批踢踢實業坊) as its information source. A web crawler periodically fetches the articles that users post, and named entity recognition (NER) extracts the "disaster name", "disaster location", and "damage description" from them to produce disaster event reports. The thesis has three parts. The first part is article preprocessing: the crawler parses the HTML structure of the PTT web version and periodically fetches and stores large numbers of articles from the regional boards across Taiwan and from the Gossiping board. The second part is article classification: classification features are obtained automatically from the training data, an SVM classification model is built, and the model selects the disaster-related posts from the large volume of articles. The third part is named entity extraction: using NER_Tool, provided by the WIDM lab at National Central University, with conditional random fields (CRF) as the algorithm, we build three recognition models, one each for disaster name, disaster location, and damage description. In experiments against manually labeled test data, every model achieves an F-measure above 0.7 under Exact Match and above 0.75 under Partial Match.
Appears in Collections:	[Graduate Institute of Computer Science and Information Engineering] Electronic Thesis & Dissertation
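
The abstract reports F-measure under both Exact Match and Partial Match without defining either here. The sketch below shows one common way to compute the two scores for extracted entity spans. It is an illustration only: the span representation, the function name `f_measure`, the label names, and in particular the overlap-based definition of a partial match are assumptions, not the thesis's actual evaluation code.

```python
from typing import List, Tuple

# A span is (start, end_exclusive, label). This representation is an assumption
# made for illustration; the thesis does not specify one.
Span = Tuple[int, int, str]


def _overlaps(a: Span, b: Span) -> bool:
    """True when two spans share a label and at least one character position."""
    return a[2] == b[2] and a[0] < b[1] and b[0] < a[1]


def f_measure(gold: List[Span], pred: List[Span], partial: bool = False) -> float:
    """F-measure (harmonic mean of precision and recall) over entity spans.

    partial=False: a prediction counts only if boundaries and label are identical.
    partial=True:  a prediction counts if it overlaps a gold span with the same
                   label (assumed definition of "Partial Match").
    """
    if partial:
        matched_pred = sum(any(_overlaps(p, g) for g in gold) for p in pred)
        matched_gold = sum(any(_overlaps(g, p) for p in pred) for g in gold)
    else:
        gold_set, pred_set = set(gold), set(pred)
        matched_pred = sum(p in gold_set for p in pred)
        matched_gold = sum(g in pred_set for g in gold)

    precision = matched_pred / len(pred) if pred else 0.0
    recall = matched_gold / len(gold) if gold else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical annotations for one post: the location boundary is off by one
# character and the description is predicted in the wrong place.
gold = [(0, 2, "NAME"), (5, 9, "LOC"), (12, 18, "DESC")]
pred = [(0, 2, "NAME"), (5, 8, "LOC"), (20, 22, "DESC")]

exact_f = f_measure(gold, pred)                  # only NAME matches exactly
partial_f = f_measure(gold, pred, partial=True)  # NAME and LOC both count
```

Under these assumed definitions a Partial Match score can never be lower than the Exact Match score for the same predictions, which is consistent with the higher threshold reported for Partial Match.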

Files in This Item:

File	Description	Size	Format
index.html		0Kb	HTML	1022