NCU Institutional Repository (theses and dissertations, past exam papers, journal articles, and research projects available for download): Item 987654321/74786


    Please use this identifier to cite or link to this item: http://ir.lib.ncu.edu.tw/handle/987654321/74786


    Title: PTT災害事件擷取系統;PTT Disaster Events Extraction System
    Authors: 蔣佳峰;Chiang, Chia-Feng
    Contributors: Department of Computer Science and Information Engineering
    Keywords: Named Entity Extraction; NER; Disaster Events; Information Extraction
    Date: 2017-08-24
    Issue Date: 2017-10-27 14:39:20 (UTC+8)
    Publisher: National Central University
    Abstract: Taiwan is frequently struck by natural disasters. Typhoons are common in summer and earthquakes occur at irregular intervals, and both severely affect people's lives. When a disaster reaches the alert level, disaster relief units must grasp the situation quickly and dispatch resources to the affected areas. Reducing post-disaster losses also depends on residents of the affected areas actively reporting urgent information to the relief units. These reports are generally made by telephone. Notably, the number of reports tends to surge right after a disaster; if the relief units lack the staff to answer the calls, this becomes a bottleneck in grasping the situation quickly.
    With the rapid growth of Internet communication and the rising penetration of 3C products (computers, communication devices, and consumer electronics), exchanging messages online has become more convenient, and disaster information may also circulate on social networks while a disaster is unfolding. We therefore open up another channel for obtaining disaster information: social media.
    This task involves information extraction, which extracts specific information from unstructured text and stores it in a database. In this thesis, we build a PTT disaster event extraction system that uses PTT (批踢踢實業坊, the PTT Bulletin Board System) as its information source. A web crawler periodically fetches the posts that users publish, and named entity recognition (NER) extracts disaster information such as "disaster name", "disaster location", and "damage description" to build disaster event reports.
    The thesis has three parts. The first is post preprocessing: a web crawler parses the HTML structure of the PTT web version and periodically fetches and stores a large number of posts from the regional boards across Taiwan and from the Gossiping board. The second is post classification: classification features are obtained automatically from the training data, an SVM classification model is built, and the model filters the large volume of posts down to valid disaster-related posts. The third is named entity extraction: using the NER_Tool provided by the WIDM lab at National Central University, with conditional random fields (CRF) as the algorithm, we build three recognition models for disaster name, disaster location, and damage description. In experiments on manually labeled test data, every model achieves an F-measure above 0.7 under exact match and above 0.75 under partial match.
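The abstract reports F-measure under exact match and partial match but does not define the matching rules. The following is a minimal sketch of one common convention, not the thesis's own evaluation code. It assumes that entities are character spans with a label, that an exact match requires identical boundaries and label, and that a partial match requires only an overlap with the same label. The `Span` type, the labels `NAME`, `LOC`, and `DESC`, and the function names are hypothetical.

```python
from typing import List, Tuple

# (start, end_exclusive, label); the layout is an assumption for illustration.
Span = Tuple[int, int, str]


def overlaps(a: Span, b: Span) -> bool:
    """True when two spans share a label and at least one character."""
    return a[2] == b[2] and a[0] < b[1] and b[0] < a[1]


def f_measure(gold: List[Span], pred: List[Span], partial: bool = False) -> float:
    """F-measure of predicted spans against gold spans.

    Exact mode counts a span only if boundaries and label are identical.
    Partial mode (assumed rule) counts any same-label overlap.
    """
    if partial:
        hit_pred = sum(1 for p in pred if any(overlaps(p, g) for g in gold))
        hit_gold = sum(1 for g in gold if any(overlaps(g, p) for p in pred))
    else:
        gold_set, pred_set = set(gold), set(pred)
        hit_pred = sum(1 for p in pred if p in gold_set)
        hit_gold = sum(1 for g in gold if g in pred_set)

    precision = hit_pred / len(pred) if pred else 0.0
    recall = hit_gold / len(gold) if gold else 0.0
    if precision + recall == 0.0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Toy data: one exact hit, one boundary error, one miss, one spurious span.
gold = [(0, 2, "NAME"), (5, 8, "LOC"), (10, 15, "DESC")]
pred = [(0, 2, "NAME"), (5, 7, "LOC"), (20, 22, "LOC")]

exact_f = f_measure(gold, pred)                  # 1 of 3 spans matches exactly
partial_f = f_measure(gold, pred, partial=True)  # the LOC boundary error also counts
```

On this toy data the partial score is higher than the exact score because the location span with a wrong right boundary is accepted, which is in line with the abstract reporting a higher threshold for partial match than for exact match.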
    Appears in Collections:[Graduate Institute of Computer Science and Information Engineering] Electronic Thesis & Dissertation



    All items in NCUIR are protected by copyright, with all rights reserved.
