NCU Institutional Repository (中大機構典藏): Item 987654321/84225
    Please use this identifier to cite or link to this item: http://ir.lib.ncu.edu.tw/handle/987654321/84225


    Title: Multiple Embeddings Enhanced Gated Graph Sequence Neural Networks for Chinese Healthcare Named Entity Recognition (多重嵌入增強式門控圖序列神經網路之中文健康照護命名實體辨識)
    Authors: Lu, Yi (盧毅)
    Contributors: Department of Electrical Engineering (電機工程學系)
    Keywords: embedding representation; graph neural networks; named entity recognition; information extraction; health informatics
    Date: 2020-08-20
    Issue Date: 2020-09-02 18:31:09 (UTC+8)
    Publisher: National Central University (國立中央大學)
    Abstract: Named entity recognition (NER) locates mentions of named entities in unstructured text and classifies their types; the entities are usually proper nouns such as persons, places, organizations, dates, and times. NER results serve as the basis for relation extraction, event detection and tracking, knowledge graph construction, and question answering systems. Machine learning approaches usually treat NER as a sequence labeling problem: a labeling model is learned from a large-scale corpus and assigns a label to each character position in a sentence. We propose ME-GGSNN (Multiple Embeddings Enhanced Gated Graph Sequence Neural Network), a model for Chinese healthcare NER. We build a character representation from multiple embeddings at different granularities, from the radical and character levels to the word level. An adapted gated graph sequence neural network then incorporates named entity information from known dictionaries, and a standard BiLSTM-CRF (bidirectional long short-term memory network with a conditional random field) labels the character sequence of each Chinese sentence to identify named entities and classify their types in the healthcare domain.
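    To make the sequence labeling framing concrete, the following minimal sketch converts entity spans into one tag per character. It assumes a BIO tagging scheme, which the abstract does not specify; the function name `spans_to_bio`, the example sentence, and the tag names `DRUG` and `SYMP` are hypothetical and chosen only for illustration.

```python
def spans_to_bio(sentence, spans):
    """Convert (start, end, entity_type) spans into one BIO tag per character.

    `end` is exclusive. The BIO scheme is an assumption made for this
    illustration; the abstract does not say which scheme the thesis uses.
    """
    tags = ["O"] * len(sentence)
    for start, end, entity_type in spans:
        tags[start] = f"B-{entity_type}"      # first character of the entity
        for i in range(start + 1, end):
            tags[i] = f"I-{entity_type}"      # remaining characters
    return tags


# Hypothetical example: "stomach ache after taking aspirin".
sentence = "服用阿斯匹靈後胃痛"
spans = [(2, 6, "DRUG"), (7, 9, "SYMP")]      # 阿斯匹靈, 胃痛
tags = spans_to_bio(sentence, spans)

for char, tag in zip(sentence, tags):
    print(char, tag)
```

    A tagger such as a BiLSTM-CRF is then trained to predict this tag sequence from the character sequence, so each character position receives exactly one label.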
    We first crawled articles from websites that provide healthcare information, online health-related news, and medical question-and-answer forums. We then randomly sampled sentences, to retain content diversity, for manual word segmentation and named entity annotation. The resulting corpus contains 30,692 sentences (about 1.5 million characters, or 917 thousand words) with 68,460 named entities across 10 entity types: body, symptom, instrument, examination, chemical, disease, drug, supplement, treatment, and time. In our experiments and error analysis, the proposed model achieved the best F1-score, 75.69%, outperforming the related models BiLSTM-CRF, BERT, Lattice, Gazetteers, and ME-CNER. In summary, ME-GGSNN is an effective and efficient method for the Chinese healthcare NER task.
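    For readers unfamiliar with how an F1-score is obtained for NER, the sketch below computes precision, recall, and F1 over predicted entities. It assumes exact-match, entity-level scoring pooled over all types, a common convention that the abstract does not confirm; the function name `entity_f1` and the sample spans are hypothetical.

```python
def entity_f1(gold, pred):
    """Return (precision, recall, f1) for exact-match entity scoring.

    Each entity is a (start, end, entity_type) tuple; a prediction counts
    as correct only if boundaries and type both match a gold entity.
    This scoring convention is an assumption, not taken from the thesis.
    """
    gold_set, pred_set = set(gold), set(pred)
    true_positives = len(gold_set & pred_set)
    precision = true_positives / len(pred_set) if pred_set else 0.0
    recall = true_positives / len(gold_set) if gold_set else 0.0
    if precision + recall == 0:
        return 0.0, 0.0, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical example: one of two predictions has the wrong type.
gold = [(2, 6, "DRUG"), (7, 9, "SYMP")]
pred = [(2, 6, "DRUG"), (7, 9, "DISE")]
precision, recall, f1 = entity_f1(gold, pred)
print(precision, recall, f1)
```

    Under this convention, a prediction with correct boundaries but the wrong entity type lowers both precision and recall.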
    Appears in Collections:[Graduate Institute of Electrical Engineering] Electronic Thesis & Dissertation

    Files in This Item:

    File         Description   Size   Format
    index.html                 0Kb    HTML


    All items in NCUIR are protected by copyright, with all rights reserved.
