NCU Institutional Repository (中大機構典藏): theses and dissertations, past exam papers, journal articles, and research projects available for download: Item 987654321/88349


    Please use this identifier to cite or link to this item: http://ir.lib.ncu.edu.tw/handle/987654321/88349


    Title: Pipelined Language Transformers for Chinese Healthcare Open Information Extraction (管道式語言轉譯器之中文健康照護開放資訊擷取)
    Authors: 鄭少鈞;Cheng, Shao-Chun
    Contributors: Department of Electrical Engineering
    Keywords: Transformers; Open Information Extraction; Knowledge Graph; Health Informatics
    Date: 2022-01-10
    Issue Date: 2022-07-14 00:03:01 (UTC+8)
    Publisher: National Central University (國立中央大學)
    Abstract: Open Information Extraction (OIE) aims to extract triples of the form (Argument-1, Relation, Argument-2) from unstructured natural language text. For example, given the sentence “Ceramide can repair the sebum and relieve the dryness”, an OIE system may extract the triples (Ceramide, repair, sebum) and (Ceramide, relieve, dryness). The extracted triples can be visualized as part of a knowledge graph, which can serve as a basis for knowledge inference in question answering systems. In this study, we propose a pipelined language transformers model called CHOIE (Chinese Healthcare Open Information Extraction), which focuses on information extraction in the Chinese healthcare domain. CHOIE uses the pre-trained RoBERTa language model as its base architecture, combines it with different neural networks for feature extraction, and adds a classifier on top. We treat Chinese open information extraction as a two-phase task: we first extract all the relations in a given sentence, and then find Argument-1 and Argument-2 around each relation to complete the triples. Because no manually annotated Chinese dataset is publicly available, we constructed a Chinese OIE dataset in the healthcare domain. We first crawled articles from websites that provide healthcare information. After pre-processing, we split the remaining text into sentences and randomly selected a subset of them for manual annotation. The triples in the resulting dataset fall into four categories: simple relations, single overlaps, multiple overlaps, and complicated relations. Based on the experimental results and error analysis, the proposed CHOIE model achieved the best performance on three evaluation metrics, Exact Match (F1: 0.848), Contain Match (F1: 0.913), and Token Level Match (F1: 0.925), outperforming the existing Multi2OIE, SpanOIE, and RNNOIE models.
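The two-phase flow described in the abstract (relations first, then the arguments for each relation) and the grouping of triples into a knowledge graph can be illustrated with a minimal Python sketch. This is not the CHOIE implementation: `Triple`, `extract_triples`, `to_graph`, and the two `toy_*` functions are hypothetical names introduced here, and the toy functions are simple word lookups standing in for the trained RoBERTa-based models. Only the example sentence and its two triples come from the abstract.

```python
from collections import defaultdict
from typing import NamedTuple


class Triple(NamedTuple):
    arg1: str
    relation: str
    arg2: str


def extract_triples(sentence, find_relations, find_arguments):
    """Two-phase extraction: find relations, then arguments per relation."""
    triples = []
    for relation in find_relations(sentence):                  # phase 1
        for arg1, arg2 in find_arguments(sentence, relation):  # phase 2
            triples.append(Triple(arg1, relation, arg2))
    return triples


def to_graph(triples):
    """Group triples by Argument-1 into an adjacency-list knowledge graph."""
    graph = defaultdict(list)
    for t in triples:
        graph[t.arg1].append((t.relation, t.arg2))
    return dict(graph)


# Hypothetical stand-ins for the two trained models (assumption: they only
# handle the abstract's English example sentence, by word position).
def toy_find_relations(sentence):
    words = sentence.split()
    return [w for w in ("repair", "relieve") if w in words]


def toy_find_arguments(sentence, relation):
    words = sentence.split()
    subject = words[0]
    obj = words[words.index(relation) + 2]  # skip the article "the"
    return [(subject, obj)]


SENTENCE = "Ceramide can repair the sebum and relieve the dryness"
triples = extract_triples(SENTENCE, toy_find_relations, toy_find_arguments)
graph = to_graph(triples)
print(triples)
print(graph)
```

Passing the two phases in as callables keeps the control flow separate from the models, so a real relation tagger and argument tagger could replace the toy lookups without changing `extract_triples`.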
    Appears in Collections: [Graduate Institute of Electrical Engineering] Electronic Thesis & Dissertation

    Files in This Item:

    File  Description  Size  Format
    index.html  0Kb  HTML  54  View/Open


    All items in NCUIR are protected by copyright, with all rights reserved.

