

    Please use this permanent URL to cite or link to this item: http://ir.lib.ncu.edu.tw/handle/987654321/88349


    Title: 管道式語言轉譯器之中文健康照護開放資訊擷取;Pipelined Language Transformers for Chinese Healthcare Open Information Extraction
    Author: 鄭少鈞;Cheng, Shao-Chun
    Contributor: Department of Electrical Engineering
    Keywords: Transformers;Open Information Extraction;Knowledge Graph;Health Informatics
    Date: 2022-01-10
    Upload time: 2022-07-14 00:03:01 (UTC+8)
    Publisher: National Central University
    Abstract: Open Information Extraction (OIE) aims to extract triples of the form (Argument-1, Relation, Argument-2) from unstructured natural language text. For example, given the sentence “神經醯胺能夠修復皮脂膜及減緩乾燥” (“Ceramide can repair the sebum and relieve the dryness”), an OIE system may extract the triples (Ceramide, repair, sebum) and (Ceramide, relieve, dryness). The extracted triples can be visualized as part of a knowledge graph, which can serve as a basis for knowledge inference in question answering systems. In this study, we propose a pipelined language transformers model called CHOIE (Chinese Healthcare Open Information Extraction), which focuses on information extraction in the Chinese healthcare domain. CHOIE uses the pre-trained RoBERTa language model as its base architecture, combines it with different neural networks for feature extraction, and adds a classifier on top. We treat Chinese open information extraction as a two-phase task: first, we extract all the relations in a given sentence; then, for each relation, we find its two arguments to complete the triple. Because no manually annotated Chinese dataset is publicly available, we constructed a Chinese OIE dataset in the healthcare domain. We first crawled articles from websites that provide healthcare information. After pre-processing, we split the remaining text into sentences and randomly selected a subset of them for manual annotation. The triples in the resulting dataset fall into four categories: simple relations, single overlaps, multiple overlaps, and complicated relations. Based on the experimental results and error analysis, the proposed CHOIE model achieved the best performance on three evaluation metrics, Exact Match (F1: 0.848), Contain Match (F1: 0.913), and Token Level Match (F1: 0.925), outperforming the existing Multi2OIE, SpanOIE, and RNNOIE models.
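The two-phase flow described in the abstract (relations first, then the arguments around each relation) can be sketched as follows. This is a minimal illustration of the control flow only, not the thesis implementation: the names `Triple`, `extract_triples`, `toy_relations`, and `toy_arguments` are hypothetical, and the two toy functions are hard-coded stand-ins for the trained RoBERTa-based models, covering only the example sentence from the abstract.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass(frozen=True)
class Triple:
    """One extracted (Argument-1, Relation, Argument-2) triple."""
    arg1: str
    relation: str
    arg2: str


# Phase 1: sentence -> relation phrases found in it.
RelationExtractor = Callable[[str], List[str]]
# Phase 2: (sentence, relation) -> the two arguments of that relation.
ArgumentExtractor = Callable[[str, str], Tuple[str, str]]


def extract_triples(
    sentence: str,
    find_relations: RelationExtractor,
    find_arguments: ArgumentExtractor,
) -> List[Triple]:
    """Run the two phases in sequence: relations first, then arguments per relation."""
    triples = []
    for relation in find_relations(sentence):
        arg1, arg2 = find_arguments(sentence, relation)
        triples.append(Triple(arg1, relation, arg2))
    return triples


# Hypothetical stand-ins for the trained models (assumption: hard-coded
# lookups that only cover the example sentence from the abstract).
def toy_relations(sentence: str) -> List[str]:
    return [r for r in ("修復", "減緩") if r in sentence]


def toy_arguments(sentence: str, relation: str) -> Tuple[str, str]:
    table = {
        "修復": ("神經醯胺", "皮脂膜"),
        "減緩": ("神經醯胺", "乾燥"),
    }
    return table[relation]


SENTENCE = "神經醯胺能夠修復皮脂膜及減緩乾燥"
triples = extract_triples(SENTENCE, toy_relations, toy_arguments)
for t in triples:
    print((t.arg1, t.relation, t.arg2))
```

Passing the two phases in as callables keeps the pipeline independent of any particular model, so a real relation tagger and argument tagger could be swapped in without changing `extract_triples`.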
    Appears in collections: [Graduate Institute of Electrical Engineering] Master's and Doctoral Theses

    Files in this item:

    File        Description  Size  Format  Views
    index.html               0Kb   HTML    58


    All items in NCUIR are protected by their original copyright.
