NCU Institutional Repository (theses and dissertations, past exam papers, journal articles, and research projects available for download): Item 987654321/90047
RC Version 7.0 © Powered By DSPACE, MIT. Enhanced by NTU Library IR team.


    Please use this permanent URL to cite or link to this item: http://ir.lib.ncu.edu.tw/handle/987654321/90047


    Title: Leveraging Dual Gloss Encoders in Chinese Healthcare Entity Linking (利用雙重註釋編碼器於中文健康照護實體連結)
    Author: Hung, Man-Chen (洪滿珍)
    Contributor: Department of Electrical Engineering
    Keywords: entity linking; word sense disambiguation; language transformers; natural language understanding; health informatics
    Date: 2022-08-25
    Upload time: 2022-10-04 12:09:01 (UTC+8)
    Publisher: National Central University
    Abstract: Word sense disambiguation is an important and difficult task in natural language
    understanding, especially for words in the healthcare domain that carry several meanings.
    We propose a Dual Gloss Encoder (DGE) model, based on the BERT transformer, that links
    healthcare named entities in Chinese sentences to the multilingual lexical-semantic network
    BabelNet for context-aware semantic understanding. The target word is encoded together with
    its context in the original sentence to obtain a contextualized target word embedding. Each
    candidate gloss of the target word is taken from BabelNet and encoded into a gloss embedding.
    The target word embedding is paired with each gloss embedding to compute a disambiguation
    score, and the gloss with the highest score is returned as the predicted gloss for the target
    word in the given sentence. Because Chinese entity linking data is lacking in the healthcare
    domain, we collected suitable domain-specific words and manually annotated their glosses in
    sentences. The resulting dataset contains 10,218 sentences covering 40 distinct target words
    and 94 distinct semantic glosses. We divided it into three mutually exclusive sets: a training
    set (7,109 sentences), a development set (979 sentences), and a test set (2,130 sentences).
    Experimental results indicate that the proposed DGE model outperforms three entity linking
    models, namely BERTWSD, GlossBERT, and BEM, achieving the best F1-score of 97.81%.
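
The scoring step described in the abstract (pair the target word embedding with each gloss embedding, then return the highest-scoring gloss) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the abstract does not state which scoring function the DGE model uses, so a plain dot product stands in for it, and the function names, gloss identifiers, and toy vectors are hypothetical. In the thesis the embeddings come from BERT-based encoders, not hand-written lists.

```python
from typing import Dict, List, Tuple


def dot(u: List[float], v: List[float]) -> float:
    """Dot product of two equal-length vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    return sum(a * b for a, b in zip(u, v))


def pick_gloss(
    target_vec: List[float],
    gloss_vecs: Dict[str, List[float]],
) -> Tuple[str, Dict[str, float]]:
    """Score every candidate gloss against the target word embedding
    and return the best gloss id together with all scores.

    The dot product is an assumed stand-in for the model's scorer.
    """
    if not gloss_vecs:
        raise ValueError("at least one candidate gloss is required")
    scores = {gid: dot(target_vec, vec) for gid, vec in gloss_vecs.items()}
    best = max(scores, key=scores.get)
    return best, scores


# Toy, made-up vectors; real embeddings would come from the two encoders.
target = [0.9, 0.1, 0.3]              # contextualized target word embedding
glosses = {
    "gloss_a": [1.0, 0.0, 0.2],       # hypothetical gloss embedding
    "gloss_b": [0.1, 0.9, 0.0],       # hypothetical gloss embedding
}

best, scores = pick_gloss(target, glosses)
print(best, scores)
```

With these toy vectors the first gloss scores higher, so it is the one selected. Keeping the scorer as a separate function makes it easy to swap in another similarity measure, such as cosine similarity, without touching the selection logic.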
    Appears in collections: [Graduate Institute of Electrical Engineering] Theses and Dissertations

    Files in this item:

    File          Description   Size   Format   Views
    index.html                  0Kb    HTML     29


    All items in NCUIR are protected by original copyright.


    Copyright National Central University Library.