Master's/Doctoral Thesis 109521068: Complete Metadata Record

DC Field | Value | Language
dc.contributor | 電機工程學系 | zh_TW
dc.creator | 洪滿珍 | zh_TW
dc.creator | Man-Chen Hung | en_US
dc.date.accessioned | 2022-08-25T07:39:07Z |
dc.date.available | 2022-08-25T07:39:07Z |
dc.date.issued | 2022 |
dc.identifier.uri | http://ir.lib.ncu.edu.tw:88/thesis/view_etd.asp?URN=109521068 |
dc.contributor.department | 電機工程學系 | zh_TW
dc.description | 國立中央大學 | zh_TW
dc.description | National Central University | en_US
dc.description.abstract | 詞義消歧是自然語言理解的一項重要且艱難的任務,尤其是對於醫療領域經常有多種語義含意的詞彙。我們提出一個雙重註釋編碼器 (Dual Gloss Encoders, DGE) 模型,以 BERT 轉譯器為基礎,將中文句子中健康照護領域的命名實體,連結到多國語言詞彙語義網路 BabelNet,以實現上下文感知語義理解。消歧目標詞的每個註釋都源自 BabelNet,我們將原句嵌入註釋得到語境化的目標詞嵌入向量,目標詞嵌入向量再和每個註釋嵌入配對,以計算語義消歧的分數,將分數最高者選為語句中消歧目標詞的註釋選項。由於在健康照護領域缺乏中文實體連結數據,我們收集了適當的領域單詞並在句子中手動標記它們的註釋。最後,我們總共有 10,218 個句子,包含 40 個不同的消歧目標詞和 94 個不同的語義註釋。我們將建構的數據劃分為訓練集 7,109 筆、發展集 979 筆與測試集 2,130 筆。實驗結果表明,我們提出的 DGE 模型的性能優於三個實體連結模型,即 BERTWSD、GlossBERT 與 BEM,獲得了 F1-Score 97.81%。 | zh_TW
dc.description.abstract | Word sense disambiguation is an important and difficult task for natural language understanding, especially for lexical words with many semantic meanings in the healthcare domain. We propose a BERT-transformer-based Dual Gloss Encoder (DGE) model to link Chinese healthcare entities to the multilingual lexical network BabelNet for context-aware semantic understanding. The target word, along with its context in the original sentence, is encoded to obtain a contextualized embedding vector. Each gloss of the target word originates from BabelNet and is encoded into a gloss embedding. The target word embedding is paired with each gloss embedding to calculate scores for sense disambiguation. The gloss with the highest score is returned as the predicted gloss for the target word in a given sentence. Due to a lack of Chinese entity-linking data in the healthcare domain, we collected domain-specific words and manually annotated their glosses in sentences. In total, we have 10,218 sentences containing 40 distinct target words with 94 different semantic glosses. The constructed data was divided into three mutually exclusive datasets: a training set (7,109 sentences), a development set (979 sentences), and a test set (2,130 sentences). Experimental results indicate that our proposed DGE model performs better than three entity-linking models, i.e., BERTWSD, GlossBERT, and BEM, obtaining the best F1-score of 97.81%. | en_US
dc.subject | 實體連結 | zh_TW
dc.subject | 詞義消歧 | zh_TW
dc.subject | 語言轉譯器 | zh_TW
dc.subject | 自然語言理解 | zh_TW
dc.subject | 健康資訊學 | zh_TW
dc.subject | entity linking | en_US
dc.subject | word sense disambiguation | en_US
dc.subject | language transformers | en_US
dc.subject | natural language understanding | en_US
dc.subject | health informatics | en_US
dc.title | 利用雙重註釋編碼器於中文健康照護實體連結 | zh_TW
dc.language.iso | zh-TW | zh-TW
dc.title | Leveraging Dual Gloss Encoders in Chinese Healthcare Entity Linking | en_US
dc.type | 博碩士論文 | zh_TW
dc.type | thesis | en_US
dc.publisher | National Central University | en_US
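The abstract describes the DGE selection step: encode the target word in its sentence context, encode each candidate BabelNet gloss separately, score every (target, gloss) pair, and return the highest-scoring gloss. The following is a minimal sketch of that selection logic only; the toy hash-seeded encoders stand in for the thesis's two BERT encoders, and all function names and vectors here are illustrative, not taken from the thesis.

```python
import zlib
import numpy as np

DIM = 8  # toy embedding size; the actual model uses BERT hidden states


def _seed(text: str) -> int:
    # Deterministic seed so the toy encoders are repeatable across runs.
    return zlib.crc32(text.encode("utf-8"))


def encode_context(sentence: str, target: str) -> np.ndarray:
    # Stand-in for the context encoder (a BERT transformer in the thesis):
    # embeds the target word as it appears in the given sentence.
    rng = np.random.default_rng(_seed(sentence + "|" + target))
    return rng.standard_normal(DIM)


def encode_gloss(gloss: str) -> np.ndarray:
    # Stand-in for the gloss encoder over BabelNet gloss text.
    rng = np.random.default_rng(_seed(gloss))
    return rng.standard_normal(DIM)


def disambiguate(sentence: str, target: str, glosses: list[str]) -> str:
    # Pair the contextual target embedding with each gloss embedding,
    # score each pair (a dot product here), and pick the best gloss.
    t = encode_context(sentence, target)
    scores = [float(t @ encode_gloss(g)) for g in glosses]
    return glosses[int(np.argmax(scores))]
```

For example, `disambiguate("病人的血壓偏高", "血壓", [...])` returns one of the candidate gloss strings. In the trained model, the pairing score would come from BERT-derived embeddings fine-tuned on the 10,218 annotated sentences rather than from random vectors.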
