Thesis 108521106: Complete Metadata Record

DC Field: Value (Language)
dc.contributor: 電機工程學系 (zh_TW)
dc.creator: 陳柏翰 (zh_TW)
dc.creator: Po-Han Chen (en_US)
dc.date.accessioned: 2021-10-27T07:39:07Z
dc.date.available: 2021-10-27T07:39:07Z
dc.date.issued: 2021
dc.identifier.uri: http://ir.lib.ncu.edu.tw:88/thesis/view_etd.asp?URN=108521106
dc.contributor.department: 電機工程學系 (zh_TW)
dc.description: 國立中央大學 (zh_TW)
dc.description: National Central University (en_US)
dc.description.abstract: 多分類文本分類旨在自動將輸入實例歸納至預先定義好的分類中,該方法可用於眾多應用情境,例如:情感分析、聊天機器人、問答系統、電商產品分類和過濾資料等。本研究的主要目標為歸納非結構化的中文醫療問題至正確的分類中,我們可以將分類資訊視為醫療知識特徵,有助於機器理解問題語意內涵,並可做為自動問答系統的基礎。近年來,在基於深度學習的方法中,最被廣泛使用的模型架構為轉譯器 (Transformers),這些模型有效地捕獲了廣域語意資訊與結構句法,在許多自然語言處理任務得到好的效能表現。因此,我們以兩階段領域知識強化機制為基礎,改善三種主流預訓練模型,並提出 EKG-Transformers (Encyclopedia enhanced pre-training with Knowledge Graph fine-tuning Transformers) 模型,用於中文醫療問題意圖分類。我們將醫學百科 (Encyclopedia) 蒐集的層級資料訓練於語言模型上,進一步將醫學領域的階層資訊,例如:疾病的症狀與檢測方式、治療方法的注意事項與副作用、藥物的用法與用量等,導入語言模型中;微調時加入建構的知識圖譜 (Knowledge Graph) 三元組,賦予關係網路給字序列中的命名實體,並將字序列轉化成句圖 (Sentence Graph),讓模型在遇到需要知識驅動的序列時,能給予更好的語言表徵及分類。本研究使用了醫療問題意圖分類資料集 (Chinese Medical Intent Dataset, CMID),該資料集歸納出 4 個分類:病症、藥物、治療和其他,與涵蓋於其下的 36 個子分類,總共包含約 12,000 則醫療問題,並標註了分詞與命名實體結果。藉由實驗結果與錯誤分析得知,我們提出的 EKG-MacBERT 模型達到最好的 Micro F1-score 74.50%,比相關研究模型 (MacBERT, RoBERTa, BERT, TextCNN, TextRNN, TextGCN 與 FastText) 表現好,並為中文醫療問題意圖分類提出一個有效的解決方案。 (zh_TW)
dc.description.abstract: Our main research objective is to classify unstructured Chinese medical questions into one of the pre-defined categories. Recently, the most widely used model architecture has been the Transformer, which effectively captures long-range semantic information and syntactic structure, achieving promising results in many natural language processing tasks. We improve three mainstream pre-trained models based on a two-stage domain knowledge enhancement mechanism and propose the EKG-Transformers (Encyclopedia enhanced pre-training with Knowledge Graph fine-tuning Transformers) for user intent classification of Chinese medical questions. During the pre-training phase, we ingrain hierarchical healthcare information, such as the symptoms and diagnoses of a disease, the precautions and side effects of a treatment, and the usage and dosage of a drug, into the language model. During the fine-tuning phase, a word sequence is endowed with a relation network and further converted into a sentence graph by injecting triples related to its named entities from the knowledge graph. Experimental data came from the Chinese Medical Intent Dataset (CMID), which includes manually annotated user intents (in 4 categories and 36 sub-categories), along with word segmentation and named entity annotations, for a total of around 12,000 medical questions. Based on the experiments and error analysis, EKG-MacBERT achieved the best Micro F1-score of 74.50%, outperforming previous models including MacBERT, RoBERTa, BERT, TextCNN, TextRNN, TextGCN, and FastText. In summary, our EKG-Transformers model provides an effective solution to the problem of medical question intent classification. (en_US)
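The abstracts above describe the fine-tuning stage as converting a word sequence into a sentence graph by injecting knowledge-graph triples for the named entities it contains. The sketch below illustrates that idea only; it is not the thesis implementation, and the function name, sample entities, and triples are hypothetical.

```python
# Illustrative sketch of sentence-graph construction (not the thesis code):
# attach knowledge-graph triples to named entities found in a token sequence.

def build_sentence_graph(tokens, entities, kg_triples):
    """tokens: list of words; entities: set of entity strings;
    kg_triples: dict mapping entity -> list of (relation, object) pairs.
    Returns (nodes, edges), where sequential edges link adjacent tokens
    and knowledge edges link an entity token to injected KG objects."""
    nodes = list(tokens)
    edges = []
    # Sequential edges preserve the original word order.
    for i in range(len(tokens) - 1):
        edges.append((i, i + 1, "next"))
    # Inject triples: each KG object becomes a new node, connected to the
    # entity token by an edge labeled with the triple's relation.
    for i, tok in enumerate(tokens):
        if tok in entities:
            for relation, obj in kg_triples.get(tok, []):
                nodes.append(obj)
                edges.append((i, len(nodes) - 1, relation))
    return nodes, edges

# Example: a medical question with one recognized entity, 感冒 (common cold).
tokens = ["感冒", "吃", "什麼", "藥"]
entities = {"感冒"}
kg_triples = {"感冒": [("symptom", "咳嗽"), ("treatment", "休息")]}
nodes, edges = build_sentence_graph(tokens, entities, kg_triples)
```

In the thesis, the resulting graph feeds the Transformer with knowledge-driven context; here the output is just a node list and labeled edge list that a graph-aware encoder could consume.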
dc.subject: 領域知識擷取 (zh_TW)
dc.subject: 預訓練語言模型 (zh_TW)
dc.subject: 百科全書 (zh_TW)
dc.subject: 知識圖譜 (zh_TW)
dc.subject: 多元分類 (zh_TW)
dc.subject: domain knowledge extraction (en_US)
dc.subject: pre-trained language models (en_US)
dc.subject: encyclopedia (en_US)
dc.subject: knowledge graph (en_US)
dc.subject: multi-class classification (en_US)
dc.title: 強化領域知識語言模型於中文醫療問題意圖分類 (zh_TW)
dc.language.iso: zh-TW (zh-TW)
dc.title: Ingraining Domain Knowledge in Language Models for Chinese Medical Question Intent Classification (en_US)
dc.type: 博碩士論文 (zh_TW)
dc.type: thesis (en_US)
dc.publisher: National Central University (en_US)
