Master's/Doctoral Thesis 108521091 Detailed Record




Name Shao-Chun Cheng (鄭少鈞)   Graduate Department Department of Electrical Engineering
Thesis Title Pipelined Language Transformers for Chinese Healthcare Open Information Extraction
(管道式語言轉譯器之中文健康照護開放資訊擷取)
Related Theses
★ Multiple Embeddings Enhanced Gated Graph Sequence Neural Networks for Chinese Healthcare Named Entity Recognition
★ EEG Wavelet Analysis for Seizure Detection in Stroke Patients
★ Data Augmentation with Conditional Generative Adversarial Networks for Automatic Schizophrenia Classification
★ Label Graph Convolution Enhanced Hypergraph Attention Networks for Multi-Class Chinese Healthcare Text Classification
★ Improving BERT with Synthesizer Mixed Attention for Scientific Language Editing
★ Domain Knowledge Enhanced Language Models for Chinese Medical Question Intent Classification
★ Sentence Embedding Re-Rankers for Improving Chinese Medical Question Answering
★ Dual Annotation Encoders for Chinese Healthcare Entity Linking
★ Joint Part-of-Speech and Local Context for Chinese Healthcare Entity Relation Extraction
★ Heterogeneous Graph Attention Networks for Chinese Medical Answer Extractive Summarization
★ Learning User Intents for Abstractive Summarization of Chinese Medical Questions
★ Label Enhanced Hypergraph Attention Networks for Multi-Label Classification of Psychiatric Disorder Texts
Files Full text viewable in the system after 2027-01-07.
Abstract (Chinese) Open information extraction (OIE) aims to convert unstructured sentences into triples of the form (Argument-1, Relation, Argument-2). Taking the sentence "神經醯胺能夠修復皮脂膜及減緩乾燥" ("Ceramide can repair the sebum and relieve dryness") as an example, an OIE model extracts the two triples (神經醯胺, 修復, 皮脂膜) and (神經醯胺, 減緩, 乾燥). Such triples can be visualized as a knowledge graph, serving as the basis for knowledge inference in question answering systems. In this research area, we propose a pipelined language transformers model named CHOIE (Chinese Healthcare Open Information Extraction), which focuses on information extraction in the Chinese healthcare domain. CHOIE builds on the strong pretrained language model RoBERTa as its backbone, applies different neural networks to extract features, and adds classifiers on top. We treat the task as two phases: first extract all relations appearing in triples, then, centered on each relation, identify Argument-1 and Argument-2 to complete the triple. Because no publicly available, manually annotated Chinese dataset exists, we crawled healthcare articles from the web and manually annotated arguments and relations; the resulting triples fall into four types: simple relations, single overlaps, multiple overlaps, and complicated relations. Experimental results and error analysis show that the proposed CHOIE pipelined language transformers achieve the best performance on three OIE evaluation metrics, Exact Match (F1: 0.848), Contain Match (F1: 0.913), and Token Level Match (F1: 0.925), outperforming existing information extraction models (Multi2OIE, SpanOIE, and RNNOIE).
Abstract (English) Open Information Extraction (OIE) aims to extract triples in terms of (Argument-1, Relation, Argument-2) from unstructured natural language texts. For example, an open IE system may extract the triples (Ceramide, repair, sebum) and (Ceramide, relieve, dryness) from the given sentence "Ceramide can repair the sebum and relieve the dryness". These extracted triples can be visualized as part of a knowledge graph that may benefit knowledge inference in question answering systems. In this study, we propose a pipelined language transformers model called CHOIE (Chinese Healthcare Open Information Extraction). It uses a pipeline of RoBERTa transformers and different neural networks for feature extraction to extract triples. We regard Chinese open information extraction as a two-phase task: first, we extract all the relations in a given sentence, and then find all the arguments based on each relation. Due to the lack of publicly available, manually annotated datasets, we construct a Chinese OIE dataset in the healthcare domain. We first crawled articles from websites that provide healthcare information. After pre-processing, we split the remaining texts into sentences and randomly selected a subset for manual annotation. Finally, our constructed dataset can be categorized into four distinct groups: simple relations, single overlaps, multiple overlaps, and complicated relations. Based on the experimental results and error analysis, our proposed CHOIE model achieved the best performance on three evaluation metrics, Exact Match (F1: 0.848), Contain Match (F1: 0.913), and Token Level Match (F1: 0.925), outperforming the existing Multi2OIE, SpanOIE, and RNNOIE models.
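The three matching criteria named in the abstract can be sketched as follows. This is a minimal illustration, not the thesis's actual evaluation code: the function names, the whitespace tokenization (Chinese text would more likely be tokenized per character), and the pooling of the three triple elements for token-level F1 are all assumptions made for the example.

```python
from typing import Tuple

# A triple is (Argument-1, Relation, Argument-2), e.g. extracted from
# "Ceramide can repair the sebum and relieve the dryness":
#   ("Ceramide", "repair", "sebum")
Triple = Tuple[str, str, str]

def exact_match(pred: Triple, gold: Triple) -> bool:
    """Strictest criterion: every predicted element equals the gold element."""
    return pred == gold

def contain_match(pred: Triple, gold: Triple) -> bool:
    """Looser span criterion: each predicted element contains the
    corresponding gold element, or vice versa."""
    return all(p in g or g in p for p, g in zip(pred, gold))

def token_f1(pred: Triple, gold: Triple) -> float:
    """Token-level F1: overlap between predicted and gold token sets,
    pooled over the three triple elements."""
    pred_tokens = {t for part in pred for t in part.split()}
    gold_tokens = {t for part in gold for t in part.split()}
    overlap = len(pred_tokens & gold_tokens)
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)

gold = ("Ceramide", "repair", "sebum")
pred = ("Ceramide", "repair", "the sebum")
print(exact_match(pred, gold))    # False: "the sebum" != "sebum"
print(contain_match(pred, gold))  # True: "sebum" is contained in "the sebum"
print(round(token_f1(pred, gold), 2))
```

This ordering reflects why the reported F1 scores rise from Exact Match (0.848) through Contain Match (0.913) to Token Level Match (0.925): each criterion accepts everything the stricter one does, plus partially matching spans.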
Keywords (Chinese) ★ 轉譯器 (Transformers)
★ 開放式資訊擷取 (Open Information Extraction)
★ 知識圖譜 (Knowledge Graph)
★ 健康資訊學 (Health Informatics)
Keywords (English) ★ Transformers
★ Open Information Extraction
★ Knowledge Graph
★ Health Informatics
Table of Contents Abstract (Chinese) I
Abstract (English) II
Acknowledgements III
List of Figures VI
List of Tables VII
Chapter 1 Introduction 1
1-1 Research Background 1
1-2 Motivation and Objectives 2
1-3 Thesis Organization 4
Chapter 2 Related Work 5
2-1 Definitions and Categories of Information Extraction 5
2-2 English Information Extraction Datasets 6
2-3 Chinese Information Extraction Datasets 10
2-4 English Open Information Extraction Models 12
2-5 Chinese Open Information Extraction Models 15
2-6 Summary of Datasets and Models 16
Chapter 3 Model Architecture 18
3-1 System Architecture 18
3-2 Task Definition 19
3-3 Transformer Architecture 19
3-4 Relation Recognition Model 21
3-5 Argument Recognition Model 22
Chapter 4 Experimental Results 24
4-1 Dataset Construction 24
4-2 Evaluation Metrics 29
4-4 Model Comparison 32
4-5 Ablation Study 34
4-6 Embedding Analysis 35
4-7 Triple Visualization System 37
4-8 Error Analysis 39
4-9 Discussion 41
Chapter 5 Conclusions and Future Work 44
References 45
References
1. LINE Corporation. LINE CONVERGE 2019. 2019; Available from: https://linecorp.com/zh-hant/pr/news/zh-hant/2019/2952.
2. Hochreiter, S. and J. Schmidhuber, Long Short-Term Memory. Neural Comput., 1997. 9(8): p. 1735--1780.
3. Vaswani, A., et al., Attention Is All You Need. CoRR, 2017. abs/1706.03762.
4. Pennington, J., R. Socher, and C. Manning. GloVe: Global Vectors for Word Representation. in Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP). 2014. Association for Computational Linguistics.
5. Devlin, J., et al. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. in Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers). 2019. Association for Computational Linguistics.
6. Liu, Y., et al., RoBERTa: A Robustly Optimized BERT Pretraining Approach. CoRR, 2019. abs/1907.11692.
7. Doddington, G., et al. The Automatic Content Extraction (ACE) Program -- Tasks, Data, and Evaluation. in Proceedings of the Fourth International Conference on Language Resources and Evaluation (LREC′04). 2004. European Language Resources Association (ELRA).
8. Carreras, X. and L. Màrquez. Introduction to the CoNLL-2004 Shared Task: Semantic Role Labeling. in Proceedings of the Eighth Conference on Computational Natural Language Learning (CoNLL-2004) at HLT-NAACL 2004. 2004. Association for Computational Linguistics.
9. Nakashole, N., G. Weikum, and F. Suchanek. PATTY: A Taxonomy of Relational Patterns with Semantic Types. in Proceedings of the 2012 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning. 2012. Association for Computational Linguistics.
10. Fader, A., S. Soderland, and O. Etzioni. Identifying Relations for Open Information Extraction. in Proceedings of the 2011 Conference on Empirical Methods in Natural Language Processing. 2011. Association for Computational Linguistics.
11. Gashteovski, K., et al., OPIEC: An Open Information Extraction Corpus. CoRR, 2019. abs/1904.12324.
12. Eberts, M. and A. Ulges, Span-based Joint Entity and Relation Extraction with Transformer Pre-training. CoRR, 2019. abs/1909.07755.
13. Gashteovski, K., R. Gemulla, and L. del Corro. MinIE: Minimizing Facts in Open Information Extraction. in Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing. 2017. Association for Computational Linguistics.
14. Mausam. Open Information Extraction Systems and Downstream Applications. in Proceedings of the Twenty-Fifth International Joint Conference on Artificial Intelligence. 2016. AAAI Press.
15. Stanovsky, G. and I. Dagan. Creating a Large Benchmark for Open Information Extraction. in Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing. 2016. Association for Computational Linguistics.
16. Schneider, R., et al., Analysing Errors of Open Information Extraction Systems. CoRR, 2017. abs/1707.07499.
17. Lechelle, W., F. Gotti, and P. Langlais. WiRe57 : A Fine-Grained Benchmark for Open Information Extraction. in Proceedings of the 13th Linguistic Annotation Workshop. 2019. Association for Computational Linguistics.
18. Bhardwaj, S., S. Aggarwal, and M. Mausam. CaRB: A Crowdsourced Benchmark for Open IE. in Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP). 2019. Association for Computational Linguistics.
19. Xu, J., et al. A Discourse-Level Named Entity Recognition and Relation Extraction Dataset for Chinese Literature Text. 2017.
20. Qiu, L. and Y. Zhang. ZORE: A Syntax-based System for Chinese Open Relation Extraction. in Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP). 2014. Association for Computational Linguistics.
21. Tseng, Y.-H., et al. Chinese Open Relation Extraction for Knowledge Acquisition. in Proceedings of the 14th Conference of the European Chapter of the Association for Computational Linguistics, volume 2: Short Papers. 2014. Association for Computational Linguistics.
22. Jia, S., et al., Chinese Open Relation Extraction and Knowledge Base Establishment. ACM Trans. Asian Low-Resour. Lang. Inf. Process., 2018. 17(3).
23. Stanovsky, G., et al. Supervised Open Information Extraction. in Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers). 2018. Association for Computational Linguistics.
24. Ro, Y., Y. Lee, and P. Kang. Multi^2OIE: Multilingual Open Information Extraction Based on Multi-Head Attention with BERT. in Findings of the Association for Computational Linguistics: EMNLP 2020. 2020. Association for Computational Linguistics.
25. Lee, L.-H. and Y. Lu, Multiple Embeddings Enhanced Multi-Graph Neural Networks for Chinese Healthcare Named Entity Recognition. IEEE Journal of Biomedical and Health Informatics, 2021: p. 1-1.
Advisor Lung-Hao Lee (李龍豪)   Approval Date 2022-01-10
