Abstract (English) |
Open Information Extraction (OIE) aims to extract triples of the form (Argument-1, Relation, Argument-2) from unstructured natural language text. For example, an open IE system may extract the triples (Ceramide, repair, sebum) and (Ceramide, relieve, dryness) from the sentence “Ceramide can repair the sebum and relieve the dryness”. These extracted triples can be visualized as part of a knowledge graph that may benefit knowledge inference in question answering systems. In this study, we propose a pipelined language transformer model called CHOIE (Chinese Healthcare Open Information Extraction). It combines a pipeline of RoBERTa transformers with different neural networks for feature extraction to extract triples. We regard Chinese open information extraction as a two-phase task: we first extract all the relations in a given sentence and then find all the arguments associated with each relation. Due to the lack of publicly available, manually annotated datasets, we constructed such a Chinese OIE dataset in the healthcare domain. We first crawled articles from websites that provide healthcare information. After pre-processing, we split the remaining texts into sentences and randomly selected a subset of them for manual annotation. The resulting dataset can be further categorized into four distinct groups: simple relations, single overlaps, multiple overlaps, and complicated relations. Based on the experimental results and error analysis, our proposed CHOIE model achieved the best performance on three evaluation metrics, Exact Match (F1: 0.848), Contain Match (F1: 0.913), and Token Level Match (F1: 0.925), outperforming the existing Multi2OIE, SpanOIE, and RNNOIE models. |
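To make the two-phase design concrete, the following is a minimal Python sketch of the pipeline the abstract describes, together with one common reading of the Token Level Match metric. All names here (Triple, extract_triples, token_level_f1, the toy taggers) are hypothetical illustrations and not code from the thesis; in CHOIE itself both phases are realized with RoBERTa transformers and additional neural feature extractors.

```python
from dataclasses import dataclass
from typing import Callable, List, Set, Tuple

@dataclass
class Triple:
    arg1: str       # Argument-1, e.g. "Ceramide"
    relation: str   # Relation,   e.g. "repair"
    arg2: str       # Argument-2, e.g. "sebum"

def extract_triples(
    sentence: str,
    relation_tagger: Callable[[str], List[str]],
    argument_tagger: Callable[[str, str], Tuple[str, str]],
) -> List[Triple]:
    """Two-phase extraction: phase 1 tags all relation spans in the
    sentence; phase 2 tags Argument-1 and Argument-2 for each relation."""
    triples = []
    for relation in relation_tagger(sentence):            # phase 1
        arg1, arg2 = argument_tagger(sentence, relation)  # phase 2, conditioned on the relation
        triples.append(Triple(arg1, relation, arg2))
    return triples

def token_level_f1(pred_tokens: Set[str], gold_tokens: Set[str]) -> float:
    """Token Level Match as F1 over the token overlap between a predicted
    and a gold span (an assumed reading of the metric, for illustration)."""
    overlap = len(pred_tokens & gold_tokens)
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)

# Toy taggers standing in for the learned models, reproducing the
# abstract's example sentence:
demo_relations = lambda s: ["repair", "relieve"]
demo_arguments = lambda s, r: ("Ceramide", "sebum" if r == "repair" else "dryness")
print(extract_triples("Ceramide can repair the sebum and relieve the dryness",
                      demo_relations, demo_arguments))
```

Conditioning phase 2 on each relation found in phase 1 is what lets a pipeline of this shape handle the overlap categories in the dataset, since two triples sharing an argument (as in the Ceramide example) are extracted in separate phase-2 passes.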
References |
1. LINE Corporation. LINE CONVERGE 2019. 2019; Available from: https://linecorp.com/zh-hant/pr/news/zh-hant/2019/2952.
2. Hochreiter, S. and J. Schmidhuber, Long Short-Term Memory. Neural Comput., 1997. 9(8): p. 1735-1780.
3. Vaswani, A., et al., Attention Is All You Need. CoRR, 2017. abs/1706.03762.
4. Pennington, J., R. Socher, and C. Manning. GloVe: Global Vectors for Word Representation. in Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP). 2014. Association for Computational Linguistics.
5. Devlin, J., et al. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. in Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers). 2019. Association for Computational Linguistics.
6. Liu, Y., et al., RoBERTa: A Robustly Optimized BERT Pretraining Approach. CoRR, 2019. abs/1907.11692.
7. Doddington, G., et al. The Automatic Content Extraction (ACE) Program -- Tasks, Data, and Evaluation. in Proceedings of the Fourth International Conference on Language Resources and Evaluation (LREC'04). 2004. European Language Resources Association (ELRA).
8. Carreras, X. and L. Màrquez. Introduction to the CoNLL-2004 Shared Task: Semantic Role Labeling. in Proceedings of the Eighth Conference on Computational Natural Language Learning (CoNLL-2004) at HLT-NAACL 2004. 2004. Association for Computational Linguistics.
9. Nakashole, N., G. Weikum, and F. Suchanek. PATTY: A Taxonomy of Relational Patterns with Semantic Types. in Proceedings of the 2012 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning. 2012. Association for Computational Linguistics.
10. Fader, A., S. Soderland, and O. Etzioni. Identifying Relations for Open Information Extraction. in Proceedings of the 2011 Conference on Empirical Methods in Natural Language Processing. 2011. Association for Computational Linguistics.
11. Gashteovski, K., et al., OPIEC: An Open Information Extraction Corpus. CoRR, 2019. abs/1904.12324.
12. Eberts, M. and A. Ulges, Span-based Joint Entity and Relation Extraction with Transformer Pre-training. CoRR, 2019. abs/1909.07755.
13. Gashteovski, K., R. Gemulla, and L. del Corro. MinIE: Minimizing Facts in Open Information Extraction. in Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing. 2017. Association for Computational Linguistics.
14. Mausam. Open Information Extraction Systems and Downstream Applications. in Proceedings of the Twenty-Fifth International Joint Conference on Artificial Intelligence. 2016. AAAI Press.
15. Stanovsky, G. and I. Dagan. Creating a Large Benchmark for Open Information Extraction. in Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing. 2016. Association for Computational Linguistics.
16. Schneider, R., et al., Analysing Errors of Open Information Extraction Systems. CoRR, 2017. abs/1707.07499.
17. Lechelle, W., F. Gotti, and P. Langlais. WiRe57 : A Fine-Grained Benchmark for Open Information Extraction. in Proceedings of the 13th Linguistic Annotation Workshop. 2019. Association for Computational Linguistics.
18. Bhardwaj, S., S. Aggarwal, and Mausam. CaRB: A Crowdsourced Benchmark for Open IE. in Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP). 2019. Association for Computational Linguistics.
19. Xu, J., et al. A Discourse-Level Named Entity Recognition and Relation Extraction Dataset for Chinese Literature Text. 2017.
20. Qiu, L. and Y. Zhang. ZORE: A Syntax-based System for Chinese Open Relation Extraction. in Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP). 2014. Association for Computational Linguistics.
21. Tseng, Y.-H., et al. Chinese Open Relation Extraction for Knowledge Acquisition. in Proceedings of the 14th Conference of the European Chapter of the Association for Computational Linguistics, volume 2: Short Papers. 2014. Association for Computational Linguistics.
22. Jia, S., et al., Chinese Open Relation Extraction and Knowledge Base Establishment. ACM Trans. Asian Low-Resour. Lang. Inf. Process., 2018. 17(3).
23. Stanovsky, G., et al. Supervised Open Information Extraction. in Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers). 2018. Association for Computational Linguistics.
24. Ro, Y., Y. Lee, and P. Kang. Multi^2OIE: Multilingual Open Information Extraction Based on Multi-Head Attention with BERT. in Findings of the Association for Computational Linguistics: EMNLP 2020. 2020. Association for Computational Linguistics.
25. Lee, L.-H. and Y. Lu, Multiple Embeddings Enhanced Multi-Graph Neural Networks for Chinese Healthcare Named Entity Recognition. IEEE Journal of Biomedical and Health Informatics, 2021: p. 1-1. |