dc.description.abstract | Open Information Extraction (OIE) aims to extract triples of the form (Argument-1, Relation, Argument-2) from unstructured natural language text. For example, an OIE system may extract the triples (Ceramide, repair, sebum) and (Ceramide, relieve, dryness) from the sentence “Ceramide can repair the sebum and relieve the dryness”. These extracted triples can be visualized as part of a knowledge graph that may benefit knowledge inference in question answering systems. In this study, we propose a pipelined language transformer model called CHOIE (Chinese Healthcare Open Information Extraction). It uses a pipeline of RoBERTa transformers together with different neural networks for feature extraction to extract triples. We regard Chinese open information extraction as a two-phase task: first, we extract all the relations in a given sentence, and then we find all the arguments associated with each relation. Owing to the lack of publicly available, manually annotated datasets, we constructed a Chinese OIE dataset in the healthcare domain. We first crawled articles from websites that provide healthcare information. After pre-processing, we split the remaining texts into sentences and randomly selected a subset of them for manual annotation. The constructed dataset can be further categorized into four distinct groups: simple relations, single overlaps, multiple overlaps, and complicated relations. Based on the experimental results and error analysis, our proposed CHOIE model achieved the best performance on three evaluation metrics: Exact Match (F1: 0.848), Contain Match (F1: 0.913), and Token-Level Match (F1: 0.925), outperforming the existing Multi2OIE, SpanOIE, and RNNOIE models. | en_US |