Oxford University Press;England: Oxford University Press
Abstract: Chemical patents contain detailed information on novel chemical compounds that is valuable to the chemical and pharmaceutical industries. In this paper, we introduce NERChem, a system that recognizes chemical named entity mentions in chemical patents. NERChem is based on the conditional random fields (CRF) model. Our approach incorporates (1) class composition, which combines chemical classes whose naming conventions are similar; (2) BioNE features, which distinguish chemical mentions from other biomedical NE mentions in the patents; and (3) full-token word features, which resolve the tokenization granularity problem. We evaluated our approach on the BioCreative V CHEMDNER-patent corpus and achieved an F-score of 87.17% in the Chemical Entity Mention in Patents (CEMP) task and a sensitivity of 98.58% in the Chemical Passage Detection (CPD) task, ranking alongside the top systems. Database URL: Our NERChem web-based system is publicly available at iisrserv.csie.ncu.edu.tw/nerchem.
Other title: Database (Oxford)
Publisher: England: Oxford University Press
Publication date: 2016-10-25
Source: Database : the journal of biological databases and curation, 2016-10, Vol.2016, p.baw135
Rights: The Author(s) 2016. Published by Oxford University Press.
Identifier: ISSN: 1758-0463
Identifier: EISSN: 1758-0463
Identifier: DOI: 10.1093/database/baw135
Identifier: PMID: 31414701