Abstract (English)
Named Entity Recognition (NER) focuses on locating mentions of named entities and classifying their types, usually proper nouns such as persons, places, organizations, dates, and times. NER results can serve as the basis for relation extraction, event detection and tracking, knowledge graph construction, and question answering systems. NER studies usually treat this task as a sequence labeling problem and learn the labeling model from large-scale corpora. We propose the ME-GGSNN (Multiple Embeddings enhanced Gated Graph Sequence Neural Networks) model for Chinese healthcare NER. We derive a character representation from multiple embeddings at different granularities, ranging from the radical and character levels to the word level. An adapted gated graph sequence neural network then incorporates named-entity information from dictionaries. Finally, a standard BiLSTM-CRF identifies named entities and classifies their types in the healthcare domain.
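At inference time, the CRF layer of a BiLSTM-CRF tagger decodes the highest-scoring tag sequence with the Viterbi algorithm. The following is a minimal pure-Python sketch of that decoding step; the tag set, emission scores, and transition scores below are invented for illustration and are not taken from the thesis.

```python
def viterbi(emissions, transitions, tags):
    """Return the highest-scoring tag sequence (scores in log space).

    emissions:   list of {tag: score} dicts, one per input position
    transitions: {(prev_tag, tag): score} dict
    tags:        list of tag names
    """
    # dp[t] = best score of any path ending in tag t at the current position
    dp = {t: emissions[0][t] for t in tags}
    backpointers = []
    for i in range(1, len(emissions)):
        new_dp, ptr = {}, {}
        for t in tags:
            best_prev = max(tags, key=lambda p: dp[p] + transitions[(p, t)])
            ptr[t] = best_prev
            new_dp[t] = dp[best_prev] + transitions[(best_prev, t)] + emissions[i][t]
        dp = new_dp
        backpointers.append(ptr)
    # Backtrack from the best final tag to recover the full path.
    last = max(tags, key=lambda t: dp[t])
    path = [last]
    for ptr in reversed(backpointers):
        path.append(ptr[path[-1]])
    return list(reversed(path))


# Toy example with a BIO-style tag set: penalizing the illegal O -> I
# transition steers the decoder toward a well-formed entity span.
tags = ["B", "I", "O"]
transitions = {(p, t): 0.0 for p in tags for t in tags}
transitions[("O", "I")] = -10.0
emissions = [{"B": 2.0, "I": 0.0, "O": 1.0},
             {"B": 0.0, "I": 2.0, "O": 1.0},
             {"B": 0.0, "I": 0.0, "O": 3.0}]
best = viterbi(emissions, transitions, tags)  # ["B", "I", "O"]
```

In a trained model, the emission scores come from the BiLSTM outputs and the transition scores are learned CRF parameters; this sketch only shows the decoding logic itself.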
We first crawled articles from websites that provide healthcare information, online health-related news, and medical question/answer forums, and then randomly selected a subset of sentences to retain content diversity. The resulting corpus contains 30,692 sentences with a total of about 1.5 million characters, or 91.7 thousand words. After manual annotation, we obtained 68,460 named entities across 10 entity types: body, symptom, instrument, examination, chemical, disease, drug, supplement, treatment, and time. In our experiments and error analysis, the proposed method achieved the best F1-score of 75.69%, outperforming previous models including BiLSTM-CRF, BERT, Lattice, Gazetteers, and ME-CNER. In summary, our ME-GGSNN model is an effective and efficient solution for the Chinese healthcare NER task.
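The reported F1-score is conventionally computed at the entity level: a predicted entity counts as correct only when both its boundary and its type exactly match the gold annotation. A minimal sketch of that metric, assuming entities are represented as (start, end, type) spans (the span encoding and type labels here are illustrative):

```python
def entity_f1(gold, pred):
    """Entity-level precision, recall, and F1.

    gold and pred are sets of (start, end, type) spans; a prediction
    is correct only when boundary and type both match exactly.
    """
    correct = len(gold & pred)
    precision = correct / len(pred) if pred else 0.0
    recall = correct / len(gold) if gold else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1


# One entity matched, one with the wrong type: P = R = F1 = 0.5.
gold = {(0, 2, "body"), (3, 5, "disease")}
pred = {(0, 2, "body"), (3, 5, "drug")}
p, r, f = entity_f1(gold, pred)
```

Scoring whole spans rather than individual tags is stricter than token-level accuracy, which is why entity-level F1 is the standard figure of merit for NER benchmarks.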
References
[1] Lawrence R. Rabiner, A Tutorial on Hidden Markov Models and Selected Applications in Speech Recognition. Proceedings of the IEEE, 77 (2), p. 257–286, February 1989.
[2] Toutanova, Kristina; Manning, Christopher D., Enriching the Knowledge Sources Used in a Maximum Entropy Part-of-Speech Tagger. Proc. J. SIGDAT Conf. on Empirical Methods in NLP and Very Large Corpora (EMNLP/VLC-2000). pp. 63–70.
[3] Lafferty, J., McCallum, A., Pereira, F.: Conditional random fields: probabilistic models for segmenting and labeling sequence data. In: Proceedings of 18th International Conference on Machine Learning, ICML 2001, pp. 282–289 (2001).
[4] Krizhevsky, A., Sutskever, I., & Hinton, G., (2012). ImageNet classification with deep convolutional neural networks. In NIPS.
[5] Williams, Ronald J.; Hinton, Geoffrey E.; Rumelhart, David E., (October 1986). "Learning representations by back-propagating errors". Nature. 323 (6088): 533–536.
[6] Hochreiter, S., Schmidhuber, J., Long short-term memory. Neural Comput. 9, 1735–1780 (1997).
[7] Levow, G.-A., The third international Chinese language processing bakeoff: word segmentation and named entity recognition. In: Proceedings of the Fifth SIGHAN Workshop on Chinese Language Processing, pp. 108–117 (2006).
[8] Nanyun Peng and Mark Dredze, 2015. Named entity recognition for Chinese social media with jointly trained embeddings. In EMNLP, pages 548–554.
[9] Zhang, Y. and Yang, J., (2018). Chinese NER using lattice LSTM. Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (ACL'18), Long Papers, pages 1554–1564.
[10] Xianpei Han, Overview of the CCKS 2019 Knowledge Graph Evaluation Track: Entity, Relation, Event and QA (2019). arXiv.
[11] Fu, G., Luke, K.K., Chinese named entity recognition using lexicalized HMMs. ACM SIGKDD Explor. Newsl. 7, 19–25 (2005).
[12] Gideon S. Mann and Andrew McCallum., 2010. Generalized Expectation Criteria for SemiSupervised Learning with Weakly Labeled Data. J. Mach. Learn. Res. 11 (March 2010), 955–984.
[13] Duan, H., Zheng, Y., A study on features of the CRFs-based Chinese named entity recognition. Int. J. Adv. Intell. 3, 287–294 (2011).
[14] Han, A.L.-F., Wong, D.F., Chao, L.S., Chinese named entity recognition with conditional random fields in the light of Chinese characteristics. In: Kłopotek, M.A., Koronacki, J., Marciniak, M., Mykowiecka, A., Wierzchoń, S.T. (eds.) IIS 2013. LNCS, vol. 7912, pp. 57–68. Springer, Heidelberg (2013).
[15] Huang, Z., Xu, W., Yu, K., Bidirectional LSTM-CRF models for sequence tagging (2015). arXiv.
[16] Guillaume Lample, Miguel Ballesteros, Sandeep Subramanian, Kazuya Kawakami, Chris Dyer (2016). Neural architectures for named entity recognition. In Proceedings of NAACL-HLT 2016, pp. 260–270.
[17] Chuanhai Dong, Jiajun Zhang, Chengqing Zong, Masanori Hattori, and Hui Di., 2016. Character based LSTM-CRF with radical-level features for Chinese named entity recognition. In International Conference on Computer Processing of Oriental Languages. Springer, pages 239–250.
[18] Canwen Xu, Feiyang Wang, Jialong Han, and Chenliang Li, Exploiting multiple embeddings for Chinese named entity recognition. In CIKM, pages 2269–2272. ACM, 2019.
[19] Ruixue Ding, Pengjun Xie, Xiaoyan Zhang, Wei Lu, Linlin Li, and Luo Si., 2019. A neural multi-digraph model for Chinese NER with gazetteers. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, pages 1462–1467.
[20] Yujia Li, Daniel Tarlow, Marc Brockschmidt, and Richard Zemel, 2016. Gated graph sequence neural networks. In Proc. of ICLR.
[21] Mikolov, T., Chen, K., Corrado, G., & Dean, J., (2013). Efficient estimation of word representations in vector space. arXiv preprint arXiv:1301.3781.
[22] Cho, K. et al., Learning phrase representations using RNN encoder-decoder for statistical machine translation. In Proc. Conference on Empirical Methods in Natural Language Processing 1724–1734 (2014).
[23] Cohen, Jacob, (1960). "A coefficient of agreement for nominal scales". Educational and Psychological Measurement. 20 (1): 37–46.
[24] Fleiss, J. L., (1971) "Measuring nominal scale agreement among many raters." Psychological Bulletin, Vol. 76, No. 5 pp. 378–382.
[25] Landis, J. R. and Koch, G. G., (1977). "The measurement of observer agreement for categorical data" in Biometrics. Vol. 33, pp. 159–174.
[26] Ma, Wei-Yun and Keh-Jiann Chen, 2003, "Introduction to CKIP Chinese Word Segmentation System for the First International Chinese Word Segmentation Bakeoff", Proceedings of ACL, Second SIGHAN Workshop on Chinese Language Processing, pp. 168–171.
[27] Jeffrey Pennington, Richard Socher, and Christopher D. Manning, 2014. Glove: Global vectors for word representation. In Empirical Methods in Natural Language Processing (EMNLP), pages 1532–1543.
[28] Piotr Bojanowski, Edouard Grave, Armand Joulin, and Tomas Mikolov, 2017. Enriching word vectors with subword information. TACL 5:135–146.
[29] Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova., BERT: Pre-training of deep bidirectional transformers for language understanding. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), pp. 4171–4186, Minneapolis, Minnesota, June 2019. Association for Computational Linguistics.