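A Study of Chinese Named Entity Recognition Using Deep Learning with Memory-Enhanced Conditional Random Fields and Automatic Lexical Features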

Name

簡國峻 (Kuo-Chun Chien)

Department

Department of Computer Science and Information Engineering (In-Service Master's Program)

Thesis Title

Leveraging Memory Enhanced Conditional Random Fields with Convolutional and Automatic Lexical Feature for Chinese Named Entity Recognition

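Related Theses

★ A Study on Identifying Schedule Invitation Emails and Extracting Irregular Time Expressions
★ Design of the NCUFree Campus Wireless Network Platform and Development of Application Services
★ Design and Implementation of a Semi-Structured Web Data Extraction System
★ Mining and Applications of Non-Simple Browsing Paths
★ Improvements to Association Rule Mining on Incremental Data
★ Applying the Chi-Square Test of Independence to Associative Classification Problems
★ Design and Study of a Chinese Information Extraction System
★ Visualization of Non-Numerical Data and Clustering Combining Subjective and Objective Criteria
★ A Study of Associated Word Groups in Document Summarization
★ Web Page Cleaning: Page Segmentation and Data Region Extraction
★ Design and Study of a Question Answering System Using Sentence Classification and Ranking
★ Efficient Mining of Closed Frequent Continuous Event Patterns in Temporal Databases
★ Axis Arrangement of Star Coordinates for Cluster Visualization
★ Automatic Generation of Web Crawling Programs from Browsing Histories
★ Template and Data Analysis of Dynamic Web Pages
★ Automated Data Integration of Homogeneous Web Pages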

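Access permission for this electronic thesis: immediate open access.
Open-access full texts are licensed to users solely for personal, non-profit searching, reading, and printing for academic research purposes.
Please comply with the relevant provisions of the Copyright Act of the Republic of China (Taiwan), and do not reproduce, distribute, adapt, repost, or broadcast the work without authorization, to avoid violating the law.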

Abstract (Chinese, translated)

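Sequence labeling models are widely used in natural language processing for tasks such as named entity recognition, part-of-speech tagging, and word segmentation. Named entity recognition (NER) is an important natural language processing task because it extracts the named entities from unprocessed text and assigns them to predefined categories such as person names, locations, and organizations.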
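As a concrete picture of the sequence labeling described in this paragraph, a character-level NER model assigns one tag per character, and the entities are then read off the tag sequence. The sketch below assumes the common BIO scheme with PER/LOC/ORG types; the abstract does not state which tagging scheme the thesis uses, and the example sentence is invented for illustration.

```python
def bio_to_spans(chars, tags):
    """Group per-character BIO tags into (text, type, start, end) entity spans.

    `end` is exclusive. A stray I- tag with no open entity is treated leniently
    as the start of a new entity.
    """
    spans = []
    start, etype = None, None
    for i, tag in enumerate(list(tags) + ["O"]):  # sentinel "O" closes a trailing entity
        # The open entity ends at "O", at a new "B-", or at an "I-" of another type.
        if start is not None and (tag == "O" or tag.startswith("B-") or tag[2:] != etype):
            spans.append(("".join(chars[start:i]), etype, start, i))
            start, etype = None, None
        # Open a new entity on "B-", or on an "I-" that has nothing to continue.
        if tag.startswith("B-") or (tag.startswith("I-") and start is None):
            start, etype = i, tag[2:]
    return spans


chars = list("王小明住在臺北")
tags = ["B-PER", "I-PER", "I-PER", "O", "O", "B-LOC", "I-LOC"]
print(bio_to_spans(chars, tags))
# -> [('王小明', 'PER', 0, 3), ('臺北', 'LOC', 5, 7)]
```

Because each character carries exactly one tag, the same decoder works whether or not the text was word-segmented first, which is one reason character-level tagging is popular for Chinese.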
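Most NER research targets English datasets. In English, spaces usually separate words, and each word usually has its own distinct meaning. A Chinese character, by contrast, often carries several kinds of information and can mean different things depending on its position within a word, so Chinese text offers no explicit word-segmentation cues. Traditional machine-learning approaches to Chinese NER are mostly statistical and perform sequence labeling with conditional random fields, so they are limited to features extracted from a small local window. Drawing on long-distance context in Chinese datasets to determine the correct meaning of the current character or word, and thereby to recognize named entities correctly, is a challenging and forward-looking task.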
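Decoding in a linear-chain conditional random field, the model named in this paragraph, finds the label sequence that maximizes the sum of per-position (emission) scores and label-to-label (transition) scores. The sketch below is the standard Viterbi algorithm in plain Python; the label set, the BIO scheme, and all scores are invented for illustration and are not taken from the thesis.

```python
def viterbi(emissions, transitions, labels):
    """Highest-scoring label path under a linear-chain CRF.

    emissions[t][y]   : score of label y at position t
    transitions[p][y] : score of moving from label p to label y
    Returns (best label sequence, its total score).
    """
    n = len(labels)
    best = list(emissions[0])  # best[y]: best path score ending in label y so far
    backpointers = []
    for t in range(1, len(emissions)):
        new_best, pointers = [], []
        for y in range(n):
            # Pick the predecessor maximizing (path score + transition score).
            prev = max(range(n), key=lambda p: best[p] + transitions[p][y])
            new_best.append(best[prev] + transitions[prev][y] + emissions[t][y])
            pointers.append(prev)
        best = new_best
        backpointers.append(pointers)
    # Start from the best final label and follow the back-pointers.
    y = max(range(n), key=lambda k: best[k])
    score = best[y]
    path = [y]
    for pointers in reversed(backpointers):
        y = pointers[y]
        path.append(y)
    path.reverse()
    return [labels[k] for k in path], score


labels = ["O", "B-LOC", "I-LOC"]
NEG = -1e4  # effectively forbids a transition
transitions = [
    [0.0, 0.0, NEG],  # from O: O -> I-LOC is not allowed
    [0.0, 0.0, 1.0],  # from B-LOC
    [0.0, 0.0, 0.5],  # from I-LOC
]
emissions = [  # toy per-character scores for "在臺北"
    [2.0, 0.5, 0.1],
    [0.2, 1.5, 1.6],
    [0.3, 0.1, 1.2],
]
path, score = viterbi(emissions, transitions, labels)
print(path)  # -> ['O', 'B-LOC', 'I-LOC']
```

Taking the argmax at each position independently would yield O, I-LOC, I-LOC, an invalid O to I-LOC transition; the transition scores are what steer the decoder to B-LOC. Note that these transitions only link adjacent labels, which is the locality the paragraph describes.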
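To overcome these challenges, this study performs Chinese NER with a deep-learning conditional random field. First, a word-vector model is trained to convert characters into numerical data. A convolutional layer, a bidirectional GRU layer, and a memory layer that integrates long-distance document information then allow the NER model to obtain rich, complete document information instead of only the local information available to earlier approaches. In addition, lexical features are obtained through feature mining [1], and trainable parameters of the deep-learning model automatically adjust the word vectors and lexical features, so that, beyond long-distance document information, the model can make fuller use of the information hidden in the text.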
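The memory layer summarized in this paragraph (subsections 3-4-1 to 3-4-5 list an input memory, an output memory, the current input, and attention) can be read as a memory-network lookup in the sense of [12] and [23]: the vector for the current position is compared with every memory slot, the similarities are normalized with a softmax, and the output memories are averaged with those weights. The sketch below assumes dot-product scoring and a single read; the slot contents, vector sizes, and how the read-out is merged into the layer output are not specified in the abstract and are left out.

```python
import math


def softmax(scores):
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def memory_read(current, input_memory, output_memory):
    """Single-hop memory read in the style of memory networks.

    current       : vector for the current position (e.g. a BiGRU state)
    input_memory  : one key vector per memory slot, compared with `current`
    output_memory : one value vector per slot, averaged with the attention weights
    Returns (attention weights, read-out vector).
    """
    weights = softmax([dot(current, key) for key in input_memory])
    dim = len(output_memory[0])
    read = [sum(w * value[d] for w, value in zip(weights, output_memory)) for d in range(dim)]
    return weights, read


# Toy example: two memory slots, e.g. encodings of two other sentences in the document.
weights, read = memory_read(
    current=[1.0, 0.0],
    input_memory=[[1.0, 0.0], [0.0, 1.0]],
    output_memory=[[2.0, 0.0], [0.0, 4.0]],
)
print(weights, read)  # the first slot, which matches `current`, gets the larger weight
```

Because every slot is scored against the current position, information from anywhere in the document can reach the tagger in one step, rather than having to propagate through a recurrent chain.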
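The datasets used in this study are PerNews [3], whose training data are web articles collected with a custom-built crawler and whose test data are online news articles, and SIGHAN Bakeoff-3 [2]. Experimental results show that the model achieves 91.67% tagging accuracy on the online social-media data, a substantial improvement of 2.9% over the model without memory; adding word-level embeddings and lexical features improves on the basic memory model by 6.04%. On SIGHAN-MSRA, the proposed model also achieves the best location-entity recognition performance, 92.45%, and a recall of 90.95%.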

Abstract (English)

Sequence labeling models have been widely used in Natural Language Processing (NLP), for example in Named Entity Recognition (NER), Part-of-Speech (POS) tagging, and word segmentation. NER is one of the important tasks of NLP because it can extract named entities from unprocessed articles and classify them into predefined categories, such as person names, place names, and organizations.
Most NER research has focused on English data. In English, spaces usually divide words, and each word has its own meaning. In Chinese, by contrast, each character carries different information and may represent different meanings depending on its position within a word, so Chinese has no explicit word delimiters. Traditional machine-learning approaches to Chinese Named Entity Recognition (CNER) mostly use statistical methods and apply a Conditional Random Field (CRF) to the sequence labeling task; as a result, they can capture only local features. Capturing long-range context information in Chinese datasets, determining the correct meaning of the current word, and correctly identifying named entities is therefore a challenging and forward-looking task.
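The locality limitation can be made concrete. A CRF of the kind described in this paragraph typically scores each label using features drawn from a fixed window of neighboring characters, in the spirit of CRF++ [30] feature templates. The sketch below is a hypothetical feature extractor; the ±2 radius and the feature names are assumptions for illustration, not CRF++ template syntax and not the features used in the thesis.

```python
def window_features(chars, i, radius=2):
    """Character n-gram features for position i, drawn only from a +/-radius window."""
    padded = ["<S>"] * radius + list(chars) + ["</S>"] * radius
    c = i + radius  # position of character i inside `padded`
    feats = []
    for off in range(-radius, radius + 1):  # unigrams
        feats.append(f"uni[{off}]={padded[c + off]}")
    for off in range(-radius, radius):  # bigrams of adjacent characters
        feats.append(f"bi[{off},{off + 1}]={padded[c + off]}{padded[c + off + 1]}")
    return feats


sentence = "我住在臺北市信義區"
for f in window_features(sentence, 3):  # features for 臺
    print(f)
```

For 臺, the model sees only 住在臺北市; the characters 我 and 信義區, and anything in other sentences of the article, are invisible to it. That is the long-range context the proposed memory layer is meant to supply.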
To overcome these challenges, this study uses deep-learning Conditional Random Fields for the Chinese NER task. First, a word-vector model is trained to convert characters into numeric data. A convolutional layer, a bidirectional GRU layer, and a memory layer that integrates an external memory containing long-range context information are then applied. Unlike conventional approaches, which capture only local information, the model can thus obtain rich information from the whole article. Lexical features are also generated by feature mining [1], and automatically trained variables of the deep-learning model adjust the weights of the word embeddings and lexical features. In addition to long-range article information, the model can therefore make fuller use of the information hidden in the article.
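The abstract states only that automatically trained variables adjust the weights of the word embeddings and lexical features. One simple way to realize this, assumed here and not confirmed by the source, is a single learned scalar passed through a sigmoid that trades weight between the two feature sources before they are concatenated with the character vector. `fuse_features` and `gate_param` are hypothetical names.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def fuse_features(char_vec, word_vec, lexical_vec, gate_param):
    """Concatenate character, word-embedding, and lexical-feature vectors.

    `gate_param` is a hypothetical trainable scalar: sigmoid(gate_param) scales the
    word embedding and 1 - sigmoid(gate_param) scales the lexical features, so
    training can shift weight between the two sources.
    """
    g = sigmoid(gate_param)
    return list(char_vec) + [g * x for x in word_vec] + [(1.0 - g) * f for f in lexical_vec]


# With gate_param = 0 the two sources are weighted equally (g = 0.5).
print(fuse_features([1.0], [2.0, 4.0], [1.0, 0.0], gate_param=0.0))
# -> [1.0, 1.0, 2.0, 0.5, 0.0]
```

In a framework such as TensorFlow [24], `gate_param` would be a variable updated by backpropagation together with the other model weights; a per-dimension gate vector would be a natural extension of the same idea.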
The datasets used in this research are PerNews, which uses online articles collected with a custom crawler as training data and online news articles as test data, and SIGHAN Bakeoff-3. According to the results, the proposed model achieves 91.67% tagging accuracy on the online social media data, 2.9% higher than the model without the memory layer. Adding word embeddings and lexical features improves on the basic memory model by 6.04%. On the SIGHAN-MSRA dataset, the proposed model also achieves the highest F1-score for location entities (92.45%) and an overall recall of 90.95%.
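The results above are reported as tagging accuracy, recall, and F1-score. The following is a minimal sketch of how such figures are commonly computed, assuming token-level accuracy over tags and exact-match entity-level precision, recall, and F1, the usual convention in SIGHAN-style NER evaluation; the thesis's exact evaluation protocol is described in its section 4-2, not in this abstract, and the example entities are invented.

```python
def tagging_accuracy(gold_tags, pred_tags):
    """Fraction of positions whose predicted tag equals the gold tag."""
    if len(gold_tags) != len(pred_tags):
        raise ValueError("tag sequences must have the same length")
    return sum(g == p for g, p in zip(gold_tags, pred_tags)) / len(gold_tags)


def entity_prf(gold, pred):
    """Entity-level precision, recall, F1; an entity is (start, end, type), exact match only."""
    gold, pred = set(gold), set(pred)
    tp = len(gold & pred)
    precision = tp / len(pred) if pred else 0.0
    recall = tp / len(gold) if gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


gold = {(0, 3, "PER"), (5, 7, "LOC"), (8, 10, "ORG")}
pred = {(0, 3, "PER"), (5, 7, "LOC"), (8, 9, "ORG"), (11, 12, "LOC")}
p, r, f = entity_prf(gold, pred)  # 2 exact matches: P = 2/4, R = 2/3, F1 = 4/7
print(f"P={p:.4f} R={r:.4f} F1={f:.4f}")
```

Token accuracy is usually much higher than entity F1 because the many O tags are easy to get right, and a partially correct span such as (8, 9, "ORG") earns no entity-level credit.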

Keywords (Chinese, translated)

★ Machine Learning
★ Named Entity Recognition
★ Memory Network
★ Feature Mining

Keywords (English)

★ Machine Learning
★ Named Entity Recognition
★ Memory Network
★ Feature Mining

Table of Contents

Abstract (Chinese)
Abstract (English)
Contents
List of Tables
List of Figures
1. Introduction
2. Related Work
2-1 Conditional Random Fields
2-2 Convolutional Neural Networks
2-3 Recurrent Neural Networks
2-4 Memory Networks
3. Model Architecture and Methods
3-1 Input Layer
3-2 Convolutional Layer
3-3 Bidirectional GRU Layer
3-4 Memory Layer
3-4-1 Input Memory
3-4-2 Output Memory
3-4-3 Current Input
3-4-4 Attention
3-4-5 Memory Layer Output
3-5 Conditional Random Fields Layer
4. Experiments and System Performance
4-1 Datasets
4-1-1 PerNews
4-1-2 SIGHAN-MSRA
4-1-3 Data Analysis
4-1-4 Word Vectors
4-2 Performance Evaluation Methods
4-3 Model Parameter Tuning
4-3-1 Activation Function of the Memory Layer
4-3-2 Memory Generation Method and Memory Size
4-3-3 Number of Convolutional Filters
4-4 Adding Extra Features
4-4-1 Word Embeddings
4-4-2 Lexical Features
4-4-3 Performance Evaluation
4-5 Performance Comparison with Other Models
5. Conclusion and Future Work
References

References

[1] C. Chou and C. Chang. 2017. Mining features for web NER model construction based on distant learning. In 2017 International Conference on Asian Language Processing (IALP), Singapore, pages 322–325.
[2] G.-A. Levow. 2006. The third international Chinese language processing bakeoff: Word segmentation and named entity recognition. In Proceedings of the Fifth SIGHAN Workshop on Chinese Language Processing, pages 108–117.
[3] Y. Y. Huang and C. H. Chung. 2015. A tool for web NER model generation based on Google snippets. In Proceedings of the 27th Conference on Computational Linguistics and Speech Processing, pages 148–163.
[4] Sunita Sarawagi. 2008. Information extraction. Foundations and Trends in Databases, pages 261–377.
[5] L. Satish and B. I. Gururaj. 1993. Use of hidden Markov models for partial discharge pattern classification. IEEE Transactions on Electrical Insulation 28(2):172–182.
[6] Gideon S. Mann and Andrew McCallum. 2010. Generalized expectation criteria for semi-supervised learning with weakly labeled data. Journal of Machine Learning Research 11:955–984.
[7] Andrew McCallum and Wei Li. 2003. Early results for named entity recognition with conditional random fields, feature induction and web-enhanced lexicons. In Proceedings of the Seventh Conference on Natural Language Learning at HLT-NAACL 2003 (CoNLL '03), Volume 4. Association for Computational Linguistics, Stroudsburg, PA, USA, pages 188–191.
[8] Yoshua Bengio, Patrice Simard, and Paolo Frasconi. 1994. Learning long-term dependencies with gradient descent is difficult. IEEE Transactions on Neural Networks 5(2):157–166.
[9] Sepp Hochreiter, Yoshua Bengio, Paolo Frasconi, and Jürgen Schmidhuber. 2001. Gradient flow in recurrent nets: The difficulty of learning long-term dependencies.
[10] Siwei Lai, Liheng Xu, Kang Liu, and Jun Zhao. 2015. Recurrent convolutional neural networks for text classification. In Proceedings of the 29th AAAI Conference on Artificial Intelligence (AAAI 2015), Austin, USA, volume 333, pages 2267–2273.
[11] Tal Linzen, Emmanuel Dupoux, and Yoav Goldberg. 2016. Assessing the ability of LSTMs to learn syntax-sensitive dependencies. Transactions of the Association for Computational Linguistics 4:521–535.
[12] Jason Weston, Sumit Chopra, and Antoine Bordes. 2015. Memory networks. In Proceedings of the 3rd International Conference on Learning Representations (ICLR 2015), San Diego, USA.
[13] Yann N. Dauphin, Angela Fan, Michael Auli, and David Grangier. 2016. Language modeling with gated convolutional networks. arXiv preprint arXiv:1612.08083.
[14] John D. Lafferty, Andrew McCallum, and Fernando C. N. Pereira. 2001. Conditional random fields: Probabilistic models for segmenting and labeling sequence data. In Proceedings of the 18th International Conference on Machine Learning (ICML 2001), pages 282–289.
[15] Jenny Rose Finkel, Trond Grenager, and Christopher Manning. 2005. Incorporating non-local information into information extraction systems by Gibbs sampling. In Proceedings of the 43rd Annual Meeting of the Association for Computational Linguistics (ACL 2005), Ann Arbor, USA, pages 363–370.
[16] Ronan Collobert, Jason Weston, Léon Bottou, Michael Karlen, Koray Kavukcuoglu, and Pavel Kuksa. 2011. Natural language processing (almost) from scratch. Journal of Machine Learning Research 12:2493–2537.
[17] C. Wang and B. Xu. 2017. Convolutional neural network with word embeddings for Chinese word segmentation. arXiv preprint arXiv:1711.04411.
[18] Sepp Hochreiter and Jürgen Schmidhuber. 1997. Long short-term memory. Neural Computation 9(8):1735–1780.
[19] J. Chung, C. Gulcehre, K. Cho, and Y. Bengio. 2014. Empirical evaluation of gated recurrent neural networks on sequence modeling. arXiv preprint arXiv:1412.3555.
[20] Z. Huang, W. Xu, and K. Yu. 2015. Bidirectional LSTM-CRF models for sequence tagging. arXiv preprint arXiv:1508.01991.
[21] M. Schuster and K. K. Paliwal. 1997. Bidirectional recurrent neural networks. IEEE Transactions on Signal Processing 45(11):2673–2681.
[22] Kyunghyun Cho, Bart van Merriënboer, Dzmitry Bahdanau, and Yoshua Bengio. 2014. On the properties of neural machine translation: Encoder-decoder approaches. In Proceedings of SSST-8, Eighth Workshop on Syntax, Semantics and Structure in Statistical Translation, Doha, Qatar, pages 103–111.
[23] Fei Liu, Timothy Baldwin, and Trevor Cohn. 2017. Capturing long-range contextual dependencies with memory-enhanced conditional random fields. In Proceedings of the Eighth International Joint Conference on Natural Language Processing (IJCNLP 2017), Taipei, Taiwan, pages 555–565.
[24] TensorFlow. https://www.tensorflow.org/
[25] Guillaume Lample, Miguel Ballesteros, Sandeep Subramanian, Kazuya Kawakami, and Chris Dyer. 2016. Neural architectures for named entity recognition. In Proceedings of NAACL-2016, San Diego, California, USA.
[26] Joohui An, Seungwoo Lee, and Gary Geunbae Lee. 2003. Automatic acquisition of named entity tagged corpus from World Wide Web. In Proceedings of the 41st Annual Meeting of the Association for Computational Linguistics, Volume 2 (ACL '03). Association for Computational Linguistics, Stroudsburg, PA, USA, pages 165–168.
[27] G. Salton, A. Wong, and C. S. Yang. 1975. A vector space model for automatic indexing. Communications of the ACM 18:613–620.
[28] T. Mikolov, K. Chen, G. Corrado, and J. Dean. 2013. Efficient estimation of word representations in vector space. arXiv preprint arXiv:1301.3781.
[29] Jieba. https://github.com/fxsjy/jieba
[30] CRF++: Yet Another CRF toolkit. http://crfpp.sourceforge.net/
[31] J. Zhou, L. He, X. Dai, and J. Chen. 2006. Chinese named entity recognition with a multi-phase model. In Proceedings of the Fifth SIGHAN Workshop on Chinese Language Processing, pages 213–216.
[32] A. Chen, F. Peng, R. Shan, and G. Sun. 2006. Chinese named entity recognition with conditional probabilistic models. In Proceedings of the Fifth SIGHAN Workshop on Chinese Language Processing, pages 173–176.
[33] J. Zhou, W. Qu, and F. Zhang. 2013. Chinese named entity recognition via joint identification and categorization. Chinese Journal of Electronics 22:225–230.
[34] S. Zhang, Y. Qin, J. Wen, and X. Wang. 2006. Word segmentation and named entity recognition for SIGHAN Bakeoff3. In Proceedings of the Fifth SIGHAN Workshop on Chinese Language Processing, pages 158–161.
[35] Chuanhai Dong, Jiajun Zhang, Chengqing Zong, Masanori Hattori, and Hui Di. 2016. Character-based LSTM-CRF with radical-level features for Chinese named entity recognition. In International Conference on Computer Processing of Oriental Languages. Springer, pages 239–250.
[36] Jeffrey Pennington, Richard Socher, and Christopher Manning. 2014. GloVe: Global vectors for word representation. In Proceedings of EMNLP-2014, Doha, Qatar, pages 1532–1543.
[37] L. Bottou. 1991. Stochastic gradient learning in neural networks. In Proceedings of Neuro-Nîmes. EC2.
[38] Nitish Srivastava, Geoffrey E. Hinton, Alex Krizhevsky, Ilya Sutskever, and Ruslan Salakhutdinov. 2014. Dropout: A simple way to prevent neural networks from overfitting. Journal of Machine Learning Research 15(1):1929–1958.
[39] Nanyun Peng and Mark Dredze. 2015. Named entity recognition for Chinese social media with jointly trained embeddings. In Proceedings of EMNLP-2015, Lisbon, Portugal, pages 548–554.
[40] Y. Zhang and S. Clark. 2010. A fast decoder for joint word segmentation and POS-tagging using a single discriminative model. In Proceedings of the 2010 Conference on Empirical Methods in Natural Language Processing, pages 843–852.
[41] Xinxiong Chen, Lei Xu, Zhiyuan Liu, Maosong Sun, and Huanbo Luan. 2015. Joint learning of character and word embeddings. In Proceedings of the 24th International Joint Conference on Artificial Intelligence (IJCAI 2015).

Advisor

張嘉惠(Chia-Hui Chang)

Approval Date

2018-10-2
