|Title: ||應用門控機制與多層卷積深度學習模型於中文命名實體辨識之研究;Multi-Stack Convolution with Gating Mechanism for Chinese Named Entity Recognition|
|Authors: ||張智皓;Chang, Chih-Hao|
|Keywords: ||深度學習;命名實體辨識;卷積神經網路;門控機制;Deep Learning;Named Entity Recognition;Convolutional Neural Networks;Gating Mechanism|
|Issue Date: ||2019-04-02 15:03:24 (UTC+8)|
|Abstract: ||Traditional machine-learning systems for Chinese named entity recognition (NER) usually rely on large numbers of hand-crafted features extracted from Chinese text, and sometimes on entity-specific keyword dictionaries designed by experts. They then use linear statistical and probabilistic models to consolidate the important features and derive Chinese semantic rules. This approach has two obvious drawbacks. First, extracting features from large volumes of Chinese text is time-consuming, labor-intensive, and complex. Second, the quality of the model depends entirely on the discriminative power of the hand-crafted features. As a result, semantic ambiguity in Chinese and out-of-vocabulary words make it difficult to improve precision.
English uses spaces to separate words, whereas Chinese has no comparable word delimiter. Moreover, Chinese words depend heavily on one another, and their meaning varies with context (homographs, polysemy). Recognizing Chinese named entities in large corpora is therefore both a major challenge and an opportunity.
To address the challenge and drawbacks described above, this study uses a deep learning architecture for Chinese named entity recognition. First, unsupervised learning is used to pretrain embeddings for a large vocabulary of words. The vocabulary is then used to convert words into numeric representations, and multiple stacked convolutional layers extract textual features from them. A gating mechanism between layers generalizes the features, so that features are extracted automatically without feature engineering. The aim is to reduce the dependence of named entity recognition on hand-crafted features and to avoid designing recognition features for Chinese by hand. The method can be applied effectively to different types of entities.
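As a rough illustration of the idea of stacked convolutions with a gate between layers, the following is a minimal sketch and not the thesis implementation. The abstract does not specify the form of the gate, so this sketch assumes a GLU-style gate (the convolution output multiplied elementwise by a sigmoid of a second convolution). The function names, layer sizes, kernel width, and random toy embeddings are all hypothetical.

```python
import math
import random


def conv1d(seq, weights, bias):
    """Same-padded 1D convolution over a sequence of feature vectors.

    seq:     list of T vectors, each of length d_in
    weights: list of k matrices (one per kernel offset, k odd), each d_out x d_in
    bias:    list of d_out floats
    """
    k = len(weights)
    pad = k // 2
    d_in = len(seq[0])
    zero = [0.0] * d_in
    padded = [zero] * pad + seq + [zero] * pad
    out = []
    for t in range(len(seq)):
        vec = list(bias)
        for j in range(k):
            x = padded[t + j]
            w = weights[j]
            for o in range(len(vec)):
                vec[o] += sum(w[o][i] * x[i] for i in range(d_in))
        out.append(vec)
    return out


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def gated_conv_layer(seq, params):
    """One layer: conv(seq) * sigmoid(gate_conv(seq)), elementwise.

    Assumption: a GLU-style gate; the source does not state the exact form.
    """
    content = conv1d(seq, params["w"], params["b"])
    gate = conv1d(seq, params["wg"], params["bg"])
    return [[c * sigmoid(g) for c, g in zip(cv, gv)]
            for cv, gv in zip(content, gate)]


def init_layer(rng, d_in, d_out, k=3):
    """Random small weights for one gated layer (hypothetical initialization)."""
    def mat():
        return [[rng.uniform(-0.1, 0.1) for _ in range(d_in)]
                for _ in range(d_out)]
    return {"w": [mat() for _ in range(k)], "b": [0.0] * d_out,
            "wg": [mat() for _ in range(k)], "bg": [0.0] * d_out}


def encode(seq, layers):
    """Apply the gated convolution layers in sequence (the 'stack')."""
    for params in layers:
        seq = gated_conv_layer(seq, params)
    return seq


rng = random.Random(0)
# Toy stand-ins for pretrained embeddings: 6 tokens, 4 dimensions each.
tokens = [[rng.uniform(-1, 1) for _ in range(4)] for _ in range(6)]
layers = [init_layer(rng, 4, 8), init_layer(rng, 8, 8)]
features = encode(tokens, layers)  # 6 vectors of 8 features each
```

In a full tagger, each per-token feature vector would then feed a classifier that predicts an entity label for that token. That step is omitted here because the abstract does not describe it.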
This study uses the SIGHAN Bakeoff-3 data together with web articles collected by a custom crawler as training data. Electronic versions of printed newspaper articles serve as test data and as the benchmark for evaluating each model. In these experiments, the proposed model reaches an F1-measure of 90.76% overall on SIGHAN and 90.42% on the newspaper files.
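For readers unfamiliar with the metric, the F1-measure is the harmonic mean of precision and recall. The sketch below assumes exact-match, entity-level scoring, where an entity counts as correct only if both its span and its type match. The abstract does not state the scoring protocol, so that is an assumption, and the entity tuples are made-up toy data rather than results from the thesis.

```python
def precision_recall_f1(gold, predicted):
    """Entity-level precision, recall, and F1 with exact span-and-type matching.

    gold, predicted: iterables of (start, end, entity_type) tuples.
    """
    gold, predicted = set(gold), set(predicted)
    true_positives = len(gold & predicted)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0.0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Toy example: the second prediction has the right span but the wrong type.
gold = {(0, 2, "PER"), (5, 8, "ORG"), (10, 12, "LOC")}
pred = {(0, 2, "PER"), (5, 8, "LOC"), (10, 12, "LOC")}
p, r, f = precision_recall_f1(gold, pred)
```

Here two of the three predictions match a gold entity exactly, so precision, recall, and F1 all equal 2/3.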
|Appears in Collections:||[Department of Computer Science and Information Engineering, In-Service Master's Program] Master's and Doctoral Theses|