Master's/Doctoral Thesis 105552022: Full Metadata Record

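DC Field | Value | Language
DC.contributor | 資訊工程學系在職專班 (In-Service Master's Program, Department of Computer Science and Information Engineering) | zh_TW
DC.creator | 張智皓 | zh_TW
DC.creator | Chih-Hao Chang | en_US
dc.date.accessioned | 2018-9-13T07:39:07Z
dc.date.available | 2018-9-13T07:39:07Z
dc.date.issued | 2018
dc.identifier.uri | http://ir.lib.ncu.edu.tw:88/thesis/view_etd.asp?URN=105552022
dc.contributor.department | 資訊工程學系在職專班 (In-Service Master's Program, Department of Computer Science and Information Engineering) | zh_TW
DC.description | 國立中央大學 | zh_TW
DC.description | National Central University | en_US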
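dc.description.abstract | Traditional machine-learning-based Chinese named entity recognition systems typically extract large numbers of hand-crafted features from Chinese text, and may even use entity-specific keyword dictionaries designed by experts; linear statistical and probabilistic models then consolidate the important features to derive Chinese semantic rules. This approach has two obvious drawbacks: extracting features from large volumes of Chinese text is time-consuming, laborious, and complex, and the quality of the model depends entirely on the discriminative strength of the hand-designed features. As a result, given the semantic ambiguity of Chinese and the presence of unknown words, precision is hard to improve. Languages differ in structure: English uses spaces to mark word boundaries, whereas Chinese has no explicit word delimiters, yet its characters and words are strongly interdependent and take on different meanings depending on context (homographs, polysemy). Recognizing Chinese named entities in large corpora is therefore both highly challenging and full of possibility. To address these challenges and drawbacks, this study builds a Chinese named entity recognition system on a deep learning architecture. First, a deep learning model pre-trains a word-embedding dictionary on a large body of text through unsupervised learning. The dictionary maps characters and words to numeric vectors, and multiple stacked convolution layers then extract textual features hierarchically, with a gating mechanism added between layers to generalize the features. Feature information is thus extracted automatically without any feature engineering, with the aim of reducing the dependence of named entity recognition on hand-crafted features and removing the need to design Chinese-specific recognition features. The method is applied effectively to recognizing entity types. The training data comprise SIGHAN Bakeoff-3 [1] and online articles collected with a custom crawler; electronic files of printed newspapers [31] serve as test data and as the benchmark for evaluating each model. In the experiments, the proposed model achieves an F1-measure of 90.76% overall on SIGHAN and 90.42% on the newspaper files. | zh_TW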
dc.description.abstract | Traditional machine-learning approaches to Chinese Named Entity Recognition usually rely on large numbers of hand-crafted features, and sometimes on entity-specific dictionaries built by experts, and then use linear statistical and probabilistic models to consolidate the important features and derive Chinese semantic rules. Two obvious flaws follow. First, extracting features from Chinese text is extremely time-consuming and complicated. Second, the quality of a model depends entirely on how discriminative its hand-crafted features are; as a result, the semantic ambiguity characteristic of Chinese and the presence of unknown words make it difficult to improve precision. English uses spaces for word segmentation, whereas Chinese has no explicit word boundaries. Chinese characters and words are nevertheless highly interdependent and take on different meanings (homographs, polysemy) depending on context. Recognizing Chinese named entities in large corpora is therefore both a great challenge and an opportunity. To address these challenges and flaws, this study uses a deep learning architecture for Chinese Named Entity Recognition. First, a deep learning model pre-trains a word-embedding vocabulary on a large amount of text through unsupervised learning. The vocabulary then maps characters and words to numeric vectors, and multi-stack convolution extracts textual features hierarchically. A gating mechanism between layers generalizes the features, so that they are extracted automatically without feature engineering. The aim is to reduce the dependence of Named Entity Recognition on hand-crafted features and to avoid designing Chinese-specific recognition features. The method is applied effectively to recognizing entity types. The training data consist of documents from SIGHAN Bakeoff-3 and internet articles collected with a custom crawler. Electronic files of newspaper articles serve as test data and as the benchmark for evaluating the models. The results show that the proposed model achieves an F1-measure of 90.76% overall on SIGHAN and 90.42% on the newspaper files. | en_US
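DC.subject | 深度學習 (deep learning) | zh_TW
DC.subject | 命名實體辨識 (named entity recognition) | zh_TW
DC.subject | 卷積神經網路 (convolutional neural networks) | zh_TW
DC.subject | 門控機制 (gating mechanism) | zh_TW
DC.subject | Deep Learning | en_US
DC.subject | Named Entity Recognition | en_US
DC.subject | Convolutional Neural Networks | en_US
DC.subject | Gating Mechanism | en_US
DC.title | 應用門控機制與多層卷積深度學習模型於中文命名實體辨識之研究 (A Study on Applying a Gating Mechanism and a Multi-Layer Convolutional Deep Learning Model to Chinese Named Entity Recognition) | zh_TW
dc.language.iso | zh-TW | zh-TW
DC.title | Multi-Stack Convolution with Gating Mechanism for Chinese Named Entity Recognition | en_US
DC.type | 博碩士論文 (master's/doctoral thesis) | zh_TW
DC.type | thesis | en_US
DC.publisher | National Central University | en_US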
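
As a rough illustration of the architecture the abstract describes (an embedding lookup, stacked convolution layers, and a gate between layers), the following minimal Python sketch may help. It is not the thesis's implementation. The record does not say which gate is used, so a GLU-style gate is assumed here: one convolution multiplied element-wise by the sigmoid of a second convolution. The function names, kernel size, vector dimensions, randomly initialized (not pre-trained) embeddings and weights, and the sample sentence are all hypothetical.

```python
import math
import random


def conv1d(seq, weights, bias):
    """Zero-padded ("same") 1D convolution over a sequence of vectors.

    seq:     list of T input vectors, each of length d_in
    weights: kernel indexed as weights[k][d_in][d_out], with odd width k
    bias:    list of length d_out
    """
    k = len(weights)
    d_in = len(seq[0])
    d_out = len(bias)
    pad = k // 2
    zero = [0.0] * d_in
    padded = [zero] * pad + list(seq) + [zero] * pad
    out = []
    for t in range(len(seq)):
        vec = list(bias)
        for j in range(k):
            x = padded[t + j]
            for i in range(d_in):
                for o in range(d_out):
                    vec[o] += x[i] * weights[j][i][o]
        out.append(vec)
    return out


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def gated_conv_layer(seq, params):
    """Assumed GLU-style gate: conv_a(x) * sigmoid(conv_b(x)), element-wise."""
    a = conv1d(seq, params["w_a"], params["b_a"])
    b = conv1d(seq, params["w_b"], params["b_b"])
    return [[ai * sigmoid(bi) for ai, bi in zip(ra, rb)]
            for ra, rb in zip(a, b)]


def init_layer(rng, k, d_in, d_out):
    """Random (untrained) parameters for one gated layer."""
    def kernel():
        return [[[rng.uniform(-0.1, 0.1) for _ in range(d_out)]
                 for _ in range(d_in)] for _ in range(k)]
    return {"w_a": kernel(), "b_a": [0.0] * d_out,
            "w_b": kernel(), "b_b": [0.0] * d_out}


def encode(seq, layers):
    """Apply the gated convolution layers in sequence (the 'multi-stack')."""
    for params in layers:
        seq = gated_conv_layer(seq, params)
    return seq


rng = random.Random(0)
sentence = "中央大學在桃園"  # toy input, one embedding per character
# Stand-in for the pre-trained embedding dictionary: random 4-dim vectors.
embeddings = {ch: [rng.uniform(-1, 1) for _ in range(4)] for ch in sentence}
x = [embeddings[ch] for ch in sentence]
layers = [init_layer(rng, 3, 4, 4), init_layer(rng, 3, 4, 4)]
features = encode(x, layers)  # one feature vector per input character
```

In a full tagger, each per-character feature vector would feed a classifier that predicts an entity label; that step, and training itself, are omitted here.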
