    Please use this identifier to cite or link to this item: http://ir.lib.ncu.edu.tw/handle/987654321/79576

    Title: 利用記憶增強條件隨機場域之深度學習及自動化詞彙特徵於中文命名實體辨識之研究;Leveraging Memory Enhanced Condition Random Fields with Convolutional and Automatic Lexical Feature for Chinese Named Entity Recognition
    Authors: 簡國峻;Chien, Kuo-Chun
    Contributors: 資訊工程學系在職專班
    Keywords: 機器學習;命名實體辨識;記憶網路;特徵探勘;Machine Learning;Named Entity Recognition;Memory Network;Feature Mining
    Date: 2018-10-02
    Issue Date: 2019-04-02 15:03:40 (UTC+8)
    Publisher: 國立中央大學
    Abstract: Sequence labeling models are widely used in natural language processing (NLP) tasks such as named entity recognition (NER), part-of-speech (POS) tagging, and word segmentation. NER is an important NLP task because it extracts named entities from unprocessed text and classifies them into predefined categories such as person names, place names, and organizations.
    Most NER research has focused on English data. In English, spaces usually separate words, and each word carries its own meaning. In Chinese, by contrast, there are no explicit word delimiters: each character carries different information, and the same character may mean different things depending on its position within a word. Traditional machine learning approaches to Chinese named entity recognition (CNER) are mostly statistical and use Conditional Random Fields (CRF) for sequence labeling, so they capture only local features. Capturing long-range context in Chinese text, determining the correct meaning of the current word, and correctly identifying the named entity therefore remain a challenging and forward-looking task.
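To make the "local features only" point concrete, the following is a minimal sketch of Viterbi decoding for a linear-chain model of the kind a CRF tagger uses. It is not taken from the thesis: the function name `viterbi_decode`, the BIO tag set, and all scores are hypothetical and chosen only for illustration. The path score is a sum of per-character emission scores and adjacent-tag transition scores, so nothing beyond the neighbouring tag influences a decision.

```python
from typing import Dict, List


def viterbi_decode(emissions: List[Dict[str, float]],
                   transitions: Dict[str, Dict[str, float]]) -> List[str]:
    """Return the highest-scoring tag sequence for a linear-chain model."""
    tags = list(emissions[0])
    # best[t][tag] = best score of any tag path ending in `tag` at position t
    best = [dict(emissions[0])]
    back: List[Dict[str, str]] = []
    for t in range(1, len(emissions)):
        scores: Dict[str, float] = {}
        pointers: Dict[str, str] = {}
        for cur in tags:
            # Only the previous tag matters: this is the "local" dependency.
            prev = max(tags, key=lambda p: best[-1][p] + transitions[p][cur])
            scores[cur] = best[-1][prev] + transitions[prev][cur] + emissions[t][cur]
            pointers[cur] = prev
        best.append(scores)
        back.append(pointers)
    # Follow the back-pointers from the best final tag.
    path = [max(tags, key=lambda tag: best[-1][tag])]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    return path[::-1]


# Hypothetical scores for a three-character input (illustration only).
emissions = [
    {"B-PER": 2.0, "I-PER": 0.5, "O": 1.0},
    {"B-PER": 0.2, "I-PER": 1.0, "O": 1.2},
    {"B-PER": 0.1, "I-PER": 0.3, "O": 2.0},
]
transitions = {
    "B-PER": {"B-PER": -1.0, "I-PER": 1.0, "O": 0.0},
    "I-PER": {"B-PER": -1.0, "I-PER": 0.5, "O": 0.0},
    "O": {"B-PER": 0.0, "I-PER": -5.0, "O": 0.5},
}
print(viterbi_decode(emissions, transitions))
```

With these made-up scores, the second character is tagged `I-PER` even though its own emission score slightly prefers `O`, because the transition from `B-PER` to `I-PER` is rewarded. Information from characters further away can enter only through the emission scores, which is the gap a long-range memory component is meant to fill.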
    To address these challenges, this study applies a deep learning model with a Conditional Random Field to Chinese named entity recognition. First, a word vector model is trained to convert characters into numeric vectors. The network then combines a convolutional layer, a bidirectional GRU layer, and a memory layer that integrates an external memory holding long-range context information. Unlike conventional approaches, which capture only local information, the model can therefore draw on rich information from the whole article. In addition, feature extraction is used to generate lexical features [1], and a variable trained automatically with the deep learning model adjusts the relative weight of the word embeddings and the lexical features. Beyond long-range article information, the model can thus also make fuller use of the hidden information in the article.
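The abstract does not specify how the memory layer is read or how the trained weighting variable is applied, so the following is only a rough sketch under stated assumptions: the memory is read with dot-product softmax attention, and a single scalar gate (passed through a sigmoid) scales the embedding and lexical feature vectors before they are concatenated. The names `read_memory`, `fuse_features`, and `gate_logit` are hypothetical and do not come from the thesis.

```python
import math
from typing import List

Vector = List[float]


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def read_memory(query: Vector, memory: List[Vector]) -> Vector:
    """Assumed memory read: attention-weighted average of memory slots."""
    weights = softmax([dot(query, slot) for slot in memory])
    dim = len(memory[0])
    return [sum(w * slot[i] for w, slot in zip(weights, memory))
            for i in range(dim)]


def fuse_features(embedding: Vector, lexical: Vector, gate_logit: float) -> Vector:
    """Assumed fusion: one trainable scalar balances the two feature groups."""
    gate = 1.0 / (1.0 + math.exp(-gate_logit))  # sigmoid keeps the gate in (0, 1)
    return [gate * e for e in embedding] + [(1.0 - gate) * x for x in lexical]


# Hypothetical values: the query is closer to the first memory slot.
context = read_memory([1.0, 0.0], [[1.0, 0.0], [0.0, 1.0]])
fused = fuse_features([1.0, 2.0], [4.0], gate_logit=0.0)
print(context, fused)
```

In a trained network, `gate_logit` would be a parameter updated by backpropagation together with the other layers, which is one plausible reading of "a variable trained automatically with the deep learning model".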
    The datasets used in this research are PerNews, which uses online articles collected with a custom crawler as training data and online news articles as test data [3], and SIGHAN Bakeoff-3 [2]. In the experiments, the proposed model achieves 91.67% tagging accuracy on online social media data, which is 2.9% higher than the model without the memory layer. Adding word embeddings and lexical features yields a 6.04% improvement over the basic memory model. On the SIGHAN-MSRA dataset, the proposed model also achieves the highest F1-score, 92.45%, for location name recognition and an overall recall of 90.95%.
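For readers unfamiliar with the reported metrics, here is a small sketch of how recall and F1-score are conventionally computed for NER, assuming entity-level exact-match scoring. The abstract does not state the exact scoring protocol, so this is an assumption, and the function name `precision_recall_f1` and the example spans are hypothetical.

```python
from typing import Set, Tuple

Entity = Tuple[int, int, str]  # (start offset, end offset, entity type)


def precision_recall_f1(gold: Set[Entity],
                        pred: Set[Entity]) -> Tuple[float, float, float]:
    """Exact-match scoring: a prediction counts only if span and type both match."""
    true_positives = len(gold & pred)
    precision = true_positives / len(pred) if pred else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0.0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical annotations: the ORG prediction has the wrong end offset.
gold = {(0, 2, "PER"), (5, 7, "LOC"), (9, 12, "ORG"), (15, 17, "LOC")}
pred = {(0, 2, "PER"), (5, 7, "LOC"), (9, 11, "ORG")}
print(precision_recall_f1(gold, pred))
```

Here two of three predictions are correct and two of four gold entities are found, so precision is 2/3, recall is 1/2, and F1 is their harmonic mean, 4/7.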
    Appears in Collections:[資訊工程學系碩士在職專班 ] 博碩士論文
