

    Please use this identifier to cite or link to this item: http://ir.lib.ncu.edu.tw/handle/987654321/77637


    Title: Using Word Embedding Models to Improve the Performance of an RNN-based Chinese Text Summarization System (以詞向量模型增進基於遞歸神經網路之中文文字摘要系統效能)
    Authors: 蔡汶霖;Tsai, Wen-Lin
    Contributors: Department of Information Management
    Keywords: word vector;word embedding;Chinese summarization;abstractive summarization;recurrent neural network (RNN)
    Date: 2018-07-27
    Issue Date: 2018-08-31 14:51:17 (UTC+8)
    Publisher: National Central University
    Abstract: In an era of information overload, it is difficult for people to absorb a large amount of information in a short time, and automatic summarization techniques emerged in response. This study builds an abstractive text summarization system based on a recurrent neural network (RNN) and uses several pre-trained word embedding models, including word2vec, GloVe, and fastText, with the RNN to improve the quality of the summarization system.
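    A minimal sketch of this design, assuming a PyTorch implementation (the thesis does not name its framework, and the sizes and variable names below are illustrative assumptions): the pre-trained word vectors are copied into the encoder's embedding layer before training, so the RNN starts from informative word representations rather than random ones.

        # Hypothetical sketch: seeding an RNN encoder's embedding layer with
        # pre-trained word vectors (word2vec/GloVe/fastText). All sizes are
        # illustrative assumptions, not the thesis's actual settings.
        import numpy as np
        import torch
        import torch.nn as nn

        vocab_size, embed_dim, hidden_dim = 50000, 300, 512

        # One pre-trained vector per vocabulary entry; a random placeholder here.
        pretrained = np.random.rand(vocab_size, embed_dim).astype("float32")

        embedding = nn.Embedding(vocab_size, embed_dim)
        embedding.weight.data.copy_(torch.from_numpy(pretrained))  # inject pre-trained weights

        # A single-layer GRU encoder (one common RNN variant) over embedded token ids.
        encoder = nn.GRU(embed_dim, hidden_dim, batch_first=True)

        token_ids = torch.randint(0, vocab_size, (1, 20))  # a batch of one 20-token sequence
        outputs, state = encoder(embedding(token_ids))

    A complete abstractive system would pair such an encoder with an attention-based decoder that generates the summary token by token.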
    In this study, we pre-trained the word embedding models on two corpora: a large-scale general corpus from Wikipedia and a corpus from the LCSTS dataset. In a series of experiments, we cross-tested RNN models with different hidden-layer sizes against word embedding models of different dimensionalities, and found that pre-trained word embeddings do improve system performance; the best results were obtained by pairing word embedding models of moderate dimensionality with RNNs with larger hidden layers.
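    A hedged sketch of the embedding pre-training step, using gensim's word2vec and fastText implementations (GloVe is trained with its own separate toolkit and is omitted here); the corpus file name and the dimension grid are assumptions for illustration only.

        # Hypothetical sketch: pre-training embeddings of several dimensionalities
        # on a word-segmented corpus (e.g., Wikipedia or LCSTS text, one sentence
        # per line). File names and parameters are assumptions.
        from gensim.models import Word2Vec, FastText

        with open("wiki_segmented.txt", encoding="utf-8") as f:
            sentences = [line.split() for line in f]

        for dim in (100, 300, 500):  # assumed dimension grid
            w2v = Word2Vec(sentences, vector_size=dim, window=5, min_count=5, workers=4)
            w2v.wv.save(f"w2v_{dim}.kv")  # keyed vectors for later embedding lookup

            ft = FastText(sentences, vector_size=dim, window=5, min_count=5, workers=4)
            ft.wv.save(f"ft_{dim}.kv")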
    The system is also applied to Chinese articles, yielding an abstractive Chinese summarization system with high versatility and strong performance. On automatic evaluation metrics it outperforms previous work by 30%, and we additionally provide a qualitative analysis that ranks sample summaries from best to worst for reference. Finally, we test and verify the system's performance on real news articles from Taiwan.
    Appears in Collections:[Graduate Institute of Information Management] Electronic Thesis & Dissertation


