NCU Institutional Repository - theses and dissertations, past exam papers, journal articles, and research projects: Item 987654321/77637


    Please use this permanent URL to cite or link to this item: http://ir.lib.ncu.edu.tw/handle/987654321/77637


    Title: Using Word Embedding Models to Improve the Performance of an RNN-based Chinese Text Summarization System
    Authors: 蔡汶霖; Tsai, Wen-Lin
    Contributors: Department of Information Management
    Keywords: word vector; word embedding; Chinese summarization; abstractive summarization; recurrent neural network (RNN)
    Date: 2018-07-27
    Uploaded: 2018-08-31 14:51:17 (UTC+8)
    Publisher: National Central University
    Abstract: In an era of information explosion, it is difficult for people to absorb large amounts of information in a short time, and automatic summarization technology arose in response. In this study, we build an abstractive text summarization system based on a recurrent neural network (RNN) and use various pre-trained word embedding models (word2vec, GloVe, and fastText) with the RNN to improve the quality of the summarization system.
    We pre-trained the word embedding models on two corpora: a large-scale general-purpose corpus from Wikipedia and a corpus from the LCSTS dataset. In a series of experiments pairing word embedding models of different dimensions with RNNs of different hidden-unit sizes, we found that pre-trained word embeddings do improve system performance, and that the best results come from combining embeddings of moderate dimension with an RNN of larger hidden size.
    We also apply the system to Chinese articles, yielding an abstractive Chinese summarization system with high versatility and strong performance. On automatic evaluation metrics it outperforms previous work by 30%. We further provide a qualitative analysis that ranks example summaries from best to worst, and finally test and verify the system on real news articles from Taiwan.
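The key technique the abstract describes, seeding a summarizer's embedding layer with pre-trained word vectors, can be sketched as follows. This is a minimal illustration under stated assumptions, not the thesis's actual implementation: the toy vectors, vocabulary, and dimension (4) are invented for the example, and in practice the vectors would come from word2vec, GloVe, or fastText models trained on Wikipedia or LCSTS.

```python
import numpy as np

def build_embedding_matrix(vocab, pretrained, dim, seed=0):
    """Build an initial embedding matrix for an RNN encoder/decoder.

    Words found in the pre-trained model keep their vectors; words
    missing from it (out-of-vocabulary) get small random vectors,
    and the padding token keeps an all-zero row.
    """
    rng = np.random.default_rng(seed)
    matrix = np.zeros((len(vocab), dim), dtype=np.float32)
    for idx, word in enumerate(vocab):
        if word == "<pad>":
            continue  # keep the padding row at zero
        if word in pretrained:
            matrix[idx] = pretrained[word]  # copy the pre-trained vector
        else:
            matrix[idx] = rng.normal(scale=0.1, size=dim)  # random init for OOV
    return matrix

# Toy pre-trained vectors (stand-ins for word2vec/GloVe/fastText output).
pretrained = {
    "新聞": np.array([0.1, 0.2, 0.3, 0.4], dtype=np.float32),
    "摘要": np.array([0.4, 0.3, 0.2, 0.1], dtype=np.float32),
}
vocab = ["<pad>", "新聞", "摘要", "系統"]
emb = build_embedding_matrix(vocab, pretrained, dim=4)
print(emb.shape)  # (4, 4)
```

The resulting matrix would then initialize the embedding layer of the RNN encoder and decoder, either frozen or fine-tuned during training.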
    Appears in Collections: [Graduate Institute of Information Management] Theses & Dissertations

    Files in This Item:

    File         Description   Size   Format   Views
    index.html                 0Kb    HTML     266     View/Open


    All items in NCUIR are protected by copyright.
