

    Please use this identifier to cite or link to this item: http://ir.lib.ncu.edu.tw/handle/987654321/77637


    Title: Using Word Embedding Models to Improve the Performance of an RNN-based Chinese Text Summarization System (以詞向量模型增進基於遞歸神經網路之中文文字摘要系統效能)
    Authors: 蔡汶霖;Tsai, Wen-Lin
    Contributors: Department of Information Management
    Keywords: word vector;word embedding;Chinese summarization;abstractive summarization;recurrent neural network (RNN)
    Date: 2018-07-27
    Issue Date: 2018-08-31 14:51:17 (UTC+8)
    Publisher: National Central University
    Abstract: In an era of information overload, it is difficult for people to absorb a large amount of information in a short time, and automatic summarization techniques emerged in response. This study builds an abstractive text summarization system based on a recurrent neural network (RNN) and uses several pre-trained word embedding models, including word2vec, GloVe, and fastText, with the RNN to improve the quality of the summarization system.
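    A minimal sketch of this design, assuming a PyTorch implementation (the thesis does not name its framework, and the sizes and variable names below are illustrative assumptions): the pre-trained word vectors are copied into the encoder's embedding layer before training, so the RNN starts from informative word representations rather than random ones.

        # Hypothetical sketch: seeding an RNN encoder's embedding layer with
        # pre-trained word vectors (word2vec/GloVe/fastText). All sizes are
        # illustrative assumptions, not the thesis's actual settings.
        import numpy as np
        import torch
        import torch.nn as nn

        vocab_size, embed_dim, hidden_dim = 50000, 300, 512

        # One pre-trained vector per vocabulary entry; a random placeholder here.
        pretrained = np.random.rand(vocab_size, embed_dim).astype("float32")

        embedding = nn.Embedding(vocab_size, embed_dim)
        embedding.weight.data.copy_(torch.from_numpy(pretrained))  # inject pre-trained weights

        # A single-layer GRU encoder (one common RNN variant) over embedded token ids.
        encoder = nn.GRU(embed_dim, hidden_dim, batch_first=True)

        token_ids = torch.randint(0, vocab_size, (1, 20))  # a batch of one 20-token sequence
        outputs, state = encoder(embedding(token_ids))

    A complete abstractive system would pair such an encoder with an attention-based decoder that generates the summary token by token.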
    In this study, we pre-trained the word embedding models on two corpora: a large-scale general corpus from Wikipedia and a corpus from the LCSTS dataset. In a series of experiments, we cross-tested RNN models with different hidden-layer sizes against word embedding models of different dimensionalities, and found that pre-trained word embeddings do improve system performance; the best results were obtained by pairing word embedding models of moderate dimensionality with RNNs with larger hidden layers.
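    A hedged sketch of the embedding pre-training step, using gensim's word2vec and fastText implementations (GloVe is trained with its own separate toolkit and is omitted here); the corpus file name and the dimension grid are assumptions for illustration only.

        # Hypothetical sketch: pre-training embeddings of several dimensionalities
        # on a word-segmented corpus (e.g., Wikipedia or LCSTS text, one sentence
        # per line). File names and parameters are assumptions.
        from gensim.models import Word2Vec, FastText

        with open("wiki_segmented.txt", encoding="utf-8") as f:
            sentences = [line.split() for line in f]

        for dim in (100, 300, 500):  # assumed dimension grid
            w2v = Word2Vec(sentences, vector_size=dim, window=5, min_count=5, workers=4)
            w2v.wv.save(f"w2v_{dim}.kv")  # keyed vectors for later embedding lookup

            ft = FastText(sentences, vector_size=dim, window=5, min_count=5, workers=4)
            ft.wv.save(f"ft_{dim}.kv")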
    The system is also applied to Chinese articles, yielding an abstractive Chinese summarization system with high versatility and strong performance. On automatic evaluation metrics it outperforms previous work by 30%, and we additionally provide a qualitative analysis that ranks sample summaries from best to worst for reference. Finally, we test and verify the system's performance on real news articles from Taiwan.
    Appears in Collections:[Graduate Institute of Information Management] Electronic Thesis & Dissertation


