dc.description.abstract | The rapid development of science and technology has produced an abundance of information; without proper management, this leads to information overload. Automatic text summarization is therefore an important research topic today. With recent breakthroughs in hardware and computing resources, deep learning has been revisited by researchers and is now widely applied in natural language processing (NLP). This research adopts the attention-based Transformer as the main system architecture to generate logical and fluent abstractive summaries, and further applies pre-trained word embedding models to improve summary quality. Such a system can greatly reduce time and labor costs. Experiments are conducted on LCSTS, currently the largest Chinese summarization dataset, comparing shallow pre-trained word embedding models (Word2vec, FastText) with a deep pre-trained word embedding model (ELMo). The results are intended to inform follow-up research.
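A minimal sketch of the described architecture, assuming a PyTorch implementation: the Transformer's input embedding layer is initialized from a pre-trained word vector model (e.g. Word2vec or FastText) instead of random values. Variable names, dimensions, and hyperparameters below are illustrative assumptions, not the thesis' actual code.

import torch
import torch.nn as nn

vocab_size, d_model = 50000, 512  # assumed vocabulary size and model width

# pretrained_matrix: (vocab_size, d_model) tensor loaded from a Word2vec/FastText model,
# where row i holds the pre-trained vector of token i in the summarization vocabulary.
pretrained_matrix = torch.randn(vocab_size, d_model)  # placeholder for the loaded vectors

embedding = nn.Embedding(vocab_size, d_model)
embedding.weight.data.copy_(pretrained_matrix)   # initialize from the pre-trained embeddings
embedding.weight.requires_grad = True            # fine-tune during summarization training

# The embedding layer then feeds a standard Transformer encoder-decoder (settings assumed).
transformer = nn.Transformer(d_model=d_model, nhead=8,
                             num_encoder_layers=6, num_decoder_layers=6)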
The experimental results show that, except for the CBOW variants of Word2vec and FastText, which do not improve the summarization results, all other pre-trained word embedding models achieve better ROUGE-1, ROUGE-2, and ROUGE-L scores. The best-performing combination is the FastText Skip-gram pre-trained embeddings with the Transformer model, reaching ROUGE-1, ROUGE-2, and ROUGE-L scores of 0.391, 0.247, and 0.43 respectively. The ROUGE-L score of 0.43 is particularly notable, indicating that the automatically generated summaries in this study cover the source text well. Compared with the experimental baseline, ROUGE-1, ROUGE-2, and ROUGE-L improve by 9%, 16%, and 9% respectively. Moreover, compared with an 8-layer Transformer model, this combination trains faster, reducing training time by about 5.5 hours, while producing better summaries. We can therefore infer that combining the Transformer with a pre-trained word embedding model improves system performance.
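For reference, a minimal illustrative sketch (not the evaluation code used in the thesis) of how the reported ROUGE-1 and ROUGE-L scores compare a generated summary with a reference summary, given both as token lists:

from collections import Counter

def rouge_1_f(candidate, reference):
    # ROUGE-1: unigram overlap between candidate and reference, reported as F1
    cand, ref = Counter(candidate), Counter(reference)
    overlap = sum((cand & ref).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)

def rouge_l_f(candidate, reference):
    # ROUGE-L: longest common subsequence (LCS) length via dynamic programming
    m, n = len(candidate), len(reference)
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m):
        for j in range(n):
            dp[i + 1][j + 1] = dp[i][j] + 1 if candidate[i] == reference[j] \
                else max(dp[i][j + 1], dp[i + 1][j])
    lcs = dp[m][n]
    if lcs == 0:
        return 0.0
    precision, recall = lcs / m, lcs / n
    return 2 * precision * recall / (precision + recall)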
| en_US |