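基於注意力機制之詞向量中文萃取式摘要研究 (A Study of Chinese Abstractive Summarization with Word Embeddings Based on the Attention Mechanism)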

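Name

麥嘉芳 (Chia-Fang Mai)

Department

Department of Information Management

Thesis Title

基於注意力機制之詞向量中文萃取式摘要研究 (A Study of Chinese Abstractive Summarization with Word Embeddings Based on the Attention Mechanism)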

Full-text access: never open to the public

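Abstract (Chinese)

Technological progress has brought abundant resources, but without proper management it leads to information overload, which makes automatic document summarization an important research topic today. Now that the earlier hardware limitations and shortage of computing resources have been overcome, deep learning has attracted renewed attention from researchers and is widely applied in natural language processing. This study therefore adopts the Transformer, a neural network based on the attention mechanism, as its main system architecture to generate logical and fluent abstractive summaries, and investigates how pre-trained word embedding models can effectively improve summary quality and thereby substantially reduce time and labor costs. The system is validated on LCSTS, currently the largest and most commonly used Chinese dataset, comparing shallow pre-trained models (Word2vec, FastText) with a deep pre-trained model (ELMo). The results should be useful for subsequent research.
The experimental results show that, overall, apart from the CBOW models of Word2vec and FastText, which do not improve the summaries, the other pre-trained word embedding models perform better on Rouge-1, Rouge-2, and Rouge-L. The best result in this study comes from the FastText Skip-gram pre-trained word embedding model combined with the Transformer, with Rouge-1, Rouge-2, and Rouge-L scores of 0.391, 0.247, and 0.43, respectively. The Rouge-L score of 0.43 in particular indicates that the automatically generated summaries cover the original text well. Compared with the experimental baseline, Rouge-1, Rouge-2, and Rouge-L improve by 9%, 16%, and 9%, respectively. Compared with the 8-layer Transformer model, training takes 5.5 hours less while producing better summaries. It can therefore be inferred that combining pre-trained word embedding models improves system performance in less time.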
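As an illustration of the Rouge-1, Rouge-2, and Rouge-L metrics reported above, the following is a minimal sketch of how such scores can be computed. It assumes character-level tokens and an F1 formulation; this record does not state the exact ROUGE configuration used in the thesis, and the example sentences are hypothetical, so the numbers it prints are not comparable to the reported results.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return a multiset (Counter) of the n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def f1(overlap, cand_total, ref_total):
    """Harmonic mean of precision and recall; 0.0 when nothing overlaps."""
    if overlap == 0 or cand_total == 0 or ref_total == 0:
        return 0.0
    precision = overlap / cand_total
    recall = overlap / ref_total
    return 2 * precision * recall / (precision + recall)


def rouge_n(candidate, reference, n):
    """ROUGE-N as F1 over clipped n-gram overlap (assumed formulation)."""
    cand, ref = ngrams(candidate, n), ngrams(reference, n)
    overlap = sum((cand & ref).values())  # multiset intersection clips counts
    return f1(overlap, sum(cand.values()), sum(ref.values()))


def lcs_length(a, b):
    """Length of the longest common subsequence, by dynamic programming."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            curr.append(prev[j - 1] + 1 if x == y else max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l(candidate, reference):
    """ROUGE-L as F1 over the longest common subsequence."""
    return f1(lcs_length(candidate, reference), len(candidate), len(reference))


# Hypothetical reference and system summaries, split into characters.
reference = list("央行宣布降息")
candidate = list("央行今日降息")

print("Rouge-1:", round(rouge_n(candidate, reference, 1), 3))
print("Rouge-2:", round(rouge_n(candidate, reference, 2), 3))
print("Rouge-L:", round(rouge_l(candidate, reference), 3))
```

Rouge-1 and Rouge-2 count shared unigrams and bigrams, while Rouge-L rewards the longest in-order match, which is why it is often read as a measure of how much of the reference a summary covers.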

Abstract (English)

The development of science and technology has brought abundant resources, but without proper management it causes information overload. Automatic text summarization is therefore an important research topic today. With recent breakthroughs in hardware limitations and computing resources, deep learning has been revisited by researchers and is now widely used in natural language processing (NLP). This research therefore uses the Transformer, an attention-based model, as the main system architecture to generate logical and fluent abstractive summaries. It also investigates how pre-trained word embedding models can effectively improve the quality of these summaries, which can greatly reduce time and labor costs. The system is evaluated on LCSTS, currently the largest Chinese dataset, and the results of shallow pre-trained word embedding models (Word2vec, FastText) are compared with those of a deep pre-trained word embedding model (ELMo). The results should help subsequent research.
The experimental results show that, apart from the CBOW models of Word2vec and FastText, which do not improve the summarization results, the other pre-trained word embedding models perform better on Rouge-1, Rouge-2, and Rouge-L. The best-performing model combines the FastText Skip-gram pre-trained word embedding model with the Transformer, reaching Rouge-1, Rouge-2, and Rouge-L scores of 0.391, 0.247, and 0.43, respectively. The Rouge-L score of 0.43 in particular means that the automatically generated summaries have high coverage of the original text. Compared with the experimental baseline, Rouge-1, Rouge-2, and Rouge-L increase by 9%, 16%, and 9%, respectively. In addition, compared with the 8-layer Transformer model, training time is reduced by about 5.5 hours while the summaries are better. We can infer that combining the model with pre-trained word embeddings improves system performance in less time.
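The abstract contrasts the CBOW and Skip-gram variants of Word2vec and FastText. As a minimal sketch of how the two objectives differ, the code below builds the training examples each variant would see from one tokenized sentence. The function names, the window size, and the example tokens are hypothetical and chosen for illustration only; this record does not state the thesis's actual tokenization or hyperparameters.

```python
def skipgram_pairs(tokens, window):
    """Skip-gram: each centre word predicts every context word separately."""
    pairs = []
    for i, centre in enumerate(tokens):
        lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                pairs.append((centre, tokens[j]))
    return pairs


def cbow_examples(tokens, window):
    """CBOW: the whole context window jointly predicts the centre word."""
    examples = []
    for i, centre in enumerate(tokens):
        lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
        context = [tokens[j] for j in range(lo, hi) if j != i]
        if context:
            examples.append((context, centre))
    return examples


# Hypothetical word-segmented sentence.
tokens = ["央行", "宣布", "降息"]

for centre, context_word in skipgram_pairs(tokens, window=1):
    print("skip-gram:", centre, "->", context_word)

for context, centre in cbow_examples(tokens, window=1):
    print("CBOW:", context, "->", centre)
```

Skip-gram produces one training pair per (centre, context) combination, whereas CBOW merges each window into a single example, so Skip-gram gives every word more individual updates.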

Keywords

★ Natural language processing
★ Chinese abstractive summarization
★ Attention mechanism
★ Transformer
★ Word embedding

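Table of Contents

Abstract (Chinese)
Abstract
Acknowledgements
Table of Contents
List of Figures
List of Tables
1. Introduction
1-1 Research Background
1-2 Research Motivation
1-3 Research Objectives
1-4 Research Structure
2. Literature Review
2-1 Automatic Document Summarization
2-2 Word Embedding Language Models
2-2-1 Word2vec
2-2-2 FastText
2-2-3 ELMo
2-3 Sequence-to-Sequence Models
2-3-1 Attention Mechanism
3. Research Method
3-1 Research Process
3-2 System Architecture
3-2-1 Data Preprocessing
3-2-2 Training and Extraction of Word Embedding Language Models
3-2-3 Transformer Model Training
3-2-4 Evaluation of Results
4. Experiments and Results
4-1 Dataset
4-2 Experimental Environment
4-3 Evaluation Method
4-4 Experimental Design and Results
4-4-1 Experiment 1: Effect of the Number of Transformer Layers
4-4-2 Experiment 2: Effect of the Word2vec Word Embedding Model
4-4-3 Experiment 3: Effect of the FastText Word Embedding Model
4-4-4 Experiment 4: Effect of the ELMo Word Embedding Model
4-5 Experimental Analysis and Overall Comparison
4-6 Comparison with Other Researchers' Results
4-7 Qualitative Analysis Examples
4-7-1 Examples with Higher ROUGE Scores
4-7-2 Examples with Lower ROUGE Scores
4-7-3 Examples of System Improvement
4-8 Practical Application
5. Conclusions and Future Research Directions
5-1 Conclusions
5-2 Research Limitations
5-3 Future Research Directions
References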

Advisor

林熙禎

Approval Date

2019-7-19
