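Thesis title Improving the Factual Consistency of Summarization Models by Combining Summary-Relevant Key Sentences of the Article with Contrastive Learning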
(Combining Key Sentences Related to the Abstract with Contrastive Learning to Improve Summarization Factual Inconsistency)
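Abstract (Chinese) Factual inconsistency in a summary means that information in the summary cannot be verified against the source article. It is a thorny problem in abstractive summarization: studies show that 30% of model-generated summaries contain factual inconsistencies, which makes abstractive summarization hard to apply in everyday use. In recent years, researchers have begun to take this problem seriously.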



Abstract (English) Hallucination, also known as factual inconsistency, occurs when a model generates a summary that contains incorrect information or information not mentioned in the source text.

It is a critical problem in abstractive summarization and makes model-generated summaries hard to use in practice.
Previous work tends either to inject additional information, such as background knowledge, into the model, or to apply post-correction or re-ranking after decoding to mitigate this problem.

Contrastive learning is a relatively recent training method that has achieved excellent results in image processing. The idea is to use the contrast between positive and negative samples so that the representations the model learns for similar samples cluster together. Given an anchor, training pulls the positive samples closer to the anchor and pushes the negative samples farther away. In this way, the model learns, to a certain extent, to distinguish positive examples from negative ones.
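
To make the anchor/positive/negative idea concrete, the following is a minimal sketch of an InfoNCE-style contrastive loss in plain Python. It is an illustration only: the abstract does not specify the loss used in the thesis, and the function names, the cosine similarity measure, and the `temperature` value are all assumptions.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def contrastive_loss(anchor, positives, negatives, temperature=0.1):
    """InfoNCE-style loss (assumed form, not taken from the thesis).

    The loss is small when the positives are more similar to the
    anchor than the negatives are, and large otherwise.
    """
    pos = [math.exp(cosine(anchor, p) / temperature) for p in positives]
    neg = [math.exp(cosine(anchor, n) / temperature) for n in negatives]
    denom = sum(pos) + sum(neg)
    # Average the negative log-probability assigned to each positive.
    return -sum(math.log(p / denom) for p in pos) / len(pos)

# Toy 2-D vectors: one points roughly along the anchor, one away from it.
anchor = [1.0, 0.0]
near = [0.9, 0.1]
far = [-1.0, 0.2]

good = contrastive_loss(anchor, positives=[near], negatives=[far])
bad = contrastive_loss(anchor, positives=[far], negatives=[near])
print(good < bad)
```

Minimizing such a loss with gradient descent is what moves positive representations toward the anchor and negative ones away from it; the toy call only shows that the loss ranks a well-separated configuration below a poorly separated one.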

We propose a new method to improve factual consistency. During training, contrastive learning separates the representations of the most relevant and the least relevant sentences of the source document, so that the model learns to generate summaries that stay closer to the main points of the source document.
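
As a rough sketch of how the most and least relevant sentences might be picked out, the example below ranks source sentences by word overlap with a reference summary. The abstract does not say how relevance is scored, so the overlap measure, the function names, and the toy sentences are all assumptions used only for illustration.

```python
def overlap_score(sentence, summary):
    """Fraction of the sentence's distinct words that also occur in the summary.

    Word overlap is a stand-in relevance measure (an assumption).
    """
    sent_words = set(sentence.lower().split())
    summ_words = set(summary.lower().split())
    if not sent_words:
        return 0.0
    return len(sent_words & summ_words) / len(sent_words)

def split_by_relevance(sentences, summary, k=1):
    """Return (k most relevant, k least relevant) sentences by overlap score."""
    ranked = sorted(sentences, key=lambda s: overlap_score(s, summary), reverse=True)
    return ranked[:k], ranked[-k:]

# Toy document and summary, invented for this example.
sentences = [
    "The council approved the new bridge on Monday.",
    "Local cafes were busy that morning.",
    "Construction of the bridge starts next year.",
]
summary = "The council approved a new bridge and construction starts next year."

positives, negatives = split_by_relevance(sentences, summary, k=1)
print(positives)
print(negatives)
```

In a contrastive setup like the one described, the top-ranked sentences would serve as positive samples and the bottom-ranked ones as negative samples.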
Keywords (Chinese) ★ Abstractive summarization
★ Pre-trained model
★ Contrastive learning
★ Factual consistency
Keywords (English) ★ Abstractive Summarization
★ Pre-trained Model
★ Factual Inconsistency
★ Hallucination
★ Contrastive Learning
Table of Contents
Chinese Abstract
Abstract
Acknowledgments
Contents
List of Figures
List of Tables
1 Introduction
2 Related work
2.1 Pre-trained language model
2.1.1 BART
2.2 Automatic text summarization
2.2.1 Extractive summarization
2.2.2 Abstractive summarization
2.3 Factuality improvement
2.4 Factuality evaluation
3 Method
3.1 Abstractive text summarization
3.2 Sentence extraction
3.2.1 Relevant sentences extraction
3.2.2 Less relevant sentences extraction
3.3 Contrastive learning
3.4 Final training objective
4 Experiments
4.1 Dataset
4.1.1 CNN Dailymail
4.1.2 XSum
4.2 Models to compare
4.3 Evaluation metrics
4.3.1 ROUGE-1
4.3.2 ROUGE-2
4.3.3 ROUGE-L
4.3.4 QuestEval
4.3.5 FactCC
4.4 Implementation details
5 Results and analysis
5.1 CNN Dailymail
5.2 XSum
5.3 Study on contrastive encoder
5.4 Case study
5.4.1 Embedding visualization
5.5 Hyperparameter combination in final loss
6 Conclusion
Bibliography
List of Figures
1.1 Example of factual inconsistency
2.1 Architecture of the Transformer
2.2 Five data corruption methods in BART
3.1 Example of sentence extraction on CNN Dailymail dataset
3.2 Example of sentence extraction on XSum dataset
3.3 Our model architecture
5.1 Case study
5.2 Case visualization
List of Tables
5.1 The CNN Dailymail result scores
5.2 The XSum result scores
5.3 Encoder study result
5.4 The hyperparameter combination result
Advisor 蔡宗翰 (Tzong-Han Tsai) Approval date 2023-2-2
