Thesis 109423012: Detailed Record




Author: Chih-Hui Huang (黃智輝)    Department: Information Management
Thesis title: Hybrid Hierarchical Transformer for Customized Multi-turn Dialogue Generation (混合式階層多輪客製化對話生成模型)
Related theses
★ Taiwan 50 Trend Analysis: Prediction Based on a Multiple LSTM Model Architecture
★ Gold Price Prediction Analysis Based on Multiple Recurrent Neural Network Models
★ Incremental Learning for Defect Detection in Industry 4.0
★ A Study of Recurrent Neural Networks for Predicting Computer Component Sales Prices
★ A Study of Long Short-Term Memory Networks for Phishing Website Prediction
★ A Study of Deep Learning for Frequency-Hopping Signal Recognition
★ Opinion Leader Discovery in Dynamic Social Networks
★ Deep Learning Models for Virtual Metrology of Machines in Industry 4.0
★ A Novel NMF-Based Movie Recommendation with Time Decay
★ Category-Based Sequence-to-Sequence POI Travel Itinerary Recommendation
★ A DQN-Based Reinforcement Learning Model for Neural Network Architecture Search
★ Neural Network Architecture Optimization Based on Virtual Reward Reinforcement Learning
★ Generative Adversarial Network Architecture Search
★ Neural Architecture Search Optimization with a Progressive Genetic Algorithm
★ Enhanced Model Agnostic Meta Learning with Meta Gradient Memory
★ A Study of Stock Price Prediction Combining Recurrent Neural Networks with Leading Industrial Wastewater Indicators
Files: Full text available for viewing in the system after 2027-07-01.
Abstract (Chinese) Natural Language Processing (NLP) has made great strides. Where it once handled only the translation of small units of vocabulary, it can now understand an entire article in an integrated way and grasp the meaning of its sentences. The most common approach to dialogue generation today is the sequence-to-sequence (Seq2Seq) model, which takes a single user question as input and generates the best answer for that one sentence. Observing the complex conversations people actually have, however, one rarely finds dialogue made up of independent, single-sentence questions and answers. Most conversations are multi-turn: when a respondent wants to answer, they attend not only to the current question but also to the questions raised in earlier turns and to their own earlier replies. At the same time, dialogue generation should be more customized; depending on the current topic and the characteristics of the questioner, the model should produce different responses to serve different users and different conversation topics.
This work addresses these shortcomings by proposing a hybrid hierarchical attention mechanism to make better use of the information in multi-turn dialogue. It also proposes a way to customize the generated sentences on top of self-attention. Experiments show that this approach effectively improves such dialogue tasks and contributes to dialogue generation.
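As a rough illustration of the multi-turn idea described above, the sketch below encodes each utterance on its own and then lets a second encoder attend over the per-turn vectors. It is a minimal sketch assuming PyTorch; the module names, dimensions, and mean pooling are illustrative assumptions rather than the thesis's actual implementation.

```python
# Minimal sketch (illustrative only): hierarchical encoding of a multi-turn
# dialogue. Each utterance is encoded separately, then a context encoder
# attends over one vector per turn. Names and sizes are assumptions.
import torch
import torch.nn as nn

class HierarchicalDialogueEncoder(nn.Module):
    def __init__(self, vocab_size, d_model=256, nhead=4, num_layers=2):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        utt_layer = nn.TransformerEncoderLayer(d_model, nhead, batch_first=True)
        ctx_layer = nn.TransformerEncoderLayer(d_model, nhead, batch_first=True)
        self.utterance_encoder = nn.TransformerEncoder(utt_layer, num_layers)
        self.context_encoder = nn.TransformerEncoder(ctx_layer, num_layers)

    def forward(self, dialogue):
        # dialogue: (batch, turns, tokens) of token ids
        b, t, n = dialogue.shape
        words = self.embed(dialogue.view(b * t, n))           # token embeddings per utterance
        utt_states = self.utterance_encoder(words)            # word-level self-attention
        utt_vectors = utt_states.mean(dim=1).view(b, t, -1)   # one vector per turn (mean pooling)
        return self.context_encoder(utt_vectors)              # turn-level self-attention

# Example: a batch of 2 dialogues, 4 turns of 12 tokens each
context = HierarchicalDialogueEncoder(vocab_size=30000)(torch.randint(0, 30000, (2, 4, 12)))
```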
Abstract (English) Natural Language Processing (NLP) has made great progress. Where it once handled little more than word-level translation, it can now integrate an entire article and understand the meaning of its sentences. The most common dialogue generation technique is the sequence-to-sequence model, which generates the best response according to a single user input. In reality, however, most human dialogue consists of multi-turn questions and responses rather than a single question-answer pair. When a person wants to respond appropriately, he or she focuses not only on the last question but on the whole scenario, including the previous turns of the conversation. That is to say, dialogue generation can be more complete when it incorporates information from earlier utterances. Second, the response should be more customized: it should reflect the current dialogue theme and the characteristics of the questioner, so that the model gives different responses to different users and themes even when the question is the same.
To address these points, our research proposes a hybrid hierarchical attention mechanism to improve multi-turn dialogue modeling. Furthermore, we propose a method for customizing the generated responses on top of the self-attention mechanism. Our experiments show that this approach effectively improves dialogue generation.
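The customization idea can likewise be sketched by conditioning a Transformer decoder on learned user and theme embeddings, so that the same question yields different responses for different users or topics. This is a hedged sketch assuming PyTorch; `CustomizedDecoder`, the embedding tables, and the additive conditioning are hypothetical choices, not the method confirmed by the thesis.

```python
# Hedged sketch: condition response generation on user and theme embeddings
# added to the decoder's token embeddings. All names are illustrative.
import torch
import torch.nn as nn

class CustomizedDecoder(nn.Module):
    def __init__(self, vocab_size, num_users, num_themes, d_model=256, nhead=4):
        super().__init__()
        self.token_embed = nn.Embedding(vocab_size, d_model)
        self.user_embed = nn.Embedding(num_users, d_model)     # who is asking
        self.theme_embed = nn.Embedding(num_themes, d_model)   # what the dialogue is about
        layer = nn.TransformerDecoderLayer(d_model, nhead, batch_first=True)
        self.decoder = nn.TransformerDecoder(layer, num_layers=2)
        self.out = nn.Linear(d_model, vocab_size)

    def forward(self, target_tokens, context, user_id, theme_id):
        # target_tokens: (batch, len); context: (batch, turns, d_model) from the encoder
        x = self.token_embed(target_tokens)
        x = x + self.user_embed(user_id)[:, None, :] + self.theme_embed(theme_id)[:, None, :]
        hidden = self.decoder(x, context)   # cross-attend to the dialogue context
        return self.out(hidden)             # vocabulary logits at every position
```

In this toy setup, changing `user_id` or `theme_id` changes only the conditioning signal while the dialogue context stays fixed, which is one simple way to obtain user- or topic-specific responses.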
Keywords (Chinese) ★ Dialogue Generation
★ Self-attention Mechanism
★ Customized System
★ Deep Learning
Keywords (English) ★ Dialogue Generation
★ Self-attention mechanism
★ Customized System
★ Deep Learning
Table of Contents
Abstract (Chinese)
Abstract (English)
List of Figures
List of Tables
1. Introduction
2. Related Work
2-1 Sequence-to-Sequence Model
2-2 Hierarchical RNN
3. Proposed Method: HHT
3-1 Self-attention Mechanism
3-2 Utterance Encoder
3-3 Context Encoder
3-4 Decoder
4. Experiments and Results
4-1 Datasets
4-2 Baseline Models
4-3 Evaluation Metrics
4-4 BLEU Performance Comparison
4-5 ROUGE Performance Comparison
4-6 Discussion on Utterance Length
4-7 Influence of Model Units
4-8 Analysis on Corpus Size
4-9 Ablation Study
4-10 Parameter Setting Discussion
4-11 Case Study
5. Conclusion
5-1 Future Work
References
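Sections 4-4 and 4-5 of the outline above compare models with BLEU and ROUGE. As a hedged illustration of how smoothed sentence-level BLEU can be computed, the snippet below uses NLTK; the toy sentences, tokenization, and smoothing method are assumptions rather than the thesis's exact evaluation setup.

```python
# Illustrative only: smoothed sentence-level BLEU with NLTK.
from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction

reference = ["how", "about", "a", "cup", "of", "coffee", "?"]   # gold response tokens
candidate = ["how", "about", "some", "coffee", "?"]             # generated response tokens

smoother = SmoothingFunction().method1   # avoids zero scores when an n-gram order has no match
score = sentence_bleu([reference], candidate, smoothing_function=smoother)
print(f"BLEU: {score:.4f}")
```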
Advisor: Yi-Cheng Chen (陳以錚)    Approval date: 2022-07-21