In recent years, many studies have proposed neural network-based dialogue systems, but simulating human conversation remains one of the most difficult challenges in dialogue generation. Most dialogue systems and related research still use Seq2Seq models built on an RNN architecture. Although the Transformer far outperforms RNN-based Seq2Seq models in Neural Machine Translation (NMT), few studies have evaluated and compared the two architectures on dialogue generation. Moreover, no single evaluation benchmark is yet sufficient for assessing the responses a dialogue generation model produces.
Therefore, this study trains an RNN-based Seq2Seq model and a Transformer model on two movie subtitle and conversation datasets, the Cornell Movie-Dialog Corpus and the OpenSubtitles Corpus. Given the nature of these datasets, the study focuses on open-domain dialogue models. It uses several quantitative metrics together with qualitative analysis to assess the suitability of the two architectures for open-domain dialogue generation, and it examines the interdependence and reliability of the individual dialogue evaluation methods.
The quantitative and qualitative results show that the RNN-based Seq2Seq model is suited to short, conservative responses. The Transformer model achieves higher overall response quality and predictive power than the RNN-based Seq2Seq model, is good at answering questions that require only simple inference, and is better able to generate longer responses. The study also identifies the dependencies among the evaluation metrics. Future research is expected to adopt the Transformer model in place of the RNN-based Seq2Seq model across different model architectures and tasks, and to apply the evaluation procedure used in this study.