NCU Institutional Repository (theses and dissertations, past exam papers, journal articles, and research projects available for download): Item 987654321/88382
    Please use this permanent URL to cite or link to this item: http://ir.lib.ncu.edu.tw/handle/987654321/88382


    Title: Dialogue Generation in Traditional Chinese Utilizing the Algorithm Based on mBART50 and DialoGPT (藉由mBART50和DialoGPT所生成之繁體中文對話)
    Author: Liao, Yi-Chun (廖翊均)
    Contributor: Department of Mathematics
    Keywords: Machine Translation; Dialogue Generation; mBART50; DialoGPT
    Date: 2022-01-14
    Upload time: 2022-07-14 01:08:59 (UTC+8)
    Publisher: National Central University
    Abstract: In Natural Language Processing (NLP), most currently known models support translation or dialogue generation in Chinese. Chinese, however, is written in two forms, Simplified and Traditional, and the Chinese these models support is mostly Simplified. Although both are Chinese, their vocabulary and usage are not the same.

    Because translation and dialogue data in Traditional Chinese are scarce, this thesis combines translation with dialogue generation, using English as the pivot. Specifically, we train two-way translation between Traditional Chinese and English, together with dialogue generation in English. The translation training data come from the following news sources and online course sites: The China Post, Voice Tube, and Hope English. The English dialogue model is trained on dailydialog. For the final test, we use Traditional Chinese dialogues from Hi Tutor and TOCFL. We combine mBART50 and DialoGPT, both fine-tuned, to generate dialogue in Traditional Chinese.
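    The pivot arrangement described above can be sketched as a composition of three text-to-text steps. This is a minimal illustration only, not the thesis code: the function name `pivot_reply`, the `TextFn` alias, and the lookup-table stand-ins are all hypothetical, and in practice the three callables would wrap the fine-tuned translation and dialogue models.

```python
from typing import Callable

# Hypothetical alias: any function that maps one string to another.
TextFn = Callable[[str], str]


def pivot_reply(zh_utterance: str,
                zh_to_en: TextFn,
                en_dialogue: TextFn,
                en_to_zh: TextFn) -> str:
    """Produce a Traditional Chinese reply by pivoting through English."""
    en_utterance = zh_to_en(zh_utterance)    # step 1: Traditional Chinese -> English
    en_response = en_dialogue(en_utterance)  # step 2: reply generated in English
    return en_to_zh(en_response)             # step 3: English -> Traditional Chinese


# Toy stand-ins (assumptions) so the sketch runs without loading any model.
_ZH_EN = {"你好嗎?": "How are you?"}
_EN_EN = {"How are you?": "I am fine, thank you."}
_EN_ZH = {"I am fine, thank you.": "我很好,謝謝。"}

reply = pivot_reply("你好嗎?",
                    _ZH_EN.__getitem__,
                    _EN_EN.__getitem__,
                    _EN_ZH.__getitem__)
print(reply)
```

    Passing the three stages in as plain callables keeps the pivot logic independent of any particular model library, so each stage can be swapped or tested on its own.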

    Our fine-tuned translation models outperform the original models without fine-tuning, especially when the beam size is 7. After fine-tuning the dialogue model, the results show that the small-size model generates the most fluent dialogue. In the final experiment, we tune the parameters beam size, top k, and top p to find the values that produce the best results, which are 7, 10, and 0.95, respectively. Our best model achieves a BLEU score of 2.85 on the final test. Finally, using the best model, we generate a Traditional Chinese dialogue with English conversation as the pivot.
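    To make the top k and top p parameters concrete, the following sketch filters a next-token probability distribution the way these two settings are commonly defined: keep the k most likely tokens, then keep the smallest prefix of those whose renormalised probability mass reaches p. It is an illustration under stated assumptions, not the thesis implementation: the function name `top_k_top_p_filter` and the toy distribution are hypothetical, the default values simply echo the reported 10 and 0.95, and beam search (beam size 7) is not covered because the abstract does not say how the three settings were combined.

```python
import random
from typing import List


def top_k_top_p_filter(probs: List[float],
                       top_k: int = 10,
                       top_p: float = 0.95) -> List[float]:
    """Return a renormalised distribution after top-k, then top-p, filtering."""
    # Rank token ids by probability (highest first) and keep the top k.
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:top_k]
    k_mass = sum(probs[i] for i in order)

    # Nucleus step: smallest prefix whose renormalised mass reaches top_p.
    kept, cumulative = [], 0.0
    for i in order:
        kept.append(i)
        cumulative += probs[i] / k_mass
        if cumulative >= top_p:
            break

    # Zero out everything else and renormalise the survivors.
    kept_mass = sum(probs[i] for i in kept)
    filtered = [0.0] * len(probs)
    for i in kept:
        filtered[i] = probs[i] / kept_mass
    return filtered


# Toy next-token distribution over five token ids (an assumption).
toy_probs = [0.5, 0.3, 0.1, 0.06, 0.04]
narrow = top_k_top_p_filter(toy_probs, top_k=3, top_p=0.8)
default = top_k_top_p_filter(toy_probs)

# Sample one token id from the filtered distribution.
rng = random.Random(0)
token_id = rng.choices(range(len(default)), weights=default, k=1)[0]
print(narrow, default, token_id)
```

    With `top_k=3, top_p=0.8` only the two most likely tokens survive; with the defaults the least likely token is dropped and the remaining four are renormalised before sampling.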
    Appears in collections: [Graduate Institute of Mathematics] Theses and Dissertations

    Files in this item:

    File        Description  Size  Format  Views
    index.html               0Kb   HTML    86


    All items in NCUIR are protected by original copyright.
