Master's/Doctoral Thesis 107525009: Complete Metadata Record

DC field | Value | Language
dc.contributor | 軟體工程研究所 | zh_TW
dc.creator | 莊家閔 | zh_TW
dc.creator | Chia-Min Chuang | en_US
dc.date.accessioned | 2020-07-31T07:39:07Z
dc.date.available | 2020-07-31T07:39:07Z
dc.date.issued | 2020
dc.identifier.uri | http://ir.lib.ncu.edu.tw:88/thesis/view_etd.asp?URN=107525009
dc.contributor.department | 軟體工程研究所 | zh_TW
dc.description | 國立中央大學 | zh_TW
dc.description | National Central University | en_US
dc.description.abstract | Cross-lingual text summarization uses a machine to convert a document written in one language into a summary in another language. Most previous studies handle this task with a two-step approach: translate-then-summarize or summarize-then-translate. However, both approaches suffer from translation errors, and the machine translation model they rely on is difficult to fine-tune further along with the summarization task. To address these problems, we adopt a pretrained cross-lingual encoder to represent inputs from different languages as vectors and map them into the same vector space. Pretraining methods have been widely applied to a variety of natural language generation tasks and achieve excellent model performance. This encoder lets the model retain its cross-lingual ability while it learns to summarize. In this work, we experiment with three different fine-tuning strategies and show that the pretrained cross-lingual encoder can learn word-level semantic features. Among all of our model configurations, the best model outperforms the baseline by 3 points in ROUGE-1. | zh_TW
dc.description.abstract | Cross-lingual text summarization (CLTS) is the task of generating a summary in one language given a document in another language. Most previous work treats CLTS as a two-step pipeline: translate-then-summarize or summarize-then-translate. Both pipelines suffer from translation errors, and the translation system is hard to fine-tune directly with the summarization task. To deal with these problems, we utilize a pretrained cross-lingual encoder, whose effectiveness has been demonstrated in natural language generation, to represent text inputs from different languages. We augment a standard sequence-to-sequence (Seq2Seq) network with the pretrained cross-lingual encoder so as to capture cross-lingual contextualized word representations. We show that the pretrained cross-lingual encoder can be fine-tuned on a text summarization dataset while keeping its cross-lingual ability. We experiment with three different fine-tuning strategies and show that the pretrained encoder can capture cross-lingual semantic features. The best of the proposed models obtains 42.08 ROUGE-1 on the ZH2ENSUM dataset [Zhu et al., 2019], significantly improving over our baseline model by more than 3 ROUGE-1. | en_US
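For readers of this record, below is a minimal, hypothetical sketch of the kind of architecture the abstract describes: a standard Seq2Seq summarizer whose encoder is a pretrained cross-lingual model, together with the unigram-overlap ROUGE-1 score quoted above. Everything here is an illustrative assumption (PyTorch, the Hugging Face checkpoint name "xlm-roberta-base", layer sizes, vocabulary size); it is not the thesis implementation.

```python
# Illustrative sketch only -- NOT the thesis code. The encoder checkpoint,
# dimensions, and decoder configuration below are assumptions for demonstration.
import torch
import torch.nn as nn
from transformers import AutoModel


class CrossLingualSummarizer(nn.Module):
    """A Seq2Seq summarizer whose encoder is a pretrained cross-lingual model."""

    def __init__(self, encoder_name="xlm-roberta-base", vocab_size=250002,
                 d_model=768, num_decoder_layers=6, nhead=8):
        super().__init__()
        # Pretrained cross-lingual encoder: maps tokens from different source
        # languages into a shared contextualized representation space.
        self.encoder = AutoModel.from_pretrained(encoder_name)
        # Randomly initialized Transformer decoder that generates the
        # target-language summary while attending over the encoder states.
        self.tgt_embed = nn.Embedding(vocab_size, d_model)
        layer = nn.TransformerDecoderLayer(d_model=d_model, nhead=nhead,
                                           batch_first=True)
        self.decoder = nn.TransformerDecoder(layer, num_decoder_layers)
        self.lm_head = nn.Linear(d_model, vocab_size)

    def forward(self, src_ids, src_mask, tgt_ids):
        # Cross-lingual contextualized word representations of the source text.
        memory = self.encoder(input_ids=src_ids,
                              attention_mask=src_mask).last_hidden_state
        tgt = self.tgt_embed(tgt_ids)
        # Causal mask so each summary token only attends to earlier tokens.
        seq_len = tgt_ids.size(1)
        causal = torch.triu(torch.full((seq_len, seq_len), float("-inf"),
                                       device=tgt_ids.device), diagonal=1)
        out = self.decoder(tgt, memory, tgt_mask=causal,
                           memory_key_padding_mask=~src_mask.bool())
        return self.lm_head(out)  # logits over the target-language vocabulary


def rouge_1_f1(candidate_tokens, reference_tokens):
    """Unigram-overlap ROUGE-1 F1, the metric quoted in the abstract."""
    from collections import Counter
    cand, ref = Counter(candidate_tokens), Counter(reference_tokens)
    overlap = sum((cand & ref).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)
```

The three fine-tuning strategies mentioned in the abstract are not specified in this record; in a sketch like this, such strategies would typically amount to choosing which encoder parameters receive gradient updates during summarization training (for example, freezing all, some, or none of the pretrained layers).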
dc.subject | Text summarization | zh_TW
dc.subject | Pretrained models | zh_TW
dc.subject | Cross-lingual processing | zh_TW
dc.subject | Summarization | en_US
dc.subject | Pretraining language model | en_US
dc.subject | Cross-lingual | en_US
dc.title | Using a pretrained encoder to improve cross-lingual summarization | zh_TW
dc.language.iso | zh-TW | zh-TW
dc.title | Improving Cross-Lingual Text Summarization using Pretrained Encoder | en_US
dc.type | Master's/doctoral thesis | zh_TW
dc.type | thesis | en_US
dc.publisher | National Central University | en_US
