Thesis/Dissertation 111522016: Full Metadata Record

DC field | Value | Language
DC.contributor | 資訊工程學系 (Department of Computer Science and Information Engineering) | zh_TW
DC.creator | 廖楚信 | zh_TW
DC.creator | Chu-Xin Liao | en_US
dc.date.accessioned | 2024-08-19T07:39:07Z |
dc.date.available | 2024-08-19T07:39:07Z |
dc.date.issued | 2024 |
dc.identifier.uri | http://ir.lib.ncu.edu.tw:444/thesis/view_etd.asp?URN=111522016 |
dc.contributor.department | 資訊工程學系 (Department of Computer Science and Information Engineering) | zh_TW
DC.description | 國立中央大學 | zh_TW
DC.description | National Central University | en_US
dc.description.abstract | 語音翻譯(Speech Translation,ST)是自然語言處理(NLP)和語音處理的交叉領域,目的是將一種語言的語音直接翻譯成另一種語言的語音或文字。這項技術是現代科技的重要成果之一,不僅能實現無障礙交流,還能促進全球交流與合作,推動語言教育進步。隨著全球化和跨文化交流的加速,語音翻譯技術在各種應用場景中變得越來越重要,成為許多學者研究的焦點。 深度學習技術在翻譯任務中可以細分為多種類型:文字到文字(Text-to-Text)、文字到語音(Text-to-Speech)、語音到文字(Speech-to-Text)和語音到語音(Speech-to-Speech)。其中,文字到文字、語音到文字以及語音到語音的翻譯備受關注。大型語言模型(如GPT)具備高超的理解和生成能力,使得文字到文字翻譯在大量高質量訓練資料的支持下,效果尤為突出。 語音到語音翻譯可採用三階層級聯(3-Stage Cascaded)方法,將自動語音辨識(ASR)模型,機器翻譯(MT)模型和文字轉語音(TTS)模型進行串聯。這種方法使得級聯模型的缺點(錯誤傳遞以及高延遲)更為明顯。單階層語音到語音翻譯模型(Direct Speech-to-Speech Translation Model)雖然改善了級聯模型的缺點,其效果卻落後於強大的級聯模型。這主要是因為語音到語音的訓練資料稀少,即便使用資料增強方法,效果也不如級聯模型。因此,克服資料稀少或生成高質量的語音到語音資料成為一個重要議題。本篇論文志在找出其中的平衡,使得模型能夠同時擁有高效能且低延遲。 | zh_TW
dc.description.abstract | Speech Translation (ST) is an interdisciplinary field at the intersection of Natural Language Processing (NLP) and speech processing. It aims to translate speech in one language directly into speech or text in another language. It is one of the significant achievements of modern technology. It enables barrier-free communication, promotes global exchange and cooperation, and advances language education. As globalization and cross-cultural exchange accelerate, speech translation has become increasingly important across many application scenarios and a research focus for many scholars. Deep learning approaches to translation can be divided into several types: Text-to-Text, Text-to-Speech, Speech-to-Text, and Speech-to-Speech. Of these, Text-to-Text, Speech-to-Text, and Speech-to-Speech translation have received the most attention. Large language models (such as GPT) have strong comprehension and generation capabilities. As a result, Text-to-Text translation performs especially well when supported by large amounts of high-quality training data. Speech-to-Speech translation can use a three-stage cascaded approach that chains an Automatic Speech Recognition (ASR) model, a Machine Translation (MT) model, and a Text-to-Speech (TTS) model. This approach makes the drawbacks of cascaded models, namely error propagation and high latency, even more pronounced. Direct Speech-to-Speech Translation models alleviate these drawbacks, but their performance still lags behind strong cascaded models. This is mainly because Speech-to-Speech training data is scarce. Even with data augmentation, direct models remain inferior to cascaded models. Overcoming data scarcity, or generating high-quality Speech-to-Speech data, is therefore an important problem. This thesis aims to strike a balance so that models achieve both high performance and low latency. | en_US
DC.subject | 自動語音辨識 | zh_TW
DC.subject | 機器翻譯 | zh_TW
DC.subject | 文字轉語音 | zh_TW
DC.subject | 語音翻譯 | zh_TW
DC.subject | Automatic Speech Recognition | en_US
DC.subject | Machine Translation | en_US
DC.subject | Text to Speech | en_US
DC.subject | Speech Translation | en_US
DC.title | 利用預訓練模型和多種類型的數據改進語音翻譯 | zh_TW
dc.language.iso | zh-TW | zh-TW
DC.title | Leveraging Pre-trained Models and Various Types of Data to Improve Speech Translation | en_US
DC.type | 博碩士論文 | zh_TW
DC.type | thesis | en_US
DC.publisher | National Central University | en_US
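The three-stage cascade described in the abstract (ASR, then MT, then TTS) can be sketched as follows. This is a minimal illustration under stated assumptions, not the thesis's method. The stage functions `toy_asr`, `toy_mt`, and `toy_tts`, the English-to-French `TOY_LEXICON`, and the per-stage latencies are all hypothetical stand-ins for real models. The sketch shows the two drawbacks the abstract names. First, each stage sees only the previous stage's output, so a recognition error propagates. Second, per-stage latencies add up.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class StageResult:
    output: str
    latency_ms: float


# A stage maps the previous stage's output to a new output plus its latency.
Stage = Callable[[str], StageResult]


def cascade(stages: List[Stage], source: str) -> Tuple[str, float]:
    """Run the stages in order; each one sees only the previous stage's output."""
    data = source
    total_latency_ms = 0.0
    for stage in stages:
        result = stage(data)
        data = result.output  # any error made here is inherited by later stages
        total_latency_ms += result.latency_ms  # latencies accumulate stage by stage
    return data, total_latency_ms


# Hypothetical toy stages standing in for real ASR, MT, and TTS models.
def toy_asr(audio: str) -> StageResult:
    prefix = "audio:"
    text = audio[len(prefix):] if audio.startswith(prefix) else audio
    # Simulated misrecognition: "world" is transcribed as "word".
    return StageResult(text.replace("world", "word"), 120.0)


TOY_LEXICON = {"hello": "bonjour", "world": "monde"}  # hypothetical lexicon


def toy_mt(text: str) -> StageResult:
    # Word-by-word lookup; unknown words are copied through unchanged.
    words = [TOY_LEXICON.get(w, w) for w in text.split()]
    return StageResult(" ".join(words), 80.0)


def toy_tts(text: str) -> StageResult:
    # Placeholder string in place of a synthesized waveform.
    return StageResult(f"<speech:{text}>", 150.0)


if __name__ == "__main__":
    speech, latency = cascade([toy_asr, toy_mt, toy_tts], "audio:hello world")
    print(speech, latency)  # <speech:bonjour word> 350.0
```

Given a correct transcript, `toy_mt` would produce "bonjour monde". Here the cascade outputs "bonjour word" because the MT stage never sees the original audio and cannot recover from the ASR mistake. The total latency is the sum of the three stages.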

For questions about this thesis, please contact the Promotion Services Section, National Central University Library, TEL: (03)422-7151 ext. 57407, or by e-mail.