Master's/Doctoral Thesis 104582603: Complete Metadata Record

DC Field | Value | Language
dc.contributor | 資訊工程學系 (Department of Computer Science and Information Engineering) | zh_TW
dc.creator | 比馬特 | zh_TW
dc.creator | Bima Prihasto | en_US
dc.date.accessioned | 2023-07-29T07:39:07Z
dc.date.available | 2023-07-29T07:39:07Z
dc.date.issued | 2023
dc.identifier.uri | http://ir.lib.ncu.edu.tw:88/thesis/view_etd.asp?URN=104582603
dc.contributor.department | 資訊工程學系 (Department of Computer Science and Information Engineering) | zh_TW
dc.description | 國立中央大學 | zh_TW
dc.description | National Central University | en_US
dc.description.abstract | 這篇論文對語音處理做出了重大貢獻，特別是在語音合成和語音轉換方面。這個貢獻分為三個主要部分。首先，已經確定基於 RNN 的模型適用於解決語音合成問題，但是計算時間長仍然是一個問題。本論文在對 MGU 進行修改的基礎上，成功地構建了一種新的 RNN 架構，從 MGU 的一些方程中去除了單元狀態歷史。這種基於 MGU 的新架構的速度是其他基於 MGU 的架構的兩倍，但仍能產生同等質量的聲音。其次，對比學習之前已解決了非平行語音轉換問題，但是聲音合成結果並不理想。這是因為沒有保留聲源的信息內容，無法調整音色和韻律來匹配目標聲音。本論文介紹了一種硬性反例的對比學習方法，稱為 CNEG-VC。該技術基於語音輸入生成實例方面的負面示例，並使用對抗性損失來生成硬負面示例，從而提高非平行語音轉換的性能。最後，論文提出了在頻譜特徵中使用選擇性注意作為非平行語音轉換中對比學習的錨點，稱為 CSA-VC。該技術基於對每行概率分佈的測量來選擇查詢，並使用縮減的注意力矩陣來確保在合成中保留源關係。 | zh_TW
dc.description.abstract | This dissertation makes a substantial contribution to speech processing, particularly in speech synthesis and voice conversion. There are three main parts to this contribution. Firstly, it has been established that RNN-based models are suitable for solving speech synthesis problems; however, long computing time is still an issue. This dissertation builds a new RNN architecture based on modifications to the MGU, removing the unit state history from some of the MGU equations. The new MGU-based architecture is twice as fast as other MGU-based architectures yet still produces sound of equal quality. Secondly, contrastive learning has previously been used to solve non-parallel voice conversion, but the sound synthesis results were unsatisfactory because the information content of the source voice was not preserved and the timbre and prosody could not be adjusted to match the target voice. This dissertation introduces a hard-negative-examples approach to contrastive learning, called CNEG-VC, which generates instance-wise negative examples from the voice input and uses an adversarial loss to produce hard negative examples, improving performance in non-parallel voice conversion. Finally, the dissertation proposes using selective attention over spectral features as an anchor point for contrastive learning in non-parallel voice conversion, called CSA-VC. This technique selects queries based on a measurement of the probability distribution of each row of the attention matrix and uses the reduced attention matrix to ensure that source relations are preserved in the synthesis. (Illustrative sketches of the MGU modification and the attention-anchor selection follow this record.) | en_US
dc.subject | 語音合成 | zh_TW
dc.subject | 語音轉換 | zh_TW
dc.subject | 非平行數據 | zh_TW
dc.subject | 遞歸神經網絡 | zh_TW
dc.subject | 對比學習 | zh_TW
dc.subject | hard negative example | zh_TW
dc.subject | 注意機制 | zh_TW
dc.subject | Speech synthesis | en_US
dc.subject | voice conversion | en_US
dc.subject | non-parallel data | en_US
dc.subject | recurrent neural networks | en_US
dc.subject | contrastive learning | en_US
dc.subject | hard negative example | en_US
dc.subject | attention mechanism | en_US
dc.title | 使用門控遞歸網絡和對比學習進行語音合成的非並行語音轉換：一種混合深度學習方法 | zh_TW
dc.language.iso | zh-TW | zh-TW
dc.title | Non-Parallel Voice Conversion for Speech Synthesis using Gated Recurrent Networks and Contrastive Learning: A Hybrid Deep Learning Approach | en_US
dc.type | 博碩士論文 | zh_TW
dc.type | thesis | en_US
dc.publisher | National Central University | en_US
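
The abstract states only that the proposed RNN cell removes the unit state history from some of the MGU equations. The sketch below is a minimal numpy illustration of that idea, assuming the modification drops the h_{t-1} term from the forget gate; the name ReducedMGUCell, the initialisation scheme, and this particular choice of equation are illustrative assumptions, not the dissertation's exact architecture.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


class ReducedMGUCell:
    """Minimal-gated-unit (MGU) cell with the recurrent term dropped from the gate.

    Reference MGU equations:
        f_t = sigmoid(W_f h_{t-1} + U_f x_t + b_f)
        g_t = tanh(W_h (f_t * h_{t-1}) + U_h x_t + b_h)
        h_t = (1 - f_t) * h_{t-1} + f_t * g_t

    The assumed variant computes the gate from the current input only,
    f_t = sigmoid(U_f x_t + b_f), removing one hidden-to-hidden matrix
    product per time step.
    """

    def __init__(self, input_size, hidden_size, seed=0):
        rng = np.random.default_rng(seed)
        scale = 1.0 / np.sqrt(hidden_size)
        self.U_f = rng.uniform(-scale, scale, (hidden_size, input_size))
        self.b_f = np.zeros(hidden_size)
        self.W_h = rng.uniform(-scale, scale, (hidden_size, hidden_size))
        self.U_h = rng.uniform(-scale, scale, (hidden_size, input_size))
        self.b_h = np.zeros(hidden_size)

    def step(self, x_t, h_prev):
        # Forget gate from the current input only (no h_{t-1} term).
        f_t = sigmoid(self.U_f @ x_t + self.b_f)
        # Candidate state still conditions on the gated previous hidden state.
        g_t = np.tanh(self.W_h @ (f_t * h_prev) + self.U_h @ x_t + self.b_h)
        # Interpolate between the previous state and the candidate.
        return (1.0 - f_t) * h_prev + f_t * g_t

    def run(self, inputs):
        # inputs: array of shape (T, input_size); returns all hidden states (T, hidden_size).
        h = np.zeros(self.b_h.shape[0])
        states = []
        for x_t in inputs:
            h = self.step(x_t, h)
            states.append(h)
        return np.stack(states)
```

Swapping such a cell into an existing MGU-based synthesizer would be the natural way to probe the speed/quality trade-off the abstract claims, since the only change is the cheaper gate computation.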
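
CSA-VC is described only as choosing queries from the per-row probability distributions of an attention map over spectral features and keeping a reduced attention matrix. The sketch below shows one plausible reading, scoring each row by its entropy and keeping the most selective rows as contrastive anchors; the function name select_attention_anchors, the entropy criterion, and the keep_ratio cutoff are assumptions for illustration, not the dissertation's exact formulation.

```python
import numpy as np


def select_attention_anchors(attention, keep_ratio=0.5, eps=1e-8):
    """Select contrastive-learning anchor queries from an attention matrix.

    attention: array of shape (num_queries, num_keys) with non-negative
    weights. Each row is normalised to a probability distribution and
    scored by its entropy; low-entropy rows attend selectively to a few
    spectral positions and are kept as anchors (an assumed criterion).
    """
    # Normalise every row to a probability distribution.
    probs = attention / np.clip(attention.sum(axis=1, keepdims=True), eps, None)
    # Entropy per row: lower means the query attends more selectively.
    row_entropy = -(probs * np.log(np.clip(probs, eps, None))).sum(axis=1)
    num_keep = max(1, int(keep_ratio * probs.shape[0]))
    anchor_idx = np.argsort(row_entropy)[:num_keep]   # most selective rows
    reduced_attention = probs[anchor_idx]             # the "reduced attention matrix"
    return anchor_idx, reduced_attention


if __name__ == "__main__":
    # Toy example: pick anchors from a random 8x16 attention map.
    rng = np.random.default_rng(0)
    attn = rng.random((8, 16))
    idx, reduced = select_attention_anchors(attn, keep_ratio=0.25)
    print(idx, reduced.shape)  # two selected query indices, reduced map of shape (2, 16)
```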
