Master's/Doctoral Thesis 104582603: Complete Metadata Record

DC Field | Value | Language
dc.contributor | 資訊工程學系 (Department of Computer Science and Information Engineering) | zh_TW
dc.creator | 比馬特 | zh_TW
dc.creator | Bima Prihasto | en_US
dc.date.accessioned | 2023-07-29T07:39:07Z
dc.date.available | 2023-07-29T07:39:07Z
dc.date.issued | 2023
dc.identifier.uri | http://ir.lib.ncu.edu.tw:88/thesis/view_etd.asp?URN=104582603
dc.contributor.department | 資訊工程學系 (Department of Computer Science and Information Engineering) | zh_TW
dc.description | 國立中央大學 | zh_TW
dc.description | National Central University | en_US
dc.description.abstract | 這篇論文對語音處理做出了重大貢獻，特別是在語音合成和語音轉換方面。這個貢獻分為三個主要部分。首先，已經確定基於 RNN 的模型適用於解決語音合成問題，但是計算時間長仍然是一個問題。本論文在對 MGU 進行修改的基礎上，成功地構建了一種新的 RNN 架構，從 MGU 的一些方程中去除了單元狀態歷史。這種基於 MGU 的新架構的速度是其他基於 MGU 的架構的兩倍，但仍能產生同等質量的聲音。其次，對比學習之前已解決了非平行語音轉換問題，但是聲音合成結果並不理想。這是因為沒有保留聲源的信息內容，無法調整音色和韻律來匹配目標聲音。本論文介紹了一種硬性反例的對比學習方法，稱為 CNEG-VC。該技術基於語音輸入生成實例方面的負面示例，並使用對抗性損失來生成硬負面示例，從而提高非平行語音轉換的性能。最後，論文提出了在頻譜特徵中使用選擇性注意作為非平行語音轉換中對比學習的錨點，稱為 CSA-VC。該技術基於對每行概率分佈的測量來選擇查詢，並使用縮減的注意力矩陣來確保在合成中保留源關係。 | zh_TW
dc.description.abstract | This dissertation makes a substantial contribution to speech processing, particularly in speech synthesis and voice conversion. There are three main parts to this contribution. Firstly, it has been established that RNN-based models are suitable for solving speech synthesis problems; however, long computing time is still an issue. This dissertation builds a new RNN architecture based on modifications to the MGU, removing the unit state history from some of the MGU equations. The new MGU-based architecture is twice as fast as other MGU-based architectures yet still produces sound of equal quality. Secondly, contrastive learning has previously been used to solve non-parallel voice conversion, but the sound synthesis results were unsatisfactory because the information content of the source voice was not preserved and the timbre and prosody could not be adjusted to match the target voice. This dissertation introduces a hard-negative-examples approach to contrastive learning, called CNEG-VC, which generates instance-wise negative examples from the voice input and uses an adversarial loss to produce hard negative examples, improving performance in non-parallel voice conversion. Finally, the dissertation proposes using selective attention over spectral features as an anchor point for contrastive learning in non-parallel voice conversion, called CSA-VC. This technique selects queries based on a measurement of the probability distribution of each row of the attention matrix and uses the reduced attention matrix to ensure that source relations are preserved in the synthesis. (Illustrative sketches of the MGU modification and the attention-anchor selection follow this record.) | en_US
dc.subject | 語音合成 | zh_TW
dc.subject | 語音轉換 | zh_TW
dc.subject | 非平行數據 | zh_TW
dc.subject | 遞歸神經網絡 | zh_TW
dc.subject | 對比學習 | zh_TW
dc.subject | hard negative example | zh_TW
dc.subject | 注意機制 | zh_TW
dc.subject | Speech synthesis | en_US
dc.subject | voice conversion | en_US
dc.subject | non-parallel data | en_US
dc.subject | recurrent neural networks | en_US
dc.subject | contrastive learning | en_US
dc.subject | hard negative example | en_US
dc.subject | attention mechanism | en_US
dc.title | 使用門控遞歸網絡和對比學習進行語音合成的非並行語音轉換：一種混合深度學習方法 | zh_TW
dc.language.iso | zh-TW | zh-TW
dc.title | Non-Parallel Voice Conversion for Speech Synthesis using Gated Recurrent Networks and Contrastive Learning: A Hybrid Deep Learning Approach | en_US
dc.type | 博碩士論文 | zh_TW
dc.type | thesis | en_US
dc.publisher | National Central University | en_US
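
The abstract states only that the proposed RNN cell removes the unit state history from some of the MGU equations. The sketch below is a minimal numpy illustration of that idea, assuming the modification drops the h_{t-1} term from the forget gate; the name ReducedMGUCell, the initialisation scheme, and this particular choice of equation are illustrative assumptions, not the dissertation's exact architecture.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


class ReducedMGUCell:
    """Minimal-gated-unit (MGU) cell with the recurrent term dropped from the gate.

    Reference MGU equations:
        f_t = sigmoid(W_f h_{t-1} + U_f x_t + b_f)
        g_t = tanh(W_h (f_t * h_{t-1}) + U_h x_t + b_h)
        h_t = (1 - f_t) * h_{t-1} + f_t * g_t

    The assumed variant computes the gate from the current input only,
    f_t = sigmoid(U_f x_t + b_f), removing one hidden-to-hidden matrix
    product per time step.
    """

    def __init__(self, input_size, hidden_size, seed=0):
        rng = np.random.default_rng(seed)
        scale = 1.0 / np.sqrt(hidden_size)
        self.U_f = rng.uniform(-scale, scale, (hidden_size, input_size))
        self.b_f = np.zeros(hidden_size)
        self.W_h = rng.uniform(-scale, scale, (hidden_size, hidden_size))
        self.U_h = rng.uniform(-scale, scale, (hidden_size, input_size))
        self.b_h = np.zeros(hidden_size)

    def step(self, x_t, h_prev):
        # Forget gate from the current input only (no h_{t-1} term).
        f_t = sigmoid(self.U_f @ x_t + self.b_f)
        # Candidate state still conditions on the gated previous hidden state.
        g_t = np.tanh(self.W_h @ (f_t * h_prev) + self.U_h @ x_t + self.b_h)
        # Interpolate between the previous state and the candidate.
        return (1.0 - f_t) * h_prev + f_t * g_t

    def run(self, inputs):
        # inputs: array of shape (T, input_size); returns all hidden states (T, hidden_size).
        h = np.zeros(self.b_h.shape[0])
        states = []
        for x_t in inputs:
            h = self.step(x_t, h)
            states.append(h)
        return np.stack(states)
```

Swapping such a cell into an existing MGU-based synthesizer would be the natural way to probe the speed/quality trade-off the abstract claims, since the only change is the cheaper gate computation.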
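
CSA-VC is described only as choosing queries from the per-row probability distributions of an attention map over spectral features and keeping a reduced attention matrix. The sketch below shows one plausible reading, scoring each row by its entropy and keeping the most selective rows as contrastive anchors; the function name select_attention_anchors, the entropy criterion, and the keep_ratio cutoff are assumptions for illustration, not the dissertation's exact formulation.

```python
import numpy as np


def select_attention_anchors(attention, keep_ratio=0.5, eps=1e-8):
    """Select contrastive-learning anchor queries from an attention matrix.

    attention: array of shape (num_queries, num_keys) with non-negative
    weights. Each row is normalised to a probability distribution and
    scored by its entropy; low-entropy rows attend selectively to a few
    spectral positions and are kept as anchors (an assumed criterion).
    """
    # Normalise every row to a probability distribution.
    probs = attention / np.clip(attention.sum(axis=1, keepdims=True), eps, None)
    # Entropy per row: lower means the query attends more selectively.
    row_entropy = -(probs * np.log(np.clip(probs, eps, None))).sum(axis=1)
    num_keep = max(1, int(keep_ratio * probs.shape[0]))
    anchor_idx = np.argsort(row_entropy)[:num_keep]   # most selective rows
    reduced_attention = probs[anchor_idx]             # the "reduced attention matrix"
    return anchor_idx, reduced_attention


if __name__ == "__main__":
    # Toy example: pick anchors from a random 8x16 attention map.
    rng = np.random.default_rng(0)
    attn = rng.random((8, 16))
    idx, reduced = select_attention_anchors(attn, keep_ratio=0.25)
    print(idx, reduced.shape)  # two selected query indices, reduced map of shape (2, 16)
```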
