NCU Institutional Repository - theses and dissertations, past exam papers, journal articles, and research projects for download: Item 987654321/93381


    Please use this permanent URL to cite or link to this document: http://ir.lib.ncu.edu.tw/handle/987654321/93381


    Title: Non-Parallel Voice Conversion for Speech Synthesis using Gated Recurrent Networks and Contrastive Learning: A Hybrid Deep Learning Approach
    Author: Prihasto, Bima (比馬特)
    Contributor: Department of Computer Science and Information Engineering
    Keywords: speech synthesis; voice conversion; non-parallel data; recurrent neural networks; contrastive learning; hard negative example; attention mechanism
    Date: 2023-07-29
    Upload time: 2024-09-19 16:56:50 (UTC+8)
    Publisher: National Central University
    Abstract: This dissertation contributes to speech processing, particularly speech synthesis and voice conversion, in three main parts. First, RNN-based models are established as suitable for speech synthesis, but long computing time remains an issue. This dissertation builds a new RNN architecture by modifying the minimal gated unit (MGU), removing the unit state history from some of the MGU equations. The new MGU-based architecture is twice as fast as the other MGU-based architectures yet still produces sound of equal quality. Second, contrastive learning has previously been applied to non-parallel voice conversion, but the synthesis results were unsatisfactory because the information content of the sound source was not preserved and the timbre and prosody could not be adjusted to match the target sound. This dissertation introduces a hard negative example approach to contrastive learning, called CNEG-VC. The technique generates instance-wise negative examples from the voice input and uses an adversarial loss to produce hard negative examples, which improves performance in non-parallel voice conversion. Finally, the dissertation proposes using selective attention in spectral features as an anchor point for contrastive learning in non-parallel voice conversion, called CSA-VC. The technique selects queries based on a measurement of the probability distribution of each row and uses the reduced attention matrix to ensure that source relations are preserved in the synthesis.
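
For orientation, the sketch below implements one time step of the standard minimal gated unit (MGU), the baseline that the abstract says was modified. It is not the dissertation's architecture: this record does not state which equations drop the unit state history, so only the commonly published MGU form is shown. The names `mgu_step`, `matvec`, and all parameter names, as well as the plain-list representation of vectors and matrices, are assumptions made for illustration.

```python
import math


def sigmoid(x):
    """Logistic function used for the forget gate."""
    return 1.0 / (1.0 + math.exp(-x))


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector (list of floats)."""
    return [sum(w * v for w, v in zip(row, vector)) for row in matrix]


def mgu_step(x, h_prev, W_f, U_f, b_f, W_h, U_h, b_h):
    """One step of the standard MGU (illustrative, not the thesis variant).

    x      : input vector at time t
    h_prev : hidden state from time t-1 (the "unit state history")
    W_*    : input weights, U_* : recurrent weights, b_* : biases
    """
    # Forget gate: f_t = sigmoid(W_f x_t + U_f h_{t-1} + b_f)
    f = [
        sigmoid(a + b + c)
        for a, b, c in zip(matvec(W_f, x), matvec(U_f, h_prev), b_f)
    ]
    # Candidate state: h~_t = tanh(W_h x_t + U_h (f_t * h_{t-1}) + b_h)
    gated_history = [fi * hi for fi, hi in zip(f, h_prev)]
    candidate = [
        math.tanh(a + b + c)
        for a, b, c in zip(matvec(W_h, x), matvec(U_h, gated_history), b_h)
    ]
    # New state: h_t = (1 - f_t) * h_{t-1} + f_t * h~_t
    return [
        (1.0 - fi) * hi + fi * ci
        for fi, hi, ci in zip(f, h_prev, candidate)
    ]
```

In this form the previous hidden state enters both the gate and the candidate computation, and each of those terms costs one recurrent matrix-vector product per step. Dropping such a term from an equation would remove one of those products, which is one plausible source of a speed-up, though the record does not confirm that this is where the reported gain comes from.
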
    Appears in Collections: [Graduate Institute of Computer Science and Information Engineering] Theses and Dissertations

    Files in this item:

    File        Description  Size  Format  Views
    index.html               0Kb   HTML    13


    All items in NCUIR are protected by the original copyright.
