

    Please use this permanent URL to cite or link to this document: http://ir.lib.ncu.edu.tw/handle/987654321/86332


    Title: 非平行語料庫基於生成注意力網路之語音轉換技術; Spectrum and Prosody Transformation for Non-parallel Voice Conversion with Generative Attentional Networks
    Author: 邱則維; Chiu, Tse-Wei
    Contributor: Department of Communication Engineering
    Keywords: Voice conversion; Generative Adversarial Networks; Attention; Non-parallel data
    Date: 2021-07-19
    Upload time: 2021-12-07 12:34:03 (UTC+8)
    Publisher: National Central University
    Abstract: Voice conversion (VC) is a relatively complex technique that converts the timbre and pitch of a source speaker while preserving the speech content, so that the output sounds as if it were spoken by the target speaker.
    This thesis uses a non-parallel corpus as training data and proposes a Cycle Generative Adversarial Network (Cycle-GAN) with an attention mechanism for voice conversion. During conversion, the attention mechanism gives more weight to the features that differ between speakers, so that the conversion concentrates on those differences and retains the segments that are already similar. We add attention modules to the architecture and introduce new loss functions for updating the network. Because GAN training is often unstable, we also modify the discriminator loss so that real samples and generated samples are given different weights.
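The abstract does not give the exact form of the weighted discriminator loss, so the following is only a minimal sketch of the idea. It assumes a least-squares GAN objective; the function name `weighted_discriminator_loss`, the parameters `w_real` and `w_fake`, and their default values are hypothetical and are not taken from the thesis.

```python
from typing import Sequence


def weighted_discriminator_loss(
    real_scores: Sequence[float],
    fake_scores: Sequence[float],
    w_real: float = 1.0,
    w_fake: float = 0.5,
) -> float:
    """Least-squares discriminator loss with a separate weight per term.

    Assumed form (illustrative, not the thesis's definition):
        w_real * mean((D(x) - 1)^2) + w_fake * mean(D(G(x))^2)
    where real_scores are D(x) and fake_scores are D(G(x)).
    """
    if not real_scores or not fake_scores:
        raise ValueError("both score lists must be non-empty")
    # Real samples are pushed towards a score of 1.
    real_term = sum((s - 1.0) ** 2 for s in real_scores) / len(real_scores)
    # Generated samples are pushed towards a score of 0.
    fake_term = sum(s ** 2 for s in fake_scores) / len(fake_scores)
    return w_real * real_term + w_fake * fake_term
```

Setting `w_real` and `w_fake` to different values changes how strongly each kind of sample drives the discriminator update, which is one plausible way to read the stabilisation described above.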
    The methods above are applied to the spectral envelope (timbre). We also try converting the fundamental frequency (pitch) with a GAN and compare the results with the original conversion method. Experimental results show that the proposed voice conversion architecture outperforms the baseline system in both Mel-cepstral distortion (MCD) and mean opinion score (MOS).
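For readers unfamiliar with MCD, the sketch below computes it with the commonly used definition, (10 / ln 10) * sqrt(2 * sum_d (c_d - c'_d)^2) per frame, averaged over frames. It assumes the two utterances are already time-aligned and that the 0th (energy) coefficient has been dropped; the abstract does not state how the thesis handled either point, and the function name is hypothetical.

```python
import math
from typing import Sequence


def mel_cepstral_distortion(
    target: Sequence[Sequence[float]],
    converted: Sequence[Sequence[float]],
) -> float:
    """Average Mel-cepstral distortion in dB over time-aligned frames.

    Each frame is a sequence of mel-cepstral coefficients. Alignment and
    removal of the 0th coefficient are assumed to have been done already.
    """
    if len(target) != len(converted) or not target:
        raise ValueError("inputs must be non-empty and have the same number of frames")
    scale = 10.0 / math.log(10.0)
    total = 0.0
    for t_frame, c_frame in zip(target, converted):
        if len(t_frame) != len(c_frame):
            raise ValueError("frames must have the same number of coefficients")
        # Squared Euclidean distance between the two coefficient vectors.
        squared_error = sum((t - c) ** 2 for t, c in zip(t_frame, c_frame))
        total += scale * math.sqrt(2.0 * squared_error)
    return total / len(target)
```

Lower values mean the converted cepstra are closer to the target speaker's, so an improvement over a baseline shows up as a smaller MCD.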
    Appears in collections: [Graduate Institute of Communication Engineering] Theses and Dissertations


