NCU Institutional Repository: Item 987654321/86332


    Please use this identifier to cite or link to this item: http://ir.lib.ncu.edu.tw/handle/987654321/86332


    Title: 非平行語料庫基於生成注意力網路之語音轉換技術;Spectrum and Prosody Transformation for Non-parallel Voice Conversion with Generative Attentional Networks
    Authors: 邱則維;Chiu, Tse-Wei
    Contributors: Department of Communication Engineering
    Keywords: Voice conversion; Generative Adversarial Networks; Attention; Non-parallel data
    Date: 2021-07-19
    Issue Date: 2021-12-07 12:34:03 (UTC+8)
    Publisher: National Central University
    Abstract: Voice conversion (VC) is a relatively complex technique that transforms the timbre and pitch of a source speaker while preserving the speech content, so that the output sounds as if it were spoken by the target speaker.
    This thesis uses a non-parallel corpus as training data and proposes a Cycle Generative Adversarial Network (Cycle-GAN) with an attention mechanism for voice conversion. During conversion, the attention mechanism gives more weight to the features that differ between speakers, so that the conversion concentrates on those differences while preserving the more similar segments. We add attention modules to the architecture and introduce new loss functions to update the network. Because GAN training is often unstable, we also modify the discriminator loss so that real samples and generated samples are given different weights.
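The abstract does not give the form of the weighted discriminator loss, so the following is only a minimal sketch of the general idea. It assumes a least-squares GAN objective, and the function name `weighted_discriminator_loss`, the parameters `w_real` and `w_fake`, and their default values are hypothetical illustrations, not the thesis's actual formulation.

```python
import numpy as np


def weighted_discriminator_loss(d_real, d_fake, w_real=1.0, w_fake=0.5):
    """Least-squares discriminator loss with separate weights per term.

    d_real: discriminator outputs on real samples (target label 1).
    d_fake: discriminator outputs on generated samples (target label 0).
    w_real, w_fake: hypothetical weights; the thesis's values are not stated.
    """
    d_real = np.asarray(d_real, dtype=float)
    d_fake = np.asarray(d_fake, dtype=float)

    # Penalize real samples scored away from 1 and fake samples scored away from 0.
    loss_real = np.mean((d_real - 1.0) ** 2)
    loss_fake = np.mean(d_fake ** 2)

    # Unequal weights change how strongly each term drives the discriminator update.
    return w_real * loss_real + w_fake * loss_fake
```

With `w_real == w_fake` this reduces to the usual unweighted least-squares discriminator loss, scaled by the common weight.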
    The methods above are applied to the spectral envelope (timbre). We also try converting the fundamental frequency (pitch) with a GAN and compare the result with the original conversion method. Finally, the experimental results show that the proposed voice conversion architecture outperforms the baseline system in both Mel-cepstral distortion (MCD) and Mean Opinion Score (MOS).
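For readers unfamiliar with the objective metric, MCD is commonly computed per frame as (10 / ln 10) · sqrt(2 · Σ_d (c_d − c'_d)²) over the mel-cepstral coefficients, excluding the 0th (energy) term, and then averaged over frames. The sketch below follows that common definition. It assumes the two sequences are already time-aligned (for example by dynamic time warping) and stored as arrays of shape (frames, coefficients); the function name is hypothetical, and the thesis may use a different coefficient order or alignment procedure.

```python
import numpy as np


def mel_cepstral_distortion(target_mcep, converted_mcep):
    """Average mel-cepstral distortion in dB between two aligned sequences.

    Both inputs have shape (frames, coefficients), with column 0 holding the
    0th (energy) coefficient, which is excluded from the distance.
    """
    target = np.asarray(target_mcep, dtype=float)
    converted = np.asarray(converted_mcep, dtype=float)
    if target.shape != converted.shape:
        raise ValueError("sequences must be time-aligned and equally shaped")

    # Drop the 0th coefficient and take the per-frame squared difference.
    diff = target[:, 1:] - converted[:, 1:]
    per_frame = (10.0 / np.log(10.0)) * np.sqrt(2.0 * np.sum(diff ** 2, axis=1))

    # Report the mean over all frames; lower values mean closer spectra.
    return float(np.mean(per_frame))
```

A lower MCD indicates that the converted spectral envelope is closer to the target speaker's.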
    Appears in Collections:[Graduate Institute of Communication Engineering] Electronic Thesis & Dissertation



    All items in NCUIR are protected by copyright, with all rights reserved.
