NCU Institutional Repository (中大機構典藏): Item 987654321/90046


    Please use this permanent URL to cite or link to this item: http://ir.lib.ncu.edu.tw/handle/987654321/90046


    Title: Zero-shot Voice Conversion Based on Speaker Embedding Domain Generalization (基於語者特徵領域泛化之零資源語音轉換系統)
    Author: CHENG, CHUN-HSIANG (鄭俊祥)
    Contributor: Department of Computer Science and Information Engineering
    Keywords: voice conversion; speaker embedding; text-to-speech; domain generalization; meta-learning
    Date: 2022-09-23
    Upload time: 2022-10-04 12:08:57 (UTC+8)
    Publisher: National Central University
    Abstract: Advances in deep learning in recent years have made voice conversion practical: given an utterance from any source speaker, a system keeps only the semantic information (such as the text) and replaces the speaker information (such as pitch, speaking rate, and energy) with that of a target speaker. Good conversion quality, however, requires enough training data to train the model adequately, and the model's generalization ability must be improved so that inference works well in any data domain. As a result, voice conversion usually performs better on registered speakers (speakers whose data was used in training) than on unregistered speakers (speakers whose data was not used in training). Recent research has addressed voice conversion for unregistered speakers, but the synthesized quality is still lower than for registered speakers. This thesis therefore aims to build a zero-shot (zero-resource) Chinese voice conversion system that improves speech quality for unregistered speakers.
    The proposed system achieves zero-shot voice conversion mainly by effectively decoupling the semantic information and the speaker information in speech. The pre-trained speech recognition model Wav2vec 2.0 extracts semantic information from the source speaker, and the WavLM model extracts speaker information from the target speaker. A Robust MAML model then maps the target speaker's speaker information into a domain-generalized feature space, so that it can be applied directly to any unregistered (unseen) speaker domain. Finally, through transfer learning, the speech synthesis model FastSpeech2 synthesizes the target speaker's speech from the source speaker's semantic information and the domain-generalized speaker information.
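
    The data flow described in the abstract can be sketched as follows. This is a minimal illustration only, not the thesis implementation: the class name, the field names, and all four toy functions are hypothetical stand-ins, and no real Wav2vec 2.0, WavLM, Robust MAML, or FastSpeech2 code is called. The sketch shows only which signal feeds which stage.

```python
from dataclasses import dataclass
from typing import Callable, List

Vector = List[float]
Frames = List[Vector]


@dataclass
class ZeroShotVCPipeline:
    """Data flow only; every component is a hypothetical stand-in."""
    content_encoder: Callable[[Vector], Frames]       # role played by Wav2vec 2.0
    speaker_encoder: Callable[[Vector], Vector]       # role played by WavLM
    speaker_mapper: Callable[[Vector], Vector]        # role played by the Robust MAML mapping
    synthesizer: Callable[[Frames, Vector], Frames]   # role played by FastSpeech2

    def convert(self, source_wave: Vector, target_wave: Vector) -> Frames:
        content = self.content_encoder(source_wave)   # semantic information from the source speaker
        speaker = self.speaker_encoder(target_wave)   # speaker information from the target speaker
        generalized = self.speaker_mapper(speaker)    # domain-generalized speaker embedding
        return self.synthesizer(content, generalized)


# Toy stand-ins (assumptions for illustration; they do no real signal processing).
def toy_content_encoder(wave: Vector) -> Frames:
    return [[sample] for sample in wave]              # one 1-dim "frame" per sample


def toy_speaker_encoder(wave: Vector) -> Vector:
    return [sum(wave) / len(wave)]                    # a single utterance-level value


def toy_speaker_mapper(embedding: Vector) -> Vector:
    return list(embedding)                            # identity placeholder for the learned mapping


def toy_synthesizer(content: Frames, speaker: Vector) -> Frames:
    return [frame + speaker for frame in content]     # condition every frame on the speaker vector


pipeline = ZeroShotVCPipeline(
    content_encoder=toy_content_encoder,
    speaker_encoder=toy_speaker_encoder,
    speaker_mapper=toy_speaker_mapper,
    synthesizer=toy_synthesizer,
)
converted = pipeline.convert(source_wave=[1.0, 2.0], target_wave=[4.0, 6.0])
```

    Keeping the content path and the speaker path separate until the synthesizer mirrors the decoupling the abstract describes: only the target speaker's utterance passes through the speaker mapper, so an unseen speaker needs no retraining in this sketch.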
    Appears in collections: [Graduate Institute of Computer Science and Information Engineering] Theses and Dissertations

    Files in this item:

    File        Description  Size  Format  Views
    index.html               0Kb   HTML    105


    All items in NCUIR are protected by original copyright.
