NCU Institutional Repository (theses and dissertations, past exam papers, journal articles, and research projects available for download): Item 987654321/95472


    Please use this permanent URL to cite or link to this item: http://ir.lib.ncu.edu.tw/handle/987654321/95472


    Title: Research on Applying Diffusion Model Decoders in Timbre Transformation Systems: A Case Study of Erhu Timbre
    Author: 劉秉澤; LIU, BING-ZE
    Contributor: Department of Computer Science and Information Engineering
    Keywords: Diffusion; Timbre Change; Pitch Encoder; Loudness Encoder; FAD
    Date: 2024-07-12
    Upload time: 2024-10-09 16:53:23 (UTC+8)
    Publisher: National Central University
    Abstract: In this study, we propose a timbre transfer model based on the Diffusion architecture
    that converts musical pieces performed on various instruments into erhu performances. The
    model uses a Pitch Encoder and a Loudness Encoder to extract the pitch and loudness features
    of the music. These features are then fed as conditioning inputs to a Diffusion Model-based
    Decoder, which generates high-quality music in the erhu timbre. In the experiments, we
    systematically evaluated the model's performance using Pitch Accuracy, Cosine Similarity, and
    Fréchet Audio Distance (FAD). The results show that the model achieves a pitch accuracy of
    95% to 96% and that the generated erhu timbre is close to that of real erhu performances.
    Furthermore, ablation experiments confirmed the importance of the Loudness Encoder, which
    ensures that the model correctly generates silent waveforms when given silent inputs. This
    study demonstrates the potential of Diffusion architecture-based timbre transfer models in
    music generation and provides new ideas for future research in music generation and timbre
    transfer.
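
    The abstract names Pitch Accuracy and Cosine Similarity as evaluation metrics but this record
    does not define them. The sketch below shows one common way such metrics can be computed. It
    is not the thesis's implementation: the function names, the frame-wise f0 lists in Hz (0
    meaning unvoiced), and the 50-cent tolerance are all assumptions made for illustration. FAD
    is left out because it depends on an audio embedding model that the record does not specify.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length feature vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        # Convention chosen for this sketch: an all-zero vector scores 0.
        return 0.0
    return dot / (norm_a * norm_b)


def cents_between(f_ref, f_est):
    """Signed pitch distance from f_ref to f_est in cents (100 cents = 1 semitone)."""
    return 1200.0 * math.log2(f_est / f_ref)


def pitch_accuracy(ref_hz, est_hz, tolerance_cents=50.0):
    """Fraction of voiced reference frames whose estimate lies within the tolerance.

    Assumption: both inputs are frame-aligned f0 tracks in Hz, with 0 marking
    unvoiced frames. Unvoiced reference frames are ignored.
    """
    if len(ref_hz) != len(est_hz):
        raise ValueError("pitch tracks must have the same number of frames")
    voiced = [(r, e) for r, e in zip(ref_hz, est_hz) if r > 0]
    if not voiced:
        return 0.0
    hits = sum(
        1
        for r, e in voiced
        if e > 0 and abs(cents_between(r, e)) <= tolerance_cents
    )
    return hits / len(voiced)


if __name__ == "__main__":
    # Hypothetical values, not data from the thesis.
    reference = [440.0, 440.0, 0.0, 220.0]
    generated = [440.0, 466.16, 0.0, 221.0]
    print(pitch_accuracy(reference, generated))
    print(cosine_similarity([1.0, 2.0, 3.0], [2.0, 4.0, 6.0]))
```

    A 50-cent tolerance treats an estimate as correct when it is closer to the reference note
    than to either neighbouring semitone; a stricter or looser threshold changes the score.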
    Appears in Collections: [Graduate Institute of Computer Science and Information Engineering] Electronic Thesis & Dissertation

    Files in This Item:

    File         Description   Size   Format   Views
    index.html                 0Kb    HTML     26


    All items in NCUIR are protected by the original copyright.

