NCU Institutional Repository: Item 987654321/86818


    Please use this identifier to cite or link to this item: http://ir.lib.ncu.edu.tw/handle/987654321/86818


    Title: EPG2S: Speech Synthesis Technology Based on Electropalatography Signal
    Authors: 陳柏勳; Chen, Po-Hsun
    Contributors: Department of Computer Science and Information Engineering
    Keywords: multimodal; electropalatography; speech synthesis; speech enhancement
    Date: 2021-09-27
    Issue Date: 2021-12-07 13:16:05 (UTC+8)
    Publisher: National Central University
    Abstract: Synthesizing speech from articulatory movement can benefit patients with vocal cord disorders, situations that require silent communication, and high-noise environments. In this study, we explore an alternative data source, electropalatography (EPG), and propose a novel multimodal EPG-to-speech (EPG2S) synthesis system. Our model has two goals: (1) to synthesize speech using only the EPG signal, and (2) when the speaker's audio signal can be captured simultaneously in a noisy environment, to perform speech enhancement (SE) by leveraging the EPG signal. Two fusion strategies are investigated for the EPG2S system: late fusion (LF) and early fusion (EF). Experiments were conducted on a Mandarin corpus. For the first goal, the proposed multimodal EPG2S systems on average outperform speech corrupted by real-world background noise at SNR levels of -5 dB or lower. For the second goal, these systems outperform their audio-only SE counterparts on the PESQ, STOI, and ESTOI speech evaluation metrics. These results verify the feasibility of synthesizing speech from EPG signals and the effectiveness of incorporating them into an SE system.
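The abstract compares systems against noisy speech at stated SNR levels such as -5 dB. As a minimal sketch of what such a condition means, the snippet below scales a noise signal so that mixing it with a clean signal yields a requested SNR, using the standard definition SNR_dB = 10 * log10(P_clean / P_noise). The function names `mix_at_snr` and `snr_db`, the 220 Hz test tone, and the Gaussian noise are illustrative assumptions, not taken from the thesis, which does not describe its mixing procedure in this record.

```python
import math
import random


def power(signal):
    """Mean squared amplitude of a sequence of samples."""
    return sum(x * x for x in signal) / len(signal)


def snr_db(clean, noise):
    """Signal-to-noise ratio in dB between a clean signal and a noise signal."""
    return 10.0 * math.log10(power(clean) / power(noise))


def mix_at_snr(clean, noise, target_snr_db):
    """Scale `noise` so that clean + scaled noise has the requested SNR.

    Hypothetical helper for illustration only. Returns (noisy, scaled_noise).
    """
    if len(noise) < len(clean):
        raise ValueError("noise must be at least as long as the clean signal")
    noise = noise[: len(clean)]
    # Solve 10*log10(P_clean / (scale^2 * P_noise)) = target_snr_db for scale.
    target_noise_power = power(clean) / (10.0 ** (target_snr_db / 10.0))
    scale = math.sqrt(target_noise_power / power(noise))
    scaled_noise = [scale * n for n in noise]
    noisy = [c + n for c, n in zip(clean, scaled_noise)]
    return noisy, scaled_noise


# Assumed toy data: one second of a 220 Hz tone at 16 kHz plus Gaussian noise.
random.seed(0)
sample_rate = 16000
clean = [math.sin(2.0 * math.pi * 220.0 * t / sample_rate) for t in range(sample_rate)]
noise = [random.gauss(0.0, 1.0) for _ in range(sample_rate)]

noisy, scaled_noise = mix_at_snr(clean, noise, -5.0)
measured_snr = snr_db(clean, scaled_noise)
print(f"measured SNR: {measured_snr:.2f} dB")
```

At -5 dB the noise carries roughly three times the power of the speech, which is why an articulatory signal that is unaffected by acoustic noise can be attractive in that regime.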
    Appears in Collections:[Graduate Institute of Computer Science and Information Engineering] Electronic Thesis & Dissertation

    Files in This Item:

    File         Description   Size   Format
    index.html                 0Kb    HTML


    All items in NCUIR are protected by copyright, with all rights reserved.
