Application and Design of Speech Synthesis and Speaker Conversion

Name

張敦鈞 (CHANG, TUN-CHUN)

Department

Department of Computer Science and Information Engineering, In-service Master's Program

Thesis Title

Application and Design of Speech Synthesis and Speaker Conversion
(語音合成及語者轉換之應用與設計)

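Abstract (Chinese)

This thesis combines speech synthesis and speaker conversion techniques for related applications and designs. Speech synthesis converts a real human voice into machine speech through a synthesis engine; speaker conversion takes the source speaker's voice as its basis and converts it so that it sounds like another speaker. Applying these two techniques to daily life and entertainment requires a system that implements them. In the system designed here, text is input and speech synthesis produces the source speaker's voice model; combined with context-dependent data, the synthesis software then extracts the source speaker's spectral feature parameters, and the target speaker's spectral feature parameters are extracted from the target speaker's speech. The two sets of spectral feature parameters are aligned with DTW to produce a matching table of frame feature vectors. The LBG algorithm forms a Gaussian mixture model (GMM), which is then trained with the EM algorithm. With the GMM parameter mapping method, inputting the source speaker's spectral parameters yields the target speaker's spectral parameters. In addition, an excitation is generated from the source speaker's pitch feature parameters and, together with the target speaker's spectral feature parameters, passed through a synthesis filter to form the target speaker's synthesized speech. This thesis proposes several applications and designs of speech synthesis and speaker conversion.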
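
The DTW alignment step named in the abstract can be illustrated with a minimal sketch. It assumes each utterance is already a NumPy array of per-frame spectral feature vectors, and it uses plain Euclidean frame distance with the basic three-way step pattern. The function name `dtw_align` and these choices are illustrative assumptions, not the thesis's actual implementation.

```python
import numpy as np

def dtw_align(src, tgt):
    """Align two feature sequences (frames x dims) with classic DTW.

    Returns a list of (source_frame, target_frame) index pairs, i.e. a
    frame matching table of the kind the abstract describes.
    """
    n, m = len(src), len(tgt)
    # Pairwise Euclidean distance between every source and target frame.
    dist = np.linalg.norm(src[:, None, :] - tgt[None, :, :], axis=2)

    # cost[i, j] = cheapest accumulated distance ending at frames (i-1, j-1).
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost[i, j] = dist[i - 1, j - 1] + min(
                cost[i - 1, j - 1],  # advance both sequences
                cost[i - 1, j],      # advance source only
                cost[i, j - 1],      # advance target only
            )

    # Backtrack from the last frame pair to recover the warping path.
    path = []
    i, j = n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        step = np.argmin([cost[i - 1, j - 1], cost[i - 1, j], cost[i, j - 1]])
        if step == 0:
            i, j = i - 1, j - 1
        elif step == 1:
            i -= 1
        else:
            j -= 1
    return path[::-1]
```

Each returned pair names one source frame and the target frame it is matched to; stacking the corresponding feature vectors would give the paired training data for the GMM stage.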

Abstract (English)

This thesis combines speech synthesis and speaker conversion and presents related applications and designs. Speech synthesis converts a real human voice into a machine voice through a synthesis engine. Speaker conversion starts from the source speaker's voice and converts it into the voice of another speaker. For the two techniques to be used in daily life and entertainment, a system is needed to implement them. In the designed system, the synthesis software extracts the spectral feature parameters of the source speaker from the context-dependent data produced from the input text and from the source speaker's voice model. The spectral feature parameters of the target speaker are extracted from the target speaker's speech. The two sets of parameters are aligned by DTW to generate a matching table of frame feature vectors, and a GMM is then formed by the LBG algorithm. The EM algorithm is then used to train the GMM. After training, the GMM parameter mapping method serves as the conversion function: given a source spectrum, it outputs the target spectrum. Finally, the synthesized voice of the target speaker is formed by passing the excitation derived from the source speaker's pitch feature parameters, together with the target speaker's spectral feature parameters, through the MLSA (Mel Log Spectrum Approximation) filter. This thesis proposes several applications and designs of speech synthesis and speaker conversion.
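
The abstract does not spell out the GMM parameter mapping. The sketch below assumes the joint-density formulation common in GMM-based voice conversion, where each mixture component models stacked source and target vectors and a source frame is mapped to the posterior-weighted conditional mean of the target part. The function name `gmm_convert` and its argument layout are hypothetical; in practice the weights, means, and covariances would come from EM training on the DTW-paired frames.

```python
import numpy as np

def gmm_convert(x, weights, means, covs):
    """Map one source frame x (D,) to a target frame with a joint-density GMM.

    weights: (K,), means: (K, 2D), covs: (K, 2D, 2D). Each joint vector is
    assumed to stack D source dims followed by D target dims.
    """
    d = x.shape[0]
    num_components = len(weights)
    log_post = np.empty(num_components)
    cond_means = np.empty((num_components, means.shape[1] - d))
    for k, (w, mu, cov) in enumerate(zip(weights, means, covs)):
        mu_x, mu_y = mu[:d], mu[d:]
        cov_xx, cov_yx = cov[:d, :d], cov[d:, :d]
        diff = x - mu_x
        solved = np.linalg.solve(cov_xx, diff)  # cov_xx^{-1} (x - mu_x)
        # Log of w_k * N(x | mu_x, cov_xx): the unnormalised posterior.
        _, logdet = np.linalg.slogdet(cov_xx)
        log_post[k] = np.log(w) - 0.5 * (
            diff @ solved + logdet + d * np.log(2 * np.pi)
        )
        # Conditional mean of the target part given x for component k.
        cond_means[k] = mu_y + cov_yx @ solved
    # Normalise in the log domain for numerical stability.
    post = np.exp(log_post - log_post.max())
    post /= post.sum()
    return post @ cond_means
```

With a single component the mapping reduces to a linear regression from source to target features; with several components the posterior weights blend the per-component regressions.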

Keywords

★ Speech Synthesis (語音合成)
★ Speaker Conversion (語者轉換)

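Table of Contents

Chinese Abstract
English Abstract
Acknowledgments
List of Figures
List of Tables
Chapter Contents
Chapter 1 Introduction
1.1 Research Background
1.2 Research Motivation and Objectives
Chapter 2 System Architecture
2.1 Architecture Flowchart
2.2 Training Stage
2.2.1 Text-to-Speech (TTS)
2.2.2 Context-Dependent Data
2.2.3 Voice Model of the Source Speaker
2.2.4 Feature Parameter Extraction
2.2.5 DTW Alignment
2.2.6 Gaussian Mixture Model Training
2.3 Testing Stage
2.3.1 GMM Parameter Mapping
2.3.2 Voice Conversion
2.3.3 Generating the Target Speaker's Synthesized Speech
Chapter 3 System Implementation
3.1 Recording
3.2 Existing Audio Files
3.3 System Interface
3.4 Implementation Results
Chapter 4 System Applications
4.1 Application to Role-Playing Video Games
4.2 Application to Dubbing
4.3 Application to Audio Storybooks in the Parents' Voice
4.4 Application to Children's Toy Sticks
Chapter 5 Conclusions and Future Work
References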

Advisor

王家慶 (Jia-Ching Wang)

Approval Date

2016-8-25
