Author Li-yuan Lin (林立源)   Department Electrical Engineering
Thesis title Combine GMM-Supervector and Derivative Kernel for speaker verification
(結合高斯混合超級向量與微分核函數之語者確認研究)
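Abstract (Chinese) In recent years, support vector machines (SVMs) have been widely used in many fields with good results. This thesis uses the NIST 2001 corpus and applies SVMs to text-independent speaker verification. SVM-based speaker verification systems usually rely on dynamic kernels to handle the natural characteristics of speech. These kernels fall into two classes, parametric and derivative. This thesis combines the two kinds of kernel when computing scores in order to improve system performance.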
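The parameters of the Gaussian mixture speaker model, adapted from the UBM by MAP, are mapped through the GMM-supervector Kernel and the Likelihood Kernel to obtain two sets of supervectors. Nuisance attribute projection (NAP) is then applied to correct the GMM-supervector, and the two sets of supervectors are used to train separate SVM models. For impostor selection, the 20 impostor utterances whose features are most similar to the target speaker are chosen, so that the trained SVM models are more discriminative. At test time, the two sets of supervectors are first obtained for every test utterance and then scored against the designated SVM models, which yields two sets of scores.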
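The front end described above (MAP adaptation of the UBM, stacking of the adapted means into a supervector, then NAP) can be sketched as follows. This is a minimal illustration under stated assumptions, not the thesis's implementation: only the means are adapted, the relevance factor of 16 is a commonly used default rather than a value given in the abstract, the supervector stacks raw means without any weight or covariance scaling, and the function names and the orthonormal `nuisance_basis` argument are hypothetical.

```python
import math

def map_adapt_means(frames, weights, means, variances, relevance=16.0):
    """Relevance-MAP adaptation of the means of a diagonal-covariance UBM.

    relevance=16.0 is an assumed default, not a value from the thesis.
    """
    n_mix, dim = len(means), len(means[0])
    occupancy = [0.0] * n_mix                          # soft frame counts n_i
    first_order = [[0.0] * dim for _ in range(n_mix)]  # sums of gamma_i * x
    for x in frames:
        # log(w_i * N(x | mu_i, diag(var_i))) for every mixture
        log_p = []
        for w, mu, var in zip(weights, means, variances):
            ll = math.log(w)
            for xd, md, vd in zip(x, mu, var):
                ll -= 0.5 * (math.log(2.0 * math.pi * vd) + (xd - md) ** 2 / vd)
            log_p.append(ll)
        top = max(log_p)                               # for numerical stability
        post = [math.exp(lp - top) for lp in log_p]
        total = sum(post)
        for i in range(n_mix):
            gamma = post[i] / total                    # posterior of mixture i
            occupancy[i] += gamma
            for d in range(dim):
                first_order[i][d] += gamma * x[d]
    # (F_i + r * mu_i) / (n_i + r) equals alpha_i * E_i[x] + (1 - alpha_i) * mu_i
    # with alpha_i = n_i / (n_i + r)
    return [
        [(first_order[i][d] + relevance * means[i][d]) / (occupancy[i] + relevance)
         for d in range(dim)]
        for i in range(n_mix)
    ]

def supervector(adapted_means):
    """Stack the adapted mixture means into one long vector (no scaling)."""
    return [value for mean in adapted_means for value in mean]

def nap_project(vector, nuisance_basis):
    """Remove the nuisance subspace: v - U (U^T v), with U assumed orthonormal."""
    projected = list(vector)
    for u in nuisance_basis:
        coeff = sum(ud * vd for ud, vd in zip(u, vector))
        projected = [pd - coeff * ud for pd, ud in zip(projected, u)]
    return projected
```

In this sketch the nuisance basis would be estimated beforehand from session variability in background data; that estimation step is outside what the abstract describes and is not shown.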
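Experimental results on the NIST 2001 corpus show that the system combining a 64-mixture GMM-supervector Kernel (NAP) with a 256-mixture derivative Kernel achieves the best equal error rate (EER) and decision cost function (DCF), 6.04% and 0.0777 respectively. Compared with the 15.87% and 0.1911 of the traditional speaker verification model, this is an absolute improvement of 9.8 percentage points and 0.1135.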
Abstract (English) Support vector machine (SVM)-based speaker verification has become a standard approach in recent years. This thesis evaluates text-independent speaker verification on the NIST 2001 SRE. SVM speaker verification systems usually use dynamic kernels to handle the dynamic nature of speech utterances. Dynamic kernels can be divided into two general classes, derivative and parametric. This thesis combines the two in the score space to improve system performance.
The speaker GMM parameters are obtained from the UBM by MAP adaptation and mapped through the GMM-supervector Kernel and the Likelihood Kernel, which gives two supervectors; NAP is then applied to the GMM-supervector. The two supervectors are used to train separate SVM models. For impostor selection, we choose the 20 impostors whose characteristics are most similar to the target speaker, which makes the model more discriminative. At test time, the two supervectors of each test utterance are scored against the corresponding SVM models, and the two scores are then combined.
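The impostor selection and score combination steps can be illustrated with a short sketch. The abstract does not say which similarity measure ranks the impostors or how the two scores are combined, so cosine similarity and a weighted sum are assumptions made here for illustration only; the function names and the `weight` parameter are hypothetical.

```python
import math

def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm > 0.0 else 0.0

def select_impostors(target, candidates, top_n=20):
    """Return indices of the top_n candidates most similar to the target.

    Cosine similarity is an assumed stand-in for the thesis's measure.
    """
    ranked = sorted(range(len(candidates)),
                    key=lambda i: cosine(target, candidates[i]),
                    reverse=True)
    return ranked[:top_n]

def fuse_scores(gmm_scores, derivative_scores, weight=0.5):
    """Weighted sum of the two per-trial SVM scores (assumed fusion rule)."""
    return [weight * a + (1.0 - weight) * b
            for a, b in zip(gmm_scores, derivative_scores)]
```

Choosing the impostors closest to the target gives the SVM negative examples near the decision boundary, which is the motivation the abstract gives for the top-20 selection.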
Experiments on the NIST 2001 SRE show that a 64-mixture GMM-supervector kernel combined with a 256-mixture derivative kernel achieves an EER of 6.05% and a DCF of 0.1023. Compared with the 15.87% and 0.1911 of the traditional speaker verification system, the proposed method improves these by 9.8% and 0.1135 respectively.
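For readers unfamiliar with the two metrics, the sketch below shows one common way to estimate EER and a minimum detection cost from lists of target and impostor trial scores. It is a hedged illustration, not the scoring tool used in the thesis: the cost parameters (`c_miss=10`, `c_fa=1`, `p_target=0.01`) are assumed defaults in the style of NIST evaluations of that period, the cost is left unnormalized, and the thesis may define or normalize its DCF differently.

```python
def error_rates(target_scores, impostor_scores, threshold):
    """Miss and false-alarm rates when accepting scores >= threshold."""
    p_miss = sum(s < threshold for s in target_scores) / len(target_scores)
    p_fa = sum(s >= threshold for s in impostor_scores) / len(impostor_scores)
    return p_miss, p_fa

def eer_and_min_dcf(target_scores, impostor_scores,
                    c_miss=10.0, c_fa=1.0, p_target=0.01):
    """Sweep every observed score as a threshold.

    EER is taken where |P_miss - P_fa| is smallest; min DCF is the lowest
    value of c_miss*P_miss*p_target + c_fa*P_fa*(1 - p_target).
    The cost parameters are assumptions, not values from the thesis.
    """
    thresholds = sorted(set(target_scores) | set(impostor_scores))
    thresholds.append(thresholds[-1] + 1.0)   # threshold that rejects everything
    best_gap, eer, min_dcf = float("inf"), None, float("inf")
    for t in thresholds:
        p_miss, p_fa = error_rates(target_scores, impostor_scores, t)
        dcf = c_miss * p_miss * p_target + c_fa * p_fa * (1.0 - p_target)
        min_dcf = min(min_dcf, dcf)
        gap = abs(p_miss - p_fa)
        if gap < best_gap:
            best_gap, eer = gap, (p_miss + p_fa) / 2.0
    return eer, min_dcf
```

With few trials the two error rates rarely cross exactly, so this sketch reports the midpoint at the closest threshold; interpolating along the DET curve is a common refinement.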
Keywords (Chinese) ★ 語者確認 (speaker verification)   Keywords (English) ★ speaker verification
Chapter 1 Introduction
1.1 Research Motivation
1.2 Overview of Speaker Recognition
1.3 Research Direction
1.4 Chapter Outline
Chapter 2 Basic Techniques of Speech Processing and Speaker Recognition
2.1 Feature Extraction
2.2 Speaker Model Construction
2.2.1 Universal Background Model
2.2.2 Gaussian Mixture Speaker Model
2.2.3 Speaker Model Training Procedure
2.2.4 Vector Quantization
2.2.5 EM Algorithm
2.3 Speaker Model Adaptation
2.4 Speaker Verification
2.5 Equal Error Rate and Detection Error Tradeoff Curve
Chapter 3 System Architecture
3.1 Sequence Kernels
3.1.1 GMM Supervector
3.1.2 GMM Supervector Linear Kernel
3.1.3 Nuisance Attribute Projection
3.1.4 Proof of the Session Compensation Equation
3.1.5 Derivative Kernel
3.2 Support Vector Machine
3.2.1 Linear SVM Classifier
3.2.2 The Non-separable Case
3.2.3 SVM Kernel Functions
3.3 Training the SVM Models
3.3.1 Impostor Speaker Selection
3.4 Speaker Verification System
3.4.1 Speaker Verification System with the GMM Supervector Kernel
3.4.2 Speaker Verification System Combining the GMM Supervector and Derivative Kernels
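Chapter 4 Speaker Recognition Experiments
4.1 Speech Corpus
4.2 Applying Sequence Kernels to the Speaker Verification System
4.2.1 Experiment 1: Using the GMM Supervector Kernel
4.2.2 Experiment 2: Applying NAP to the GMM Supervector
4.2.3 Experiment 3: Using the Derivative Kernel
4.2.4 Experiment 4: Combining the GMM Supervector and Derivative Kernels
4.2.5 Experiment 5: Comparison with the Literature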
Chapter 5 Conclusions and Future Work
5.1 Conclusions
5.2 Future Work
