Speaker Verification using Speaker Dependent Background Models


Name

Zhong-Jie Su (蘇仲潔)

Department

Department of Electrical Engineering

Thesis Title

Speaker Verification using Speaker Dependent Background Models
(利用語者特定背景模型之語者確認系統)

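Related Theses

★ A Study of Miniaturized GSM/GPRS Mobile Communication Modules
★ A Study of Speaker Recognition
★ Robustness Analysis of Perturbed Singular Systems Using the Projection Method
★ A Study of Speaker Verification Improving the Alternative-Hypothesis Characteristic Function with Support Vector Machine Models
★ A Study of Speaker Verification Combining Gaussian Mixture Supervectors and Derivative Kernel Functions
★ Agile-Movement Particle Swarm Optimization Method
★ Application of an Improved Particle Swarm Method to Lossless Predictive Image Coding
★ A Study of Particle Swarm Optimization Applied to Speaker Model Training and Adaptation
★ A Speaker Verification System Based on Particle Swarm Optimization
★ A Study of Improved Mel-Frequency Cepstral Coefficients Combined with Multiple Speech Features
★ Intelligent Remote Monitoring System
★ Stability Analysis and Controller Design for Output Feedback of Positive Systems
★ Hybrid Interval-Search Particle Swarm Optimization Algorithm
★ A Study of Gesture Recognition Based on Deep Neural Networks
★ A Posture-Correction Necklace with Image-Recognition Auto-Calibration and a Mobile Phone Alert System
★ A Study of Unsupervised Fast Speaker Adaptation Algorithms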

Files

Browse the thesis in the system (full text never released)

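Abstract (Chinese)

Speaker verification systems generally select background speaker models in one of two ways: a universal background model (UBM) or an anti-speaker model. Each approach has its own shortcomings, which degrade system performance. This thesis therefore studies and improves background speaker models, aiming to remedy the shortcomings of both approaches and raise recognition performance. We propose a new criterion for background speaker models and find an objective function that satisfies it. For each speaker-specific model adapted from the UBM, we then build a dedicated background speaker model. We call the models that satisfy the new criterion speaker dependent background models (SDBM). The SDBM remedies some of the shortcomings of the conventional anti-speaker model and the UBM and improves speaker verification performance, as verified by experiments.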
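The abstract mentions speaker-specific models adapted from the UBM but does not state the adaptation formula. As a minimal sketch, assuming the standard relevance-MAP adaptation of the means of a diagonal-covariance Gaussian mixture model, the step might look like the following. The function names, the array layout, and the relevance factor of 16 are illustrative assumptions, not values taken from the thesis, and this is not the SDBM adaptation method itself.

```python
import numpy as np

def log_gauss_diag(x, means, variances):
    """Log density of every frame under every diagonal Gaussian, shape (T, M)."""
    # x: (T, D) feature frames; means, variances: (M, D)
    diff = x[:, None, :] - means[None, :, :]
    log_norm = np.sum(np.log(2.0 * np.pi * variances), axis=1)[None, :]
    return -0.5 * (log_norm + np.sum(diff ** 2 / variances[None, :, :], axis=2))

def map_adapt_means(x, weights, means, variances, relevance=16.0):
    """Return speaker means MAP-adapted from UBM means (weights and variances unchanged)."""
    log_p = np.log(weights)[None, :] + log_gauss_diag(x, means, variances)
    log_p -= np.max(log_p, axis=1, keepdims=True)    # stabilise before exponentiating
    post = np.exp(log_p)
    post /= post.sum(axis=1, keepdims=True)           # responsibilities, (T, M)
    n = post.sum(axis=0)                               # soft frame count per mixture, (M,)
    ex = post.T @ x / np.maximum(n, 1e-10)[:, None]    # posterior-weighted frame mean, (M, D)
    alpha = (n / (n + relevance))[:, None]             # more data -> trust the data more
    return alpha * ex + (1.0 - alpha) * means
```

Mixtures that see almost no enrollment frames keep their UBM means, while well-observed mixtures move toward the speaker's data. This is the usual reason for adapting from a UBM instead of training each speaker model from scratch.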

Abstract (English)

The universal background model (UBM) and the anti-speaker model are the two background models commonly used in speaker verification systems, but both have shortcomings. We therefore propose two criteria for determining background models. The resulting background model is called the speaker dependent background model (SDBM). Experimental results show that the SDBM improves on the performance of both the UBM and the anti-speaker model approaches.
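To illustrate the role a background model plays in verification, here is a minimal sketch of the conventional log-likelihood-ratio test, assuming diagonal-covariance Gaussian mixture models stored as `(weights, means, variances)` tuples. The function names, the tuple layout, and the threshold of 0 are hypothetical. The background model passed in could be a UBM, an anti-speaker model, or an SDBM; how the SDBM itself is built is not specified in this record and is not shown here.

```python
import numpy as np

def gmm_avg_loglik(x, weights, means, variances):
    """Average per-frame log-likelihood of frames x (T, D) under a diagonal GMM."""
    diff = x[:, None, :] - means[None, :, :]
    log_comp = (np.log(weights)[None, :]
                - 0.5 * np.sum(np.log(2.0 * np.pi * variances), axis=1)[None, :]
                - 0.5 * np.sum(diff ** 2 / variances[None, :, :], axis=2))
    m = np.max(log_comp, axis=1, keepdims=True)        # log-sum-exp over mixtures
    frame_ll = m[:, 0] + np.log(np.sum(np.exp(log_comp - m), axis=1))
    return float(np.mean(frame_ll))

def verify(x, speaker_gmm, background_gmm, threshold=0.0):
    """Return (score, accepted) for a claimed identity."""
    score = gmm_avg_loglik(x, *speaker_gmm) - gmm_avg_loglik(x, *background_gmm)
    return score, score > threshold

# Toy usage: one-dimensional, single-mixture models.
speaker = (np.array([1.0]), np.array([[1.0]]), np.array([[1.0]]))
background = (np.array([1.0]), np.array([[0.0]]), np.array([[1.0]]))
score, accepted = verify(np.full((10, 1), 1.0), speaker, background)
```

Because the score is a difference of log-likelihoods, a background model that fits impostor speech better lowers impostor scores. That is the lever a per-speaker background model is meant to improve.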

Keywords (Chinese)

★ Speaker Verification
★ Universal Background Model
★ Anti-Speaker Model

Keywords (English)

★ Speaker Verification
★ Universal Background Model
★ Anti-Model

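Table of Contents

Abstract (Chinese)
Abstract (English)
List of Figures
List of Tables
Chapter 1 Introduction
1.1 Research Motivation and Background
1.2 Overview of Speaker Recognition Architecture
1.3 Overview of Speaker Adaptation
1.4 Literature Review
1.4.1 Minimum Classification Error (MCE)
1.4.2 Joint Factor Analysis (JFA)
1.4.3 UB-DNorm
1.4.4 Test Normalization (TNorm)
1.4.5 DPSO Adaptation Method
1.4.6 RWRS Verification System
1.5 Research Direction
1.6 Chapter Overview
Chapter 2 Fundamental Techniques of Speaker Recognition
2.1 Preprocessing
2.1.1 Framing
2.1.2 Pre-emphasis
2.1.3 Feature Extraction
2.2 Speaker Model Training
2.2.1 Gaussian Mixture Model
2.2.2 Vector Quantization
2.2.3 EM Algorithm
2.3 Speaker Model Adaptation
2.4 Speaker Verification Stage
Chapter 3 Background Speaker Models
3.1 Anti-Speaker Model
3.1.1 Anti-Speaker Model Selection Method
3.2 Universal Background Model
Chapter 4 Speaker Dependent Background Model
4.1 SDBM Criterion
4.2 SDBM Adaptation Method
4.2.1 Competing Speaker Selection Method
4.2.2 SDBM Objective Function
4.2.3 Combined Probability Reduction Algorithm
4.3 SDBM Speaker Verification System
Chapter 5 Experimental Results and Discussion
5.1 Speech Database
5.2 Performance Evaluation
5.2.1 Equal Error Rate
5.2.2 Decision Cost Function
5.3 Experimental Results
5.3.1 Experiment 1: Performance Comparison of Four Adaptation Methods
5.3.2 Experiment 2: Speed Comparison of Four Adaptation Methods
5.3.3 Experiment 3: Performance Comparison of Five Verification Systems
5.3.4 Experiment 4: Speed Comparison of Five Verification Systems
Chapter 6 Conclusions and Future Work
6.1 Conclusions
6.2 Future Work
References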


Advisor

Yau-tarng Juang (莊堯棠)

Approval Date

2015-7-13
