Thesis/Dissertation 100522075 Detailed Record




Name 王光耀 (Kuang-Yao Wang)     Department Computer Science and Information Engineering
Thesis title A Study on Sparse Representation Based Speaker Recognition
(基於稀疏表示之語者辨識之研究)
Files Full text is not open to the public (never released)
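Abstract (Chinese) Speaker recognition has long been a popular topic in speech research and has a wide range of applications, with access control systems as a representative example. In current research, systems that use i-vectors as features perform very well, and in the recognition field the Sparse Representation Classifier (SRC) is a mainstream research direction. We therefore take an i-vector and SRC system as our baseline and propose improvements to it.
This thesis proposes a recognition system based on sparse representation that adds improvements to the existing pipeline. First, for feature extraction, the supervector is constructed with PPCA, and a statistical test is added to adjust eigenvalue selection so that the dimension of each component can adapt to the data. Second, we strengthen the sparse dictionary: we propose a method for selecting the principal components of the dictionary and compensate for session and channel variability, making the dictionary more discriminative. Third, for the noise dictionary, we propose three ways of collecting variability, using the concepts of Robust PCA, NAP, and JFA respectively to decompose out the noise term, with the aim of letting the noise basis absorb the variability and thereby achieve denoising. Finally, the coefficients are solved with Approximate Bayesian Compressed Sensing (ABCS), which is based on Bayesian probability; a semi-Gaussian prior is assumed on the coefficients to enforce their sparsity.
The experimental results show that the improved features, the dictionary processing, and the coefficient-solving method each improve the recognition rate to some degree.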
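The PPCA step mentioned in the abstract can be illustrated with the standard closed-form maximum-likelihood solution of Tipping and Bishop. The sketch below is not the thesis's implementation: the function names `ppca_ml` and `ppca_latent_mean`, the synthetic data, and the fixed latent dimension `q` are assumptions made for illustration, and the test-based choice of dimension described in the abstract is not implemented here.

```python
import numpy as np

def ppca_ml(X, q):
    """Closed-form maximum-likelihood PPCA with q latent dimensions.

    X has one observation per row. Returns the mean, the loading
    matrix W (d x q), and the isotropic noise variance sigma2.
    """
    mu = X.mean(axis=0)
    S = np.cov(X, rowvar=False, bias=True)        # ML covariance (1/N)
    eigvals, eigvecs = np.linalg.eigh(S)          # ascending order
    eigvals, eigvecs = eigvals[::-1], eigvecs[:, ::-1]  # make descending
    sigma2 = eigvals[q:].mean()                   # mean of discarded eigenvalues
    scale = np.sqrt(np.maximum(eigvals[:q] - sigma2, 0.0))
    W = eigvecs[:, :q] * scale                    # scale each retained eigenvector
    return mu, W, sigma2

def ppca_latent_mean(x, mu, W, sigma2):
    """Posterior mean E[z | x] = M^{-1} W^T (x - mu), with M = W^T W + sigma2 I."""
    M = W.T @ W + sigma2 * np.eye(W.shape[1])
    return np.linalg.solve(M, W.T @ (x - mu))

# Hypothetical data: 2 latent factors observed in 6 dimensions plus noise.
rng = np.random.default_rng(0)
Z = rng.standard_normal((2000, 2))
A = rng.standard_normal((2, 6))
X = Z @ A + 0.1 * rng.standard_normal((2000, 6))

mu, W, sigma2 = ppca_ml(X, q=2)
z = ppca_latent_mean(X[0], mu, W, sigma2)         # low-dimensional representation
```

In this sketch `q` is fixed by hand; a data-driven rule such as the test the abstract refers to would replace that choice.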
Abstract (English) Speaker recognition has long been a popular topic in speech research and is applied in many areas, with access control systems as a representative application. Currently, i-vector based speaker recognition systems achieve strong performance, and much research concentrates on the Sparse Representation Classifier (SRC). We therefore base our system on these two concepts, i-vector and SRC, and propose several methods to improve it.
For feature extraction, we construct a supervector with Probabilistic Principal Component Analysis (PPCA) and choose the number of eigenvalues with the Bartlett test, so that an appropriate dimension can be selected for each component. In the second part of the system, we enhance the sparse dictionary by choosing its primary elements and compensating for session and channel variability, which makes the dictionary more discriminative. In the third part, we propose a noise dictionary built by collecting the noise terms obtained from Robust PCA, Nuisance Attribute Projection (NAP), and Joint Factor Analysis (JFA). We expect the noise basis to absorb some of the variability and thereby achieve a denoising effect. Finally, we solve for the sparse coefficients using Approximate Bayesian Compressed Sensing (ABCS), a Bayesian probabilistic method, and enforce sparsity by assuming a semi-Gaussian prior on the coefficients.
Experimental results verify that the selected features, the dictionary processing, and the method for solving the coefficients each improve the recognition rate to some extent.
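The SRC idea named in the abstract can be sketched as follows: represent a test vector as a sparse combination of training vectors (the dictionary), then assign it to the class whose atoms reconstruct it with the smallest residual. This is a minimal illustration under stated assumptions, not the thesis's system: the sparse solver here is a plain Orthogonal Matching Pursuit rather than ABCS, the random vectors stand in for i-vectors, no noise dictionary is appended, and the names `omp` and `src_classify` are hypothetical.

```python
import numpy as np

def omp(D, y, n_nonzero):
    """Greedy Orthogonal Matching Pursuit: approximate y with few columns of D."""
    residual = y.copy()
    support = []
    coef = np.zeros(D.shape[1])
    for _ in range(n_nonzero):
        # Pick the atom most correlated with the current residual.
        idx = int(np.argmax(np.abs(D.T @ residual)))
        if idx in support:
            break
        support.append(idx)
        # Re-fit all selected atoms jointly by least squares.
        sol, *_ = np.linalg.lstsq(D[:, support], y, rcond=None)
        residual = y - D[:, support] @ sol
        if np.linalg.norm(residual) < 1e-10:
            break
    coef[support] = sol
    return coef

def src_classify(D, labels, y, n_nonzero=5):
    """Return the class with the smallest class-wise reconstruction residual."""
    D = D / np.linalg.norm(D, axis=0, keepdims=True)   # unit-norm atoms
    x = omp(D, y, n_nonzero)
    classes = np.unique(labels)
    residuals = [np.linalg.norm(y - D[:, labels == c] @ x[labels == c])
                 for c in classes]
    return classes[int(np.argmin(residuals))], x

# Hypothetical data: 3 speakers, 10 training vectors each, 20 dimensions.
rng = np.random.default_rng(0)
dim, n_speakers, n_train = 20, 3, 10
means = rng.standard_normal((n_speakers, dim))
D = np.hstack([means[s][:, None] + 0.1 * rng.standard_normal((dim, n_train))
               for s in range(n_speakers)])
labels = np.repeat(np.arange(n_speakers), n_train)

y = means[1] + 0.1 * rng.standard_normal(dim)          # test vector from speaker 1
pred, x = src_classify(D, labels, y)
```

A noise dictionary in the sense of the abstract would add extra columns to `D` that belong to no speaker, so that part of the variability is absorbed by those atoms instead of by the speaker atoms.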
Keywords (Chinese) ★ Speaker recognition
★ Sparse representation
★ Probabilistic principal component analysis
★ Supervector
★ i-vector
Keywords (English)
Table of contents Abstract (Chinese)
Abstract (English)
Table of Contents
List of Figures
List of Tables
Chapter 1 Introduction
1.1 Preface
1.2 Research Motivation and Objectives
1.3 Research Methods and Chapter Overview
Chapter 2 Overview of Speaker Recognition and Literature Review
2.1 Introduction
2.2 Feature Parameters
2.2.1 Linear Predictive Cepstrum Coefficients (LPCC)
2.2.2 Mel-scale Frequency Cepstral Coefficients (MFCC)
2.2.3 Prosodic Feature Extraction
2.2.4 GMM-Supervector
2.3 Variability Compensation Algorithms and Classifier Methods
2.3.1 Gaussian Mixture Model (GMM)
2.3.2 Support Vector Machine (SVM)
2.3.3 Kernel Function
2.3.4 Nuisance Attribute Projection (NAP)
2.3.5 Joint Factor Analysis (JFA)
2.3.6 ZT-norm
Chapter 3 PPCA-Based Supervector Extraction
3.1 Introduction
3.2 GMM-Supervector
3.3 Factor Analysis Model Based on Probabilistic Principal Component Analysis
3.4 Bartlett Test
3.5 i-vector
3.6 Feature Extraction Framework
Chapter 4 Speaker Recognition Based on Sparse Representation
4.1 Introduction
4.2 Sparse Representation Classifier (SRC)
4.3 Dictionary Processing and Variability Compensation
4.4 Noise Dictionary
4.5 Approximate Bayesian Compressed Sensing (ABCS)
Chapter 5 Experimental Results
5.1 Experimental Setup and Environment
5.2 Comparison of PPCA-Supervector with Baseline Methods
5.3 Performance Comparison of Dictionary Processing and Variability Compensation
5.3.1 Effect of Constructing the Dictionary with SVD and RPCA
5.3.2 Effect of NAP on Variability Compensation
5.3.3 Effect of Kernel SRC
5.4 Effect of the Noise Dictionary on SRC
5.5 Effect of Solving Coefficients with ABCS
Chapter 6 Conclusions and Future Work
References
Advisor 王家慶 (Jia-Ching Wang)     Approval date 2013-8-27

For questions about this thesis, please contact the Extension Services Section of the National Central University Library, TEL: (03)422-7151 ext. 57407, or by e-mail.