References
[1] J. P. Campbell, “Speaker Recognition: A Tutorial,” Proceedings of the IEEE, Volume 85, Issue 9, Sept. 1997, 1437-1462.
[2] M. Faundez-Zanuy and E. Monte-Moreno, “State-of-the-Art in Speaker Recognition,” IEEE Aerospace and Electronic Systems Magazine, Volume 20, Issue 5, March 2005, 7-12.
[3] R. Mammone, X. Zhang, and R. Ramachandran, “Robust Speaker Recognition – A Feature-based Approach,” IEEE Signal Processing Magazine, Sept. 1996, 58-71.
[4] H. A. Murthy, F. Beaufays, L. P. Heck, and M. Weintraub, “Robust Text-Independent Speaker Identification over Telephone Channels,” IEEE Trans. Speech Audio Processing, Volume 7, No. 5, September 1999.
[5] J. Pelecanos and S. Sridharan, “Feature Warping for Robust Speaker Verification,” Proc. 2001: A Speaker Odyssey, The Speaker Recognition Workshop, 2001.
[6] D. A. Reynolds, “Channel Robust Speaker Verification via Feature Mapping,” Proc. ICASSP 2003, Volume 2, 2003, II – 53-6.
[7] R. Teunen, B. Shahshahani, and L. P. Heck, “A Model Based Transformational Approach to Robust Speaker Recognition,” Proc. ICSLP 2000, Volume 2, 2000, 495-498.
[8] Y. F. Liao, J. H. Yang, Z. X. Zhuang, and S. H. Chen, “A Priori Knowledge Interpolation-based Approach for Handset Mismatch-Compensated Speaker Identification,” submitted to IEEE Transactions on Audio, Speech and Language Processing.
[9] J. H. Yang and Y. F. Liao, “Unseen Handset Mismatch Compensation Based on Feature/Model-Space A Priori Knowledge Interpolation for Robust Speaker Recognition,” Proc. ISCSLP 2004, 2004, 65-68.
[10] D. A. Reynolds, T. F. Quatieri, and R. B. Dunn, “Speaker Verification Using Adapted Gaussian Mixture Models,” Digital Signal Processing, Volume 10, Jan. 2000, 19-41.
[11] M. K. Sonmez, L. Heck, M. Weintraub, and E. Shriberg, “A Lognormal Tied Mixture Model of Pitch for Prosody-Based Speaker Recognition,” Proc. EUROSPEECH 1997 (Rhodes, Greece), Volume 3, September 1997, 1391-1394.
[12] M. J. Carey, E. S. Parris, H. Lloyd-Thomas, and S. Bennett, “Robust Prosodic Features for Speaker Identification,” Proc. ICSLP 1996, 1996, 1800-1803.
[13] K. Sonmez, E. Shriberg, L. Heck, and M. Weintraub, “Modeling Dynamic Prosodic Variation for Speaker Verification,” In R. H. Mannell and J. Robert-Ribes (Eds.), Proc. ICSLP 1998 (Sydney), Volume 7, 1998, 3189-3192.
[14] A. G. Adami, R. Mihaescu, D. A. Reynolds, and J. J. Godfrey, “Modeling Prosodic Dynamics for Speaker Recognition,” Proc. ICASSP 2003, Volume 4, April 2003, IV – 788-91.
[15] S. Kajarekar, L. Ferrer, K. Sonmez, J. Zheng, E. Shriberg, and A. Stolcke, “Modeling NERFs for Speaker Recognition,” Proc. Odyssey 2004 Speaker and Language Recognition Workshop (Toledo, Spain), pp. 51-56, June 2004.
[16] D. Reynolds, W. Andrews, J. Campbell, J. Navratil, B. Peskin, A. Adami, Q. Jin, D. Klusacek, J. Abramson, R. Mihaescu, J. Godfrey, D. Jones, and B. Xiang, “The SuperSID Project: Exploiting High-Level Information for High-Accuracy Speaker Recognition,” Proc. ICASSP 2003, Volume IV, 2003, 784-787.
[17] E. Shriberg, L. Ferrer, S. Kajarekar, A. Venkataraman, and A. Stolcke, “Modeling Prosodic Feature Sequences for Speaker Recognition,” Speech Communication, Volume 46, 2005, 455-472.
[18] “NIST - Speaker Recognition Evaluations,” http://www.nist.gov/speech/tests/spk/index.htm
[19] “NIST 2001 Speaker Recognition Evaluation – Extended Data task,” http://www.nist.gov/speech/tests/spk/2001/extended-data/
[20] Z. H. Chen, Y. F. Liao, and Y. T. Juang, “Prosodic Modeling and Eigen-Prosody Analysis for Robust Speaker Recognition,” Proc. ICASSP 2005, Volume 1, March 2005, 185-188.
[21] R. Baeza-Yates and B. Ribeiro-Neto, Modern Information Retrieval, Addison-Wesley, 1999.
[22] L. P. Jing, H. K. Huang, and H. B. Shi, “Improved Feature Selection Approach TFIDF in Text Mining,” Proc. 2002 International Conference on Machine Learning and Cybernetics, Volume 2, 2002, 944-946.
[23] G. W. Furnas, S. Deerwester, S. T. Dumais, T. K. Landauer, R. A. Harshman, L. A. Streeter, and K. E. Lochbaum, “Information Retrieval Using A Singular Value Decomposition Model of Latent Semantic Structure,” Proc. SIGIR, 1988, 465-480.
[24] S. Deerwester, S. T. Dumais, G. W. Furnas, T. K. Landauer, and R. Harshman, “Indexing by Latent Semantic Analysis,” Journal of the American Society for Information Science, Volume 41, 1990, 391-407.
[25] T. Hofmann, “Probabilistic Latent Semantic Analysis,” Proceedings of the Fifteenth Annual Conference on Uncertainty in Artificial Intelligence (UAI-99), San Francisco, CA, 1999, 289-296.
[26] T. Hofmann, “Unsupervised Learning by Probabilistic Latent Semantic Analysis,” Machine Learning, 42, 2001, 177-196.
[27] TIMIT Speech Database, http://www.mpi.nl/world/tg/corpora/timit/timit.html
[28] D. A. Reynolds, “HTIMIT and LLHDB: Speech corpora for the study of handset transducer effects,” Proc. ICASSP 1997, Volume 2, 1997, 1535-1538.
[29] M. Hasegawa-Johnson, K. Chen, J. Cole, S. Borys, S. S. Kim, A. Cohen, T. Zhang, J. Y. Choi, H. Kim, T. Yoon, and S. Chavarria, “Simultaneous Recognition of Words and Prosody in the Boston University Radio Speech Corpus,” Speech Communication, 46(3-4), 2005, 418-439.
[30] K. Chen, M. Hasegawa-Johnson, A. Cohen, S. Borys, S. S. Kim, J. Cole, and J. Y. Choi, “Prosody Dependent Speech Recognition on Radio News Corpus of American English,” IEEE Transactions on Speech and Audio Processing, 14(1), 2006, 232-245.
[31] K. J. Chen and W. Y. Ma, “Unknown Word Extraction for Chinese Documents,” Proc. COLING 2002, 2002, 169-175.
[32] A. P. Dempster, N. M. Laird, and D. B. Rubin, “Maximum Likelihood from Incomplete Data via the EM Algorithm,” Journal of the Royal Statistical Society, Ser. B, 39, 1977, 1-38.
[33] T. J. Hazen, “A Comparison of Novel Techniques for Rapid Speaker Adaptation,” Speech Communication, Volume 31, May 2000, 15-33.
[34] C. J. Leggetter and P. C. Woodland, “Maximum Likelihood Linear Regression for Speaker Adaptation of Continuous Density Hidden Markov Models,” Computer Speech Lang., Vol. 9, 1995, 171-185.
[35] M. Nishida and T. Kawahara, “Speaker Indexing and Adaptation using Speaker Clustering Based on Statistical Model Selection,” Proc. ICASSP 2004, Volume 1, May 2004, I – 353-56.
[36] D. Liu and F. Kubala, “Online Speaker Clustering,” Proc. ICASSP 2004, Volume 1, 2004, I – 333-6.
[37] J. L. Gauvain and C. H. Lee, “Maximum A Posteriori Estimation for Multivariate Gaussian Mixture Observations of Markov Chains,” IEEE Trans. on Speech and Audio Processing, 2, 1994, 291-298.
[38] B. H. Juang, W. Chou, and C. H. Lee, “Minimum Classification Error Rate Methods for Speech Recognition,” IEEE Trans. on Speech and Audio Processing, Volume 5, No. 3, May 1997.
[39] K. Sjölander and J. Beskow, “Wavesurfer,” http://www.speech.kth.se/wavesurfer/
[40] K. Sjölander, “Snack Sound Toolkit,” http://www.speech.kth.se/snack/
[41] I. J. Good, “The Population Frequencies of Species and the Estimation of Population Parameters,” Biometrika, Volume 40 (3, 4), 1953, 237-264.
[42] G. Doddington, “Speaker Recognition based on Idiolectal Differences between Speakers,” Proc. EUROSPEECH 2001 (Aalborg, Denmark), 2001, 2521-2524.
[43] B. Xiang, “Text-independent Speaker Verification with a Dynamic Trajectory Model,” IEEE Signal Processing Letters, 10(5), 2003, 141-143.
[44] Z. H. Chen, Z. R. Zeng, Y. F. Liao, and Y. T. Juang, “Probabilistic Latent Prosody Analysis for Robust Speaker Verification,” Proc. ICASSP 2006, 2006.
[45] W. C. Chang, D. Y. Chen, Z. H. Chen, Z. R. Zeng, Y. F. Liao, and Y. T. Juang, “Incorporating Prosodic with Acoustic information for ISCSLP 2006 Speaker Recognition Evaluation – Robust Cross-Channel Speaker Verification,” Proc. ISCSLP 2006, 2006.