Robust Speech Recognition in Noisy Environments (國語語音強健辨認之研究)

Author

Kuo-Chang Huang (黃國彰)

Department

Department of Electrical Engineering

Thesis Title

國語語音強健辨認之研究
(Robust speech recognition in noisy environments)

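Related Theses

★ A Study of Miniaturized GSM/GPRS Mobile Communication Modules	★ A Study of Speaker Identification
★ Robustness Analysis of Perturbed Singular Systems Using a Projection Method	★ Speaker Verification Using Support Vector Machine Models to Improve the Alternative-Hypothesis Characterization Function
★ Speaker Verification Combining Gaussian Mixture Supervectors and Derivative Kernel Functions	★ Agile-Movement Particle Swarm Optimization Method
★ Application of an Improved Particle Swarm Method to Lossless Predictive Image Coding	★ A Study of Particle Swarm Optimization Applied to Speaker Model Training and Adaptation
★ A Speaker Verification System Based on Particle Swarm Optimization	★ A Study of Improved Mel-Frequency Cepstral Coefficients Combined with Multiple Speech Features
★ A Speaker Verification System Using Speaker-Specific Background Models	★ Intelligent Remote Monitoring System
★ Stability Analysis and Controller Design for Positive Systems with Output Feedback	★ Hybrid Interval-Search Particle Swarm Optimization Algorithm
★ A Study of Gesture Recognition Based on Deep Neural Networks	★ A Posture-Correcting Necklace with Image-Recognition Auto-Calibration and a Mobile Phone Alert System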

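This electronic thesis is authorized for immediate open access.
Electronic full texts that have reached their open-access date are licensed to users solely for personal, non-profit retrieval, reading, and printing for academic research purposes.
Please comply with the Copyright Act of the Republic of China; do not reproduce, distribute, adapt, repost, or broadcast the work without authorization, to avoid violating the law.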

Abstract

Despite sophisticated present-day automatic speech recognition (ASR) techniques, a single recognizer is usually incapable of accounting for the varying conditions in a typical natural environment. Higher robustness to a range of noise conditions can potentially be achieved by combining the results of several recognizers operating in parallel. To improve the performance of speech recognition systems under additive noise, special attention should be paid to robust features and model compensation.
This thesis addresses noise robustness in automatic speaker-independent speech recognition. Two problems are treated: model compensation and robust features.
In the model compensation stage, we first investigate a projection-based group delay scheme (PGDS) likelihood measure that significantly reduces the effect of noise contamination on speech recognition. Because the norm of the cepstral/GDS vector shrinks when the speech signal is corrupted by additive noise, the HMM parameters, namely the mean vector and the covariance matrix, need to be modified accordingly. The proposed approach compensates the mean vector using a projection-based scale factor and a mean compensation bias, and fits the covariance matrix using a variance adaptive function. The bias and the variance adaptive function, estimated from the training and/or testing data, are used to reduce the mismatch between environments. Lastly, a state duration method is used to address the segmentation errors that additive noise introduces into the Viterbi decoding path.
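The idea of a projection-based scale factor can be illustrated with a minimal sketch. It is not the thesis's PGDS likelihood measure: it assumes a plain squared Euclidean distance instead of a Gaussian likelihood, and the names `projection_scale`, `compensated_distance`, and `bias` are hypothetical. The scale is the least-squares value that best maps the clean mean onto the shrunken noisy observation.

```python
def projection_scale(x, mu):
    """Least-squares scale lam that minimizes ||x - lam * mu||^2."""
    denom = sum(m * m for m in mu)
    if denom == 0.0:
        return 0.0
    return sum(a * m for a, m in zip(x, mu)) / denom


def compensated_distance(x, mu, bias=None):
    """Squared distance between x and the scaled, bias-shifted mean.

    The bias term is a hypothetical stand-in for a mean compensation bias;
    it defaults to zero here.
    """
    lam = projection_scale(x, mu)
    if bias is None:
        bias = [0.0] * len(mu)
    return sum((a - (lam * m + b)) ** 2 for a, m, b in zip(x, mu, bias))


# Toy example: the noisy observation points in the same direction as the
# clean mean, but its norm has shrunk (assumed shrink factor 0.6).
clean_mean = [1.0, -2.0, 0.5]
noisy_obs = [0.6 * v for v in clean_mean]

lam = projection_scale(noisy_obs, clean_mean)
plain = sum((a - m) ** 2 for a, m in zip(noisy_obs, clean_mean))
projected = compensated_distance(noisy_obs, clean_mean)
print(lam, plain, projected)
```

In this toy case the uncompensated distance is penalized by the norm shrinkage alone, while the projected distance is close to zero because only the direction of the vector is compared.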
Second, we propose a model compensation method that is similar to parallel model combination. The method rests on the fact that the autocorrelation function of the sum of two statistically independent signals equals the sum of their individual autocorrelation functions. Therefore, to adjust a clean model, its state spectral representation is transformed from the autoregressive, or cepstral, domain to the autocorrelation domain. The autocorrelation of the clean model is then added to a sample of the autocorrelation of the additive noise, giving the autocorrelation of the noisy signal, which is transformed back to the original spectral representation. The result is an adjusted model that is better able to handle the noisy signal.
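The additivity property that this method relies on can be checked numerically. The sketch below is only an illustration under stated assumptions: the "speech" is a hypothetical first-order autoregressive process, the noise is white Gaussian, and the helper `autocorr` is a simple biased estimator, none of which come from the thesis. For zero-mean independent signals the estimated autocorrelation of the sum should approximately match the sum of the individual estimates.

```python
import random


def autocorr(signal, max_lag):
    """Biased autocorrelation estimate r[k] = (1/N) * sum_n s[n] * s[n + k]."""
    n = len(signal)
    return [
        sum(signal[i] * signal[i + k] for i in range(n - k)) / n
        for k in range(max_lag + 1)
    ]


rng = random.Random(0)
n = 20000

# Hypothetical stand-in for speech: a first-order autoregressive process.
speech = [0.0] * n
for i in range(1, n):
    speech[i] = 0.9 * speech[i - 1] + rng.gauss(0.0, 1.0)

# Independent additive white noise (assumed standard deviation 2.0).
noise = [rng.gauss(0.0, 2.0) for _ in range(n)]
noisy = [s + v for s, v in zip(speech, noise)]

r_speech = autocorr(speech, 3)
r_noise = autocorr(noise, 3)
r_noisy = autocorr(noisy, 3)
r_sum = [a + b for a, b in zip(r_speech, r_noise)]

for lag, (direct, summed) in enumerate(zip(r_noisy, r_sum)):
    print(lag, round(direct, 3), round(summed, 3))
```

The two columns agree up to a small cross-correlation term that shrinks as the signal length grows, which is why a clean-model autocorrelation and a noise autocorrelation can be added to approximate the noisy one.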
Most speech recognition systems are based on cepstral coefficients and their first- and second-order derivatives. The derivatives are normally approximated by fitting a linear regression line to a fixed-length segment of consecutive frames. The time resolution and smoothness of the estimated derivative depend on the length of the segment. Herein, we present an approach to improving the representation of speech dynamics that is based on combining multiple time resolutions. To illustrate the procedure, we consider two different sets of feature combinations. In the first system, we combine separate input streams that use different features, i.e., the cepstral and group delay spectrum coefficients, which leads to higher performance in all noise conditions. In the second system, we extract features over variable-sized windows of three or five times the original window size. Because different feature combinations and multi-scale features capture different information and are more robust to noise, the robust integration system achieved a significant performance improvement on both clean speech and speech in real environmental noise.
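The regression-based derivative and its combination over several time resolutions can be sketched as follows. This is a minimal illustration, not the thesis's feature extractor: the function names, the choice of half-widths (1, 3, 5), and the edge handling by repeating the first and last frames are all assumptions made for the example.

```python
def delta(frames, half_width):
    """Regression-slope derivative of a sequence of feature vectors.

    Uses d[t] = sum_k k * (c[t+k] - c[t-k]) / (2 * sum_k k^2), k = 1..half_width.
    Frames beyond the edges are replaced by the first or last frame (assumption).
    """
    n = len(frames)
    dim = len(frames[0])
    norm = 2.0 * sum(k * k for k in range(1, half_width + 1))
    out = []
    for t in range(n):
        vec = []
        for d in range(dim):
            acc = 0.0
            for k in range(1, half_width + 1):
                ahead = frames[min(t + k, n - 1)][d]
                behind = frames[max(t - k, 0)][d]
                acc += k * (ahead - behind)
            vec.append(acc / norm)
        out.append(vec)
    return out


def multi_resolution_features(frames, half_widths=(1, 3, 5)):
    """Append derivatives computed at several window lengths to each frame."""
    deltas = [delta(frames, w) for w in half_widths]
    return [
        list(frames[t]) + [v for d in deltas for v in d[t]]
        for t in range(len(frames))
    ]


# Toy input: two coefficients that change linearly with the frame index.
ramp = [[0.5 * t, -1.0 * t] for t in range(12)]
features = multi_resolution_features(ramp)
print(features[6])
```

On a linear ramp every window length recovers the same slope, so the example mainly shows the mechanics. On real speech, short windows follow fast transitions while long windows give smoother, less noise-sensitive estimates, which is the motivation for combining them.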

Keywords

★ robust features
★ model compensation

Table of Contents

Abstract
Acknowledgements
List of Figures
List of Tables
1 Introduction
1.1 Automatic speech recognition
1.2 Difficulty of the speech recognition task
1.3 Speech recognition in the real world conditions
1.4 Speech recognition in noise
1.5 Objectives of the thesis
1.6 Dissertation outline
2 Overview of Environmental Robustness in Speech Recognition
2.1 Introduction
2.2 Speech recognition with hidden Markov models
2.2.1 Feature extraction of the speech signal
2.2.2 Model structure
2.2.3 Training of the models
2.2.4 Viterbi algorithm
2.3 Speech recognition in noise using HMM based system
2.3.1 Speech enhancement
2.3.2 Robust parameters
2.3.3 Model based techniques
2.4 Summary
3 Databases and recognition systems
3.1 Introduction
3.2 Databases
3.2.1 MAT2000 database
3.2.2 MAT400 database
3.2.3 Noises from NOISEX-92 database
3.3 Perturbations of the speech signal
3.3.1 Additive noise
3.3.2 Estimation of signal to noise ratio
3.4 Recognition system
3.4.1 Base recognition system
3.4.2 Word recognition system
4 Projection-based Group Delay Scheme
4.1 Introduction
4.2 Overview of projection-based group delay scheme
4.2.1 Analysis of the noisy group delay spectrum (GDS)
4.2.2 Projection-based group delay spectrum measure
4.3 Mean compensation likelihood measure
4.4 Variance compensation likelihood measure
4.5 State duration distribution
4.6 Experimental results
4.7 Summary
5 Weighted Autocorrelation Integration for noise compensation
5.1 Introduction
5.2 Voice activity detection
5.2.1 Detection based on energy estimation
5.2.2 Other criteria used for speech detection
5.2.3 Estimation of the noise over the whole signal
5.3 Autocorrelation Integration
5.3.1 Adjusting the mean vector
5.3.2 Adjusting the variance vector
5.4 Weighted acoustic modeling for HMMs
5.5 Experimental results
5.6 Summary
6 Robust integration for speech features
6.1 Introduction
6.2 Feature weighting of Cepstral/GDS coefficients
6.3 Multiply timescale of feature combination
6.4 Experiments and results
6.5 Summary
7 Conclusions
7.1 Summary of findings and contributions of this thesis
7.2 Future directions
Bibliography

參考文獻

Acero, A. (1990). Acoustical and environmental robustness in automatic speech recognition. Ph.D. Thesis, Department of Electrical and Computer Engineering, Carnegie Mellon University.
Acero, A. (1993). Acoustical and environmental robustness in automatic speech recognition. Kluwer Academic Publishers.
Allen, J.B. (1994). How do humans process and recognize speech. IEEE Trans. on Speech and Audio Processing, vol. 2, no. 4, pp. 567–577.
Atal, B. (1974). Effectiveness of linear prediction characteristics of the speech wave for automatic speaker identification and verification. Journal of Acoustical Society of America, vol . 55, pp. 1304–1312.
Bahl, L.R., Brown, P.F., de Souza, P.V., and Mercer, R.L. (1988). A new algorithm for the estimation of hidden Markov model parameters. In Proceedings of the ICASSP, pages 493–496.
Bahoura, M., and Rouat, J. (2001). A New Approach for Wavelet Speech Enhancement. In Proceedings of the European Conf. Speech Communication Technology (Eurospeech2001), pp. 1937-194, Aalborg, Denmark.
Bateman, D.C., Bye, D.K., Hunt, M.J. (1992). Spectral contrast normalization and other techniques for speech recognition in noise. In Proceeding of the IEEE 1992 International Conference on Acoustic, Speech and Signal Processing (ICASSP92), pages 241-244, San Francisco, USA.
Beattie, V.L., and Young, S.J. (1991). Noisy speech recognition using hidden Markov model state-based filtering. In Proceedings of the ICASSP, Speech and Signal Processing, pages 917–920.
Bellegarda, J.R. (1997). Statistical techniques for robust ASR: Review and perspectives. In Proceedings of European Conference on Speech Communication and Technology (Eurospeech1997), pages 33-36, Rhodes, Greece.
Berger, J.O. (1985). Statistical Decision Theory and Bayesian Analysis. Springer-Verlag, New York, 2nd edition.
Berouti, M., Schwartz, R., and Makhoul, J. (1979). Enhancement of speech corrupted by acoustic noise. In Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, pages 208–211.
Berstein, A.D., and Shallom, I.D. (1991). An hypothesized Wiener filtering approach to noisy speech recognition. In Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, pages 913–916.
Bishop, C. (1995). Neural networks for pattern recognition. Clarendon Press, Oxford
Bocchieri, E., and Doddington, G. (1986). Frame-specific statistical features for speaker independent speech recognition. IEEE Trans. on Acoustics, Speech and Signal Processing, vol. 34, no. 4, pp. 755–764.
Bocchieri, E., and Doddington, G. (1987). Statistical features versus word templates for speaker independent digits recognition over long-distance telephone connection. In Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, pages 1123–1226.
Boll, S.F. (1979). Suppression of acoustic noise in speech using spectral subtraction. IEEE Trans. on Acoustics, Speech and Signal Processing, vol. 27, pp. 113–120.
Boll, S.F. (1992). Speech enhancement in the 1980s: Noise suppression with pattern matching. Advances in Speech Signal Processing. ed. by Furui, S., and Sonfhi, M.M., (Marcel Dekker. New York), Chapter 10.
Bourlard, H. (1999). Non-stationary multi-channel (multi-stream) processing towards robust and adaptive ASR. In Workshop on Robust Methods for Speech Recognition in Adverse Conditions, pages 1–10, Tampere, Finland.
Bourlard, H. and Dupont, S. (1997). Subband-based speech recognition. In Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, pages 1251–1254.
Carlson, B.A., Clements, M.A. (1991). Application of a weighted projection measure for robust hidden Markov model based speech recognition. In Proceeding of the IEEE 1991 International Conference on Acoustic, Speech and Signal Processing (ICASSP91), pages 921-924, Toronto, Canada.
Carlson, B.A., Clements, M.A. (1992). Speech recognition in noise using a projection-based likelihood measure for mixture density HMM’s. In Proceeding of the IEEE 1992 International Conference on Acoustic, Speech and Signal Processing (ICASSP92), pages 237-240, San Francisco, USA.
Carlson, B.A., Clements, M.A. (1994). A projection-based likelihood measure for speech recognition in noise. IEEE Trans. Speech and Audio Processing, vol. 2, pp. 97-102.
Chien, J.-T. (2001). Combined Linear Regression Adaptation and Bayesian Predictive Classification for Robust Speech Recognition. In Proceedings of the European Conf. Speech Communication Technology (Eurospeech2001), pp. 1131-1135, Aalborg, Denmark.
Compernolle, D.V. (1989a). Noise adaptation in hidden Markov model speech recognition system. Computer Speech and Language, vol. 3, no. 2, pp. 151–168.
Compernolle, D.V. (1989b). Spectral estimation using a log-distance error criterion applied to speech recognition. In Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, pages 258–261.
Cooke, M., Morris, A., and Green, P. (1996). Recognition of occluded speech. In ESCA ETRW on the Auditory Basis of Speech Perception.
Deller, J.R., Proakis, J.G. and Hansen, J.H.L. (1953). Discrete-time processing of speech signals. Macmillan Publishing Company, 1993. H. Fletcher. Speech and hearing in communication. Krieger, New-York.
DeGroot, M.H. (1970). Optimal Statistical Decisions. McGraw-Hill, New York.
Dubois, D. (1991). Comparison of time-dependent acoustic features for a speaker independent speech recognition system. In Eurospeech, pages 935–938.
Duda, R.O., Hart, P.E. (1973). Pattern Classification and Scene Analysis. New York : Wiley.
Ephraim, Y. (1992a). A Bayesian estimation approach for speech enhancement using hidden Markov models. IEEE Trans. on Signal Processing, vol. 40, no. 4, pp. 725–735.
Ephraim, Y. (1992b). Statistical-model-based speech enhancement systems. IEEE Proceedings, vol. 80, no. 10, pp. 1526–1555.
Ephraim, Y., and Juang, B.-H. (1988). On the adaptation of hidden Markov models for enhancing noisy speech. In Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, pages 533–536.
Ephraim, Y., and Malah, D. (1984). Speech enhancement using a minimum mean-square error short time spectral amplitude estimator. IEEE Trans. on Acoustics, Speech and Signal Processing, vol. 32, no. 6, pp. 1109–1121.
Ephraim, Y., Malah, D., and Juang, B.-H. (1989). On the application of hidden Markov models for enhancing noisy speech. IEEE Trans. on Acoustics, Speech and Signal Processing, vol. 37, no. 12, pp. 1846–1856.
Erell, A., and Weintraub, M. (1993a). Filterbank energy estimation for recognition of noisy speech. IEEE Trans. Speech Audio Processing, vol. 1, no. 1, pp. 68–76.
Erell, A., and Weintraub, M. (1993b). Energy conditioned spectral estimation for recognition of noisy speech. IEEE Trans. on Speech and Audio Processing, vol. 1, no. 1, pp. 84–89.
Flores, J.A.N., and Young, S.J. (1993). Adapting a HMM-based recognizer for noisy speech enhanced by spectral subtraction. Technical Report CUED / F-INFENG / TR.123, Cambridge University Electrical Department.
Flores, J.A.N., and Young, S.J. (1993). Adapting a HMM-based recognizer for noisy speech enhanced by spectral subtraction. In Eurospeech, pages 829–832.
Flores J.A.N., and Young, S.J. (1994). Adapting a HMM-based recognizer for noisy speech enhanced by spectral subtraction. In Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, pages 829–832.
Frazier, R., Samsam,S., Braida, L., and Oppenheim, A. (1976). Enhancement of speech by adaptive filtering. In Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, pages 251–253.
Furui, S. (1986a). On the role of spectral transition for speech perception. Journal of Acoustical Society of America, 80(4):1016–1025.
Furui, S. (1986b). Speaker independent isolated word recognition based on emphasized spectral dynamics. In Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, pages 1991–1994.
Furui, S. (1997). Recent advance in robust speech recognition. In Robust speech recognition using unknown communication channel, ESCA-NATO Tutorial and Research Workshop, pages 11–20.
Furui, S., and Sondhi, M.M. (1992). Advances in speech signal processing. Marcel Dekker, New York.
Gales, M.J.F. (1994). PMC for speech recognition in additive and convolutional noise. Technical Report CUED/FINFENG/ TR 154, Cambridge University, Engineering Department.
Gales, M.J.F. (1995). Model-Based Techniques for Noise Robust Speech Recognition. Ph.D. dissertation, Univ. Cambridge, Cambridge, U.K.
Gales, M.J.F. (1998). Predictive model-based compensation schemes for robust speech recognition. Speech Communication, vol. 25, pp. 49-74.
Gales, M.F.J., and Young, S.J. (1992a). An improved approach to the hidden Markov model decomposition of speech and noise. In Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, vol. I, pages 233–236.
Gales, M.F.J., and Young, S.J. (1993a). Cepstral parameter compensation for HMM recognition in noise. Speech Communication, vol. 12, pp. 231–239.
Gales, M.F.J., and Young, S.J. (1993b). HMM recognition in noise using parallel model combination. In Eurospeech, pages 837–840.
Gales, M.F.J., and Young, S.J. (1995). A fast and flexible implementation of parallel model combination. In Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, pages 133–136.
Gales, M.J.F., Pye, D., and Woodland, P.C. (1996). Variance compensation within the MLLR framework for robust speech recognition and speaker adaptation. In Proceedings of the Int. Conf. Spoken Language Process. (ICSLP96), pages 1832-1835, Philadelphia, PA, USA.
Gales, M.J.F., Woodland, P.C. (1996). Mean and Variance within the MLLR framework. Computer Speech and Language, vol. 10, pp. 249-264.
Gao, Y., Huang, T., Chen, S., and Haton, J.-P. (1992). Auditory model based on speech processing. In Int. Conf. on Spoken Language Processing (ICSLP), vol. 1, pages 73–76.
Gauvain, J.-L., Lee, and C.-H. (1994). Maximum a posteriori estimation for multivariate gaussian mixture observations of markov chains. IEEE Trans. Speech Audio Processing. vol. 2, pp. 291–298.
Ghitza, O. (1986). Auditory nerve representation as a front-end for speech recognition in a noisy environment. Computer Speech and Language, vol. 1, pp. 109–130.
Gong, Y. (1995). Speech recognition in noisy environments: a survey. Speech Communication, vol. 16, pp. 261–292.
Gong. Y., and Haton, J.-P. (1994). Stochastic trajectory modeling for speech recognition. In Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, pages 57–60.
Graf, J.T., and Hubing, N. (1993). Dynamic time-warping for the enhancement of speech degraded by white Gaussian noise. In Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, vol. II, pages 339–342.
Hagen, A., Morris, M.C., and Bourlard, H. (1999). Different weighting schemes in the full combination sub-band approach for noise robust ASR. In Workshop on Robust Methods for Speech Recognition in Adverse Conditions, pages 199–202, Tampere, Finland.
Hanson, J., and Applebaum, T. (1990). Robust speaker independent word recognition using static, dynamic and acceleration features: Experiments with Lombard effect and noisy speech. In Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, pages 857–860.
Hermansky, H. (1990). Perceptual linear predictive (PLP) analysis of speech. Journal of Acoustical Society of America, vol. 87, pp. 1738–1752.
Hermansky, H., Hanson, B.A., and Wakita, H. (1985). Perceptually based linear predictive analysis of speech. In Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, pages 509–512.
Hermansky, H., Morgan, N., Bayya, A., and Kohn, P. (1992). RASTA-PLP speech analysis technique. IEEE International Conference on Acoustics, Speech, and Signal Processing, pages I-121 – I-124.
Hermansky, H., Morgan, N., and Hirsch, H. (1993). Recognition of speech in additive and convolutional noise based on RASTA spectral processing. IEEE International Conference on Acoustics, Speech, and Signal Processing, vol. II, pages 83-86.
Hermansky, H., and Sharma, S. (1999). Temporal Patterns (TRAPS) in ASR of Noisy Speech. Proc. ICASSP, vol. I, pages 289-292.
Hermansky, H., Timberwala, S., and Pavel, M. (1996). Towards ASR on partially corrupted speech. In Int. Conf. on Spoken Language Processing (ICSLP), vol. 1, pages 462–465, Philadelphia, PA.
Hernando, J., and Nadeu, C. (1991). A comparative study of parameters and distances for noisy speech recognition. In Eurospeech, pages 91–94.
Hernando, J., and Nadeu, C. (1994). Speech recognition in noisy car environment based on OSALPC representation and robust similarity measuring techniques. In Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, pages 69–72.
Hirsch, H.G., Meyer, P., and Rühl, H.W. (1991). Improved speech recognition using high-pass filtering of subband envelopes. In Eurospeech, pages 413–416.
Holmes, J.N., and Sedgwick, N.C. (1986). Noise compensation for speech recognition using probabilistic models. In Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, pages 741–744.
Huang, K.-C., Tung, S.-L., Juang, Y.-T. (1999). Mean compensation based on projection-based group delay scheme for noisy speech recognition. IEE Electronic Letter, vol. 35, pp. 1432-1434.
Huang, K.-C., Tung, S.-L., Juang, Y.-T. (2001). Application of the variance compensation likelihood measure for robust hidden Markov model in noise. Pattern Recognition Letters, vol. 22, no. 3-4, pp. 353-358.
Huang, K.-C., Tung, S.-L., and Juang, Y.-T. (2003). A likelihood measure based on projection-based group delay scheme for Mandarin speech recognition in noise. Signal Processing, vol. 83, no. 3, pp. 611-626.
Huang, X.D., Arki, Y., and Jack, M.A. (1990). Hidden Markov models for speech recognition. Edinburgh University Press.
Huang, Y., Zhao, Y. and Levinson, S. (1999). A DCT-based fast enhancement technique for robust speech recognition in automobile usage. In Eurospeech, vol. 5, pages 1947–1950.
Hwang, T.-H., Yuo, K.-H., Wang, H.-C. (2001). Linear Interpolation of Cepstral Variance for Noisy Speech Recognition. In Proceedings of the European Conf. Speech Communication Technology (Eurospeech2001), pages 877-881, Aalborg, Denmark.
Itakura, F., and Umezaki, T. (1987). Distance measure for speech recognition based on the smoothed group delay spectrum. In Proceeding of the IEEE 1987 International Conference on Acoustic, Speech and Signal Processing (ICASSP87), pages 1257-1260, Dallas, Texas.
Jelinek, F. (1997). Statistical methods for speech recognition. MIT Press.
Juang, B.-H., Rabiner L., and Wilpon, J.G. (1987). On the use of bandpass filtering in speech recognition. IEEE Trans. on Acoustics, Speech and Signal Processing, pp. 947–954.
Juang, B.H., Wilpon, J.G., and Rabiner, L. (1986). On the use of bandpass filtering in speech recognition. In Proceedings of the IEEE 1986 International Conference on Acoustic, Speech and Signal Processing (ICASSP86), pages 765-768, Tokyo, Japan.
Juang, B.-H., Rabiner, L.R. (1990). The segmental K-means algorithm for estimating parameters of hidden Markov models. IEEE Trans. Signal Processing, vol. 38, pp. 1639-1641.
Junqua, J.-C., and Haton, J.-P. (1996). Robustness in automatic speech recognition: fundamentals and application. Kluwer Academic Publishers.
Junqua, J.-C., Valente, S., Fohr, D., and Mari, J.-F. (1995). An N-best strategy, dynamic grammars and selectively trained neural networks for real-time recognition of continuously spelled names over the telephone. In Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, pages 852–855.
Kadirkamanathan, M., and Varga, A.P. (1991). Simultaneous model re-estimation from contamined data by "composed hidden Markov model modeling". In Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, pages 897–900.
Kim, D.Y., and Un, C.K. (1996). Probabilistic vector mapping with trajectory information for noise-robust speech recognition. IEE Electronics Letters, vol. 32, no. 17, pp. 1550–1551.
Klatt, D.H. (1976). A digital filter bank for spectral matching. In Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, pages 573–576.
Koo, B., Gibson, J., and Gray, A. (1989). Filtering of colored noise for speech enhancement and coding. In Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, pages 349–352.
Kwang, C.O., and Hwang, S.L. (1999). Sigmoidal spectral conversion with changeable dynamic region for speech feature extraction. Electronics Letters, vol. 35, no. 2, pp. 125 –126.
Lee, C.H. (1997). On feature and model compensation approach to robust speech recognition. In Proceedings of the ESCA -NATO Tutorial and Research Workshop on Robust Speech Recognition for Unknown Communication Channels, pages 45-54, Pont-a-Mousson, France.
Lee, C.-H. (1998). On stochastic feature and model compensation approaches to robust speech recognition. Speech Communication, vol. 25, pp. 29–47.
Lee, C.-H., and Gauvain, J.L. (1993). Speaker adaptation based on MAP estimation of HMM Parameter. ICASSP93 II, pages 558-561.
Lee, C.-H., and Gauvain, J.L. (1996). Bayesian adaptive learning and MAP estimation of HMM, chapter 4, pages 83–107.
Lee, C.-H., Giachin, E., Rabiner, L., Pieraccini, E., and Rosenberg, A.E. (1992). Improved acoustic modeling for large vocabulary continuous speech recognition. Computer Speech and Language, vol. 6, no. 2, pp. 103–127.
Lee, C.-H., Lin, C.-H., and Juang. B.-J. (1991). A study on speaker adaptation of the parameters of continuous density hidden Markov models. IEEE Trans. on Signal Processing, vol. 39, no. 4, pp. 806–814.
Lee, C.-H., Paliwal, K.K., and Soong, F.K. (1996). Speech and speaker recognition: advanced topics. Kluwer Academic Publisher.
Lee, K.F., and Mahajan, A. (1990). Corrective and reinforcement learning for speaker independent continuous speech recognition. Computer Speech and Language, vol. 4, pp. 231–245.
Legetter, C.J., and Wooland, P.C. (1995). Maximum likelihood linear regression for speaker adaptation of continuous density hidden Markov models. Computer Speech and Language, vol. 9, no. 2, pp. 171-186.
Lim, J.S. (1983). Speech Enhancement, Prentice-Hall, Englewood Cliffs, NJ.
Linhard, K., and Klemm, H. (1997). Noise reduction with spectral subtraction and median filtering for suppression of musical tones. In Robust speech recognition using unknown communication channel, ESCA-NATO Tutorial and Research Workshop, pages 159–162.
Lippmann, R.P. (1996). Recognition by human and machines, miles to go before we sleep. Speech Communication, vol. 18, no. 3, pp. 247–248.
Lippmann, R.P., and Carlson, B.A. (1997). Using missing feature theory to actively select features for robust speech recognition with interruptions, filtering and noise. In Eurospeech, vol. 1, pages 37–40, Rhodes, Greece.
Lockwood, P., Baillargeat, C., Gillot, J., Boudy, J., and Faucon, G. (1991). Noise reduction for speech enhancement in cars: non-linear spectral subtraction/Kalman filtering. In Eurospeech, pages 83–86.
Lockwood, P., and Boudy, J. (1992). Experiments with a non linear spectral subtraction (NSS) and hidden Markov models and projection for robust speech recognition in cars. Speech Communication, vol. 11, pp. 215–228.
Mansour, D., Juang, B.-H. (1988). The short-time modified coherence representation and its application for noisy speech recognition. In Proceedings of the IEEE International Conference on Acoustic, Speech and Signal Processing (ICASSP88), pages 525-528, New York City, USA.
Mansour, D., Juang, B.-H. (1989). The short-time modified coherence representation and noisy speech recognition. IEEE Trans. Acoustics, Speech, and Signal Processing, vol. 37, no. 6, pp. 795-804.
Mansour, D., and Juang, B.-H. (1989). A family of distortion measures based upon projection operation for robust speech recognition. IEEE Trans. Acoustics, Speech, and Signal Processing, vol. 37, pp.1659-1671.
Mansour, D., and Juang, B.-H. (1998). A family of distortion measures base upon projection operation for robust speech recognition. In Proceedings of the IEEE 1998 International Conference on Acoustic, Speech and Signal Processing (ICASSP98), pages 36-39, Seattle, Washington, USA.
Martin, F., Shikano, K., and Minami, Y. (1993). Recognition of noisy speech by composition of hidden Markov models. In Eurospeech, pages 1031–1034.
McAulay, R.J., and Malpass, M.L. (1980). Speech enhancement using a soft-decision noise suppression filter. IEEE Trans. on Acoustics, Speech and Signal Processing, vol. 28, no. 2, pp. 137–145.
Merhav, M., and Lee, C.H. (1993). A minimax classification approach with application to robust speech recognition. IEEE Trans. on Speech and Audio Processing, vol. 1, no. 1, pp. 90–100.
Mellor, B.A., and Varga, A.P. (1993). Noise masking in a transform domain. In Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, vol. II, pages 87–90.
Minami, Y., and Furui, S. (1995). A maximum likelihood procedure for a universal adaptation method based on HMM composition. In Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, pages 129–132.
Mitra, S.K., and Kaiser, J.F. (1993). Handbook for digital signal processing. John Wiley and Sons.
Mokbel, C., and Chollet, G. (1991). Word recognition in the car: speech enhancement/spectral transformation. In Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, pages 925–928.
Morgan, N. (1997). Robust features and environmental compensation: a few comments. In Robust speech recognition using unknown communication channel, ESCA-NATO Tutorial and Research Workshop, pages 43–44.
Morgan, N., and Hermansky, H. (1992). RASTA extensions: robustness to additive and convolutional noise. In ETWR: speech processing in adverse conditions, pages 115–118.
Morris, A.C., Hagen, A., and Bourlard, H. (1999). The full-combination subband approach to noise robust HMM/ANN based ASR. In Eurospeech, pages 599–602.
Nadas, A., Nahamoo, D., and Picheny, M.A. (1989). Speech recognition using noise-adaptive prototypes. IEEE Trans. on Acoustics, Speech and Signal Processing, vol. 37, no. 10, pp. 1495–1502.
Neumeyer, L., and Weintraub, M. (1994). Probabilistic optimum filtering for robust speech recognition. In Proc. ICASSP, vol. I, pages 417–420.
Ney, H. (1990). Acoustic-phonetic modeling using continuous mixture densities for 991-word DARPA speech recognition task. In Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, pages 713–716.
Openshaw, J.P., Mason, J.S. (1994). On the limitations of cepstral features in noise. In Proceedings of the IEEE 1994 International Conference on Acoustic, Speech and Signal Processing (ICASSP94), pages 49-52, Adelaide, Australia.
Oppenhim, A.V., and Schafer, R.W. (1975). Digital Signal Processing, Prentice-Hall, Englewood Cliffs, NJ.
Paliwal, K. (1993). Use of temporal correlation between successive frames in a hidden Markov models. In Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, pages 215–218.
Paliwal, K., and Basu, A. (1987). A speech enhancement method based on Kalman filtering. In Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, pages 177–180.
Picone, J.W. (1993). Signal modeling techniques in speech recognition. Proceedings of the IEEE, vol. 81, no. 9, pp. 1214–1247.
Rabiner, L.R., and Juang, B.-H. (1992). Speech recognition and understanding. Recent advances, Trends and applications, chapter Hidden Markov models for speech recognition - strengths and limitations. Springler-Verlag.
Rabiner, L.R., and Juang, B.-H. (1993). Fundamentals of speech recognition. Prentice Hall.
Rabiner, L.R., Wilpon, J.G., and Juang, B.-H. (1986). A segmental k-means training for connected word recognition. AT&T Tech. J., vol. 65, pp. 21-32.
Rahim, M.G., and Juang, B.-H. (1996). Signal Bias Removal by maximum likelihood estimation for robust telephone speech recognition. IEEE Trans. Speech and Audio Processing, vol. 4, no. 1, pp. 19-30.
Rahim, M.G., Juang, B.-H., Chou, W., and Buhrke, E. (1996). Signal conditioning techniques for robust speech recognition. IEEE Signal Processing Letters, vol. 3, pp. 107-109.
Ramalho, M.A., and Mammone, R.J. (1994). A new speech enhancement technique with application to speaker identification. In Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, vol. I, pages 29–32.
Roe, D.B. (1987). Speech recognition with a noise-adapting codebook. In Proceedings of the IEEE 1987 International Conference on Acoustics, Speech and Signal Processing (ICASSP87), pages 1139-1142, Dallas, Texas.
Rose, R.C., Hofstetter, E.M., and Reynolds, D.A. (1994). Integrated models of signal and background with application to speaker identification in noise. IEEE Trans. on Speech and Audio Processing, vol. 2, no. 2, pp. 245–257.
Sankar, A., and Lee, C.-H. (1995). Robust speech recognition based on stochastic matching. In Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, pages 121–124.
Sankar, A., and Lee, C.-H. (1996). A maximum-likelihood approach to stochastic matching for robust speech recognition. IEEE Trans. Speech Audio Processing, vol. 4, pp. 190–202.
Sayed, A., and Kailath, T. (1994). A state-space approach to adaptive RLS filtering. IEEE Signal Processing Magazine, vol. 11, no. 3, pp. 18–60.
Seneff, S. (1988). A joint synchrony/mean-rate model of auditory speech processing. Journal of Phonetics, vol. 16, pp. 55-76.
Selouani, S.-A., Tolba, H., and O'Shaughnessy, D. (2001). Robust automatic speech recognition in low-snr car environments by the application of a connectionist subspace-based approach to the mel-based cepstral coefficients. In Proceedings of the European Conf. Speech Communication Technology (Eurospeech2001), pages 1577-1581, Aalborg, Denmark.
Shin, V., Kim, D.-S., Kim, M.Y., and Kim, J. (2001). Enhancement of noisy speech by using improved global soft decision. In Proceedings of the European Conf. Speech Communication Technology (Eurospeech2001), pages 1929-1934, Aalborg, Denmark.
Singer, H., Umezaki, T., and Itakura, F. (1990). Low bit quantization of the smoothed group delay spectrum for speech recognition. In Proceedings of the IEEE 1990 International Conference on Acoustics, Speech and Signal Processing (ICASSP90), pages 761-765, Albuquerque, NM.
Stern, R.M., Raj, B., and Moreno, P.J. (1997). Compensation for environmental degradation in automatic speech recognition. In Robust speech recognition using unknown communication channel, ESCA-NATO Tutorial and Research Workshop, pages 33–42.
Takahashi, J., and Sagayama, S. (1995). Vector-field-smoothed Bayesian learning for incremental speaker adaptation. ICASSP, vol. 1, pages 696–699.
Takiguchi, T., Nakamura, S., Huo, Q., and Shikano, K. (1997). Adaptation of model parameters by HMM decomposition in noisy reverberant environments. In Robust speech recognition using unknown communication channel, ESCA-NATO Tutorial and Research Workshop, pages 155–158.
Tibrewala, S., and Hermansky, H. (1997). Multi-band and adaptation approaches to robust speech recognition. In Eurospeech, Rhodes, Greece.
Tohkura, Y. (1987). A weighted cepstral distance measure for speech recognition. IEEE Trans. ASSP, vol. 35, pp. 1414-1422.
Tufekci, Z., Gowdy, J., Gurbuz, S., and Patterson, E. (2001). Applying parallel model compensation with mel-frequency discrete wavelet coefficients for noise-robust speech recognition. In Proceedings of the European Conf. Speech Communication Technology (Eurospeech2001), pages 873-877, Aalborg, Denmark.
Tung, S.-L., Lei, I.-S., and Juang, Y.-T. (1996). Projection-based group delay scheme for speech recognition. IEEE Trans. on Speech and Audio Processing, vol. 4, pp. 138-140.
Umezaki, T., and Itakura, F. (1989). Speech analysis by group delay spectrum of all-pole filters and its application to the speech distance measure for speech recognition. Transactions of the Institute of Electronics and Communication Engineers of Japan (IECE), vol. J72-D-II, no. 8. (in Japanese)
Usagawa, T., Iwata, M., and Ibata, M. (1994). Speech parameter extraction in noisy environment using a masking model. In Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, vol. II, pages 81–84.
Varga, A.P., and Moore, R.K. (1990). Hidden Markov model decomposition of speech and noise. In Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, pages 845–848.
Varga, A.P., and Ponting, K. (1989). Control experiments on noise compensation in hidden Markov model based continuous word recognition. In Proc. European Conf. Speech Technology, pages 167–170, Paris.
Varga, A., and Steeneken, H.J.M. (1993). Assessment for automatic speech recognition: II. NOISEX-92: A database and an experiment to study the effect of additive noise on speech recognition systems. Speech Communication, vol. 12, pp. 247-251.
Vaseghi, S.V. (1996). Advanced signal processing and digital noise reduction. Wiley and Sons Ltd.
Vaseghi, S.V., and Milner, B.P. (1993). Noisy speech recognition based on HMMs, Wiener filters and re-evaluation of most likely candidates. In Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, vol. II, pages 103–106.
Vaseghi, S.V., and Milner, B.P. (1995). Speech recognition in impulsive noise. In Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, vol. I, pages 437–440.
Vaseghi, S.V., and Milner, B.P. (1997). Noise compensation methods for hidden Markov model speech recognition in adverse environments. IEEE Trans. on Speech and Audio Processing, vol. 5, no. 1, pp. 11–21.
Vaseghi, S.V., Milner, B.P., and Humphries, J.J. (1994). Noisy speech recognition using cepstral time features and spectral-time filters. In Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, vol. II, pages 65–68.
Viikki, O., Bye, D., and Laurila, K. (1998). A recursive feature vector normalization approach for robust speech recognition in noise. In Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing.
Viikki, O., and Laurila, K. (1997). Noise robust HMM-based speech recognition using segmental cepstral feature vector normalization. In Robust speech recognition using unknown communication channel, ESCA-NATO Tutorial and Research Workshop, pages 107–110.
Virag, N. (1996). Speech enhancement based on masking properties of the human auditory system. PhD thesis.
Wang, H.-C. (1997). MAT - A Project to Collect Mandarin Speech Data through Networks in Taiwan. International Journal of Computational Linguistics and Chinese Language Processing, vol. 1, no. 2, pp. 73-89.
Wang, H.-C., Seide, F., Tseng, C.-Y., and Lee, L.S. (2000). MAT2000 – Design, collection, and validation of a Mandarin 2000-speaker telephone speech database. In Proceedings of 2000 International Conference on Spoken Language Processing (ICSLP2000), pages 460-463, Beijing, China.
Wellekens, C. (1987). Explicit correlation in hidden Markov models for speech recognition. In Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, pages 384–387.
Wu, S., Kingsbury, B., Morgan, N., and Greenberg, S. (1998). Incorporating Information from Syllable-length Time Scales into Automatic Speech Recognition. Proc. ICASSP, vol. II, pages 721-724.
Yager, R., and Filev, D. (1994). Essentials of Fuzzy Modeling and Control. New York: Wiley.
Yang, R., Majaniemi, M., and Haavisto, P. (1995). Dynamic parameter compensation for speech recognition in noise. In Eurospeech, pages 469–472.
Young, S.J. (1992). Cepstral mean compensation for HMM recognition in noise. In ESCA Proc. Speech Processing in Adverse Conditions, pages 123–126, Cannes, France.
Zavaliagkos, G., Schwartz, R., and Makhoul, J. (1995). Batch, incremental and instantaneous adaptation techniques for speech recognition. In Proc. ICASSP, Detroit, MI, pages 676–679.
Zwicker, E., and Fastl, H. (1990). Psychoacoustics: Facts and Models. Springer-Verlag, Berlin.

Advisor

莊堯棠(Yau-Tarng Juang)

Approval date

2003-6-6
