References
Acero, A. (1990). Acoustical and environmental robustness in automatic speech recognition. Ph.D. Thesis, Department of Electrical and Computer Engineering, Carnegie Mellon University.
Acero, A. (1993). Acoustical and environmental robustness in automatic speech recognition. Kluwer Academic Publishers.
Allen, J.B. (1994). How do humans process and recognize speech. IEEE Trans. on Speech and Audio Processing, vol. 2, no. 4, pp. 567–577.
Atal, B. (1974). Effectiveness of linear prediction characteristics of the speech wave for automatic speaker identification and verification. Journal of the Acoustical Society of America, vol. 55, pp. 1304–1312.
Bahl, L.R., Brown, P.F., de Souza, P.V., and Mercer, R.L. (1988). A new algorithm for the estimation of hidden Markov model parameters. In Proceedings of the ICASSP, pages 493–496.
Bahoura, M., and Rouat, J. (2001). A New Approach for Wavelet Speech Enhancement. In Proceedings of the European Conf. Speech Communication Technology (Eurospeech2001), pp. 1937-194, Aalborg, Denmark.
Bateman, D.C., Bye, D.K., and Hunt, M.J. (1992). Spectral contrast normalization and other techniques for speech recognition in noise. In Proceedings of the IEEE 1992 International Conference on Acoustics, Speech and Signal Processing (ICASSP92), pages 241-244, San Francisco, USA.
Beattie, V.L., and Young, S.J. (1991). Noisy speech recognition using hidden Markov model state-based filtering. In Proceedings of the ICASSP, Speech and Signal Processing, pages 917–920.
Bellegarda, J.R. (1997). Statistical techniques for robust ASR: Review and perspectives. In Proceedings of European Conference on Speech Communication and Technology (Eurospeech1997), pages 33-36, Rhodes, Greece.
Berger, J.O. (1985). Statistical Decision Theory and Bayesian Analysis. Springer-Verlag, New York, 2nd edition.
Berouti, M., Schwartz, R., and Makhoul, J. (1979). Enhancement of speech corrupted by acoustic noise. In Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, pages 208–211.
Berstein, A.D., and Shallom, I.D. (1991). An hypothesized Wiener filtering approach to noisy speech recognition. In Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, pages 913–916.
Bishop, C. (1995). Neural networks for pattern recognition. Clarendon Press, Oxford.
Bocchieri, E., and Doddington, G. (1986). Frame-specific statistical features for speaker independent speech recognition. IEEE Trans. on Acoustics, Speech and Signal Processing, vol. 34, no. 4, pp. 755–764.
Bocchieri, E., and Doddington, G. (1987). Statistical features versus word templates for speaker independent digits recognition over long-distance telephone connection. In Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, pages 1123–1226.
Boll, S.F. (1979). Suppression of acoustic noise in speech using spectral subtraction. IEEE Trans. on Acoustics, Speech and Signal Processing, vol. 27, pp. 113–120.
Boll, S.F. (1992). Speech enhancement in the 1980s: Noise suppression with pattern matching. Advances in Speech Signal Processing, ed. by Furui, S., and Sondhi, M.M. (Marcel Dekker, New York), Chapter 10.
Bourlard, H. (1999). Non-stationary multi-channel (multi-stream) processing towards robust and adaptive ASR. In Workshop on Robust Methods for Speech Recognition in Adverse Conditions, pages 1–10, Tampere, Finland.
Bourlard, H. and Dupont, S. (1997). Subband-based speech recognition. In Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, pages 1251–1254.
Carlson, B.A., and Clements, M.A. (1991). Application of a weighted projection measure for robust hidden Markov model based speech recognition. In Proceedings of the IEEE 1991 International Conference on Acoustics, Speech and Signal Processing (ICASSP91), pages 921-924, Toronto, Canada.
Carlson, B.A., and Clements, M.A. (1992). Speech recognition in noise using a projection-based likelihood measure for mixture density HMM’s. In Proceedings of the IEEE 1992 International Conference on Acoustics, Speech and Signal Processing (ICASSP92), pages 237-240, San Francisco, USA.
Carlson, B.A., Clements, M.A. (1994). A projection-based likelihood measure for speech recognition in noise. IEEE Trans. Speech and Audio Processing, vol. 2, pp. 97-102.
Chien, J.-T. (2001). Combined Linear Regression Adaptation and Bayesian Predictive Classification for Robust Speech Recognition. In Proceedings of the European Conf. Speech Communication Technology (Eurospeech2001), pp. 1131-1135, Aalborg, Denmark.
Compernolle, D.V. (1989a). Noise adaptation in hidden Markov model speech recognition system. Computer Speech and Language, vol. 3, no. 2, pp. 151–168.
Compernolle, D.V. (1989b). Spectral estimation using a log-distance error criterion applied to speech recognition. In Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, pages 258–261.
Cooke, M., Morris, A., and Green, P. (1996). Recognition of occluded speech. In ESCA ETRW on the Auditory Basis of Speech Perception.
Deller, J.R., Proakis, J.G., and Hansen, J.H.L. (1993). Discrete-time processing of speech signals. Macmillan Publishing Company.
Fletcher, H. (1953). Speech and hearing in communication. Krieger, New York.
DeGroot, M.H. (1970). Optimal Statistical Decisions. McGraw-Hill, New York.
Dubois, D. (1991). Comparison of time-dependent acoustic features for a speaker independent speech recognition system. In Eurospeech, pages 935–938.
Duda, R.O., Hart, P.E. (1973). Pattern Classification and Scene Analysis. New York : Wiley.
Ephraim, Y. (1992a). A Bayesian estimation approach for speech enhancement using hidden Markov models. IEEE Trans. on Signal Processing, vol. 40, no. 4, pp. 725–735.
Ephraim, Y. (1992b). Statistical-model-based speech enhancement systems. Proceedings of the IEEE, vol. 80, no. 10, pp. 1526–1555.
Ephraim, Y., and Juang, B.-H. (1988). On the adaptation of hidden Markov models for enhancing noisy speech. In Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, pages 533–536.
Ephraim, Y., and Malah, D. (1984). Speech enhancement using a minimum mean-square error short time spectral amplitude estimator. IEEE Trans. on Acoustics, Speech and Signal Processing, vol. 32, no. 6, pp. 1109–1121.
Ephraim, Y., Malah, D., and Juang, B.-H. (1989). On the application of hidden Markov models for enhancing noisy speech. IEEE Trans. on Acoustics, Speech and Signal Processing, vol. 37, no. 12, pp. 1846–1856.
Erell, A., and Weintraub, M. (1993a). Filterbank energy estimation for recognition of noisy speech. IEEE Trans. Speech Audio Processing, vol. 1, no. 1, pp. 68–76.
Erell, A., and Weintraub, M. (1993b). Energy conditioned spectral estimation for recognition of noisy speech. IEEE Trans. on Speech and Audio Processing, vol. 1, no. 1, pp. 84–89.
Flores, J.A.N., and Young, S.J. (1993). Adapting a HMM-based recognizer for noisy speech enhanced by spectral subtraction. Technical Report CUED/F-INFENG/TR.123, Cambridge University Electrical Department.
Flores, J.A.N., and Young, S.J. (1993). Adapting a HMM-based recognizer for noisy speech enhanced by spectral subtraction. In Eurospeech, pages 829–832.
Flores, J.A.N., and Young, S.J. (1994). Adapting a HMM-based recognizer for noisy speech enhanced by spectral subtraction. In Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, pages 829–832.
Frazier, R., Samsam, S., Braida, L., and Oppenheim, A. (1976). Enhancement of speech by adaptive filtering. In Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, pages 251–253.
Furui, S. (1986a). On the role of spectral transition for speech perception. Journal of the Acoustical Society of America, 80(4):1016–1025.
Furui, S. (1986b). Speaker independent isolated word recognition based on emphasized spectral dynamics. In Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, pages 1991–1994.
Furui, S. (1997). Recent advance in robust speech recognition. In Robust speech recognition using unknown communication channel, ESCA-NATO Tutorial and Research Workshop, pages 11–20.
Furui, S., and Sondhi, M.M. (1992). Advances in speech signal processing. Marcel Dekker, New York.
Gales, M.J.F. (1994). PMC for speech recognition in additive and convolutional noise. Technical Report CUED/F-INFENG/TR 154, Cambridge University, Engineering Department.
Gales, M.J.F. (1995). Model-Based Techniques for Noise Robust Speech Recognition. Ph.D. dissertation, Univ. Cambridge, Cambridge, U.K.
Gales, M.J.F. (1998). Predictive model-based compensation schemes for robust speech recognition. Speech Communication, vol. 25, pp. 49-74.
Gales, M.J.F., and Young, S.J. (1992a). An improved approach to the hidden Markov model decomposition of speech and noise. In Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, vol. I, pages 233–236.
Gales, M.J.F., and Young, S.J. (1993a). Cepstral parameter compensation for HMM recognition in noise. Speech Communication, vol. 12, pp. 231–239.
Gales, M.J.F., and Young, S.J. (1993b). HMM recognition in noise using parallel model combination. In Eurospeech, pages 837–840.
Gales, M.J.F., and Young, S.J. (1995). A fast and flexible implementation of parallel model combination. In Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, pages 133–136.
Gales, M.J.F., Pye, D., and Woodland, P.C. (1996). Variance compensation within the MLLR framework for robust speech recognition and speaker adaptation. In Proceedings of the Int. Conf. Spoken Language Process. (ICSLP96), pages 1832-1835, Philadelphia, PA, USA.
Gales, M.J.F., and Woodland, P.C. (1996). Mean and variance adaptation within the MLLR framework. Computer Speech and Language, vol. 10, pp. 249-264.
Gao, Y., Huang, T., Chen, S., and Haton, J.-P. (1992). Auditory model based on speech processing. In Int. Conf. on Spoken Language Processing (ICSLP), vol. 1, pages 73–76.
Gauvain, J.-L., and Lee, C.-H. (1994). Maximum a posteriori estimation for multivariate Gaussian mixture observations of Markov chains. IEEE Trans. Speech Audio Processing, vol. 2, pp. 291–298.
Ghitza, O. (1986). Auditory nerve representation as a front-end for speech recognition in a noisy environment. Computer Speech and Language, vol. 1, pp. 109–130.
Gong, Y. (1995). Speech recognition in noisy environments: a survey. Speech Communication, vol. 16, pp. 261–292.
Gong, Y., and Haton, J.-P. (1994). Stochastic trajectory modeling for speech recognition. In Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, pages 57–60.
Graf, J.T., and Hubing, N. (1993). Dynamic time-warping for the enhancement of speech degraded by white Gaussian noise. In Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, vol. II, pages 339–342.
Hagen, A., Morris, M.C., and Bourlard, H. (1999). Different weighting schemes in the full combination sub-band approach for noise robust ASR. In Workshop on Robust Methods for Speech Recognition in Adverse Conditions, pages 199–202, Tampere, Finland.
Hanson, J., and Applebaum, T. (1990). Robust speaker independent word recognition using static, dynamic and acceleration features: Experiments with Lombard effect and noisy speech. In Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, pages 857–860.
Hermansky, H. (1990). Perceptual linear predictive (PLP) analysis of speech. Journal of the Acoustical Society of America, vol. 87, pp. 1738–1752.
Hermansky, H., Hanson, B.A., and Wakita, H. (1985). Perceptually based linear predictive analysis of speech. In Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, pages 509–512.
Hermansky, H., Morgan, N., Bayya, A., and Kohn, P. (1992). RASTA-PLP speech analysis technique. IEEE International Conference on Acoustics, Speech, and Signal Processing, pages I-121 – I-124.
Hermansky, H., Morgan, N., and Hirsch, H. (1993). Recognition of speech in additive and convolutional noise based on RASTA spectral processing. IEEE International Conference on Acoustics, Speech, and Signal Processing, vol. II, pages 83-86.
Hermansky, H., and Sharma, S. (1999). Temporal Patterns (TRAPS) in ASR of Noisy Speech. Proc. ICASSP, vol. I, pages 289-292.
Hermansky, H., Tibrewala, S., and Pavel, M. (1996). Towards ASR on partially corrupted speech. In Int. Conf. on Spoken Language Processing (ICSLP), vol. 1, pages 462–465, Philadelphia, PA.
Hernando, J., and Nadeu, C. (1991). A comparative study of parameters and distances for noisy speech recognition. In Eurospeech, pages 91–94.
Hernando, J., and Nadeu, C. (1994). Speech recognition in noisy car environment based on OSALPC representation and robust similarity measuring techniques. In Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, pages 69–72.
Hirsch, H.G., Meyer, P., and Rühl, H.W. (1991). Improved speech recognition using high-pass filtering of subband envelopes. In Eurospeech, pages 413–416.
Holmes, J.N., and Sedgwick, N.C. (1986). Noise compensation for speech recognition using probabilistic models. In Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, pages 741–744.
Huang, K.-C., Tung, S.-L., and Juang, Y.-T. (1999). Mean compensation based on projection-based group delay scheme for noisy speech recognition. IEE Electronics Letters, vol. 35, pp. 1432-1434.
Huang, K.-C., Tung, S.-L., Juang, Y.-T. (2001). Application of the variance compensation likelihood measure for robust hidden Markov model in noise. Pattern Recognition Letters, vol. 22, no. 3-4, pp. 353-358.
Huang, K.-C., Tung, S.-L., and Juang, Y.-T. (2003). A likelihood measure based on projection-based group delay scheme for Mandarin speech recognition in noise. Signal Processing, vol. 83, no. 3, pp. 611-626.
Huang, X.D., Ariki, Y., and Jack, M.A. (1990). Hidden Markov models for speech recognition. Edinburgh University Press.
Huang, Y., Zhao, Y. and Levinson, S. (1999). A DCT-based fast enhancement technique for robust speech recognition in automobile usage. In Eurospeech, vol. 5, pages 1947–1950.
Hwang, T.-H., Yuo, K.-H., Wang, H.-C. (2001). Linear Interpolation of Cepstral Variance for Noisy Speech Recognition. In Proceedings of the European Conf. Speech Communication Technology (Eurospeech2001), pages 877-881, Aalborg, Denmark.
Itakura, F., and Umezaki, T. (1987). Distance measure for speech recognition based on the smoothed group delay spectrum. In Proceedings of the IEEE 1987 International Conference on Acoustics, Speech and Signal Processing (ICASSP87), pages 1257-1260, Dallas, Texas.
Jelinek, F. (1997). Statistical methods for speech recognition. MIT Press.
Juang, B.-H., Rabiner, L., and Wilpon, J.G. (1987). On the use of bandpass filtering in speech recognition. IEEE Trans. on Acoustics, Speech and Signal Processing, pp. 947–954.
Juang, B.-H., Wilpon, J.G., and Rabiner, L. (1986). On the use of bandpass filtering in speech recognition. In Proceedings of the IEEE 1986 International Conference on Acoustics, Speech and Signal Processing (ICASSP86), pages 765-768, Tokyo, Japan.
Juang, B.-H., Rabiner, L.R. (1990). The segmental K-means algorithm for estimating parameters of hidden Markov models. IEEE Trans. Signal Processing, vol. 38, pp. 1639-1641.
Junqua, J.-C., and Haton, J.-P. (1996). Robustness in automatic speech recognition: fundamentals and application. Kluwer Academic Publishers.
Junqua, J.-C., Valente, S., Fohr, D., and Mari, J.-F. (1995). An N-best strategy, dynamic grammars and selectively trained neural networks for real-time recognition of continuously spelled names over the telephone. In Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, pages 852–855.
Kadirkamanathan, M., and Varga, A.P. (1991). Simultaneous model re-estimation from contaminated data by "composed hidden Markov model modeling". In Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, pages 897–900.
Kim, D.Y., and Un, C.K. (1996). Probabilistic vector mapping with trajectory information for noise-robust speech recognition. IEE Electronics Letters, vol. 32, no. 17, pp. 1550–1551.
Klatt, D.H. (1976). A digital filter bank for spectral matching. In Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, pages 573–576.
Koo, B., Gibson, J., and Gray, A. (1989). Filtering of colored noise for speech enhancement and coding. In Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, pages 349–352.
Kwang, C.O., and Hwang, S.L. (1999). Sigmoidal spectral conversion with changeable dynamic region for speech feature extraction. Electronics Letters, vol. 35, no. 2, pp. 125–126.
Lee, C.H. (1997). On feature and model compensation approach to robust speech recognition. In Proceedings of the ESCA-NATO Tutorial and Research Workshop on Robust Speech Recognition for Unknown Communication Channels, pages 45-54, Pont-a-Mousson, France.
Lee, C.-H. (1998). On stochastic feature and model compensation approaches to robust speech recognition. Speech Communication, vol. 25, pp. 29–47.
Lee, C.-H., and Gauvain, J.L. (1993). Speaker adaptation based on MAP estimation of HMM Parameter. ICASSP93 II, pages 558-561.
Lee, C.-H., and Gauvain, J.L. (1996). Bayesian adaptive learning and MAP estimation of HMM, chapter 4, pages 83–107.
Lee, C.-H., Giachin, E., Rabiner, L., Pieraccini, E., and Rosenberg, A.E. (1992). Improved acoustic modeling for large vocabulary continuous speech recognition. Computer Speech and Language, vol. 6, no. 2, pp. 103–127.
Lee, C.-H., Lin, C.-H., and Juang, B.-H. (1991). A study on speaker adaptation of the parameters of continuous density hidden Markov models. IEEE Trans. on Signal Processing, vol. 39, no. 4, pp. 806–814.
Lee, C.-H., Paliwal, K.K., and Soong, F.K. (1996). Speech and speaker recognition: advanced topics. Kluwer Academic Publishers.
Lee, K.F., and Mahajan, A. (1990). Corrective and reinforcement learning for speaker independent continuous speech recognition. Computer Speech and Language, vol. 4, pp. 231–245.
Leggetter, C.J., and Woodland, P.C. (1995). Maximum likelihood linear regression for speaker adaptation of continuous density hidden Markov models. Computer Speech and Language, vol. 9, no. 2, pp. 171-186.
Lim, J.S. (1983). Speech Enhancement, Prentice-Hall, Englewood Cliffs, NJ.
Linhard, K., and Klemm, H. (1997). Noise reduction with spectral subtraction and median filtering for suppression of musical tones. In Robust speech recognition using unknown communication channel, ESCA-NATO Tutorial and Research Workshop, pages 159–162.
Lippmann, R.P. (1996). Recognition by human and machines, miles to go before we sleep. Speech Communication, vol. 18, no. 3, pp. 247–248.
Lippmann, R.P., and Carlson, B.A. (1997). Using missing feature theory to actively select features for robust speech recognition with interruptions, filtering and noise. In Eurospeech, vol. 1, pages 37–40, Rhodes, Greece.
Lockwood, P., Baillargeat, C., Gillot, J., Boudy, J., and Faucon, G. (1991). Noise reduction for speech enhancement in cars: non-linear spectral subtraction/Kalman filtering. In Eurospeech, pages 83–86.
Lockwood, P., and Boudy, J. (1992). Experiments with a non linear spectral subtraction (NSS) and hidden Markov models and projection for robust speech recognition in cars. Speech Communication, vol. 11, pp. 215–228.
Mansour, D., and Juang, B.-H. (1988). The short-time modified coherence representation and its application for noisy speech recognition. In Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP88), pages 525-528, New York City, USA.
Mansour, D., and Juang, B.-H. (1989). The short-time modified coherence representation and noisy speech recognition. IEEE Trans. Acoustics, Speech, and Signal Processing, vol. 37, no. 6, pp. 795-804.
Mansour, D., and Juang, B.-H. (1989). A family of distortion measures based upon projection operation for robust speech recognition. IEEE Trans. Acoustics, Speech, and Signal Processing, vol. 37, pp. 1659-1671.
Mansour, D., and Juang, B.-H. (1998). A family of distortion measures based upon projection operation for robust speech recognition. In Proceedings of the IEEE 1998 International Conference on Acoustics, Speech and Signal Processing (ICASSP98), pages 36-39, Seattle, Washington, USA.
Martin, F., Shikano, K., and Minami, Y. (1993). Recognition of noisy speech by composition of hidden Markov models. In Eurospeech, pages 1031–1034.
McAulay, R.J., and Malpass, M.L. (1980). Speech enhancement using a soft-decision noise suppression filter. IEEE Trans. on Acoustics, Speech and Signal Processing, vol. 28, no. 2, pp. 137–145.
Merhav, N., and Lee, C.-H. (1993). A minimax classification approach with application to robust speech recognition. IEEE Trans. on Speech and Audio Processing, vol. 1, no. 1, pp. 90–100.
Mellor, B.A., and Varga, A.P. (1993). Noise masking in a transform domain. In Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, vol. II, pages 87–90.
Minami, Y., and Furui, S. (1995). A maximum likelihood procedure for a universal adaptation method based on HMM composition. In Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, pages 129–132.
Mitra, S.K., and Kaiser, J.F. (1993). Handbook for digital signal processing. John Wiley and Sons.
Mokbel, C., and Chollet, G. (1991). Word recognition in the car: speech enhancement/spectral transformation. In Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, pages 925–928.
Morgan, N. (1997). Robust features and environmental compensation: a few comments. In Robust speech recognition using unknown communication channel, ESCA-NATO Tutorial and Research Workshop, pages 43–44.
Morgan, N., and Hermansky, H. (1992). RASTA extensions: robustness to additive and convolutional noise. In ETRW: speech processing in adverse conditions, pages 115–118.
Morris, A.C., Hagen, A., and Bourlard, H. (1999). The full-combination subband approach to noise robust HMM/ANN based ASR. In Eurospeech, pages 599–602.
Nadas, A., Nahamoo, D., and Picheny, M.A. (1989). Speech recognition using noise-adaptive prototypes. IEEE Trans. on Acoustics, Speech and Signal Processing, vol. 37, no. 10, pp. 1495–1502.
Neumeyer, L., and Weintraub, M. (1994). Probabilistic optimum filtering for robust speech recognition. In Proc. ICASSP, vol. I, pages 417–420.
Ney, H. (1990). Acoustic-phonetic modeling using continuous mixture densities for 991-word DARPA speech recognition task. In Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, pages 713–716.
Openshaw, J.P., and Mason, J.S. (1994). On the limitations of cepstral features in noise. In Proceedings of the IEEE 1994 International Conference on Acoustics, Speech and Signal Processing (ICASSP94), pages 49-52, Adelaide, Australia.
Oppenheim, A.V., and Schafer, R.W. (1975). Digital Signal Processing. Prentice-Hall, Englewood Cliffs, NJ.
Paliwal, K. (1993). Use of temporal correlation between successive frames in a hidden Markov models. In Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, pages 215–218.
Paliwal, K., and Basu, A. (1987). A speech enhancement method based on Kalman filtering. In Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, pages 177–180.
Picone, J.W. (1993). Signal modeling techniques in speech recognition. Proceedings of the IEEE, vol. 81, no. 9, pp. 1214–1247.
Rabiner, L.R., and Juang, B.-H. (1992). Speech recognition and understanding. Recent advances, Trends and applications, chapter Hidden Markov models for speech recognition - strengths and limitations. Springer-Verlag.
Rabiner, L.R., and Juang, B.-H. (1993). Fundamentals of speech recognition. Prentice Hall.
Rabiner, L.R., Wilpon, J.G., and Juang, B.-H. (1986). A segmental k-means training for connected word recognition. AT&T Tech. J., vol. 65, pp. 21-32.
Rahim, M.G., and Juang, B.-H. (1996). Signal Bias Removal by maximum likelihood estimation for robust telephone speech recognition. IEEE Trans. Speech and Audio Processing, vol. 4, no. 1, pp. 19-30.
Rahim, M.G., Juang, B.-H., Chou, W., and Buhrke, E. (1996). Signal conditioning techniques for robust speech recognition. IEEE Signal Processing Letters, vol. 3, pp. 107-109.
Ramalho, M.A., and Mammone, R.J. (1994). A new speech enhancement technique with application to speaker identification. In Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, vol. I, pages 29–32.
Roe, D.B. (1987). Speech recognition with a noise-adapting codebook. In Proceedings of the IEEE 1987 International Conference on Acoustics, Speech and Signal Processing (ICASSP87), pages 1139-1142, Dallas, Texas.
Rose, R.C., Hofstetter, E.M., and Reynolds, D.A. (1994). Integrated models of signal and background with application to speaker identification in noise. IEEE Trans. on Speech and Audio Processing, vol. 2, no. 2, pp. 245–257.
Sankar, A., and Lee, C.-H. (1995). Robust speech recognition based on stochastic matching. In Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, pages 121–124.
Sankar, A., and Lee, C.-H. (1996). A maximum-likelihood approach to stochastic matching for robust speech recognition. IEEE Trans. Speech Audio Processing, vol. 4, pp. 190–202.
Sayed, A., and Kailath, T. (1994). A state-space approach to adaptive RLS filtering. IEEE Signal Processing Magazine, vol. 11, no. 3, pp. 18–60.
Seneff, S. (1988). A joint synchrony/mean-rate model of auditory speech recognition. Journal of Phonetics, vol. 16, pp. 55-76.
Selouani, S.-A., Tolba, H., and O'Shaughnessy, D. (2001). Robust automatic speech recognition in low-SNR car environments by the application of a connectionist subspace-based approach to the mel-based cepstral coefficients. In Proceedings of the European Conf. Speech Communication Technology (Eurospeech2001), pages 1577-1581, Aalborg, Denmark.
Shin, V., Kim, D.-S., Kim, M.Y., and Kim, J. (2001). Enhancement of noisy speech by using improved global soft decision. In Proceedings of the European Conf. Speech Communication Technology (Eurospeech2001), pages 1929-1934, Aalborg, Denmark.
Singer, H., Umezaki, T., and Itakura, F. (1990). Low bit quantization of the smoothed group delay spectrum for speech recognition. In Proceedings of the IEEE 1990 International Conference on Acoustics, Speech and Signal Processing (ICASSP90), pages 761-765, Albuquerque, NM.
Stern, R.M., Raj, B., and Moreno, P.J. (1997). Compensation for environmental degradation in automatic speech recognition. In Robust speech recognition using unknown communication channel, ESCA-NATO Tutorial and Research Workshop, pages 33–42.
Takahashi, J., and Sagayama, S. (1995). Vector-field-smoothed Bayesian learning for incremental speaker adaptation. ICASSP, vol. 1, pages 696–699.
Takiguchi, T., Nakamura, S., Huo, Q., and Shikano, K. (1997). Adaptation of model parameters by HMM decomposition in noisy reverberant environments. In Robust speech recognition using unknown communication channel, ESCA-NATO Tutorial and Research Workshop, pages 155–158.
Tibrewala, S., and Hermansky, H. (1997). Multi-band and adaptation approaches to robust speech recognition. In Eurospeech, Rhodes, Greece.
Tohkura, Y. (1987). A weighted cepstral distance measure for speech recognition. IEEE Trans. ASSP, vol. 35, pp. 1414-1422.
Tufekci, Z., Gowdy, J., Gurbuz, S., and Patterson, E. (2001). Applying parallel model compensation with mel-frequency discrete wavelet coefficients for noise-robust speech recognition. In Proceedings of the European Conf. Speech Communication Technology (Eurospeech2001), pages 873-877, Aalborg, Denmark.
Tung, S.-L., Lei, I.-S., and Juang, Y.-T. (1996). Projection-based group delay scheme for speech recognition. IEEE Trans. on Speech and Audio Processing, vol. 4, pp. 138-140.
Umezaki, T., Itakura, F. (1989). Speech analysis by group delay spectrum of all-pole filters and its application to the speech distance measure for speech recognition. Transactions on Institute of Electronics and Communication Engineers of Japan (IECE), Vol. J72-D-II, no. 8. (in Japanese)
Usagawa, T., Iwata, M., and Ibata, M. (1994). Speech parameter extraction in noisy environment using a masking model. In Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, vol. II, pages 81–84.
Varga, A.P., and Moore, R.K. (1990). Hidden Markov model decomposition of speech and noise. In Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, pages 845–848.
Varga, A.P., and Ponting, K. (1989). Control experiments on noise compensation in hidden Markov model based continuous word recognition. In Proc. European Conf. Speech Technology, pages 167–170, Paris.
Varga, A., and Steeneken, H.J.M. (1993). Assessment for automatic speech recognition: II. NOISEX-92: A database and an experiment to study the effect of additive noise on speech recognition systems. Speech Communication, vol. 12, pp. 247-251.
Vaseghi, S.V. (1996). Advanced signal processing and digital noise reduction. Wiley and Sons Ltd.
Vaseghi, S.V., and Milner, B.P. (1993). Noisy speech recognition based on HMMs, Wiener filters and re-evaluation of most likely candidates. In Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, vol. II, pages 103–106.
Vaseghi, S.V., and Milner, B.P. (1995). Speech recognition in impulsive noise. In Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, vol. I, pages 437–440.
Vaseghi, S.V., and Milner, B.P. (1997). Noise compensation methods for hidden Markov model speech recognition in adverse environments. IEEE Trans. on Speech and Audio Processing, vol. 5, no. 1, pp. 11–21.
Vaseghi, S.V., Milner, B.P., and Humphries, J.J. (1994). Noisy speech recognition using cepstral time features and spectral-time filters. In Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, vol. II, pages 65–68.
Viikki, O., Bye, D., and Laurila, K. (1998). A recursive feature vector normalization approach for robust speech recognition in noise. In Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing.
Viikki, O., and Laurila, K. (1997). Noise robust HMM-based speech recognition using segmental cepstral feature vector normalization. In Robust speech recognition using unknown communication channel, ESCA-NATO Tutorial and Research Workshop, pages 107–110.
Virag, N. (1996). Speech enhancement based on masking properties of the human auditory system. PhD thesis.
Wang, H.-C. (1997). MAT - A Project to Collect Mandarin Speech Data through Networks in Taiwan. International Journal of Computational Linguistics and Chinese Language Processing, vol. 1, no.2, pp. 73-89.
Wang, H.-C., Seide, F., Tseng, C.-Y., and Lee, L.S. (2000). MAT2000 – Design, collection, and validation of a Mandarin 2000-speaker telephone speech database. In Proceedings of 2000 International Conference on Spoken Language Processing (ICSLP2000), pages 460-463, Beijing, China.
Wellekens, C. (1987). Explicit correlation in hidden Markov models for speech recognition. In Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, pages 384–387.
Wu, S., Kingsbury, B., Morgan, N., and Greenberg, S. (1998). Incorporating Information from Syllable-length Time Scales into Automatic Speech Recognition. Proc. ICASSP, vol. II, pages 721-724.
Yager, R., and Filev, D. (1994). Essentials of Fuzzy Modeling and Control. New York: Wiley.
Yang, R., Majaniemi, M., and Haavisto, P. (1995). Dynamic parameter compensation for speech recognition in noise. In Eurospeech, pages 469–472.
Young, S.J. (1992). Cepstral mean compensation for HMM recognition in noise. In ESCA Proc. Speech Processing in Adverse Conditions, pages 123–126, Cannes, France.
Zavaliagkos, G., Schwartz, R., and Makhoul, J. (1995). Batch, incremental and instantaneous adaptation techniques for speech recognition. In Proc. ICASSP, Detroit, MI, pages 676–679.
Zwicker, E., and Fastl, H. (1990). Psychoacoustics: Facts and Models. Springer-Verlag, Berlin.