References
[1] Y. Tagawa, A. Liutkus, R. Badeau, and G. Richard, “Gaussian Processes for Underdetermined Source Separation”, IEEE Transactions on Signal Processing, vol. 59, no. 7, Jul. 2011.
[2] P. S. Huang, M. Kim, M. Hasegawa-Johnson, and P. Smaragdis, “Deep Learning for Monaural Speech Separation”, Proc. IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), May 2014.
[3] G. Logeshwari and G. S. Anandha Mala, “A Survey on Single Channel Speech Separation”, Proc. International Conference on Advances in Communication, Network, and Computing, pp. 387–392, Feb. 2012.
[4] M. Stetter, “Regression Methods for Source Separation”, in Imaging and Modeling Cortical Population Coding Strategies, pp. 105–124, 2012.
[5] S. Park and S. Choi, “Gaussian Process Regression for Voice Activity Detection and Speech Enhancement”, Proc. IEEE International Joint Conference on Neural Networks (IJCNN), Jun. 2008.
[6] M. N. Schmidt and R. K. Olsson, “Linear regression on sparse features for single-channel speech separation”, Proc. IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA), Oct. 2007.
[7] Y. Xu, J. Du, L. R. Dai, and C. H. Lee, “A Regression Approach to Speech Enhancement Based on Deep Neural Networks”, IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 23, no. 1, pp. 7–17, Jan. 2015.
[8] D. Kounades-Bastian, L. Girin, X. Alameda-Pineda, S. Gannot, and R. Horaud, “A Variational EM Algorithm for the Separation of Moving Sound Sources”, Proc. IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA), Oct. 2015.
[9] P. Mowlaee and R. Saeidi, “Iterative Closed-Loop Phase-Aware Single-Channel Speech Enhancement”, IEEE Signal Processing Letters, vol. 20, no. 12, Dec. 2013.
[10] R. Boloix-Tortosa, E. Arias-de-Reyna, F. J. Payan-Somet, and J. J. Murillo-Fuentes, “Complex-Valued Gaussian Processes for Regression: A Widely Non-Linear Approach”, __, Nov. 2015.
[11] T. Gerkmann, M. Krawczyk-Becker, and J. Le Roux, “Phase Processing for Single-Channel Speech Enhancement: History and Recent Advances”, IEEE Signal Processing Magazine, vol. 32, pp. 55–66, Feb. 2015.
[12] Y. K. Lee, J. G. Park, and O. W. Kwon, “Speech Enhancement Using Phase-Dependent A Priori SNR Estimator in Log-Mel Spectral Domain”, ETRI Journal, vol. 36, no. 5, pp. 721–727, Oct. 2014.
[13] V. Zue, S. Seneff, and J. Glass, “Speech database development at MIT: TIMIT and beyond”, Speech Communication, vol. 9, pp. 351–356, Aug. 1990.
[14] E. Vincent, R. Gribonval, and C. Févotte, “Performance measurement in blind audio source separation”, IEEE Transactions on Audio, Speech, and Language Processing, vol. 14, no. 4, pp. 1462–1469, Jul. 2006.
[15] Y. B. Lin, T. Pham, Y. S. Lee, and J. C. Wang, “Monaural source separation using nonnegative matrix factorization with graph regularization constraint”, Proc. Conference on Computational Linguistics and Speech Processing, Oct. 2015.
[16] T. G. Kang, K. Kwon, J. W. Shin, and N. S. Kim, “NMF-based target source separation using deep neural network”, IEEE Signal Processing Letters, vol. 22, no. 2, pp. 229–233, Feb. 2015.
[17] J. Eggert and E. Körner, “Sparse coding and NMF”, Proc. IEEE International Joint Conference on Neural Networks, vol. 4, pp. 2529–2533, 2004.
[18] S. Araki, S. Makino, H. Sawada, and R. Mukai, “Reducing musical noise by a fine shift overlap-add method applied to source separation using time-frequency mask”, Proc. IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), pp. III-81–III-82, 2005.
[19] G. Shi and P. Aarabi, “Robust digit recognition using phase-dependent time-frequency masking”, Proc. Int. Conf. Acoustics, Speech, and Signal Processing (ICASSP), vol. 1, Hong Kong, Apr. 2003, pp. 684–687.
[20] A. C. Lindgren, M. T. Johnson, and R. J. Povinelli, “Speech recognition using reconstructed phase space features”, Proc. Int. Conf. Acoustics, Speech, and Signal Processing (ICASSP), vol. 1, Hong Kong, Apr. 2003, pp. 60–63.
[21] R. Schlüter and H. Ney, “Using phase spectrum information for improved speech recognition performance”, Proc. Int. Conf. Acoustics, Speech, and Signal Processing (ICASSP), vol. 1, Salt Lake City, UT, May 2001, pp. 133–136.
[22] S. R. Quackenbush, T. P. Barnwell, and M. A. Clements, Objective Measures of Speech Quality. Englewood Cliffs, NJ: Prentice Hall, 1988.
[23] A. Damianou and N. Lawrence, “Deep Gaussian processes”, Proc. AISTATS, JMLR W&CP, vol. 31, pp. 207–215, 2013.
[24] T. Pham, Y. S. Lee, Y. B. Lin, T. C. Tai, and J. C. Wang, “Single Channel Source Separation Using Sparse NMF and Graph Regularization”, Proc. ASE BD&SI 2015, Oct. 2015.
[25] K. B. Petersen and M. S. Pedersen, The Matrix Cookbook, Nov. 15, 2012.
[26] R. Boloix-Tortosa, F. J. Payan-Somet, E. Arias-de-Reyna, and J. J. Murillo-Fuentes, “Proper Complex Gaussian Processes for Regression”, CoRR abs/1502.04868, 2015.
[27] P. Smaragdis and J. C. Brown, “Non-negative matrix factorization for polyphonic music transcription”, IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA), pp. 177–180, Oct. 2003.
[28] K. Paliwal, K. Wojcicki, and B. Shannon, “The importance of phase in speech enhancement”, Speech Communication, vol. 53, no. 4, pp. 465–494, Apr. 2011.
[29] M. N. Schmidt, “Speech separation using non-negative features and sparse non-negative matrix factorization”, __, 2007.
[30] L. Csató and M. Opper, “Sparse online Gaussian processes”, Neural Computation, vol. 14, pp. 641–669, 2002.
[31] M. Kuss and C. E. Rasmussen, “Assessing approximate inference for binary Gaussian process classification”, Journal of Machine Learning Research, vol. 6, pp. 1679–1704, 2005.
[32] M. E. Tipping, “Sparse Bayesian learning and the Relevance Vector Machine”, Journal of Machine Learning Research, vol. 1, pp. 211–244, 2001.
[33] B. W. Silverman, “Some aspects of the spline smoothing approach to non-parametric regression curve fitting”, Journal of the Royal Statistical Society, Series B, vol. 47, no. 1, pp. 1–52, 1985.
[34] C. E. Rasmussen, “Reduced rank Gaussian process learning”, technical report, 2002.
[35] M. Helén and T. Virtanen, “Separation of drums from polyphonic music using nonnegative matrix factorization and support vector machine,” Proc. Eur. Signal Process. Conf., 2005.
[36] L. Benaroya, F. Bimbot, L. McDonagh, and R. Gribonval, “Non negative sparse representation for Wiener based source separation with a single sensor”, Proc. IEEE Int. Conf. Acoustics, Speech, and Signal Processing (ICASSP), pp. 613–616, 2003.
[37] S. A. Abdallah and M. D. Plumbley, “Polyphonic transcription by nonnegative sparse coding of power spectra”, Int. Conf. Music Inf. Retrieval, pp. 318–325, Oct. 2004.
[38] P. Smaragdis and J. C. Brown, “Non-negative matrix factorization for polyphonic music transcription”, IEEE Workshop on Applications of Signal Process. Audio Acoust., pp. 177–180, 2003.
[39] C. Uhle, C. Dittmar, and T. Sporer, “Extraction of drum tracks from polyphonic music using independent subspace analysis”, Proc. 4th Int. Symp. Independent Compon. Anal. Blind Signal Separation, pp. 843–848, 2003.
[40] S. Haykin and Z. Chen, “The Cocktail Party Problem”, Neural Computation, vol. 17, pp. 1875–1902, Oct. 2005.
[41] R. S. Bolia, W. T. Nelson, and R. M. Morley, “Asymmetric performance in the cocktail party effect: Implications for the design of spatial audio displays”, Human Factors, vol. 43, pp. 208–216, 2001.
[42] K. Crispien and T. Ehrenberg, “Evaluation of the cocktail party effect for multiple speech stimuli within a spatial audio display”, Journal of the Audio Engineering Society, vol. 43, pp. 932–940, 1995.
[43] M. L. Hawley, R. Y. Litovsky, and J. F. Culling, “The benefit of binaural hearing in a cocktail party: Effect of location and type of interferer”, Journal of the Acoustical Society of America, vol. 115, pp. 833–843, 2004.
[44] W. A. Yost, R. H. Dye, Jr., and S. Sheft, “A simulated cocktail party with up to three sound sources”, Perception & Psychophysics, vol. 58, pp. 1026–1036, 1996.
[45] A. Bronkhorst, “The cocktail party phenomenon: A review of research on speech intelligibility in multiple-talker conditions”, Acustica, vol. 86, pp. 117–128, 2000.
[46] C. E. Rasmussen and C. K. I. Williams, Gaussian Processes for Machine Learning. The MIT Press, 2006.
[47] C. E. Rasmussen, “Gaussian processes in machine learning”, available at http://www.cs.ubc.ca/hutter/earg/papers05/rasmussen_gps_in_ml.pdf, Jan. 2011.
[48] M. Ebden, “Gaussian processes for regression: A quick introduction”, available at http://www.robots.ox.ac.uk/mebden/reports/GPtutorial.pdf, Aug. 2008.
[49] M. Gibbs and D. J. MacKay, “Efficient implementation of Gaussian processes”, Technical report, 1997.
[50] B. Huhle, T. Schairer, A. Schilling, and W. Strasser, “Learning to localize with Gaussian process regression on omnidirectional image data”, Proc. 2010 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pp. 5208–5213, Oct. 2010.
[51] J. Ko, D. Klein, D. Fox, and D. Haehnel, “Gaussian processes and reinforcement learning for identification and control of an autonomous blimp”, Proc. 2007 IEEE International Conference on Robotics and Automation (ICRA), pp. 742–747, Apr. 2007.
[52] I. G. Mattingly, “Speech synthesis for phonetic and phonological models”, in T. A. Sebeok (Ed.), Current Trends in Linguistics, vol. 12, Mouton, The Hague, pp. 2451–2487, 1974.
[53] A. Breen, “Speech Synthesis Models: A Review”, Electronics & Communication Engineering Journal, vol. 4, pp. 19–31, 1992.
[54] M. Macon and M. Clements, “Speech Concatenation and Synthesis Using an Overlap-Add Sinusoidal Model”, Proc. ICASSP 96, pp. 361–364, 1996.
[55] R. J. McAulay and T. F. Quatieri, “Speech analysis/synthesis based on a sinusoidal representation”, IEEE Trans. on Acoustics, Speech, and Signal Processing, vol. 34, pp. 744–754, 1986.
[56] T. F. Quatieri and R. J. McAulay, “Speech transformations based on a sinusoidal representation”, Proc. Int. Conf. Acoust., Speech, Signal Processing, p. 489, 1985.
[57] T. F. Quatieri and R. J. McAulay, “Speech transformations based on a sinusoidal representation”, Proc. Int. Conf. Acoust., Speech, Signal Processing, Tampa, FL, p. 489, 1985.
[58] X. Rodet and P. Depalle, “A new additive synthesis method using inverse Fourier transform and spectral envelopes”, Proceedings of International Computer Music Conference, pp. 410-411, 1992.
[59] A. Spanias, “Speech coding: A tutorial review”, Proc. IEEE, vol. 82, pp. 1541–1582, Oct. 1994.
[60] L. R. Rabiner and R. W. Schafer, Digital Processing of Speech Signals. Englewood Cliffs, NJ: Prentice-Hall, 1978.
[61] X. Serra and J. Smith, “Spectral modeling synthesis: a sound analysis/synthesis system based on a deterministic plus stochastic decomposition”, Computer Music Journal, vol. 14, no. 4, pp. 12-24, 1990.
[62] P. Depalle and X. Rodet, “Synthèse additive par FFT inverse”, Rapport Interne IRCAM, Paris, 1990.
[63] Ph. Depalle and G. Poirot, “A modular system for analysis, processing and synthesis of sound signals”, Proc. of the Int. Comp. Music Conf., Montreal, Canada, 1991.
[64] X. Rodet, P. Depalle, and G. Poirot, “Speech Analysis and Synthesis Methods Based on Spectral Envelopes and Voiced/Unvoiced Functions”, European Conference on Speech Tech., Edinburgh, U.K., Sept. 1987.
[65] M. R. Portnoff, “Time-scale modification of speech based on short-time Fourier analysis”, IEEE Trans. Acoust., Speech, Signal Processing, vol. ASSP-29, pp. 374–390, Jun. 1981.
[66] F. Rumsey and T. McCormick, Sound and Recording: An Introduction, Elsevier, 2002.
[67] C. E. Speaks, Introduction to Sound, Singular, 1999.
[68] I. Cohen and S. Gannot, “Spectral enhancement methods”, in Springer Handbook of Speech Processing, J. Benesty, M. M. Sondhi, and Y. Huang (Eds.), Springer, pp. 873–901, 2008.
[69] J. Du and Q. Huo, “A speech enhancement approach using piecewise linear approximation of an explicit model of environmental distortions,” Proc. Interspeech, pp. 569–572, 2008.
[70] D. Griffin and J. S. Lim, “Signal estimation from modified short-time Fourier transform”, IEEE Trans. Acoust., Speech, Signal Process., vol. 32, no. 2, pp. 236–243, 1984.
[71] F. J. Harris, “On the use of windows for harmonic analysis with the discrete Fourier transform”, Proc. IEEE, vol. 66, pp. 51–83, Jan. 1978.
[72] S. Seneff, “System to independently modify excitation and/or spectrum of speech waveform without explicit pitch extraction”, IEEE Trans. Acoust., Speech, Signal Processing, vol. ASSP–30, pp. 566–578, Aug. 1982.
[73] F. J. Harris, “On the use of windows for harmonic analysis with the discrete Fourier transform”, Proc. IEEE, vol. 66, pp. 51–83, Jan. 1978.
[74] K. K. Paliwal and L. D. Alsteris, “On the usefulness of STFT phase spectrum in human listening tests”, Speech Communication, vol. 45, pp. 153–170, 2005.
[75] J. B. Allen and L. R. Rabiner, “A unified approach to short-time Fourier analysis and synthesis”, Proc. IEEE, vol. 65, pp. 1558–1564, 1977.
[76] X. Serra and J. O. Smith III, “Spectral modeling synthesis: A sound analysis/synthesis system based on a deterministic plus stochastic decomposition,” Comput. Music J., vol. 14, pp. 12–24, 1990.
[77] R. J. McAulay and T. F. Quatieri, “Phase modeling and its application to sinusoidal transform coding,” Proc. IEEE Int. Conf. Acoustics, Speech, and Signal Processing, pp. 1713–1715, Apr. 1986.
[78] B. Yegnanarayana, D. K. Saikia, and T. R. Krishnan, “Significance of group delay functions in signal reconstruction from spectral magnitude or phase”, IEEE Trans. Acoust., Speech, Signal Processing, vol. 32, no. 3, pp. 610–622, 1984.
[79] K. K. Paliwal and L. Alsteris, “Usefulness of phase spectrum in human speech perception,” in Proc. Eur. Conf. Speech Communication and Technology (Eurospeech), Geneva, Switzerland, Sep. 2003, pp. 2117–2120.
[80] D. W. Griffin and J. S. Lim, “Signal estimation from modified short-time Fourier transform,” IEEE Trans. Acoust., Speech, Signal Process., vol. 32, no. 2, pp. 236–243, Apr. 1984.
[81] B. Bozkurt, B. Doval, C. d'Alessandro, and T. Dutoit, “Improved differential phase spectrum processing for formant tracking”, Proc. ICSLP, Jeju, Korea, Oct. 2004.