References
[1] G. Hinton, S. Osindero, and Y. Teh, “A fast learning algorithm for deep belief nets,” Neural Computation, vol. 18, no. 7, pp. 1527-1554, 2006.
[2] G. Hinton and R. Salakhutdinov, “Reducing the dimensionality of data with neural networks,” Science, vol. 313, no. 5786, pp. 504-507, 2006.
[3] Y. Bengio, A. Courville, and P. Vincent, “Representation Learning: A Review and New Perspectives,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 35, no. 8, pp. 1798-1828, Aug. 2013.
[4] A. Belouchrani, K. Meraim, J. Cardoso, and E. Moulines, “A blind source separation technique based on second-order statistics,” IEEE Transactions on Signal Processing, vol. 45, pp. 434-444, 1997.
[5] J. Cardoso, “Source separation using higher order moments,” in Proceedings IEEE International Conference on Acoustics, Speech and Signal Processing, 1989, pp. 2109-2112.
[6] Y. Tan, J. Wang, and J. M. Zurada, “Nonlinear blind source separation using a radial basis function network,” IEEE Transactions on Neural Networks, vol. 12, pp. 134-144, 2001.
[7] J. Cardoso and A. Souloumiac, “Blind beamforming for non-Gaussian signals,” IEE Proceedings F - Radar and Signal Processing, vol. 140, no. 6, pp. 362-370, Dec. 1993.
[8] A. Bell and T. Sejnowski, “An information-maximization approach to blind separation and blind deconvolution,” Neural Computation, vol. 7, no. 6, pp. 1129-1159, 1995.
[9] P. Huang, M. Kim, M. Hasegawa-Johnson, and P. Smaragdis, “Deep learning for monaural speech separation,” in Proceedings IEEE International Conference on Acoustics, Speech and Signal Processing, 2014, pp. 1562-1566.
[10] Y. Wang and D. L. Wang, “Feature denoising for speech separation in unknown noisy environments,” in Proceedings IEEE International Conference on Acoustics, Speech and Signal Processing, 2013, pp. 7472-7476.
[11] E. Grais and H. Erdogan, “Single channel speech music separation using nonnegative matrix factorization and spectral masks,” in Proceedings International Conference on Digital Signal Processing, 2011, pp. 1-6.
[12] S. Nie, H. Zhang, X. Zhang, and W. Liu, “Deep stacking networks with time series for speech separation,” in Proceedings IEEE International Conference on Acoustics, Speech and Signal Processing, 2014, pp. 6667-6671.
[13] G. Hinton and R. Salakhutdinov, “Reducing the dimensionality of data with neural networks,” Science, vol. 313, pp. 504-507, 2006.
[14] A. Ng, “Sparse autoencoder,” CS294A Lecture Notes, vol. 72, pp. 1-19, 2011.
[15] S. Nie, H. Zhang, X. Zhang, and W. Liu, “Deep stacking networks with time series for speech separation,” in Proceedings IEEE International Conference on Acoustics, Speech and Signal Processing, 2014, pp. 6667-6671.
[16] D. Williamson, Y. Wang, and D. Wang, “Complex Ratio Masking for Monaural Speech Separation,” IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 24, no. 3, pp. 483-492, March 2016.
[17] R. Caruana, “Multitask learning,” Machine Learning, vol. 28, pp. 41-75, 1997.
[18] N. Guberman, “On Complex Valued Convolutional Neural Networks,” arXiv preprint arXiv:1602.09046, Feb. 2016.
[19] BSS Eval toolbox. [Online]. Available: http://bass-db.gforge.inria.fr/bss_eval/, accessed Jul. 11, 2014.
[20] A. Rix, J. Beerends, M. Hollier, and A. Hekstra, “Perceptual evaluation of speech quality (PESQ)-a new method for speech quality assessment of telephone networks and codecs,” in Proceedings IEEE International Conference on Acoustics, Speech and Signal Processing, 2001, vol. 2, pp. 749-752.
[21] C. Hsu and J. Jang, “On the improvement of singing voice separation for monaural recordings using the MIR-1K dataset,” IEEE Transactions on Audio, Speech, and Language Processing, vol. 18, no. 2, pp. 310-319, Feb. 2010.
[22] N. Ono, Z. Koldovský, S. Miyabe, and N. Ito, “The 2013 Signal Separation Evaluation Campaign,” in Proceedings IEEE International Workshop on Machine Learning for Signal Processing, Sep. 2013, pp. 1-6.
[23] B. Xia and C. Bao, “Speech enhancement with weighted denoising Auto-Encoder,” in Proceedings Interspeech, 2013, pp. 3444-3448.
[24] X. Lu, Y. Tsao, S. Matsuda, and C. Hori, “Speech enhancement based on deep denoising Autoencoder,” in Proceedings Interspeech, 2013, pp. 436-440.
[25] M. Hermans and B. Schrauwen, “Training and analyzing deep recurrent neural networks,” in Proceedings Advances in Neural Information Processing Systems, 2013, pp. 190-198.
[26] R. Pascanu, C. Gulcehre, K. Cho, and Y. Bengio, “How to construct deep recurrent neural networks,” in Proceedings International Conference on Learning Representations, 2014.
[27] F. Weninger, F. Eyben, and B. Schuller, “Single-channel speech separation with memory-enhanced recurrent neural networks,” in Proceedings IEEE International Conference on Acoustics, Speech and Signal Processing, 2014, pp. 3709-3713.
[28] P. Huang, M. Kim, M. Hasegawa-Johnson, and P. Smaragdis, “Deep learning for monaural speech separation,” in Proceedings IEEE International Conference on Acoustics, Speech and Signal Processing, 2014, pp. 1562-1566.
[29] P. Huang, M. Kim, M. Hasegawa-Johnson, and P. Smaragdis, “Joint optimization of masks and deep recurrent neural networks for monaural source separation,” ArXiv preprint arXiv:1502.04149, pp. 1-12, 2015.
[30] P. Huang, M. Kim, M. Hasegawa-Johnson, and P. Smaragdis, “Joint optimization of masks and deep recurrent neural networks for monaural source separation,” IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 23, no. 12, pp. 2136-2147, Dec. 2015.
[31] J. Bouvrie, “Notes on convolutional neural networks,” Technical report, 2006.
[32] Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner, “Gradient-based learning applied to document recognition,” Proceedings of the IEEE, vol. 86, no. 11, pp. 2278-2324, Nov. 1998.
[33] L. Lin, K. Wang, W. Zuo, M. Wang, J. Luo, and L. Zhang, “A deep structured model with radius-margin bound for 3D human activity recognition,” International Journal of Computer Vision, 2016.
[34] A. Karpathy, G. Toderici, S. Shetty, T. Leung, R. Sukthankar, and L. Fei-Fei, “Large-scale video classification with convolutional neural networks,” in Proceedings IEEE Conference on Computer Vision and Pattern Recognition, 2014.
[35] A. Jourjine, S. Rickard, and Ö. Yilmaz, “Blind separation of disjoint orthogonal signals: Demixing N sources from 2 mixtures,” in Proceedings IEEE International Conference on Acoustics, Speech and Signal Processing, 2000, pp. 2985-2988.
[36] Ö. Yılmaz and S. Rickard, “Blind separation of speech mixtures via time-frequency masking,” IEEE Transactions on Signal Processing, vol. 52, no. 7, pp. 1830-1847, Jul. 2004.
[37] G. Bao, Z. Ye, X. Xu, and Y. Zhou, “A compressed sensing approach to blind separation of speech mixture based on a two-layer sparsity model,” IEEE Transactions on Audio, Speech and Language Processing, vol. 21, no. 5, pp. 899-906, May 2013.
[38] S. Roweis, “One microphone source separation,” in Proceedings Advances in Neural Information Processing Systems, 2000, pp. 793-799.
[39] M. Schmidt and R. Olsson, “Single-channel speech separation using sparse non-negative matrix factorization,” in Proceedings Interspeech, 2006, pp. 2614-2617.
[40] M. Radfar and R. Dansereau, “Single-channel speech separation using soft mask filtering,” IEEE Transactions on Audio, Speech and Language Processing, vol. 15, no. 8, pp. 2299-2310, Nov. 2007.
[41] Y. Lee, I. Lee, and O. Kwon, “Single-channel speech separation using phase-based methods,” IEEE Transactions on Consumer Electronics, vol. 56, no. 4, pp. 2453-2459, Nov. 2010.
[42] B. King and L. Atlas, “Single-channel source separation using complex matrix factorization,” IEEE Transactions on Audio, Speech and Language Processing, vol. 19, no. 8, pp. 2591-2597, Nov. 2011.
[43] B. Gao, W. Woo, and S. Dlay, “Single-channel source separation using EMD-subband variable regularized sparse features,” IEEE Transactions on Audio, Speech and Language Processing, vol. 19, no. 4, pp. 961-976, May 2011.
[44] P. Mowlaee, R. Saeidi, M. Christensen, Z. Tan, T. Kinnunen, P. Franti, and S. Jensen, “A joint approach for single-channel speaker identification and speech separation,” IEEE Transactions on Audio, Speech and Language Processing, vol. 20, no. 9, pp. 2586-2601, Nov. 2012.
[45] C. Demir, M. Saraclar, and A. Cemgil, “Single-channel speech-music separation for robust ASR with mixture models,” IEEE Transactions on Audio, Speech and Language Processing, vol. 21, no. 4, pp. 725-736, Apr. 2013.
[46] P. Li, Y. Guan, B. Xu, and W. Liu, “Monaural speech separation based on computational auditory scene analysis and objective quality assessment of speech,” IEEE Transactions on Audio, Speech and Language Processing, vol. 14, no. 6, pp. 2014-2023, Nov. 2006.
[47] G. Brown and M. Cooke, “Computational auditory scene analysis,” Computer Speech and Language, vol. 8, no. 4, pp. 297-336, 1994.
[48] D. P. Ellis, “Using knowledge to organize sound: The prediction-driven approach to computational auditory scene analysis and its application to speech/nonspeech mixtures,” Speech Communication, vol. 27, no. 3, pp. 281-298, 1999.
[49] B. King and L. Atlas, “Single-channel source separation using simplified-training complex matrix factorization,” in Proceedings IEEE International Conference on Acoustics, Speech and Signal Processing, 2010, pp. 4206-4209.
[50] D. Lee and H. Seung, “Learning the parts of objects by non-negative matrix factorization,” Nature, vol. 401, no. 6755, pp. 788-791, 1999.
[51] J. Eggert and E. Korner, “Sparse coding and NMF,” in Proceedings IEEE International Joint Conference on Neural Networks, 2004, vol. 4, pp. 2529-2533.
[52] H. Kameoka, N. Ono, K. Kashino, and S. Sagayama, “Complex NMF: A new sparse representation for acoustic signals,” in Proceedings IEEE International Conference on Acoustics, Speech and Signal Processing, 2009, pp. 3437-3440.
[53] Y. Wang, A. Narayanan, and D. Wang, “On Training Targets for Supervised Speech Separation,” IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 22, no. 12, pp. 1849-1858, Dec. 2014.
[54] Y. Wang and D. Wang, “Cocktail party processing via structured prediction,” in Proceedings Advances in Neural Information Processing Systems, 2012, pp. 224-232.
[55] Y. Wang and D. Wang, “Towards scaling up classification-based speech separation,” IEEE Transactions on Audio, Speech, and Language Processing, vol. 21, no. 7, pp. 1381-1390, Jul. 2013.
[56] K. Paliwal, K. Wójcicki, and B. Shannon, “The importance of phase in speech enhancement,” Speech Communication, vol. 53, no. 4, pp. 465-494, Apr. 2011.
[57] M. Krawczyk and T. Gerkmann, “STFT Phase Reconstruction in Voiced Speech for an Improved Single-Channel Speech Enhancement,” IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 22, no. 12, pp. 1931-1940, Dec. 2014.
[58] T. Gerkmann, M. Krawczyk-Becker, and J. Le Roux, “Phase Processing for Single-Channel Speech Enhancement: History and recent advances,” IEEE Signal Processing Magazine, vol. 32, no. 2, pp. 55-66, Mar. 2015.
[59] H. Leung and S. Haykin, “The complex backpropagation algorithm,” IEEE Transactions on Signal Processing, vol. 39, no. 9, pp. 2101-2104, Sep. 1991.
[60] N. Benvenuto and F. Piazza, “On the complex backpropagation algorithm,” IEEE Transactions on Signal Processing, vol. 40, no. 4, pp. 967-969, Apr. 1992.
[61] A. Hirose, Complex-Valued Neural Networks, 2nd ed. Berlin, Germany: Springer-Verlag, 2012.