References
[1] D. Kitamura et al., "Robust music signal separation based on supervised nonnegative matrix factorization with prevention of basis sharing," in Proceedings IEEE International Symposium on Signal Processing and Information Technology, 2013.
[2] Y. Xu et al., "A regression approach to speech enhancement based on deep neural networks," IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 23, no. 1, pp. 7-19, 2015.
[3] Y. Wang, A. Narayanan, and D. Wang, "On training targets for supervised speech separation," IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 22, no. 12, pp. 1849-1858, 2014.
[4] P.-S. Huang et al., "Deep learning for monaural speech separation," in Proceedings IEEE International Conference on Acoustics, Speech and Signal Processing, 2014.
[5] P.-S. Huang et al., "Joint optimization of masks and deep recurrent neural networks for monaural source separation," IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 23, no. 12, pp. 2136-2147, 2015.
[6] G.-X. Wang, C.-C. Hsu, and J.-T. Chien, "Discriminative deep recurrent neural networks for monaural speech separation," in Proceedings IEEE International Conference on Acoustics, Speech and Signal Processing, 2016.
[7] D. Wang and J. Lim, "The unimportance of phase in speech enhancement," IEEE Transactions on Acoustics, Speech, and Signal Processing, vol. 30, no. 4, pp. 679-681, 1982.
[8] M. Kazama et al., "On the significance of phase in the short term Fourier spectrum for speech intelligibility," The Journal of the Acoustical Society of America, vol. 127, no. 3, pp. 1432-1439, 2010.
[9] T. Gerkmann, M. Krawczyk-Becker, and J. Le Roux, "Phase processing for single-channel speech enhancement: History and recent advances," IEEE Signal Processing Magazine, vol. 32, no. 2, pp. 55-66, 2015.
[10] S.-H. Moon, B. Kim, and I.-S. Lee, "Importance of phase information in speech enhancement," in Proceedings International Conference on Complex, Intelligent and Software Intensive Systems, 2010.
[11] K. Paliwal, K. Wójcicki, and B. Shannon, "The importance of phase in speech enhancement," Speech Communication, vol. 53, no. 4, pp. 465-494, 2011.
[12] D. S. Williamson, Y. Wang, and D. Wang, "Complex ratio masking for joint enhancement of magnitude and phase," in Proceedings IEEE International Conference on Acoustics, Speech and Signal Processing, 2016.
[13] D. S. Williamson, Y. Wang, and D. Wang, "Complex ratio masking for monaural speech separation," IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 24, no. 3, pp. 483-492, 2016.
[14] Y.-S. Lee et al., "Fully complex deep neural network for phase-incorporating monaural source separation," in Proceedings IEEE International Conference on Acoustics, Speech and Signal Processing, 2017.
[15] A. Jourjine, S. Rickard, and Ö. Yılmaz, "Blind separation of disjoint orthogonal signals: Demixing N sources from 2 mixtures," in Proceedings IEEE International Conference on Acoustics, Speech and Signal Processing, 2000, pp. 2985-2988.
[16] Ö. Yılmaz and S. Rickard, "Blind separation of speech mixtures via time-frequency masking," IEEE Transactions on Signal Processing, vol. 52, no. 7, pp. 1830-1847, Jul. 2004.
[17] G. Bao, Z. Ye, X. Xu, and Y. Zhou, “A compressed sensing approach to blind separation of speech mixture based on a two-layer sparsity model,” IEEE Transactions on Audio, Speech and Language Processing, vol. 21, no. 5, pp. 899-906, May 2013.
[18] A. Belouchrani, K. Abed-Meraim, J. Cardoso, and E. Moulines, "A blind source separation technique based on second-order statistics," IEEE Transactions on Signal Processing, vol. 45, pp. 434-444, 1997.
[19] J. Cardoso, "Source separation using higher order moments," in Proceedings IEEE International Conference on Acoustics, Speech and Signal Processing, 1989, pp. 2109-2112.
[20] Y. Tan, J. Wang, and J. M. Zurada, "Nonlinear blind source separation using a radial basis function network," IEEE Transactions on Neural Networks, vol. 12, pp. 134-144, 2001.
[21] J. Cardoso and A. Souloumiac, "Blind beamforming for non-Gaussian signals," IEE Proceedings F - Radar and Signal Processing, vol. 140, no. 6, pp. 362-370, Dec. 1993.
[22] A. Bell and T. Sejnowski, "An information-maximization approach to blind separation and blind deconvolution," Neural Computation, vol. 7, no. 6, pp. 1129-1159, 1995.
[23] S. Roweis, “One microphone source separation,” in Proceedings Advances in Neural Information Processing Systems, 2000, pp. 793-799.
[24] M. Schmidt and R. Olsson, “Single-channel speech separation using sparse non-negative matrix factorization,” in Proceedings Interspeech, 2006, pp. 2614-2617.
[25] M. Radfar and R. Dansereau, “Single-channel speech separation using soft mask filtering,” IEEE Transactions on Audio, Speech and Language Processing, vol. 15, no. 8, pp. 2299-2310, Nov. 2007.
[26] Y. Lee, I. Lee, and O. Kwon, “Single-channel speech separation using phase-based methods,” IEEE Transactions on Consumer Electronics, vol. 56, no. 4, pp. 2453-2459, Nov. 2010.
[27] B. King and L. Atlas, “Single-channel source separation using complex matrix factorization,” IEEE Transactions on Audio, Speech and Language Processing, vol. 19, no. 8, pp. 2591-2597, Nov. 2011.
[28] B. Gao, W. Woo, and S. Dlay, “Single-channel source separation using EMD-subband variable regularized sparse features,” IEEE Transactions on Audio, Speech and Language Processing, vol. 19, no. 4, pp. 961-976, May 2011.
[29] P. Mowlaee, R. Saeidi, M. Christensen, Z. Tan, T. Kinnunen, P. Franti, and S. Jensen, “A joint approach for single-channel speaker identification and speech separation,” IEEE Transactions on Audio, Speech and Language Processing, vol. 20, no. 9, pp. 2586-2601, Nov. 2012.
[30] C. Demir, M. Saraclar, and A. Cemgil, “Single-channel speech-music separation for robust ASR with mixture models,” IEEE Transactions on Audio, Speech and Language Processing, vol. 21, no. 4, pp. 725-736, Apr. 2013.
[31] P. Li, Y. Guan, B. Xu, and W. Liu, “Monaural speech separation based on computational auditory scene analysis and objective quality assessment of speech,” IEEE Transactions on Audio, Speech and Language Processing, vol. 14, no. 6, pp. 2014-2023, Nov. 2006.
[32] G. Brown and M. Cooke, “Computational auditory scene analysis,” Computer Speech and Language, vol. 8, no. 4, pp. 297-336, 1994.
[33] D. P. Ellis, “Using knowledge to organize sound: The prediction-driven approach to computational auditory scene analysis and its application to speech/nonspeech mixtures,” Speech Communication, vol. 27, no. 3, pp. 281-298, 1999.
[34] B. King and L. Atlas, “Single-channel source separation using simplified-training complex matrix factorization,” in Proceedings IEEE International Conference on Acoustics, Speech and Signal Processing, 2010, pp. 4206-4209.
[35] D. Lee and H. Seung, “Learning the parts of objects by non-negative matrix factorization,” Nature, vol. 401, no. 6755, pp. 788-791, 1999.
[36] J. Eggert and E. Korner, “Sparse coding and NMF,” in Proceedings IEEE International Joint Conference Neural Networks, 2004, vol. 4, pp. 2529-2533.
[37] H. Kameoka, N. Ono, K. Kashino, and S. Sagayama, “Complex NMF: A new sparse representation for acoustic signals,” in Proceedings IEEE International Conference on Acoustics, Speech and Signal Processing, 2009, pp. 3437-3440.
[38] Y. Wang and D. L. Wang, “Feature denoising for speech separation in unknown noisy environments,” in Proceedings IEEE International Conference on Acoustics, Speech and Signal Processing, 2013, pp. 7472-7476.
[39] E. Grais and H. Erdogan, "Single channel speech music separation using nonnegative matrix factorization and spectral masks," in Proceedings International Conference on Digital Signal Processing, 2011, pp. 1-6.
[40] S. Nie, H. Zhang, X. L. Zhang, and W. J. Liu, "Deep stacking networks with time series for speech separation," in Proceedings IEEE International Conference on Acoustics, Speech and Signal Processing, May 2014, pp. 6667-6671.
[41] Y. Wang and D. Wang, “Cocktail party processing via structured prediction,” in Proceedings Advances in Neural Information Processing Systems, 2012, pp. 224-232.
[42] Y. Wang and D. Wang, “Towards scaling up classification-based speech separation,” IEEE Transactions on Audio, Speech, and Language Processing, vol. 21, no. 7, pp. 1381-1390, Jul. 2013.
[43] B. Xia and C. Bao, “Speech enhancement with weighted denoising Auto-Encoder,” in Proceedings Interspeech, 2013, pp. 3444-3448.
[44] X. Lu, Y. Tsao, S. Matsuda, and C. Hori, “Speech enhancement based on deep denoising Autoencoder,” in Proceedings Interspeech, 2013, pp. 436-440.
[45] G. Hinton and R. Salakhutdinov, “Reducing the dimensionality of data with neural networks,” Science, vol. 313, pp. 504-507, 2006.
[46] M. Hermans and B. Schrauwen, “Training and analyzing deep recurrent neural networks,” in Proceedings Advances in Neural Information Processing Systems, 2013, pp. 190-198.
[47] R. Pascanu, C. Gulcehre, K. Cho, and Y. Bengio, “How to construct deep recurrent neural networks,” in Proceedings International Conference on Learning Representations, 2014.
[48] F. Weninger, F. Eyben, and B. Schuller, “Single-channel speech separation with memory-enhanced recurrent neural networks,” in Proceedings IEEE International Conference on Acoustics, Speech and Signal Processing, 2014, pp. 3709-3713.
[49] K. He, X. Zhang, S. Ren, and J. Sun, "Delving deep into rectifiers: Surpassing human-level performance on ImageNet classification," in Proceedings IEEE International Conference on Computer Vision, 2015.
[50] X. Glorot and Y. Bengio, "Understanding the difficulty of training deep feedforward neural networks," in Proceedings International Conference on Artificial Intelligence and Statistics, 2010.
[51] D. Kingma and J. Ba, "Adam: A method for stochastic optimization," arXiv preprint arXiv:1412.6980, 2014.
[52] T. Tieleman and G. Hinton, "Lecture 6.5 - RMSProp," COURSERA: Neural Networks for Machine Learning, Technical Report, 2012.
[53] C.-L. Hsu and J.-S. R. Jang, "On the improvement of singing voice separation for monaural recordings using the MIR-1K dataset," IEEE Transactions on Audio, Speech, and Language Processing, vol. 18, no. 2, pp. 310-319, 2010.
[54] BSS_eval toolbox: http://bass-db.gforge.inria.fr/bss_eval/, accessed Jul. 11, 2014.
[55] P. C. Loizou, Speech Enhancement: Theory and Practice. CRC Press, 2013.
[56] A. W. Rix et al., "Perceptual evaluation of speech quality (PESQ) - a new method for speech quality assessment of telephone networks and codecs," in Proceedings IEEE International Conference on Acoustics, Speech and Signal Processing, 2001.
[57] A. Ng, "Sparse autoencoder," CS294A Lecture Notes, 2011.
[58] F. Weninger, F. Eyben, and B. Schuller, "Single-channel speech separation with memory-enhanced recurrent neural networks," in Proceedings IEEE International Conference on Acoustics, Speech and Signal Processing, 2014, pp. 3709-3713.