References
[1] L. Di Persia, D. H. Milone, H. L. Rufiner, and M. Yanagida, “Perceptual evaluation of blind source separation for robust speech recognition,” Signal Processing, vol. 88, no. 10, pp. 2578-2583, 2008.
[2] K. Reindl, Y. Zheng, and W. Kellermann, “Speech enhancement for binaural hearing aids based on blind source separation,” In Proc. ISCCSP, pp. 1-6, 2010.
[3] A. Liutkus, D. Fitzgerald, Z. Rafii, B. Pardo, and L. Daudet, “Kernel additive models for source separation,” IEEE Transactions on Signal Processing, vol. 62, no. 16, pp. 4298-4310, 2014.
[4] P. Smaragdis, C. Févotte, G. J. Mysore, N. Mohammadiha, and M. Hoffman, “Static and dynamic source separation using nonnegative factorizations: a unified view,” IEEE Signal Processing Magazine, vol. 31, no. 3, pp. 66-75, 2014.
[5] T. Pham, Y. S. Lee, Y. B. Lin, T. C. Tai, and J. C. Wang, “Single channel source separation using sparse NMF and graph regularization,” In Proc. of the ASE BigData and SocialInformatics, p. 55, 2015.
[6] P. S. Huang, M. Kim, M. Hasegawa-Johnson, and P. Smaragdis, “Deep learning for monaural speech separation,” In Proc. ICASSP, pp. 1562-1566, 2014.
[7] D. D. Lee and H. S. Seung, “Algorithms for non-negative matrix factorization,” in Advances in Neural Information Processing Systems, vol. 13, Cambridge, MA, USA: MIT Press, 2001.
[8] M. N. Schmidt, “Speech separation using non-negative features and sparse non-negative matrix factorization,” Elsevier, 2007.
[9] A. Liutkus, D. Fitzgerald, Z. Rafii, B. Pardo, and L. Daudet, “Kernel additive models for source separation,” IEEE Transactions on Signal Processing, vol. 62, no. 16, pp. 4298-4310, 2014.
[10] M. Kim and P. Smaragdis, “Mixtures of local dictionaries for unsupervised speech enhancement,” IEEE Signal Processing Letters, vol. 22, no. 3, pp. 293-297, 2015.
[11] J. Eggert and E. Körner, “Sparse coding and NMF,” In Proc. IEEE International Joint Conference on Neural Networks, vol. 4, pp. 2529-2533, 2004.
[12] D. Cai, X. He, J. Han, and T. Huang, “Graph regularized nonnegative matrix factorization for data representation,” IEEE Transactions on Pattern Analysis and Machine Intelligence, pp. 1548-1560, 2010.
[13] P. Sprechmann, A. M. Bronstein, and G. Sapiro, “Real-time online singing voice separation from monaural recordings using robust low-rank modeling,” In Proc. ISMIR, pp. 67-72, 2012.
[14] W. Xu, X. Liu, and Y. Gong, “Document clustering based on non-negative matrix factorization,” In Proc. International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 267-273, 2003.
[15] V. P. Pauca, F. Shahnaz, M. W. Berry, and R. J. Plemmons, “Text mining using non-negative matrix factorizations,” In Proc. SIAM International Conference on Data Mining, 2004.
[16] P. Hoyer, “Non-negative matrix factorization with sparseness constraints,” Journal of Machine Learning Research, vol. 5, pp. 1457-1469, 2004.
[17] C. Févotte and J. Idier, “Algorithms for nonnegative matrix factorization with the beta-divergence,” Neural Computation, vol. 23, no. 9, pp. 2421-2456, 2011.
[18] J. Le Roux, F. Weninger, and J. R. Hershey, “Sparse NMF – half-baked or well done?” Mitsubishi Electric Research Laboratories Technical Report, 2015.
[19] Y. Wang and D. L. Wang, “Towards scaling up classification-based speech separation,” IEEE Transactions on Audio, Speech and Language Processing, vol. 21, no. 7, pp. 1381-1390, 2013.
[20] S. Nie, S. Liang, H. Li, X. L. Zhang, Z. L. Yang, W. J. Liu, and L. K. Dong, “Exploiting spectro-temporal structures using NMF for DNN-based supervised speech separation,” In Proc. ICASSP, pp. 469-473, 2016.
[21] M. Belkin and P. Niyogi, “Laplacian eigenmaps and spectral techniques for embedding and clustering,” Advances in Neural Information Processing Systems, Cambridge, MA, USA: MIT Press, 2001.
[22] T. Virtanen, “Monaural sound source separation by nonnegative matrix factorization with temporal continuity and sparseness criteria,” IEEE Transactions on Audio, Speech and Language Processing, vol. 15, no. 3, pp. 1066-1074, 2007.
[23] F. Weninger, J. Le Roux, J. R. Hershey, and S. Watanabe, “Discriminative NMF and its application to single-channel source separation,” In Proc. INTERSPEECH, pp. 865-869, 2014.
[24] P. Sprechmann, A. M. Bronstein, and G. Sapiro, “Supervised non-Euclidean sparse NMF via bilevel optimization with applications to speech enhancement,” In Proc. Hands-free Speech Communication and Microphone Arrays, pp. 11-15, 2014.
[25] T. G. Kang, K. Kwon, J. W. Shin, and N. S. Kim, “NMF-based target source separation using deep neural network,” IEEE Signal Processing Letters, vol. 22, no. 2, pp. 229-233, 2015.
[26] C. Févotte, N. Bertin, and J. L. Durrieu, “Nonnegative matrix factorization with the Itakura-Saito divergence: With application to music analysis,” Neural Computation, vol. 21, no. 3, pp. 793-830, 2009.
[27] X. He and P. Niyogi, “Locality preserving projections,” In Proc. Neural Information Processing Systems, MIT Press, 2004.
[28] P. S. Huang, M. Kim, M. Hasegawa-Johnson, and P. Smaragdis, “Singing-voice separation from monaural recordings using deep recurrent neural networks,” In Proc. ISMIR, pp. 477-482, 2014.
[29] D. E. Rumelhart, G. E. Hinton, and R. J. Williams, “Learning representations by back-propagating errors,” Nature, vol. 323, pp. 533-536, 1986.
[30] D. C. Liu and J. Nocedal, “On the limited memory BFGS method for large scale optimization,” Mathematical Programming, vol. 45, no. 1-3, pp. 503-528, 1989.
[31] Y. H. Yang, “Low-rank representation of both singing voice and music accompaniment via learned dictionaries,” In Proc. ISMIR, pp. 427-432, 2013.
[32] J. Mairal, F. Bach, J. Ponce, and G. Sapiro, “Online learning for matrix factorization and sparse coding,” Journal of Machine Learning Research, vol. 11, pp. 19-60, 2010.
[33] E. Vincent, R. Gribonval, and C. Févotte, “Performance measurement in blind audio source separation,” IEEE Transactions on Audio, Speech and Language Processing, vol. 14, no. 4, pp. 1462-1469, 2006.
[34] S. Uhlich, M. Porcu, F. Giron, M. Enenkl, T. Kemp, N. Takahashi, and Y. Mitsufuji, “Improving music source separation based on deep neural networks through data augmentation and network blending,” In Proc. ICASSP, pp. 261-265, 2017.
[35] W. S. McCulloch and W. Pitts, “A logical calculus of the ideas immanent in nervous activity,” The Bulletin of Mathematical Biophysics, vol. 5, pp. 115-133, 1943.
[36] A. A. Nugraha, A. Liutkus, and E. Vincent, “Multichannel audio source separation with deep neural networks,” IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 24, no. 9, pp. 1652-1664, 2016.
[37] S. Ioffe and C. Szegedy, “Batch normalization: Accelerating deep network training by reducing internal covariate shift,” arXiv preprint, arXiv:1502.03167, 2015.
[38] T. Pham, Y. S. Lee, Y. B. Lin, Y. H. Li, T. C. Tai, and J. C. Wang, “Single channel source separation using graph sparse NMF and adaptive dictionary learning,” Intelligent Data Analysis, vol. 21, 2017.
[39] N. Srivastava, G. Hinton, A. Krizhevsky, I. Sutskever, and R. Salakhutdinov, “Dropout: A simple way to prevent neural networks from overfitting,” Journal of Machine Learning Research, vol. 15, pp. 1929-1958, 2014.
[40] J. Yosinski, J. Clune, Y. Bengio, and H. Lipson, “How transferable are features in deep neural networks?” In Advances in Neural Information Processing Systems, pp. 3320-3328, 2014.
[41] B. Logan, “Mel frequency cepstral coefficients for music modeling,” In Proc. ISMIR, 2000.
[42] J. Andén and S. Mallat, “Deep scattering spectrum,” IEEE Transactions on Signal Processing, vol. 62, no. 16, pp. 4114-4128, 2014.
[43] J. Bruna, P. Sprechmann, and Y. Lecun, “Source separation with scattering non-negative matrix factorization,” In Proc. ICASSP, 2015.
[44] M. Cooke, J. Barker, S. Cunningham, and X. Shao, “An audio-visual corpus for speech perception and automatic speech recognition,” Journal of the Acoustical Society of America, vol. 120, no. 5, pp. 2421-2424, 2006.
[45] V. Zue, S. Seneff, and J. Glass, “Speech database development at MIT: TIMIT and beyond,” Speech Communication, vol. 9, pp. 351-356, 1990.