References
[1] Yi Luo, Zhuo Chen, John R Hershey, Jonathan Le Roux, and Nima Mesgarani, “Deep clustering and conventional networks for music separation: Stronger together,” In Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, pp. 61–65, 2017.
[2] Andreas Jansson, Eric J. Humphrey, Nicola Montecchio, Rachel Bittner, Aparna Kumar, and Tillman Weyde. “Singing voice separation with deep U-Net convolutional networks”. In Proceedings of the International Society for Music Information Retrieval Conference, pp. 323–332, 2017.
[3] Y. Luo and N. Mesgarani, “TasNet: Surpassing ideal time-frequency masking for speech separation,” arXiv preprint arXiv:1809.07454, 2018.
[4] Daniel Stoller, Sebastian Ewert, and Simon Dixon, “Wave-U-Net: A multi-scale neural network for end-to-end source separation,” In Proceedings of the International Society for Music Information Retrieval Conference, vol. 19, pp. 334–340, 2018.
[5] C. Lea, R. Vidal, A. Reiter, and G. D. Hager, “Temporal convolutional networks: A unified approach to action segmentation,” In European Conference on Computer Vision. Springer, pp. 47–54, 2016.
[6] C. Lea, M. D. Flynn, R. Vidal, A. Reiter, and G. D. Hager, “Temporal convolutional networks for action segmentation and detection,” In IEEE Conference on Computer Vision and Pattern Recognition, 2017.
[7] S. Bai, J. Z. Kolter, and V. Koltun. “An empirical evaluation of generic convolutional and recurrent networks for sequence modeling.” arXiv:1803.01271, 2018.
[8] J. Long, E. Shelhamer, and T. Darrell, “Fully convolutional networks for semantic segmentation,” In Conference on Computer Vision and Pattern Recognition, 2015.
[9] Olaf Ronneberger, Philipp Fischer, and Thomas Brox. “U-net: Convolutional networks for biomedical image segmentation.” In International Conference on Medical Image Computing and Computer-Assisted Intervention, pp. 234–241, 2015.
[10] F.-R. Stöter, A. Liutkus, and N. Ito, “The 2018 signal separation evaluation campaign,” In Proc. International Conference on Latent Variable Analysis and Signal Separation, 2018.
[11] Jen-Yu Liu and Yi-Hsuan Yang. “Denoising auto-encoder with recurrent skip connections and residual regression for music source separation.” In Proc. IEEE Int. Conf. Machine Learning and Applications, pp. 773–778, 2018.
[12] Naoya Takahashi and Yuki Mitsufuji, “Multi-scale multi-band DenseNets for audio source separation,” In Proc. IEEE Workshop on Applications of Signal Processing to Audio and Acoustics, pp. 21–25, 2017.
[13] Naoya Takahashi, Nabarun Goswami, and Yuki Mitsufuji. “MMDenseLSTM: An efficient combination of convolutional and recurrent neural networks for audio source separation.” Proc. International Workshop on Acoustic Signal Enhancement, pp. 106–110, 2018.
[14] D. Wang and J. Lim, “The unimportance of phase in speech enhancement,” IEEE Transactions on Acoustics, Speech, and Signal Processing, pp. 679–681, 1982.
[15] M. Kazama et al., “On the significance of phase in the short term Fourier spectrum for speech intelligibility,” The Journal of the Acoustical Society of America, pp. 1432–1439, 2010.
[16] T. Gerkmann, M. Krawczyk-Becker, and J. Le Roux, “Phase processing for single-channel speech enhancement: History and recent advances,” IEEE Signal Processing Magazine, pp. 55–66, 2015.
[17] S.-H. Moon, B. Kim, and I.-S. Lee, “Importance of phase information in speech enhancement,” In Complex, Intelligent and Software Intensive Systems, 2010.
[18] K. Paliwal, K. Wójcicki, and B. Shannon, “The importance of phase in speech enhancement,” Speech Communication, pp. 465–494, 2011.
[19] Y. Tan, J. Wang, and J. M. Zurada, “Nonlinear blind source separation using a radial basis function network,” IEEE Transactions on Neural Networks, vol. 12, pp. 134–144, 2001.
[20] S. Pascual, A. Bonafonte, and J. Serrà, “SEGAN: Speech enhancement generative adversarial network,” In INTERSPEECH, 2017.
[21] Augustus Odena, Vincent Dumoulin, and Chris Olah. Deconvolution and checkerboard artifacts. Distill, 2016.
[22] V. Badrinarayanan, A. Kendall, and R. Cipolla, “SegNet: A deep convolutional encoder-decoder architecture for image segmentation,” IEEE Transactions on Pattern Analysis and Machine Intelligence, 2017.
[23] K. Grauman and T. Darrell, “The pyramid match kernel: Discriminative classification with sets of image features,” In IEEE International Conference on Computer Vision, 2005.
[24] S. Lazebnik, C. Schmid, and J. Ponce, “Beyond bags of features: Spatial pyramid matching for recognizing natural scene categories,” In Conference on Computer Vision and Pattern Recognition, 2006.
[25] K. He, X. Zhang, S. Ren, and J. Sun, “Spatial pyramid pooling in deep convolutional networks for visual recognition,” In The European Conference on Computer Vision, 2014.
[26] H. Zhao, J. Shi, X. Qi, X. Wang, and J. Jia, “Pyramid scene parsing network,” In Conference on Computer Vision and Pattern Recognition, 2017.
[27] L.-C. Chen, G. Papandreou, F. Schroff, and H. Adam, “Rethinking atrous convolution for semantic image segmentation,” arXiv preprint arXiv:1706.05587, 2017.
[28] L.-C. Chen, G. Papandreou, I. Kokkinos, K. Murphy, and A. L. Yuille, “DeepLab: Semantic image segmentation with deep convolutional nets, atrous convolution, and fully connected CRFs,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 40, pp. 834–848, 2018.
[29] L.-C. Chen, Y. Zhu, G. Papandreou, F. Schroff, and H. Adam. “Encoder-decoder with atrous separable convolution for semantic image segmentation.” In The European Conference on Computer Vision, 2018.
[30] L. Sifre, “Rigid-motion scattering for image classification,” Ph.D. thesis, École Polytechnique, 2014.
[31] V. Vanhoucke, “Learning visual representations at scale” (invited talk), In The International Conference on Learning Representations, 2014.
[32] A. G. Howard et al., “MobileNets: Efficient convolutional neural networks for mobile vision applications,” arXiv preprint arXiv:1704.04861, 2017.
[33] Vincent Dumoulin and Francesco Visin. “A guide to convolution arithmetic for deep learning.” arXiv preprint arXiv:1603.07285, 2016.
[34] A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, et al., “Attention is all you need,” arXiv preprint arXiv:1706.03762, 2017.
[35] J. Hu, L. Shen, and G. Sun. “Squeeze-and-excitation networks.” arXiv preprint arXiv:1709.01507, 2017.
[36] Sanghyun Woo, Jongchan Park, Joon-Young Lee, and In So Kweon. “CBAM: convolutional block attention module.” In The European Conference on Computer Vision, 2018.
[37] Felix Wu, Angela Fan, Alexei Baevski, Yann N. Dauphin, and Michael Auli, “Pay less attention with lightweight and dynamic convolutions,” In Proc. of The International Conference on Learning Representations, 2019.
[38] F. Chollet, “Xception: Deep learning with depthwise separable convolutions,” In Conference on Computer Vision and Pattern Recognition, 2017.
[39] M. Sandler, A. Howard, M. Zhu, A. Zhmoginov, and L.-C. Chen, “MobileNetV2: Inverted residuals and linear bottlenecks,” In Conference on Computer Vision and Pattern Recognition, 2018.
[40] W. Shi, J. Caballero, F. Huszár, J. Totz, A. P. Aitken, R. Bishop, D. Rueckert, and Z. Wang, “Real-time single image and video super-resolution using an efficient sub-pixel convolutional neural network,” In Conference on Computer Vision and Pattern Recognition, 2016.
[41] M. S. Sajjadi, R. Vemulapalli, and M. Brown, “Frame-recurrent video super-resolution,” In Conference on Computer Vision and Pattern Recognition, 2018.
[42] T.-J. Yang, M. D. Collins, Y. Zhu, J.-J. Hwang, T. Liu, X. Zhang, V. Sze, G. Papandreou, and L.-C. Chen, “DeeperLab: Single-shot image parser,” arXiv preprint arXiv:1902.05093, 2019.
[43] Andrew L. Maas, Awni Y. Hannun, and Andrew Y. Ng, “Rectifier nonlinearities improve neural network acoustic models,” In International Conference on Machine Learning, vol. 30, 2013.
[44] B. Xu, N. Wang, T. Chen, and M. Li, “Empirical evaluation of rectified activations in convolutional network,” arXiv preprint arXiv:1505.00853, 2015.
[45] Antoine Liutkus, Derry Fitzgerald, and Zafar Rafii, “Scalable audio separation with light kernel additive modelling,” In IEEE International Conference on Acoustics, Speech and Signal Processing, pp. 76–80, 2015.
[46] BSSEval v4 evaluation tools: https://github.com/sigsep/sigsep-mus-eval , accessed 2019/6/30.
[47] Raspberry Pi 3 Model B: https://www.raspberrypi.org/products/raspberry-pi-3-model-b , accessed 2019/6/30.
[48] E. Vincent, R. Gribonval, and C. Fevotte. “Performance measurement in blind audio source separation.” In IEEE Transactions on Audio, Speech, and Language Processing, pp. 1462–1469, 2006.
[49] TensorFlow 1.13.1: https://github.com/tensorflow/tensorflow/releases/tag/v1.13.1 , accessed 2019/6/30.
[50] Diederik P. Kingma and Jimmy Ba, “Adam: A method for stochastic optimization,” In Proceedings of the International Conference on Learning Representations, 2015.
[51] Alice Cohen-Hadria, Axel Roebel, and Geoffroy Peeters. “Improving Singing Voice Separation Using Deep U-Net and Wave-U-Net with Data Augmentation.” submitted to the IEEE International Conference on Acoustics, Speech, and Signal Processing, 2019.
[52] C. Liu, L.-C. Chen, F. Schroff, H. Adam, W. Hua, A. Yuille, and L. Fei-Fei, “Auto-DeepLab: Hierarchical neural architecture search for semantic image segmentation,” arXiv preprint arXiv:1901.02985, 2019.
[53] Wave-U-Net: https://github.com/f90/Wave-U-Net , accessed 2019/6/30.
[54] C. Dong, C. C. Loy, K. He, and X. Tang, “Image super-resolution using deep convolutional networks,” IEEE Transactions on Pattern Analysis and Machine Intelligence, 2015.
[55] J. Le Roux, S. Wisdom, H. Erdogan, and J. R. Hershey, “SDR – half-baked or well done?” In IEEE International Conference on Acoustics, Speech and Signal Processing, 2019.