References
[1] A. Kumar and D. Florencio, “Speech enhancement in multiple-noise conditions using deep neural networks,” in Proc. of the Int. Speech Communication Association Conf. (INTERSPEECH), 2016.
[2] L. Sun, J. Du, L.-R. Dai, and C.-H. Lee, “Multiple-target deep learning for LSTM-RNN based speech enhancement,” in Proc. HSCMA, 2017.
[3] T. Gao, J. Du, L.-R. Dai, and C.-H. Lee, “Densely connected progressive learning for LSTM-based speech enhancement,” in Proc. ICASSP, 2018.
[4] T. Kounovsky and J. Malek, “Single channel speech enhancement using convolutional neural network,” in 2017 IEEE International Workshop of Electronics, Control, Measurement, Signals and their Application to Mechatronics (ECMSM), IEEE, 2017.
[5] S. R. Park and J. Lee, “A fully convolutional neural network for speech enhancement,” arXiv preprint arXiv:1609.07132, 2016.
[6] S. Pascual, A. Bonafonte, and J. Serrà, “SEGAN: Speech enhancement generative adversarial network,” arXiv preprint arXiv:1703.09452, 2017.
[7] Y. Luo and N. Mesgarani, “TasNet: Time-domain audio separation network for real-time, single-channel speech separation,” in 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), IEEE, 2018, pp. 696–700.
[8] Y. Luo, Z. Chen, and T. Yoshioka, “Dual-path RNN: Efficient long sequence modeling for time-domain single-channel speech separation,” arXiv preprint arXiv:1910.06379, 2019.
[9] Y. Luo and N. Mesgarani, “Conv-TasNet: Surpassing ideal time–frequency magnitude masking for speech separation,” IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 27, no. 8, pp. 1256–1266, 2019.
[10] C. K. Reddy, V. Gopal, R. Cutler, E. Beyrami, R. Cheng, H. Dubey, S. Matusevych, R. Aichner, A. Aazami, S. Braun et al., “The INTERSPEECH 2020 deep noise suppression challenge: Datasets, subjective testing framework, and challenge results,” arXiv preprint arXiv:2005.13981, 2020.
[11] N. L. Westhausen and B. T. Meyer, “Dual-signal transformation LSTM network for real-time noise suppression,” arXiv preprint arXiv:2005.07551, 2020.
[12] J. S. Lim and A. V. Oppenheim, “Enhancement and bandwidth compression of noisy speech,” Proceedings of the IEEE, vol. 67, no. 12, pp. 1586–1604, 1979.
[13] Y. Ephraim and D. Malah, “Speech enhancement using a minimum mean-square error short-time spectral amplitude estimator,” IEEE Transactions on Acoustics, Speech, and Signal Processing, vol. 32, no. 6, pp. 1109–1121, 1984.
[14] N. Krishnamurthy and J. H. Hansen, “Babble noise: Modeling, analysis, and applications,” IEEE Transactions on Audio, Speech, and Language Processing, vol. 17, no. 7, pp. 1394–1407, 2009.
[15] Y. Xu, J. Du, L.-R. Dai, and C.-H. Lee, “An experimental study on speech enhancement based on deep neural networks,” IEEE Signal Processing Letters, vol. 21, no. 1, pp. 65–68, 2013.
[16] K. Han, Y. Wang, and D. Wang, “Learning spectral mapping for speech dereverberation,” in Proc. ICASSP, IEEE, 2014, pp. 4628–4632.
[17] S. R. Park and J. Lee, “A fully convolutional neural network for speech enhancement,” arXiv preprint arXiv:1609.07132, 2016.
[18] H. Phan et al., “Improving GANs for speech enhancement,” arXiv preprint arXiv:2001.05532, 2020.
[19] S. Pascual, A. Bonafonte, and J. Serrà, “SEGAN: Speech enhancement generative adversarial network,” arXiv preprint arXiv:1703.09452, 2017.
[20] E. Nachmani, Y. Adi, and L. Wolf, “Voice separation with an unknown number of multiple speakers,” arXiv preprint arXiv:2003.01531, 2020.
[21] A. Défossez et al., “Music source separation in the waveform domain,” arXiv preprint arXiv:1911.13254, 2019.
[22] N. Virag, “Single channel speech enhancement based on masking properties of the human auditory system,” IEEE Transactions on Speech and Audio Processing, vol. 7, no. 2, pp. 126–137, 1999.
[23] T. Lotter and P. Vary, “Dual-channel speech enhancement by superdirective beamforming,” EURASIP Journal on Advances in Signal Processing, 2006.
[24] J. Meyer and K. U. Simmer, “Multi-channel speech enhancement in a car environment using Wiener filtering and spectral subtraction,” in 1997 IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 1997.
[25] L. Griffiths and C. Jim, “An alternative approach to linearly constrained adaptive beamforming,” IEEE Transactions on Antennas and Propagation, vol. 30, no. 1, pp. 27–34, 1982.
[26] C. Valentini-Botinhao, X. Wang, S. Takaki, and J. Yamagishi, “Investigating RNN-based speech enhancement methods for noise-robust text-to-speech,” in Proc. SSW, 2016, pp. 146–152.
[27] J.-M. Valin, “A hybrid DSP/deep learning approach to real-time full-band speech enhancement,” in 2018 IEEE 20th International Workshop on Multimedia Signal Processing (MMSP), IEEE, 2018, pp. 1–5.
[28] Y. Isik, J. L. Roux, Z. Chen, S. Watanabe, and J. R. Hershey, “Single-channel multi-speaker separation using deep clustering,” arXiv preprint arXiv:1607.02173, 2016.
[29] M. Kolbæk, D. Yu, Z.-H. Tan, and J. Jensen, “Multitalker speech separation with utterance-level permutation invariant training of deep recurrent neural networks,” IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 25, no. 10, pp. 1901–1913, 2017.
[30] S. Hochreiter and J. Schmidhuber, “Long short-term memory,” Neural Computation, vol. 9, no. 8, pp. 1735–1780, 1997.
[31] M. Maciejewski, G. Wichern, E. McQuinn, and J. Le Roux, “WHAMR!: Noisy and reverberant single-channel speech separation,” in ICASSP 2020 – 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), IEEE, 2020, pp. 696–700.
[32] W. Verhelst, “Overlap-add methods for time-scaling of speech,” Speech Communication, 2000.
[33] G. Hinton, O. Vinyals, and J. Dean, “Distilling the knowledge in a neural network,” arXiv preprint arXiv:1503.02531, 2015.
[34] V. Panayotov, G. Chen, D. Povey, and S. Khudanpur, “Librispeech: An ASR corpus based on public domain audio books,” in 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2015.
[35] ITU-T Recommendation P.808, “Subjective evaluation of speech quality with a crowdsourcing approach,” 2018.
[36] J. F. Gemmeke et al., “Audio Set: An ontology and human-labeled dataset for audio events,” in 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2017.
[37] A. Romero, N. Ballas, S. E. Kahou, A. Chassang, C. Gatta, and Y. Bengio, “FitNets: Hints for thin deep nets,” in Proc. ICLR, 2015.
[38] G. Pirker, M. Wohlmayr, S. Petrik, and F. Pernkopf, “A pitch tracking corpus with evaluation on multipitch tracking scenario,” in Proc. INTERSPEECH, 2011.
[39] A. W. Rix, J. G. Beerends, M. P. Hollier, and A. P. Hekstra, “Perceptual evaluation of speech quality (PESQ) – a new method for speech quality assessment of telephone networks and codecs,” in IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2001.
[40] C. H. Taal, R. C. Hendriks, R. Heusdens, and J. Jensen, “An algorithm for intelligibility prediction of time–frequency weighted noisy speech,” IEEE Transactions on Audio, Speech, and Language Processing, 2011.