Abstract (English) |
Our daily environment is full of all kinds of sounds: those that are meaningful to us and need to be captured are signals, while those that are unneeded or cause interference are noise. Wind is present everywhere in nature, and it is an unavoidable source of interference when recording outdoors.
In this paper, we propose a method that exploits the characteristics of a speech separation model and combines two different masks to enhance the signal. We adopt a recurrent neural network architecture trained on spectral features; recurrent networks handle time-varying functions well and perform better on continuous audio signals. Wind noise is non-stationary and non-periodic, which makes it difficult to handle. Within the recurrent network, we use a Bidirectional Gated Recurrent Unit (BGRU) to train the masks: from the mixed signal we train one mask for speech and one for wind noise, adjusting their weight ratios to estimate the Ideal Ratio Mask (IRM), and then use the weights of the two masks to build a loss function suited to dual masks. Unlike general noise reduction methods, this method separates the signal, preserving the necessary part while reducing the noise interference, and additionally uses the noise mask in reverse to help strengthen the speech and remove the noise. |
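To make the dual-mask idea concrete, the following is a minimal PyTorch sketch of a BGRU estimating a speech mask and a wind mask from magnitude spectrograms, trained against IRM targets with a weighted loss. The class name DualMaskBGRU, the layer sizes, and the loss weights alpha and beta are illustrative assumptions rather than details taken from this thesis; the IRM target follows the standard definition IRM(t,f) = sqrt(|S(t,f)|^2 / (|S(t,f)|^2 + |N(t,f)|^2)).

# Minimal sketch (assumption, not the thesis code): a BGRU that estimates two
# masks (speech and wind) from magnitude spectrograms, trained against IRM
# targets with a weighted dual-mask loss.
import torch
import torch.nn as nn
import torch.nn.functional as F

class DualMaskBGRU(nn.Module):          # hypothetical class name
    def __init__(self, n_freq=257, hidden=256):
        super().__init__()
        # Bidirectional GRU over the time axis of the spectrogram.
        self.bgru = nn.GRU(n_freq, hidden, num_layers=2,
                           batch_first=True, bidirectional=True)
        # Two output heads: one mask for speech, one for wind noise.
        self.speech_head = nn.Linear(2 * hidden, n_freq)
        self.wind_head = nn.Linear(2 * hidden, n_freq)

    def forward(self, mag):             # mag: (batch, time, n_freq)
        h, _ = self.bgru(mag)
        return torch.sigmoid(self.speech_head(h)), torch.sigmoid(self.wind_head(h))

def ideal_ratio_mask(target_mag, other_mag, eps=1e-8):
    # Standard IRM definition: sqrt(|S|^2 / (|S|^2 + |N|^2)).
    return torch.sqrt(target_mag ** 2 / (target_mag ** 2 + other_mag ** 2 + eps))

def dual_mask_loss(m_speech, m_wind, irm_speech, irm_wind, alpha=0.7, beta=0.3):
    # Weighted sum of the two mask errors; alpha and beta are illustrative weights.
    return alpha * F.mse_loss(m_speech, irm_speech) + beta * F.mse_loss(m_wind, irm_wind)

# Example usage with random tensors standing in for STFT magnitudes.
model = DualMaskBGRU()
speech, wind = torch.rand(4, 100, 257), torch.rand(4, 100, 257)
mix = speech + wind
m_s, m_w = model(mix)
loss = dual_mask_loss(m_s, m_w,
                      ideal_ratio_mask(speech, wind),
                      ideal_ratio_mask(wind, speech))
loss.backward()

Estimating a separate wind mask lets the loss also penalize errors on the noise component, which is how the noise mask can be used in reverse to assist enhancement as described above.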