dc.description.abstract | Single-channel source separation (SCSS) aims to accurately separate specific signals from a mixture, for example extracting vocals from accompaniment or separating male and female voices. The problem is hard because only one microphone is available and the training data is usually limited. This dissertation proposes novel approaches that improve on previous methods to achieve better performance in single-channel source separation. To solve the SCSS problem, a supervised method is used, relying on a model trained beforehand and on prior features. The method proposed in this thesis combines non-negative matrix factorization (NMF), deep recurrent neural networks (DRNN), and manifold regularization.
Deep neural networks have gained popularity in recent years, with numerous applications in fields such as object recognition, image classification, sound recognition, image generation and, in particular, monaural source separation. However, source separation based on deep neural networks (DNN) ignores the temporal continuity of the vocal signal and does not consider the geometrical structure of the input data, because a DNN treats the input data as a sequence of independent items. To deal with these issues, this thesis proposes a novel approach to source separation based on a DRNN, which combines a DNN with one layer of recurrent neural network (RNN). In addition, prior information learned by NMF is attached to the DRNN to force the output signal to be more similar to that prior information, which leads to a more concentrated solution. This approach ensures that the solution always converges, and the prior information can enhance the training of the DRNN to some extent. Manifold regularization exploits the intrinsic geometry of the input data and keeps it intact. The manifold characteristics are derived from the clean data of each source.
There are four contributions in this thesis. First, state-of-the-art variants of NMF with β-divergence, which are more efficient than conventional ones, are used to learn patterns from clean sources. We incorporate those learned patterns into the output of the DRNN and treat the prior information as the last layer of the DRNN output. The weights and biases of the connection between the DRNN output and this last layer must be kept fixed during DRNN training, because the dimension of these features is quite large and the method benefits when the DRNN and NMF features differ. Second, manifold regularization is developed to take the inner structure of the input data into account during DRNN training. Manifold regularization helps make the DRNN features more discriminative and avoids overlapping features. Third, two types of frequency masking, the soft mask and the binary mask, are examined to measure their performance in SCSS. Fourth, a new objective function is proposed that combines the DRNN, manifold regularization, and the learned patterns. Experimental results on the MIR-1K dataset show that the proposed algorithm yields higher performance than the baselines in terms of signal-to-distortion ratio, signal-to-interference ratio, and signal-to-noise ratio. | en_US |