|Title: ||單聲源分離與非負矩陣分解和深度學習;Monaural source separation with non-negative matrix factorization and deep learning|
|Authors: ||范俊;Tuan, Pham|
|Keywords: ||深度學習;源分離;非負矩陣分解;Deep learning;Source separation;Non-negative matrix factorization|
|Issue Date: ||2018-08-31 14:47:19 (UTC+8)|
|Abstract: ||Single channel source separation (SCSS) aims to accurately separate specific signals from a mixture, such as extracting vocals from an accompaniment or separating male from female speech. The problem is hard when only one microphone is available, since the training data is then usually limited. This dissertation proposes novel approaches that improve on previous methods to achieve better performance on single channel source separation. To solve the SCSS problem, a supervised method was adopted, built on a pre-trained model and prior features. The method proposed in this thesis combines non-negative matrix factorization (NMF), deep recurrent neural networks (DRNN), and manifold regularization.
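The thesis itself contains no code listings; as a minimal illustrative sketch, the NMF building block it relies on can be written with the standard multiplicative updates for the β-divergence (Févotte-style updates; β=2 is Euclidean, β=1 is KL, β=0 is Itakura-Saito). This is a generic textbook variant, not necessarily the exact "state-of-the-art" variant the thesis evaluates:

```python
import numpy as np

def nmf_beta(V, rank, beta=1.0, iters=200, seed=0):
    """Multiplicative-update NMF minimizing the beta-divergence D_beta(V || WH)."""
    rng = np.random.default_rng(seed)
    # Strictly positive random init keeps the multiplicative updates well defined.
    W = rng.random((V.shape[0], rank)) + 1e-6
    H = rng.random((rank, V.shape[1])) + 1e-6
    for _ in range(iters):
        WH = W @ H
        # Update activations H, then bases W, each with the standard
        # beta-divergence multiplicative rule (nonnegativity is preserved).
        H *= (W.T @ (WH ** (beta - 2) * V)) / (W.T @ WH ** (beta - 1))
        WH = W @ H
        W *= ((WH ** (beta - 2) * V) @ H.T) / (WH ** (beta - 1) @ H.T)
    return W, H
```

In the supervised SCSS setting described above, such a factorization would be run on the magnitude spectrogram of each clean training source, and the learned bases `W` kept as the prior patterns.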
Deep neural networks have gained popularity in recent years and have numerous applications in different fields such as object recognition, image classification, sound recognition, image generation, and especially monaural source separation. However, deep neural network (DNN) based source separation ignores the temporal continuity of the vocal signal and takes no account of the geometrical structure of the input data, because deep neural networks treat the input as a sequence of independent frames. To deal with these issues, this thesis proposes a novel approach to source separation based on a DRNN, which combines a DNN with one layer of recurrent neural networks (RNN). In addition, prior information learned by NMF is attached to the DRNN to force the output signal to be more similar to the prior information, leading to a more concentrated solution. This approach ensures that the solution converges, and the prior information enhances the training of the DRNN. Manifold regularization exploits the intrinsic geometry of the input data and keeps it intact; the manifold characteristic is computed from the clean data of each source.
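The manifold regularization mentioned above is commonly implemented as a graph-Laplacian penalty: build an affinity graph over input frames and penalize features that separate neighboring frames. A minimal sketch under that standard formulation (the kernel choice and `sigma` here are illustrative assumptions, not taken from the thesis):

```python
import numpy as np

def laplacian(X, sigma=1.0):
    """Unnormalized graph Laplacian L = D - W from a Gaussian affinity over rows of X."""
    # Pairwise squared distances between frames (rows of X).
    d2 = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    W = np.exp(-d2 / (2.0 * sigma ** 2))
    np.fill_diagonal(W, 0.0)  # no self-edges
    return np.diag(W.sum(axis=1)) - W

def manifold_penalty(F, L):
    """tr(F^T L F) = 0.5 * sum_ij W_ij ||f_i - f_j||^2: small when frames
    that are close in the input stay close in the learned features F."""
    return np.trace(F.T @ L @ F)
```

This scalar would be added to the separation network's loss with a weighting coefficient, so gradient steps on the features also respect the input geometry.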
There are four contributions in this thesis. First, state-of-the-art variants of NMF with the β-divergence, which are more efficient than conventional ones, were used to learn patterns from clean sources. We incorporate those learned patterns into the output of the DRNN and treat the prior information as the last layer of the DRNN output; the weights and biases of the connection between the DRNN output and this last layer are kept fixed during DRNN training. Because the dimension of these features is quite large, we benefit when the features of the DRNN and of NMF differ. Second, manifold regularization is developed to take account of the inner structure of the input data during DRNN training; it helps make the DRNN features more discriminative and avoids overlapping features. Third, two types of frequency masking, the soft mask and the binary mask, were examined to measure their performance in SCSS. Fourth, a new objective function was proposed that combines the DRNN, manifold regularization, and the learned patterns. Experimental results on the MIR-1K dataset show that the proposed algorithm outperforms the baselines in terms of signal-to-distortion ratio, signal-to-interference ratio, and signal-to-noise ratio.|
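The two frequency masks compared in the third contribution have standard definitions, which can be sketched as follows (the function names are illustrative; `S1` and `S2` stand for the two sources' estimated magnitude spectrograms):

```python
import numpy as np

def soft_mask(S1, S2, eps=1e-8):
    # Wiener-style ratio mask: each time-frequency bin is weighted by the
    # relative magnitude of source 1 versus the sum of both estimates.
    return np.abs(S1) / (np.abs(S1) + np.abs(S2) + eps)

def binary_mask(S1, S2):
    # Hard assignment: each bin belongs entirely to the dominant source.
    return (np.abs(S1) >= np.abs(S2)).astype(float)

# Multiplying a mask with the mixture spectrogram yields one source estimate,
# e.g. X_hat1 = soft_mask(S1, S2) * X_mix.
```

The soft mask preserves energy shared between sources at the cost of some interference, while the binary mask suppresses interference more aggressively but can introduce artifacts; the abstract reports both were evaluated on MIR-1K.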
|Appears in Collections:||[Graduate Institute of Computer Science and Information Engineering] Doctoral and Master's Theses|
All items in NCUIR are protected by copyright, with all rights reserved.