Monaural source separation with non-negative matrix factorization and deep learning

Please use this permanent URL to cite or link to this item: http://ir.lib.ncu.edu.tw/handle/987654321/77531

Title:	Monaural source separation with non-negative matrix factorization and deep learning
Author:	范俊;Tuan, Pham
Contributor:	Department of Computer Science and Information Engineering
Keywords:	Deep learning;Source separation;Non-negative matrix factorization
Date:	2018-07-16
Upload time:	2018-08-31 14:47:19 (UTC+8)
Publisher:	National Central University
Abstract:	Single-channel source separation (SCSS) aims to accurately separate specific signals from a mixture, for example extracting vocals from accompaniment or separating male and female speakers. The problem is hard when only one microphone is available and training data is limited. This dissertation proposes novel approaches that improve on previous methods to achieve better performance in single-channel source separation. To solve the SCSS problem, a supervised method is used, relying on pre-trained models and prior features. The proposed method combines non-negative matrix factorization (NMF), deep recurrent neural networks (DRNN), and manifold regularization. Deep neural networks have gained popularity in recent years, with numerous applications in fields such as object recognition, image classification, sound recognition, image generation, and especially monaural source separation.
However, deep neural network (DNN) based source separation ignores the temporal continuity of vocal signals and does not consider the geometrical structure of the input data, because deep neural networks treat the input as a sequence of independent observations. To address these issues, this thesis proposes a novel source separation approach based on a DRNN, which combines a DNN with one recurrent neural network (RNN) layer. In addition, prior information learned by NMF is attached to the DRNN to force the output signal to be more similar to the prior information, which leads to a more concentrated solution. This approach ensures that the solution always converges, and the prior information can to some extent enhance the training of the DRNN. Manifold regularization exploits the intrinsic geometry of the input data and keeps it intact; the manifold characteristics are derived from the clean data of each source. This thesis makes four contributions. First, state-of-the-art variants of NMF with β-divergence, which are more efficient than conventional ones, are used to learn patterns from clean sources. These learned patterns are incorporated into the output of the DRNN, with the prior information treated as the last layer of the DRNN output. The weights and biases of the connection between the DRNN output and this last layer need to be fixed during DRNN training, because the dimension of these features is quite large and the approach benefits when the DRNN and NMF features differ. Second, manifold regularization is developed to take the inner structure of the input data into account during DRNN training; it helps make the DRNN features more discriminative and avoids overlapping features. Third, two types of frequency masking, soft mask and binary mask, are examined to measure their performance in SCSS. Fourth, a new objective function is proposed for the DRNN, manifold regularization, and the learned patterns.
Experimental results on the MIR-1K dataset show that the proposed algorithm outperforms the baselines in terms of signal-to-distortion ratio, signal-to-interference ratio, and signal-to-noise ratio.
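The abstract compares two kinds of frequency masking, a soft mask and a binary mask. As a rough illustration only, the sketch below shows the commonly used definitions of both masks applied to magnitude spectrograms. The function names (soft_mask, binary_mask, apply_mask), the eps constant, and the toy values are assumptions made for this example and are not taken from the thesis, whose exact formulation may differ.

```python
import numpy as np

def soft_mask(est_a, est_b, eps=1e-8):
    """Ratio mask for source A: |A| / (|A| + |B|), element-wise.

    eps is a small constant (an assumption of this sketch) that avoids
    division by zero when both estimates are silent.
    """
    est_a = np.abs(est_a)
    est_b = np.abs(est_b)
    return est_a / (est_a + est_b + eps)

def binary_mask(est_a, est_b):
    """1.0 where source A is estimated to be strictly louder than B, else 0.0."""
    return (np.abs(est_a) > np.abs(est_b)).astype(float)

def apply_mask(mask, mixture_mag):
    """Element-wise masking of the mixture magnitude spectrogram."""
    return mask * mixture_mag

# Toy magnitude spectrograms (frequency bins x frames); values are made up.
vocal_est = np.array([[3.0, 1.0], [0.0, 2.0]])
accomp_est = np.array([[1.0, 1.0], [4.0, 2.0]])
mixture = vocal_est + accomp_est

soft = soft_mask(vocal_est, accomp_est)
hard = binary_mask(vocal_est, accomp_est)
vocal_soft = apply_mask(soft, mixture)
vocal_hard = apply_mask(hard, mixture)
```

In this sketch the soft mask splits each time-frequency bin between the two sources in proportion to their estimated magnitudes, while the binary mask assigns each bin entirely to whichever source is estimated to dominate.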
Appears in collections:	[Graduate Institute of Computer Science and Information Engineering] Theses and Dissertations

Files in this item:

File	Description	Size	Format	Views
index.html		0Kb	HTML	133

All items in NCUIR are protected by original copyright.