Abstract (English) |
Our daily environment is full of all kinds of sounds: those that are meaningful to us and need to be captured are signals, while those that are unneeded or cause interference are noise. Wind is present everywhere in nature, and it is an unavoidable source of interference when recording outdoors.
In this paper, we propose a method that exploits the characteristics of a speech separation model and combines two different masks to enhance the signal. We adopt a recurrent neural network architecture trained on spectral features; recurrent networks handle time-varying functions well and perform better on continuous audio signals. Wind noise is non-stationary and non-periodic, which makes it difficult to handle. Within the recurrent network, we use a Bidirectional Gated Recurrent Unit (BGRU) to train the masks: from the mixed signal we train one mask for speech and one for wind noise, adjusting their weight ratios to estimate the Ideal Ratio Mask (IRM), and then use the weights of the two masks to build a loss function suited to dual masks. Unlike general noise reduction methods, this method separates the signal, preserving the necessary part while reducing the noise interference, and additionally uses the noise mask in reverse to help strengthen the speech and remove the noise. |
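To make the dual-mask idea concrete, the following is a minimal PyTorch sketch of a BGRU estimating a speech mask and a wind mask from magnitude spectrograms, trained against IRM targets with a weighted loss. The class name DualMaskBGRU, the layer sizes, and the loss weights alpha and beta are illustrative assumptions rather than details taken from this thesis; the IRM target follows the standard definition IRM(t,f) = sqrt(|S(t,f)|^2 / (|S(t,f)|^2 + |N(t,f)|^2)).

# Minimal sketch (assumption, not the thesis code): a BGRU that estimates two
# masks (speech and wind) from magnitude spectrograms, trained against IRM
# targets with a weighted dual-mask loss.
import torch
import torch.nn as nn
import torch.nn.functional as F

class DualMaskBGRU(nn.Module):          # hypothetical class name
    def __init__(self, n_freq=257, hidden=256):
        super().__init__()
        # Bidirectional GRU over the time axis of the spectrogram.
        self.bgru = nn.GRU(n_freq, hidden, num_layers=2,
                           batch_first=True, bidirectional=True)
        # Two output heads: one mask for speech, one for wind noise.
        self.speech_head = nn.Linear(2 * hidden, n_freq)
        self.wind_head = nn.Linear(2 * hidden, n_freq)

    def forward(self, mag):             # mag: (batch, time, n_freq)
        h, _ = self.bgru(mag)
        return torch.sigmoid(self.speech_head(h)), torch.sigmoid(self.wind_head(h))

def ideal_ratio_mask(target_mag, other_mag, eps=1e-8):
    # Standard IRM definition: sqrt(|S|^2 / (|S|^2 + |N|^2)).
    return torch.sqrt(target_mag ** 2 / (target_mag ** 2 + other_mag ** 2 + eps))

def dual_mask_loss(m_speech, m_wind, irm_speech, irm_wind, alpha=0.7, beta=0.3):
    # Weighted sum of the two mask errors; alpha and beta are illustrative weights.
    return alpha * F.mse_loss(m_speech, irm_speech) + beta * F.mse_loss(m_wind, irm_wind)

# Example usage with random tensors standing in for STFT magnitudes.
model = DualMaskBGRU()
speech, wind = torch.rand(4, 100, 257), torch.rand(4, 100, 257)
mix = speech + wind
m_s, m_w = model(mix)
loss = dual_mask_loss(m_s, m_w,
                      ideal_ratio_mask(speech, wind),
                      ideal_ratio_mask(wind, speech))
loss.backward()

Estimating a separate wind mask lets the loss also penalize errors on the noise component, which is how the noise mask can be used in reverse to assist enhancement as described above.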