Acoustic Echo Cancellation Based on Recurrent Neural Network

Name

蔡曜丞 (Yao-Cheng Tsai)

Department

Department of Communication Engineering

Thesis Title

Acoustic Echo Cancellation Based on Recurrent Neural Network

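Related Theses

★ Region-Weight-Based Super-Resolution for Satellite Images
★ An Adaptive High-Dynamic-Range Image Fusion Algorithm Extending the Linear Characteristics of Exposure Curves
★ H.264 Video Encoding Complexity Control Implemented on a RISC Architecture
★ Articulation Disorder Assessment Based on Convolutional Recurrent Neural Networks
★ Few-Shot Image Segmentation with Masks Generated by a Meta-Learned Classification Weight Transfer Network
★ Implicit Representation with an Attention Mechanism for Reconstructing 3D Human Models from Images
★ Object Detection Using Adversarial Graph Neural Networks
★ 3D Face Reconstruction Based on a Weakly Supervised Deformable Model
★ A Low-Latency Singing Voice Conversion Architecture for Edge Computing Devices Using Unsupervised Disentangled Representation Learning
★ Human Pose Estimation with FMCW Radar Based on a Sequence-to-Sequence Model
★ Monocular Camera Semantic Scene Completion Based on a Multi-Level Attention Mechanism
★ Non-Contact Real-Time Vital Sign Monitoring with a Single FMCW Radar Based on Temporal Convolutional Networks
★ Video Traffic Characterization and Management over Video-on-Demand Networks
★ High-Quality Voice Conversion Based on Linear Predictive Coding and Frame-Synchronous Pitch Periods
★ Tone Adjustment Based on Formant Variations Extracted through Speech Resampling
★ Transmission Efficiency Optimization of Real-Time Fine-Granularity Scalable Video over Wireless LANs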


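The author has authorized immediate open access to this electronic thesis.
The open-access full text is licensed only for personal, non-profit searching, reading, and printing for academic research.
Please comply with the Copyright Act of the Republic of China (Taiwan). Do not reproduce, distribute, adapt, repost, or broadcast the work without permission.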

Abstract

Acoustic echo cancellation (AEC) remains a common problem in speech and signal processing. It arises in applications such as teleconferencing, hands-free telephony, and mobile communications. AEC has traditionally been handled with adaptive filters. Deep learning now offers a way to address the more complex aspects of the problem.

This thesis treats acoustic echo cancellation as a speech separation problem, rather than using a traditional adaptive filter to estimate the acoustic echo. The model is trained with a recurrent neural network (RNN) architecture. RNNs are good at modeling time-varying functions, which makes them well suited to AEC.

We train two RNNs:
★ a bidirectional long short-term memory (LSTM) network;
★ a bidirectional gated recurrent unit (GRU) network.

Features are extracted from single-talk and double-talk speech. Weights are adjusted to control the relative magnitudes of these features, that is, the ratio between double-talk and single-talk speech. The networks use these features to estimate the ideal ratio mask (IRM). The mask separates the signals and thereby removes the echo. Experimental results show that the method removes echo effectively.
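For contrast with the proposed approach, the sketch below shows a conventional adaptive-filter echo canceller. It uses the normalized least-mean-square (NLMS) update, which is one common choice of adaptive algorithm.

The filter models the echo path from the far-end (loudspeaker) signal to the microphone. It subtracts the estimated echo and keeps the residual as output. The tap count `num_taps`, step size `mu` and regularizer `eps` are illustrative assumptions, not settings from the thesis. Production cancellers, such as the one in Speex, use more elaborate frequency-domain filters.

```python
import numpy as np

def nlms_echo_canceller(far_end, mic, num_taps=128, mu=0.5, eps=1e-6):
    """Remove the far-end echo from `mic` with an NLMS adaptive FIR filter.

    Returns (output, weights): output[n] = mic[n] - estimated_echo[n], and the
    final filter weights, i.e. the estimate of the echo path.
    """
    far_end = np.asarray(far_end, dtype=float)
    mic = np.asarray(mic, dtype=float)
    w = np.zeros(num_taps)          # adaptive estimate of the echo path
    x_buf = np.zeros(num_taps)      # x_buf[k] holds far_end[n - k]
    out = np.zeros_like(mic)
    for n in range(len(mic)):
        x_buf = np.roll(x_buf, 1)   # shift history by one sample
        x_buf[0] = far_end[n]
        y = w @ x_buf               # estimated echo at time n
        e = mic[n] - y              # residual: near-end speech plus estimation error
        # Normalized LMS: step size scaled by the reference signal's energy.
        w += mu * e * x_buf / (x_buf @ x_buf + eps)
        out[n] = e
    return out, w
```

With a white-noise far-end signal and a short synthetic echo path, the weights should converge to the echo path. The residual should also become small. This scenario has no near-end talker, so it is single-talk only. Double-talk, where near-end speech disturbs adaptation, is where such filters typically struggle.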
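In the speech-separation view of AEC, the microphone signal during double-talk is treated as near-end speech plus echo. The training target is a time-frequency mask.

The sketch below follows the IRM form of Wang, Narayanan and Wang (2014), IRM = (|S|² / (|S|² + |E|²))^β with β = 0.5. Here S is the near-end speech spectrum and E is the echo spectrum. It also shows one way a weight can set the near-end-to-echo ratio when building training mixtures.

The FFT size, hop, Hann window and the helper names `mix_at_ser` and `ideal_ratio_mask` are hypothetical. They are not taken from the thesis.

```python
import numpy as np

def stft(x, n_fft=512, hop=256):
    """Hann-windowed STFT; returns a complex array of shape (frames, n_fft // 2 + 1)."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(x) - n_fft) // hop
    frames = np.stack([x[i * hop:i * hop + n_fft] * window for i in range(n_frames)])
    return np.fft.rfft(frames, axis=1)

def mix_at_ser(near, echo, ser_db):
    """Scale `echo` so the near-end-to-echo energy ratio is `ser_db` dB.

    Returns (mixture, scaled_echo); the scaling gain plays the role of the weight
    that controls the ratio between the two signals.
    """
    gain = np.sqrt(np.sum(near ** 2) / (np.sum(echo ** 2) * 10 ** (ser_db / 10)))
    scaled_echo = gain * echo
    return near + scaled_echo, scaled_echo

def ideal_ratio_mask(near, echo, beta=0.5, n_fft=512, hop=256, eps=1e-12):
    """IRM per time-frequency bin: (|S|^2 / (|S|^2 + |E|^2)) ** beta, in [0, 1]."""
    s_pow = np.abs(stft(near, n_fft, hop)) ** 2
    e_pow = np.abs(stft(echo, n_fft, hop)) ** 2
    return (s_pow / (s_pow + e_pow + eps)) ** beta
```

The near-end magnitude is estimated by multiplying the mask with the mixture magnitude. The result is typically resynthesized with the mixture phase. At inference time the network predicts the mask from mixture features, because the true near-end and echo signals are unavailable.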
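The sketch below shows how a bidirectional GRU can map a sequence of feature frames to mask values in [0, 1]. It implements a forward pass in NumPy with the standard GRU equations, as written by Chung et al. (2014): update gate z, reset gate r, candidate state, and h = (1 − z)·h + z·h̃. The backward pass runs over the reversed sequence, and its states are re-aligned in time before both directions are concatenated.

The weights are random and untrained, and the layer sizes are hypothetical. Training is omitted. A bidirectional LSTM would replace the GRU cell with an LSTM cell.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class GRUDirection:
    """One direction of a GRU layer with random (untrained) weights."""

    def __init__(self, n_in, n_hidden, rng):
        scale = 1.0 / np.sqrt(n_hidden)
        # Index 0: update gate z, 1: reset gate r, 2: candidate state.
        self.W = rng.uniform(-scale, scale, (3, n_hidden, n_in))
        self.U = rng.uniform(-scale, scale, (3, n_hidden, n_hidden))
        self.b = np.zeros((3, n_hidden))
        self.n_hidden = n_hidden

    def run(self, xs):
        """Process frames in order; returns hidden states, shape (T, n_hidden)."""
        h = np.zeros(self.n_hidden)
        states = []
        for x in xs:
            z = sigmoid(self.W[0] @ x + self.U[0] @ h + self.b[0])
            r = sigmoid(self.W[1] @ x + self.U[1] @ h + self.b[1])
            h_tilde = np.tanh(self.W[2] @ x + self.U[2] @ (r * h) + self.b[2])
            h = (1.0 - z) * h + z * h_tilde
            states.append(h)
        return np.stack(states)

class BiGRUMaskEstimator:
    """Bidirectional GRU followed by a sigmoid output layer (one value per bin)."""

    def __init__(self, n_features, n_hidden, n_bins, seed=0):
        rng = np.random.default_rng(seed)
        self.fwd = GRUDirection(n_features, n_hidden, rng)
        self.bwd = GRUDirection(n_features, n_hidden, rng)
        self.W_out = rng.uniform(-0.1, 0.1, (n_bins, 2 * n_hidden))
        self.b_out = np.zeros(n_bins)

    def __call__(self, features):
        h_f = self.fwd.run(features)
        h_b = self.bwd.run(features[::-1])[::-1]  # re-align backward states in time
        h = np.concatenate([h_f, h_b], axis=1)
        return sigmoid(h @ self.W_out.T + self.b_out)
```

During training the output would be regressed onto the IRM target. A mean-squared-error loss is one common choice; it is an assumption here, since the loss is not stated in this record. Because of the backward direction, the mask at each frame depends on future frames as well as past ones.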

Keywords

★ Deep Learning
★ Acoustic Echo Cancellation
★ Speech Separation
★ Recurrent Neural Network

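Table of Contents

Abstract (Chinese)
Abstract (English)
Acknowledgments
Table of Contents
List of Figures
List of Tables
Chapter 1 Introduction
1-1 Research Background
1-2 Research Motivation and Objectives
1-3 Thesis Organization
Chapter 2 Introduction to Acoustic Echo Cancellation
2-1 Basics of Acoustic Echo Cancellation
2-2 Techniques for Acoustic Echo Cancellation
2-2-1 Adaptive Digital Filters
2-2-2 Adaptive Algorithms
2-3 Echo Cancellation in the Open-Source Software Speex
Chapter 3 Introduction to Deep Learning
3-1 Artificial Neural Networks
3-1-1 History of Artificial Neural Networks
3-1-2 Multilayer Perceptron
3-2 Deep Learning
3-2-1 Recurrent Neural Networks
3-2-2 Long Short-Term Memory
3-2-3 Gated Recurrent Units
Chapter 4 Proposed Architecture
4-1 System Architecture
4-2 Speech Database Preprocessing
4-3 Training Phase
4-4 Testing Phase
Chapter 5 Experimental Results and Discussion
5-1 Experimental Environment and Datasets
5-2 Evaluation Methods
5-3 Comparison and Discussion of Experimental Results
Chapter 6 Conclusions and Future Work
References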

Advisor

張寶基 (Pao-Chi Chang)

Approval Date

2019-7-24
