Master's/Doctoral Thesis 105522098 — Detail Record




Name: Yu-An Chen (陳昱安)    Department: Computer Science and Information Engineering
Thesis Title: Acoustic Reverberation Cancellation Based on Deep Neural Network
(基於深度學習之殘響消除)
Related Theses
★ Single and Multi-Label Environmental Sound Recognition with Gaussian Process
★ Embedded-System Implementation of Beamforming and Audio Preprocessing
★ Applications and Design of Speech Synthesis and Voice Conversion
★ A Semantics-Based Public Opinion Analysis System
★ Design and Application of a High-Quality Dictation System
★ Calcaneal Fracture Recognition and Detection in CT Images Using Deep Learning and Accelerated Robust Features
★ A Personalized Collaborative-Filtering Clothing Recommendation System Based on a Style Vector Space
★ RetinaNet Applied to Face Detection
★ Trend Prediction for Financial Instruments
★ Integrating Deep Learning Methods to Predict Age and Aging-Related Genes
★ End-to-End Mandarin Speech Synthesis
★ Application and Improvement of ORB-SLAM2 on the ARM Architecture
★ ETF Trend Prediction Based on Deep Learning
★ Exploring the Correlation between Financial News and Market Trends
★ Emotional Speech Analysis Based on Convolutional Neural Networks
★ Using Deep Learning Methods to Predict Alzheimer's Disease Progression and Post-Stroke Surgical Survival
  1. The author has agreed to make this electronic thesis available immediately.
  2. The released electronic full text is licensed only for personal, non-commercial retrieval, reading, and printing for academic research purposes.
  3. Please comply with the Copyright Act of the Republic of China; do not reproduce, distribute, adapt, repost, or broadcast the work without authorization.

Abstract (Chinese): Sound plays an important role in daily life, yet reverberation is present in most environments and affects applications such as video conferencing, distance learning, and even mobile communication, so the clarity of speech is especially important.
Deep neural networks (DNNs) have become a popular approach to signal-processing problems. This thesis designs a new deep-network-based architecture, different from previous ones, that combines an autoencoder with a deep recurrent neural network, called the sequence-to-sequence autoencoder (SA). The magnitude obtained from the short-time Fourier transform is fed into the network model, which jointly considers the temporal relationships of the energy and its structural information to output an estimated magnitude; this is then combined with the phase information and mapped back to the time domain. Finally, the proposed method is evaluated on the Chime4 and REVERB Challenge 2014 data, and the experimental results show that it outperforms other deep neural networks.
Abstract (English): Sound plays an important role in daily life, but most environments contain reverberation, which affects applications such as video conferencing, distance education, and even mobile communication. The clarity of speech is therefore particularly important.
Deep neural networks (DNNs) have become a popular method for dealing with signal-processing problems. This thesis designs a new architecture, different from previous ones, based on deep networks. It combines an autoencoder with a deep recurrent neural network, called the sequence-to-sequence autoencoder (SA). The magnitude of the short-time Fourier transform of the input signal is fed into the network model; by jointly considering the temporal relationships of the energy and its structural information, the network outputs an estimated magnitude, which is then combined with the phase information and mapped back to the time domain. Finally, the proposed method is evaluated on the Chime4 and REVERB Challenge 2014 data for dereverberation. The experimental results show that this method is superior to other deep neural networks.
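The processing chain described in the abstract (STFT → magnitude enhancement → recombination with the original phase → inverse STFT) can be sketched as follows. This is a minimal illustration, not the thesis's implementation: the `enhance` callable is a placeholder for the sequence-to-sequence autoencoder, whose code is not part of this record, so an identity function is used here to keep the sketch runnable.

```python
# Sketch of the magnitude/phase dereverberation pipeline from the abstract.
# `enhance` stands in for the sequence-to-sequence autoencoder (hypothetical
# placeholder); here it defaults to the identity, so output ≈ input.
import numpy as np
from scipy.signal import stft, istft

def dereverberate(wave, fs=16000, nperseg=512, enhance=lambda mag: mag):
    # STFT: split the reverberant signal into magnitude and phase.
    _, _, spec = stft(wave, fs=fs, nperseg=nperseg)
    mag, phase = np.abs(spec), np.angle(spec)
    # The model estimates a clean magnitude from the reverberant one.
    clean_mag = enhance(mag)
    # Recombine with the original phase and map back to the time domain.
    _, out = istft(clean_mag * np.exp(1j * phase), fs=fs, nperseg=nperseg)
    return out
```

With the identity placeholder the pipeline is a near-perfect round trip, which is a useful sanity check before plugging in a trained magnitude estimator.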
Keywords (Chinese): ★ Reverberation (殘響)
★ Deep learning (深度學習)
Keywords (English):
Table of Contents
Chinese Abstract
Abstract
Table of Contents
List of Figures
Chapter 1: Introduction
1-1 Background
1-2 Research Motivation and Objectives
1-3 Methodology and Thesis Organization
Chapter 2: Related Work
2-1 Audio Features
2-1-1 Spectrogram
2-1-2 Linear Predictive Coefficients
2-1-3 Mel-Spectrum
2-1-4 Mel-Frequency Cepstral Coefficients (MFCCs)
2-2 Deep Learning
2-2-1 Development and Concepts of Neural Networks
2-2-2 The Perceptron
2-3 Deep Neural Networks for Dereverberation
2-4 Dereverberation Based on Deep Denoising Autoencoders
2-5 Dereverberation with Deep Convolutional Neural Networks
2-6 Dereverberation Based on Recurrent Neural Networks
Chapter 3: Dereverberation with the Sequence-to-Sequence Autoencoder
3-1 Sequence-to-Sequence Autoencoder Architecture
3-2 Sequence-to-Sequence Autoencoder Forward Pass
3-3 Sequence-to-Sequence Autoencoder Backpropagation
3-4 Sequence-to-Sequence Autoencoder Configuration
Chapter 4: Experimental Design and Results
4-1 Experimental Environment and Deep Neural Network Settings
4-2 Comparison with Other Methods
4-2-1 Training-Set Loss Functions
4-2-2 Test-Set SDR, SAR, and SNR Comparison
4-2-3 Spectrograms of Example Audio Files
4-2-4 Computational Efficiency of the Network Models
Chapter 5: Conclusion and Future Work
Chapter 6: References
Advisor: 王家慶 (Jia-Ching Wang)    Date of Approval: 2018-08-08

For questions about this thesis, please contact the Promotion Services Division of the National Central University Library, TEL: (03)422-7151 ext. 57407, or by e-mail.