H-ELM Model Adaptation for Across-corpora Speech Enhancement

Name	Join Wan Chanlyn Sigalingging (萬程玲)	Department	Department of Computer Science and Information Engineering
Thesis Title	H-ELM Model Adaptation for Across-corpora Speech Enhancement (基於H-ELM調適之跨資料集語音增強)
Files	Full text in the system: never open to public access
Abstract	As computationally powerful hardware has become available to many users, the number of speech processing devices such as smartphones, tablets, and notebooks has increased. As a consequence, speech plays an important role in many applications, e.g., hands-free telephony, digital hearing aids, speech-based computer interfaces, and home entertainment systems. Speech enhancement algorithms attempt to improve the performance of communication systems when their input or output signals are corrupted by noise. To address this problem, we present a hierarchical extreme learning machine (H-ELM) framework aimed at the effective and fast removal of background noise from a single-channel speech signal. The framework is based on a set of randomly chosen hidden units and analytically determined output weights, and is built by leveraging sparse autoencoders. Multi-task learning and transfer learning approaches have recently been adopted to improve the performance of deep learning models. Adopting these two approaches, we build an H-ELM model adaptation scheme in this study to investigate the compatibility of H-ELM and to achieve further performance improvements. We train the model on Aurora-4 and then adapt the previously trained model with TIMIT. We also compare a masking-based target, the ideal ratio mask (IRM), with direct feature mapping in our experiments. The experimental results indicate that both H-ELM and H-ELM model adaptation based speech enhancement consistently outperform the conventional deep denoising autoencoder (DDAE) framework, and that H-ELM model adaptation can further improve the performance of the H-ELM adapted to TIMIT, in terms of standardized objective evaluations under various testing conditions. In addition, the IRM target performs slightly better than feature mapping.
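The phrase "randomly chosen hidden units and analytically determined output weights" can be illustrated with a minimal single-layer extreme learning machine. This is only a sketch under stated assumptions, not the thesis's H-ELM: it omits the hierarchical sparse-autoencoder stages and the Aurora-4 to TIMIT adaptation, it uses synthetic data in place of speech features, and the names `train_elm` and `predict_elm`, the tanh activation, the hidden size, and the ridge term `reg` are all hypothetical choices made for illustration.

```python
import numpy as np

def train_elm(X, Y, n_hidden=64, reg=1e-3, seed=0):
    """Fit a single-hidden-layer ELM (illustrative, not the thesis's H-ELM)."""
    rng = np.random.default_rng(seed)
    W = rng.standard_normal((X.shape[1], n_hidden))  # random input weights, never trained
    b = rng.standard_normal(n_hidden)                # random hidden biases, never trained
    H = np.tanh(X @ W + b)                           # hidden-layer activations
    # Output weights in closed form (ridge-regularized least squares):
    # beta = (H^T H + reg * I)^{-1} H^T Y
    beta = np.linalg.solve(H.T @ H + reg * np.eye(n_hidden), H.T @ Y)
    return W, b, beta

def predict_elm(model, X):
    """Apply the fixed random hidden layer, then the learned output weights."""
    W, b, beta = model
    return np.tanh(X @ W + b) @ beta

# Toy regression on synthetic data (an assumption; real inputs would be
# noisy speech features and the targets clean features or mask values).
rng = np.random.default_rng(1)
X = rng.standard_normal((200, 8))
Y = np.sin(X[:, :1]) + 0.5 * X[:, 1:2]
model = train_elm(X, Y)
mse = float(np.mean((predict_elm(model, X) - Y) ** 2))
print(f"training MSE: {mse:.4f}")
```

Only `beta` is learned, and it comes from one linear solve rather than iterative backpropagation, which is the source of the fast training that the abstract attributes to the ELM family.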
Keywords	★ deep denoising autoencoder ★ hierarchical extreme learning ★ model adaptation ★ IRM ★ speech enhancement
Table of Contents
Abstract (Chinese)
ABSTRACT
ACKNOWLEDGMENT
LIST OF FIGURES
LIST OF TABLES
TABLE OF GRAPHS
CHAPTER 1 INTRODUCTION
CHAPTER 2 SPEECH ENHANCEMENT
  2.1. Machine Learning Based Speech Enhancement
  2.2. Deep Denoising Autoencoder
  2.3. Extreme Learning Machine
  2.4. Hierarchical Extreme Learning Machine
CHAPTER 3 MODEL ADAPTATION & FEATURE LEARNING METHOD
  3.1. H-ELM Model Adaptation
  3.2. Feature Learning Method
    3.3.1. Feature Mapping Learning Based
    3.3.2. Feature Mapping Learning Based
    3.1.3. Ideal Ratio Masking (IRM)
CHAPTER 4 MODEL ADAPTATION & FEATURE LEARNING METHOD
  4.1. Datasets
    4.1.1. Aurora-4
    4.1.2. TIMIT
  4.2. Speech Enhancement Setup
    4.2.1. For H-ELM
    4.2.2. For H-ELM Model Adaptation
    4.2.3. For DDAE
  4.3. Evaluation Metrics
    4.3.1. PESQ
    4.3.2. STOI
    4.3.3. SSNRI
CHAPTER 5 EXPERIMENT RESULTS
  5.1. Analysis by Number Training Data
  5.2. Analysis by Model Structure
    5.2.1. Aurora-4 Result
    5.2.2. TIMIT Result
    5.2.3. Model Adaptation Result
  5.3. Direct Mapping Vs IRM
  5.4. Analysis STOI and SSNRI
  5.5. Analysis by Spectrogram
CHAPTER 6 CONCLUSION
REFERENCES
Advisors	Jia-Ching Wang (王家慶), Yu Tsao (曹昱)	Approval Date	2019-7-25

Master's and Doctoral Theses: Detailed Record for 106522602