Abstract (English) |
Reverberation is the result of sound reflecting off surrounding walls. It has minimal impact on individuals with normal hearing, but it strongly affects people with severe hearing impairment who wear cochlear implants: in environments with noise and reverberation, it becomes difficult for them to follow conversations. Traditional dereverberation methods require prior knowledge of the reverberant signal's characteristics, which is difficult to obtain in real life. Neural network algorithms can overcome this limitation, leveraging their powerful modeling capabilities to suppress reverberation without any such prior knowledge.
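As a concrete illustration of the degradations described above, a reverberant signal can be modeled as the convolution of clean speech with a room impulse response (RIR), with additive noise mixed in at a target SNR. The sketch below uses a synthetic placeholder signal and a toy RIR; in practice, real recordings and measured or simulated RIRs would be used.

```python
import numpy as np

rng = np.random.default_rng(0)

# Placeholder for one second of "clean speech" at 16 kHz
# (a real experiment would load an actual recording).
fs = 16000
clean = rng.standard_normal(fs)

# Toy room impulse response: a direct path followed by
# exponentially decaying late reflections.
rir = np.zeros(int(0.3 * fs))
rir[0] = 1.0
decay = np.exp(-np.arange(1, len(rir)) / (0.05 * fs))
rir[1:] = 0.2 * rng.standard_normal(len(rir) - 1) * decay

# Reverberant speech = clean speech convolved with the RIR.
reverberant = np.convolve(clean, rir)[: len(clean)]

# Add background noise scaled to a chosen SNR (here 5 dB)
# to obtain noisy-reverberant speech.
snr_db = 5.0
noise = rng.standard_normal(len(reverberant))
noise *= np.sqrt(np.mean(reverberant**2) / (10 ** (snr_db / 10) * np.mean(noise**2)))
noisy_reverberant = reverberant + noise
```

The dereverberation models discussed in this thesis learn to map such degraded signals back toward the clean speech.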
This thesis evaluates several neural network models, including the Hierarchical Extreme Learning Machine (HELM), Deep Denoising Autoencoder (DDAE), Integrated Deep and Ensemble Learning Algorithm (IDEA), Late Suppression Long Short-Term Memory (LS-LSTM), and Late Suppression U-Net (LS-Unet), on both normal speech and cochlear implant speech in environments with reverberation, noise, and their combination. The TMHINT dataset is used for the experiments, and the objective evaluations use the Short-Time Objective Intelligibility (STOI) metric for normal speech and the Normalized Covariance Metric (NCM) for cochlear implant speech. The results on normal speech show that LS-Unet achieves the largest improvement in Chinese speech intelligibility in both noisy and reverberant environments, while HELM requires the least training time while still enhancing intelligibility in the reverberation experiment. The reverberation experiment also uses the TIMIT dataset, which is widely used in speech research; there, IDEA yields the best improvement in English intelligibility.
In the experiments combining noise and reverberation, training the models on noisy-reverberant speech does not improve intelligibility when testing on speech containing both noise and reverberation. However, the LS-Unet(Noise) model, trained on noisy speech only, proves effective at enhancing the intelligibility of speech degraded by both noise and reverberation. In the cochlear implant experiments under noisy and reverberant conditions, LS-Unet(Noise) combined with either the traditional ACE coding strategy or the deep-learning-based ElectrodeNet-CS coding strategy improves intelligibility, with LS-Unet(Noise) plus ACE achieving the higher scores.
Finally, a subjective evaluation was conducted with seven normal-hearing participants, who were asked to assess LS-Unet(Noise) combined with the ACE and ElectrodeNet-CS strategies for cochlear implants under different noise and reverberation conditions. With 5 dB noise and reverberation, both strategies achieved an average sentence recognition rate above 80%, reaching up to 90% under low reverberation. With 0 dB noise and reverberation, both strategies exceeded 60%, and under low reverberation the recognition rate approached 80%.
Furthermore, the Mann-Whitney U test was used to examine the difference between the LS-Unet(Noise)+ACE and LS-Unet(Noise)+ElectrodeNet-CS strategies; the results indicated no significant difference between the two methods.
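The significance test above can be sketched with `scipy.stats.mannwhitneyu`. The recognition scores below are hypothetical placeholders, not the thesis data; the point is the shape of the comparison between two small independent samples.

```python
import numpy as np
from scipy.stats import mannwhitneyu

# Hypothetical per-participant sentence recognition rates (%)
# for the two coding strategies (seven listeners each).
ace_scores = np.array([82, 85, 90, 78, 88, 91, 80], dtype=float)
cs_scores = np.array([80, 84, 89, 79, 86, 90, 81], dtype=float)

# Two-sided Mann-Whitney U test: do the two score
# distributions differ?
stat, p_value = mannwhitneyu(ace_scores, cs_scores, alternative="two-sided")

# With alpha = 0.05, p >= 0.05 means no significant difference.
significant = p_value < 0.05
```

The Mann-Whitney U test is a natural choice here because it is non-parametric and makes no normality assumption, which matters with only seven listeners per condition.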
References |
Braun, S., Schwartz, B., Gannot, S., and Habets, E. A. P. (2016). "Late reverberation PSD estimation for single-channel dereverberation using relative convolutive transfer functions," in Proc. Int. Workshop Acoust. Signal Enhancement, pp. 1–5.
Fu, Q.-J., Hsu, C.-J., and Horng, M.-J. (2004). "Effects of Speech Processing Strategy on Chinese Tone Recognition by Nucleus-24 Cochlear Implant Users," Ear & Hearing 25, pp. 501-508.
Goldsworthy, R. L., and Greenberg, J. E. (2004). "Analysis of speech-based speech transmission index methods with implications for nonlinear operations," J. Acoust. Soc. Am., vol. 116, no. 6, pp. 3679–3689.
Habets, E. A. P., Gannot, S., and Cohen, I. (2009). "Late reverberant spectral variance estimation based on a statistical model," IEEE Signal Process. Lett., vol. 16, no. 9, pp. 770–774.
Hussain, T., Siniscalchi, S. M., Wang, H.-L. S., Tsao, Y., Salerno, V. M., and Liao, W.-H. (2020). "Ensemble Hierarchical Extreme Learning Machine for Speech Dereverberation," IEEE Transactions on Cognitive and Developmental Systems, pp. 744–758.
Lee, W.-J., Wang, S.-S., Chen, F., Lu, X., Chien, S.-Y., and Tsao, Y. (2018). "Speech Dereverberation Based on Integrated Deep and Ensemble Learning Algorithm," IEEE International Conference on Acoustics, Speech and Signal Processing, pp. 5454–5458.
Leon, D., and Tobar, F. A. (2021). "Late reverberation suppression using u-nets," arXiv preprint arXiv:2110.02144.
Li, J., Deng, L., Gong, Y., and Haeb-Umbach, R. (2014). "An overview of noise robust automatic speech recognition," IEEE/ACM Trans. Audio, Speech, Language Process., vol. 22, no. 4, pp. 745–777.
Long, J., Shelhamer, E., and Darrell, T. (2015). "Fully convolutional networks for semantic segmentation," in Proc. IEEE Conference on Computer Vision and Pattern Recognition, pp. 3431–3440.
Nachar, N. (2008). "The Mann-Whitney U: A Test for Assessing Whether Two Independent Samples Come from the Same Distribution," Tutorials in Quantitative Methods for Psychology, vol. 4, pp. 13–20.
Nakatani, T., Yoshioka, T., Kinoshita, K., Miyoshi, M., and Juang, B.-H. (2010). "Speech dereverberation based on variance-normalized delayed linear prediction," IEEE Trans. Audio, Speech, Lang. Process., vol. 18, no. 7, pp. 1717–1731.
Nisa, H. K. (2021). Speech dereverberation based on HELM framework for cochlear implant coding strategy. Master's Thesis, Institute of Electrical Engineering, National Central University.
Nogueira, W., Büchner, A., Lenarz, T., and Edler, B. (2005). "A Psychoacoustic “NofM”-Type Speech Coding Strategy for Cochlear Implants," EURASIP Journal on Applied Signal Processing 18, pp. 3044-3059.
Stickney, G. S., Assmann, P. F., Chang, J., and Zeng, F.-G. (2007). "Effects of cochlear implant processing and fundamental frequency on the intelligibility of competing sentences," J. Acoust. Soc. Am. 122, pp. 1069-1078.
Taal, C. H., Hendriks, R. C., Heusdens, R., and Jensen, J. (2011). "An Algorithm for Intelligibility Prediction of Time–Frequency Weighted Noisy Speech," IEEE Transactions on Audio, Speech, and Language Processing, vol. 19, no. 7, pp. 2125–2136.
Turner, C. W., Gantz, B. J., Vidal, C., Behrens, A., and Henry, B. A. (2004). "Speech recognition in noise for cochlear implant listeners: Benefits of residual acoustic hearing," J. Acoust. Soc. Am. 115, pp. 1729–1735.
Zhao, X., Wang, Y., and Wang, D. (2014). "Robust speaker identification in noisy and reverberant conditions," IEEE/ACM Trans. Audio, Speech, Language Process., vol. 22, no. 4, pp. 836–845.
Zhao, Y., Wang, D., Xu, B., and Zhang, T. (2018). "Late reverberation suppression using recurrent neural networks with long short-term memory," in Proc. IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 5434–5438.
黃國原. (2009). "Effects of channel number, stimulation rate, and binaural hearing of simulated cochlear implants on Mandarin speech recognition in noisy environments." Master's Thesis, Institute of Electrical Engineering, National Central University.
黃銘緯. (2005). "Mandarin speech recognition testing in noise in Taiwan." Graduate Institute of Speech and Hearing Science, National Taipei College of Nursing.
林金賢. (2021). "A study of deep learning for speech reverberation suppression." Master's Thesis, Institute of Electrical Engineering, National Central University.
健康醫療網. (2022). Retrieved June 1, 2023, from https://www.healthnews.com.tw/article/55477
HackMD. (2019). Retrieved June 1, 2023, from https://hackmd.io/@allen108108/rkn-oVGA4
程式人生. (2019). Retrieved June 10, 2023, from https://www.796t.com/content/1546695390.html
IT邦幫忙. (2019). Retrieved June 12, 2023, from https://ithelp.ithome.com.tw/articles/10223055 |