Abstract: | Reverberation is produced when sound reflects off surrounding walls. It has little effect on listeners with normal hearing, but it does affect people with severe hearing impairment who wear cochlear implants; in environments with both noise and reverberation, conversation with others becomes very difficult for them. Traditional reverberation suppression methods require prior knowledge of the reverberant signal's characteristics, which is hard to obtain in real life. 
Neural network algorithms, however, can overcome this limitation: they rely on their computational power to suppress reverberation without such prior knowledge. This paper evaluates several neural network models, namely Hierarchical Extreme Learning Machine (HELM), Deep Denoise Autoencoder (DDAE), Integrated Deep and Ensemble Learning Algorithm (IDEA), Late Suppression Long Short-Term Memory (LS-LSTM), and Late Suppression Unet (LS-Unet), on normal speech and cochlear implant speech under reverberation, noise, and combined noise and reverberation. The experiments use the TMHINT corpus, and the objective measures are Short Time Objective Intelligibility (STOI) and Normalized Covariance Metric (NCM), applied to both normal speech and cochlear implant speech. The results on normal speech show that LS-Unet gives the largest improvement in Chinese speech intelligibility in the noisy environment and in the reverberant environment, considered separately. HELM requires the least training time in the reverberation experiment while still improving speech intelligibility. The reverberation experiment was also run on TIMIT, a corpus widely used in speech research, and there IDEA gives the best improvement in English intelligibility. In the combined noise and reverberation experiments, models trained on noisy, reverberant speech fail to improve intelligibility when tested on speech containing both noise and reverberation. By contrast, LS-Unet(Noise), a denoising model trained on noisy speech, effectively improves the intelligibility of speech containing both noise and reverberation. 
In the cochlear implant experiments under noise and reverberation, LS-Unet(Noise) was combined with two cochlear implant speech coding strategies: the traditional ACE strategy and the deep learning-based ElectrodeNet-CS strategy. Both combinations improve the intelligibility of cochlear implant speech in noisy, reverberant environments, with LS-Unet(Noise) combined with ACE scoring higher. Finally, a subjective evaluation was conducted in which seven normal-hearing participants assessed LS-Unet(Noise) combined with the ACE strategy and with the ElectrodeNet-CS strategy, using cochlear implant speech in different noise and reverberation conditions. With 5 dB noise and reverberation, both methods reached an average sentence recognition rate above 80%, rising to as high as 90% under low reverberation. With 0 dB noise and reverberation, both reached an average above 60%, and close to 80% under low reverberation. A Mann-Whitney U test comparing LS-Unet(Noise)+ACE with LS-Unet(Noise)+ElectrodeNet-CS showed no significant difference between the two methods. |