Master's/Doctoral Thesis 110521165: Detailed Record




Author: Jyun-Yi Wu (吳俊易)    Department: Department of Electrical Engineering
Thesis Title: Application of neural network speech dereverberation for cochlear implants (神經網路應用於人工電子耳語音去迴響)
Related Theses
★ Investigation of independent component analysis for acoustic signal separation in real environments
★ Segmentation of oral magnetic resonance images and three-dimensional gray-level interpolation
★ Design of a digital asthma peak-flow monitoring system
★ Effects of combining cochlear implants and hearing aids on Mandarin speech recognition
★ Simulation of Mandarin speech recognition with advanced combined coding strategies for cochlear implants: analysis combined with hearing aids
★ A functional MRI study of the neural correlates of Mandarin speech production
★ Construction of a three-dimensional biomechanical tongue model using the finite element method
★ Construction of an MRI-based three-dimensional tongue atlas
★ A simulation study of the relationship between calcium oxalate concentration changes in renal tubules and calcium oxalate stones
★ Automatic segmentation of tongue structures in oral magnetic resonance images
★ A study of electrical matching for microwave output windows
★ Development of a software-based hearing aid simulation platform: noise cancellation
★ Development of a software-based hearing aid simulation platform: feedback cancellation
★ Simulating the effects of the number of cochlear implant channels, stimulation rate, and binaural hearing on Mandarin speech recognition in noisy environments
★ Using neural networks to study the neural correlates of Mandarin tone production
★ Construction of computer-simulated physiological systems for teaching
Files: the full text will be available for browsing in the thesis system after 2026-9-1.
Abstract (Chinese)
Reverberation is produced when sound is reflected off surrounding walls. It has little effect on listeners with normal hearing, but it does affect severely hearing-impaired listeners who wear cochlear implants; in noisy and reverberant environments, conversing with others becomes nearly impossible for them. Traditional dereverberation methods require prior information about the reverberant signal, which is difficult to obtain in real life, whereas neural network algorithms, with their strong computational capability, can remove this requirement.

This thesis uses several neural network models, namely the Hierarchical Extreme Learning Machine (HELM), Deep Denoise Autoencoder (DDAE), Integrated Deep and Ensemble Learning Algorithm (IDEA), Late Suppression Long Short-Term Memory (LS-LSTM), and Late Suppression Unet (LS-Unet), and evaluates them on normal speech and cochlear implant speech in reverberant, noisy, and noisy-plus-reverberant environments. The experimental corpus is TMHINT, and the objective evaluation metrics are Short-Time Objective Intelligibility (STOI) for normal speech and the Normalized Covariance Metric (NCM) for cochlear implant speech.

For normal speech, the results show that LS-Unet yields the largest improvement in Mandarin speech intelligibility in both the noisy and the reverberant environments, while HELM requires the least training time in the reverberation experiment while still improving intelligibility. The reverberation experiment also uses the TIMIT corpus, which is widely used in speech research abroad; there, IDEA yields the best improvement in English intelligibility in reverberant environments.

In the noisy-plus-reverberant experiments, models trained on noisy and reverberant speech fail to improve intelligibility when tested on speech containing both noise and reverberation, whereas the denoising model LS-Unet(Noise), trained only on noisy speech, effectively improves the intelligibility of noisy reverberant speech. In the cochlear implant experiments with noisy reverberant speech, LS-Unet(Noise) is combined with the traditional ACE coding strategy and with the deep-learning-based ElectrodeNet-CS strategy; both combinations improve the intelligibility of cochlear implant speech in noisy reverberant environments, with LS-Unet(Noise) combined with ACE giving the higher scores.

Finally, in a subjective evaluation, seven normal-hearing subjects assessed LS-Unet(Noise) combined with the ACE strategy and with the ElectrodeNet-CS strategy under different noisy and reverberant cochlear implant speech conditions. At a 5 dB signal-to-noise ratio with reverberation, the average sentence recognition rate exceeded 80% for both methods and reached up to 90% under low reverberation; at 0 dB with reverberation, it exceeded 60% and approached 80% under low reverberation. A Mann-Whitney U test comparing the LS-Unet(Noise)+ACE and LS-Unet(Noise)+ElectrodeNet-CS strategies showed no significant difference between the two methods.
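As a rough illustration of how the noisy reverberant test conditions described above can be constructed, the sketch below (not taken from the thesis; the file names, the room impulse response, and the 5 dB SNR are hypothetical) convolves a clean utterance with a room impulse response and then adds noise at a target signal-to-noise ratio. The actual room impulse responses, noise types, and SNR levels used in the thesis are described in Chapter 3 (Sections 3.1.1-3.1.3).

import numpy as np
import soundfile as sf
from scipy.signal import fftconvolve

# Hypothetical inputs: a clean TMHINT utterance, a room impulse response (RIR),
# and a noise recording at least as long as the utterance.
clean, fs = sf.read("clean_tmhint_sentence.wav")
rir, _ = sf.read("room_impulse_response.wav")
noise, _ = sf.read("babble_noise.wav")

# Reverberant speech: linear convolution with the RIR, truncated to the clean length.
reverb = fftconvolve(clean, rir)[:len(clean)]

# Scale the noise so that the reverberant-speech-to-noise ratio is 5 dB.
snr_db = 5.0
noise = noise[:len(reverb)]
scale = np.sqrt(np.sum(reverb**2) / (np.sum(noise**2) * 10**(snr_db / 10)))
noisy_reverb = reverb + scale * noise

sf.write("noisy_reverb_5dB.wav", noisy_reverb, fs)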
Abstract (English)
Reverberation is the result of sound reflecting off the surrounding walls. It has minimal impact on individuals with normal hearing, but it affects those with severe hearing impairment who wear cochlear implants. In environments with noise and reverberation, it becomes difficult for them to engage in conversations with others. Traditional dereverberation methods require prior knowledge of the reverberant signal's characteristics, which is challenging to obtain in real life. Neural network algorithms, however, can overcome this limitation by leveraging their powerful computational capabilities to suppress reverberation without such prior knowledge.
This thesis evaluates different neural network models, including the Hierarchical Extreme Learning Machine (HELM), Deep Denoise Autoencoder (DDAE), Integrated Deep and Ensemble Learning Algorithm (IDEA), Late Suppression Long Short-Term Memory (LS-LSTM), and Late Suppression Unet (LS-Unet), on normal speech and cochlear implant speech in environments involving reverberation, noise, and a combination of both. The TMHINT corpus is used for the experiments, and the objective evaluation metrics are Short-Time Objective Intelligibility (STOI) for normal speech and the Normalized Covariance Metric (NCM) for cochlear implant speech. The experiments on normal speech demonstrate that LS-Unet achieves the highest improvement in Mandarin speech intelligibility in the noisy and the reverberant environments, respectively, while HELM requires the least training time in the reverberation experiment while still enhancing intelligibility. The reverberation experiment also uses the TIMIT corpus, which is widely used in speech research, and the results indicate that IDEA yields the best improvement in English intelligibility in a reverberant environment.
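The objective scoring step can be illustrated with a short sketch. The thesis does not state which STOI implementation it uses; the example below assumes the open-source pystoi package, and the file names are placeholders.

import soundfile as sf
from pystoi import stoi  # open-source STOI implementation (an assumption, not stated in the thesis)

# Hypothetical reference and processed signals at the same sampling rate and length.
clean, fs = sf.read("clean_tmhint_sentence.wav")
processed, _ = sf.read("lsunet_output.wav")

# STOI returns a score roughly between 0 and 1; higher predicts better intelligibility.
score = stoi(clean, processed, fs, extended=False)
print(f"STOI = {score:.3f}")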
In the experiments involving both noise and reverberation, training the models on noisy and reverberant speech does not improve speech intelligibility when they are tested on speech containing both noise and reverberation. However, the LS-Unet(Noise) model, trained only on noisy speech, proves effective in enhancing the intelligibility of speech containing both noise and reverberation. In the cochlear implant speech experiments under noisy and reverberant conditions, LS-Unet(Noise) combined with the traditional ACE coding strategy and with the deep-learning-based ElectrodeNet-CS strategy both improve intelligibility, with the LS-Unet(Noise)+ACE combination yielding the higher scores.
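The ACE strategy mentioned above is an n-of-m coding strategy: in each analysis frame, only the channels with the largest envelope amplitudes are kept for stimulation. The following is only a conceptual sketch of that maxima-selection step; the channel count of 22 and the selection of 8 maxima are illustrative, and the full ACE chain (filterbank, envelope detection, loudness mapping, pulse sequencing) is not reproduced here, nor is this the thesis' implementation.

import numpy as np

def select_maxima(envelopes: np.ndarray, n: int = 8) -> np.ndarray:
    """Keep the n largest channel envelopes in one frame and zero out the rest."""
    selected = np.zeros_like(envelopes)
    top = np.argsort(envelopes)[-n:]  # indices of the n largest channels
    selected[top] = envelopes[top]
    return selected

# Example frame: 22 channel envelopes (made-up values).
frame = np.abs(np.random.randn(22))
print(select_maxima(frame, n=8))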
Finally, a subjective evaluation experiment was conducted with seven normal-hearing participants. They were asked to assess LS-Unet(Noise) combined with the ACE strategy and with the ElectrodeNet-CS strategy under different noisy and reverberant cochlear implant speech conditions. The results showed that at a 5 dB signal-to-noise ratio with reverberation, both strategies achieved an average sentence recognition rate exceeding 80%, reaching up to 90% in low-reverberation conditions. At 0 dB with reverberation, both strategies achieved an average sentence recognition rate exceeding 60%, and in low-reverberation conditions the rate was close to 80%.
Furthermore, using the Mann-Whitney U test, the difference between LS-Unet(Noise)+ACE strategy and LS-Unet(Noise)+ElectrodeNet-CS strategy was examined, and the results indicated that there was no significant difference between the two methods.
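A minimal sketch of that statistical comparison, using SciPy's Mann-Whitney U test; the per-subject scores below are placeholders, not the thesis data.

from scipy.stats import mannwhitneyu

# Hypothetical per-subject sentence recognition rates for the two processing chains.
ace_scores = [0.82, 0.88, 0.79, 0.91, 0.85, 0.80, 0.87]
electrodenet_cs_scores = [0.80, 0.86, 0.81, 0.89, 0.83, 0.78, 0.88]

u_stat, p_value = mannwhitneyu(ace_scores, electrodenet_cs_scores, alternative="two-sided")
print(f"U = {u_stat}, p = {p_value:.3f}")  # p > 0.05 would indicate no significant difference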
Keywords (Chinese): ★ 人工電子耳 (cochlear implants)    Keywords (English): ★ Cochlear Implants
Table of Contents
Abstract (Chinese) I
Abstract (English) IV
Table of Contents VI
List of Figures IX
List of Tables XI
Chapter 1 Introduction 1
1.1 Research Motivation 1
1.2 Literature Review 3
1.2.1 Speech Enhancement Strategies 3
1.2.2 Cochlear Implant Coding Strategies 6
1.3 Research Objectives and Contributions 7
1.4 Thesis Organization 8
Chapter 2 Research Background and Related Principles 10
2.1 Speech Signal Preprocessing 10
2.2 Neural Networks 12
2.2.1 Single-Layer Perceptron 13
2.2.2 Multilayer Perceptron and Deep Neural Networks 15
2.2.3 Convolutional Neural Networks 17
2.2.4 Fully Convolutional Networks 22
2.2.5 Long Short-Term Memory Networks 31
2.3 Cochlear Implant Coding Strategies 35
2.4 Summary 37
Chapter 3 Speech Corpora and Algorithm Architectures 38
3.1 Speech Corpora 38
3.1.1 Speech in Reverberant Environments 39
3.1.2 Speech in Noisy Environments 42
3.1.3 Speech in Noisy and Reverberant Environments 44
3.2 Dereverberation System for Normal Speech 46
3.2.1 HELM 48
3.2.2 DDAE 51
3.2.3 IDEA 52
3.2.4 LS-Unet 54
3.2.5 LS-LSTM 56
3.3 Dereverberation System for Cochlear Implant Speech 57
3.4 Summary 59
Chapter 4 Results and Discussion 60
4.1 Objective Evaluation Methods 60
4.1.1 Short-Time Objective Intelligibility 62
4.1.2 Normalized Covariance Metric 63
4.2 Objective Evaluation Results 64
4.2.1 Objective Results for Normal Speech in Reverberant Environments 65
4.2.2 Objective Results for Normal Speech in Noisy Environments 71
4.2.3 Objective Results for Normal Speech in Noisy and Reverberant Environments 73
4.2.4 Objective Results for Cochlear Implant Speech in Noisy and Reverberant Environments 75
4.3 Subjective Evaluation Methods 78
4.3.1 Subjects 79
4.3.2 Experimental Procedure 79
4.4 Subjective Evaluation Results 82
Chapter 5 Conclusions and Future Work 84
5.1 Conclusions 84
5.2 Future Work 88
References 90
References
Braun, S., Schwartz, B., Gannot, S., & Habets, E. A. P. (2016). "Late reverberation PSD estimation for single-channel dereverberation using relative convolutive transfer functions," in Proc. Int. Workshop Acoust. Signal Enhancement, pp. 1–5.

Fu, Q.-J., Hsu, C.-J., & Horng, M.-J. (2004). "Effects of Speech Processing Strategy on Chinese Tone Recognition by Nucleus-24 Cochlear Implant Users," Ear & Hearing, vol. 25, pp. 501–508.

Goldsworthy, R. L., & Greenberg, J. E. (2004). "Analysis of speech-based speech transmission index methods with implications for nonlinear operations," J. Acoust. Soc. Am., vol. 116, no. 6, pp. 3679–3689.

Habets, E. A. P., Gannot, S., & Cohen, I. (2009). "Late reverberant spectral variance estimation based on a statistical model," IEEE Signal Process. Lett., vol. 16, no. 9, pp. 770–774.

Hussain, T., Siniscalchi, S. M., Wang, H.-L. S., Tsao, Y., Salerno, V. M., & Liao, W.-H. (2020). "Ensemble Hierarchical Extreme Learning Machine for Speech Dereverberation," IEEE Transactions on Cognitive and Developmental Systems, pp. 744–758.

Lee, W. J., Wang, S. S., Chen, F., Lu, X., Chien, S. Y., & Tsao, Y. (2018). "Speech Dereverberation Based on Integrated Deep and Ensemble Learning Algorithm," in Proc. IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 5454–5458.

Leon, D., & Tobar, F. A. (2021). "Late reverberation suppression using U-nets," arXiv preprint arXiv:2110.02144.

Li, J., Deng, L., Gong, Y., & Haeb-Umbach, R. (2014). "An overview of noise robust automatic speech recognition," IEEE/ACM Trans. Audio, Speech, Language Process., vol. 22, no. 4, pp. 745–777.

Long, J., Shelhamer, E., & Darrell, T. (2015). "Fully convolutional networks for semantic segmentation," in Proc. IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 3431–3440.

Nakatani, T., Yoshioka, T., Kinoshita, K., Miyoshi, M., & Juang, B.-H. (2010). "Speech dereverberation based on variance-normalized delayed linear prediction," IEEE Trans. Audio, Speech, Lang. Process., vol. 18, no. 7, pp. 1717–1731.

Nachar, N. (2008). "The Mann-Whitney U: A test for assessing whether two independent samples come from the same distribution," Tutorials in Quantitative Methods for Psychology, vol. 4, pp. 13–20.

Nisa, H. K. (2021). "Speech dereverberation based on HELM framework for cochlear implant coding strategy," Master's thesis, Institute of Electrical Engineering, National Central University.

Nogueira, W., Büchner, A., Lenarz, T., & Edler, B. (2005). "A Psychoacoustic 'NofM'-Type Speech Coding Strategy for Cochlear Implants," EURASIP Journal on Applied Signal Processing, vol. 18, pp. 3044–3059.

Stickney, G. S., Assmann, P. F., Chang, J., & Zeng, F.-G. (2007). "Effects of cochlear implant processing and fundamental frequency on the intelligibility of competing sentences," J. Acoust. Soc. Am., vol. 122, pp. 1069–1078.

Taal, C. H., Hendriks, R. C., Heusdens, R., & Jensen, J. (2011). "An Algorithm for Intelligibility Prediction of Time-Frequency Weighted Noisy Speech," IEEE Transactions on Audio, Speech, and Language Processing, vol. 19, no. 7, pp. 2125–2136.

Turner, C. W., Gantz, B. J., Vidal, C., Behrens, A., & Henry, B. A. (2004). "Speech recognition in noise for cochlear implant listeners: Benefits of residual acoustic hearing," J. Acoust. Soc. Am., vol. 115, pp. 1729–1735.

Zhao, X., Wang, Y., & Wang, D. (2014). "Robust speaker identification in noisy and reverberant conditions," IEEE/ACM Trans. Audio, Speech, Language Process., vol. 22, no. 4, pp. 836–845.

Zhao, Y., Wang, D., Xu, B., & Zhang, T. (2018). "Late reverberation suppression using recurrent neural networks with long short-term memory," in Proc. IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 5434–5438.

黃國原 (2009). "Simulating the effects of the number of cochlear implant channels, stimulation rate, and binaural hearing on Mandarin speech recognition in noisy environments," Master's thesis, Institute of Electrical Engineering, National Central University (in Chinese).

黃銘緯 (2005). "Mandarin speech recognition testing in noise in Taiwan," Master's thesis, Graduate Institute of Speech and Hearing Disorder Sciences, National Taipei College of Nursing (in Chinese).

林金賢 (2021). "A study of deep learning for speech reverberation suppression," Master's thesis, Institute of Electrical Engineering, National Central University (in Chinese).

健康醫療網 (2022). Retrieved June 1, 2023, from https://www.healthnews.com.tw/article/55477

HackMD (2019). Retrieved June 1, 2023, from https://hackmd.io/@allen108108/rkn-oVGA4

程式人生 (2019). Retrieved June 10, 2023, from https://www.796t.com/content/1546695390.html

IT邦幫忙 (2019). Retrieved June 12, 2023, from https://ithelp.ithome.com.tw/articles/10223055
Advisor: Chao-Min Wu (吳炤民)    Date of Approval: 2023-8-8