Based on Clinical Trial Outcomes to Develop Deep Learning Dereverberation Coding Strategy of Cochlear Implants


Please use this permanent URL to cite or link to this item: http://ir.lib.ncu.edu.tw/handle/987654321/95700

Title:	基於臨床實驗結果發展深度學習去迴響人工電子耳語音編碼策略; Based on Clinical Trial Outcomes to Develop Deep Learning Dereverberation Coding Strategy of Cochlear Implants
Author:	江旭崧; Jiang, Xu-Song
Contributor:	Department of Electrical Engineering
Keywords:	Cochlear Implant; Deep Learning; Speech Dereverberation
Date:	2024-07-24
Uploaded:	2024-10-09 17:10:15 (UTC+8)
Publisher:	National Central University
Abstract:	This thesis analyzes the differences in Chinese speech comprehension under simulated noise and reverberation conditions with various cochlear implant (CI) speech coding strategies. It also develops a deep learning dereverberation strategy for CI speech coding to address reverberation. A CI is an assistive listening device implanted in the cochlea that stimulates the auditory nerve, helping to restore auditory perception for patients with severe hearing loss.
Many speech coding strategies exist for CIs, such as the Advanced Combination Encoder (ACE) strategy, which is the most widely used commercial strategy. In recent years, researchers have also developed CI processing strategy models based on neural network architectures, such as ElectrodeNet-CS. In quiet environments, CI speech coding strategies typically provide good speech recognition and comprehension. In environments with noise and reverberation, however, the speech recognition and comprehension of CI users decline drastically. This study first conducted a clinical trial comparing the ACE strategy and the ElectrodeNet-CS strategy, examining the differences in word recognition scores (WRS) of CI users under various noise conditions. According to the trial results, both strategies achieved an average WRS above 80% for clean speech. As the noise level increased, however, the average scores of CI users dropped markedly. At a signal-to-noise ratio (SNR) of -5 dB, the recognition scores fell below 10%. Noise severely degrades speech recognition for CI users, and current CI speech coding strategies still cannot handle noise effectively. To evaluate whether the neural-network-based CI speech coding strategy performs similarly to the traditional strategy, paired-samples t-tests were applied to the experimental data of the two strategies. The tests showed no significant difference between the two strategies, indicating that a neural network can achieve functionality similar to the traditional strategy. Subsequently, a second clinical trial was conducted using the LS-Unet deep learning dereverberation model as a preprocessing step for both strategies.
The second trial compared the speech recognition performance of the two strategies under various noise and reverberation conditions, evaluating whether LS-Unet preprocessing effectively improves speech recognition for CI users affected by noise and reverberation. According to the trial results, the average WRS under all noise and reverberation conditions was below 50%. Adding LS-Unet to either strategy did not provide good speech recognition, indicating that there is still considerable room for improvement in handling noise and reverberation. Among the participants of the second clinical trial, one case (s19) scored markedly higher than the others, with recognition scores above 70% under all conditions. This individual is the only participant with congenital hearing loss and received a cochlear implant before acquiring language, which may explain the superior recognition. Finally, the clinical trial results show that CI speech coding strategies cannot effectively convert speech signals under noise and reverberation conditions. To improve CI speech recognition under reverberation, this study uses a Unet model to separate speech features from reverberant signals. By adding network layers with different functions, a new deep learning dereverberation CI speech coding strategy, RT-Unet, was developed. The model takes reverberant speech spectra as input and outputs speech electrode signals. This thesis describes the model architecture, training methods, and test evaluation results in detail. In the evaluation, RT-Unet achieved excellent objective scores on reverberant speech.
The average Short-Time Objective Intelligibility (STOI) score for all reverberation time conditions reached 0.76622, and the Normalized Covariance Metric (NCM) score was as high as 0.91302. Compared to the objective evaluation scores of other cochlear implant speech encoding strategies, RT-Unet performed better, demonstrating the feasibility of the RT-Unet model architecture and providing a promising research and development direction for cochlear implant speech signal processing.
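The noise conditions in the abstract are specified by SNR (for example, -5 dB). As a minimal sketch of how such a condition can be simulated, the Python snippet below scales a noise signal so that the speech-to-noise power ratio matches a target SNR before mixing. It is an illustration only, not the thesis's actual stimulus pipeline: the helper names (`power`, `snr_db`, `mix_at_snr`), the 16 kHz sampling rate, and the sine-wave stand-ins for speech and noise are all assumptions.

```python
import math


def power(signal):
    """Mean squared amplitude of a sequence of samples."""
    return sum(x * x for x in signal) / len(signal)


def snr_db(speech, noise):
    """Signal-to-noise ratio in dB: 10 * log10(P_speech / P_noise)."""
    return 10.0 * math.log10(power(speech) / power(noise))


def mix_at_snr(speech, noise, target_snr_db):
    """Scale `noise` so the speech-to-noise power ratio equals the target,
    then add it to `speech`. Returns (mixture, scaled_noise)."""
    # Required noise power is P_speech / 10^(SNR/10); the gain is the
    # square root of (required noise power / current noise power).
    gain = math.sqrt(
        power(speech) / (power(noise) * 10.0 ** (target_snr_db / 10.0))
    )
    scaled = [gain * n for n in noise]
    mixture = [s + n for s, n in zip(speech, scaled)]
    return mixture, scaled


# Hypothetical stand-in signals (NOT real speech or clinical noise):
# one second of two sine tones at an assumed 16 kHz sampling rate.
fs = 16000
speech = [math.sin(2 * math.pi * 440 * t / fs) for t in range(fs)]
noise = [0.3 * math.sin(2 * math.pi * 97 * t / fs + 1.0) for t in range(fs)]

mixture, scaled_noise = mix_at_snr(speech, noise, -5.0)
achieved_snr = snr_db(speech, scaled_noise)  # should match the -5 dB target
```

A negative target such as -5 dB means the scaled noise carries more power than the speech, which is consistent with the sharp drop in recognition scores reported for that condition.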
Appears in Collections:	[Graduate Institute of Electrical Engineering] Theses and Dissertations

Files in This Item:

File	Description	Size	Format	Views
index.html		0 KB	HTML	19

All items in NCUIR are protected by original copyright.
