Investigations of Cochlear Implant Sound Coding Strategies Based on Auditory Physiology and Deep Learning

Name

Enoch Hsin-Ho Huang (黃心和)

Department

Department of Electrical Engineering

Thesis Title

Investigations of Cochlear Implant Sound Coding Strategies Based on Auditory Physiology and Deep Learning
(探討以聽覺生理為基礎和以深度學習為基礎之人工電子耳聲音編碼策略)

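This electronic thesis is authorized for immediate open access.
The open-access electronic full text is licensed to users solely for personal, non-profit searching, reading, and printing for the purpose of academic research.
Please comply with the Copyright Act of the Republic of China; do not reproduce, distribute, adapt, repost, or broadcast the work without authorization, to avoid violating the law.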

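Abstract (Chinese)

This dissertation presents research on sound coding strategies for the cochlear implant (CI). It explores the principles and mechanisms of coding strategies based on auditory physiology and on deep learning, and simulates their performance in terms of Mandarin speech intelligibility. The key function of a sound coding strategy is to convert essential speech information into neural impulse patterns that the brain can interpret, so that the compressed electrical stimulation signal can pass through the electroneural bottleneck. Because CI listening still has limitations, improving the coding strategy is important.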

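This study applies knowledge of auditory physiology and artificial intelligence techniques separately to the improvement of CI coding strategies. In the auditory physiology investigation, three physiologically based coding strategies were selected: the biologically inspired hearing aid (BioAid), envelope enhancement (EE), and fundamental frequency modulation (F0mod). These three were integrated with the advanced combination encoder (ACE), currently the most widely used strategy, to give four singular coding strategies. Four combinational coding strategies derived from them were then proposed, and a distinctive comparative study was carried out. In the deep learning investigation, unlike traditional coding strategies and machine-learning preprocessing, we developed ElectrodeNet, a coding strategy built directly with deep learning. In addition to evaluating the deep neural network (DNN), convolutional neural network (CNN), and long short-term memory (LSTM) architectures, the study compares a range of experimental conditions and proposes ElectrodeNet-CS, an improved strategy that incorporates a channel selection (CS) function. A vocoder was used to synthesize simulated CI speech. Besides objective evaluation, Mandarin sentence listening tests were conducted with normal-hearing participants on the NCU-CI experimental platform.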

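In the auditory physiology results, at signal-to-noise ratios (SNRs) of 5 dB or above, the EE strategy scored slightly higher on average than ACE in both short-time objective intelligibility (STOI) and the listening experiments. Among the combinational strategies, enabling the EE function also improved the speech intelligibility of the other coding strategies. In the deep learning part, ElectrodeNet with the DNN, CNN, and LSTM architectures showed high correlations with ACE in STOI and normalized covariance metric (NCM) scores. ElectrodeNet and ACE were also closely related across training corpora in different languages and across noise environments. Moreover, the more advanced ElectrodeNet-CS strategy even slightly exceeded ACE in STOI scores.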

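Building on auditory physiology, this study proposes combinational coding strategies and a distinctive comparative study, and develops sound coding strategies with deep learning at their core. The results confirm the feasibility of the proposed methods and may offer some inspiration to related fields.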

Abstract (English)

This dissertation presents research on cochlear implant (CI) sound coding strategies. It explores the principles and mechanisms of coding strategies based on auditory physiology and deep learning, and simulates their performance in Mandarin speech intelligibility. A coding strategy converts key speech information into neural impulse patterns that the auditory brain can recognize, so that the compressed electrical stimuli can pass through the limited electroneural bottleneck. Given the current limitations of CI listening, improving the sound coding strategy is of great importance.

This study applies knowledge and technology from auditory physiology and artificial intelligence (AI) to the improvement of CI coding strategies. In the investigation of auditory physiology, three physiologically based strategies, namely the biologically inspired hearing aid (BioAid), envelope enhancement (EE), and fundamental frequency modulation (F0mod), were selected and integrated with the widely used advanced combination encoder (ACE) strategy. From the resulting four singular coding strategies, four combinational coding strategies were derived, and a comparative study was conducted on them. In the investigation of deep learning, unlike traditional coding strategies and machine-learning-based preprocessing, this study introduces ElectrodeNet, a coding strategy developed directly using deep learning. ElectrodeNet was evaluated with deep neural network (DNN), convolutional neural network (CNN), and long short-term memory (LSTM) architectures, and various experimental factors were compared. Furthermore, ElectrodeNet-CS, an improved coding strategy that includes a channel selection (CS) function, is proposed.
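The abstract does not spell out how channel selection works. As a rough illustration only, the sketch below shows the n-of-m "maxima picking" idea commonly associated with ACE-style strategies: in each analysis frame, the n channels with the largest envelopes are kept and the rest are set to zero. The function name `select_n_of_m`, the array layout, and the demo values are assumptions made for this example; they are not taken from the dissertation and do not describe how ElectrodeNet-CS is implemented.

```python
import numpy as np


def select_n_of_m(envelopes: np.ndarray, n: int) -> np.ndarray:
    """Keep the n largest channel envelopes in each frame and zero the rest.

    envelopes: array of shape (frames, m) holding non-negative channel envelopes.
    The name and layout are hypothetical, chosen for this illustration.
    """
    frames, m = envelopes.shape
    if not 1 <= n <= m:
        raise ValueError("n must satisfy 1 <= n <= m")
    # After partitioning at position m - n, the last n positions of each row
    # hold the indices of the n largest values (in arbitrary order).
    top = np.argpartition(envelopes, m - n, axis=1)[:, m - n:]
    selected = np.zeros_like(envelopes)
    rows = np.arange(frames)[:, None]
    selected[rows, top] = envelopes[rows, top]
    return selected


# Made-up demo values: 2 frames, m = 6 channels, keep n = 2 maxima per frame.
demo = np.array([[0.1, 0.9, 0.3, 0.7, 0.2, 0.0],
                 [0.5, 0.4, 0.8, 0.1, 0.6, 0.2]])
picked = select_n_of_m(demo, n=2)
```

Ties between equal envelopes are broken arbitrarily here; a real processor would need a defined rule for that case.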

In the investigation of auditory physiology, the EE strategy achieved slightly higher average scores than ACE in short-time objective intelligibility (STOI) and in the listening experiments at signal-to-noise ratios (SNRs) of 5 dB or above. In the combinational coding strategies, activating the EE function also slightly improved the speech intelligibility of the other coding strategies. In the investigation of deep learning, the ElectrodeNets based on the DNN, CNN, and LSTM architectures demonstrated high correlations with the ACE strategy in terms of STOI and normalized covariance metric (NCM) scores. Strong relationships between ElectrodeNet and ACE were also found across training datasets in different languages and across different noise types. Furthermore, the more advanced ElectrodeNet-CS strategy even slightly surpassed ACE in STOI scores.
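Results such as "at SNRs of 5 dB or above" depend on how speech and noise are mixed. The following is a minimal sketch of the standard way to scale a noise signal so that the mixture has a chosen SNR, defined as 10·log10(P_speech / P_noise) with P the mean squared amplitude. The function name `mix_at_snr`, the synthetic tone standing in for speech, and the sampling rate are assumptions for illustration; the dissertation's actual stimuli and mixing procedure are not described in this record.

```python
import numpy as np


def mix_at_snr(speech, noise, snr_db):
    """Scale `noise` so that speech + gain * noise has the requested SNR in dB.

    Returns the mixture and the gain applied to the noise.
    The function name and interface are hypothetical.
    """
    speech = np.asarray(speech, dtype=float)
    noise = np.asarray(noise, dtype=float)
    if len(noise) < len(speech):
        raise ValueError("noise must be at least as long as speech")
    noise = noise[:len(speech)]
    p_speech = np.mean(speech ** 2)
    p_noise = np.mean(noise ** 2)
    # Solve 10 * log10(p_speech / (gain**2 * p_noise)) = snr_db for gain.
    gain = np.sqrt(p_speech / (p_noise * 10.0 ** (snr_db / 10.0)))
    return speech + gain * noise, gain


# Synthetic stand-ins (not thesis data): a 220 Hz tone as "speech", white noise.
fs = 16000
rng = np.random.default_rng(0)
speech = np.sin(2 * np.pi * 220 * np.arange(fs) / fs)
noise = rng.standard_normal(fs)
mixture, gain = mix_at_snr(speech, noise, snr_db=5.0)

# Check the SNR actually achieved by the scaled noise.
achieved_db = 10 * np.log10(np.mean(speech ** 2) / np.mean((gain * noise) ** 2))
```

This computes SNR over the whole signal; evaluations that measure power only over active speech segments would give somewhat different gains.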

This research proposes combinational coding strategies based on auditory physiology, conducts a unique comparative study of them, and develops coding strategies based on deep learning. The outcomes demonstrate the feasibility of the proposed approaches and offer insights for related fields.

Keywords

★ cochlear implant
★ sound coding strategy
★ auditory physiology
★ deep learning
★ speech intelligibility

Table of Contents

Abstract (Chinese)
ABSTRACT
Acknowledgements (Chinese)
ACKNOWLEDGEMENTS
CONTENTS
LIST OF FIGURES
LIST OF TABLES

CHAPTER 1 Introduction
1.1 Problem Statement
1.2 Motivations and Contributions
1.3 Dissertation Outline

CHAPTER 2 Literature Review
2.1 The CI System
2.2 Auditory Physiology
2.3 Sound Coding Strategy
2.4 Deep Learning

CHAPTER 3 Methods
3.1 Coding Strategies Based on Auditory Physiology
3.2 Coding Strategies Based on Deep Learning
3.3 Research Environment and Common Evaluation Methods
3.4 Evaluation for Strategies Based on Auditory Physiology
3.5 Evaluation for Strategies Based on Deep Learning

CHAPTER 4 Evaluation Results and Discussions
4.1 Results for Coding Strategies Based on Auditory Physiology
4.2 Results for Coding Strategies Based on Deep Learning
4.3 Discussion on Combinational Strategies
4.4 Discussion on Deep-Learning-Based Strategies
4.5 General Discussions

CHAPTER 5 Conclusion and Future Work
5.1 Conclusion
5.2 Future Work

References
Publication List

參考文獻

[1] Loizou, P. C. (1998). Mimicking the human ear. IEEE signal processing magazine, 15(5), 101-130.
[2] Zeng, F. G., Rebscher, S., Harrison, W., Sun, X., & Feng, H. (2008). Cochlear implants: System design, integration, and evaluation. IEEE Reviews in Biomedical Engineering, 1, 115-142.
[3] Clark, G. M. (2008). Personal reflections on the multichannel cochlear implant and a view. Journal of Rehabilitation Research & Development, 45(58), 651-694.
[4] Clark, G. M. (2015). The multichannel cochlear implant: Multidisciplinary development of electrical stimulation of the cochlea and the resulting clinical benefit. Hearing Research, 322, 4-13.
[5] Wilson, B. S., & Dorman, M. F. (2008). Cochlear implants: Current designs and future possibilities. Journal of Rehabilitation Research and Development, 45(5), 695-730.
[6] Wilson, B. S. (2019). The remarkable cochlear implant and possibilities for the next large step forward. Acoustics Today, 15(1), 53-61.
[7] Carlyon, R. P., & Goehring, T. (2021). Cochlear implant research and development in the twenty-first century: A critical update. Journal of the Association for Research in Otolaryngology, 22(5), 481-508.
[8] Zeng, F. G. (2022). Celebrating the one millionth cochlear implant. JASA Express Letters, 2(7), 077201.
[9] Morton, C. C., & Nance, W. E. (2006). Newborn hearing screening—a silent revolution. New England Journal of Medicine, 354(20), 2151-2164.
[10] Lin, H.-C., Shu, M.-T., Chang, K.-C., & Bruna, S. M. (2002). A universal newborn hearing screening program in Taiwan. International Journal of Pediatric Otorhinolaryngology, 63(3), 209-218
[11] Lin, H.-C., Chang, H.-W., & Hsieh, W.-H. (2018). The past, present and future of universal newborn hearing screening in Taiwan. Journal of Early Hearing Detection and Intervention, 3(1), 54-56.
[12] Percy-Smith, L., Tønning, T. L., Josvassen, J. L., Mikkelsen, J. H., Nissen, L., Dieleman, E., ... & Cayé-Thomasen, P. (2018). Auditory verbal habilitation is associated with improved outcome for children with cochlear implant. Cochlear Implants International, 19(1), 38-45.
[13] Kral, A., Dorman, M. F., & Wilson, B. S. (2019). Neuronal development of hearing and language: Cochlear implants and critical periods. Annual Review of Neuroscience, 42, 47-65.
[14] Wilson, B. S., Finley, C. C., Lawson, D. T., Wolford, R. D., Eddington, D. K., & Rabinowitz, W. M. (1991). Better speech recognition with cochlear implants. Nature, 352(6332), 236-238.
[15] Vandali, A. E., Whitford, L. A., Plant, K. L., & Clark, G. M. (2000). Speech perception as a function of electrical stimulation rate: Using the Nucleus 24 cochlear implant system. Ear and Hearing, 21(6), 608-624.
[16] Taddei, A., López, E. A., & Reyes, R. A. R. (2021). Children with hearing disabilities during the pandemic: Challenges and perspectives of inclusion. Education Sciences & Society-Open Access, 12(1), 178–196.
[17] Perea Pérez, F., Hartley, D. E., Kitterick, P. T., & Wiggins, I. M. (2022). Perceived listening difficulties of adult cochlear implant users under measures introduced to combat the spread of COVID-19. Trends in Hearing, 26, 1–22.
[18] Wouters, J., McDermott, H. J., & Francart, T. (2015). Sound coding in cochlear implants: From electric pulses to hearing. IEEE Signal Processing Magazine, 32(2), 67-80.
[19] Dhanasingh, A., & Jolly, C. (2017). An overview of cochlear implant electrode array designs. Hearing Research, 356, 93-103.
[20] Jeschke, M., & Moser, T. (2015). Considering optogenetic stimulation for cochlear implants. Hearing Research, 322, 224-234.
[21] Dombrowski, T., Rankovic, V., & Moser, T. (2019). Toward the optical cochlear implant. Cold Spring Harbor Perspectives in Medicine, 9(8), a033225.
[22] Revuelta, M., Santaolalla, F., Arteaga, O., Alvarez, A., SánchezdelRey, A., & Hilario, E. (2017). Recent advances in cochlear hair cell regeneration—a promising opportunity for the treatment of agerelated hearing loss. Ageing research reviews, 36, 149-155.
[23] Chen, Y., Zhang, S., Chai, R., & Li, H. (2019). Hair cell regeneration. In H. Li & R. Chai (Eds.), Hearing Loss: Mechanisms, Prevention and Cure, 1-16, Springer, Singapore.
[24] Shannon, R. V., Zeng, F. G., Kamath, V., Wygonski, J., & Ekelid, M. (1995). Speech recognition with primarily temporal cues. Science, 270(5234), 303-304.
[25] Dorman, M. F., Loizou, P. C., Spahr, A. J., & Maloff, E. (2002). A comparison of the speech understanding provided by acoustic models of fixedchannel and channelpicking signal processors for cochlear implants. Journal of Speech, Language, and Hearing Research, 45(4), 783-788.
[26] Moberly, A. C., Doerfer, K., & Harris, M. S. (2019). Does Cochlear implantation improve cognitive function?. The Laryngoscope, 129(10), 2208-2209.
[27] Almomani, F., AlMomani, M. O., Garadat, S., Alqudah, S., Kassab, M., Hamadneh, S., ... & Gans, R. (2021). Cognitive functioning in Deaf children using Cochlear implants. BMC pediatrics, 21(1), 1-13.
[28] McRackan, T. R., Bauschard, M., Hatch, J. L., FrankoTobin, E., Droghini, H. R., Nguyen,
S. A., & Dubno, J. R. (2018). Meta-analysis of quality-of-life improvement after cochlear implantation and associations with speech recognition abilities. The Laryngoscope, 128(4), 982-990.
[29] Haukedal, C. L., Lyxell, B., & Wie, O. B. (2020). Healthrelated quality of life with cochlear implants: The children’s perspective. Ear and Hearing, 41(2), 330343.
[30] Bond, M., Mealing, S., Anderson, R., Elston, J., Weiner, G., Taylor, R. S., ... & Stein, K. (2009). The effectiveness and cost-effectiveness of cochlear implants for severe to profound deafness in children and adults: A systematic review and economic model. Health Technology Assessment, 13(44), 1-330.
[31] Huang, E. H.-H., Wu, C.-M., & Lin, H.-C. (2019, Nov. 28) Simulation of three auditory physiology based CI sound coding strategies with Mandarin speech. Proc. 12th Asia Pacific Symposium on Cochlear Implant and Related Sciences (APSCI2019). Tokyo, Japan, pp. O2-2.
[32] Huang, E. H.-H., Wu, C.-M., & Lin, H.-C. (2021). Combination and comparison of sound coding strategies using cochlear implant simulation with Mandarin speech. IEEE Transactions on Neural Systems and Rehabilitation Engineering, 29, 2407-2416.
[33] Huang, E. H.-H., Hung, K.-H., Tsao, Y., & Wu, C.-M. (2019, Nov. 28) ElectrodeNet–Artificial intelligence based sound coding strategy for cochlear implants. Proc. 12th Asia Pacific Symposium on Cochlear Implant and Related Sciences (APSCI2019). Tokyo, Japan, pp. O2-5.
[34] Huang, E. H.-H., Chao, R., Tsao, Y., & Wu, C.-M. (2023). ElectrodeNet–A Deep Learning Based Sound Coding Strategy for Cochlear Implants. IEEE Transactions on Cognitive and Developmental Systems, Accepted, May 2, 2023.
[35] Henry, F., Glavin, M., & Jones, E. (2023). Noise reduction in cochlear implant signal processing: A review and recent developments. IEEE reviews in biomedical engineering, 16, 319331.
[36] Carlson, M. L., Neff, B. A., Link, M. J., Lane, J. I., Watson, R. E., McGee, K. P., ... & Driscoll, C. L. (2015). Magnetic resonance imaging with cochlear implant magnet in place: Safety and imaging quality. Otology & Neurotology, 36(6), 965971.
[37] Cochlear Implant HELP. (2022, December 12) Cochlear Implant Comparison Chart v12.3a, accessed June 2, 2023, from https://cochlearimplanthelp.com/cochlear-implant-comparison-chart/
[38] Pickles, J. (2012). An introduction to the physiology of hearing (4th ed.). Emerald Group Publishing
[39] Úlehlová, L., Voldřich, L., & Janisch, R. (1987). Correlative study of sensory cell density and cochlear length in humans. Hearing research, 28(23), 149-151.
[40] Hakizimana, P., & Fridberger, A. (2021). Inner hair cell stereocilia are embedded in the tectorial membrane. Nature Communications, 12(1), 2604.
[41] LopezPoveda, E. A. (2018). Olivocochlear efferents in animals and humans: From anatomy to clinical relevance. Frontiers in Neurology, 9, 197.
[42] Zatorre, R. J., Belin, P., & Penhune, V. B. (2002). Structure and function of auditory cortex: Music and speech. Trends in Cognitive Sciences, 6(1), 37-46.
[43] Swanson, B. A. (2008). Pitch perception with cochlear implants. Ph.D. thesis. University of Melbourne, Australia.
[44] Saremi, A., Beutelmann, R., Dietz, M., Ashida, G., Kretzberg, J., & Verhulst, S. (2016). A comparative study of seven human cochlear filter models. The Journal of the Acoustical Society of America, 140(3), 1618-1634.
[45] Baby, D., Van Den Broucke, A., & Verhulst, S. (2021). A convolutional neural-network model of human cochlear mechanics and filter tuning for real-time applications. Nature Machine Intelligence, 3(2), 134143.
[46] Vecchi, A. O., Varnet, L., Carney, L. H., Dau, T., Bruce, I. C., Verhulst, S., & Majdak, P. (2022). A comparative study of eight human auditory models of monaural processing. Acta Acustica, 6, 17.
[47] Majdak, P., Hollomey, C., & Baumgartner, R. (2022). AMT 1. x: A toolbox for reproducible research in auditory modeling. Acta Acustica, 6, 19.
[48] Seligman, P. M., Patrick, J. F., Tong, Y. C., Clark, G. M., Dowell, R. C., & Crosby, P. A. (1984). A signal processor for a multiple-electrode hearing prosthesis. Acta Oto-Laryngologica, 98(sup411), 135-139.
[49] Blamey, P. J., Dowell, R. C., Clark, G. M., & Seligman, P. M. (1987). Acoustic parameters measured by a formant-estimating speech processor for a multiple-channel cochlear implant. The Journal of the Acoustical Society of America, 82(1), 38-47.
[50] Patrick, J. F., & Clark, G. M. (1991). The Nucleus 22channel cochlear implant system. Ear Hear, 12(4), 35-95.
[51] Eddington, D. K. (1980). Speech discrimination in deaf subjects with cochlear implants.
The Journal of the Acoustical Society of America, 68(3), 885891.
[52] Merzenich, M. M. (1985). UCSF cochlear implant device. In: R.A. Schindler and M.M. Merzenich (Eds.), Cochlear Implants, Raven Press, New York, 121-129.
[53] Kessler, D. K. (1999). The Clarion® Multi-strategy™ cochlear implant. Annals of Otology, Rhinology & Laryngology, 108, 8-16.
[54] Nogueira, W., Büchner, A., Lenarz, T., & Edler, B. (2005). A psychoacoustic "NofM" type speech coding strategy for cochlear implants. EURASIP Journal on Advances in Signal Processing, 2005(18), 3044-3059.
[55] Riss, D., Hamzavi, J. S., Selberherr, A., Kaider, A., Blineder, M., Starlinger, V., ... & Arnoldner, C. (2011). Envelope versus fine structure speech coding strategy: A crossover study. Otology & Neurotology, 32(7), 1094-1101.
[56] Brendel, M., Buechner, A., Krueger, B., Frohne-Buechner, C., & Lenarz, T. (2008). Evaluation of the Harmony soundprocessor in combination with the speech coding strategy HiRes 120. Otology & Neurotology, 29(2), 199-202.
[57] Firszt, J. B., Holden, L. K., Reeder, R. M., & Skinner, M. W. (2009). Speech recognition in cochlear implant recipients: Comparison of standard HiRes and HiRes 120 sound processing. Otology & Neurotology, 30(2), 146.
[58] Reynolds, S. M., & Gifford, R. H. (2019). Effect of signal processing strategy and stimulation type on speech and auditory perception in adult cochlear implant users. International Journal of Audiology, 58(6), 363372.
[59] Schramm, D., Chen, J., Morris, D. P., Shoman, N., Philippon, D., CayéThomasen, P., ... & Gnansia, D. (2020). Clinical efficiency and safety of the Oticon Medical Neuro cochlear implant system: A multicenter prospective longitudinal study. Expert Review of Medical Devices, 17(9), 959967.
[60] Swanson, B., Van Baelen, E., Janssens, M., Goorevich, M., Nygard, T., & Van Herck, K. (2007). Cochlear implant signal processing ICs. In IEEE Custom Integrated Circuits Conference (CICC) 437-442.
[61] Oppenheim, A. V., Schafer, R. W., & Buck, J. R. (1999). Discrete-Time Signal Processing. (2nd ed.) Prentice-Hall.
[62] Nie, K., Stickney, G., & Zeng, F. G. (2005). Encoding frequency modulation to improve cochlear implant performance in noise. IEEE transactions on biomedical engineering, 52(1), 64-73.
[63] Laneau, J., Wouters, J., & Moonen, M. (2006). Improved music perception with explicit pitch coding in cochlear implants. Audiology and Neurotology, 11(1), 38-52.
[64] Milczynski, M., Wouters, J., & Van Wieringen, A. (2009). Improved fundamental frequency coding in cochlear implant signal processing. The Journal of the Acoustical Society of America, 125(4), 2260-2271.
[65] Milczynski, M., Chang, J. E., Wouters, J., & Van Wieringen, A. (2012). Perception of Mandarin Chinese with cochlear implants using enhanced temporal pitch cues. Hearing Research, 285(12), 1-12.
[66] Francart, T., Osses, A., & Wouters, J. (2015). Speech perception with F0mod, a cochlear implant pitch coding strategy. International Journal of Audiology, 54(6), 424-432.
[67] Vandali, A. E., & van Hoesel, R. J. (2011). Development of a temporal fundamental frequency coding strategy for cochlear implants. The Journal of the Acoustical Society of America, 129(6), 4023-4036.
[68] Vandali, A. E., Dawson, P. W., & Arora, K. (2017). Results using the OPAL strategy in Mandarin speaking cochlear implant recipients. International Journal of Audiology, 56(sup2), S74-S85.
[69] Vandali, A., Dawson, P., Au, A., Yu, Y., Brown, M., Goorevich, M., & Cowan, R. (2019). Evaluation of the optimized pitch and language strategy in cochlear implant recipients. Ear and Hearing, 40(3), 555-567.
[70] Ping, L., Wang, N., Tang, G., Lu, T., Yin, L., Tu, W., & Fu, Q. J. (2017). Implementation and preliminary evaluation of ′C-tone′: A novel algorithm to improve lexical tone recognition in Mandarin-speaking cochlear implant users. Cochlear Implants International, 18(5), 240-249.
[71] Meng, Q., Zheng, N., & Li, X. (2015). A temporal limits encoder for cochlear implants. In IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 5863-5867.
[72] Meng, Q., Zheng, N., & Li, X. (2016). Mandarin speech-in-noise and tone recognition using vocoder simulations of the temporal limits encoder for cochlear implants. The Journal of the Acoustical Society of America, 139(1), 301310.
[73] Kan, A., & Meng, Q. (2021). The temporal limits encoder as a sound coding strategy for bilateral cochlear implants. IEEE/ACM transactions on audio, speech, and language processing, 29, 265-273.
[74] Zhou, H., Kan, A., Yu, G., Guo, Z., Zheng, N., & Meng, Q. (2022). Pitch perception with the temporal limits encoder for cochlear implants. IEEE Transactions on Neural Systems and Rehabilitation Engineering, 30, 2528-2539.
[75] Ali, H., Hong, F., Hansen, J. H., & Tobey, E. (2014). Improving channel selection of sound coding algorithms in cochlear implants. In IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 905-909.
[76] Saba, J. N., Ali, H., & Hansen, J. H. (2018). Formant priority channel selection for an
"n-of-m" sound processing strategy for cochlear implants. The Journal of the Acoustical Society of America, 144(6), 3371-3380.
[77] Saba, J. N., Ali, H., & Hansen, J. H. (2023). The effects of estimation accuracy, estimation approach, and number of selected channels using formant-priority channel selection for an "n-of-m" sound processing strategy for cochlear implants. The Journal of the Acoustical
Society of America, 153(5), 3100-3100.
[78] Li, X., Nie, K., Imennov, N. S., Rubinstein, J. T., & Atlas, L. E. (2013). Improved perception of music with a harmonic based algorithm for cochlear implants. IEEE Transactions on Neural Systems and Rehabilitation Engineering, 21(4), 684-694.
[79] Geurts, L., & Wouters, J. (1999). Enhancing the speech envelope of continuous interleaved sampling processors for cochlear implants. The Journal of the Acoustical Society of America, 105(4), 2476-2484.
[80] Koning, R., & Wouters, J. (2012). The potential of onset enhancement for increased speech intelligibility in auditory prostheses. The Journal of the Acoustical Society of America, 132(4), 2569-2581.
[81] Koning, R., & Wouters, J. (2016). Speech onset enhancement improves intelligibility in adverse listening conditions for cochlear implant users. Hearing Research, 342, 13-22.
[82] Nogueira, W., Rode, T., & Büchner, A. (2016). Spectral contrast enhancement improves speech intelligibility in noise for cochlear implants. The Journal of the Acoustical Society of America, 139(2), 728-739.
[83] Lai, W. K., Dillier, N., & Killian, M. (2018). A neural excitability based coding strategy for cochlear implants. Journal of Biomedical Science and Engineering, 11(07), 159181.
[84] Tabibi, S., Kegel, A., Lai, W. K., & Dillier, N. (2020). A bioinspired coding (BIC) strategy for cochlear implants. Hearing Research, 388, 107885.
[85] LopezPoveda, E. A., EustaquioMartín, A., Stohl, J. S., Wolford, R. D., Schatzer, R., & Wilson, B. S. (2016). A binaural cochlear implant sound coding strategy inspired by the contralateral medial olivocochlear reflex. Ear and Hearing, 37(3), e138.
[86] LopezPoveda, E. A., & EustaquioMartín, A. (2018). Objective speech transmission improvements with a binaural cochlear implant sound-coding strategy inspired by the contralateral medial olivocochlear reflex. The Journal of the Acoustical Society of America, 143(4), 2217-2231.
[87] LopezPoveda, E. A., EustaquioMartín, A., Fumero, M. J., Gorospe, J. M., López, R. P., Revilla, M. A. G., ... & Stohl, J. S. (2020). Speech-in-noise recognition with more realistic implementations of a binaural cochlear-implant sound coding strategy inspired by the medial olivocochlear reflex. Ear and Hearing, 41(6), 1492.
[88] Meddis, R., Clark, N.R., Lecluyse, W., and Jürgens, T. (2013). BioAid-Ein biologisch inspiriertes hörgerät (BioAidA biologically inspired hearing aid),＂Zeitschrift der Audiologie/Audiological Acoustics, 52, 148-152.
[89] Jürgens, T., Clark, N. R., Lecluyse, W., & Meddis, R. (2016). Exploration of a physiologically-inspired hearing-aid algorithm using a computer model mimicking impaired hearing. International Journal of Audiology, 55(6), 346-357.
[90] Langner, F., & Jürgens, T. (2016). Forward-masked frequency selectivity improvements in simulated and actual cochlear implant users using a preprocessing algorithm. Trends in Hearing, 20, 1-14.
[91] Clark, N. R., Lecluyse, W., & Jürgens, T. (2018). Analysis of compressive properties of the BioAid hearing aid algorithm. International Journal of Audiology, 57(sup3), S130-S138.
[92] Ernst, S. M., Kortlang, S., Grimm, G., Bisitz, T., Kollmeier, B., & Ewert, S. D. (2018). Binaural model-based dynamic-range compression. International Journal of Audiology, 57(sup3), S31-S42.
[93] Lopez-Poveda, E. A., & Meddis, R. (2001). A human nonlinear cochlear filterbank. The Journal of the Acoustical Society of America, 110(6), 3107-3118.
[94] Clark, N. (2012) The biologically inspired hearing aid, accessed June 6, 2023, from http:// bioaid.org.uk/
[95] Moore, B. C., & Carlyon, R. P. (2005). Perception of pitch by people with cochlear hearing loss and by cochlear implant users. Pitch: Neural coding and perception, 234–277. Springer.
[96] Russell, S., & Norvig, P. (2010). Artificial intelligence: A modern approach (3rd Edition). Pearson.
[97] Silver, D., Schrittwieser, J., Simonyan, K., Antonoglou, I., Huang, A., Guez, A., ... & Hassabis, D. (2017). Mastering the game of go without human knowledge. Nature, 550(7676), 354-359.
[98] Radford, A., Narasimhan, K., Salimans, T., & Sutskever, I. (2018). Improving language understanding by generative pretraining.
[99] LeCun, Y., Bengio, Y., & Hinton, G. (2015). Deep learning. Nature, 521(7553), 436-444.
[100] Goodfellow, I., Bengio, Y., & Courville, A. (2016). Deep learning. MIT press.
[101] Werbos, P. J. (1981). Applications of advances in nonlinear sensitivity analysis. In Proceedings of the 10th IFIP Conference, New York City, (pp. 762-770).
[102] Hornik, K., Stinchcombe, M., & White, H. (1989). Multilayer feedforward networks are universal approximators. Neural Networks, 2(5), 359-366.
[103] Bolner, F., Goehring, T., Monaghan, J., Van Dijk, B., Wouters, J., & Bleeck, S. (2016). Speech enhancement based on neural networks applied to cochlear implant coding strategies. In IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) (pp. 6520-6524).
[104] Goehring, T., Bolner, F., Monaghan, J. J., Van Dijk, B., Zarowski, A., & Bleeck, S. (2017). Speech enhancement based on neural networks improves speech intelligibility in noise for cochlear implant users. Hearing Research, 344, 183-194.
[105] Lai, Y. H., Chen, F., Wang, S. S., Lu, X., Tsao, Y., & Lee, C. H. (2017). A deep denoising autoencoder approach to improving the intelligibility of vocoded speech in cochlear implant simulation. IEEE Transactions on Biomedical Engineering, 64(7), 1568-1578.
[106] Lai, Y. H., Tsao, Y., Lu, X., Chen, F., Su, Y. T., Chen, K. C., ... & Lee, C. H. (2018). Deep
learning–based noise reduction approach to improve speech intelligibility for cochlear implant recipients. Ear and Hearing, 39(4), 795-809.
[107] LeCun, Y. (1989). Generalization and network design strategies. Technical Report CRG-TR894, 1–19.
[108] Mamun, N., Khorram, S., & Hansen, J. H. (2019). Convolutional neural network-based speech enhancement for cochlear implant recipients. In Interspeech, pp. 4265–4269.
[109] Wang, N. Y. H., Wang, H. L. S., Wang, T. W., Fu, S. W., Lu, X., Wang, H. M., & Tsao, Y. (2021). Improving the intelligibility of speech for simulated electric and acoustic stimulation using fully convolutional neural networks. IEEE Transactions on Neural Systems and Rehabilitation Engineering, 29, 184-195.
[110] Tseng, R. Y., Wang, T. W., Fu, S. W., Lee, C. Y., & Tsao, Y. (2020). A study of joint effect on denoising techniques and visual cues to improve speech intelligibility in cochlear implant simulation. IEEE Transactions on Cognitive and Developmental Systems, 13(4), 984-994.
[111] Rumelhart, D. E., Hinton, G. E., & Williams, R. J. (1986). Learning representations by back-propagating errors. Nature, 323(6088), 533-536.
Hochreiter, S., & Schmidhuber, J. (1997). Long shortterm memory. Neural Computation, 9(8), 1735-1780.
[113] Nogueira, W., Gajecki, T., Krüger, B., Janer Mestres, J., & Büchner, A. (2016). Development of a sound coding strategy based on a deep recurrent neural network for monaural source separation in cochlear implants. In Proc. 12th ITG Conference on Speech Communication.
[114] Goehring, T., Keshavarzi, M., Carlyon, R. P., & Moore, B. C. (2019). Using recurrent neural networks to improve the perception of speech in nonstationary noise by people with cochlear implants. The Journal of the Acoustical Society of America, 146(1), 705-718.
[115] Chu, K., Collins, L., & Mainsah, B. (2021). A causal deep learning framework for classifying phonemes in cochlear implants. In IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 6498-6502.
[116] Lesica, N. A., Mehta, N., Manjaly, J. G., Deng, L., Wilson, B. S., & Zeng, F. G. (2021). Harnessing the power of artificial intelligence to transform hearing healthcare and research. Nature Machine Intelligence, 3(10), 840-849.
[117] Crowson, M. G., Lin, V., Chen, J. M., & Chan, T. C. (2020). Machine learning and cochlear implantation—a structured review of opportunities and challenges. Otology & Neurotology, 41(1), e36-e45.
[118] GuerraJiménez, G., De Miguel, Á. R., González, J. C. F., Barreiro, S. B., Plasencia, D. P., & Macías, Á. R. (2016). Cochlear implant evaluation: Prognosis estimation by data mining system. The Journal of International Advanced Otology, 12(1), 1-7.
[119] Gao, X., Grayden, D. B., & McDonnell, M. D. (2016). Modeling electrode place discrimination in cochlear implant stimulation. IEEE Transactions on Biomedical Engineering, 64(9), 22192229.
[120] Pile, J., Wanna, G. B., & Simaan, N. (2017). Robotassisted perception augmentation for online detection of insertion failure during cochlear implant surgery. Robotica, 1598-1615.
[121] Meeuws, M., Pascoal, D., Bermejo, I., Artaso, M., De Ceulaer, G., & Govaerts, P. J. (2017). Computer-assisted CI fitting: Is the learning capacity of the intelligent agent FOX beneficial for speech understanding?. Cochlear Implants International, 18(4), 198-206.
[122] Desmond, J. M., Collins, L. M., & Throckmorton, C. S. (2013). Using channelspecific statistical models to detect reverberation in cochlear implant stimuli. The Journal of the Acoustical Society of America, 134(2), 1112-1120.
Chu, K., Throckmorton, C., Collins, L., & Mainsah, B. (2018). Using machine learning to mitigate the effects of reverberation and noise in cochlear implants. In Proceedings of Meetings on Acoustics 33(1), 1–13.
[124] Pons, J., Janer, J., Rode, T., & Nogueira, W. (2016). Remixing music using source separation algorithms to improve the musical experience of cochlear implant users. The Journal of the Acoustical Society of America, 140(6), 4338-4349.
[125] Gajęcki, T., & Nogueira, W. (2018). Deep learning models to remix music for cochlear implant users. The Journal of the Acoustical Society of America, 143(6), 36023615.
[126] Nogueira, W., Nagathil, A., & Martin, R. (2019). Making music more accessible for cochlear implant listeners: Recent developments. IEEE Signal Processing Magazine, 36(1), 115-127.
[127] Tahmasebi, S., Gajȩcki, T., & Nogueira, W. (2020). Design and evaluation of a realtime audio source separation algorithm to remix music for cochlear implant users. Frontiers in Neuroscience, 14, 434.
[128] Bianco, M. J., Gerstoft, P., Traer, J., Ozanich, E., Roch, M. A., Gannot, S., & Deledalle, C. A. (2019). Machine learning in acoustics: Theory and applications. The Journal of the Acoustical Society of America, 146(5), 3590-3628.
[129] Purwins, H., Li, B., Virtanen, T., Schlüter, J., Chang, S. Y., & Sainath, T. (2019). Deep learning for audio signal processing. IEEE Journal of Selected Topics in Signal Processing, 13(2), 206-219.
[130] Wang, D. (2017). Deep learning reinvents the hearing aid. IEEE Spectrum, 54(3), 32-37.
[131] Beck, D. (2021). Hearing, listening and deep neural networks in hearing aids. Journal of Otolaryngology, 13(1), 5-8.
[132] Vandali, A. E., Sucher, C., Tsang, D. J., McKay, C. M., Chew, J. W., & McDermott, H. J. (2005). Pitch ranking ability of cochlear implant recipients: A comparison of sound-processing strategies. The Journal of the Acoustical Society of America, 117(5), 3126-3138.
[133] Morton, K. D., Torrione Jr, P. A., Throckmorton, C. S., & Collins, L. M. (2008). Mandarin Chinese tone identification in cochlear implants: Predictions from acoustic models. Hearing Research, 244(1-2), 66-76.
[134] Swanson, B., & Mauch, H. (2006). Nucleus Matlab Toolbox 4.20 software user manual, Cochlear Ltd, Lane Cove NSW, Australia.
[135] Maas, A. L., Hannun, A. Y., & Ng, A. Y. (2013). Rectifier nonlinearities improve neural network acoustic models. In Proc. ICML (Vol. 30, No. 1, p. 3).
[136] Kingma, D. P., & Ba, J. (2015). Adam: A method for stochastic optimization. In Proc. ICLR, May 2015.
[137] PyTorch. (2023). TORCH.TOPK, accessed June 6, 2023, from https://pytorch.org/docs/stable/generated/torch.topk.html
[138] Wu, C.-M., Huang, K.-Y., & Lin, H.-C. (2009). Effects of channel number, stimulation rate, and electroacoustic stimulation of cochlear implant simulation on Chinese speech recognition in noise. Proc. 7th Asia Pacific Symposium on Cochlear Implant and Related Sciences (APSCI2009). Singapore, pp. RS2B7.
[139] Wong, L. L., Soli, S. D., Liu, S., Han, N., & Huang, M. W. (2007). Development of the Mandarin hearing in noise test (MHINT). Ear and Hearing, 28(2), 70S-74S.
[140] Yang, H.-M., & Wu, J.-L. (2005). Mandarin lexical neighborhood test (MLNT) for pre-school children: Development of test and its validation. Journal of Taiwan Otolaryngology Head and Neck Surgery, 40, 1-12.
[141] Nissen, S. L., Harris, R. W., & Dukes, A. (2008). Word recognition materials for native speakers of Taiwan Mandarin. American Journal of Audiology, 17(1), 68-79.
[142] Nissen, S. L., Harris, R. W., & Slade, K. B. (2007). Development of speech reception threshold materials for speakers of Taiwan Mandarin. International Journal of Audiology, 46(8), 449-458.
[143] Cheng, W.-J. (2006). Effects of speech recognition to Chinese-speaking cochlear implant patients combined with acoustic hearing. Master’s thesis. National Central University, Taiwan.
[144] Dong, S.-H. (2007). Modeling advanced combination encoder combined acoustic hearing for Chinese speaking patients using cochlear implants. Master’s thesis. National Central University, Taiwan.
[145] Huang, G.-Y. (2009). Effects of channel number, stimulation rate, and electroacoustic stimulation of cochlear implant simulation on Chinese speech recognition in noise. Master’s thesis. National Central University, Taiwan.
[146] Tsai, W.-L. (2011). Preprocessing with microphone array and noise reduction for electroacoustic stimulation of cochlear implant simulation on Chinese speech recognition in noise. Master’s thesis. National Central University, Taiwan.
[147] Nisa, H. K. (2021). Speech dereverberation based on HELM framework for cochlear implant coding strategy. Master’s thesis. National Central University, Taiwan.
[148] Pratiwi, E. W. (2021). Temporal and spectral analysis of children song perception with different simulated cochlear implant coding strategies. Master’s thesis. National Central University, Taiwan.
[149] Garofolo, J. S., Lamel, L. F., Fisher, W. M., Fiscus, J. G., Pallett, D. S., & Dahlgren, N. L. (1993). DARPA TIMIT acoustic-phonetic continuous speech corpus CD-ROM. NASA STI/Recon Technical Report N, vol. 93, no. 27403.
[150] Karpagavalli, S., & Chandra, E. (2016). A review on automatic speech recognition architecture and approaches. International Journal of Signal Processing, Image Processing and Pattern Recognition, 9(4), 393-404.
[151] Pandey, A., & Wang, D. (2020). On cross-corpus generalization of deep learning based speech enhancement. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 28, 2489-2499.
[152] Monson, B. B., & Buss, E. (2022). On the use of the TIMIT, QuickSIN, NU-6, and other widely used bandlimited speech materials for speech perception experiments. The Journal of the Acoustical Society of America, 152(3), 1639-1645.
[153] King, S. E., Firszt, J. B., Reeder, R. M., Holden, L. K., & Strube, M. (2012). Evaluation of TIMIT sentence list equivalency with adult cochlear implant recipients. Journal of the American Academy of Audiology, 23(05), 313-331.
[154] Gifford, R. H., Loiselle, L., Natale, S., Sheffield, S. W., Sunderhaus, L. W., S. Dietrich, M., & Dorman, M. F. (2018). Speech understanding in noise for adults with cochlear implants: Effects of hearing configuration, source location certainty, and head movement. Journal of Speech, Language, and Hearing Research, 61(5), 1306-1321.
[155] Chen, F., & Hu, Y. (2019). Segmental contributions to cochlear implant speech perception. Speech Communication, 106, 79-84.
[156] Pearson, K. (1920). Notes on the history of correlation. Biometrika, 13(1), 25–45.
[157] Spearman, C. (1904). The Proof and Measurement of Association between Two Things. The American Journal of Psychology, 15(1), 72–101.
[158] Zou, K. H., Tuncali, K., & Silverman, S. G. (2003). Correlation and simple linear regression. Radiology, 227(3), 617-628.
[159] Akoglu, H. (2018). User’s guide to correlation coefficients. Turkish journal of emergency medicine, 18(3), 91-93.
[160] Taal, C. H., Hendriks, R. C., Heusdens, R., & Jensen, J. (2011). An algorithm for intelligibility prediction of time–frequency weighted noisy speech. IEEE Transactions on Audio, Speech, and Language Processing, 19(7), 2125-2136.
[161] Holube, I., & Kollmeier, B. (1996). Speech intelligibility prediction in hearing‐impaired listeners based on a psychoacoustically motivated perception model. The Journal of the Acoustical Society of America, 100(3), 1703-1716.
[162] Goldsworthy, R. L., & Greenberg, J. E. (2004). Analysis of speech-based speech transmission index methods with implications for nonlinear operations. The Journal of the Acoustical Society of America, 116(6), 3679-3689.
[163] Chen, F., & Loizou, P. C. (2011). Predicting the intelligibility of vocoded and wideband Mandarin Chinese. The Journal of the Acoustical Society of America, 129(5), 3281-3290.
[164] Santos, J. F., Cosentino, S., Hazrati, O., Loizou, P. C., & Falk, T. H. (2013). Objective speech intelligibility measurement for cochlear implant users in complex listening environments. Speech Communication, 55(7-8), 815-824.
[165] Falk, T. H., Parsa, V., Santos, J. F., Arehart, K., Hazrati, O., Huber, R., ... & Scollie, S. (2015). Objective quality and intelligibility prediction for users of assistive listening devices: Advantages and limitations of existing tools. IEEE Signal Processing Magazine, 32(2), 114-124.
[166] Watkins, G. D., Swanson, B. A., & Suaning, G. J. (2018). An evaluation of output signal to noise ratio as a predictor of cochlear implant speech intelligibility. Ear and Hearing, 39(5), 958-968.
[167] Kates, J. M., & Arehart, K. H. (2015). The hearing-aid audio quality index (HAAQI). IEEE/ACM Transactions on Audio, Speech, and Language Processing, 24(2), 354-365.
[168] Tahmasebi, S., Segovia-Martinez, M., & Nogueira, W. (2023). Optimization of Sound Coding Strategies to Make Singing Music More Accessible for Cochlear Implant Users. Trends in Hearing, 27, 1-18.
[169] Hu, H., Lutman, M. E., Ewert, S. D., Li, G., & Bleeck, S. (2015). Sparse nonnegative matrix factorization strategy for cochlear implants. Trends in Hearing, 19, 1–16.
[170] Mourão, G. L., Costa, M. H., & Paul, S. (2020). Speech intelligibility for cochlear implant users with the MMSE noise-reduction time-frequency mask. Biomedical Signal Processing and Control, 60, 101982.
[171] Langner, F., Büchner, A., & Nogueira, W. (2020). Evaluation of an adaptive dynamic compensation system in cochlear implant listeners. Trends in Hearing, 24, 1–13.
[172] Loizou, P. C., & Poroy, O. (2001). Minimum spectral contrast needed for vowel identification by normal hearing and cochlear implant listeners. The Journal of the Acoustical Society of America, 110(3), 1619-1627.
[173] Moore, B. C. (2003). Speech processing for the hearing-impaired: successes, failures, and implications for speech mechanisms. Speech communication, 41(1), 81-91.
[174] Green, T., Faulkner, A., & Rosen, S. (2002). Spectral and temporal cues to pitch in noise-excited vocoder simulations of continuous-interleaved-sampling cochlear implants. The Journal of the Acoustical Society of America, 112(5), 2155-2164.
[175] Laneau, J., Moonen, M., & Wouters, J. (2006). Factors affecting the use of noise-band vocoders as acoustic models for pitch perception in cochlear implants. The Journal of the Acoustical Society of America, 119(1), 491-506.
[176] Price, M., Glass, J., & Chandrakasan, A. P. (2017, February). 14.4 A scalable speech recognizer with deep-neural-network acoustic models and voice-activated power gating. In IEEE International Solid-State Circuits Conference (ISSCC) (pp. 244-245).
[177] Chen, J., & Ran, X. (2019). Deep learning with edge computing: A review. Proceedings of the IEEE, 107(8), 1655-1674.
[178] Litovsky, R. Y., Goupell, M. J., Kan, A., & Landsberger, D. M. (2017). Use of research interfaces for psychophysical studies with cochlear-implant users. Trends in Hearing, 21, 1–15.
[179] Hagendorff, T. (2020). The ethics of AI ethics: An evaluation of guidelines. Minds and Machines, 30(1), 99-120.
[180] Wasmann, J. W. A., Lanting, C. P., Huinck, W. J., Mylanus, E. A., van der Laak, J. W., Govaerts, P. J., ... & Barbour, D. L. (2021). Computational audiology: New approaches to advance hearing health care in the digital age. Ear and Hearing, 42(6), 1499-1507.
[181] Sadjadi, S. O., & Hansen, J. H. (2010). Assessment of single-channel speech enhancement techniques for speaker identification under mismatched conditions. In INTERSPEECH (pp. 2138–2142).
[182] Fujimoto, M., & Kawai, H. (2019). One-pass single-channel noisy speech recognition using a combination of noisy and enhanced features. In INTERSPEECH (pp. 486-490).
[183] Sato, H., Ochiai, T., Delcroix, M., Kinoshita, K., Kamo, N., & Moriya, T. (2022). Learning to enhance or not: Neural network-based switching of enhanced and observed signals for overlapping speech recognition. In IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) (pp. 6287-6291).
[184] Kaufmann, T. B., Foroogozar, M., Liss, J., & Berisha, V. (2023). Requirements for Mass Adoption of Assistive Listening Technology by the General Public. In IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).

Advisor

吳炤民(Chao-Min Wu)

Approval Date

2023-7-26
