NCU Institutional Repository (National Central University): Item 987654321/92797

    Please use this identifier to cite or link to this item: http://ir.lib.ncu.edu.tw/handle/987654321/92797


    Title: Investigations of Cochlear Implant Sound Coding Strategies Based on Auditory Physiology and Deep Learning (探討以聽覺生理為基礎和以深度學習為基礎之人工電子耳聲音編碼策略)
    Authors: Huang, Enoch Hsin-Ho (黃心和)
    Contributors: Department of Electrical Engineering
    Keywords: cochlear implant; sound coding strategy; auditory physiology; deep learning; speech intelligibility
    Date: 2023-07-26
    Issue Date: 2023-10-04 16:10:54 (UTC+8)
    Publisher: National Central University (國立中央大學)
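    Abstract: This dissertation reports research on sound coding strategies for the cochlear implant (CI). It explores the principles and mechanisms of coding strategies based on auditory physiology and on deep learning, and simulates how these strategies perform in terms of Mandarin speech intelligibility. The key function of a sound coding strategy is to convert essential speech information into neural impulse patterns that the brain can interpret, so that the compressed electrical stimulation can pass through the electroneural bottleneck. Because CI listening remains limited, improving the coding strategy is of considerable importance.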
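As a rough illustration of how a coding strategy compresses speech into electrical stimulation (this is not code from the thesis), one analysis frame of an ACE-style (advanced combination encoder) strategy can be sketched as n-of-m selection of the largest channel envelopes, followed by logarithmic compression of each selected envelope into the range between the listener's threshold (T) and comfort (C) levels. The function names, the base/saturation levels, the steepness value `rho = 416.2`, and the T/C values are all assumptions chosen for illustration.

```python
import math


def select_maxima(envelopes, n):
    """n-of-m selection: indices of the n largest envelopes, in channel order."""
    ranked = sorted(range(len(envelopes)), key=lambda i: envelopes[i], reverse=True)
    return sorted(ranked[:n])


def loudness_growth(v, base, sat, rho=416.2):
    """Logarithmic compression of an envelope value into the range [0, 1].

    base/sat (input dynamic range) and rho (steepness) are assumed values.
    """
    x = (v - base) / (sat - base)
    x = min(max(x, 0.0), 1.0)  # clip to the assumed input dynamic range
    return math.log(1.0 + rho * x) / math.log(1.0 + rho)


def ace_like_frame(envelopes, n, t_levels, c_levels, base=0.0, sat=1.0):
    """Map one frame of m channel envelopes to (channel, current level) pairs.

    Envelopes below `base` are clipped to the T level here, a simplification.
    """
    pulses = []
    for ch in select_maxima(envelopes, n):
        p = loudness_growth(envelopes[ch], base, sat)
        level = t_levels[ch] + p * (c_levels[ch] - t_levels[ch])
        pulses.append((ch, round(level)))
    return pulses


if __name__ == "__main__":
    # Hypothetical 6-channel frame, keeping 3 maxima; T/C levels in clinical units.
    frame = [0.1, 0.8, 0.3, 0.9, 0.05, 0.6]
    print(ace_like_frame(frame, 3, [100] * 6, [200] * 6))
```

Selection discards the weaker channels in each frame, and the logarithmic map squeezes the wide acoustic dynamic range into the narrow electrical one; together they are one way to picture the "bottleneck" the abstract refers to.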

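    This study applies knowledge of auditory physiology and artificial intelligence (AI) techniques, respectively, to improving CI coding strategies. In the auditory-physiology investigation, three strategies grounded in auditory physiology were selected: the biologically inspired hearing aid (BioAid), envelope enhancement (EE), and fundamental frequency modulation (F0mod). Integrating these three with the advanced combination encoder (ACE), currently the most widely used strategy, yields four distinct singular coding strategies; from these, four derived combinational coding strategies are proposed, and all are examined in a unique comparative study. In the deep-learning investigation, unlike traditional coding strategies and machine-learning-based preprocessing, we propose ElectrodeNet, a coding strategy developed directly with deep learning. The study evaluates ElectrodeNet built on deep neural network (DNN), convolutional neural network (CNN), and long short-term memory (LSTM) architectures, compares a range of experimental conditions, and further proposes ElectrodeNet-CS, an improved version that incorporates a channel selection (CS) function. CI-simulated speech was synthesized with a vocoder; in addition to objective evaluation, Mandarin sentence listening tests were conducted with normal-hearing participants on the NCU-CI experimental platform.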
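The evaluation relies on vocoder-synthesized CI simulations, but the vocoder design is not given here. The sketch below therefore assumes a sine-carrier vocoder (noise-band carriers are a common alternative): each channel's frame-rate envelope is held constant over its frame, upsampled to the audio rate, and used to amplitude-modulate a sinusoid at that channel's assumed centre frequency, after which the channels are summed. `hold_upsample`, `sine_vocoder`, the carrier frequencies, and the frame length are hypothetical.

```python
import math


def hold_upsample(frame_values, hop):
    """Repeat each frame-rate envelope value `hop` times (zero-order hold)."""
    return [v for v in frame_values for _ in range(hop)]


def sine_vocoder(envelopes, centre_freqs, fs):
    """Sum sinusoidal carriers, each amplitude-modulated by one channel envelope.

    envelopes: m lists of per-sample envelope values (all the same length).
    centre_freqs: m carrier frequencies in Hz (assumed, e.g. one per electrode).
    fs: audio sampling rate in Hz.
    """
    if len(envelopes) != len(centre_freqs):
        raise ValueError("need one carrier frequency per channel")
    n_samples = len(envelopes[0])
    out = [0.0] * n_samples
    for env, fc in zip(envelopes, centre_freqs):
        w = 2.0 * math.pi * fc / fs  # carrier phase increment per sample
        for n in range(n_samples):
            out[n] += env[n] * math.sin(w * n)
    return out


if __name__ == "__main__":
    fs, hop = 16000, 32  # hypothetical: 2 ms frames at 16 kHz
    # Two hypothetical channels; the second is switched off in the second frame,
    # as n-of-m selection might do.
    env_frames = [[1.0, 0.5], [1.0, 0.0]]
    envs = [hold_upsample(ch, hop) for ch in env_frames]
    y = sine_vocoder(envs, [1000.0, 3000.0], fs)
    print(len(y), max(abs(s) for s in y))
```

Because only the envelopes reach the output, the simulation discards temporal fine structure in roughly the way CI stimulation does, which is why normal-hearing listeners can be used to compare strategies.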

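    In the auditory-physiology investigation, at signal-to-noise ratios (SNRs) of 5 dB or above, the EE strategy achieved slightly higher mean scores than ACE in both short-time objective intelligibility (STOI) and the listening experiments, and among the combinational coding strategies, activating the EE function also slightly improved the speech intelligibility of the other strategies. In the deep-learning investigation, ElectrodeNet built on the DNN, CNN, and LSTM architectures showed high correlation with ACE in STOI and normalized covariance metric (NCM) scores. ElectrodeNet and ACE were also closely related across training corpora in different languages and across different noise conditions. Furthermore, the more advanced ElectrodeNet-CS strategy even slightly surpassed ACE in STOI scores.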
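NCM, one of the two objective metrics reported above, compares the temporal envelopes of clean and processed speech band by band. The sketch below follows the commonly cited formulation: a per-band normalized covariance r (the Pearson correlation of the two envelopes), an apparent SNR of 10·log10(r²/(1 − r²)) clipped to ±15 dB, a linear mapping to a transmission index in [0, 1], and a weighted average across bands. Envelope extraction is omitted, and the uniform band weights and all function names are assumptions; this is not the thesis's evaluation code.

```python
import math


def normalized_covariance(x, y):
    """Pearson correlation r between two equal-length envelope sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0.0 or vy == 0.0:
        return 0.0  # a flat envelope carries no modulation to compare
    return cov / math.sqrt(vx * vy)


def ncm(clean_envs, proc_envs, weights=None):
    """Weighted mean transmission index over bands (NCM-style score in [0, 1]).

    clean_envs/proc_envs: one envelope sequence per band.
    weights: band-importance weights; uniform weights are an assumption here.
    """
    if weights is None:
        weights = [1.0] * len(clean_envs)
    tis = []
    for x, y in zip(clean_envs, proc_envs):
        r2 = normalized_covariance(x, y) ** 2
        r2 = min(r2, 1.0 - 1e-12)  # keep r2 / (1 - r2) finite when r = +/-1
        snr = 10.0 * math.log10(r2 / (1.0 - r2)) if r2 > 0.0 else -15.0
        snr = min(max(snr, -15.0), 15.0)  # clip apparent SNR to [-15, 15] dB
        tis.append((snr + 15.0) / 30.0)  # transmission index in [0, 1]
    return sum(w * t for w, t in zip(weights, tis)) / sum(weights)


if __name__ == "__main__":
    clean = [[1.0, 2.0, 3.0, 4.0], [2.0, 1.0, 2.0, 1.0]]
    processed = [[1.1, 2.1, 2.9, 4.2], [1.5, 1.5, 1.5, 1.5]]  # toy envelopes
    print(round(ncm(clean, processed), 3))
```

Identical envelopes give an NCM of 1 and an unmodulated processed envelope gives 0; the same `normalized_covariance` helper is also a plain Pearson correlation, the kind of statistic behind a statement that two strategies' scores are highly correlated.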

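    Based on auditory physiology, this research proposes combinational coding strategies together with a unique comparative study, and develops sound coding strategies with deep learning at their processing core. The results confirm the feasibility of the proposed approaches and may offer useful insights for related fields.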
    Appears in Collections: [Graduate Institute of Electrical Engineering] Electronic Thesis & Dissertation

    Files in This Item:

    File        Size  Format
    index.html  0 Kb  HTML


    All items in NCUIR are protected by copyright, with all rights reserved.

    ::: Copyright National Central University. | Copyright National Central University Library. All rights reserved. :::