Temporal and Spectral Analysis of Children Song Perception with Different Simulated Cochlear Implant Coding Strategies

Please use this permanent URL to cite or link to this document: http://ir.lib.ncu.edu.tw/handle/987654321/86772

Title:	Temporal and Spectral Analysis of Children Song Perception with Different Simulated Cochlear Implant Coding Strategies
Author:	Pratiwi, Epri Wahyu (普蒂薇)
Contributor:	Department of Electrical Engineering
Keywords:	familiar melody identification; temporal quality; spectral quality; music; rhythm; pitch; cochlear implant; cochlear implant simulation; vocoder
Date:	2021-08-30
Uploaded:	2021-12-07 13:12:00 (UTC+8)
Publisher:	National Central University
Abstract:	Acoustic music features include various spectral and temporal cues, which play a critical role in music perception. Cochlear implant (CI) coding strategies are designed primarily to convey speech, but they still distort music. This study examined the relative contributions of pitch and rhythm to melody recognition, as well as the music quality produced by three CI coding strategies: Advanced Combinational Encoder (ACE), Fundamental Frequency Modulation (F0mod), and Envelope Enhancement (EE). The melody database consisted of 30 children's songs popular in Taiwan.
Each children's song melody was presented with two music features: pitch (a spectral cue) and rhythm (a temporal cue). The melodies were then processed with the three CI coding strategies using NCU-CI, a cochlear implant simulation software package developed at National Central University. A pilot subjective listening test, the familiar melody identification (FMI) test, then measured melody perception of the processed stimuli in terms of percent correct and response time. Five normal-hearing (NH) participants took part in the FMI test. Each participant began by selecting 15 of the 30 songs and, in each FMI test session, was tested on those 15 songs in both music-feature conditions (pitch and rhythm), each processed with all three CI strategies. Each participant therefore heard 90 stimuli (15 songs × 2 music features × 3 CI coding strategies). Across participants, 23 different songs were chosen in total. FMI performance was significantly better (p < 0.05) when rhythm cues were preserved than when pitch cues were preserved, with a higher percent correct and faster response times. Melodies with rhythm cues processed with the ACE strategy achieved the highest score, 86.80% correct. The 23 chosen songs were then examined with objective analyses: the envelope difference index (EDI), the intensity mismatch pattern, and the log spectral distance (LSD) were used to compare the temporal and spectral quality of the processed music with that of the original music. The average intensity mismatch patterns between the original music and the music processed by the ACE, F0mod, and EE strategies were 5.9, 6.4, and 6.0, respectively; the lower the mismatch, the better the melody's amplitude pattern was preserved. The EDI values between the original music and the music processed by the ACE, F0mod, and EE strategies were 0.11, 0.11, and 0.15, respectively. Because the EDI ranges from 0 (identical envelopes) to 1, the lower the EDI value, the better the temporal envelope was preserved.
The LSD values, which quantify spectral quality differences, between the original music and the music processed by the ACE, F0mod, and EE strategies were 2.10, 2.16, and 2.19, respectively; the lower the LSD, the better the spectral quality. Taken together, the subjective and objective analyses indicate that ACE preserved spectral and temporal quality best among the three CI coding strategies in this study. The results also showed that rhythm cues combined with the ACE strategy yielded the highest melody recognition accuracy.
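The EDI and LSD comparisons described in the abstract can be illustrated with a minimal NumPy sketch. It assumes common textbook forms of both metrics, not necessarily the exact variants used in the thesis or in NCU-CI: the EDI compares mean-normalized temporal envelopes (here extracted by full-wave rectification and a moving-average low-pass filter, a simplifying assumption), and the LSD is the frame-averaged root-mean-square difference between log power spectra in dB. The function names `envelope`, `edi`, and `log_spectral_distance`, the smoothing window, the frame and hop sizes, and the synthetic test signals are all hypothetical choices for illustration.

```python
import numpy as np


def envelope(x: np.ndarray, win: int = 64) -> np.ndarray:
    """Temporal envelope via full-wave rectification followed by a
    moving-average low-pass filter of `win` samples (an assumption;
    Hilbert envelopes or other cutoffs are also common)."""
    kernel = np.ones(win) / win
    return np.convolve(np.abs(x), kernel, mode="same")


def edi(x: np.ndarray, y: np.ndarray, win: int = 64) -> float:
    """Envelope difference index: mean-normalize both envelopes, then
    EDI = sum(|Ex - Ey|) / (2N). 0 = identical envelopes, 1 = maximal difference."""
    n = min(len(x), len(y))
    ex = envelope(x[:n], win)
    ey = envelope(y[:n], win)
    ex = ex / ex.mean()  # normalize so overall level differences are ignored
    ey = ey / ey.mean()
    return float(np.sum(np.abs(ex - ey)) / (2 * n))


def log_spectral_distance(x: np.ndarray, y: np.ndarray, frame: int = 512,
                          hop: int = 256, eps: float = 1e-10) -> float:
    """Frame-averaged RMS difference (in dB) between Hann-windowed log power spectra."""
    n = min(len(x), len(y))
    if n < frame:
        raise ValueError("signals must be at least one frame long")
    window = np.hanning(frame)
    dists = []
    for start in range(0, n - frame + 1, hop):
        px = np.abs(np.fft.rfft(window * x[start:start + frame])) ** 2
        py = np.abs(np.fft.rfft(window * y[start:start + frame])) ** 2
        # eps avoids log10(0) in silent or near-silent bins
        diff_db = 10 * np.log10(px + eps) - 10 * np.log10(py + eps)
        dists.append(np.sqrt(np.mean(diff_db ** 2)))
    return float(np.mean(dists))


if __name__ == "__main__":
    fs = 16000
    t = np.arange(fs) / fs
    # Hypothetical "original": 440 Hz tone with a 4 Hz amplitude modulation.
    original = np.sin(2 * np.pi * 440 * t) * (1.0 + 0.8 * np.sin(2 * np.pi * 4 * t))
    # Hypothetical "processed": same carrier with the envelope flattened.
    processed = np.sin(2 * np.pi * 440 * t)
    print(f"EDI: {edi(original, processed):.3f}")
    print(f"LSD: {log_spectral_distance(original, processed):.2f} dB")
```

Both functions return 0 for identical inputs and grow as the processed signal departs from the original, matching the "lower is better" reading used for both metrics in the abstract. Normalizing each envelope by its mean keeps the EDI insensitive to overall level, whereas the LSD as written does penalize a uniform gain change.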
Appears in Collections:	[Graduate Institute of Electrical Engineering] Theses and Dissertations

All items in NCUIR are protected by their original copyright.