Name 郭昱均 (Yu-Jyun Guo)   Department Institute of Cognitive Neuroscience
Thesis title 基頻輪廓和音樂經驗對嘈雜環境中語句辨識影響之行為和腦電波實驗
(Effects of Degraded-Fundamental Frequency Contour and Musician Experience on Speech Perception in Noisy Environment: Behavioral and Electroencephalography Experiments)
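Abstract (Chinese) Speech-in-noise (SIN) recognition is essential for everyday communication and is especially challenging for older adults and people with hearing impairment. One of the key factors affecting SIN performance is the fundamental frequency (F0) contour. Musicians typically show superior F0 discrimination in both the music and language domains, which may help them recognize speech in noise. However, it remains unclear how people track speech when F0 information is lacking, and whether musicians have an advantage under such conditions. To clarify these questions, we conducted behavioral and electroencephalography (EEG) experiments in which the F0 contours of tone and intonation, both critical to Mandarin Chinese, were flattened in order to examine speech recognition in noise. In addition, we examined whether musicians have a speech recognition advantage when F0 information is lacking.
The behavioral experiment examined the effect of F0 contour on speech-in-noise recognition in musicians and non-musicians. Thirty musicians and 30 non-musicians were tested on the intelligibility of Mandarin Chinese sentences with original, flat-intonation, flat-tone, and flat-all F0 contours in background noise at different signal-to-noise ratios (SNRs: 0, −5, −9 dB). Music perception ability was measured with the Profile of Music Perception Skills (PROMS) and a pitch discrimination task. The results showed that intelligibility was similar for flat-intonation and flat-tone speech and lowest for flat-all speech. No musician advantage was found in noise for any type of flattened-F0 speech, and intelligibility improved in both groups as SNR increased. Musicians showed smaller F0 pitch discrimination thresholds than non-musicians, and these thresholds were negatively correlated with speech intelligibility in noise. Regardless of musical experience, pitch and accent processing ability on the PROMS was positively correlated with speech intelligibility.
The EEG experiment tested neural tracking of continuous Mandarin speech lacking F0 contour in background noise. Thirty participants without musical experience listened to continuous Mandarin speech with original, flat-tone, and flat-all contours in background noise at different SNRs (0, −9, −12 dB). We used a temporal response function (TRF) model with the envelope as the feature and extracted neural speech tracking responses in the delta (1–4 Hz) and theta (4–8 Hz) bands. Participants also completed the PROMS and a speech comprehension task. The results showed that delta-band TRF peaks at about 200 ms and 400 ms were affected by F0 contour: flat-tone speech evoked larger peak amplitudes than original or flat-all speech. For the theta-band TRF, an effect of SNR was seen on the peaks at about 100 ms and 200 ms: as SNR decreased, peak amplitude increased and peak latency lengthened. Pitch processing ability was negatively correlated with delta-band TRF peaks across F0 types, whereas speech intelligibility was positively correlated with theta-band TRF peaks across SNR levels.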
Abstract (English) Speech-in-noise (SIN) perception is critical for everyday communication and is particularly challenging for the elderly and the hearing impaired. A key factor influencing SIN perception is the fundamental frequency (F0) contour. Musicians often exhibit enhanced F0 discrimination in both the music and language domains, which may contribute to their putative advantage in SIN perception. However, it is currently unclear how people track speech with degraded F0 information, and whether musicians have an advantage in such conditions. To address these issues, we conducted behavioral and electroencephalography (EEG) experiments to examine speech perception in noisy environments, degrading the F0 contour at the levels of tone and intonation, both of which are critical for Mandarin speech. Additionally, we investigated whether musicians have an advantage in speech perception with degraded F0 information.
The behavioral study examined the effects of F0 contour on speech-in-noise performance in musicians and non-musicians. Thirty musicians and 30 non-musicians were tested on the intelligibility of Mandarin Chinese sentences with original, flat-tone, flat-intonation, and flat-all F0 contours embedded in background noise at three signal-to-noise ratios (SNRs: 0, −5, −9 dB). Music perception skills were objectively measured using the Profile of Music Perception Skills (PROMS) and a pitch discrimination task. Results showed similar intelligibility for speech with flat-tone and flat-intonation contours, while flat-all speech reduced intelligibility the most. No musician advantage was found for any type of flattened-F0 speech in noise, and speech intelligibility improved as SNR increased for both groups. Musicians exhibited smaller F0 pitch discrimination limens than non-musicians, and smaller limens correlated with better speech intelligibility in noise. Regardless of musician status, performance on the pitch and accent subtests of the PROMS was linked to better speech understanding.
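The SNR conditions above (0, −5, −9 dB) can be illustrated with a minimal sketch of how a sentence might be mixed with noise at a target SNR. This is not the procedure used in the thesis, which the abstract does not describe; the function names `mix_at_snr` and `measured_snr_db`, the sine-wave stand-in for speech, and the white-noise masker are all assumptions made for illustration.

```python
import numpy as np

def mix_at_snr(speech, noise, snr_db):
    """Scale `noise` so the speech-to-noise power ratio equals `snr_db`, then add it.

    Hypothetical helper: assumes `noise` is at least as long as `speech`.
    Returns the mixture and the scaled noise.
    """
    speech = np.asarray(speech, dtype=float)
    noise = np.asarray(noise, dtype=float)[: len(speech)]
    p_speech = np.mean(speech ** 2)
    p_noise = np.mean(noise ** 2)
    # Target noise power is P_speech / 10^(SNR/10); the gain is applied to amplitude.
    gain = np.sqrt(p_speech / (p_noise * 10 ** (snr_db / 10)))
    scaled_noise = gain * noise
    return speech + scaled_noise, scaled_noise

def measured_snr_db(speech, noise):
    """SNR in dB computed from mean signal power."""
    return 10 * np.log10(np.mean(np.square(speech)) / np.mean(np.square(noise)))

# Stand-ins for real recordings (assumptions): a 150 Hz tone and white noise.
fs = 16000
rng = np.random.default_rng(0)
speech = np.sin(2 * np.pi * 150 * np.arange(fs) / fs)
noise = rng.standard_normal(fs)

for snr in (0, -5, -9):
    mixture, scaled = mix_at_snr(speech, noise, snr)
    print(snr, round(measured_snr_db(speech, scaled), 2))
```

A negative SNR means the noise carries more power than the speech, so −9 dB is the hardest of the three behavioral conditions.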
The EEG experiment investigated the neural tracking of continuous Mandarin speech with degraded F0 contour in background noise. Thirty non-musician participants listened to continuous Mandarin speech with original, flat-tone, and flat-all F0 contours at three SNRs (0, −9, −12 dB). We employed the temporal response function (TRF) model with the speech envelope as the feature to index neural speech tracking in the delta (1–4 Hz) and theta (4–8 Hz) frequency bands. Participants also completed an online speech comprehension task and an offline PROMS test of music perception skills. Results showed that delta-band TRF peak responses at around 200 ms and 400 ms were affected by F0 contour, with flat-tone speech inducing a greater peak amplitude than speech with original or flat-all contours. The theta-band TRF peak responses at around 100 ms and 200 ms were affected by SNR, with increased peak amplitude and delayed peak latency as SNR decreased. Speech intelligibility was significantly correlated with the theta-band TRF response across SNR levels, while music tuning skills were significantly related to the delta-band TRF response across F0 types.
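For readers unfamiliar with the forward TRF, the following is a minimal sketch of one common way to estimate it: ridge regression of a single EEG channel on time-lagged copies of the stimulus envelope. The abstract does not state which estimator or toolbox the thesis used, so the estimator, the helper names `lag_matrix` and `fit_trf`, the sampling rate, the ridge value, and the simulated data are all assumptions for illustration. Band-pass filtering into the delta and theta bands is omitted.

```python
import numpy as np

def lag_matrix(stim, lags):
    """Design matrix whose columns are `stim` delayed by each non-negative lag (samples)."""
    n = len(stim)
    X = np.zeros((n, len(lags)))
    for j, lag in enumerate(lags):
        X[lag:, j] = stim[: n - lag]  # zero-padded at the start
    return X

def fit_trf(stim, eeg, lags, ridge):
    """Ridge solution w = (X'X + ridge*I)^-1 X'y for one EEG channel."""
    X = lag_matrix(stim, lags)
    return np.linalg.solve(X.T @ X + ridge * np.eye(len(lags)), X.T @ eeg)

# Simulated data (assumptions): 60 s at 100 Hz, one channel.
fs = 100
n = 60 * fs
lags = np.arange(0, 51)            # 0 to 500 ms
times = lags / fs
rng = np.random.default_rng(1)

envelope = rng.standard_normal(n)  # stand-in for a speech envelope
true_trf = np.exp(-0.5 * ((times - 0.2) / 0.02) ** 2)  # single peak at 200 ms
eeg = lag_matrix(envelope, lags) @ true_trf + 0.5 * rng.standard_normal(n)

trf_hat = fit_trf(envelope, eeg, lags, ridge=1.0)
peak_latency_ms = 1000 * times[np.argmax(trf_hat)]
print("estimated peak latency (ms):", peak_latency_ms)
```

Peak amplitude and latency, the two measures reported above, would then be read off the estimated weights within a time window of interest. In practice the ridge parameter is usually chosen by cross-validated prediction accuracy rather than fixed in advance.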
These results demonstrate that degrading the F0 contours significantly impacts both behavioral and neural responses in speech-in-noise perception. Behavioral response and neural tracking of speech are influenced by both the type of F0 contour in speech and the level of background noise, highlighting the importance of F0 information in speech perception. While musician experience did not provide an advantage in comprehending speech with degraded F0 contours, pitch-related musical skills might improve speech perception in low-noise environments. These findings suggest the potential application of perceptual musical skills to enhance speech perception in challenging listening contexts.
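Keywords (Chinese) ★ Electroencephalography (腦電圖)
★ Fundamental frequency (基頻)
★ Musical experience (音樂經驗)
★ Musicality (音樂性)
★ Speech-in-noise perception (噪音中語句感知)
★ Temporal response function (時間響應函數)
★ Tone (聲調)
★ Envelope tracking (包絡線跟蹤)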
Keywords (English) ★ Electroencephalography
★ Fundamental frequency
★ Musical experience
★ Musicality
★ Speech-in-noise perception
★ Temporal response function
★ Tone
★ Envelope tracking
Table of contents Abstract (Chinese)
Abstract
Acknowledgments
Table of Contents
List of Figures
List of Tables
Appendix
Chapter I General Introduction
1-1 Speech-in-noise perception
1-2 The role of F0 in speech perception
1-2-1 F0 information in speech
1-2-2 Behavioral studies on manipulated-F0 speech in noise perception
1-3 Musical ability and speech perception
1-4 Neural studies on manipulated-F0 speech in noise perception
1-5 Research aims
1-5-1 Motivations
1-5-2 Research questions
1-5-3 Hypothesis
Chapter II Behavioral Experiments
2-1 Introduction
2-2 Methods
2-2-1 Participants
2-2-2 Stimuli
2-2-3 Speech-in-noise task
2-2-4 F0 discrimination task
2-2-5 Profile of Music Perception Skills (PROMS)
2-3 Results
2-3-1 Participants’ characteristics
2-3-2 Speech intelligibility in noisy environments for musicians and non-musicians with degraded fundamental frequency contours
2-3-3 Relationship between musician experience and speech intelligibility
2-3-4 Relationship between pitch discrimination performance and speech intelligibility
2-4 Discussion
2-4-1 The limited benefits of musical training transfer to speech-in-noise perception
2-4-2 The relationship between pitch and accent perception and SIN performance
2-4-3 The contribution of tone and intonation in speech intelligibility
Chapter III Electroencephalography Experiments
3-1 Introduction
3-2 Methods
3-2-1 Participants
3-2-2 Stimuli
3-2-3 Behavioral task: PROMS
3-2-4 Experimental design
3-2-5 EEG data recording
3-2-6 EEG data preprocessing
3-2-7 Computation of speech envelope
3-2-8 Forward temporal response function (TRF) estimation
3-2-9 TRF prediction accuracy
3-2-10 Mass-univariate one-sample t-test
3-2-11 Mass-univariate ANOVA
3-2-12 TRF response: peak amplitude and latency
3-2-13 Relationship between behavioral results and TRF response
3-3 Results
3-3-1 Participants’ characteristics
3-3-2 Behavioral performance
3-3-3 The influence of F0 type on delta band tracking of speech
3-3-4 F0 effects on delta band TRF peak amplitude and latency
3-3-5 The impact of SNR on theta band tracking of speech
3-3-6 Effects of SNR on theta band TRF peak amplitude and latency
3-3-7 Correlation between PROMS subtest performance and delta band TRF
3-3-8 Correlation between TRF responses and behavioral performance on speech recognition task
3-4 Discussion
3-4-1 F0 type effect on delta band TRF
3-4-2 SNR effect on theta band TRF
3-4-3 The interaction effect between F0 type and SNR was found on delta but not theta band TRF
3-4-4 Musicality vs. delta TRF response
3-4-5 The rhythm of speech structure and neural activity
Chapter IV General Discussion
Chapter V Conclusion and Future Directions
References
Appendix
Advisor 謝宜蕙 (I-Hui Hsieh)   Approval date 2024-7-29
