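Abstract: | Speech-in-noise (SIN) recognition is essential for everyday communication and is particularly challenging for older adults and people with hearing impairment. One of the key factors affecting SIN performance is the fundamental frequency (F0) contour. Musicians typically show superior F0 discrimination in both the music and language domains, which may help them recognize speech in noisy environments. However, it remains unclear how people track speech in the absence of F0 information, and whether musicians have an advantage under such conditions. To clarify these questions, we conducted behavioral and electroencephalography (EEG) experiments in which the F0 contours of tone and intonation, both critical to Mandarin Chinese, were flattened in order to assess speech recognition in noise. We also examined whether musicians have an advantage in speech recognition when F0 information is absent. The behavioral experiment examined the effect of F0 contour on speech-in-noise recognition in musicians and non-musicians. Thirty musicians and 30 non-musicians were tested on the intelligibility of Mandarin sentences with original, flat-intonation, flat-tone, and flat-all F0 contours in background noise at different signal-to-noise ratios (SNRs: 0, −5, −9 dB). Music perception ability was measured with the Profile of Music Perception Skills (PROMS) and a pitch discrimination task. Results showed that intelligibility was similar for flat-intonation and flat-tone speech and lowest for flat-all speech. No musician advantage was found in noise for any type of flattened-F0 speech, and intelligibility improved in both groups as SNR increased. Musicians showed smaller F0 pitch discrimination thresholds than non-musicians, and these thresholds were negatively correlated with speech intelligibility in noise. Regardless of musical experience, pitch and accent processing abilities on the PROMS were positively correlated with speech intelligibility. The EEG experiment tested the neural tracking of continuous Mandarin speech lacking F0 contours in background noise. Thirty participants without musical experience listened to continuous Mandarin speech with original, flat-tone, and flat-all contours in background noise at different SNRs (0, −9, −12 dB). We used a temporal response function (TRF) model with the envelope as its feature and extracted neural speech tracking responses in the delta (1–4 Hz) and theta (4–8 Hz) bands. Participants also completed the PROMS and a speech comprehension task. Results showed that delta-band TRF peaks at around 200 ms and 400 ms were affected by F0 contour, with flat-tone speech eliciting larger peak amplitudes than original or flat-all speech. In the theta-band TRF, an effect of SNR was seen on the peaks at around 100 ms and 200 ms: as SNR decreased, peak amplitude increased and peak latency lengthened. Pitch processing ability was negatively correlated with delta-band TRF peaks for speech of different F0 types, whereas speech comprehension was positively correlated with theta-band TRF peaks for speech at different SNRs. The results show that F0 contour significantly affects both behavioral and neural tracking responses in speech recognition in noise. Both are affected by the type of F0 contour in the speech and by the level of background noise, indicating the importance of F0 information for speech recognition. Although musical experience did not confer an advantage in understanding speech lacking F0 contours, pitch processing ability may improve speech comprehension in low-noise environments. In the future, these findings may be applied to music-related training that improves speech comprehension in noise, benefiting older adults and people with hearing loss.;Speech-in-noise (SIN) perception is critical for everyday communication and particularly challenging for the elderly and the hearing impaired. A key factor influencing SIN perception is the fundamental frequency (F0) contour. Musicians often exhibit enhanced F0 discrimination in both music and language domains, which may contribute to their putative advantage in SIN perception. However, it is currently unclear how people track speech with degraded F0 information, and whether musicians have an advantage in such conditions. 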
To address these issues, we conducted behavioral and electroencephalography (EEG) experiments to examine speech perception in noisy environments, flattening the F0 contour at the level of tone and intonation, both of which are critical for Mandarin speech. Additionally, we investigated whether musicians have an advantage in speech perception with degraded F0 information. The behavioral study examined the effects of F0 contour on speech-in-noise performance in musicians and non-musicians. Thirty musicians and 30 non-musicians were tested on the intelligibility of Mandarin Chinese sentences with original, flat-tone, flat-intonation, and flat-all F0 contours embedded in background noise at three signal-to-noise ratios (SNRs: 0, −5, −9 dB). Music perception skills were objectively measured using the Profile of Music Perception Skills (PROMS) and a pitch discrimination task. Results showed similar intelligibility for speech with flat-tone and flat-intonation contours, while flat-all speech reduced intelligibility the most. No musician advantage was found for any type of flattened-F0 speech in noise, and speech intelligibility improved as SNR increased for both groups. Musicians exhibited smaller F0 pitch discrimination limens than non-musicians, and smaller limens correlated with better speech intelligibility in noise. Regardless of musician status, performance on the pitch and accent PROMS subtests was linked to better speech understanding. The EEG experiment investigated the neural tracking of continuous Mandarin speech with degraded F0 contours in background noise. Thirty non-musician participants listened to continuous Mandarin speech with original, flat-tone, and flat-all F0 contours at three SNRs (0, −9, −12 dB). We employed the temporal response function (TRF) model with the speech envelope as the feature to index neural speech tracking in the delta (1–4 Hz) and theta (4–8 Hz) frequency bands. 
Participants also completed an online speech comprehension task and an offline PROMS test for music perception skills. Results showed that delta-band TRF peak responses at around 200 ms and 400 ms were affected by F0 contour, with flat-tone speech inducing a greater peak amplitude than speech with original or flat-all contours. The theta-band TRF peak responses at around 100 ms and 200 ms were affected by SNR, with increased peak amplitude and delayed peak latency as SNR decreased. Speech intelligibility was significantly correlated with the theta-band TRF response across SNR levels, while music tuning skills were significantly related to the delta-band TRF response across F0 types. These results demonstrate that degrading the F0 contour significantly impacts both behavioral and neural responses in speech-in-noise perception. Behavioral responses and neural tracking of speech are both influenced by the type of F0 contour in speech and by the level of background noise, highlighting the importance of F0 information in speech perception. While musical experience did not provide an advantage in comprehending speech with degraded F0 contours, pitch-related musical skills might improve speech perception in low-noise environments. These findings suggest the potential application of perceptual musical skills to enhance speech perception in challenging listening contexts. |
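For readers unfamiliar with the envelope-based TRF mentioned in the abstract, the following is a minimal sketch of the general technique, not the authors' actual pipeline. It assumes a single EEG channel, a forward model estimated by ridge regression on time-lagged copies of the envelope, and synthetic data; the sampling rate, lag count, ridge value, and function names (`lagged_design`, `fit_trf`) are all hypothetical. In practice both signals would first be band-pass filtered into the delta or theta band, a step omitted here.

```python
import numpy as np

def lagged_design(envelope, n_lags):
    """Build a design matrix whose column k holds the envelope delayed by k samples."""
    n = len(envelope)
    X = np.zeros((n, n_lags))
    for k in range(n_lags):
        X[k:, k] = envelope[:n - k]
    return X

def fit_trf(envelope, eeg, n_lags, ridge=1.0):
    """Estimate a forward TRF with ridge regression: w = (X'X + ridge*I)^-1 X'y."""
    X = lagged_design(envelope, n_lags)
    XtX = X.T @ X + ridge * np.eye(n_lags)
    return np.linalg.solve(XtX, X.T @ eeg)

# Synthetic check (all values are illustrative assumptions, not study parameters).
rng = np.random.default_rng(0)
fs = 64                                    # hypothetical sampling rate in Hz
envelope = rng.standard_normal(fs * 60)    # one minute of a stand-in "envelope"
true_trf = np.hanning(16)                  # hypothetical response over 16 lags (about 0-234 ms)
eeg = lagged_design(envelope, 16) @ true_trf + 0.1 * rng.standard_normal(envelope.size)

estimated_trf = fit_trf(envelope, eeg, n_lags=16, ridge=1.0)
```

On this simulated data the estimated weights should closely follow the simulated response. Peak amplitude and latency, the measures reported in the abstract, would then be read from the estimated weights as a function of lag.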