Simulation of Neural Mechanism for Speech Perception with Neural Network Model

Name

李明鴻 (Ming-hong Li)

Department

Department of Electrical Engineering

Thesis Title

Simulation of Neural Mechanism for Speech Perception with Neural Network Model
(用類神經網路模型模擬語音感知的神經機制)

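Related Theses

★ A Study of Independent Component Analysis for Sound Signal Separation in Real Environments
★ Segmentation and 3D Gray-Level Interpolation of Oral Magnetic Resonance Images
★ Design of a Digital Asthma Peak Flow Monitoring System
★ Effects of Combining a Cochlear Implant with a Hearing Aid on Mandarin Speech Recognition Rates
★ Simulation of Mandarin Speech Recognition Performance with the Advanced Combination Encoder Strategy for Cochlear Implants: An Analysis Combined with Hearing Aids
★ A Functional MRI Study of the Neural Correlates of Mandarin Speech Production
★ Construction of a 3D Biomechanical Tongue Model Using the Finite Element Method
★ Construction of an MRI-Based 3D Tongue Atlas
★ A Simulation Study of the Relationship between Calcium Oxalate Concentration Changes in Renal Tubules and Calcium Oxalate Stones
★ Automatic Segmentation of Tongue Structures in Oral MR Images
★ A Study of Electrical Matching of Microwave Output Windows
★ Development of a Software-Based Hearing Aid Simulation Platform: Noise Reduction
★ Development of a Software-Based Hearing Aid Simulation Platform: Feedback Cancellation
★ Simulated Effects of Cochlear Implant Channel Number, Stimulation Rate, and Binaural Hearing on Mandarin Speech Recognition in Noise
★ A Neural Network Study of the Neural Correlates of Mandarin Tone Production
★ Construction of Computer-Simulated Physiological Systems for Teaching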

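This electronic thesis is authorized for immediate open access.
The open-access full text is licensed to users only for personal, non-profit searching, reading, and printing for the purpose of academic research.
Please comply with the Copyright Act of the Republic of China. Do not reproduce, distribute, adapt, repost, or broadcast this work without authorization, to avoid violating the law.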

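Abstract (Chinese)

Psycholinguistic experiments have shown that the "perceptual magnet effect" is one of the important factors affecting a child's later language development. This effect warps the auditory perceptual space, so that the sounds surrounding a phoneme are all assigned to the same category. The purpose of this study is to develop a neural network model that simulates speech perception: through unsupervised learning, the model finds the phonetic category of a phoneme from the formants of speech, simulating the process by which humans acquire language through hearing. By modifying the Self-Organizing Map (SOM) algorithm and comparing the results against psycholinguistic experiments, this thesis enables the model to represent the auditory perceptual space of English vowels. The simulation results show that the model can distinguish the English consonants /r/ and /l/, capture the difference between prototype and non-prototype sounds, and form an auditory perceptual space for vowels. In addition, by simulating speech perception and combining it with a neural network model capable of speech production (Directions Into Velocities of Articulators, DIVA), this thesis presents the process by which humans acquire speech, for example by having the model learn to produce English or Chinese vowels. Beyond letting the DIVA model learn English vowels, the work extends to the pronunciation of Chinese vowels (/ㄚ/, /一/, /ㄨ/, /ㄝ/, /ㄛ/, /ㄩ/). The model will continue to be developed, in the hope that it can be used to explore the relationship between the brain and language and thereby lead to clinical applications.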

Abstract (English)

Psycholinguistic experiments indicate that the perceptual magnet effect is an important factor in speech development. This effect warps the auditory space around the corresponding phoneme. The purpose of this study was to develop a neural network model that simulates speech perception. A neural network with unsupervised learning was used to determine the phonetic categories of phonemes from the formant frequencies of vowels. A modified Self-Organizing Map (SOM) algorithm was proposed to produce the auditory perceptual space of English vowels. Simulated results were compared with findings from psycholinguistic experiments, such as the categorization of English /r/ and /l/ and of prototype and non-prototype vowels, to demonstrate the model's ability to produce an auditory perceptual space. In addition, this speech perception model was combined with the DIVA (Directions Into Velocities of Articulators) neural network model to simulate the categorization and production of ten English vowels, showing the learning capability of speech perception and production. We further extended this modified DIVA model to categorize and produce six Chinese vowels (/a/, /i/, /u/, /e/, /o/, /y/). Finally, this study proposes further development of this speech perception model, with related discussion, and considers its clinical application.
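
As a rough illustration of the unsupervised learning described in the abstract, the sketch below trains a textbook one-dimensional Kohonen SOM on synthetic (F1, F2) formant points. It is not the thesis's modified SOM algorithm, which this record does not specify. The function names, the two vowel centers (given in kHz), the map size, and the decay schedules are all assumptions chosen for illustration.

```python
import math
import random


def make_vowel_samples(centers, n_per_center=200, spread=0.05, seed=1):
    """Draw synthetic (F1, F2) points in kHz around each hypothetical vowel center."""
    rng = random.Random(seed)
    return [
        (rng.gauss(f1, spread), rng.gauss(f2, spread))
        for (f1, f2) in centers
        for _ in range(n_per_center)
    ]


def best_matching_unit(weights, x):
    """Index of the map unit whose weight vector is closest to input x."""
    return min(
        range(len(weights)),
        key=lambda i: (weights[i][0] - x[0]) ** 2 + (weights[i][1] - x[1]) ** 2,
    )


def train_som(samples, n_units=10, epochs=30, lr0=0.5, sigma0=3.0, seed=0):
    """Train a standard 1-D SOM on 2-D points and return the unit weights."""
    rng = random.Random(seed)
    lo = [min(s[d] for s in samples) for d in range(2)]
    hi = [max(s[d] for s in samples) for d in range(2)]
    # Start every unit at a random position inside the data's bounding box.
    weights = [[rng.uniform(lo[d], hi[d]) for d in range(2)] for _ in range(n_units)]
    data = list(samples)
    for epoch in range(epochs):
        frac = epoch / epochs
        lr = lr0 * (1.0 - frac)                  # learning rate decays linearly
        sigma = max(sigma0 * (1.0 - frac), 0.5)  # neighborhood width shrinks
        rng.shuffle(data)
        for x in data:
            bmu = best_matching_unit(weights, x)
            for i, w in enumerate(weights):
                # Gaussian neighborhood measured along the 1-D map index.
                h = math.exp(-((i - bmu) ** 2) / (2.0 * sigma ** 2))
                for d in range(2):
                    w[d] += lr * h * (x[d] - w[d])
    return weights


if __name__ == "__main__":
    # Hypothetical centers, loosely /i/-like and /a/-like, in kHz.
    centers = [(0.3, 2.3), (0.7, 1.2)]
    som = train_som(make_vowel_samples(centers))
    for i, (f1, f2) in enumerate(som):
        print(f"unit {i}: F1={f1:.2f} kHz, F2={f2:.2f} kHz")
```

In a map trained this way, units tend to gather where exemplars are dense, so inputs near a category center usually map to the same or neighboring units. That crowding is one simple way to picture a warped perceptual space, though the thesis's own mechanism may differ.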

Keywords

★ Self-Organizing Map
★ perceptual magnet effect
★ neural network
★ auditory perception
★ speech perception

Table of Contents

Chinese Abstract
Abstract
Acknowledgments
Table of Contents
List of Figures
List of Tables
Chapter 1 Introduction
1.1 Research Motivation
1.2 Speech Perception
1.2.1 Acoustic Features of Speech
1.2.2 Speech Perception Experiments
1.3 Literature Review
1.3.1 Review of Auditory Perception Research
1.3.2 Review of Speech Perception Models
1.3.3 The DIVA Model
1.4 Research Objectives
1.5 Thesis Organization
Chapter 2 Neural Network Theory
2.1 Introduction to Neural Networks
2.1.1 Neuron Model
2.1.2 Neural Network Architecture
2.1.3 Types of Neural Networks
2.2 Learning Mechanisms
2.2.1 Supervised Learning
2.2.2 Unsupervised Learning
2.3 Self-Organizing Feature Map Networks
2.4 Pattern Recognition
Chapter 3 Speech Models
3.1 Speech Perception Model
3.1.1 Model Architecture
3.1.2 Representation of Formants
3.1.3 Auditory Map
3.1.4 Population Vector
3.1.5 Application in Combination with the SOM Network
3.1.6 Speech Production
3.2 The DIVA Model
3.2.1 Speech Production Flow of the DIVA Model
3.2.2 Speech Sound Map
3.2.3 Orosensory Direction Vector
3.2.4 Articulator Velocity Vector
3.2.5 Auditory Feedback System
3.2.6 Speech Processing Procedure
Chapter 4 Experiments and Methods
4.1 Experimental Methods
4.2 Simulation Experiments
4.2.1 Identification of the English Consonants /r/-/l/
4.2.2 Prototype and Non-Prototype Experiment
4.2.3 Training the Auditory Perceptual Space
4.3 Simulating Speech Perception with the DIVA Model
4.3.1 Interface of the DIVA Model
4.3.2 Adding Auditory Perception to the DIVA Model
Chapter 5 Results and Discussion
5.1 Simulation Results
5.1.1 Identification of the English Consonants /r/-/l/
5.1.2 Differences in Identifying Prototype and Non-Prototype Sounds
5.1.3 Training of the Auditory Perceptual Space
5.2 Auditory Perceptual Space of the DIVA Model
5.2.1 Speech Perception and Speech Production
5.2.2 Relationship between Speech Perception and Speech Production
5.2.3 Further Discussion of Speech Perception
5.2.4 Training Chinese Vowels with Real Human Voices
5.3 Relationship between the Speech Perception Model and Neurophysiology
Chapter 6 Conclusions and Future Work
6.1 Conclusions
6.2 Future Work
Appendix A
Appendix B
References

Advisor

吳炤民 (Chao-min Wu)

Date of Approval

2011-8-26
