Thesis 104521112 — Detailed Record




Name: Kuo-Hsuan Hung (洪國軒)    Department: Department of Electrical Engineering
Title: 整合大腦與構音之類神經網路模型模擬中文字詞之產生
(The neural network model integrating brain and speech model for Chinese syllables)
Related theses
★ Independent component analysis for sound-signal separation in real environments
★ Segmentation and 3-D gray-level interpolation of oral MRI images
★ Design of a digital asthma peak-flow monitoring system
★ Effects of combining cochlear implants and hearing aids on Mandarin speech recognition
★ Simulated Mandarin speech recognition with advanced cochlear-implant combined coding strategies: analysis with hearing aids
★ An fMRI study of the neural correlates of Mandarin speech production
★ A 3-D biomechanical tongue model built with the finite-element method
★ Construction of an MRI-based 3-D tongue atlas
★ A simulation study of calcium-oxalate concentration changes in renal tubules and their relation to calcium-oxalate stones
★ Automatic segmentation of tongue structures in oral MR images
★ A study of electrical matching for microwave output windows
★ Development of a software-based hearing-aid simulation platform: noise reduction
★ Development of a software-based hearing-aid simulation platform: feedback cancellation
★ Simulated effects of cochlear-implant channel number, stimulation rate, and binaural hearing on Mandarin speech recognition in noise
★ Studying the neural correlates of Mandarin tone production with neural networks
★ Construction of computer-simulated physiological systems for teaching
  1. The full electronic text of this thesis is approved for immediate open access.
  2. The open-access full text is licensed to users only for personal, non-commercial retrieval, reading, and printing for academic research.
  3. Please comply with the Copyright Act of the Republic of China; do not reproduce, distribute, adapt, repost, or broadcast it without authorization.

Abstract (Chinese)
With advances in positron emission tomography (PET), functional magnetic resonance imaging (fMRI), and magnetic resonance imaging (MRI), the neural correlates of speech in the brain are no longer veiled in the unknown. Even so, many speech disorders still lack effective treatments. This study therefore combines a brain model with an articulation model to simulate Mandarin syllables and tone changes. Whereas past articulation models dealt mainly with vowels, this model adds designated consonants to simulate CV (consonant–vowel) structures, and is finally applied to simulating hypotheses about the causes of speech disorders, in search of effective approaches to speech therapy.

The articulation-learning model used in this study is DIVA (Directions Into Velocities of Articulators), a model based on neural networks; the brain-signal model is GODIVA (Gradient Order DIVA). The DIVA model simulates five functional regions — the left premotor cortex, the superior temporal cortex, the inferior parietal cortex, the motor cortex, and the cerebellar cortex — corresponding respectively to the speech sound map (SSM), the auditory state and error maps, the somatosensory state and error maps, the articulatory velocity and position maps, and the cerebellum module. The GODIVA model simulates the left inferior frontal sulcus, the frontal operculum, and the pre-supplementary motor area, which represent the phonological content representation, the syllable-frame (structural) representation, and the speech sound map, respectively. Our method was therefore to use the region the two models share, the speech sound map, as the projection from GODIVA into DIVA. Part of the GODIVA output serves as the brain command; the brain command is first converted into the corresponding articulatory signal, and a neural-network model then adjusts the fundamental-frequency learning target. The results were compared with real spectrograms and vocal-tract shapes.

For the single-vowel spectrograms, every simulated vowel except /ㄨ/ fell within the range of human vowel formants, though all lay near the boundary of that range; the vowel formants tended to drift toward F1 ≈ 450 Hz and F2 ≈ 1600 Hz. For the vocal-tract shapes of CV structures, stop consonants were combined with the diphthong /ㄞ/. Mandarin stop consonants are distinguished not by voicing but by aspiration, so the tongue position was fixed first and the degree of aspiration adjusted afterward; the simulated results showed the same trends as real articulation. However, because the vocal tract of the DIVA model is divided into only six regions — the lips, alveolar ridge, hard palate, velum, uvula, and pharynx — some consonants cannot be simulated precisely, and the choice of vowel also affects consonant production. In future work we hope to partition the DIVA vocal-tract model more finely and completely, so that commands issued by the brain model can drive the articulators to simulate all Mandarin tonal syllables accurately.
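The single-vowel evaluation described above — checking whether each simulated vowel's (F1, F2) pair lies inside the range of human vowel formants — can be sketched as follows. This is a minimal illustration only: the reference boxes are placeholder values, not the thesis's measured ranges or the Peterson–Barney data.

```python
# Sketch of the single-vowel evaluation: does a simulated vowel's (F1, F2)
# pair fall inside a reference formant box? The boxes below are illustrative
# placeholders, NOT the thesis's measured ranges.

# Illustrative (min, max) ranges in Hz for F1 and F2 of a few Mandarin vowels.
REFERENCE_RANGES = {
    "a": ((700, 1100), (1100, 1700)),  # /ㄚ/
    "i": ((250, 400), (2000, 3000)),   # /一/
    "u": ((250, 450), (600, 1000)),    # /ㄨ/
}

def within_range(vowel, f1, f2, ranges=REFERENCE_RANGES):
    """Return True if (f1, f2) lies inside the reference box for `vowel`."""
    (f1_lo, f1_hi), (f2_lo, f2_hi) = ranges[vowel]
    return f1_lo <= f1 <= f1_hi and f2_lo <= f2 <= f2_hi

print(within_range("a", 850, 1300))  # inside the /ㄚ/ box
print(within_range("u", 500, 1600))  # F1 too high: outside the /ㄨ/ box
```

A boundary case such as the simulated /ㄨ/ reported above would fail this check even though the other vowels pass near the edges of their boxes.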
Abstract (English)
With the advent of PET, fMRI and MRI, the brain areas serving speech are no longer veiled in the unknown. Even so, many speech disorders still lack effective treatments. Whereas past articulation models were dominated by vowels, this study combines a brain model and a speech model to simulate Mandarin syllables and tone changes. With the integrated model we can add designated consonants to simulate CV structures, and finally apply it to simulating hypotheses about the causes of speech disorders, in search of effective approaches to speech therapy.

In this study, the brain and speech models used were DIVA (Directions Into Velocities of Articulators) and GODIVA (Gradient Order DIVA). The DIVA model contains the speech sound map (SSM), the auditory state and error maps, the somatosensory state and error maps, the articulatory velocity and position maps, and the cerebellum; these components correspond respectively to the left premotor cortex, the superior temporal cortex, the inferior parietal cortex, the motor cortex and the cerebellar cortex. The GODIVA model simulates the left inferior frontal sulcus, the frontal operculum and the pre-supplementary motor area, which respectively represent the phonological content area, the speech structural-frame area and the speech sound map.

Our approach was to use the intersection of the two models, the speech sound map, as the projection from GODIVA to DIVA. The output of the GODIVA model served as the brain command signal. The first step was to convert the brain command into an articulatory signal, after which a neural-network model adjusted the fundamental frequency of the learning target. Finally, we compared the simulation results with actual spectrograms and vocal-tract shapes. For the vowel spectra, the simulated formants fell within the regions of typical vowel formants for every vowel except /ㄨ/ (/u/), though all lay near the boundaries of those regions; the first formants of the tested vowels tended toward 450 Hz and the second formants toward 1600 Hz. For the vocal-tract shapes of CV structures, we selected stop consonants combined with the diphthong /ㄞ/ (/ai/). Because Mandarin stop consonants differ not in voicing but in aspiration, we only had to adjust the tongue location and the intensity of aspiration; the simulation results showed the same trend as the actual vocal-tract shapes. However, because the vocal tract of the DIVA model is divided into only six parts — the lips, alveolar ridge, hard palate, velum, uvula and pharynx — some consonants cannot be simulated accurately, and the choice of vowel affects simulation stability. For future work, we hope the vocal-tract model of DIVA can be refined to simulate all Mandarin tonal syllables accurately.
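The tone-learning step described above — pulling the produced fundamental frequency toward an auditory target — can be caricatured as a simple feedback loop. This sketch assumes only a proportional error-correction rule; the gain, the scalar F0 state, and the function name are illustrative choices, not the DIVA model's actual equations.

```python
# Caricature of the tone-learning feedback loop: the produced fundamental
# frequency (F0) is nudged toward the auditory target in proportion to the
# auditory error. The proportional rule, the gain, and the scalar F0 state
# are illustrative assumptions, not the DIVA model's actual equations.

def simulate_tone_learning(f0_target, f0_init, gain=0.3, steps=50):
    """Iteratively reduce the auditory error between target and produced F0."""
    f0 = f0_init
    trajectory = [f0]
    for _ in range(steps):
        error = f0_target - f0   # auditory error (target minus produced)
        f0 += gain * error       # corrective command toward the target
        trajectory.append(f0)
    return trajectory

# Rising toward a 220 Hz target from 180 Hz: the error shrinks geometrically.
traj = simulate_tone_learning(f0_target=220.0, f0_init=180.0)
print(round(traj[0]), round(traj[-1]))  # starts near 180, ends near 220
```

A rising or falling Mandarin tone contour would correspond to a time-varying `f0_target` rather than the constant target used here.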
Keywords (Chinese) ★ 中文聲調 (Mandarin tones)
★ Direction Into Velocities Articulator
★ Gradient Order DIVA
★ speech sound map
Keywords (English) ★ Chinese tone
★ Direction Into Velocities Articulator
★ Gradient Order DIVA
★ speech sound map
Table of Contents
January 2018 (Republic of China year 107) I
Chinese Abstract VI
Abstract VIII
Acknowledgments X
Table of Contents XI
List of Figures XIII
Chapter 1 Introduction 1
1.1 Motivation 1
1.2 Brain physiology of speech 3
1.3 Articulatory physiology 5
1.4 Articulatory phonetics 7
1.5 Literature review 11
1.6 Objectives of this study 17
1.7 Thesis organization 18
Chapter 2 Neural Networks 20
2.0 Neural networks 20
2.1 Supervised learning networks 23
2.2 Unsupervised learning 28
2.3 Associative learning 30
2.4 Optimization applications 32
Chapter 3 The DIVA and GODIVA Models 35
3.1 The DIVA model 35
3.2 Mathematical definition of the DIVA model 40
3.3 The GODIVA model 44
3.4 Mathematical formulation of the GODIVA model 47
Chapter 4 Experimental Methods and Equipment 55
4.1 Integration of the GODIVA and DIVA models 55
4.2 The DIVA model with Mandarin tones added 62
4.3 Mandarin modeling with the DIVA model 65
Chapter 5 Results and Discussion 69
5.1 Tones 71
5.2 Single finals (monophthongs) 72
5.3 Diphthongs 78
5.4 CV structures 81
Chapter 6 Conclusions and Future Work 86
6.1 Conclusions 86
6.2 Future work 88
References 89
List of Figures
Figure 1.1 Distribution of brain functional areas (黃華民, 2008) 4
Figure 1.2 The human vocal organs 6
Table 1.1 Comparison of vowel formant values measured by Peterson and Barney and by Hillenbrand 8
Figure 2.1 Neural network architecture (M. Hajek, 2005) 22
Figure 3.1 Schematic of the DIVA model (Guenther, 2006) 36
Figure 3.4 Schematic of the left inferior frontal sulcus (Bohland, 2010) 48
Figure 3.6 Schematic of the basal ganglia–thalamus circuit (Bohland, 2010) 52
Figure 3.7 Schematic of the frontal operculum (Bohland, 2010) 53
Table 4.1 Initials and finals output by GODIVA 67
Table 4.2 Common pinyin equivalents of Zhuyin symbols 67
Table 5.1 Classification of Zhuyin symbols 70
Figure 5.1 Tone simulation targets 71
Figure 5.2 Tone simulation results 72
Figure 5.3 Vocal-tract shape and formant values of Mandarin /ㄚ/ 73
Figure 5.4 Vocal-tract shape and formant values of Mandarin /ㄛ/ 73
Figure 5.5 Vocal-tract shape and formant values of Mandarin /ㄜ/ 73
Figure 5.6 Vocal-tract shape and formant values of Mandarin /ㄝ/ 73
Figure 5.7 Vocal-tract shape and formant values of Mandarin /ㄦ/ 74
Figure 5.8 Vocal-tract shape and formant values of Mandarin /一/ 74
Figure 5.9 Vocal-tract shape and formant values of Mandarin /ㄨ/ 74
Figure 5.10 Vocal-tract shape and formant values of Mandarin /ㄩ/ 74
Figure 5.11 Comparison of first and second formant values across vowels 76
Figure 5.12 Formant values of Mandarin vowels from real speakers: an F1–F2 plot of /ㄚ/, /一/, /ㄨ/, /ㄝ/, /ㄛ/ and /ㄜ/ produced by 24 male speakers 76
Table 5.2 Comparison of simulated and real Mandarin vowel formants 77
Figure 5.13 Vocal-tract shape and formant values of Mandarin /ㄞ/ 79
Figure 5.14 Vocal-tract shape and formant values of Mandarin /ㄟ/ 79
Figure 5.15 Vocal-tract shape and formant values of Mandarin /ㄠ/ 80
Figure 5.16 Vocal-tract shape and formant values of Mandarin /ㄡ/ 80
Figure 5.17 Comparison of first- and second-formant trajectories of diphthongs 81
Figure 5.18 Vocal-tract shape and formant values of Mandarin 拜 /ㄅㄞˋ/ 82
Figure 5.19 Vocal-tract shape and formant values of Mandarin 派 /ㄆㄞˋ/ 82
Figure 5.20 Vocal-tract shape and formant values of Mandarin 帶 /ㄉㄞˋ/ 83
Figure 5.21 Vocal-tract shape and formant values of Mandarin 泰 /ㄊㄞˋ/ 83
Figure 5.22 Vocal-tract shape and formant values of Mandarin 蓋 /ㄍㄞˋ/ 84
Figure 5.23 Vocal-tract shape and formant values of Mandarin 慨 /ㄎㄞˋ/ 84
Figure 5.24 Real oral and tongue positions for the corresponding sounds: left /ㄅ/, /ㄆ/; middle /ㄉ/, /ㄊ/; right /ㄍ/, /ㄎ/ (Ferrand, 2000) 85
References
Bohland, J. W., Guenther, F. H. (2006). An fMRI investigation of syllable sequence production. NeuroImage, 32(2), 821–841.
Bohland, J.W., Bullock, D. and Guenther, F.H. (2010). Neural representations and mechanisms for the performance of simple speech sequences. Journal of Cognitive Neuroscience, 22 (7), pp. 1504-1529.
Brown JW, Bullock D, Grossberg S. (2004).How laminar frontal cortex and basal ganglia circuits interact to control planned and reactive saccades. Neural Networks;17(4):471–510.
Civier, O., Bullock, D., Max, L., and Guenther, F. H. (2013). Computational modeling of stuttering caused by impairments in a basal ganglia thalamo-cortical circuit involved in syllable selection and initiation.
Delattre, P. C., Liberman, A. M., Cooper, F. S. (1955). Acoustic loci and transitional cues for consonants. Journal of the Acoustical Society of America, 27, 769–773.
Dunn, H. K. (1950). “The calculation of vowel resonances, and an electrical vocal tract,” J. Acoust. Soc. Am. 22, 740–753.
Fagyal, Zsuzsanna (2001). Phonetics and speaking machines: on the mechanical simulation of human speech in the 17th century. Historiographia Linguistica, XXVIII:3, 289–330.
Fant, G. (1972). “Vocal tract wall effects, losses, and resonance bandwidths.” Speech Transmission Laboratory Quarterly progress and status report 2(3): 28-52.
Ferrand, C. T. (2001). Speech Science: An Integrated Approach to Theory and Clinical Practice. Allyn & Bacon.
Gelfand JR, Bookheimer SY. (2003).Dissociating neural mechanisms of temporal sequencing and processing phonemes. Neuron;38(5):831–842.
Grossberg S. (1973). Contour enhancement, short-term memory, and constancies in reverberating neural networks. Studies in Applied Mathematics;52:213–257.
Guenther, F. H., & Ghosh, S. S. (2003). A model of cortical and cerebellar function in speech. In Proceedings of the XVth international congress of phonetic sciences (pp. 169–173).
Guenther, F. H., (1994). A neural network model of speech acquisition and motor equivalent speech production. Biological Cybernetics, 72, 43-53.
Hajek, M. (2005). "Neural Networks." University of KwaZulu-Natal.
Hallé, P. A. (1994). Evidence for tone-specific activity of the sternohyoid muscle in modern standard Chinese. Language and Speech, 37, 103-123
Henke WL (1966) Dynamic articulatory model of speech production using computer simulation. Ph.D. dissertation, Massachusetts Institute of Technology
Hillenbrand, J., L. A. Getty, M. J. Clark and K. Wheeler (1995). “Acoustic characteristics of American English vowels.” The Journal of the Acoustical society of America 97(5): 3099-3111.
International Phonetic Association (1999). Handbook of the International Phonetic Association: A guide to the use of the International Phonetic Alphabet. Cambridge University Press, Cambridge.
Ge Jianqiao, Peng Gang, Lyu Bingjiang, Wang Yi, Yan Zhuo, Zhendong Niu, Li Hai Tan, Alexander P. Leff and Gao Jia-Hong (2015). Cross-language differences in the brain network subserving intelligible speech. PNAS March, 112 (10) 2972-2977
Jonas, S. (1981). The supplementary motor region and speech emission. Journal of Communication Disorders, 14, 349–373.
Kelly, J., and Lochbaum, C. (1962). “Speech synthesis,” in Proceedings of the Fourth International Congress on Acoustics, Paper G42, Copenhagen, Denmark, Sept., pp. 1–4.
Kelso JAS, Tuller B, Vatikiotis-Bateson E, Fowler CA (1984) Functionally specific articulatory cooperation following jaw perturbations during speech: Evidence for coordinative structures. Journal of Experimental Psychology: Human Perception and Performance 10: 812-832
Maeda, S. (1982). “A digital simulation method of the vocal-tract system.” Speech communication 1(3): 199-229.
Miyawaki K, Strange W, Verbrugge R, Liberman AM, Jenkins JJ, Fujimura O (1975) An effect of linguistic experience: The discrimination of [r] and [l] by native speakers of Japanese and English. Perception and Psychophysics 18: 331-340
Papoutsi, M., de Zwart, J. A., Jansma, J. M., Pickering, M. J., Bednar, J. A., Horwitz, B. (2009). From phonemes to articulatory codes: an fMRI study of the role of Broca's area in speech production. Cerebral Cortex.
Parent A, Hazrati LN. Functional anatomy of the basal ganglia. I. The cortico-basal ganglia-thalamocortical loop. Brain Research Reviews 1995; 91–127.
Peterson, G. E. and H. L. Barney (1952). “Control methods used in a study of the vowels.” The Journal of the Acoustical Society of America 24(2): 175-184.
Shima K and Tanji J. (2000).Neuronal activity in the supplementary and presupplementary motor areas for temporal organization of multiple movements. Journal of Neurophysiology;84(4):2148–2160.
Stevens, K. N. and A. S. House (1956). “Studies of formant transitions using a vocal tract analog.” The Journal of the Acoustical Society of America 28(4): 578-585.
Story, B. H., I. R. Titze and E. A. Hoffman (1996). “Vocal tract area functions from magnetic resonance imaging.” The Journal of the Acoustical Society of America 100(1): 537-554.
Takemoto, H., K. Honda, S. Masaki, Y. Shimada and I. Fujimoto (2006). “Measurement of temporal changes in vocal tract area function from 3D cine-MRI data.” The Journal of the Acoustical Society of America 119(2): 1037-1049.
Välimäki, V., Karjalainen, M., and Kuisma, T. (1994). Articulatory control of a vocal tract model based on fractional delay waveguide filters. Proceedings of the International Symposium on Speech, Image Processing and Neural Networks, 13–16.
Wright, G. T. H. and Owens F. J., (1993). An optimized multirate sampling technique for the dynamic variation of vocal tract length in the Kelly-Lochbaum speech synthesis model. IEEE Transactions on Speech And Audio Processing, 1, 109-113.
Wu Chao-Min, Wang Tao-Wei, and Li Ming-Hung (2015). Study of neural mechanism of Mandarin vowel perception and diphthong production with neural network model. Journal of the Phonetic Society of Japan. 19(2), pp.115-123.
Ziegler W, Kilian B, Deger K. (1997).The role of the left mesial frontal cortex in fluent speech: evidence from a case of left supplementary motor area hemorrhage. Neuropsychologia;35(9):1197–1208.
王小川 (2004). Speech Signal Processing (語音訊號處理), 1st ed., Chuan Hwa Book Co., Taipei, Taiwan.
黃華民 (2008). Fundamentals of Clinical Neuroanatomy (臨床神經解剖學基礎), Ho-Chi Book Publishing, Taipei, Taiwan.
葉怡成 (2003). Applications and Practice of Neural Network Models (類神經網路模式應用與實作), 8th ed., Ru-Lin Books.
鄭靜宜 (2012). Tonal coarticulation effects on the fundamental frequency of Mandarin disyllabic words (華語雙音節詞基頻的聲調共構效果). Journal of the Speech-Language-Hearing Association of Taiwan, 28, 27–48.
Advisor: Chao-Min Wu (吳炤民)    Approval date: 2018-01-31
