Abstract (English) |
With the advent of PET, fMRI, and MRI, the brain areas that support speech are no longer shrouded in mystery. Even so, effective treatments for many speech disorders are still lacking. Whereas past speech models were dominated by vowels, this study combines a brain model and a speech model to simulate Mandarin syllables and their tone changes. With the integrated model, designated consonants can be added to simulate consonant-vowel (CV) structures, and the model can ultimately be applied to simulating hypotheses about speech disorders in order to find effective treatments.
In this study, the brain and speech models used were DIVA (Directions Into Velocities of Articulators) and GODIVA (Gradient Order DIVA). The DIVA model contains the speech sound map (SSM), the auditory state and error maps, the somatosensory state and error maps, the articulatory velocity and position maps, and a cerebellum component; these correspond, respectively, to the left anterior premotor cortex, the parietal and temporal cortex, the parietal lobe cortex, the motor cortex, and the cerebellar cortex. The GODIVA model simulates the left inferior frontal sulcus, the frontal operculum, and the pre-supplementary motor area, which respectively represent the phonological performance area, the speech structure performance area, and the speech auditory mapping area.
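The component-to-region correspondences described above can be summarized as a simple lookup table. The sketch below merely restates that mapping in plain Python; the dictionary names are assumptions for this illustration, not identifiers from the DIVA/GODIVA software.

```python
# Illustrative lookup tables restating the model-to-cortex correspondences
# described in the text. The dictionary names are assumptions for this
# sketch, not identifiers from the DIVA/GODIVA software.

DIVA_COMPONENTS = {
    "speech sound map (SSM)": "left anterior premotor cortex",
    "auditory state and error maps": "parietal and temporal cortex",
    "somatosensory state and error maps": "parietal lobe cortex",
    "articulatory velocity and position maps": "motor cortex",
    "cerebellum component": "cerebellar cortex",
}

GODIVA_AREAS = {
    "phonological performance area": "left inferior frontal sulcus",
    "speech structure performance area": "frontal operculum",
    "speech auditory mapping area": "pre-supplementary motor area",
}

# Print the GODIVA functional areas with their simulated brain regions.
for area, region in GODIVA_AREAS.items():
    print(f"{area}: {region}")
```

The speech auditory mapping area, shared by both models, is the natural point of connection exploited in the next section.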
Our approach was to use the intersection of the two models, the speech auditory map, as the projection from GODIVA to DIVA, with the output of the GODIVA model serving as the brain-signal instruction. The first step of this study was to convert the brain instruction into an auditory signal; a neural network model was then used to adjust the fundamental frequency toward the learning target. Finally, we compared the simulation results with actual sound spectra and vocal tract shapes. For the vowel spectra, the simulated formants fell within the regions of typical vowel formants, except for the vowel /ㄨ/ (/u/), whose values all lay at the boundary of its region. The first formants of the tested vowels tended toward 450 Hz and the second formants toward 1600 Hz. For the vocal tract shapes of CV structures, we selected the stop consonants and the diphthong /ㄞ/ (/ai/) as the CV structures. Because Mandarin stop consonants are distinguished only by aspiration rather than by voicing, we only had to adjust the tongue location and the intensity of aspiration. The simulation results and the actual vocal tract shapes clearly showed the same trend. However, because the articulatory structure of the DIVA model is divided into only six regions (labial, alveolar ridge, hard palate, velum, uvula, and pharynx), some consonants cannot be accurately simulated. In addition, the choice of vowel affects the simulation stability. In future work, we hope the vocal tract shape of the DIVA model can be refined to accurately simulate all Mandarin tonal syllables.
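The fundamental-frequency adjustment step described above can be sketched as a simple error-driven update loop. This is a minimal illustration under assumed values (the target contour, learning rate, and update rule are all hypothetical); it does not reproduce the thesis's actual neural network model.

```python
# Minimal sketch of error-driven fundamental-frequency (F0) tuning toward a
# tonal learning target, in the spirit of the adjustment step described in
# the text. The target contour, learning rate, and update rule are
# illustrative assumptions, not the thesis's actual neural network.

def tune_f0(f0_init, f0_target, rate=0.3, n_iters=50):
    """Iteratively move an F0 contour (Hz per frame) toward a target contour."""
    f0 = list(f0_init)
    for _ in range(n_iters):
        # Error-driven update: shift each frame a fraction of the way
        # toward the corresponding target frame.
        f0 = [f + rate * (t - f) for f, t in zip(f0, f0_target)]
    return f0

# Example: a flat 120 Hz contour tuned toward a rising, Tone-2-like contour.
target = [110.0, 120.0, 135.0, 155.0, 180.0]
tuned = tune_f0([120.0] * 5, target)
print([round(f, 1) for f in tuned])  # converges to the target contour
```

With a fixed rate below 1, the residual error shrinks geometrically, so the tuned contour converges to the target after enough iterations; a learned model would instead adapt its update from the auditory error signal.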
|
References |
Bohland, J.W. and Guenther, F.H. (2006). An fMRI investigation of syllable sequence production. NeuroImage, 32(2), 821–841.
Bohland, J.W., Bullock, D. and Guenther, F.H. (2010). Neural representations and mechanisms for the performance of simple speech sequences. Journal of Cognitive Neuroscience, 22 (7), pp. 1504-1529.
Brown, J.W., Bullock, D. and Grossberg, S. (2004). How laminar frontal cortex and basal ganglia circuits interact to control planned and reactive saccades. Neural Networks, 17(4), 471–510.
Civier, O., Bullock, D., Max, L. and Guenther, F.H. (2013). Computational modeling of stuttering caused by impairments in a basal ganglia thalamo-cortical circuit involved in syllable selection and initiation.
Delattre, P.C., Liberman, A.M. and Cooper, F.S. (1955). Acoustic loci and transitional cues for consonants. The Journal of the Acoustical Society of America, 27, 769–773.
Dunn, H. K. (1950). “The calculation of vowel resonances, and an electrical vocal tract,” J. Acoust. Soc. Am. 22, 740–753.
Fagyal, Z. (2001). Phonetics and speaking machines: on the mechanical simulation of human speech in the 17th century. Historiographia Linguistica, 28(3), 289–330.
Fant, G. (1972). “Vocal tract wall effects, losses, and resonance bandwidths.” Speech Transmission Laboratory Quarterly progress and status report 2(3): 28-52.
Ferrand, C.T. (2001). Speech Science: an integrated approach to theory and clinical practice, Allyn&Bacon.
Gelfand, J.R. and Bookheimer, S.Y. (2003). Dissociating neural mechanisms of temporal sequencing and processing phonemes. Neuron, 38(5), 831–842.
Grossberg S. (1973). Contour enhancement, short-term memory, and constancies in reverberating neural networks. Studies in Applied Mathematics;52:213–257.
Guenther, F. H., & Ghosh, S. S. (2003). A model of cortical and cerebellar function in speech. In Proceedings of the XVth international congress of phonetic sciences (pp. 169–173).
Guenther, F. H., (1994). A neural network model of speech acquisition and motor equivalent speech production. Biological Cybernetics, 72, 43-53.
Hajek M., (2005)."Neural Networks," University of KwaZulu-Natal.
Hallé, P. A. (1994). Evidence for tone-specific activity of the sternohyoid muscle in modern standard Chinese. Language and Speech, 37, 103-123
Henke WL (1966) Dynamic articulatory model of speech production using computer simulation. Ph.D. dissertation, Massachusetts Institute of Technology
Hillenbrand, J., L. A. Getty, M. J. Clark and K. Wheeler (1995). “Acoustic characteristics of American English vowels.” The Journal of the Acoustical society of America 97(5): 3099-3111.
International Phonetic Association (1999). Handbook of the International Phonetic Association: A guide to the use of the International Phonetic Alphabet, Cambridge University Press, Cambridge.
Ge, J., Peng, G., Lyu, B., Wang, Y., Zhuo, Y., Niu, Z., Tan, L.H., Leff, A.P. and Gao, J.-H. (2015). Cross-language differences in the brain network subserving intelligible speech. PNAS, 112(10), 2972–2977.
Jonas, S. (1981). The supplementary motor region and speech emission. Journal of Communication Disorders, 14, 349–373.
Kelly, J., and Lochbaum, C. (1962). “Speech synthesis,” in Proceedings of the Fourth International Congress on Acoustics, Paper G42, Copenhagen, Denmark, Sept., pp. 1–4.
Kelso JAS, Tuller B, Vatikiotis-Bateson E, Fowler CA (1984) Functionally specific articulatory cooperation following jaw perturbations during speech: Evidence for coordinative structures. Journal of Experimental Psychology: Human Perception and Performance 10: 812-832
Maeda, S. (1982). “A digital simulation method of the vocal-tract system.” Speech communication 1(3): 199-229.
Miyawaki K, Strange W, Verbrugge R, Liberman AM, Jenkins JJ, Fujimura O (1975) An effect of linguistic experience: The discrimination of [r] and [l] by native speakers of Japanese and English. Perception and Psychophysics 18: 331-340
Papoutsi, M., de Zwart, J.A., Jansma, J.M., Pickering, M.J., Bednar, J.A. and Horwitz, B. (2009). From phonemes to articulatory codes: an fMRI study of the role of Broca's area in speech production. Cerebral Cortex.
Parent, A. and Hazrati, L.N. (1995). Functional anatomy of the basal ganglia. I. The cortico-basal ganglia-thalamo-cortical loop. Brain Research Reviews, 91–127.
Peterson, G. E. and H. L. Barney (1952). “Control methods used in a study of the vowels.” The Journal of the Acoustical Society of America 24(2): 175-184.
Shima, K. and Tanji, J. (2000). Neuronal activity in the supplementary and presupplementary motor areas for temporal organization of multiple movements. Journal of Neurophysiology, 84(4), 2148–2160.
Stevens, K. N. and A. S. House (1956). “Studies of formant transitions using a vocal tract analog.” The Journal of the Acoustical Society of America 28(4): 578-585.
Story, B. H., I. R. Titze and E. A. Hoffman (1996). “Vocal tract area functions from magnetic resonance imaging.” The Journal of the Acoustical Society of America 100(1): 537-554.
Takemoto, H., K. Honda, S. Masaki, Y. Shimada and I. Fujimoto (2006). “Measurement of temporal changes in vocal tract area function from 3D cine-MRI data.” The Journal of the Acoustical Society of America 119(2): 1037-1049.
Välimäki, V., Karjalainen, M. and Kuisma, T. (1994). Articulatory control of a vocal tract model based on fractional delay waveguide filters. Proceedings of the International Symposium on Speech, Image Processing and Neural Networks, 13–16.
Wright, G. T. H. and Owens F. J., (1993). An optimized multirate sampling technique for the dynamic variation of vocal tract length in the Kelly-Lochbaum speech synthesis model. IEEE Transactions on Speech And Audio Processing, 1, 109-113.
Wu Chao-Min, Wang Tao-Wei, and Li Ming-Hung (2015). Study of neural mechanism of Mandarin vowel perception and diphthong production with neural network model. Journal of the Phonetic Society of Japan. 19(2), pp.115-123.
Ziegler, W., Kilian, B. and Deger, K. (1997). The role of the left mesial frontal cortex in fluent speech: evidence from a case of left supplementary motor area hemorrhage. Neuropsychologia, 35(9), 1197–1208.
王小川 (2004). 語音訊號處理 [Speech Signal Processing], 1st ed., 全華圖書股份有限公司, Taipei, Taiwan.
黃華民 (2008). 臨床神經解剖學基礎 [Fundamentals of Clinical Neuroanatomy], 合記書局, Taipei, Taiwan.
葉怡成 (2003). 類神經網路模式應用與實作 [Application and Implementation of Artificial Neural Network Models], 8th ed., 儒林圖書有限公司.
鄭靜宜 (2012). 華語雙音節詞基頻的聲調共構效果 [Tonal coarticulation effects on fundamental frequency in Mandarin disyllabic words], 台灣聽力語言學會雜誌 [Journal of the Speech-Language-Hearing Association of Taiwan], 28, 27–48.
|