The purpose of this study is to build an articulatory model that uses an equivalent lumped-element electric circuit and associated mathematical functions to represent the vocal folds and the vocal tract, based on physiological data from the literature, in order to simulate an individual's vowel production under normal conditions. To verify the model, we used two sets of vocal tract area functions for vowels, taken from the magnetic resonance imaging (MRI) studies of Takemoto's group and of Story, together with two vocal fold models (the Rosenberg glottal signal and the two-mass model).
The vocal folds are two symmetrical folds of mucous membrane stretched across the larynx that generate sound by vibrating. We simulated the glottal signal in two ways: with the mathematical functions from Rosenberg's study, and with the two-mass model, which represents the vocal folds as two coupled mass-spring-damper systems.
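As an illustration of the first approach, the sketch below generates one cycle of a Rosenberg-type glottal pulse. It assumes the widely used trigonometric pulse shape (a raised-cosine opening phase followed by a quarter-cosine closing phase); the abstract does not state which of Rosenberg's pulse shapes was used, and the function name, parameter names, and default phase fractions are illustrative choices, not values from this study.

```python
import math


def rosenberg_pulse(n_samples, open_frac=0.40, close_frac=0.16):
    """Return one glottal cycle of n_samples values in the range [0, 1].

    open_frac and close_frac are the fractions of the cycle spent in the
    opening and closing phases (assumed defaults, not taken from the study).
    """
    n_open = int(round(open_frac * n_samples))
    n_close = int(round(close_frac * n_samples))
    pulse = []
    for n in range(n_samples):
        if n < n_open:
            # Opening phase: raised cosine rising from 0 toward 1.
            g = 0.5 * (1.0 - math.cos(math.pi * n / n_open))
        elif n < n_open + n_close:
            # Closing phase: quarter cosine falling from 1 toward 0.
            g = math.cos(math.pi * (n - n_open) / (2.0 * n_close))
        else:
            # Closed phase: no glottal flow.
            g = 0.0
        pulse.append(g)
    return pulse


if __name__ == "__main__":
    cycle = rosenberg_pulse(100)
    print(max(cycle), cycle[-1])
```

Repeating the cycle at the desired fundamental period gives a simple periodic source signal; the two-mass model, by contrast, requires integrating the equations of motion of the masses and is not sketched here.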
In this study, the vocal tract from the glottis to the lips was modeled as a tube made up of many concatenated sections. With the lossless tube model, a mathematical vocal tract model can be built from the variation of volume velocity and sound pressure along the tube. Although this approach is relatively simple, it ignores the viscous losses at the vocal tract wall and their effect on vowel production. In contrast, Maeda proposed a vocal tract model that accounts for energy dissipation at the vocal tract wall, and also described a way to transform the physical model into an equivalent electric circuit model. With Maeda's vocal tract model, the desired vowels can be simulated by driving the tract with the glottal signals.
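To make the concatenated-tube idea concrete, the sketch below estimates resonance frequencies of the simpler lossless tube model by multiplying the chain (transmission) matrices of the sections and locating the zeros of the resulting D element. It is not Maeda's lossy circuit model and not the implementation used in this study. It assumes a closed glottis, an ideal open lip end with no radiation impedance, sections of equal length, and illustrative values for air density and sound speed; all names are hypothetical.

```python
import math

C_SOUND = 350.0  # speed of sound in m/s (assumed value)
RHO = 1.14       # air density in kg/m^3 (assumed value)


def chain_d(freq, areas, section_len):
    """D element of the glottis-to-lips chain matrix at freq (Hz).

    Each lossless section has the matrix [[cos, j*z*sin], [j*sin/z, cos]],
    so the product keeps the form [[a, j*b], [j*c, d]] with real a, b, c, d.
    """
    k = 2.0 * math.pi * freq / C_SOUND
    cs, sn = math.cos(k * section_len), math.sin(k * section_len)
    a, b, c, d = 1.0, 0.0, 0.0, 1.0
    for area in areas:
        z = RHO * C_SOUND / area  # characteristic impedance of the section
        a, b, c, d = (a * cs - b * sn / z, a * z * sn + b * cs,
                      c * cs + d * sn / z, d * cs - c * z * sn)
    return d


def formants(areas, section_len, f_max=5000.0, step=5.0):
    """Frequencies below f_max where D crosses zero, i.e. the poles of
    U_lips / U_glottis = 1 / D for the idealized open lip end."""
    found = []
    f_prev, d_prev = step, chain_d(step, areas, section_len)
    f = f_prev + step
    while f <= f_max:
        d_cur = chain_d(f, areas, section_len)
        if d_prev != 0.0 and d_prev * d_cur <= 0.0:
            lo, hi, d_lo = f_prev, f, d_prev
            for _ in range(40):  # refine the bracket by bisection
                mid = 0.5 * (lo + hi)
                d_mid = chain_d(mid, areas, section_len)
                if d_lo * d_mid <= 0.0:
                    hi = mid
                else:
                    lo, d_lo = mid, d_mid
            found.append(0.5 * (lo + hi))
        f_prev, d_prev = f, d_cur
        f += step
    return found


if __name__ == "__main__":
    # Uniform 17.5 cm tube split into 10 sections of 3 cm^2 each.
    print(formants([3.0e-4] * 10, 0.0175))
```

For a uniform tube the resonances should fall near the quarter-wavelength values (2n - 1)c / 4L, which offers a quick sanity check before substituting a measured area function. Adding wall losses, as in Maeda's model, shifts and broadens these resonances and is beyond this sketch.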
We used the vocal tract area functions from Story's study (/AA/, /IY/, /UW/, /AE/, /AO/) and from Takemoto's study (/a/, /i/, /u/, /e/, /o/) to verify that our vocal tract model produces the corresponding vowels. Furthermore, we combined the Rosenberg signal and the two-mass model with the Maeda model and observed how the choice of glottal signal affects vowel production.
The results showed that both the Rosenberg signal and the two-mass model have low-pass filter characteristics. However, the spectrum of the two-mass model contained more low-frequency and less high-frequency energy. In combination with the vocal tract model used in this study, both glottal signals could be used to simulate both English and Japanese vowel production. When they were used with the vocal tract portion of the DIVA (Directions Into Velocities of Articulators) model, however, they could not produce the correct Japanese vowels, because of the limited formant frequency range defined by the DIVA model.
In addition, we verified our articulatory model with the vocal tract area functions from Story's study (the number of vocal tract sections varies from 42 to 46 depending on the vowel) and found that the first three formant frequencies differed from those reported by Story by -7.4%, -2.58%, and -0.46%, respectively. The corresponding differences between our results and Takemoto's study (the number of vocal tract sections ranges from 68 to 75 depending on the vowel) were only -2.01%, 1.99%, and -0.75%, respectively. In summary, our model can simulate an individual's vowel production under normal conditions based on physiological data from the literature, and the accuracy of the vowel simulation may improve as the vocal tract is divided into more sections.
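The percentages above can be read as signed relative differences between simulated and reference formant frequencies. The sketch below shows one plausible way to compute such figures, assuming the difference is taken relative to the reference value and averaged across vowels for each formant; the abstract does not state the exact formula or averaging used, and the function names are hypothetical.

```python
def percent_difference(model_hz, reference_hz):
    """Signed difference of a simulated formant from its reference, in percent."""
    return 100.0 * (model_hz - reference_hz) / reference_hz


def mean_percent_differences(model_sets, reference_sets):
    """Average the signed percent difference of each formant across vowels.

    model_sets and reference_sets are equally sized lists with one entry
    per vowel; each entry lists that vowel's formants (F1, F2, F3, ...) in Hz.
    """
    n_vowels = len(model_sets)
    n_formants = len(model_sets[0])
    means = []
    for i in range(n_formants):
        total = sum(
            percent_difference(m[i], r[i])
            for m, r in zip(model_sets, reference_sets)
        )
        means.append(total / n_vowels)
    return means
```

A negative value means the simulated formant lies below the reference, which matches the sign convention implied by the reported figures only if this assumed definition is the one the study used.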
Birkholz, P. (2013). “Modeling consonant-vowel coarticulation for articulatory speech synthesis.” PLoS ONE 8(4): e60603.
Buchaillard, S., P. Perrier and Y. Payan (2009). “A biomechanical model of cardinal vowel production: Muscle activations and the impact of gravity on tongue positioning.” The Journal of the Acoustical Society of America 126(4): 2033-2051.
Dang, J. and K. Honda (1997). “Acoustic characteristics of the piriform fossa in models and humans.” The Journal of the Acoustical Society of America 101(1): 456-465.
Dang, J., K. Honda and H. Suzuki (1994). “Morphological and acoustical analysis of the nasal and the paranasal cavities.” The Journal of the Acoustical Society of America 96(4): 2088-2100.
Dunn, H. K., J. L. Flanagan and P. J. Gestrin (1962). “Complex zeros of a triangular approximation to the glottal wave.” The Journal of the Acoustical Society of America 34(12): 1977-1978.
Fant, G. (1972). “Vocal tract wall effects, losses, and resonance bandwidths.” Speech Transmission Laboratory Quarterly Progress and Status Report 2(3): 28-52.
Flanagan, J. L. (1965). Speech analysis, synthesis and perception. Springer-Verlag, Berlin, Germany.
Hillenbrand, J., L. A. Getty, M. J. Clark and K. Wheeler (1995). “Acoustic characteristics of American English vowels.” The Journal of the Acoustical Society of America 97(5): 3099-3111.
Honda, K., T. Kurita, Y. Kakita and S. Maeda (1995). “Physiology of the lips and modeling of lip gestures.” Journal of Phonetics 23(1): 243-254.
International Phonetic Association (1999). Handbook of the International Phonetic Association: A guide to the use of the International Phonetic Alphabet, Cambridge University Press, Cambridge.
Ishizaka, K. and T. Kaneko (1968). “On equivalent mechanical constants of the vocal cords.” The Journal of the Acoustical Society of Japan 24: 312-313.
Ishizaka, K. and J. L. Flanagan (1972). “Synthesis of Voiced Sounds From a Two-Mass Model of the Vocal Cords.” Bell System Technical Journal 51(6): 1233-1268.
Ladefoged, P. and D. E. Broadbent (1957). “Information conveyed by vowels.” The Journal of the Acoustical Society of America 29(1): 98-104.
LaMar, M. D., Y. Qi and J. Xin (2003). “Modeling vocal fold motion with a hydrodynamic semicontinuum model.” The Journal of the Acoustical Society of America 114(1): 455-464.
Lloyd, J. E., I. Stavness and S. Fels (2012). “ArtiSynth: a fast interactive biomechanical modeling toolkit combining multibody and finite element simulation.” In Y. Payan (Ed.), Soft Tissue Biomechanical Modeling for Computer Assisted Surgery (pp. 355-394). Springer-Verlag, Berlin, Germany.
Maeda, S. (1982). “A digital simulation method of the vocal-tract system.” Speech Communication 1(3): 199-229.
Mokhtari, P., H. Takemoto and T. Kitamura (2008). “Single-matrix formulation of a time domain acoustic model of the vocal tract with side branches.” Speech Communication 50(3): 179-190.
Peterson, G. E. and H. L. Barney (1952). “Control methods used in a study of the vowels.” The Journal of the Acoustical Society of America 24(2): 175-184.
Pruthi, T. and C. Espy-Wilson (2007). “Acoustic parameters for the automatic detection of vowel nasalization.” Proceedings of Interspeech 2007, Antwerp, Belgium, Aug 27-31: 1925-1928.
Rosenberg, A. E. (1971). “Effect of glottal pulse shape on the quality of natural vowels.” The Journal of the Acoustical Society of America 49(2B): 583-590.
Stevens, K. N. and A. S. House (1956). “Studies of formant transitions using a vocal tract analog.” The Journal of the Acoustical Society of America 28(4): 578-585.
Story, B. H. and I. R. Titze (1995). “Voice simulation with a body-cover model of the vocal folds.” The Journal of the Acoustical Society of America 97(2): 1249-1260.
Story, B. H., I. R. Titze and E. A. Hoffman (1996). “Vocal tract area functions from magnetic resonance imaging.” The Journal of the Acoustical Society of America 100(1): 537-554.
Takemoto, H., K. Honda, S. Masaki, Y. Shimada and I. Fujimoto (2006). “Measurement of temporal changes in vocal tract area function from 3D cine-MRI data.” The Journal of the Acoustical Society of America 119(2): 1037-1049.
Titze, I. R. and B. H. Story (2002). “Rules for controlling low-dimensional vocal fold models with muscle activation.” The Journal of the Acoustical Society of America 112(3): 1064-1076.
Van den Berg, J., J. Zantema and P. Doornenbal Jr (1957). “On the air resistance and the Bernoulli effect of the human larynx.” The Journal of the Acoustical Society of America 29(5): 626-631.
Wei, J., J. Liu, Q. Fang, W. Lu, J. Dang and K. Honda (2015). “A Novel Method for Constructing 3D Geometric Articulatory Models.” Journal of Signal Processing Systems, DOI 10.1007/s11265-015-1002-8: 1-8.
Zhang, Z. and C. Y. Espy-Wilson (2004). “A vocal-tract model of American English /l/.” The Journal of the Acoustical Society of America 115(3): 1274-1280.
College of Santa Fe Auditory Theory (2015). (Accessed 2015/12/29.)
王小川 (2009). 語音訊號處理 [Speech Signal Processing], revised 2nd ed. 全華圖書股份有限公司 [Chuan Hwa Book Co., Ltd.], New Taipei City, Taiwan.