Verification of an articulatory model with different vowel vocal tract area functions and glottal signals

Name

Shu-Wei Hsu (許恕瑋)

Department

Department of Electrical Engineering

Thesis title

Verification of an articulatory model with different vowel vocal tract area functions and glottal signals
(利用不同母音聲道面積函數及聲門信號驗證構音模型)

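This electronic thesis is authorized for immediate open access.
The open-access electronic full text is licensed to users solely for personal, non-profit searching, reading, and printing for the purpose of academic research.
Please comply with the Copyright Act of the Republic of China. Do not reproduce, distribute, adapt, repost, or broadcast the work without authorization.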

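Abstract (Chinese)

The purpose of this study is to construct an articulatory model for vowel production using equivalent circuit models and mathematical functions. The model combines a vocal fold model with a vocal tract model and uses physiological parameters from the literature to simulate human speech production under normal conditions. To verify the model, this thesis uses the vocal tract area functions obtained from magnetic resonance imaging (MRI) and published by Takemoto and by Story, together with the glottal signal proposed by Rosenberg and the two-mass model.
The vocal folds are located in the larynx. They form a left-right symmetric valve-like structure that produces sound by vibrating. To simulate the glottal signal, we use the mathematical function given by Rosenberg and the two-mass model. The two-mass model converts a physical model into an equivalent circuit model in which two masses represent the vocal folds, and springs and dampers represent the muscular motion of the vocal folds. In the vocal tract model, we treat the vocal tract as a multi-section tube assembled from tubes of different sizes. In the lossless tube model, a mathematical model can be derived from the variations of volume velocity and sound pressure. Although this approach is relatively simple, it ignores the effect of the vocal tract wall on speech. Maeda provided a vocal tract system model that includes the energy dissipated at the vocal tract wall, along with a method for converting the model into an equivalent circuit. With this model, the glottal signal can be combined with the vocal tract to produce the desired speech.
This thesis simulates vowels using the vowel vocal tract area functions published by Story (/AA/, /IY/, /UW/, /AE/, /AO/) and Takemoto (/a/, /i/, /u/, /e/, /o/), and compares the first three formant frequencies of each vowel with the vocal tract shape. The Rosenberg glottal signal and the signal generated by the two-mass vocal fold model are also combined with the Maeda model to observe how different glottal signals affect the speech. The results show that both the Rosenberg signal and the two-mass vocal fold model exhibit low-pass filter characteristics in the frequency domain. The two-mass model, however, has more pronounced low-frequency energy, and its high-frequency energy decays faster. Combined with the articulatory system model of this thesis, both glottal signals can simulate English and Japanese vowels. When they are combined with the DIVA (Directions Into Velocities of Articulators) model to simulate Japanese, however, the formant frequencies fall outside the range that DIVA can simulate, so the correct Japanese vowels cannot be produced. Compared with Story's results (whose vocal tracts have 42 to 46 sections depending on the vowel), the mean errors of the first three formant frequencies are -7.4%, 2.58%, and -0.46%, respectively. Compared with Takemoto's results (68 to 75 sections depending on the vowel), the mean errors of the first three formant frequencies are -2.01%, 1.99%, and 0.75%. These results show that the model can successfully simulate English and Japanese vowel sounds, that it can be controlled with physiological parameters, and that the accuracy of the first three vowel formants increases as the vocal tract is divided into more sections.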

Abstract (English)

The purpose of this study is to build an articulatory model that uses an equivalent lumped electric circuit and related mathematical functions to represent the vocal fold and vocal tract systems. The model is based on physiological data from the literature and simulates an individual's vowel production under normal circumstances. Two sets of vocal tract area functions for vowel production, taken from the magnetic resonance imaging (MRI) studies of the Takemoto group and of Story, and two vocal fold models (the Rosenberg glottal signal and the two-mass model) were used to verify our model.

The vocal folds are two symmetrical mucous membranes stretched across the larynx that generate sound through vibration. We simulated the glottal signal with the mathematical functions from Rosenberg's study and with the two-mass model, which represents the vocal folds as two coupled mass-spring-damper systems.
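As a rough illustration of a Rosenberg-type glottal source, the sketch below generates a few periods of a trigonometric glottal pulse. The abstract does not state which pulse shape or timing parameters were used, so the trigonometric form, the 100 Hz fundamental frequency, the 16 kHz sampling rate, and the opening and closing fractions (`open_frac`, `close_frac`) are all assumptions chosen for illustration.

```python
import math

def rosenberg_pulse(t, period, open_frac=0.40, close_frac=0.16):
    """Trigonometric Rosenberg-style glottal pulse, amplitude normalized to 0..1.

    open_frac and close_frac are assumed fractions of the period spent in the
    opening and closing phases; the glottis is closed for the remainder.
    """
    tp = open_frac * period    # duration of the opening (rising) phase
    tn = close_frac * period   # duration of the closing (falling) phase
    t = t % period             # position inside the current period
    if t <= tp:
        return 0.5 * (1.0 - math.cos(math.pi * t / tp))
    if t <= tp + tn:
        return math.cos(math.pi * (t - tp) / (2.0 * tn))
    return 0.0                 # closed phase

def glottal_wave(f0=100.0, fs=16000, n_periods=3):
    """Sample n_periods of the pulse train at sampling rate fs (Hz)."""
    period = 1.0 / f0
    n = int(round(n_periods * fs / f0))
    return [rosenberg_pulse(i / fs, period) for i in range(n)]

wave = glottal_wave()
```

With these assumed settings the pulse rises smoothly to its peak, falls more quickly, and stays at zero during the closed phase. A pulse train of this kind has a spectrum that falls off with frequency, which is in line with the low-pass behavior reported for the glottal signals.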

In this study, the vocal tract system from the glottis to the lips was modeled as a tube made of many concatenated sections. Based on the lossless tube model, we used the variation of volume velocity and sound pressure to build a mathematical vocal tract model. Although this approach is relatively simple, it ignores the viscous effect of the vocal tract wall on vowel production. In contrast, Maeda proposed a vocal tract model that accounts for energy dissipation at the vocal tract wall and also described a way to transform the physical model into an equivalent electric circuit model. With Maeda's vocal tract model, the desired vowels can be simulated by driving the model with the glottal signals.
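The lossless concatenated-tube idea can be sketched with chain (transmission) matrices: each section contributes one matrix, and with an ideal open end at the lips the resonances are the frequencies where the D element of the product vanishes. This is a minimal sketch under stated assumptions, not the thesis implementation and not the Maeda model: it has no wall losses and no radiation load, and the constants `C_SOUND` and `RHO`, the function names, and the uniform test tube are illustrative choices.

```python
import math

C_SOUND = 350.0  # speed of sound in m/s (assumed value)
RHO = 1.14       # air density in kg/m^3 (assumed value)

def chain_d(freq, areas, lengths):
    """Return the D element of the glottis-to-lips chain matrix.

    Each lossless section has the matrix [[cos, jZ sin], [j sin / Z, cos]].
    The product keeps that pattern, so it is stored as four real numbers
    (a, b, c, d) standing for [[a, jb], [jc, d]].
    """
    k = 2.0 * math.pi * freq / C_SOUND   # wavenumber
    a, b, c, d = 1.0, 0.0, 0.0, 1.0      # identity matrix
    for area, length in zip(areas, lengths):
        z = RHO * C_SOUND / area         # characteristic impedance of the section
        cs, sn = math.cos(k * length), math.sin(k * length)
        a2, b2, c2, d2 = cs, z * sn, sn / z, cs
        a, b, c, d = (a * a2 - b * c2, a * b2 + b * d2,
                      c * a2 + d * c2, d * d2 - c * b2)
    return d

def formants(areas, lengths, fmax=5000.0, step=5.0):
    """Find resonances as zeros of D, assuming zero sound pressure at the lips."""
    found = []
    f_lo = 0.5 * step                    # offset grid so round frequencies are not sampled
    d_lo = chain_d(f_lo, areas, lengths)
    while f_lo + step <= fmax:
        f_hi = f_lo + step
        d_hi = chain_d(f_hi, areas, lengths)
        if d_lo * d_hi < 0.0:            # a sign change brackets a resonance
            lo, hi = f_lo, f_hi
            for _ in range(40):          # bisection refinement
                mid = 0.5 * (lo + hi)
                if chain_d(lo, areas, lengths) * chain_d(mid, areas, lengths) <= 0.0:
                    hi = mid
                else:
                    lo = mid
            found.append(0.5 * (lo + hi))
        f_lo, d_lo = f_hi, d_hi
    return found

# Uniform 17.5 cm tube split into 10 equal sections of 3 cm^2 each.
uniform = formants([3.0e-4] * 10, [0.0175] * 10, fmax=3000.0)
# uniform is close to [500, 1500, 2500] Hz, the odd multiples of c / (4 L)
```

A measured area function would be passed in the same way, as one list of section areas (in m^2) and one list of section lengths (in m), ordered from the glottis to the lips.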

In this study, we used the vocal tract area functions from Story's (/AA/, /IY/, /UW/, /AE/, /AO/) and Takemoto's (/a/, /i/, /u/, /e/, /o/) research to verify our vocal tract model against the corresponding vowels. Furthermore, we combined the Rosenberg signal and the two-mass model with the Maeda model and observed how the different glottal signals affected vowel production.

The results showed that both the Rosenberg signal and the two-mass model have low-pass filter characteristics. However, the spectrum of the two-mass model had more low-frequency energy and less high-frequency energy. In combination with the vocal tract model used in this study, both glottal signals were able to simulate English and Japanese vowel production. When they were used with the vocal tract portion of the DIVA (Directions Into Velocities of Articulators) model, however, they could not produce the correct Japanese vowels because of the formant frequency range limitation defined by the DIVA model.

In addition, we verified our articulatory model with the vocal tract area functions from Story's study (the number of vocal tract sections varies from 42 to 46 depending on the vowel) and found that the differences in the first three formant frequencies between the two studies were -7.4%, -2.58%, and -0.46%, respectively. The differences between our results and Takemoto's study (the number of vocal tract sections ranges from 68 to 75 depending on the vowel) were only -2.01%, 1.99%, and -0.75%, respectively. In summary, our model can simulate an individual's vowel production under normal circumstances based on physiological data from the literature, and the accuracy of the vowel simulation increases as the vocal tract is divided into more sections.
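The abstract does not spell out how the formant differences were averaged. One plausible reading, sketched below as an assumption, is a signed percentage error per formant averaged over the vowels. The function name and all numbers in the example are hypothetical placeholders, not data from the thesis.

```python
def mean_percent_error(simulated, reference):
    """Mean signed percentage error per formant, averaged over vowels.

    Both arguments map a vowel label to its (F1, F2, F3) values in Hz.
    """
    n_formants = 3
    totals = [0.0] * n_formants
    for vowel, ref in reference.items():
        sim = simulated[vowel]
        for i in range(n_formants):
            totals[i] += 100.0 * (sim[i] - ref[i]) / ref[i]
    return [total / len(reference) for total in totals]

# Hypothetical placeholder values, not data from the thesis.
reference = {"A": (500.0, 1500.0, 2500.0), "B": (300.0, 2300.0, 3000.0)}
simulated = {"A": (510.0, 1470.0, 2525.0), "B": (294.0, 2323.0, 3030.0)}
errors = mean_percent_error(simulated, reference)
```

Because the errors are signed, positive and negative deviations cancel in the mean, so a small mean error does not by itself show that every vowel matches closely.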

Keywords

★ Two-mass vocal fold model
★ Rosenberg signal model
★ Maeda vocal tract model
★ Speech production
★ Vocal tract area function

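Table of contents

Abstract (Chinese)
Abstract (English)
Acknowledgments
Table of Contents
List of Figures
List of Tables
Chapter 1 Introduction
1.1 Research motivation
1.2 Basic elements of speech production
1.3 Vocal fold movement
1.4 Articulators and formants
1.5 Literature review
1.6 Thesis organization
Chapter 2 Introduction to the speech production system
2.1 Speech organs
2.2 Relationship between speech sounds and vocal tract shape
2.3 Models of the speech signal
Chapter 3 Model description and derivation
3.1 Glottal model
3.1.1 Pressure distribution along the vocal folds
3.1.2 Properties of the springs in the vocal fold model
3.1.3 Energy loss of the vocal folds
3.2 Lossless single-section tube model
3.3 Lossless multi-section tube model
3.3.1 Signal flow at the lip end
3.3.2 Signal flow at the vocal fold end
3.4 Maeda model
Chapter 4 Results and discussion
4.1 Simulation of the vocal fold model
4.2 Simulation of the lossless tube model
4.2.1 Simulation of the single-section tube model
4.2.2 Simulation of the multi-section tube model
4.3 Simulation of the Maeda model
4.3.1 Comparison with Story: study and results
4.3.2 Comparison with Takemoto: study and results
4.3.3 Comparison with the DIVA model
4.3.4 Results and comparison for the two researchers
4.4 Comparison of sound spectra
4.4.1 Comparison between this thesis and Story
4.4.2 Comparison between this thesis and Takemoto
4.4.3 Two-mass model combined with the model of this thesis (Story's data)
Chapter 5 Conclusions and future work
5.1 Conclusions
5.2 Future work
References
Appendix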

References

References in English:

Birkholz, P. (2013). “Modeling consonant-vowel coarticulation for articulatory speech synthesis.” PLoS ONE 8(4): e60603.

Buchaillard, S., P. Perrier and Y. Payan (2009). “A biomechanical model of cardinal vowel production: Muscle activations and the impact of gravity on tongue positioning.” The Journal of the Acoustical Society of America 126(4): 2033-2051.

Dang, J. and K. Honda (1997). “Acoustic characteristics of the piriform fossa in models and humans.” The Journal of the Acoustical Society of America 101(1): 456-465.

Dang, J., K. Honda and H. Suzuki (1994). “Morphological and acoustical analysis of the nasal and the paranasal cavities.” The Journal of the Acoustical Society of America 96(4): 2088-2100.

Dunn, H. K., J. L. Flanagan and P. J. Gestrin (1962). “Complex zeros of a triangular approximation to the glottal wave.” The Journal of the Acoustical Society of America 34(12): 1977-1978.

Fant, G. (1972). “Vocal tract wall effects, losses, and resonance bandwidths.” Speech Transmission Laboratory Quarterly Progress and Status Report 2(3): 28-52.

Flanagan, J. L. (1965). Speech analysis, synthesis and perception. Springer-Verlag, Berlin, Germany.

Hillenbrand, J., L. A. Getty, M. J. Clark and K. Wheeler (1995). “Acoustic characteristics of American English vowels.” The Journal of the Acoustical Society of America 97(5): 3099-3111.

Honda, K., T. Kurita, Y. Kakita and S. Maeda (1995). “Physiology of the lips and modeling of lip gestures.” Journal of Phonetics 23(1): 243-254.

International Phonetic Association (1999). Handbook of the International Phonetic Association: A guide to the use of the International Phonetic Alphabet, Cambridge University Press, Cambridge.

Ishizaka, K. and T. Kaneko (1968). “On equivalent mechanical constants of the vocal cords.” The Journal of the Acoustical Society of Japan 24: 312-313.

Ishizaka, K. and J. L. Flanagan (1972). “Synthesis of Voiced Sounds From a Two-Mass Model of the Vocal Cords.” Bell System Technical Journal 51(6): 1233-1268.

Ladefoged, P. and D. E. Broadbent (1957). “Information conveyed by vowels.” The Journal of the Acoustical Society of America 29(1): 98-104.

LaMar, M. D., Y. Qi and J. Xin (2003). “Modeling vocal fold motion with a hydrodynamic semicontinuum model.” The Journal of the Acoustical Society of America 114(1): 455-464.

Lloyd, J. E., I. Stavness and S. Fels (2012). “ArtiSynth: a fast interactive biomechanical modeling toolkit combining multibody and finite element simulation.” In Yohan Payan (Ed.), Soft Tissue Biomechanical Modeling for Computer Assisted Surgery (pp. 355-394). Springer-Verlag, Berlin, Germany.

Maeda, S. (1982). “A digital simulation method of the vocal-tract system.” Speech Communication 1(3): 199-229.

Mokhtari, P., H. Takemoto and T. Kitamura (2008). “Single-matrix formulation of a time domain acoustic model of the vocal tract with side branches.” Speech Communication 50(3): 179-190.

Peterson, G. E. and H. L. Barney (1952). “Control methods used in a study of the vowels.” The Journal of the Acoustical Society of America 24(2): 175-184.

Pruthi, T. and C. Espy-Wilson (2007). “Acoustic parameters for the automatic detection of vowel nasalization.” Proceedings of Interspeech 2007, Antwerp, Belgium, Aug 27-31: 1925-1928.

Rosenberg, A. E. (1971). “Effect of glottal pulse shape on the quality of natural vowels.” The Journal of the Acoustical Society of America 49(2B): 583-590.

Stevens, K. N. and A. S. House (1956). “Studies of formant transitions using a vocal tract analog.” The Journal of the Acoustical Society of America 28(4): 578-585.

Story, B. H. and I. R. Titze (1995). “Voice simulation with a body-cover model of the vocal folds.” The Journal of the Acoustical Society of America 97(2): 1249-1260.

Story, B. H., I. R. Titze and E. A. Hoffman (1996). “Vocal tract area functions from magnetic resonance imaging.” The Journal of the Acoustical Society of America 100(1): 537-554.

Takemoto, H., K. Honda, S. Masaki, Y. Shimada and I. Fujimoto (2006). “Measurement of temporal changes in vocal tract area function from 3D cine-MRI data.” The Journal of the Acoustical Society of America 119(2): 1037-1049.

Titze, I. R. and B. H. Story (2002). “Rules for controlling low-dimensional vocal fold models with muscle activation.” The Journal of the Acoustical Society of America 112(3): 1064-1076.

Van den Berg, J., J. Zantema and P. Doornenbal Jr (1957). “On the air resistance and the Bernoulli effect of the human larynx.” The Journal of the Acoustical Society of America 29(5): 626-631.

Wei, J., J. Liu, Q. Fang, W. Lu, J. Dang and K. Honda (2015). “A Novel Method for Constructing 3D Geometric Articulatory Models.” Journal of Signal Processing Systems, DOI 10.1007/s11265-015-1002-8, 1-8.

Zhang, Z. and C. Y. Espy-Wilson (2004). “A vocal-tract model of American English /l/.” The Journal of the Acoustical Society of America 115(3): 1274-1280.

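Web references:
College of Santa Fe Auditory Theory. (2015). Accessed 2015/12/29.
http://www.feilding.net/sfuad/musi3012-01/html/

References in Chinese:
王小川 (2009). 語音訊號處理 [Speech Signal Processing], revised 2nd ed. 全華圖書股份有限公司 (Chuan Hwa Book Co.), New Taipei City, Taiwan.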

Advisor

Chao-Min Wu (吳炤民)

Approval date

2016-1-26