Thesis 102521601 Detailed Record




Author Dahnial Syauqy (蘇尼爾)   Department Electrical Engineering
Thesis title Feature Based Scoring on Visible Speech Diagnostic and Rehabilitation System
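  1. This electronic thesis is authorized for immediate open access.
  2. The open-access electronic full text is licensed to users only for personal, non-profit searching, reading, and printing for the purpose of academic research.
  3. Please comply with the Copyright Act of the Republic of China. Do not reproduce, distribute, adapt, repost, or broadcast the work without authorization, to avoid violating the law.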

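Abstract (Chinese) As people with disabilities have received growing attention in recent years, more and more people with articulation disorders seek treatment and rehabilitation for speech impairments. Developing a tool that speech therapists can use to support assessment and rehabilitation is therefore increasingly important. This study aims to develop a software-based visible speech diagnostic and rehabilitation system. Through its user interface, the user can record speech signals from a normal speaker and from a speaker with an articulation disorder, then compare the waveform, spectrum, spectrogram, and fundamental frequency of the two signals to obtain a quantitative, objective analysis. In addition, the system scores the speech of the speaker with an articulation disorder by comparing it with the normal speaker's speech. The features analyzed include pitch, vowel classification, voiced/unvoiced detection, fricative detection, and speech intensity. For pitch extraction, the cepstrum method and the Simplified Inverse Filter Tracking (SIFT) algorithm are used; for vowel classification, the K-Nearest Neighbors (K-NN) model and the Multilayer Perceptron (MLP) model are used. Voiced/unvoiced detection relies on pitch, short-time energy, and zero-crossing rate; fricative scoring relies on pitch and on the location and intensity of the spectral peaks. The automatic scoring function uses the Adaptive Signed Correlation Index (ASCI) to quantify the speech: it measures how similar the vowels produced by the speaker with an articulation disorder are to those of the normal speaker in pitch and voiced/unvoiced pattern, and uses the Euclidean distance between the two as the scoring criterion. Consonants are scored by comparing spectral peaks and voiced/unvoiced information. Finally, the system averages the vowel and consonant scores to give the user a quantitative result for the speech segment.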
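To evaluate the practicality, functionality, and accuracy of the system, this study analyzed eight sets of speech recordings (six adults and two children; four males and four females). The recordings were collected in a previous study in cooperation with the speech therapy teams of the rehabilitation departments at Taipei Veterans General Hospital and the Xinwu Branch of Taoyuan General Hospital, Ministry of Health and Welfare. Over a total of 2086 analyzed frames, the cepstrum method's pitch error rate of 5.32% was slightly lower than the SIFT method's 6.6%. For vowel classification, the MLP was slightly better than the K-NN model in both speed and accuracy (92.61% for men, 86.75% for women, and 83.75% for children, against 91.67%, 86.21%, and 80.69%). The system's fricative detection results agreed with the ratings of four raters (males, 23-27 years old) in 79.7% of cases (51 of 64 ratings); for the overall score, agreement was 81.25% (156 of 192 observations). Besides being simple to operate, the system provides professional-level analysis, as the experimental results indicate, and can serve as a tool for assessing, diagnosing, and rehabilitating articulation in people with speech disorders.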
Abstract (English) Because speech is one of the primary means of communication, the need for speech diagnosis and rehabilitation among patients with speech disorders is increasing, and so is the importance of systems that assist diagnosis and rehabilitation assessment. The purpose of this study is to develop a tool for speech therapy and rehabilitation with a simple interface, so that an assessment can be carried out without specialized knowledge of speech processing, while still providing deeper analysis of the speech that is useful to a speech therapist. The tool scores a patient's speech automatically by comparing it with a normal speaker's speech on several aspects: pitch, vowel classification, voiced-unvoiced detection, fricative detection, and sound intensity. To obtain accurate pitch estimates, this research compared two pitch tracking algorithms, the cepstrum method and the Simplified Inverse Filter Tracking (SIFT) method. It also compared two popular classification algorithms, K-Nearest Neighbors (K-NN) and Multilayer Perceptron (MLP), for classifying vowels based on pitch and formants. The voiced-unvoiced decision used pitch, short-time energy, and zero-crossing rate, while fricative detection used pitch and the location and intensity of spectral peaks. Automatic scoring then used the Adaptive Signed Correlation Index (ASCI) to quantify similarity in pitch contour and voiced/unvoiced detection. For vowel quality, the score was based on the Euclidean distance when the vowels of the two speakers fell into different classes.
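As a rough illustration of the signal features named above, the following sketch estimates pitch from the real cepstrum of one frame and computes short-time energy and zero-crossing rate. It is not the thesis implementation: the function names, the Hamming window, and the 60-400 Hz search range are assumptions chosen for the example.

```python
import numpy as np

def cepstrum_pitch(frame, fs, fmin=60.0, fmax=400.0):
    """Estimate the pitch (Hz) of one frame from its real cepstrum.

    fmin/fmax bound the search range; they are illustrative assumptions.
    """
    windowed = frame * np.hamming(len(frame))
    # Real cepstrum: inverse FFT of the log magnitude spectrum.
    log_mag = np.log(np.abs(np.fft.rfft(windowed)) + 1e-12)
    cepstrum = np.fft.irfft(log_mag)
    # Search only quefrencies that correspond to plausible pitch periods.
    qmin = int(fs / fmax)
    qmax = int(fs / fmin)
    peak = qmin + np.argmax(cepstrum[qmin:qmax + 1])
    return fs / peak

def short_time_energy(frame):
    """Mean squared amplitude of the frame."""
    return float(np.mean(np.square(frame)))

def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    signs = np.signbit(frame)
    return float(np.mean(signs[:-1] != signs[1:]))

# Synthetic voiced frame: harmonics of 200 Hz sampled at 16 kHz.
fs = 16000
t = np.arange(1024) / fs
frame = sum(np.sin(2 * np.pi * 200 * k * t) / k for k in range(1, 40))
pitch = cepstrum_pitch(frame, fs)
```

A voiced/unvoiced rule would then combine these three values with thresholds; the abstract does not give the thresholds, so none are shown here.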
For strident fricative detection, scoring was based on a distance metric applied to the spectral peak location of the fricative segments and on the voiced or voiceless classification. Finally, the overall score was computed as the average of all feature scores.
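The vowel classification and distance-based comparison described in this abstract can be sketched as follows. This is a minimal example under stated assumptions: the formant values are made-up illustrative numbers (not data from the thesis), the feature vector is reduced to (F1, F2), and the helper names are hypothetical. How the thesis maps a distance to a score is not stated in the abstract, so only the distance and the final averaging step are shown.

```python
from collections import Counter
import numpy as np

def knn_classify(query, train_x, train_y, k=3):
    """Label a feature vector by majority vote among its k nearest neighbors."""
    dists = np.linalg.norm(train_x - query, axis=1)  # Euclidean distances
    nearest = np.argsort(dists)[:k]
    votes = Counter(train_y[i] for i in nearest)
    return votes.most_common(1)[0][0]

def euclidean_distance(a, b):
    """Euclidean distance between two feature vectors."""
    return float(np.linalg.norm(np.asarray(a, float) - np.asarray(b, float)))

def overall_score(feature_scores):
    """Average of the per-feature scores, as the abstract describes."""
    return float(np.mean(feature_scores))

# Illustrative (F1, F2) training points in Hz; values are assumptions.
train_x = np.array([[850, 1200], [800, 1300],   # /a/
                    [300, 2300], [280, 2400],   # /i/
                    [350, 800], [320, 750]],    # /u/
                   dtype=float)
train_y = ["a", "a", "i", "i", "u", "u"]

label = knn_classify(np.array([780.0, 1250.0]), train_x, train_y, k=3)
```

With `k=3` the two /a/ points outvote the single nearest /u/ point, so the query is labeled `"a"`.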
To evaluate the performance and practicality of the system, this study analyzed speech recordings of 8 patients (6 adults and 2 children; 4 males and 4 females) that had been recorded in a previous study in cooperation with Taipei Veterans General Hospital and Tao Yuan General Hospital. In the pitch algorithm comparison, over a total of 2086 frames, the cepstrum method had a gross pitch error (GPE) of 5.32%, lower than the 6.6% of the SIFT method. For vowel classification, the MLP method gave better accuracy (92.61% for men, 86.75% for women, and 83.75% for children) than the K-NN method (91.67%, 86.21%, and 80.69%) and was up to 5 times faster in computation. For fricative detection, the tool's output was consistent with 51 of 64 audio observations (79.7%) made by 4 respondents (graduate students and laboratory members, males aged 23-27). Overall, of 192 audio and visual observations made by the 4 respondents, 156 were consistent with the tool's grading (81.25%). The experimental results also showed the benefit of the tool's simple and professional modes, which indicate differences in several aspects of speech between a normal speaker and a patient with a speech disorder, in assisting speech diagnosis and rehabilitation.
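The reported agreement figures can be checked with simple arithmetic, and a gross pitch error can be computed along the following lines. The 20% deviation threshold is a common convention and an assumption here; the abstract does not state which threshold the thesis used. The function names and the sample pitch values are hypothetical.

```python
import numpy as np

def agreement_percent(matches, total):
    """Percentage of observations on which the tool and the raters agree."""
    return 100.0 * matches / total

def gross_pitch_error(estimated, reference, tolerance=0.2):
    """Percentage of voiced frames whose estimate deviates from the
    reference pitch by more than `tolerance` (relative).

    Frames with reference pitch 0 are treated as unvoiced and skipped.
    The 20% default is an assumed convention, not taken from the thesis.
    """
    est = np.asarray(estimated, dtype=float)
    ref = np.asarray(reference, dtype=float)
    voiced = ref > 0
    gross = np.abs(est[voiced] - ref[voiced]) > tolerance * ref[voiced]
    return 100.0 * float(np.mean(gross))

fricative_agreement = agreement_percent(51, 64)    # reported as 79.7%
overall_agreement = agreement_percent(156, 192)    # reported as 81.25%

# Made-up pitch tracks (Hz): one of the three voiced frames is a gross error.
gpe = gross_pitch_error([105, 300, 120, 150], [100, 200, 0, 150])
```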
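Keywords (Chinese) ★ Articulation disorder
★ Speech therapy assistive tool
★ Multilayer perceptron
★ K-nearest neighbors model
★ Cepstrum algorithm
★ Simplified inverse filter tracking algorithm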
Keywords (English) ★ Speech disorder
★ computer assisted speech therapy
★ Multilayer Perceptron
★ K-Nearest Neighbor
★ cepstrum method
★ SIFT method
Table of Contents Abstract (Chinese)
Abstract (English)
Acknowledgement
List of Figures
List of Tables

Chapter I: Introduction
1.1 Background and motivation
1.2 Literature review
1.2.1 Speech disorders
1.2.2 Research on assisted speech rehabilitation
1.3 Objectives of the thesis
1.4 Thesis outline

Chapter II: Overview of Speech Acoustics and Processing
2.1 Spectral analysis of speech
2.1.1 Vowels
2.1.2 Monophthongs & diphthongs
2.1.3 Consonants
2.2 Speech processing
2.3 Linear Predictive Coding
2.3.1 Autocorrelation
2.3.2 Dynamic Time Warping
2.4 Pitch tracking algorithm
2.4.1 Cepstrum based pitch tracking
2.4.2 Simplified Inverse Filter Tracking
2.5 Vowel classification algorithm
2.5.1 K-Nearest Neighbors
2.5.2 Multilayer Perceptron
2.6 ASCI similarity index algorithm
2.7 Summary

Chapter III: Methodology and System Architecture Design
3.1 Introduction
3.2 Signal preprocessing
3.3 Pitch tracking
3.3.1 Cepstrum based
3.3.2 SIFT method
3.3.3 Pitch contour smoothing
3.4 Voiced-Unvoiced detection
3.5 Vowel classification
3.5.1 Database
3.5.2 K-Nearest Neighbors
3.5.3 Multilayer Perceptron
3.6 Fricatives detection
3.7 Similarity quantification
3.7.1 Pitch and VUS similarity
3.7.2 Vowel similarity
3.7.3 Strident fricative similarity
3.7.4 Loudness (sound intensity) similarity
3.7.5 Final overall scoring
3.8 System architecture and interface
3.9 Summary

Chapter IV: Results and Discussion
4.1 Introduction to speech disorder cases
4.2 Experimental procedures
4.3 Experimental results
4.3.1 Result of pitch tracking algorithm comparison
4.3.2 Result of vowel classification comparison
4.3.3 Result of overall system experiment
4.4 Discussion
4.5 Summary

Chapter V: Conclusion and Future Works
5.1 Conclusion
5.2 Future works

References
Web References
References Chen, C.L., Lin, K.C., Chen, C.H., Chen, C.C., Liu, W.Y., Chung, C.Y. (2010). Factors associated with motor speech control in children with spastic cerebral palsy. Chang Gung Medical Journal, vol. 33, 415–423.
Cherif, A., Bouafif, L., Dabbabi, T. (2001). Pitch detection and formant analysis of Arabic speech processing. Applied Acoustics, vol. 62 (10), 1129-1140.
D’Alatri, L., Paludetti, G., Contarino, M.F., Galla, S., Marchese, M.R., Bentivoglio, A.R.(2008). Effects of bilateral subthalamic nucleus stimulation and medication on Parkinsonian speech impairment. Journal of Voice, vol. 22, 365–372.
Delattre, P., Liberman, A. M., Cooper, F. S. (1955). Acoustic loci and transitional cues for consonants. Journal of the Acoustical Society of America, vol. 27, 769-774.
Dworkin, J.P., Meleca, R.J., Stachler, R.J. (2003). More on the role of the mandible in speech production: clinical correlates of Green, Moore, and Reilly’s (2002) findings. Journal of Speech, Language, and Hearing Research, vol. 46, 1016–1019.
Falk, T.H., Chan, W.Y., Shein, F. (2012). Characterization of atypical vocal source excitation, temporal dynamics and prosody for objective measurement of dysarthric word intelligibility. Speech Communication, vol. 54, Issue 5, 622–631.
Flynn, N., Foulkes, P. (2011). Comparing vowel formant normalization methods in Proc. of the 17th International Congress of Phonetic Sciences, Hong Kong, 683-686.
Gardner, M.W., Dorling, S.R. (1998). Artificial neural networks (the multilayer perceptron) a review of applications in the atmospheric sciences. Atmospheric Environment Vol. 32, 2627-2636.
Ghonim, A., Lim, J., Smith, J., Wolfe, J. (2013). The division of the perceptual vowel plane for different accent of English and the characteristic separation required to distinguish vowels. Journal of Acoustics Australia, vol. 41 no. 2, 160-164.
Glykas, M., Chytas P., (2004). Technology assisted speech and language therapy. International Journal of Medical Informatics, vol. 73, 529-541.
Golipour, L., O’Shaughnessy, D. (2009). Context-Independent phoneme recognition using K-Nearest Neighbor classification approach. In Proc. IEEE ICASSP, Taipei, 1341-1344.
Heeney, S.A. (1979). Speech and language disorder survey among adults in Christchurch. University of Canterbury.
Hillenbrand, J., Gayvert, R.T. (1993). Vowel classification based on fundamental frequency and formant frequencies. Journal of Speech and Hearing research, vol. 36, 694-700.
Hillenbrand, J., Getty, L. A., Clark, M. J., Wheeler, K. (1995). Acoustic characteristics of American English vowels. Journal of the Acoustical Society of America, vol. 97, 3099-3111.
Howell, P., Rosen, S. (1983). Production and perception of rise time in the voiceless affricate/fricative distinction. Journal of the Acoustical Society of America, vol. 73, 976-984.
Hsieh, P.H. (2013). Visible Speech Diagnostic and Rehabilitation System. National Central University: Taiwan.
Kadambe, S., Bartels, F.B. (1992). Application of the wavelet transform for pitch Detection of speech signals. IEEE Transactions on Information Theory, vol. 38, No. 2, 917-924.
Kataria, A., Singh, M.D. (2013). A review of data classification using k-nearest neighbour algorithm. International Journal of Emerging Technology and Advanced Engineering, vol. 3, 354-360.
Kent, R.D., Rosenbek, J.C.(1983). Acoustic patterns of apraxia of speech. Journal of Speech and Hearing Research, vol. 26, 231-248.
Kent, R.D., Kent, J.F., Weismer, G., Martin, R.E., Sufit, R.L., Brooks, B.R., Rosenbek, J.C. (1989). Relationships between speech intelligibility and the slope of second-formant transitions in dysarthric subjects. Clinical Linguistics & Phonetics, vol. 3, 347-358
Kent, R.D., Weismer, G., Kent, J.F., Vorperian, H.K., Duffy, J.R. (1999). Acoustic studies of dysarthric speech: methods, progress, and potential. J. Communication Disorders. vol. 32, 141-186.
Kent, R.D., Read, C. (2002). The acoustic analysis of speech, Thomson Learning: Albany, NY, USA.
Khan, T., Westin, J., Dougherty, M. (2014). Cepstral Separation Difference: a novel approach for speech impairment quantification in Parkinson's disease. Biocybernetics and Biomedical Engineering, vol. 34, Issue 1, 25–34.
Lian, J., Garner, G., Muessig, D., Lang, V. (2010). A simple method to quantify the morphological similarity between signals. Journal of Signal Processing, vol. 90, 684-688.
Liss, J.M., Spitzer, S., Caviness, J.N., Adler, C., Edwards, B. (1998). Syllabic strength and lexical boundary decisions in the perception of hypokinetic dysarthric speech, Journal of the Acoustical Society of America, vol. 104, No. 4, 2457-2466.
Liu, C.L, Lee, C.H., Lin, P.M. (2010). A fall detection system using K-Nearest Neighbor classifier. Expert Systems with Applications. vol. 37, 7174–7181.
Makhoul, J. (1975). Linear prediction: A Tutorial Review. In Proceedings of the IEEE, vol. 63, no.4, 561-580.
Maier, A., Haderlein, T., Eysholdt, U., Rosanowski, F., Batliner, A., Schuster, M., Noth, E. (2009). PEAKS - A system for the automatic evaluation of voice and speech disorders. Speech Communication, vol. 51, 425-437.
Markel, J.D. (1972). The SIFT algorithm for fundamental frequency estimation. IEEE Transactions on Audio and Electroacoustics, vol. AU-20, No.5, 367-377.
Marzban, C. and Stumpf, G. J. (1996). A neural network for tornado prediction based on doppler radar derived attributes. Journal of Applied Meteorology, vol. 35, 617-626.
Meng, Z., Chen, Y., Li, X. (2006). Statistical survey of monophthong formants in mandarin for students being trained as broadcasters, in Proceedings of the 20th Pacific Asia Conference on Language, Information and Computation, November 2006, Tsinghua University Press, 280 – 286.
Mulligan, M., Carpenter, J., Riddel, J., Delaney, M.K., Badger, G., Kruinski, P., Tandan, R. (1994). Intelligibility and the acoustic characteristics of speech in Amyotrophic Lateral Sclerosis (ALS). Journal of Speech and Hearing Research. vol. 37, 496-503.
Noll, A.M. (1967). Cepstrum pitch determination. Journal of the Acoustical Society of America, vol. 41, 293-309.
Oh, K. A., Un, C. K. (1984). A performance comparison of pitch extraction algorithms for noisy speech. In Proc. IEEE ICASSP, San Diego, CA, vol. 9, 85-88.
Owens Jr., R.E., Metz, D.E., Haas, A., (1999). Introduction to communication Disorders: a Life Span Perspective. Needham Heights: Allyn and Bacon.
Park, S. H., Kim, D. J., Lee, J. H., Yoon, T. S. (1994). Integrated speech training system for hearing impaired. IEEE Transactions on Rehabilitation Engineering, vol. 2, No. 4, 189-196.
Peterson, G. E., Barney, H. L. (1952). Control methods used in a study of the vowels, Journal of the Acoustical Society of America, vol. 24, 175-184.
Pinto, S., Ozsancak, C., Tripoliti, E., Thobois, S., Dowsey, P.L., Auzou, P. (2004). Treatments for dysarthria in Parkinson′s disease. Lancet Neurol, vol. 3 (9), 547–556.
Popovici, D.V., Belciu, C.B. (2012). Professional challenges in computer-assisted speech therapy. Procedia - Social and Behavioral Sciences, vol. 33, 518 – 522.
Qualls, C.D. (2012). Communication disorders in multicultural and international populations (Fourth Edition). Elsevier.
Rabiner, L.R., Cheng, M.J., Rosenberg, A.E., McGonegal, C.A. (1976). A comparative performance study of several pitch detection algorithms. IEEE Transactions on Acoustics, Speech, and Signal Processing, vol. 24 (5), 399-418.
Rabiner, L.R. (1977). On the use of autocorrelation analysis for pitch detection, IEEE Transactions on Acoustics, Speech and Signal Processing, vol. 25, No. 1, 24-33.
Rabiner, L.R., Juang, B.H. (1993). Fundamentals of speech recognition. Prentice Hall: Englewood Cliffs, NJ.
Robert, E.O. Jr., Metz, D.E., Haas, A. (2000). Introduction to communication disorders, Pearson Education: Needham Heights, MA.
Rodríguez, W.R., Saz, O., Lleida, E. (2012). A prelingual tool for the education of altered voices. Speech Communication, vol. 54, 583-600.
Rosenbek, J. (1980). Apraxia of Speech-relationship to Stuttering. Journal of Fluency Disorders. vol. 5, 233-253.
Roy, E.A.(1983). Current perspectives on disruptions to limb praxis. Physical Therapy. vol. 63, 1998-2003.
Rusz, J., Cmejla, R., Ruzickova, H., Ruzicka, E. (2011). Quantitative acoustic measurements for characterization of speech and voice disorders in early untreated Parkinson’s disease. Journal of the Acoustical Society of America, vol. 129, No. 1, 350-367.
Saini, I., Singh, D., Khosla, A. (2013). QRS detection using K-Nearest Neighbor algorithm (KNN) and evaluation on standard ECG databases. Journal of Advanced Research, vol 4, 331-344.
Saz, O., Yin, S.C., Lleida, E., Rose, R., Vaquero, C., Rodriguez, W.R. (2009). Tools and technologies for computer-aided speech and language therapy. Speech Communication, vol. 51, 948–967.
Stevens, K.N., House, A.S. (1955). Development of a quantitative description of vowel articulation. Journal of the Acoustical Society of America, vol. 27,484-493.
Surveillance of Cerebral Palsy in Europe (SCPE) (2002). Prevalence and characteristics of children with cerebral palsy in Europe. Dev Med Child Neurol, vol. 44, 633–640.
Welch, R.M., Sengupta, S.K., Goroch, A.K. (1992). Polar cloud and surface classification using AVHRR imagery: an intercomparison method. Journal of Applied Meteorology, vol. 31, 405-420.
Willis, G.L., Armstrong, S.M. (1998). Orphan neurones and amine excess: the functional neuropathology of Parkinsonism and neuropsychiatric disease. Brain Research Reviews, vol. 27, Issue 3, 177–242.
Yamada, Y., Javkin, H., Youdelman, K. (2000). Assistive speech technology for persons with speech impairments. Speech Communication, vol. 30, 179-187.

Web References

Boersma, P., Weenink, D. (2014). Praat: doing phonetics by computer [Computer program]. Version 5.3.75, retrieved 30 April 2014 from http://www.praat.org/
Hewlett-Packard (Hewlett-Packard Co., Palo Alto, California, USA) http://www8.hp.com/us/en/hp-information/index.html
MATLAB (The MathWorks, Natick, Massachusetts, USA): http://www.mathworks.com/products/matlab/
R.O.C. (Taiwan) Ministry of the Interior website, information as of December 31, 2011: http://www.moi.gov.tw/stat/index.aspx
Sensimetrics, SpeechStation2 (Malden, MA, USA): http://www.sens.com/
Synapse Adaptive, SpeechViewer III (San Rafael, CA, USA): http://www.synapseadaptive.com/profile.htm
Tiger DRS Inc., Dr. Speech (Seattle, WA, USA): http://www.drspeech.com/Distributors.html
Advisor Chao-Min Wu (吳炤民)   Approval date 2014-8-4