Thesis record 965202021: detailed information

Name: Zheng-lun Chen (陳正倫)    Department: Computer Science and Information Engineering
Thesis title: The Analysis and Recognition of Emotional Speech Using Artificial Neural Networks
(類神經網路應用於語音情緒的分析與辨識)
Related theses
★ A Q-learning-based swarm intelligence algorithm and its applications
★ Development of a rehabilitation system for children with developmental delays
★ Comparing teacher assessment and peer assessment from the perspective of cognitive style: from English writing to game design
★ A diabetic nephropathy prediction model based on laboratory test values
★ Design of a remote sensing image classifier based on fuzzy neural networks
★ A hybrid clustering algorithm
★ Development of assistive devices for people with disabilities
★ A study of fingerprint classifiers
★ A study of backlit image compensation and color quantization
★ Application of neural networks to audit case selection for business income tax
★ A new online learning system and its application to tax audit case selection
★ An eye-tracking system and its applications to human-machine interfaces
★ Data visualization combining swarm intelligence and self-organizing maps
★ Development of a pupil-tracking system and its human-machine interface applications for the disabled
★ An online-learning neuro-fuzzy system based on the artificial immune system and its applications
★ Application of genetic algorithms to voice descrambling
  1. Access to this electronic thesis: open access, effective immediately.
  2. The open-access electronic full text is licensed only for personal, non-profit searching, reading, and printing for the purpose of academic research.
  3. Please observe the Copyright Act of the Republic of China (Taiwan); do not reproduce, distribute, adapt, repost, or broadcast the work without authorization.

Abstract (Chinese) This thesis proposes multi-band linear predictive cepstral coefficients (MBLPCC) as a speech emotion feature. The discrete wavelet transform decomposes the signal into several subbands, and linear predictive coding (LPC) coefficients are extracted from the full band and from each subband. After analyzing MBLPCC under different parameter settings, we adopt two decomposition levels, order-10 LPC coefficients, and a downsampling ratio of 8. Combined with pitch and energy-contour features, this gives 52 features in total, from which Fisher's ratio selects 32 as the feature set of a seven-emotion speech emotion recognition system; the overall recognition rate reaches 90%.
Finally, this thesis compares three neural network recognizers: the multilayer perceptron (MLP), the radial basis function (RBF) network, and the fuzzy hyperrectangular composite neural network (FHRCNN). On the overall data set, the MLP attains the best recognition rate, above 90%; the FHRCNN reaches 100% on the training data; and the RBF network achieves 68% on the test data.
Abstract (English) This thesis presents a multi-band linear predictive cepstral coefficient (MBLPCC) feature for an emotional speech recognition system. Based on the discrete wavelet transform (DWT), the emotional speech is decomposed into several frequency subbands, and the LPCC of the lower-frequency subband is calculated at each decomposition level.
Furthermore, we analyze MBLPCC under different parameter settings and settle on two decomposition levels, 10 LPCC coefficients, and a downsampling ratio of eight. We then combine MBLPCC with pitch and energy-contour features, 52 features in total, and use Fisher's ratio to select 32 of them for a seven-emotion recognition system, which achieves an overall recognition rate of 90%.
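To make the feature-extraction pipeline concrete, the following is a minimal Python sketch of MBLPCC computation. It is an illustration under stated assumptions, not the thesis's implementation: the db4 wavelet, the autocorrelation-method LPC solver, and the placeholder input signal are assumptions, and the "downsampling ratio of eight" mentioned above is not modeled; only the two decomposition levels and the LPC order of 10 come from the abstract.

```python
# Minimal MBLPCC sketch: full-band LPCC plus the LPCC of the
# lower-frequency (approximation) subband at each DWT level.
# Assumptions: db4 wavelet, autocorrelation-method LPC.
import numpy as np
import pywt                       # PyWavelets
from scipy.linalg import solve_toeplitz

def lpc(x, order):
    """Order-p LPC by solving the Yule-Walker (autocorrelation) equations."""
    r = np.correlate(x, x, mode="full")[len(x) - 1:]
    return solve_toeplitz((r[:order], r[:order]), r[1:order + 1])

def lpc_to_lpcc(a, n_ceps):
    """Standard LPC-to-cepstrum recursion: c_n = a_n + sum_k (k/n) c_k a_{n-k}."""
    c = np.zeros(n_ceps)
    for n in range(1, n_ceps + 1):
        c[n - 1] = a[n - 1] + sum((k / n) * c[k - 1] * a[n - k - 1]
                                  for k in range(1, n))
    return c

def mblpcc(speech, levels=2, order=10, wavelet="db4"):
    """Concatenate LPCC of the full band and of each approximation subband."""
    feats = [lpc_to_lpcc(lpc(speech, order), order)]
    approx = speech
    for _ in range(levels):
        approx, _ = pywt.dwt(approx, wavelet)      # keep the low band only
        feats.append(lpc_to_lpcc(lpc(approx, order), order))
    return np.concatenate(feats)                   # (levels + 1) * order values

# Placeholder signal standing in for one utterance at 16 kHz.
speech = np.random.randn(16000)
print(mblpcc(speech).shape)                        # (30,)
```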
Finally, we compare three artificial neural network (ANN) recognizers: multilayer perceptrons (MLP), radial basis function (RBF) networks, and fuzzy hyperrectangular composite neural networks (FHRCNN). On the overall data set, the MLP achieves the best recognition rate, above 90%; the FHRCNN reaches 100% on the training set; and the RBF network achieves a 68% recognition rate on the test set.
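The selection and classification stages can be sketched in the same spirit. This is a hedged illustration rather than the thesis's code: the multiclass form of Fisher's ratio (between-class over within-class variance per feature), the random placeholder data, and scikit-learn's MLPClassifier standing in for the MLP (and for the RBF and FHRCNN recognizers, which have no stock scikit-learn equivalent) are all assumptions.

```python
# Hedged sketch: rank 52 features by a multiclass Fisher's ratio,
# keep the top 32, and train an MLP on them. Placeholder data only.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

def fisher_ratio(X, y):
    """Per-feature ratio of between-class to within-class variance."""
    mu = X.mean(axis=0)
    between = np.zeros(X.shape[1])
    within = np.zeros(X.shape[1])
    for c in np.unique(y):
        Xc = X[y == c]
        between += len(Xc) * (Xc.mean(axis=0) - mu) ** 2
        within += ((Xc - Xc.mean(axis=0)) ** 2).sum(axis=0)
    return between / (within + 1e-12)

# Placeholders for the 52-dimensional feature vectors and 7 emotion labels.
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 52))
y = rng.integers(0, 7, size=300)

top32 = np.argsort(fisher_ratio(X, y))[::-1][:32]  # keep the 32 best features
X_tr, X_te, y_tr, y_te = train_test_split(X[:, top32], y, test_size=0.3,
                                          random_state=0)
mlp = MLPClassifier(hidden_layer_sizes=(32,), max_iter=1000,
                    random_state=0).fit(X_tr, y_tr)
print("test accuracy:", mlp.score(X_te, y_te))
```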
Keywords (Chinese) ★ fuzzy hyperrectangular composite neural networks
★ artificial neural networks
★ Fisher's ratio
★ multi-band linear predictive cepstral coefficients
★ pitch
★ emotional speech recognition
Keywords (English) ★ emotional speech recognition
★ FHRCNN
★ MBLPCC
★ pitch
★ Fisher's ratio
★ artificial neural networks
Table of contents
Abstract (Chinese)
Abstract (English)
Acknowledgements
Contents
List of Figures
List of Tables
Chapter 1: Introduction
1-1 Motivation
1-2 Objectives
1-3 Thesis organization
Chapter 2: Related Work
2-1 Emotion classification
2-2 Emotional speech databases
2-3 Emotional speech features
2-4 Emotional speech recognition methods
2-5 Summary
Chapter 3: The Emotional Speech Recognition System
3-1 System architecture
3-2 Feature extraction
3-2-1 Pitch
3-2-2 Energy
3-2-3 Multi-band linear predictive cepstral coefficients
3-2-4 Feature selection
3-3 Recognition methods
3-3-1 Multilayer perceptrons
3-3-2 Radial basis function networks
3-3-3 Fuzzy hyperrectangular composite neural networks
Chapter 4: Experimental Results and Analysis
4-1 Emotional speech database
4-2 Feature analysis and comparison
4-2-1 Multi-band linear predictive cepstral coefficients
4-2-2 Comparison of MBLPCC with pitch and energy
4-3 Feature selection
4-4 Comparison of different neural networks
4-5 Summary
Chapter 5: Conclusions and Future Work
5-1 Conclusions
5-2 Future work
References
Appendix 1: Sentence list of the emotional speech database
Appendix 2: Feature index table
Advisor: Mu-Chun Su (蘇木春)    Date of approval: 2009-7-16