Design and Implementation of a Microphone Array Based Speaker Recognition System


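Name

游仁男 (Jen-Nan Yu)

Department

In-service Master Program, Department of Computer Science and Information Engineering

Thesis Title

Design and Implementation of a Microphone Array Based Speaker Recognition System
(original title: 基於麥克風陣列的語者辨識系統設計與實作)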



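This electronic thesis is authorized for immediate open access.
The open-access electronic full text is licensed to users only for personal, non-profit searching, reading, and printing for the purpose of academic research.
Please comply with the Copyright Act of the Republic of China. Do not reproduce, distribute, adapt, repost, or broadcast the work without authorization, to avoid violating the law.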

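Abstract (Chinese)

This study aims to improve on the performance of single-microphone speaker recognition systems by designing an embedded speaker recognition system based on a microphone array. The system is divided into four modules: microphone array sound signal acquisition, beamforming, speaker feature extraction, and speaker recognition. The sound signal acquisition module collects the speaker's voice through a circular microphone array built from micro-electro-mechanical systems (MEMS) microphones. The beamforming module uses multi-channel audio processing to enhance the speech signal and remove surrounding noise. In the speaker feature extraction module, linear predictive cepstral coefficients (LPCC) are used to represent the speaker's voice feature model. Finally, a probabilistic neural network (PNN) classifier performs speaker recognition. To validate the system, we built an experimental speaker voice database containing 120 recordings of the same utterance from twelve people. During the experiments, the recognition rate was optimized by training the PNN smoothing parameter and the beamforming parameters. The experimental results show that, compared with a single-microphone speaker recognition system, the microphone array based system reduces the equal error rate by about 10 percent.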
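The beamforming step described in the abstract can be illustrated with a delay-and-sum beamformer, which the thesis outline lists in Section 2.3.1. The following is a minimal sketch, not the thesis implementation: the function name `delay_and_sum`, the use of integer sample delays, the four-channel test signal, and the noise level are all assumptions made for illustration. In a real system the delays would come from a time-difference-of-arrival estimate rather than being known in advance.

```python
import numpy as np

def delay_and_sum(channels, delays):
    """Align each channel by its integer sample delay, then average.

    channels: array of shape (num_mics, num_samples)
    delays:   per-microphone arrival delay in samples (non-negative ints),
              relative to the microphone the sound reaches first
    (Hypothetical helper; the names and interface are illustrative.)
    """
    channels = np.asarray(channels, dtype=float)
    num_mics, num_samples = channels.shape
    out = np.zeros(num_samples)
    for mic, d in zip(channels, delays):
        # Advance the channel by d samples so every copy of the source lines up.
        out[: num_samples - d] += mic[d:]
    # Averaging keeps the aligned speech and attenuates uncorrelated noise.
    return out / num_mics

# Toy demonstration: one sine source, four delayed and noisy microphone signals.
rng = np.random.default_rng(0)
n = 2000
source = np.sin(2 * np.pi * 0.01 * np.arange(n))
delays = [0, 3, 5, 8]  # assumed known here
channels = np.stack([
    np.concatenate([np.zeros(d), source[: n - d]]) + 0.5 * rng.standard_normal(n)
    for d in delays
])
enhanced = delay_and_sum(channels, delays)

valid = n - max(delays)  # samples where all channels contribute
mse_single = np.mean((channels[0][:valid] - source[:valid]) ** 2)
mse_array = np.mean((enhanced[:valid] - source[:valid]) ** 2)
print("single-microphone MSE:", mse_single)
print("beamformed MSE:", mse_array)
```

With independent noise on each channel, averaging the aligned signals lowers the noise power roughly in proportion to the number of microphones, which is the intuition behind using an array instead of a single microphone.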

Abstract (English)

This study designs an embedded speaker identification system based on a microphone array in order to improve on the performance of single-microphone identification systems. The system is composed of four modules: sound signal acquisition from the microphone array, beamforming, speaker feature extraction, and speaker identification. The sound signal module collects the speaker's voice using a circular microphone array composed of micro-electro-mechanical systems (MEMS) microphones. The beamforming module enhances the speech signal and removes background noise through multi-channel sound processing. Linear predictive cepstral coefficients (LPCC) are used to represent the speaker's voice characteristics model, and a probabilistic neural network (PNN) classifier is used to identify the speaker. To validate the system, we built an experimental database of speaker recordings containing 120 recordings of the same utterance from twelve people. The recognition rate was optimized by tuning the PNN smoothing parameter and the beamforming parameters during training. The test results show that our microphone array based speaker identification system reduces the equal error rate by about 10% compared with a single-microphone system.
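The role of the PNN smoothing parameter mentioned in the abstract can be made concrete with a small classifier sketch. This is an illustration under stated assumptions, not the thesis code: the function name `pnn_classify`, the Gaussian kernel form, and the two-dimensional toy vectors (standing in for LPCC feature vectors) are all hypothetical.

```python
import numpy as np

def pnn_classify(x, train_features, train_labels, sigma):
    """Classify x with a basic probabilistic neural network.

    sigma is the smoothing parameter: the width of the Gaussian kernel
    placed on every training vector. (Hypothetical helper for illustration.)
    """
    x = np.asarray(x, dtype=float)
    train_features = np.asarray(train_features, dtype=float)
    train_labels = np.asarray(train_labels)
    # Pattern layer: one Gaussian kernel per training vector.
    sq_dist = np.sum((train_features - x) ** 2, axis=1)
    activations = np.exp(-sq_dist / (2.0 * sigma ** 2))
    # Summation layer: average the kernel activations within each class.
    classes = np.unique(train_labels)
    scores = np.array([activations[train_labels == c].mean() for c in classes])
    # Decision layer: choose the class with the largest estimated density.
    best = classes[np.argmax(scores)]
    return best, dict(zip(classes.tolist(), scores.tolist()))

# Toy data: two "speakers" with three feature vectors each (assumed values).
features = [[0.0, 0.0], [0.5, 0.2], [-0.3, 0.1],
            [5.0, 5.0], [5.2, 4.7], [4.8, 5.3]]
labels = [0, 0, 0, 1, 1, 1]

speaker, class_scores = pnn_classify([0.2, -0.1], features, labels, sigma=1.0)
print("predicted speaker:", speaker)
print("class scores:", class_scores)
```

A small `sigma` makes the decision depend mostly on the nearest training vectors, while a large `sigma` smooths the class densities. This trade-off is why the smoothing parameter is a natural quantity to tune against recognition rate.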

Keywords

★ Speaker Recognition (語者辨識)
★ Microphone Array (麥克風陣列)
★ Probabilistic Neural Network (機率神經網路)

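Table of Contents

Chapter 1 Introduction
1.1 Research Motivation
1.2 Literature Review
1.3 Thesis Organization
Chapter 2 MEMS Microphone Array Beamforming
2.1 MEMS Microphones
2.1.1 Principle of MEMS Microphones
2.1.2 Types of MEMS Microphones
2.1.3 Microphone Directivity
2.2 Microphone Arrays
2.2.1 Linear Microphone Arrays
2.2.2 Circular Microphone Arrays
2.3 Beamforming Algorithms
2.3.1 Delay-and-Sum Beamformer
2.3.2 Estimating TDOA (Time Difference of Arrival) with GCC-PHAT
2.4 Sound Source Direction Estimation Algorithms
2.4.1 TDOA-Based Sound Source Direction Estimation
2.5 Feature Extraction
2.5.1 Preprocessing
2.5.2 Linear Predictive Cepstral Coefficients (LPCC)
2.6 Probabilistic Neural Network (PNN) Classifier
2.6.1 PNN Architecture
Chapter 3 Microphone Array Speaker Recognition System
3.1 System Architecture
3.1.1 Sound Signal Acquisition
3.1.2 Beamforming
3.1.3 Speech Feature Extraction
3.1.4 Speaker Recognition
3.2 Discrete-Event System Modeling
3.2.1 Modeling the Microphone Array Speaker Recognition System
3.2.2 Modeling Sound Signal Acquisition
3.2.3 Modeling Beamforming
3.2.4 Modeling Speech Feature Extraction
3.2.5 Modeling Speaker Recognition
3.2.6 Main States and Actions
3.3 Software Synthesis
3.3.1 Software Synthesis of the Microphone Array Speaker Recognition System Model
3.3.2 Software Synthesis of the Sound Signal Acquisition Model
3.3.3 Software Synthesis of the Beamforming Model
3.3.4 Software Synthesis of the Speech Feature Extraction Model
3.3.5 Software Synthesis of the Speaker Recognition Model
3.3.6 Software Simulation
Chapter 4 System Integration Experiments and Validation
4.1 Experimental Environment
4.1.1 Overview of the STM32F429 Discovery Board Specifications
4.1.2 Overview of the MEMS Microphone Specifications
4.2 Experiments
4.2.1 Subject Data Collection
4.2.2 Training of Samples and Parameters for the Microphone Array Speaker Recognition System
4.3 Speaker Recognition Performance Evaluation
4.3.1 Speaker Recognition Performance with a Single Microphone
4.3.2 Speaker Recognition Performance with the Microphone Array
4.4 Experimental Results and Discussion
Chapter 5 Conclusion
References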


Advisor

陳慶瀚 (Ching-Han Chen)

Approval Date

2017-7-24
