A Sound Source Localization System Based on Generalized Cross Correlation

Name

陳禹興 (Yu-Shing Chen)

Department

Department of Electrical Engineering

Thesis title

A Sound Source Localization System Based on Generalized Cross Correlation
(基於廣義交互相關函數之聲源方位偵測系統)

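This electronic thesis is approved for immediate open access.
The open-access full text is licensed to users only for personal, non-profit searching, reading, and printing for the purpose of academic research.
Please comply with the Copyright Act of the Republic of China; do not reproduce, distribute, adapt, repost, or broadcast the work without authorization, to avoid violating the law.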

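Abstract (translated from the Chinese)

This thesis builds a sound source direction detection system around a microphone pair formed by two microphones. The computation is carried out on an embedded system (TMS320VC5509A) and comprises voice activity detection (VAD), time delay of arrival (TDOA) estimation, and azimuth estimation. The VAD stage combines log-energy analysis with spectral entropy analysis to improve decision accuracy and reduce the total computation. The time delay is estimated with the generalized cross-correlation (GCC) function, and parabolic interpolation is applied to improve the accuracy of the estimate. The azimuth is obtained from the hyperbola defined by the microphone pair and the sound source; the asymptote of this hyperbola is taken as the direction of the source.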
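The asymptote step can be illustrated with a short sketch. For two microphones a distance d apart, a delay τ places the source on a hyperbola whose foci are the microphones. Far from the array the hyperbola approaches its asymptote, whose angle θ to the microphone axis satisfies cos θ = c·τ/d, where c is the speed of sound. The function name, the 343 m/s speed of sound, and the 0.2 m spacing below are assumptions made for illustration; they are not values taken from the thesis.

```python
import math


def doa_from_tdoa(tdoa_s, mic_spacing_m, speed_of_sound=343.0):
    """Far-field direction of arrival in degrees.

    tdoa_s is the arrival time at microphone 2 minus the arrival time at
    microphone 1. The returned angle is measured from the axis that points
    from the array centre toward microphone 1 (0 = end-fire on the
    microphone 1 side, 90 = broadside).
    """
    ratio = speed_of_sound * tdoa_s / mic_spacing_m
    # Measurement noise can push |ratio| slightly above 1, so clamp it
    # before taking the arc cosine.
    ratio = max(-1.0, min(1.0, ratio))
    return math.degrees(math.acos(ratio))


# Hypothetical example: a delay equal to half the maximum possible delay
# for a 0.2 m microphone spacing.
angle_deg = doa_from_tdoa(0.5 * 0.2 / 343.0, 0.2)
print(angle_deg)
```

A single microphone pair cannot tell the two sides of its own axis apart, so an angle obtained this way is ambiguous by mirror symmetry about that axis.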
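The experimental part of the thesis also discusses and compares several time delay estimation methods: the average magnitude difference function (AMDF), the least squares method (Least Median of Square, LMS), the cross-correlation (CC) function, and the generalized cross-correlation function.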
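As a rough illustration of two of the compared estimators, the sketch below implements integer-lag AMDF and cross-correlation delay estimation in plain Python. It is a minimal sketch under assumed conventions, not the thesis implementation: the function names are hypothetical, a positive lag means the second channel is the later one, and the LMS and GCC variants are left out.

```python
import math


def amdf_delay(x1, x2, max_lag):
    """Lag (in samples) that minimizes the average magnitude difference."""
    def amdf(lag):
        pairs = [(x1[n], x2[n + lag])
                 for n in range(len(x1)) if 0 <= n + lag < len(x2)]
        return sum(abs(a - b) for a, b in pairs) / len(pairs)
    return min(range(-max_lag, max_lag + 1), key=amdf)


def cc_delay(x1, x2, max_lag):
    """Lag (in samples) that maximizes the cross-correlation."""
    def cc(lag):
        return sum(x1[n] * x2[n + lag]
                   for n in range(len(x1)) if 0 <= n + lag < len(x2))
    return max(range(-max_lag, max_lag + 1), key=cc)


# Deterministic, noise-free test signal and a copy delayed by 3 samples.
signal = [math.sin(0.9 * k * k) for k in range(64)]
delayed = [0.0] * 3 + signal

print(amdf_delay(signal, delayed, max_lag=8))
print(cc_delay(signal, delayed, max_lag=8))
```

AMDF needs only subtractions and absolute values, while cross-correlation needs multiplications; both search the lag range exhaustively here, which is adequate for the small lag range that a closely spaced microphone pair allows.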

Abstract (English)

In this thesis, a sound source localization system is studied and implemented. On the hardware side, a microphone array composed of two microphones captures the voice signals, and the computation is performed by an embedded system (TMS320VC5509A). On the software side, voice activity detection (VAD), time delay of arrival (TDOA) estimation, and direction detection are executed in that order. In the VAD stage, the log-energy and the spectral entropy are combined to distinguish speech frames from non-speech frames. To estimate the TDOA, an approximation algorithm is used to compute a generalized cross-correlation function, and a parabolic-interpolation-based method is then applied to increase the accuracy of the estimated TDOA values. The TDOA and the speed of sound in air are then used to find the direction of the sound source.
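The delay estimation stage described above can be sketched as follows. The abstract does not say which GCC weighting or which approximation the thesis uses, so this example assumes the phase transform (PHAT) weighting and a textbook radix-2 FFT written out only to keep the block self-contained. The function names and the test signal are hypothetical.

```python
import cmath
import math


def fft(x, inverse=False):
    """Recursive radix-2 FFT (unnormalized); len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return list(x)
    even = fft(x[0::2], inverse)
    odd = fft(x[1::2], inverse)
    sign = 1.0 if inverse else -1.0
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(sign * 2j * math.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out


def gcc_phat_delay(x1, x2, max_lag):
    """Delay of x2 relative to x1 in samples (positive: x2 is later)."""
    n = 1
    while n < len(x1) + len(x2):
        n *= 2  # zero-pad so the circular correlation does not wrap around
    X1 = fft(list(x1) + [0.0] * (n - len(x1)))
    X2 = fft(list(x2) + [0.0] * (n - len(x2)))
    weighted = []
    for a, b in zip(X1, X2):
        g = b * a.conjugate()  # cross-power spectrum
        mag = abs(g)
        # PHAT weighting: keep only the phase of each frequency bin.
        weighted.append(g / mag if mag > 1e-12 else 0j)
    r = [v.real / n for v in fft(weighted, inverse=True)]
    lags = list(range(-max_lag, max_lag + 1))
    corr = [r[lag % n] for lag in lags]
    i = max(range(len(corr)), key=corr.__getitem__)
    frac = 0.0
    if 0 < i < len(corr) - 1:
        # Parabolic interpolation through the peak and its two neighbours.
        y0, y1, y2 = corr[i - 1], corr[i], corr[i + 1]
        denom = y0 - 2.0 * y1 + y2
        if denom != 0.0:
            frac = 0.5 * (y0 - y2) / denom
    return lags[i] + frac


# Deterministic, noise-free test signal and a copy delayed by 3 samples.
signal = [math.sin(0.9 * k * k) for k in range(64)]
delayed = [0.0] * 3 + signal
estimated = gcc_phat_delay(signal, delayed, max_lag=8)
print(estimated)
```

For this noise-free integer delay the estimate should land very close to 3 samples. Dividing the result by the sampling rate gives the TDOA in seconds, and the parabolic step matters mainly when the true delay falls between two samples.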

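Keywords (translated from the Chinese)

★ Time delay estimation
★ Sound source direction detection
★ Generalized cross-correlation function

Keywords (English)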

★ Time Delay of Arrival
★ Sound Source Localization
★ Generalized Cross Correlation

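Table of Contents

Chinese Abstract
English Abstract
Acknowledgments
Contents
List of Figures
List of Tables
Chapter 1 Introduction
1.1 Preface
1.2 Research Background and Literature Review
1.3 Research Motivation and Methods
1.4 Thesis Organization
Chapter 2 System, Software, and Hardware Architecture
2.1 Microphone Amplifier and Filter Circuit
2.2 Embedded System
2.3 System Flow
Chapter 3 Voice Activity Detection
3.1 Introduction to Voice Activity Detection
3.2 Log-Energy Analysis
3.3 Spectral Entropy Analysis
Chapter 4 Time Delay Estimation and Direction Detection
4.1 Time Delay Estimation
4.1.1 Average Magnitude Difference Function
4.1.2 Least Squares Method
4.1.3 Cross-Correlation Function
4.1.4 Generalized Cross-Correlation Function
4.2 Direction Estimation
Chapter 5 Simulations, Experimental Results, and Discussion
5.1 Voice Activity Detection Experiments
5.2 Time Delay Estimation Experiments
5.3 Sound Source Direction Estimation Experiments
Chapter 6 Conclusions and Suggestions
6.1 Conclusions
6.2 Suggestions
References
Appendix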

Advisor

鍾鴻源 (Hung-Yuan Chung)

Approval date

2012-7-25