

    Please use this permanent URL to cite or link to this item: http://ir.lib.ncu.edu.tw/handle/987654321/95716


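    Title: The application of deep learning in microphone array single-source tracking systems (深度學習應用於麥克風陣列單聲源追蹤系統)
    Author: PENG, Kuan-Ming (彭冠銘)
    Contributor: Department of Electrical Engineering
    Keywords: microphone array; sound source tracking
    Date: 2024-07-25
    Upload time: 2024-10-09 17:11:35 (UTC+8)
    Publisher: National Central University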
    Abstract: Since the outbreak of the COVID-19 pandemic, demand for remote video conferencing, and for the products that support it, has surged. As technology has advanced, new auxiliary products have been introduced continuously to make remote meetings more efficient. Whether a meeting is being recorded or held remotely, participants often have to check that the speaker is within the camera's frame, which lowers meeting efficiency. Applying a sound source tracking system to modern meeting scenarios can improve the quality and efficiency of meetings.
    This study pairs a microphone array with a camera to build a sound source tracking device suited to meeting scenarios. Using Python and the LOCATA dataset, which was recorded under real conditions, various microphone array geometries were analyzed for localization accuracy, computation time, and compactness, and an octahedral microphone array was selected. The array is combined with three commonly used sound source localization algorithms, Minimum Power Distortionless Response (MPDR), Steered Response Power Phase Transform (SRP-PHAT), and Multiple Signal Classification (MUSIC), and with three deep learning-based localization algorithms that maintain good performance in reverberant and noisy environments: Cross3D, IcoDOA, and Neural-SRP.
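As a rough illustration of how one of the classical methods named above works, the following is a minimal far-field SRP-PHAT sketch in Python with NumPy. It is not the thesis's implementation: the six-microphone layout at the octahedron's vertices, the 5 cm radius, the 16 kHz sampling rate, the grid resolution, and all function names are assumptions chosen for the example, and the test signal is noise-free synthetic white noise, not LOCATA data.

```python
import itertools
import numpy as np

SPEED_OF_SOUND = 343.0  # m/s, assumed room-temperature value


def octahedron_mics(radius):
    """Assumed layout: one microphone at each of the 6 octahedron vertices."""
    axes = np.array([[1, 0, 0], [0, 1, 0], [0, 0, 1]], dtype=float)
    return radius * np.vstack([axes, -axes])


def unit_vector(az_deg, el_deg):
    """Unit vector pointing from the array centre towards (azimuth, elevation)."""
    az, el = np.radians(az_deg), np.radians(el_deg)
    return np.array([np.cos(el) * np.cos(az), np.cos(el) * np.sin(az), np.sin(el)])


def arrival_times(mics, az_deg, el_deg):
    """Far-field arrival time at each microphone relative to the array centre."""
    return -(mics @ unit_vector(az_deg, el_deg)) / SPEED_OF_SOUND


def simulate_frame(mics, az_deg, el_deg, fs, n_samples, seed=0):
    """Noise-free white-noise frame with exact fractional delays (test signal only)."""
    rng = np.random.default_rng(seed)
    spectrum = np.fft.rfft(rng.standard_normal(n_samples))
    spectrum[-1] = 0.0  # drop the Nyquist bin so the fractional delays stay exact
    freqs = np.fft.rfftfreq(n_samples, d=1.0 / fs)
    delays = arrival_times(mics, az_deg, el_deg)
    shifted = spectrum[None, :] * np.exp(-2j * np.pi * freqs[None, :] * delays[:, None])
    return np.fft.irfft(shifted, n=n_samples, axis=1)


def srp_phat(frame, mics, fs, az_grid, el_grid):
    """Return the (azimuth, elevation) grid point with the highest steered power."""
    spectra = np.fft.rfft(frame, axis=1)
    freqs = np.fft.rfftfreq(frame.shape[1], d=1.0 / fs)
    first, second = map(np.array, zip(*itertools.combinations(range(len(mics)), 2)))
    cross = spectra[first] * np.conj(spectra[second])
    cross /= np.abs(cross) + 1e-12  # PHAT weighting: keep phase, discard magnitude
    best_power, best_direction = -np.inf, None
    for az in az_grid:
        for el in el_grid:
            t = arrival_times(mics, az, el)
            tdoa = t[first] - t[second]  # expected delay for every microphone pair
            steering = np.exp(2j * np.pi * freqs[None, :] * tdoa[:, None])
            power = np.real(np.sum(cross * steering))
            if power > best_power:
                best_power, best_direction = power, (az, el)
    return best_direction


mics = octahedron_mics(radius=0.05)
frame = simulate_frame(mics, az_deg=60, el_deg=30, fs=16000, n_samples=512)
estimate = srp_phat(frame, mics, fs=16000,
                    az_grid=range(0, 360, 5), el_grid=range(-90, 91, 15))
print(estimate)
```

In this noise-free setup the peak should land on the simulated direction (60 degrees azimuth, 30 degrees elevation), because that grid point aligns the phase of every microphone pair at every frequency. Reverberation and noise, the conditions studied in the thesis, blur this peak, which is the motivation for the deep learning-based alternatives.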
    The study simulates and analyzes indoor reverberation and noise, two conditions that degrade sound source localization. It also considers computation speed, since real-time tracking requires algorithms fast enough to avoid introducing delay. Both IcoDOA and Neural-SRP kept localization errors within 10 degrees at signal-to-noise ratios (SNR) from 5 dB to 30 dB and at reverberation times (RT60) from 0.2 s to 1 s. IcoDOA had the shortest computation time, averaging only 2.067 milliseconds per frame.
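The localization errors above are angular. One common way to measure such an error, shown here as a sketch and not necessarily the metric used in the thesis, is the great-circle angle between the estimated and true directions of arrival. The function names and the azimuth/elevation convention are assumptions made for the example.

```python
import numpy as np


def direction(az_deg, el_deg):
    """Unit vector for an (azimuth, elevation) pair given in degrees."""
    az, el = np.radians(az_deg), np.radians(el_deg)
    return np.array([np.cos(el) * np.cos(az), np.cos(el) * np.sin(az), np.sin(el)])


def angular_error_deg(est, ref):
    """Angle in degrees between two (azimuth, elevation) directions."""
    a, b = direction(*est), direction(*ref)
    # atan2(|a x b|, a . b) stays accurate for very small and very large angles
    return float(np.degrees(np.arctan2(np.linalg.norm(np.cross(a, b)), np.dot(a, b))))


def mean_angular_error_deg(estimates, references):
    """Average angular error over a sequence of frames."""
    errors = [angular_error_deg(e, r) for e, r in zip(estimates, references)]
    return float(np.mean(errors))


# Hypothetical estimate and ground truth, 10 degrees apart in azimuth.
print(angular_error_deg((100, 0), (90, 0)))
```

Measuring the angle between unit vectors avoids the wrap-around and pole problems that arise when azimuth and elevation differences are compared separately.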
    The final system therefore uses an octahedral microphone array with the IcoDOA algorithm. In a simulated meeting scenario with a single sound source playing a speech signal, the source stays within the camera's field of view 91.11% of the time; when the source plays music instead, it stays within the field of view 87.77% of the time.
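A percentage of this kind can be computed per frame, as in the sketch below. The abstract does not state how the thesis measured it, so the horizontal-only check, the 60 degree field of view, the function names, and the sample azimuths are all hypothetical.

```python
import numpy as np


def wrapped_difference_deg(a_deg, b_deg):
    """Signed difference a - b in degrees, wrapped into [-180, 180)."""
    a = np.asarray(a_deg, dtype=float)
    b = np.asarray(b_deg, dtype=float)
    return (a - b + 180.0) % 360.0 - 180.0


def percent_in_view(source_az_deg, camera_az_deg, horizontal_fov_deg):
    """Percentage of frames with the source inside the camera's horizontal field of view."""
    offset = wrapped_difference_deg(source_az_deg, camera_az_deg)
    in_view = np.abs(offset) <= horizontal_fov_deg / 2.0
    return 100.0 * float(np.mean(in_view))


# Hypothetical per-frame azimuths in degrees; not data from the thesis.
source = [0, 10, 350, 90]
camera = [0, 0, 0, 0]
coverage = percent_in_view(source, camera, horizontal_fov_deg=60)
print(coverage)
```

With these made-up values, three of the four frames fall inside the 60 degree field of view, giving 75.0. The wrapped difference matters because a source at 350 degrees is only 10 degrees away from a camera pointing at 0 degrees.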
    Appears in collections: [Graduate Institute of Electrical Engineering] Master's and doctoral theses

    Files in this item:

    File         Description   Size   Format   Views
    index.html                 0Kb    HTML     29

