NCU Institutional Repository: Item 987654321/86328


    Please use this permanent URL to cite or link to this item: http://ir.lib.ncu.edu.tw/handle/987654321/86328


    Title: Sound Event Localization and Detection Based on Time-Frequency Separable Convolutional Compression Network
    Authors: Yang, Shih-Tsung (楊世宗)
    Contributors: Department of Communication Engineering
    Keywords: sound event localization and detection; time-frequency separable convolutional compression network; multi-head self-attention; dual-branch tracking
    Date: 2021-07-16
    Date uploaded: 2021-12-07 12:33:21 (UTC+8)
    Publisher: National Central University
    Abstract: As artificial intelligence technology becomes increasingly widespread, more and more fields are studying how machines can replace or assist human effort. In the audio domain, sound event localization and detection is one such task. Recent research applies deep learning to give machines a hearing ability comparable to the human ear, so that they can identify the various sudden sound events occurring in an environment along with their locations and movement trajectories.
    This thesis proposes the Time-Frequency Separable Convolutional Compression Network (TFSCCN) as a system architecture for sound event localization and detection. Using 1-D convolution kernels of different sizes, it extracts features from the time and frequency components separately, capturing the frequency distributions of concurrent sound events as well as each event's duration and its changes in phase or delay over continuous time. By controlling when the channel count is reduced and expanded, the model's parameter count is cut substantially. The model further incorporates multi-head self-attention to capture both global and local information in the time-series features, and uses a dual-branch tracking technique to effectively localize and detect overlapping sound events of the same or different classes.
    Experimental results on the DCASE 2020 Task 3 evaluation metrics show that, compared with the baseline, the detection error rate drops by 37% and the localization error by 14°. Moreover, compared with other network models built to reduce parameter counts, TFSCCN not only has the fewest parameters but also achieves the best sound event localization and detection performance.
    Appears in Collections: [Graduate Institute of Communication Engineering] Master's and Doctoral Theses



    All items in NCUIR are protected by copyright.
