NCU Institutional Repository (中大機構典藏): theses and dissertations, past exam papers, journal articles, and research projects available for download: Item 987654321/90826


    Please use this permanent URL to cite or link to this item: http://ir.lib.ncu.edu.tw/handle/987654321/90826


    Title: Deep Learning for Single-Channel Speech Separation (單通道語音分離的深度學習)
    Author: Tan, Ha Minh (何銘津)
    Contributor: Department of Computer Science and Information Engineering (資訊工程學系)
    Keywords: deep learning; single-channel acoustic decomposition; discrimination-vector learning; time domain audio separation; lightweight network
    Date: 2023-02-03
    Upload time: 2023-05-09 18:07:17 (UTC+8)
    Publisher: National Central University (國立中央大學)
    Abstract: This dissertation addresses single-channel speech separation using deep neural networks (DNNs). We take three different approaches. First, we address single-channel source separation in the frequency-to-time domain. In this domain, models based on embedding vectors, such as deep clustering, have achieved breakthrough success. Inspired by deep clustering, we develop a new framework, Encoder Squash-norm Deep Clustering (ESDC). Compared with current methods, including deep clustering, the deep extractor network (DENet), the deep attractor network (DANet), and several updated versions of deep clustering, the results show that the proposed framework significantly reduces the performance cost of single-channel acoustic decomposition. Second, we propose time-domain monaural acoustic decomposition built on the inter-segment and intra-segment architecture of the dual-path recurrent neural network (DPRNN), which achieves cutting-edge performance and can model extremely long sequences. We introduce a new training method, selective mutual learning (SML). In SML, two DPRNNs exchange knowledge and learn from one another. In particular, each network is guided by its peer's high-confidence predictions, while low-confidence predictions are disregarded. According to the experimental results, SML greatly outperforms other training methods that use the same model design, such as independent training, knowledge distillation, and mutual learning. Finally, we introduce a lightweight yet effective network for speech separation, SeliNet. SeliNet is a one-dimensional convolutional architecture that employs bottleneck modules and atrous temporal pyramid pooling. The experimental results show that SeliNet obtains state-of-the-art (SOTA) performance while requiring only a small number of floating-point operations (FLOPs) and a small model size.
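The selective mutual learning idea summarized in the abstract can be sketched roughly as follows. This is a minimal illustration, not the thesis implementation. It assumes that "confidence" is measured by the scale-invariant signal-to-noise ratio (SI-SNR) of the peer network's estimate against the reference signal, and that a fixed threshold decides whether the peer's estimate is used. The names `si_snr`, `selective_mutual_loss`, `threshold_db`, and `weight` are hypothetical and do not appear in this record.

```python
import numpy as np


def si_snr(estimate, target, eps=1e-8):
    """Scale-invariant signal-to-noise ratio (dB) between two 1-D signals."""
    estimate = estimate - estimate.mean()
    target = target - target.mean()
    # Project the estimate onto the target to obtain the scaled reference.
    scale = np.dot(estimate, target) / (np.dot(target, target) + eps)
    projection = scale * target
    noise = estimate - projection
    ratio = (np.dot(projection, projection) + eps) / (np.dot(noise, noise) + eps)
    return 10.0 * np.log10(ratio)


def selective_mutual_loss(own_est, peer_est, target, threshold_db=10.0, weight=0.5):
    """Loss for one of the two networks (assumed form, for illustration only).

    The supervised term is always applied. The imitation term is added only
    when the peer's estimate is "high confidence", which is assumed here to
    mean that its SI-SNR against the reference reaches threshold_db.
    """
    supervised = -si_snr(own_est, target)
    peer_confidence = si_snr(peer_est, target)
    if peer_confidence >= threshold_db:
        # High-confidence peer prediction: learn from it as well.
        mimic = -si_snr(own_est, peer_est)
        return supervised + weight * mimic
    # Low-confidence peer prediction: disregard it.
    return supervised
```

In a mutual setup, the same function would be evaluated twice per batch with the roles of the two networks swapped, so that each network is trained on the reference signal and, selectively, on its peer's output.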
    Appears in Collections: [Graduate Institute of Computer Science and Information Engineering] Theses and Dissertations

    Files in This Item:

    File        Description  Size  Format  Views
    index.html               0Kb   HTML    56


    All items in NCUIR are protected by original copyright.
