NCU Institutional Repository (theses and dissertations, past exam papers, journal articles, and research projects available for download): Item 987654321/74692


    Please use this permanent URL to cite or link to this item: http://ir.lib.ncu.edu.tw/handle/987654321/74692


    Title: 基於語音增強技術之多模態訓練語料強健類神經網路聲學模型; Study of Robustness of DNN Acoustic Modeling Based on Multi-style Training with Speech Enhancement
    Author: 李安德; Zezario, Ryandhimas Edo
    Contributor: Department of Computer Science and Information Engineering (資訊工程學系)
    Keywords: deep learning; deep neural networks; multi-style training; deep denoising autoencoder; extreme learning; hierarchical extreme learning; spectral restoration; automatic speech recognition
    Date: 2017-07-26
    Upload time: 2017-10-27 14:36:25 (UTC+8)
    Publisher: National Central University (國立中央大學)
    Abstract: This study presents multi-style training with speech enhancement (MTSE) for acoustic modeling, with the aim of robust automatic speech recognition (ASR). Previous studies have confirmed that acoustic models based on deep neural networks (DNNs) can be trained to be more robust to adverse acoustic conditions by using training data from diverse acoustic conditions, which can be obtained either by collecting data under different recording conditions or by injecting noise into clean utterances. The MTSE approach adopts the same concept: it uses machine-learning-based and spectral-restoration-based speech enhancement to generate restored speech data, which is then used to expand the original training set. By augmenting the original training data with the enhancement-restored data, the DNN-based acoustic models can capture additional structure in the input distribution and determine more accurate decision boundaries in heterogeneous conditions. The proposed MTSE approach was evaluated on the Aurora-4 (a standardized English ASR task with simulated noisy speech) and MATBN (a standardized Mandarin ASR task with real-world recorded noisy speech) datasets. Experimental results show that, compared with the baseline systems, the proposed MTSE system yields a relative word error rate (WER) reduction of 9.49% on the Aurora-4 task (from 10.01% to 9.06%) and a relative character error rate (CER) reduction of 6.15% on the MATBN task (from 12.84% to 12.05%). The results suggest that the proposed MTSE approach can be a feasible solution for handling noise in real-world noise-robust ASR.
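The reduction percentages quoted in the abstract are relative to the baseline error rate, not absolute differences. The short Python check below recomputes them from the reported error rates; the function name `relative_reduction` is an illustrative choice and does not come from the thesis.

```python
def relative_reduction(baseline: float, proposed: float) -> float:
    """Return the error-rate reduction as a percentage of the baseline."""
    return (baseline - proposed) / baseline * 100.0


# Error rates (in %) as reported in the abstract.
aurora4_wer_reduction = relative_reduction(10.01, 9.06)  # Aurora-4, WER
matbn_cer_reduction = relative_reduction(12.84, 12.05)   # MATBN, CER

print(f"Aurora-4 relative WER reduction: {aurora4_wer_reduction:.2f}%")
print(f"MATBN relative CER reduction: {matbn_cer_reduction:.2f}%")
```

Rounded to two decimals, the results agree with the 9.49% and 6.15% figures in the abstract; the corresponding absolute reductions are 0.95 and 0.79 percentage points.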
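    Appears in Collections: [Graduate Institute of Computer Science and Information Engineering (資訊工程研究所)] Theses and Dissertations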

    Files in This Item:

    File        Description  Size  Format  Views
    index.html  -            0Kb   HTML    206


    All items in NCUIR are protected by their original copyright.
