NCU Institutional Repository: Item 987654321/74692


    Please use this identifier to cite or link to this item: http://ir.lib.ncu.edu.tw/handle/987654321/74692


    Title: 基於語音增強技術之多模態訓練語料強健類神經網路聲學模型;Study of Robustness of DNN Acoustic Modeling Based on Multi-style Training with Speech Enhancement
    Authors: 李安德;Zezario, Ryandhimas Edo
    Contributors: Department of Computer Science and Information Engineering
    Keywords: deep learning;deep neural networks;multi-style training;deep denoising autoencoder;extreme learning;hierarchical extreme learning;spectral restoration;automatic speech recognition
    Date: 2017-07-26
    Issue Date: 2017-10-27 14:36:25 (UTC+8)
    Publisher: National Central University
    Abstract: This study presents a multi-style training with speech enhancement (MTSE) approach to acoustic modeling for robust automatic speech recognition (ASR). Previous studies have confirmed that acoustic models based on deep neural networks (DNNs) can be made more robust to adverse acoustic conditions by training them on data from diverse acoustic conditions, which can be obtained either by collecting data under different recording conditions or by injecting noise into clean utterances. The MTSE approach builds on the same idea: it uses machine-learning-based and spectral-restoration-based speech enhancement to generate restored speech data and adds these data to the original training set. By augmenting the original training data with the enhancement-restored data, the DNN-based acoustic models can capture additional structure in the input distribution and learn more accurate decision boundaries under heterogeneous conditions. The proposed MTSE approach was evaluated on the Aurora-4 (a standardized English ASR task with simulated noisy speech) and MATBN (a standardized Mandarin ASR task with real-world recorded noisy speech) datasets.
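The data-expansion idea described above can be illustrated with a small Python sketch. It is not the thesis's implementation: it assumes waveforms are plain lists of floats, builds a multi-condition set by injecting noise into clean utterances at chosen SNRs, and then appends an enhanced ("restored") copy of every utterance that keeps the source transcript. The helper names (`add_noise_at_snr`, `make_multi_style`, `augment_with_enhancement`) are hypothetical, and `smooth3` is only a toy placeholder for the enhancement front end; the machine-learning and spectral-restoration enhancers studied in the thesis would be passed in as the `enhance` callable instead.

```python
import math
import random
from typing import Callable, List, Sequence, Tuple

# One training example: (waveform samples, transcript). Restored copies reuse the transcript.
Utterance = Tuple[List[float], str]


def mean_power(x: Sequence[float]) -> float:
    """Average energy per sample."""
    return sum(v * v for v in x) / len(x)


def add_noise_at_snr(clean: Sequence[float], noise: Sequence[float], snr_db: float) -> List[float]:
    """Inject noise into a clean waveform at the requested SNR in dB.

    The noise is tiled or truncated to the clean length and scaled by g so that
    10 * log10(P_clean / (g**2 * P_noise)) == snr_db.
    """
    tiled = [noise[i % len(noise)] for i in range(len(clean))]
    gain = math.sqrt(mean_power(clean) / (mean_power(tiled) * 10 ** (snr_db / 10)))
    return [c + gain * n for c, n in zip(clean, tiled)]


def make_multi_style(clean_set: Sequence[Utterance], noises: Sequence[Sequence[float]],
                     snrs_db: Sequence[float], seed: int = 0) -> List[Utterance]:
    """Multi-condition set: each clean utterance plus one noisy copy per SNR."""
    rng = random.Random(seed)
    out: List[Utterance] = []
    for wav, text in clean_set:
        out.append((list(wav), text))
        for snr in snrs_db:
            out.append((add_noise_at_snr(wav, rng.choice(noises), snr), text))
    return out


def augment_with_enhancement(train_set: Sequence[Utterance],
                             enhance: Callable[[List[float]], List[float]]) -> List[Utterance]:
    """MTSE-style expansion (assumed form): original data plus one restored copy per utterance."""
    restored = [(enhance(list(wav)), text) for wav, text in train_set]
    return list(train_set) + restored


def smooth3(wav: List[float]) -> List[float]:
    """Hypothetical placeholder enhancer (3-tap moving average), NOT a DDAE or spectral restoration."""
    n = len(wav)
    return [sum(wav[max(0, i - 1):min(n, i + 2)]) / (min(n, i + 2) - max(0, i - 1))
            for i in range(n)]


if __name__ == "__main__":
    rng = random.Random(1)
    clean = [([math.sin(0.05 * t) for t in range(400)], "hello world")]
    noises = [[rng.uniform(-1.0, 1.0) for _ in range(128)]]
    multi = make_multi_style(clean, noises, snrs_db=[5, 10, 15])
    mtse = augment_with_enhancement(multi, smooth3)
    print(len(multi), len(mtse))  # 4 8
```

In this sketch each restored copy reuses the transcript of its source utterance, so the expanded set needs no new labels; only the acoustic model's training inputs change.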
Experimental results show that, compared with the respective baseline systems, the proposed MTSE system yields a notable relative reduction of 9.49% in word error rate (WER), from 10.01% to 9.06%, on the Aurora-4 task, and a relative reduction of 6.15% in character error rate (CER), from 12.84% to 12.05%, on the MATBN task. These results suggest that the proposed MTSE approach can be a feasible solution for handling noise in real-world noise-robust ASR.
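The reported 9.49% and 6.15% figures are consistent with relative reductions, that is, the drop in error rate divided by the baseline error rate. A short Python check (the function names are illustrative) reproduces both figures from the error rates given in the abstract:

```python
def relative_reduction(baseline: float, proposed: float) -> float:
    """Relative error-rate reduction in percent: 100 * (baseline - proposed) / baseline."""
    return 100.0 * (baseline - proposed) / baseline


def absolute_reduction(baseline: float, proposed: float) -> float:
    """Absolute error-rate reduction in percentage points."""
    return baseline - proposed


if __name__ == "__main__":
    # Aurora-4 WER and MATBN CER (in %) as reported in the abstract.
    print(f"Aurora-4 WER: {relative_reduction(10.01, 9.06):.2f}% relative, "
          f"{absolute_reduction(10.01, 9.06):.2f} points absolute")
    # -> Aurora-4 WER: 9.49% relative, 0.95 points absolute
    print(f"MATBN CER: {relative_reduction(12.84, 12.05):.2f}% relative, "
          f"{absolute_reduction(12.84, 12.05):.2f} points absolute")
    # -> MATBN CER: 6.15% relative, 0.79 points absolute
```

In absolute terms, the improvements are 0.95 percentage points in WER and 0.79 percentage points in CER.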
    Appears in Collections:[Graduate Institute of Computer Science and Information Engineering] Electronic Thesis & Dissertation

    Files in This Item:

    File        Description  Size  Format
    index.html               0 Kb  HTML


    All items in NCUIR are protected by copyright, with all rights reserved.


    Copyright National Central University. All rights reserved by the National Central University Library.
    DSpace Software Copyright © 2002-2004 MIT & Hewlett-Packard / Enhanced by NTU Library IR team