    Please use this permanent URL to cite or link to this document: http://ir.lib.ncu.edu.tw/handle/987654321/81938


    Title: 離線自定義語音語者喚醒詞系統與嵌入式開發實現; Self-defined Wake-Up-Word Recognition and its Embedded System Implementation
    Author: 郝平正; Hao, Ping-Cheng
    Contributor: Department of Electrical Engineering
    Keywords: Customized Wake-Up-Word; Mel-Frequency Cepstral Coefficients; Gaussian Mixture Model; Hidden Markov Model; Levenshtein Distance; Embedded System
    Date: 2019-11-05
    Upload time: 2020-01-07 14:39:41 (UTC+8)
    Publisher: National Central University
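    Abstract (translated from the Chinese): This thesis proposes an offline, self-defined, speaker-dependent wake-up-word system that lets users define their own spoken wake-up word and use it to wake a device. The system runs in two phases: a training phase and a testing-comparison phase. In the training phase, the user defines and records a wake-up word in any language. Voice Activity Detection cuts out the speech segment, and Mel-Frequency Cepstral Coefficients serve as speech pre-processing to extract acoustic features for later use. The EM algorithm then trains a Gaussian Mixture Model on these features to produce a voiceprint model, while the Baum-Welch algorithm trains a Gaussian Hidden Markov Model on the corresponding speech sequence. Together, the two form the data model of a specific speaker's speech.
    In the comparison phase, an arbitrary speech segment is input and its features are likewise extracted with Mel-Frequency Cepstral Coefficients. The log probability under the Gaussian Mixture Model identifies the matching speaker in the dataset, and the Viterbi algorithm of the Hidden Markov Model then computes the state sequence of the unknown speech. Finally, the system computes the Gaussian Mixture Model similarity and uses an edit-distance algorithm to measure how well the state sequence of the unknown speech matches that of the enrolled speech; if the threshold values are passed, the device wakes.
    The system achieves accurate results with a small amount of training data. In the comparison phase, searching by voiceprint first and comparing speech second avoids the time the Hidden Markov Model algorithm would otherwise spend on an exhaustive computation over the entire dataset. Finally, the system is implemented on an embedded development board to evaluate and verify its performance; the results show that it achieves high accuracy and a low false wake-up rate while running in real time.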
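    The voiceprint-first search mentioned in the abstract can be sketched roughly as follows. This is a minimal illustration, not the thesis implementation: the diagonal-covariance component layout `(weight, mean, var)`, the function names, the speaker names, and the toy 2-D frames standing in for MFCC vectors are all assumptions made for the example.

```python
import math

def log_gaussian_diag(x, mean, var):
    """Log density of a diagonal-covariance Gaussian at feature vector x."""
    return sum(
        -0.5 * (math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, mean, var)
    )

def gmm_avg_log_likelihood(frames, gmm):
    """Average per-frame log-likelihood of frames under a GMM.

    gmm is a list of (weight, mean, var) components (hypothetical layout).
    """
    total = 0.0
    for x in frames:
        # Log-sum-exp over components for numerical stability.
        logs = [math.log(w) + log_gaussian_diag(x, m, v) for w, m, v in gmm]
        peak = max(logs)
        total += peak + math.log(sum(math.exp(l - peak) for l in logs))
    return total / len(frames)

def find_speaker(frames, voiceprints):
    """Return (speaker, score) for the best-scoring enrolled voiceprint."""
    scores = {name: gmm_avg_log_likelihood(frames, gmm)
              for name, gmm in voiceprints.items()}
    best = max(scores, key=scores.get)
    return best, scores[best]

# Toy enrolled voiceprints; all values are made up for illustration.
voiceprints = {
    "alice": [(0.5, [0.0, 0.0], [1.0, 1.0]), (0.5, [1.0, 1.0], [1.0, 1.0])],
    "bob":   [(0.5, [5.0, 5.0], [1.0, 1.0]), (0.5, [6.0, 6.0], [1.0, 1.0])],
}
frames = [[5.2, 4.9], [5.8, 6.1], [5.5, 5.4]]
speaker, score = find_speaker(frames, voiceprints)
print(speaker, round(score, 3))
```

    Once a single speaker has been selected this way, only that speaker's sequence model needs to be decoded, which is the saving over running every enrolled Hidden Markov Model.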
    Abstract (author's English version): We propose a self-defined wake-up-word recognition system and its embedded system implementation. The system operates in two phases: a training phase and a testing-comparison phase. In the training phase, a wake-up word in any language is recorded, the voice segment is cut out using Voice Activity Detection, and Mel-Frequency Cepstral Coefficients are used as pre-processing to extract speech features for later use. The Expectation-Maximization algorithm is used to train the Gaussian Mixture Model, and the Baum-Welch algorithm is used to train the Hidden Markov Model. Together, these two models form the data model of a speaker's speech.
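    As a rough illustration of the Expectation-Maximization step, the sketch below fits a Gaussian mixture to scalar data. It is a simplification under stated assumptions: the thesis trains on MFCC feature vectors, whereas this example is one-dimensional, and the deterministic initialisation, the variance floor, the function names, and the toy data are all choices made here, not details taken from the thesis.

```python
import math

def normal_pdf(x, mean, var):
    """Density of a 1-D Gaussian."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)

def em_gmm_1d(data, n_components=2, n_iter=50, var_floor=1e-3):
    """Fit a 1-D Gaussian mixture with EM; returns (weights, means, variances)."""
    k = n_components
    lo, hi = min(data), max(data)
    # Deterministic initialisation (assumption): means spread across the data range.
    means = [lo + (hi - lo) * (j + 0.5) / k for j in range(k)]
    centre = sum(data) / len(data)
    spread = max(sum((x - centre) ** 2 for x in data) / len(data), var_floor)
    variances = [spread] * k
    weights = [1.0 / k] * k
    for _ in range(n_iter):
        # E-step: responsibility of each component for each sample.
        resp = []
        for x in data:
            p = [w * normal_pdf(x, m, v)
                 for w, m, v in zip(weights, means, variances)]
            s = sum(p) or 1e-300  # guard against underflow to zero
            resp.append([pj / s for pj in p])
        # M-step: re-estimate weights, means, and variances.
        for j in range(k):
            nj = sum(r[j] for r in resp) or 1e-300
            weights[j] = nj / len(data)
            means[j] = sum(r[j] * x for r, x in zip(resp, data)) / nj
            variances[j] = max(
                sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, data)) / nj,
                var_floor,
            )
    return weights, means, variances

# Two well-separated toy clusters; values are made up for illustration.
data = [-0.3, 0.1, 0.2, -0.1, 0.0, 9.8, 10.1, 10.3, 9.9, 10.0]
weights, means, variances = em_gmm_1d(data)
print([round(m, 2) for m in means])
```

    Baum-Welch follows the same expectation and maximization pattern, with state-occupancy and transition posteriors taking the place of the per-sample responsibilities.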
    In the testing-comparison phase, an unknown voice segment is input. Voice Activity Detection and Mel-Frequency Cepstral Coefficients are again used to cut the segment and extract its features. Next, the log likelihood of these features under the Gaussian Mixture Model is computed to find the corresponding speaker, and the Viterbi algorithm is used to compute the state sequence of the unknown speech through the Hidden Markov Model. Finally, we compute the Gaussian Mixture Model similarity and use the Levenshtein Distance to compare the enrolled state sequence with the state sequence of the unknown speech. If both pass their thresholds, the wake-up succeeds; otherwise, it fails.
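    The final two-threshold check might look something like the sketch below. The Levenshtein distance itself is standard; everything else is an assumption for illustration, including the normalisation of the distance by sequence length, the helper names `sequence_match` and `should_wake`, and the threshold values, none of which are given in the abstract.

```python
def levenshtein(a, b):
    """Edit distance between two sequences (insert, delete, substitute cost 1)."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,              # delete x
                curr[j - 1] + 1,          # insert y
                prev[j - 1] + (x != y),   # substitute, or match at no cost
            ))
        prev = curr
    return prev[-1]

def sequence_match(enrolled, decoded):
    """Similarity in [0, 1]; 1.0 means identical state sequences (assumed scaling)."""
    longest = max(len(enrolled), len(decoded))
    if longest == 0:
        return 1.0
    return 1.0 - levenshtein(enrolled, decoded) / longest

def should_wake(gmm_score, enrolled, decoded,
                gmm_threshold=-20.0, match_threshold=0.8):
    """Wake only if both the voiceprint score and the sequence match pass.

    Both thresholds are hypothetical placeholder values.
    """
    return (gmm_score >= gmm_threshold
            and sequence_match(enrolled, decoded) >= match_threshold)

# Toy HMM state sequences; values are made up for illustration.
enrolled = [0, 0, 1, 1, 2, 2]
decoded = [0, 1, 1, 1, 2, 2]
print(levenshtein(enrolled, decoded), should_wake(-15.0, enrolled, decoded))
```

    Requiring both checks to pass means that a matching voice saying a different word, or a different voice saying the right word, is rejected.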
    The system works well with a small amount of training data, and it is implemented on an embedded board to test its performance. The results show that the system achieves high accuracy and a low false-alarm rate in real-time operation.
    Appears in collections: [Graduate Institute of Electrical Engineering] Master's and Doctoral Theses

