離線自定義語音語者喚醒詞系統與嵌入式開發實現;Self-defined Wake-Up-Word Recognition and its Embedded System Implementation


Please use this permanent URL to cite or link to this document: http://ir.lib.ncu.edu.tw/handle/987654321/81938

Title:	離線自定義語音語者喚醒詞系統與嵌入式開發實現;Self-defined Wake-Up-Word Recognition and its Embedded System Implementation
Author:	郝平正;Hao, Ping-Cheng
Contributor:	Department of Electrical Engineering
Keywords:	Customized Wake-Up-Word;Mel-Frequency Cepstral Coefficients;Gaussian Mixture Model;Hidden Markov Model;Levenshtein Distance;Embedded System
Date:	2019-11-05
Upload time:	2020-01-07 14:39:41 (UTC+8)
Publisher:	National Central University
Abstract:	This thesis proposes an offline, self-defined, speaker-dependent wake-up-word system that lets users define their own spoken wake-up word and use it to wake a device, together with its embedded system implementation. The system runs in two phases: a training phase and a testing-comparison phase. In the training phase, the user records a wake-up word in any language. Voice Activity Detection cuts out the speech segment, and Mel-Frequency Cepstral Coefficients (MFCC) are extracted as speech features for later use. The Expectation-Maximization (EM) algorithm trains a Gaussian Mixture Model (GMM) on these features as a voiceprint model, and the Baum-Welch algorithm trains a Gaussian Hidden Markov Model (HMM) on the corresponding speech sequence. Together, the two models form the data model of a specific speaker's speech. In the testing-comparison phase, an unknown speech segment is input and processed with the same Voice Activity Detection and MFCC extraction. The GMM log likelihood of the extracted features is used to find the corresponding speaker in the dataset, and the Viterbi algorithm then computes the HMM state sequence of the unknown speech. Finally, the GMM similarity is computed, and the Levenshtein distance is used to compare the state sequence of the unknown speech with the state sequence stored in the dataset. If both pass their thresholds, the wake-up succeeds; otherwise it fails.
The system gives accurate results with a small amount of training data. Searching by voiceprint first and matching speech second avoids the time the HMM algorithm would otherwise spend on an exhaustive search over the whole dataset. The system is implemented on an embedded development board to evaluate its performance, and the results show that it achieves high accuracy and a low false wake-up rate under real-time operation.
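The final decision step described in the abstract (comparing HMM state sequences with the Levenshtein distance and requiring both the GMM score and the sequence match to pass a threshold) can be illustrated with a minimal sketch. This is not the thesis implementation: the function names, the normalization of the edit distance into a 0-to-1 match score, and the threshold values are all assumptions made for illustration, and the state sequences are assumed to be lists of integer state indices such as a Viterbi decoder would return.

```python
from typing import Sequence


def levenshtein(a: Sequence[int], b: Sequence[int]) -> int:
    """Edit distance between two state sequences.

    Insertion, deletion, and substitution each cost 1.
    """
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, x in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to the empty prefix of b
        for j, y in enumerate(b, start=1):
            cost = 0 if x == y else 1
            curr.append(min(prev[j] + 1,          # delete x
                            curr[j - 1] + 1,      # insert y
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def sequence_match(stored: Sequence[int], observed: Sequence[int]) -> float:
    """Hypothetical match score in [0, 1]; 1.0 means identical sequences.

    Dividing by the longer length is an assumed normalization.
    """
    longest = max(len(stored), len(observed))
    if longest == 0:
        return 1.0
    return 1.0 - levenshtein(stored, observed) / longest


def should_wake(gmm_score: float,
                stored: Sequence[int],
                observed: Sequence[int],
                gmm_threshold: float,
                match_threshold: float) -> bool:
    """Wake only if both the GMM score and the sequence match pass."""
    return (gmm_score >= gmm_threshold
            and sequence_match(stored, observed) >= match_threshold)


if __name__ == "__main__":
    stored_states = [0, 0, 1, 1, 2, 2]    # illustrative stored sequence
    observed_states = [0, 1, 1, 2, 2, 2]  # illustrative decoded sequence
    print(levenshtein(stored_states, observed_states))
    print(should_wake(-40.0, stored_states, observed_states,
                      gmm_threshold=-45.0, match_threshold=0.6))
```

Requiring both conditions mirrors the abstract's "both pass the threshold" rule; in practice the two thresholds would have to be tuned on recorded data.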
Appears in Collections:	[Graduate Institute of Electrical Engineering] Theses and Dissertations

Files in This Item:

File	Description	Size	Format	Views
index.html		0Kb	HTML	202

All items in NCUIR are protected by original copyright.
