Self-defined Wake-Up-Word Recognition and its Embedded System Implementation

DC field	Value	Language
DC.contributor	電機工程學系	zh_TW
DC.creator	郝平正	zh_TW
DC.creator	Ping-Cheng Hao	en_US
dc.date.accessioned	2019-11-05T07:39:07Z
dc.date.available	2019-11-05T07:39:07Z
dc.date.issued	2019
dc.identifier.uri	http://ir.lib.ncu.edu.tw:88/thesis/view_etd.asp?URN=105521050
dc.contributor.department	電機工程學系	zh_TW
DC.description	國立中央大學	zh_TW
DC.description	National Central University	en_US
dc.description.abstract	本論文提出離線自定義語音語者喚醒詞系統，是一種能夠讓使用者自行定義語音喚醒詞並以此用來喚醒設備。系統執行時分為兩個階段：訓練階段及測試比對階段。訓練階段為自行定義並錄製一段任何語言的喚醒詞，利用語音活動檢測裁切出語音片段，然後以梅爾倒頻譜算法做為語音前處理，抽取出聲音特徵以供後續使用，再利用高斯混合模型EM算法將語音特徵訓練成聲紋模型，同時利用高斯分布隱藏馬可夫模型的Baum-Welch算法訓練對應之語音序列，兩者合起來即是特定語者語音的資料模型。比對階段為輸入任意語音段，同樣使用梅爾倒頻譜算法抽取聲音特徵，將此特徵透過高斯混合模型的log probability從資料集中找出正確語者，而後利用隱藏馬可夫模型Viterbi算法計算出未知語音的序列，最後計算出高斯混合模型的相似程度以及編輯距離算法比對未知語音與資料語音的狀態序列匹配度，若通過門檻值即成功喚醒。此系統可以在少量訓練資料的情況下達到準確的結果，並且於比對階段時透過先搜索聲紋再比對語音的方法省去隱藏馬可夫模型算法對整個資料集採用窮舉法運算的時間，最後將此系統實現在嵌入式開發板中評估驗證效能，結果顯示本系統能在real time運作情況下達到高準確率與低誤喚醒率。	zh_TW
dc.description.abstract	We propose a Self-defined Wake-Up-Word Recognition system and its implementation on an embedded system. The system lets users define their own spoken wake-up word and use it to wake a device. It operates in two phases: a training phase and a testing-comparison phase. In the training phase, the user records a wake-up word in any language, and Voice Activity Detection cuts out the voice segment. Mel-Frequency Cepstral Coefficients are then extracted as speech features for the later steps. The Expectation-Maximization algorithm trains a Gaussian Mixture Model on these features as a voiceprint model, and the Baum-Welch algorithm trains a Gaussian Hidden Markov Model on the corresponding speech sequence. Together, the two models form the data model of a specific speaker's speech. In the testing-comparison phase, an unknown voice segment is input and processed with the same Voice Activity Detection and Mel-Frequency Cepstral Coefficient extraction. The log-likelihood under the Gaussian Mixture Model is used to find the corresponding speaker in the dataset, and the Viterbi algorithm then computes the state sequence of the unknown speech with the Hidden Markov Model. Finally, we compute the Gaussian Mixture Model similarity and use the Levenshtein Distance to compare the stored state sequence with the state sequence of the unknown speech. If both pass their thresholds, the device wakes up; otherwise, the wake-up fails. The system works well with a small amount of training data, and because it finds the speaker first and only then compares the speech, it avoids running the Hidden Markov Model exhaustively over the whole dataset. The system is implemented on an embedded development board to evaluate its performance. The results show that it achieves high accuracy and a low false wake-up rate in real-time operation.	en_US
DC.subject	自定義喚醒詞	zh_TW
DC.subject	梅爾倒譜係數	zh_TW
DC.subject	高斯混合模型	zh_TW
DC.subject	隱藏馬可夫模型	zh_TW
DC.subject	編輯距離	zh_TW
DC.subject	嵌入式系統	zh_TW
DC.subject	Customized Wake-Up-Word	en_US
DC.subject	Mel-Frequency Cepstral Coefficients	en_US
DC.subject	Gaussian Mixture Model	en_US
DC.subject	Hidden Markov Model	en_US
DC.subject	Levenshtein Distance	en_US
DC.subject	Embedded System	en_US
DC.title	離線自定義語音語者喚醒詞系統與嵌入式開發實現	zh_TW
dc.language.iso	zh-TW	zh-TW
DC.title	Self-defined Wake-Up-Word Recognition and its Embedded System Implementation	en_US
DC.type	博碩士論文	zh_TW
DC.type	thesis	en_US
DC.publisher	National Central University	en_US

Thesis 105521050: full metadata record
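
The final decision step described in the abstract (Gaussian Mixture Model similarity plus a Levenshtein Distance comparison of state sequences, each checked against a threshold) can be sketched roughly as follows. This is a minimal illustration, not the thesis implementation: the function names, the normalisation of the edit distance by the longer sequence length, and both threshold values are assumptions introduced here, and the GMM score is taken as an already computed log-likelihood.

```python
from typing import Sequence


def levenshtein(a: Sequence[int], b: Sequence[int]) -> int:
    """Edit distance between two state sequences (unit cost per edit)."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, x in enumerate(a, start=1):
        curr = [i]  # deleting the first i items of a
        for j, y in enumerate(b, start=1):
            cost = 0 if x == y else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def sequence_match(a: Sequence[int], b: Sequence[int]) -> float:
    """Match score in [0, 1]; 1.0 means identical sequences (assumed normalisation)."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - levenshtein(a, b) / longest


def should_wake(gmm_score: float,
                enrolled_states: Sequence[int],
                observed_states: Sequence[int],
                gmm_threshold: float = -50.0,    # hypothetical value
                match_threshold: float = 0.8) -> bool:  # hypothetical value
    """Wake only if both the voiceprint score and the sequence match pass."""
    speaker_ok = gmm_score >= gmm_threshold
    word_ok = sequence_match(enrolled_states, observed_states) >= match_threshold
    return speaker_ok and word_ok
```

For example, `should_wake(-40.0, [0, 0, 1, 2], [0, 0, 1, 2])` accepts under these assumed thresholds, while the same sequences with a score of `-60.0` are rejected because the speaker check fails.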