Master's/Doctoral Thesis Record 107521052: Detailed Information




Name: Chiao-Li Wang (王喬立)    Department: Department of Electrical Engineering
Thesis Title: A Hardware Implementation of Feature Extraction for Self-Defined Speaker Verification System
(可自定義之語者驗證系統與其特徵擷取模組之硬體實現)
Related Theses
★ Low-memory hardware design for real-time SIFT feature extraction
★ Real-time face detection and face recognition for an access control system
★ An autonomous vehicle with real-time automatic following
★ Lossless compression algorithm and implementation for multi-lead ECG signals
★ Offline self-defined voice and speaker wake-word system with embedded implementation
★ Wafer map defect classification and embedded system implementation
★ Speech densely connected convolutional networks for small-footprint keyword spotting
★ G2LGAN: data augmentation on imbalanced datasets for wafer map defect classification
★ Algorithm design techniques for compensating the finite precision of multiplierless digital filters
★ Design and implementation of a programmable Viterbi decoder
★ Low-cost vector rotator IP design based on extended elementary-angle CORDIC
★ Analysis and architecture design of a JPEG2000 still-image coding system
★ Low-power turbo decoder for communication systems
★ Platform-based design for multimedia communication
★ Design and implementation of a digital watermarking system for MPEG encoders
★ Algorithm development for video error concealment and its data-reuse considerations
File access permissions:
  1. The electronic full text of this thesis is approved for immediate open access.
  2. Once open access takes effect, the full text is licensed only for personal, non-profit searching, reading, and printing for academic research purposes.
  3. Please comply with the relevant provisions of the Copyright Act of the Republic of China (Taiwan); do not reproduce, distribute, adapt, repost, or broadcast it without authorization.

Abstract (Chinese) In recent years, voice-controlled systems that use speech recognition to drive or operate devices have become increasingly common in human-computer interaction. Among these technologies, speaker verification, which verifies identity by analyzing speakers' voiceprints to find the features that distinguish them, has been widely explored and its effectiveness greatly improved. However, current approaches based on complex, large-scale neural networks still have significant drawbacks: they can run only on high-specification edge devices, or they upload captured speech segments to the cloud for processing, which raises personal privacy concerns. To address these problems, a speaker verification system that can run entirely on the end device is an important goal for voice-based human-computer interaction.
This thesis presents a self-defined speaker verification system and a hardware implementation of its feature extraction module. After profiling the execution time of each module, the Mel-Frequency Cepstral Coefficients (MFCC) pre-processing module is implemented on the Programmable Logic side of a Xilinx ZCU104 development board, and the extracted speech features are sent back over the AXI bus to the Processing System side for post-processing. The MFCC hardware consumes 4.26 W on the FPGA and, at a 150 MHz operating frequency, processes a 2-second utterance within 53.6 ms while preserving high accuracy in the subsequent post-processing, meeting real-time requirements.
Abstract (English) In recent years, devices that use speaker recognition for operation or control have become increasingly common in human-computer interaction. Among these technologies, speaker verification has been widely explored, and its effectiveness has been significantly improved by analyzing speakers' voiceprints to identify the features that distinguish them. However, current approaches based on complex, large-scale neural networks still have many drawbacks: they can only run on high-specification edge devices, or voice clips must be captured and uploaded to the cloud for processing, which raises personal privacy issues. To address these issues, on-device speaker verification is an important task in speech-based human-computer interaction.
This thesis proposes a self-defined speaker verification system and a hardware implementation of its feature extraction module. After profiling the execution time of each module, the Mel-Frequency Cepstral Coefficients (MFCC) pre-processing module is implemented on the programmable logic side of the Xilinx ZCU104 development board, and the extracted feature data are sent back to the processing system side for post-processing. The MFCC hardware architecture consumes 4.26 W on the FPGA, and a 2-second utterance can be processed within 53.6 ms at a 150 MHz operating frequency. The overall system meets real-time requirements while maintaining high accuracy in post-processing.
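The MFCC front end summarized in the abstracts follows the standard pipeline: pre-emphasis, framing, windowing, FFT power spectrum, mel filterbank, log compression, and DCT. As a rough floating-point software reference (a minimal NumPy sketch for illustration only, not the thesis's FPGA design; the 16 kHz sample rate, 25 ms/10 ms framing, 26 mel filters, and 13 cepstral coefficients are assumed, commonly used values):

```python
import numpy as np

def mfcc(signal, sr=16000, n_fft=512, frame_len=400, hop=160,
         n_mels=26, n_ceps=13):
    """Standard MFCC pipeline: pre-emphasis -> framing -> Hamming
    window -> |FFT|^2 -> mel filterbank -> log -> DCT-II."""
    # Pre-emphasis boosts high frequencies.
    sig = np.append(signal[0], signal[1:] - 0.97 * signal[:-1])
    # Split into overlapping frames (25 ms frames, 10 ms hop at 16 kHz).
    n_frames = 1 + (len(sig) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    frames = sig[idx] * np.hamming(frame_len)
    # Power spectrum of each frame (zero-padded to n_fft).
    power = np.abs(np.fft.rfft(frames, n_fft)) ** 2 / n_fft
    # Triangular mel filterbank, equally spaced on the mel scale.
    def hz2mel(f): return 2595.0 * np.log10(1.0 + f / 700.0)
    def mel2hz(m): return 700.0 * (10.0 ** (m / 2595.0) - 1.0)
    mel_pts = np.linspace(hz2mel(0), hz2mel(sr / 2), n_mels + 2)
    bins = np.floor((n_fft + 1) * mel2hz(mel_pts) / sr).astype(int)
    fbank = np.zeros((n_mels, n_fft // 2 + 1))
    for m in range(1, n_mels + 1):
        l, c, r = bins[m - 1], bins[m], bins[m + 1]
        fbank[m - 1, l:c] = (np.arange(l, c) - l) / max(c - l, 1)
        fbank[m - 1, c:r] = (r - np.arange(c, r)) / max(r - c, 1)
    # Log filterbank energies, then DCT-II to decorrelate.
    log_energy = np.log(power @ fbank.T + 1e-10)
    n = np.arange(n_mels)
    dct = np.cos(np.pi * np.outer(np.arange(n_ceps),
                                  (2 * n + 1) / (2.0 * n_mels)))
    return log_energy @ dct.T  # shape: (n_frames, n_ceps)
```

For a 2-second utterance at 16 kHz this yields a 198 x 13 coefficient matrix; in the thesis, the equivalent feature extraction runs as a hardware module on the PL side and the coefficients are streamed over the AXI bus to the PS side.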
Keywords (Chinese) ★ speech recognition
★ speaker verification
★ speech feature extraction
Keywords (English) ★ FPGA
★ SoC
Table of Contents
Chinese Abstract ……………………………………………………… I
Abstract ……………………………………………………………… II
Table of Contents …………………………………………………… III
List of Tables ………………………………………………………… IV
List of Figures ……………………………………………………… V
1. Introduction ……………………………………………………… 1
1-1 Research Background …………………………………………… 1
1-2 Research Motivation …………………………………………… 2
1-3 Thesis Organization …………………………………………… 4
2. Literature Review ………………………………………………… 5
2-1 Text-Independent Speaker Recognition ………………………… 5
2-2 Text-Dependent Speaker Recognition …………………………… 7
2-3 Speaker Identification and Speaker Verification …………… 8
2-4 Mathematical Models for Speech Processing …………………… 10
3. System Architecture and Hardware Implementation ……………… 21
3-1 System Operation Flow ………………………………………… 22
3-2 Parameter Settings and Model Training ……………………… 23
3-3 Hardware Implementation of the Feature Module ……………… 27
4. Results and Discussion …………………………………………… 34
4-1 Experimental Setup ……………………………………………… 34
4-2 Experimental Results …………………………………………… 36
5. Conclusion ………………………………………………………… 39
References …………………………………………………………… 40
References
[1] D. A. Reynolds and R. C. Rose, "Robust text-independent speaker identification using Gaussian mixture speaker models," IEEE Trans. Speech Audio Process., vol. 3, pp. 72-83, 1995.
[2] M. S. Allili, "A Short Tutorial on Gaussian Mixture Models," CRV 2010 tutorial, Université du Québec en Outaouais, 2010.
[3] M. M. Toruk and R. Gokay, "Short Utterance Speaker Recognition Using Time-Delay Neural Network," in Proc. 16th International Multi-Conference on Systems, Signals & Devices (SSD), 2019.
[4] S. Furui, "Recent advances in speaker recognition," Pattern Recognition Letters, vol. 18, pp. 859-872, 1997.
[5] D. D. T. Thu, L. T. Van, Q. N. Hong, and H. P. Ngoc, "Text-dependent speaker recognition for Vietnamese," in Proc. 2013 International Conference on Soft Computing and Pattern Recognition (SoCPaR), Hanoi, 2013, pp. 196-200.
[6] S. P. Babu and C. K. Jayadas, "GMM Based Speaker Verification System," International Journal of Engineering Research & Technology (IJERT), vol. 4, no. 4, Apr. 2015.
[7] T. Mahboob, M. Khanam, M. S. H. Khiyal, and R. Bibi, "Speaker Identification Using GMM with MFCC," International Journal of Computer Science Issues, vol. 12, pp. 126-135, 2015.
[8] J. Sohn, N. S. Kim, and W. Sung, "A statistical model-based voice activity detection," IEEE Signal Processing Letters, vol. 6, no. 1, pp. 1-3, Jan. 1999.
[9] S. B. Davis and P. Mermelstein, "Comparison of parametric representations for monosyllabic word recognition in continuously spoken sentences," IEEE Transactions on Acoustics, Speech, and Signal Processing, vol. 28, no. 4, pp. 357-366, Aug. 1980.
[10] D. A. Reynolds, T. F. Quatieri, and R. B. Dunn, "Speaker Verification Using Adapted Gaussian Mixture Models," Digital Signal Processing, vol. 10, pp. 19-41, 2000.
[11] C. D. Manning and H. Schütze, Foundations of Statistical Natural Language Processing, 4th printing, The MIT Press, 2001.
[12] L. R. Rabiner, "A Tutorial on Hidden Markov Models and Selected Applications in Speech Recognition," Proceedings of the IEEE, vol. 77, no. 2, pp. 257-285, Feb. 1989.
[13] M. Gales and S. Young, "The Application of Hidden Markov Models in Speech Recognition," Foundations and Trends in Signal Processing, vol. 1, pp. 195-304, 2007.
[14] S. Mao, D. Tao, G. Zhang, P. C. Ching, and T. Lee, "Revisiting Hidden Markov Models for Speech Emotion Recognition," in Proc. IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 6715-6719, Apr. 2019.
[15] A. Weigel and F. Fein, "Normalizing the weighted edit distance," in Proc. 12th IAPR International Conference on Pattern Recognition (Conference C: Signal Processing), Jerusalem, Israel, 1994, pp. 399-402.
[16] A. Rashiti and A. Damoni, "Adaption of Levenshtein Algorithm for Albanian Language," in Proc. International Conference on Computational Science and Computational Intelligence (CSCI), 2017.
[17] S. Konstantinidis, "Computing the Levenshtein distance of a regular language," in Proc. IEEE Information Theory Workshop, 2005.
[18] J. Jo, H. Yoo, and I.-C. Park, "Energy-Efficient Floating-Point MFCC Extraction Architecture for Speech Recognition Systems," IEEE Transactions on Very Large Scale Integration (VLSI) Systems, pp. 754-758, Feb. 2016.
[19] N.-V. Vu, J. Whittington, H. Ye, and J. Devlin, "Implementation of the MFCC front-end for low-cost speech recognition systems," in Proc. ISCAS, May/Jun. 2010, pp. 2334-2337.
[20] P. EhKan, T. Allen, and S. F. Quigley, "FPGA implementation for GMM-based speaker identification," Int. J. Reconfig. Comput., vol. 2011, no. 3, pp. 1-8, Jan. 2011, Art. ID 420369.
[21] R. Ramos-Lara, M. López-García, E. Cantó-Navarro, and L. Puente-Rodriguez, "Real-time speaker verification system implemented on reconfigurable hardware," J. Signal Process. Syst., vol. 71, no. 2, pp. 89-103, May 2013.
Advisor: Tsung-Han Tsai (蔡宗漢)    Date of Approval: 2022-04-15

For thesis-related questions, please contact the Promotion Services Division, National Central University Library, TEL: (03)422-7151 ext. 57407, or by e-mail.