A Hardware Implementation of Feature Extraction for Self-Defined Speaker Verification System


Please use this permanent URL to cite or link to this item: http://ir.lib.ncu.edu.tw/handle/987654321/88368

Title:	A Hardware Implementation of Feature Extraction for Self-Defined Speaker Verification System
Author:	Wang, Chiao-Li (王喬立)
Contributor:	Department of Electrical Engineering
Keywords:	speech recognition; speaker verification; speech feature extraction; FPGA; SoC
Date:	2022-04-15
Uploaded:	2022-07-14 00:36:31 (UTC+8)
Publisher:	National Central University
Abstract:	In recent years, voice systems that use speech recognition to drive or control devices have become increasingly common in human-computer interaction. Among them, speaker verification, which verifies a speaker by analyzing voiceprints to find the feature differences between speakers, has been widely explored, and its effectiveness has improved significantly. However, current approaches based on complex, large neural networks still have drawbacks: they either run only on high-end edge devices or upload captured voice clips to the cloud for processing, which raises personal-privacy concerns. To address these issues, speaker verification that runs locally on the end device is an important task in voice-based human-computer interaction. This thesis proposes a self-defined speaker verification system and a hardware implementation of its feature extraction module. After profiling the execution time of each module, the Mel-Frequency Cepstral Coefficients (MFCC) pre-processing module is implemented on the Programmable Logic (PL) side of the Xilinx ZCU104 development board, and the extracted speech features are sent back over the AXI bus to the Processing System (PS) side for post-processing. The MFCC hardware architecture consumes 4.26 W on the FPGA and, at a 150 MHz operating frequency, processes a 2-second utterance in 53.6 ms (about 2.7% of the utterance's duration), while the subsequent post-processing retains high accuracy, so the overall system meets real-time requirements.
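For readers unfamiliar with the MFCC pre-processing stage that the abstract moves onto the PL side, the following is a minimal floating-point NumPy sketch of the standard MFCC pipeline (pre-emphasis, framing and Hamming windowing, FFT power spectrum, mel filterbank, log, DCT-II). It is not the thesis's hardware architecture: the function names (`mfcc`, `mel_filterbank`, `dct_ii`, `hz_to_mel`, `mel_to_hz`) are hypothetical, and every parameter (16 kHz sampling, 400-sample frames, 160-sample hop, 512-point FFT, 26 filters, 13 coefficients, pre-emphasis 0.97) is an assumed textbook default rather than a value taken from the thesis.

```python
import numpy as np

# All parameter defaults below are illustrative assumptions, not the thesis's settings.

def hz_to_mel(f):
    # HTK-style mel scale
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_filters, n_fft, sample_rate):
    """Triangular filters evenly spaced on the mel scale, shape (n_filters, n_fft//2 + 1)."""
    mel_points = np.linspace(hz_to_mel(0.0), hz_to_mel(sample_rate / 2.0), n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_points) / sample_rate).astype(int)
    fbank = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(1, n_filters + 1):
        left, center, right = bins[i - 1], bins[i], bins[i + 1]
        for k in range(left, center):   # rising edge of the triangle
            fbank[i - 1, k] = (k - left) / (center - left)
        for k in range(center, right):  # falling edge of the triangle
            fbank[i - 1, k] = (right - k) / (right - center)
    return fbank

def dct_ii(x, n_out):
    """Orthonormal DCT-II along the last axis, keeping the first n_out coefficients."""
    n = x.shape[-1]
    k = np.arange(n_out)[:, None]
    m = np.arange(n)[None, :]
    basis = np.cos(np.pi * k * (2 * m + 1) / (2 * n))  # shape (n_out, n)
    scale = np.full(n_out, np.sqrt(2.0 / n))
    scale[0] = np.sqrt(1.0 / n)
    return (x @ basis.T) * scale

def mfcc(signal, sample_rate=16000, frame_len=400, hop=160, n_fft=512,
         n_filters=26, n_ceps=13, preemph=0.97):
    """Return an (n_frames, n_ceps) MFCC matrix; assumes len(signal) >= frame_len."""
    # 1. Pre-emphasis: first-order high-pass filter that boosts high frequencies
    emphasized = np.append(signal[0], signal[1:] - preemph * signal[:-1])
    # 2. Split into overlapping frames and apply a Hamming window
    n_frames = 1 + (len(emphasized) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    frames = emphasized[idx] * np.hamming(frame_len)
    # 3. Power spectrum of each zero-padded frame
    power = np.abs(np.fft.rfft(frames, n=n_fft)) ** 2 / n_fft
    # 4. Mel filterbank energies, then log (floored to avoid log(0))
    energies = power @ mel_filterbank(n_filters, n_fft, sample_rate).T
    log_energies = np.log(np.maximum(energies, 1e-10))
    # 5. DCT-II decorrelates the log energies into cepstral coefficients
    return dct_ii(log_energies, n_ceps)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    speech = rng.standard_normal(2 * 16000)  # 2 s of noise standing in for speech
    print(mfcc(speech).shape)  # (198, 13)
```

Under these assumed settings, a 2-second utterance yields 198 frames of 13 coefficients each. For scale, the reported 53.6 ms at 150 MHz corresponds to 53.6 ms × 150 MHz = 8.04 million clock cycles for the whole utterance; a hardware design would typically replace the floating-point operations above with fixed-point arithmetic and precomputed filter and DCT tables.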
Appears in Collections:	[Graduate Institute of Electrical Engineering] Theses and Dissertations


All items in NCUIR are protected by their original copyright.
