A High-Performance Neural Network SoC for x-vector based End-to-End Speaker Verification


Please use this permanent URL to cite or link to this item: http://ir.lib.ncu.edu.tw/handle/987654321/90876

Title:	A High-Performance Neural Network SoC for x-vector based End-to-End Speaker Verification
Author:	Chiang, Meng-Jui (江孟叡)
Contributor:	Department of Electrical Engineering
Keywords:	neural network; system on chip; speaker verification
Date:	2023-03-14
Uploaded:	2023-05-09 18:13:48 (UTC+8)
Publisher:	National Central University
Abstract:	Over the past few years, using neural networks to identify speakers from their voices has become increasingly common. Among these methods, x-vector networks are more robust to noise and usually achieve higher accuracy than earlier approaches such as Gaussian mixture models (GMMs) and support vector machines (SVMs). This thesis presents a system-on-chip (SoC) composed of a RISC-V CPU and a neural network accelerator module for x-vector-based speaker verification (SV). Because the model contains a large number of parameters, the x-vector network is processed in three steps (size reduction, pruning, and compression) to achieve real-time latency and to make implementation feasible on edge devices with limited computing resources. We focus on optimizing the dataflow for sparse data. Compared with the conventional sparse matrix compression method, compressed sparse row (CSR), the proposed binary pointer compressed sparse row (BPCSR) method significantly reduces latency and avoids the load-imbalance problem that sparsity causes among the processing elements (PEs). On the hardware side, we design a neural network accelerator module that stores the compressed parameters and computes the x-vector network, while the RISC-V CPU handles the remaining computation, such as feature extraction and the classifier.
The system was tested on the VoxCeleb dataset, which contains 1,251 test speakers, and achieved over 95% accuracy. Finally, the chip was synthesized in TSMC 90 nm technology; it occupies 15.5 mm² and consumes 97.88 mW for real-time identification. In addition, the chip's functionality was verified on an Andes ADP-XC7K160 FPGA, which takes audio input from a microphone, interacts with users through external I/O such as general-purpose input/output (GPIO) and a universal asynchronous receiver/transmitter (UART), and outputs the results to a text display, achieving complete end-to-end speaker verification.
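For readers unfamiliar with the baseline, conventional CSR stores only the nonzero weights of a pruned matrix in three arrays: the values, their column indices, and a row-pointer array marking where each row starts. The Python sketch below illustrates pruning, CSR compression, the matching sparse matrix-vector product, and why dealing rows out to PEs can leave them with unequal work. It is a hypothetical illustration, not the thesis's design. It does not implement BPCSR, whose format is not described in this abstract. The magnitude-pruning rule, the threshold, the round-robin row-to-PE assignment, and the names `magnitude_prune`, `to_csr`, `csr_matvec`, and `pe_workloads` are all assumptions made for the example.

```python
from typing import List, Tuple

Matrix = List[List[float]]


def magnitude_prune(weights: Matrix, threshold: float) -> Matrix:
    """Zero out weights with |w| < threshold (assumed pruning rule, for illustration only)."""
    return [[w if abs(w) >= threshold else 0.0 for w in row] for row in weights]


def to_csr(dense: Matrix) -> Tuple[List[float], List[int], List[int]]:
    """Compress a dense matrix into conventional CSR arrays (values, col_idx, row_ptr)."""
    values: List[float] = []
    col_idx: List[int] = []
    row_ptr: List[int] = [0]
    for row in dense:
        for j, w in enumerate(row):
            if w != 0.0:
                values.append(w)
                col_idx.append(j)
        # Row i occupies values[row_ptr[i]:row_ptr[i + 1]].
        row_ptr.append(len(values))
    return values, col_idx, row_ptr


def csr_matvec(values: List[float], col_idx: List[int],
               row_ptr: List[int], x: List[float]) -> List[float]:
    """Compute y = W x, touching only the stored nonzeros of W."""
    y: List[float] = []
    for i in range(len(row_ptr) - 1):
        acc = 0.0
        for k in range(row_ptr[i], row_ptr[i + 1]):
            acc += values[k] * x[col_idx[k]]
        y.append(acc)
    return y


def pe_workloads(row_ptr: List[int], num_pes: int) -> List[int]:
    """Count multiply-accumulates per PE when rows are dealt round-robin (hypothetical mapping)."""
    loads = [0] * num_pes
    for i in range(len(row_ptr) - 1):
        loads[i % num_pes] += row_ptr[i + 1] - row_ptr[i]
    return loads


# Small example: after pruning, rows keep 2, 0, 3, and 1 nonzeros respectively.
W = [
    [2.0, 0.25, 0.0, -1.0],
    [0.125, 0.25, 0.0, 0.0],
    [1.0, -3.0, 0.5, 0.0],
    [0.0, 0.0, 4.0, 0.0],
]
csr = to_csr(magnitude_prune(W, threshold=0.5))
y = csr_matvec(*csr, x=[1.0, 2.0, 3.0, 4.0])
loads = pe_workloads(csr[2], num_pes=2)
print(y)      # [-2.0, 0.0, -3.5, 12.0]
print(loads)  # [5, 1]: PE 0 does five MACs while PE 1 does one
```

With this row-per-PE mapping, the uneven distribution of nonzeros across rows directly becomes uneven PE utilization. This is the kind of imbalance the abstract says BPCSR is designed to avoid.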
Appears in Collections:	[Graduate Institute of Electrical Engineering] Theses and Dissertations


All items in NCUIR are protected by their original copyright.
