An Embedded Neural Network Classifier for Keywords Spotting and Speaker Recognizing

Please use this identifier to cite or link to this item: http://ir.lib.ncu.edu.tw/handle/987654321/80950

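Title:	An Embedded Neural Network Classifier for Keywords Spotting and Speaker Recognizing (original title: 應用於語音關鍵字辨識與語者辨識的嵌入式神經網路分類器)
Authors:	Liao, Kuan-Fu (廖冠富)
Contributors:	Institute of Software Engineering (軟體工程研究所)
Keywords:	keyword spotting; speaker identification; self-organizing map neural network; multilayer feed-forward neural network; embedded neural network; classifier
Date:	2019-07-18
Issue Date:	2019-09-03 15:20:58 (UTC+8)
Publisher:	National Central University (國立中央大學)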
Abstract:	Keyword spotting (KWS) systems let intelligent systems such as voice assistants balance convenient activation against low energy consumption, but device security and user privacy remain difficult to guarantee fully. This study proposes an embedded neural network classifier for voice keyword spotting and speaker identification. A self-organizing map (SOM) neural network first groups the speech features into coarse clusters without supervision; multilayer feed-forward neural networks (MFNNs) then perform keyword spotting and speaker identification. On the Google Speech Commands dataset, the SOM-MFNN required 82.12% less computation and 32.45% less memory than a conventional MFNN while achieving a recognition rate 4% higher. On a self-built Chinese KWS dataset, the proposed system improved the recognition rate by 1%, indicating that the SOM-MFNN can improve voice-command recognition while reducing resource use. For speaker identification, the conventional MFNN already reached a recognition rate of 98.43%, which is deemed sufficient to protect device security. Because it needs less computation and memory than a conventional MFNN, the SOM-MFNN is a well-suited classifier for always-on KWS systems, and it can be combined with speaker identification to protect device security.
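The two-stage idea in the abstract (unsupervised coarse grouping by a SOM, then a feed-forward network for the final decision) can be sketched roughly as follows. This is a minimal illustration and not the thesis implementation: the 1-D SOM grid, the one-small-network-per-SOM-node layout, the network sizes, the learning rates, and every function name (`train_som`, `som_assign`, `train_mlp`, `fit_som_mfnn`, `predict_som_mfnn`, `make_toy_data`) are assumptions, and 2-D toy vectors stand in for real speech features.

```python
import numpy as np

def train_som(X, n_nodes=4, epochs=20, lr0=0.5, sigma0=1.0, seed=0):
    """Train a 1-D self-organizing map; returns node weights (n_nodes, dim)."""
    rng = np.random.default_rng(seed)
    W = X[rng.choice(len(X), n_nodes, replace=False)].astype(float)
    grid = np.arange(n_nodes)
    n_steps, step = epochs * len(X), 0
    for _ in range(epochs):
        for x in X[rng.permutation(len(X))]:
            frac = step / n_steps
            lr = lr0 * (1.0 - frac)                   # decaying learning rate
            sigma = max(sigma0 * (1.0 - frac), 1e-2)  # shrinking neighborhood
            bmu = np.argmin(np.sum((W - x) ** 2, axis=1))
            h = np.exp(-((grid - bmu) ** 2) / (2 * sigma ** 2))
            W += lr * h[:, None] * (x - W)            # pull BMU and neighbors toward x
            step += 1
    return W

def som_assign(X, W):
    """Index of the best-matching SOM node for each row of X."""
    d = ((X[:, None, :] - W[None, :, :]) ** 2).sum(axis=2)
    return d.argmin(axis=1)

def train_mlp(X, y, n_classes, hidden=8, epochs=500, lr=0.2, seed=0):
    """One-hidden-layer feed-forward net trained by full-batch gradient descent."""
    rng = np.random.default_rng(seed)
    W1 = rng.normal(0, 0.5, (X.shape[1], hidden)); b1 = np.zeros(hidden)
    W2 = rng.normal(0, 0.5, (hidden, n_classes)); b2 = np.zeros(n_classes)
    Y = np.eye(n_classes)[y]
    for _ in range(epochs):
        H = np.tanh(X @ W1 + b1)
        Z = H @ W2 + b2
        P = np.exp(Z - Z.max(axis=1, keepdims=True))
        P /= P.sum(axis=1, keepdims=True)             # softmax probabilities
        G = (P - Y) / len(X)                          # cross-entropy gradient w.r.t. Z
        GH = (G @ W2.T) * (1 - H ** 2)                # backprop through tanh
        W2 -= lr * H.T @ G; b2 -= lr * G.sum(axis=0)
        W1 -= lr * X.T @ GH; b1 -= lr * GH.sum(axis=0)
    return W1, b1, W2, b2

def mlp_predict(params, X):
    W1, b1, W2, b2 = params
    return (np.tanh(X @ W1 + b1) @ W2 + b2).argmax(axis=1)

def fit_som_mfnn(X, y, n_classes, n_nodes=4):
    """Stage 1: SOM clustering. Stage 2: one small net per occupied SOM node."""
    W = train_som(X, n_nodes)
    W = W[np.unique(som_assign(X, W))]                # drop nodes with no training data
    cluster = som_assign(X, W)
    experts = [train_mlp(X[cluster == k], y[cluster == k], n_classes)
               for k in range(len(W))]
    return W, experts

def predict_som_mfnn(model, X):
    """Route each sample to its SOM node, then run only that node's net."""
    W, experts = model
    cluster = som_assign(X, W)
    out = np.empty(len(X), dtype=int)
    for k in np.unique(cluster):
        mask = cluster == k
        out[mask] = mlp_predict(experts[k], X[mask])
    return out

def make_toy_data(n_per_class=50, seed=0):
    """Four well-separated 2-D blobs standing in for speech feature vectors."""
    rng = np.random.default_rng(seed)
    centers = np.array([[-2.0, -2.0], [2.0, -2.0], [-2.0, 2.0], [2.0, 2.0]])
    X = np.vstack([c + 0.4 * rng.normal(size=(n_per_class, 2)) for c in centers])
    y = np.repeat(np.arange(len(centers)), n_per_class)
    return X, y

if __name__ == "__main__":
    X, y = make_toy_data()
    model = fit_som_mfnn(X, y, n_classes=4)
    print("training accuracy:", (predict_som_mfnn(model, X) == y).mean())
```

In this sketch only one small network runs per input at inference time, which is one plausible way a clustered design could need less computation than a single larger network; the record does not state how the thesis achieves its reported savings.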
Appears in Collections:	[Software Engineer] Electronic Thesis & Dissertation
