An Embedded Neural Network Classifier for Keywords Spotting and Speaker Recognizing

Name

Kuan-Fu Liao (廖冠富)

Department

Graduate Institute of Software Engineering

Thesis Title

An Embedded Neural Network Classifier for Keywords Spotting and Speaker Recognizing
(應用於語音關鍵字辨識與語者辨識的嵌入式神經網路分類器)

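Related Theses

★ An Intelligent Controller Development Platform Integrating a GRAFCET Virtual Machine	★ Design and Implementation of a Distributed Industrial Electronic Signboard Network System
★ Design and Implementation of a Dual-Touch Screen Based on a Dual-Camera Vision System	★ An Embedded Computing Platform for Intelligent Robots
★ An Embedded System for Real-Time Moving Object Detection and Tracking	★ A Multiprocessor Architecture and Distributed Control Algorithm for Solid-State Drives
★ A Human-Computer Interaction System Based on Stereo-Vision Gesture Recognition	★ Robot System-on-Chip Design Integrating Bio-Inspired Intelligent Behavior Control
★ Design and Implementation of an Embedded Wireless Image Sensor Network	★ A License Plate Recognition System Based on a Dual-Core Processor
★ Continuous 3D Gesture Recognition Based on Stereo Vision	★ Design and Hardware Implementation of a Miniature, Ultra-Low-Power Wireless Sensor Network Controller
★ Real-Time Face Detection, Tracking, and Recognition in Streaming Video: An Embedded System Design	★ An Embedded Hardware Design for a Fast Stereo Vision System
★ Design and Implementation of a Real-Time Continuous Image Stitching System	★ An Embedded Gait Recognition System Based on a Dual-Core Platform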

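This electronic thesis is authorized for immediate open access.
Once open, the electronic full text is licensed to users only for personal, non-profit searching, reading, and printing for the purpose of academic research.
Please comply with the Copyright Act of the Republic of China. Do not reproduce, distribute, adapt, repost, or broadcast the work without permission, to avoid violating the law.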

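Abstract (translated from Chinese)

Keyword spotting (KWS) systems give intelligent systems such as voice assistants a balance between convenient activation and energy consumption, but device security and user privacy remain hard to guarantee fully. This study proposes an embedded neural network classifier for voice keyword spotting and speaker recognition. A SOM neural network first performs unsupervised coarse classification of the speech features, and multilayer feed-forward neural networks then perform KWS recognition and speaker recognition separately. On the Google Speech Commands dataset, the SOM-MFNN reduces computation by 82.12% and memory usage by 32.45% compared with a conventional MFNN while raising the recognition rate by 4%. On our self-built Chinese KWS dataset, the system raises the recognition rate by 1%, which shows that the SOM-MFNN can improve voice command recognition while lowering resource usage. For speaker recognition, the conventional MFNN already reaches a recognition rate of 98.43%, enough to protect device security. Because it needs less computation and memory than a conventional MFNN, the SOM-MFNN offers a good classifier model for KWS systems that must run continuously, and it can be combined with speaker recognition to protect device security.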

Abstract (English)

Keyword spotting (KWS) systems help voice assistants balance easy activation against low energy consumption. However, such systems cannot fully guarantee device security or user privacy. This study proposes an embedded neural network classifier for voice KWS and speaker identification. First, a self-organizing map (SOM) neural network coarsely groups voice features through unsupervised classification. Next, a multilayer feed-forward neural network (MFNN) performs KWS and speaker identification. On the Google Speech Commands dataset, the SOM-MFNN required 82.12% less computation and 32.45% less memory than the conventional MFNN, and its identification rate was 4% higher. On the self-built Chinese KWS dataset, the proposed system improved the identification rate by 1%, indicating that the SOM-MFNN can improve voice command identification while reducing resource consumption. For speaker identification, the conventional MFNN achieved an identification rate of 98.43%, which is sufficient to protect device security. In sum, the SOM-MFNN, which uses less computation and memory than the conventional MFNN, is a suitable classifier for always-on KWS systems, and it can be combined with speaker identification to protect device security.
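The two-stage idea in the abstract (a SOM groups features coarsely, then an MFNN classifies) can be sketched roughly as follows. This is only an illustrative sketch, not the thesis implementation: it assumes one small MFNN per SOM node, a 1-D SOM chain, sigmoid activations, and toy 2-D points standing in for MFCC features. All function names (`bmu`, `train_som`, `random_mfnn`, `mfnn_forward`, `som_mfnn_predict`) are hypothetical, and the MFNN weights are left untrained (back-propagation is omitted) so that only the routing structure is shown.

```python
import math
import random

def bmu(som, x):
    """Index of the SOM node whose weight vector is nearest to x (Euclidean)."""
    return min(range(len(som)),
               key=lambda i: sum((w - v) ** 2 for w, v in zip(som[i], x)))

def train_som(data, n_nodes, epochs=20, lr0=0.5):
    """Train a 1-D SOM (a chain of nodes) with decaying rate and neighborhood."""
    # Assumption: nodes start at evenly spaced training samples.
    som = [list(data[i * len(data) // n_nodes]) for i in range(n_nodes)]
    for epoch in range(epochs):
        frac = 1.0 - epoch / epochs
        lr = lr0 * frac                          # learning rate decays linearly
        radius = max(0.5, (n_nodes / 2) * frac)  # neighborhood width shrinks
        for x in data:
            win = bmu(som, x)
            for i, w in enumerate(som):
                # Gaussian neighborhood: nodes near the winner move more.
                h = math.exp(-((i - win) ** 2) / (2 * radius ** 2))
                for d in range(len(w)):
                    w[d] += lr * h * (x[d] - w[d])
    return som

def random_mfnn(sizes, rng):
    """Untrained MFNN: one (weights, biases) pair per layer (training omitted)."""
    return [([[rng.uniform(-1, 1) for _ in range(n_in)] for _ in range(n_out)],
             [0.0] * n_out)
            for n_in, n_out in zip(sizes, sizes[1:])]

def mfnn_forward(layers, x):
    """Forward pass through sigmoid layers; returns the output activations."""
    a = list(x)
    for W, b in layers:
        a = [1.0 / (1.0 + math.exp(-(sum(wi * ai for wi, ai in zip(row, a)) + bi)))
             for row, bi in zip(W, b)]
    return a

def som_mfnn_predict(som, experts, x):
    """Route x to the MFNN of its best-matching SOM node; return (node, class)."""
    node = bmu(som, x)
    scores = mfnn_forward(experts[node], x)
    return node, scores.index(max(scores))

# Toy stand-in for speech features: two well-separated groups of 2-D points.
features = [(0, 0), (0, 1), (1, 0), (1, 1), (5, 5), (5, 6), (6, 5), (6, 6)]
som = train_som(features, n_nodes=2)
rng = random.Random(0)
experts = [random_mfnn([2, 4, 3], rng) for _ in som]  # one small MFNN per node
node, label = som_mfnn_predict(som, experts, (0.5, 0.5))
print(node, label)
```

Under these assumptions, each input passes through only one small network after a cheap nearest-node lookup, which is one plausible reading of how a SOM front end could lower the per-inference computation relative to a single large MFNN.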

Keywords

★ Keyword spotting
★ Speaker recognition
★ Self-organizing map neural network
★ Multilayer feed-forward neural network
★ Embedded neural network
★ Classifier

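Table of Contents

Abstract (Chinese)
Abstract
Acknowledgments
Table of Contents
List of Figures
List of Tables
Chapter 1. Introduction
1.1 Research Background
1.2 Research Objectives
1.3 Thesis Organization
Chapter 2. Review of Related Techniques
2.1 Speech Signal Preprocessing
2.1.1 Frame Extraction
2.1.2 Hamming Window
2.1.3 Pre-emphasis
2.2 MFCC Speech Features
2.3 SOM (Self-Organizing Map) Neural Network
2.3.1 SOM Network Architecture
2.3.2 SOM Learning Algorithm
2.4 MFNN (Multilayer Feed-Forward Neural Network)
2.4.1 MFNN Network Architecture
2.4.2 MFNN Algorithm
Chapter 3. Design of the SOM-MFNN Voice Command Recognition System
3.1 MIAT Design Methodology
3.1.1 IDEF0 Hierarchical Modular Design
3.1.2 Grafcet Discrete-Event Modeling
3.2 Design of the SOM-MFNN Voice Command and Speaker Recognition System
3.3 Discrete-Event Modeling
3.3.1 Speech Feature Extraction
3.3.2 MFCC Speech Feature Extraction
Chapter 4. Experimental Results
4.1 Experimental Environment and Description
4.2 Experimental Datasets
4.2.1 Google Speech Commands
4.2.2 MIAT Chinese Voice Command Dataset
4.3 Signal Preprocessing and Feature Extraction
4.4 Recognition Performance Metrics
4.5 Configuration and Description of the Compared Neural Networks
4.6 English Voice Command Recognition Experiment
4.7 Chinese Voice Command Recognition Experiment
4.8 Chinese Speaker Recognition Experiment
Chapter 5. Conclusions and Future Work
5.1 Conclusions
5.2 Future Work
References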

Advisor

陳慶瀚 (Ching-Han Chen)

Approval Date

2019-7-18
