Microcontroller Unit Implementation of Wake-up Word Detection

Name

Chang Tung (張桐)

Department

Department of Computer Science and Information Engineering

Thesis Title

Microcontroller Unit Implementation of Wake-up Word Detection
(喚醒詞辨識之微處理器實作)

Files

Full text in the system: never open to the public

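Abstract (Chinese)

In recent years, with the development of the Internet of Things and deep learning, applications of artificial intelligence have become more widespread. The emergence of smart speakers has changed consumers' habits by letting them issue commands verbally. This trend also suggests that future home appliances will lean toward voice input. However, most home appliances do not have an operating system that allocates computing resources, as a personal computer does; instead, they consist of multiple microcontrollers that execute their functions repeatedly. To control a microcontroller with voice commands, a wake-up word detection system must run on the microcontroller itself.
This thesis implements the wake-up word detection model with depth-wise separable convolution, which greatly reduces the number of parameters and is therefore well suited to microcontrollers with limited memory and computing power. The system first converts speech data into features using Mel-frequency cepstral coefficients (MFCC), then trains a neural network to learn the wake-up word classes and determine whether the features contain a wake-up word.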
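
The MFCC feature-extraction step mentioned in the abstract can be sketched roughly as follows. This is a minimal NumPy illustration, not the thesis's implementation: the 16 kHz sample rate, 40 ms frames, 20 ms hop, 40 mel filters, and 10 coefficients are assumed values chosen for the example, and all function names are hypothetical.

```python
import numpy as np

def hz_to_mel(hz):
    return 2595.0 * np.log10(1.0 + hz / 700.0)

def mel_to_hz(mel):
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)

def mel_filterbank(n_filters, n_fft, sample_rate):
    # Triangular filters whose centers are evenly spaced on the mel scale.
    mel_points = np.linspace(hz_to_mel(0.0), hz_to_mel(sample_rate / 2.0), n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_points) / sample_rate).astype(int)
    fbank = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(n_filters):
        left, center, right = bins[i], bins[i + 1], bins[i + 2]
        for k in range(left, center):
            fbank[i, k] = (k - left) / (center - left)
        for k in range(center, right):
            fbank[i, k] = (right - k) / (right - center)
    return fbank

def dct_ii(x, n_out):
    # Unnormalized DCT-II along the last axis, keeping the first n_out coefficients.
    n = x.shape[-1]
    k = np.arange(n_out)[:, None]
    m = np.arange(n)[None, :]
    basis = np.cos(np.pi * k * (2 * m + 1) / (2 * n))
    return x @ basis.T

def mfcc(signal, sample_rate=16000, frame_len=640, hop=320,
         n_fft=1024, n_filters=40, n_coeffs=10):
    # All default values here are assumptions for illustration only.
    n_frames = 1 + (len(signal) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    frames = signal[idx] * np.hamming(frame_len)           # windowed frames
    power = np.abs(np.fft.rfft(frames, n=n_fft, axis=1)) ** 2 / n_fft
    energies = power @ mel_filterbank(n_filters, n_fft, sample_rate).T
    log_energies = np.log(energies + 1e-10)                 # avoid log(0)
    return dct_ii(log_energies, n_coeffs)

# One second of synthetic noise stands in for a recorded utterance.
audio = np.random.default_rng(0).standard_normal(16000)
features = mfcc(audio)
print(features.shape)  # (49, 10)
```

The resulting frames-by-coefficients matrix is the kind of input a small convolutional classifier would consume; the actual feature dimensions used in the thesis are not stated in this record.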

Abstract (English)

In recent years, with the development of the Internet of Things (IoT) and deep learning, artificial intelligence has been applied in more and more areas. The appearance of smart speakers has changed consumers' habits by enabling them to give verbal instructions directly. This trend also indicates that future home appliances will tend to use voice input for commands. However, unlike a personal computer, most home appliances have no operating system to allocate computing resources; they are instead built from multiple microcontrollers that perform their functions repeatedly. To control a microcontroller with voice commands, it is necessary to run a wake-up word recognition system on the microcontroller.
In this thesis, we use Depth-wise Separable Convolution to implement the wake-up word recognition model. Depth-wise Separable Convolution greatly reduces the number of parameters, which is very helpful for microcontrollers with limited memory and computing power. The system first converts the voice data into features through MFCC, and then trains a neural network to learn the wake-up word classes and identify whether the features contain a wake-up word.
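
To make the parameter reduction concrete, the sketch below compares the weight count of a standard convolution with that of a depth-wise separable one. The kernel size and channel counts are hypothetical values chosen for illustration (they are not taken from the thesis), and bias terms are ignored.

```python
def conv_params(k, c_in, c_out):
    # Standard convolution: one k x k x c_in kernel per output channel.
    return k * k * c_in * c_out

def ds_conv_params(k, c_in, c_out):
    # Depth-wise step: one k x k kernel per input channel.
    depthwise = k * k * c_in
    # Point-wise step: a 1 x 1 convolution that mixes channels.
    pointwise = c_in * c_out
    return depthwise + pointwise

# Assumed layer shape for the example: 3x3 kernel, 64 -> 64 channels.
k, c_in, c_out = 3, 64, 64
standard = conv_params(k, c_in, c_out)
separable = ds_conv_params(k, c_in, c_out)
ratio = separable / standard

print(standard)   # 36864
print(separable)  # 4672
print(ratio)      # about 0.127, i.e. 1/c_out + 1/k**2
```

For this assumed layer the separable form needs roughly one eighth of the weights, which is the kind of saving that matters when the model must fit in a microcontroller's memory.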

Keywords (Chinese)

★ Wake-up word detection
★ Convolutional neural network
★ Microprocessor
★ Depth-wise separable convolution

Keywords (English)

★ Keyword spotting
★ Convolutional Neural Network
★ Microcontroller Unit
★ Depthwise Separable Convolution

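Table of Contents

Chinese Abstract
Abstract
List of Figures
List of Tables
Contents
Chapter 1 Introduction
1.1 Research Background and Objectives
1.2 Research Methods and Chapter Overview
Chapter 2 Literature Review
2.1 Keyword Spotting (KWS) System
2.1.1 Wake-up Word Detection Based on Hidden Markov Models
2.1.2 Wake-up Word Detection Based on Deep Learning Networks
2.2 Neural Networks
2.2.1 RNN
2.2.2 LSTM
2.2.3 GRU
2.2.4 CNN
2.3 Microcontroller Unit
2.3.1 Types of Processors
2.3.2 Comparison of MPU and MCU
Chapter 3 System Architecture
3.1 System Architecture Design
3.2 Feature Extraction
3.3 Depth-wise Separable Convolutions
3.3.1 Depth-wise Convolution
3.3.2 Point-wise Convolution
3.3.3 Computational Cost Comparison of Depth-wise Separable Convolution and Convolution
3.4 Quantization
3.5 Loss Function
Chapter 4 Experiments
4.1 Dataset Description
4.2 Hardware Environment
4.2.1 Hardware for Neural Network Training
4.2.2 Microcontroller Hardware
4.3 Experimental Parameters and Network Settings
4.4 Experimental Results
4.4.1 Wake-up Word Evaluation
4.4.2 Experimental Results
Chapter 5 Conclusions and Future Work
Chapter 6 References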

Advisor

Jia-Ching Wang (王家慶)

Approval Date

2020-7-30
