A Deep Neural Network for Sound Event Recognition Implemented in Microcontroller


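Author

Chen-Hung Liu (劉振宏)

Department

In-Service Master's Program, Department of Computer Science and Information Engineering

Thesis title

A Deep Neural Network for Sound Event Recognition Implemented in Microcontroller
(Original title: 實作於微控制器的深度神經網路聲音事件辨識)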

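Related theses

★ An Intelligent Controller Development Platform Integrating a GRAFCET Virtual Machine
★ Design and Implementation of a Distributed Industrial Electronic Signboard Network System
★ Design and Implementation of a Dual-Touch Screen Based on a Dual-Camera Vision System
★ An Embedded Computing Platform for Intelligent Robots
★ An Embedded System for Real-Time Moving Object Detection and Tracking
★ A Multiprocessor Architecture and Distributed Control Algorithm for Solid-State Drives
★ A Human-Computer Interaction System Based on Stereo-Vision Gesture Recognition
★ A Robot System-on-Chip Design Integrating Bio-Inspired Intelligent Behavior Control
★ Design and Implementation of an Embedded Wireless Image Sensor Network
★ A License Plate Recognition System Based on a Dual-Core Processor
★ Continuous 3D Gesture Recognition Based on Stereo Vision
★ Design and Hardware Implementation of a Miniature, Ultra-Low-Power Wireless Sensor Network Controller
★ Real-Time Face Detection, Tracking, and Recognition in Streaming Video: An Embedded System Design
★ An Embedded Hardware Design for a Fast Stereo Vision System
★ Design and Implementation of a Real-Time Continuous Image Stitching System
★ An Embedded Gait Recognition System Based on a Dual-Core Platform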


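This electronic thesis is authorized for immediate open access.
Open-access full text is licensed only for users' personal, non-profit searching, reading, and printing for academic research purposes.
Please comply with the Copyright Act of the Republic of China. Do not reproduce, distribute, adapt, repost, or broadcast the work without authorization, to avoid violating the law.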

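Abstract (translated from the Chinese)

A typical deep neural network needs a large amount of memory and high-speed floating-point computation, which makes it hard to use on microcontroller-based embedded platforms with very limited hardware resources. Deep neural networks have been applied successfully to sound event recognition. To implement deep sound event recognition on a microcontroller platform, this study proposes a quantization strategy that compresses the deep neural network model so that recognition performance can be balanced against hardware resource requirements. The study uses the DS-CNN architecture to build the sound event recognition model and extracts MFCCs from the audio as the features for training. Through our quantization procedure, the quantized weight parameters are loaded onto an ARM Cortex-M7 microcontroller for verification. The neural network model trained on the PC platform reaches a recognition rate of 82%. After quantization and porting to the MCU platform, with the recognition time held at the same 0.2 seconds, the recognition rate drops to 60%. This confirms that the method can port a deep neural network model trained on a PC to an MCU platform while keeping acceptable recognition performance and recognition rate. The results can extend deep learning AI technology to many applications that must run on limited hardware resources.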

Abstract (English)

Typical deep neural networks require considerable memory and high-speed floating-point arithmetic; hence, they are difficult to deploy on microcontroller-based embedded platforms with limited hardware resources. Deep neural networks have been applied successfully to sound event recognition. To implement deep sound event recognition on a microcontroller platform, this study proposes a quantization strategy that compresses the deep neural network model and balances recognition performance against hardware resource requirements. The study adopts the depthwise separable convolutional neural network (DS-CNN) architecture to build the neural network model for sound event recognition. Mel-frequency cepstral coefficients (MFCCs) extracted from the audio are used as the features to train the recognition model. Through the quantization process, the quantized weight parameters are loaded onto an ARM Cortex-M7 microcontroller for verification. The neural network model trained on a personal computer reached a recognition rate of 82%. After the model was quantized and ported to the microcontroller unit (MCU), the recognition rate dropped to 60%, with the recognition time remaining at 0.2 seconds. The results verify that the proposed method allows a deep neural network model trained on a personal computer to be ported to an MCU while maintaining acceptable recognition performance and recognition rate. The results can extend deep learning artificial intelligence technology to numerous applications that must run on limited hardware resources.
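The abstract does not spell out the quantization procedure, but the table of contents indicates that it is based on the Q format. As a rough illustration only, the sketch below shows one common way to map floating-point weights to a signed fixed-point Q format with a power-of-two scale. The 8-bit word length, the function names (`choose_frac_bits`, `quantize`, `dequantize`), and the sample weights are all assumptions made for this example and are not taken from the thesis.

```python
import math


def choose_frac_bits(values, total_bits=8):
    """Pick the number of fractional bits for a signed Q format.

    Assumption: one sign bit, and enough integer bits to cover the
    largest magnitude in `values`; the rest are fractional bits.
    """
    max_abs = max(abs(v) for v in values)
    if max_abs == 0:
        return total_bits - 1
    int_bits = math.ceil(math.log2(max_abs))
    return total_bits - 1 - int_bits


def quantize(values, frac_bits, total_bits=8):
    """Scale by 2**frac_bits, round to nearest, and saturate to the word size."""
    lo = -(2 ** (total_bits - 1))
    hi = 2 ** (total_bits - 1) - 1
    scale = 2 ** frac_bits
    quantized = []
    for v in values:
        q = math.floor(v * scale + 0.5)      # round half up
        quantized.append(max(lo, min(hi, q)))  # saturate instead of wrapping
    return quantized


def dequantize(q_values, frac_bits):
    """Recover approximate floating-point values from the fixed-point integers."""
    scale = 2 ** frac_bits
    return [q / scale for q in q_values]


if __name__ == "__main__":
    weights = [0.75, -0.5, 0.1, -1.2]  # hypothetical layer weights
    frac_bits = choose_frac_bits(weights)
    q_weights = quantize(weights, frac_bits)
    print("fractional bits:", frac_bits)
    print("quantized:", q_weights)
    print("recovered:", dequantize(q_weights, frac_bits))
```

With a power-of-two scale, rescaling between layers on the microcontroller reduces to bit shifts, which is one reason this style of fixed-point format suits integer-only inference. The rounding error per weight is at most half of the smallest step, `2 ** -(frac_bits + 1)`, as long as the value does not saturate.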

Keywords (translated from the Chinese)

★ deep neural network
★ sound event recognition
★ microcontroller
★ quantization
★ deep learning
★ DS-CNN

Keywords (English)

★ DS-CNN
★ MCU
★ quantization

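Table of contents

Abstract (Chinese)
Abstract (English)
Acknowledgments
Table of Contents
List of Figures
List of Tables
Chapter 1 Introduction
1.1 Research Motivation
1.2 Research Objectives
Chapter 2 Technical Review
2.1 From Artificial Neural Networks to Deep Learning
2.2 Convolutional Neural Networks
2.3 Depthwise Separable Convolution
2.4 Quantization of Neural Networks
Chapter 3 CNN Quantization and Pruning
3.1 Model Architecture Design and Training
3.2 The Concept of Model Quantization
3.3 Quantization of Weight Parameters
3.3.1 Q Format and Quantization
3.3.2 Quantizing Weights
3.3.3 Quantizing Activation Data
3.3.4 Deploying the Quantized Model to the Development Board
Chapter 4 System Integration Experiments
4.1 Hardware and Software Implementation Platforms
4.1.1 Model Training Platform
4.1.2 Development Board Hardware Platform
4.2 Data Preprocessing
4.3 Experiments After Model Training and Quantization
4.3.1 Confusion Matrix
4.3.2 Model Training Process and Results
4.3.3 Effect of the Quantization Range Setting on Model Accuracy
4.4 Verification on the Development Board
4.5 Summary of Experimental Results
Chapter 5 Conclusions and Future Research Directions
5.1 Conclusions
5.2 Future Research Directions
References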


Advisor

陳慶瀚

Approval date

2019-1-30
