References
[1] T. N. Sainath and C. Parada, “Convolutional Neural Networks for Small-footprint Keyword Spotting,” Proc. INTERSPEECH, pp. 1478–1482, 2015.
[2] Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner, “Gradient-based learning applied to document recognition,” Proc. IEEE, vol. 86, no. 11, pp. 2278–2324, 1998.
[3] A. Krizhevsky, I. Sutskever, and G. E. Hinton, “ImageNet Classification with Deep Convolutional Neural Networks,” Adv. Neural Inf. Process. Syst., pp. 1–9, 2012.
[4] J. Redmon, S. Divvala, R. Girshick, and A. Farhadi, “You Only Look Once: Unified, Real-Time Object Detection,” arXiv preprint arXiv:1506.02640, 2015.
[5] V. Badrinarayanan, A. Kendall, and R. Cipolla, “SegNet: A deep convolutional encoder-decoder architecture for image segmentation,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 39, no. 12, pp. 2481–2495, 2017.
[6] Y. Wu, M. Schuster, Z. Chen, Q. V. Le, M. Norouzi, W. Macherey, M. Krikun, Y. Cao, Q. Gao, K. Macherey, J. Klingner, A. Shah, M. Johnson, X. Liu, Ł. Kaiser, S. Gouws, Y. Kato, T. Kudo, H. Kazawa, K. Stevens, G. Kurian, N. Patil, W. Wang, C. Young, J. Smith, J. Riesa, A. Rudnick, O. Vinyals, G. Corrado, M. Hughes, and J. Dean, “Google’s Neural Machine Translation System: Bridging the Gap between Human and Machine Translation,” ArXiv e-prints, pp. 1–23, 2016.
[7] Y. Kim, “Convolutional Neural Networks for Sentence Classification,” Proc. 2014 Conf. Empir. Methods Nat. Lang. Process., pp. 1746–1751, 2014.
[8] O. Abdel-Hamid, H. Jiang, and G. Penn, “Applying Convolutional Neural Networks Concepts to Hybrid NN-HMM Model for Speech Recognition,” Proc. ICASSP, pp. 4277–4280, 2012.
[9] L. Tóth, “Combining time- and frequency-domain convolution in convolutional neural network-based phone recognition,” in Proc. ICASSP, 2014, pp. 190–194.
[10] D. Amodei, R. Anubhai, E. Battenberg, C. Case, J. Casper, B. Catanzaro, J. Chen, M. Chrzanowski, A. Coates, G. Diamos, E. Elsen, J. Engel, L. Fan, C. Fougner, T. Han, A. Hannun, B. Jun, P. LeGresley, L. Lin, S. Narang, A. Ng, S. Ozair, R. Prenger, J. Raiman, S. Satheesh, D. Seetapun, S. Sengupta, Y. Wang, Z. Wang, C. Wang, B. Xiao, D. Yogatama, J. Zhan, and Z. Zhu, “Deep Speech 2: End-to-End Speech Recognition in English and Mandarin,” arXiv preprint arXiv:1512.02595, 2015.
[11] P. Warden, “Speech Commands: A Dataset for Limited-Vocabulary Speech Recognition,” arXiv preprint arXiv:1804.03209, 2018.
[12] W. Han, C.-F. Chan, C.-S. Choy, and K.-P. Pun, “An efficient MFCC extraction method in speech recognition,” in Proc. IEEE Int. Symp. Circuits Syst. (ISCAS), 2006, p. 4.
[13] S. Molau, M. Pitz, R. Schlüter, and H. Ney, “Computing Mel-frequency cepstral coefficients on the power spectrum,” Proc. ICASSP, vol. 1, pp. 73–76, 2001.
[14] J. G. Proakis and D. G. Manolakis, Digital Signal Processing: Principles, Algorithms, and Applications. Prentice Hall, 1996.
[15] A. Zolnay, R. Schlüter, and H. Ney, “Acoustic feature combination for robust speech recognition,” Proc. ICASSP, vol. I, pp. 457–460, 2005.
[16] S. Ioffe and C. Szegedy, “Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift,” arXiv preprint arXiv:1502.03167, 2015.
[17] N. Srivastava, G. Hinton, A. Krizhevsky, I. Sutskever, and R. Salakhutdinov, “Dropout: A Simple Way to Prevent Neural Networks from Overfitting,” J. Mach. Learn. Res., vol. 15, pp. 1929–1958, 2014.
[18] R. Tang and J. Lin, “Deep Residual Learning for Small-Footprint Keyword Spotting,” arXiv preprint arXiv:1710.10361, 2017.