References
[1] Statista, "Number of digital voice assistants in use worldwide from 2019 to 2023." [Online]. Available: https://www.statista.com/statistics/973815/worldwide-digital-voiceassistant-in-use/
[2] J. S. Edu, J. M. Such, and G. Suarez-Tangil, "Smart home personal assistants: A security and privacy review," ACM Computing Surveys, vol. 53, no. 116, pp. 1–36, Feb. 2020.
[3] Apple Machine Learning Blog, "Hey Siri: An On-device DNN-powered Voice Trigger for Apple's Personal Assistant," Oct. 2017. [Online]. Available: https://machinelearning.apple.com/2017/10/01/hey-siri.html
[4] B. Li et al., "Acoustic modeling for Google Home," in Proc. Interspeech, 2017, pp. 399–403.
[5] Y. Bai et al., "End-to-end keywords spotting based on connectionist temporal classification for Mandarin," 2016 10th International Symposium on Chinese Spoken Language Processing (ISCSLP), 2016, pp. 1–5.
[6] T. N. Sainath and C. Parada, “Convolutional neural networks for small-footprint keyword spotting,” in Proc. Interspeech, 2015, pp. 1478–1482.
[7] M. B. Andra and T. Usagawa, "Contextual keyword spotting in lecture video with deep convolutional neural network," 2017 International Conference on Advanced Computer Science and Information Systems (ICACSIS), 2017, pp. 198–203.
[8] R. Tang and J. Lin, "Deep residual learning for small-footprint keyword spotting," 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2018, pp. 5484–5488.
[9] M. Sun et al., "Max-pooling loss training of long short-term memory networks for small-footprint keyword spotting," 2016 IEEE Spoken Language Technology Workshop (SLT), 2016, pp. 474–480.
[10] D. Wang, S. Lv, X. Wang, and X. Lin, "Gated convolutional LSTM for speech commands recognition," International Conference on Computational Science (ICCS), Springer, Cham, 2018, pp. 669–681.
[11] S. O. Arik et al., “Convolutional recurrent neural networks for small-footprint keyword spotting,” in Proc. Interspeech, 2017, pp. 1606–1610.
[12] M. Zeng and N. Xiao, "Effective combination of DenseNet and BiLSTM for keyword spotting," IEEE Access, vol. 7, pp. 10767–10775, 2019.
[13] G. Huang, Z. Liu, L. van der Maaten, and K. Q. Weinberger, "Densely connected convolutional networks," 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2017, pp. 2261–2269.
[14] K. He, X. Zhang, S. Ren, and J. Sun, "Deep residual learning for image recognition," 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016, pp. 770–778.
[15] A. Krizhevsky et al., "ImageNet classification with deep convolutional neural networks," Advances in Neural Information Processing Systems, 2012, pp. 1097–1105.
[16] S. Xie, R. Girshick, P. Dollár, Z. Tu, and K. He, "Aggregated residual transformations for deep neural networks," 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2017, pp. 5987–5995.
[17] X. Zhang, X. Zhou, M. Lin, and J. Sun, "ShuffleNet: An extremely efficient convolutional neural network for mobile devices," 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2018, pp. 6848–6856.
[18] D. Sinha and M. El-Sharkawy, "Thin MobileNet: An enhanced MobileNet architecture," 2019 IEEE 10th Annual Ubiquitous Computing, Electronics & Mobile Communication Conference (UEMCON), 2019, pp. 0280–0285.
[19] G. Huang, S. Liu, L. van der Maaten, and K. Q. Weinberger, "CondenseNet: An efficient DenseNet using learned group convolutions," 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2018, pp. 2752–2761.
[20] J. Hu, L. Shen, and G. Sun, "Squeeze-and-excitation networks," 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2018, pp. 7132–7141.
[21] A. Howard et al., "Searching for MobileNetV3," 2019 IEEE/CVF International Conference on Computer Vision (ICCV), 2019, pp. 1314–1324.
[22] M. Tan et al., "MnasNet: Platform-aware neural architecture search for mobile," 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2019, pp. 2815–2823.
[23] V. Nair and G. E. Hinton, "Rectified linear units improve restricted Boltzmann machines," in Proc. International Conference on Machine Learning (ICML), 2010.
[24] P. Warden, "Launching the Speech Commands dataset," Google Research Blog, Aug. 2017. [Online]. Available: https://ai.googleblog.com/2017/08/launching-speech-commands-dataset.html
[25] L. Muda, M. Begam, and I. Elamvazuthi, "Voice recognition algorithms using mel frequency cepstral coefficient (MFCC) and dynamic time warping (DTW) techniques," Journal of Computing, 2010.
[26] T. Ko, V. Peddinti, D. Povey et al., "Audio augmentation for speech recognition," in Proc. Interspeech, 2015, pp. 3586–3589.
[27] T. Fukuda, R. Fernandez, A. Rosenberg, S. Thomas, B. Ramabhadran, A. Sorin, and G. Kurata, "Data augmentation improves recognition of foreign accented speech," in Proc. Interspeech, 2018, pp. 2409–2413.
[28] SoX, audio manipulation tool. [Online]. Available: http://sox.sourceforge.net/ (accessed Mar. 25, 2015).
[29] T. Zhang, G.-J. Qi, B. Xiao, and J. Wang, "Interleaved group convolutions," 2017 IEEE International Conference on Computer Vision (ICCV), 2017, pp. 4383–4392.
[30] S. Xie, R. Girshick, P. Dollár, Z. Tu, and K. He, "Aggregated residual transformations for deep neural networks," 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2017, pp. 5987–5995.
[31] Z. Yan et al., "HD-CNN: Hierarchical deep convolutional neural networks for large scale visual recognition," 2015 IEEE International Conference on Computer Vision (ICCV), 2015, pp. 2740–2748.
[32] Y. Zhang, N. Suda, L. Lai, and V. Chandra, "Hello Edge: Keyword spotting on microcontrollers," 2017, unpublished.
[33] D. C. de Andrade, S. Leo, M. L. Da S. Viana, and C. Bernkopf, "A neural attention model for speech command recognition," 2018, unpublished.