參考文獻 |
[1] N. Egi, T. Hayashi and A. Takahashi, "The proposal of quantification method of speaker identification accuracy for speech communication service" Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2019.
[2] D. A. Reynolds and R. C. Rose, "Robust text-independent speaker identification using Gaussian mixture speaker models," in IEEE Transactions on Speech and Audio Processing, vol. 3, no. 1, pp. 72-83, Jan. 1995.
[3] P. Kenny, G. Boulianne, P. Ouellet and P. Dumouchel, "Speaker and Session Variability in GMM-Based Speaker Verification," in IEEE Transactions on Audio, Speech, and Language Processing, vol. 15, no. 4, pp. 1448-1460, May 2007.
[4] N. Dehak, P. J. Kenny, R. Dehak, P. Dumouchel and P. Ouellet, "Front-End Factor Analysis for Speaker Verification," in IEEE Transactions on Audio, Speech, and Language Processing, vol. 19, no. 4, pp. 788-798, May 2011
[5] M. Li, A. Tsiartas, M. Van Segbroeck and S. S. Narayanan, "Speaker verification using simplified and supervised i-vector modeling," 2013 IEEE International Conference on Acoustics, Speech and Signal Processing, 2013, pp. 7199-7203.
[6] S. Cumani, O. Plchot and P. Laface, "On the use of i–vector posterior distributions in Probabilistic Linear Discriminant Analysis," in IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 22, no. 4, pp. 846-857, April 2014.
[7] C. J. S. de Souza, D. C. G. González and L. L. Ling, "VVGP features for speaker verification using i-vector framework," 2015 International Workshop on Telecommunications (IWT), 2015, pp. 1-4.
[8] E. Variani, X. Lei, E. McDermott, I. L. Moreno and J. Gonzalez-Dominguez, "Deep neural networks for small footprint text-dependent speaker verification," 2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2014, pp. 4052-4056.
[9] A. Waibel, T. Hanazawa, G. Hinton, K. Shikano and K. J. Lang, "Phoneme recognition using time-delay neural networks," in IEEE Transactions on Acoustics, Speech, and Signal Processing, vol. 37, no. 3, pp. 328-339, March 1989.
[10] D. Snyder, D. Garcia-Romero, G. Sell, D. Povey and S. Khudanpur, "X-Vectors: Robust DNN Embeddings for Speaker Recognition," 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2018, pp. 5329-5333.
[11] F. A. Rezaur rahman Chowdhury, Q. Wang, I. L. Moreno and L. Wan, "Attention-Based Models for Text-Dependent Speaker Verification," 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2018, pp. 5359-5363.
[12] C. -P. Chen, S. -Y. Zhang, C. -T. Yeh, J. -C. Wang, T. Wang and C. -L. Huang, "Speaker Characterization Using TDNN-LSTM Based Speaker Embedding," ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2019, pp. 6211-6215.
[13] F. Zhao, H. Li and X. Zhang, "A Robust Text-independent Speaker Verification Method Based on Speech Separation and Deep Speaker," ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2019, pp. 6101-6105.
[14] J. S. P. Giraldo, S. Lauwereins, K. Badami, H. Van Hamme and M. Verhelst, "18μW SoC for near-microphone Keyword Spotting and Speaker Verification," 2019 Symposium on VLSI Circuits, 2019, pp. C52-C53.
[15] J. S. P. Giraldo, S. Lauwereins, K. Badami and M. Verhelst, "Vocell: A 65-nm Speech-Triggered Wake-Up SoC for 10- $mu$ W Keyword Spotting and Speaker Verification," in IEEE Journal of Solid-State Circuits, vol. 55, no. 4, pp. 868-878.
[16] J. Wang, L. Lian, Y. Lin and J. Zhao, "VLSI Design for SVM-Based Speaker Verification System," in IEEE Transactions on Very Large Scale Integration (VLSI) Systems, vol. 23, no. 7, pp. 1355-1359, July 2015.
[17] Snyder, D., Garcia-Romero, D., Povey, D., Khudanpur, S. (2017) Deep Neural Network Embeddings for Text-Independent Speaker Verification. Proc. Interspeech 2017, 999-1003, doi: 10.21437/Interspeech.2017-620.
[18] X. Zhang, X. Zou, M. Sun, T. F. Zheng, C. Jia and Y. Wang, "Noise Robust Speaker Recognition Based on Adaptive Frame Weighting in GMM for i-Vector Extraction," in IEEE Access, vol. 7, pp. 27874-27882, 2019.
[19] M. Horowitz, Computing’s energy problem (and what we can do about it), in International Solid-State Circuits Conference (ISSCC), 2014
[20] D. Kadetotad, V. Berisha, C. Chakrabarti and J. -S. Seo, "A 8.93-TOPS/W LSTM Recurrent Neural Network Accelerator Featuring Hierarchical Coarse-Grain Sparsity With All Parameters Stored On-Chip," in IEEE Solid-State Circuits Letters, vol. 2, no. 9, pp. 119-122, Sept. 2019.
[21] K. -Y. Fan, J. -H. Chen, C. -N. Liu and J. -D. Huang, "Performance Optimization for MLP Accelerators using ILP-Based On-Chip Weight Allocation Strategy," 2022 International Symposium on VLSI Design, Automation and Test (VLSI-DAT), 2022, pp. 1-4.
[22] S. Wang et al., "Acceleration of LSTM With Structured Pruning Method on FPGA," in IEEE Access, vol. 7, pp. 62930-62937, 2019.
[23] X. Dai, H. Yin and N. K. Jha, "Grow and Prune Compact, Fast, and Accurate LSTMs," in IEEE Transactions on Computers, vol. 69, no. 3, pp. 441-452, 1 March 2020.
[24] Y. -H. Chen, T. -J. Yang, J. Emer and V. Sze, "Eyeriss v2: A Flexible Accelerator for Emerging Deep Neural Networks on Mobile Devices," in IEEE Journal on Emerging and Selected Topics in Circuits and Systems, vol. 9, no. 2, pp. 292-308, June 2019.
[25] J. Park, W. Yi, D. Ahn, J. Kung and J. -J. Kim, "Balancing Computation Loads and Optimizing Input Vector Loading in LSTM Accelerators," in IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, vol. 39, no. 9, pp. 1889-1901, Sept. 2020.
[26] http://www.andestech.com/en/products-solutions/andescore-processors/RISC-V-n25f/
[27] M. Jiao, Y. Li, P. Dang, W. Cao and L. Wang, "A High Performance FPGA-Based Accelerator Design for End-to-End Speaker Recognition System," 2019 International Conference on Field-Programmable Technology (ICFPT), 2019, pp. 215-223.
[28] J. S. Chung, A. Nagrani, and A. Zisserman, “Voxceleb2: Deep speaker recognition,” Proc. Annu. Conf. Int. Speech Commun. Assoc. INTERSPEECH, vol. 2018-Septe, no. i, pp. 1086–1090, 2018, doi: 10.21437/Interspeech.2018-1929.
[29] A. Nagraniy, J. S. Chungy, and A. Zisserman, “VoxCeleb: A large-scale speaker identification dataset,” Proc. Annu. Conf. Int. Speech Commun. Assoc. INTERSPEECH, vol. 2017-Augus, pp. 2616–2620, 2017, doi: 10.21437/Interspeech.2017-950.
[30] Peddinti, V., Povey, D., Khudanpur, S. (2015) A time delay neural network architecture for efficient modeling of long temporal contexts. Proc. Interspeech 2015, 3214-3218
[31] Klaus Greff, Rupesh Kumar Srivastava, Jan Koutník, Bas R. Steunebrink, Jürgen Schmidhuber, “LSTM: A Search Space Odyssey” arXiv:1503.04069, 2015
[32] R. Ramos-Lara, M. Lopez-Garcia, E. Canto-Navarro and L. Puente-Rodriguez, "SVM speaker verification system based on a low-cost FPGA," 2009 International Conference on Field Programmable Logic and Applications, 2009, pp. 582-586
[33] Ramos-Lara, R., López-García, M., Cantó-Navarro, E. et al. Real-Time Speaker Verification System Implemented on Reconfigurable Hardware. J Sign Process Syst 71, pp. 89–103, 2013
[34] E. Cantó-Navarro, M. López-García, R. Ramos-Lara and R. Sánchez-Reíllo, "Flexible Biometric Online Speaker-Verification System Implemented on FPGA Using Vector Floating-Point Units," in IEEE Transactions on Very Large Scale Integration (VLSI) Systems, vol. 23, no. 11, pp. 2497-2507, Nov. 2015
[35] A. S. Bora et al., "Power Efficient Speaker Verification Using Linear Predictive Coding on FPGA," 2018 International CET Conference on Control, Communication, and Computing (IC4), 2018, pp. 260-265
[36] B. Liu et al., "A Target-Separable BWN Inspired Speech Recognition Processor with Low-power Precision-adaptive Approximate Computing," 2022 Design, Automation & Test in Europe Conference & Exhibition (DATE), 2022, pp. 196-201
[37] T. Tambe et al., "9.8 A 25mm2 SoC for IoT Devices with 18ms Noise-Robust Speech-to-Text Latency via Bayesian Speech Denoising and Attention-Based Sequence-to-Sequence DNN Speech Recognition in 16nm FinFET," 2021 IEEE International Solid- State Circuits Conference (ISSCC), 2021, pp. 158-160.
[38] T. -J. Lin et al., "A 40nm CMOS SoC for Real-Time Dysarthric Voice Conversion of Stroke Patients," 2022 27th Asia and South Pacific Design Automation Conference (ASP-DAC), 2022, pp. 7-8
[39] Y. -H. Chen, T. Krishna, J. S. Emer and V. Sze, "Eyeriss: An Energy-Efficient Reconfigurable Accelerator for Deep Convolutional Neural Networks," in IEEE Journal of Solid-State Circuits, vol. 52, no. 1, pp. 127-138, Jan. 2017
[40] M. D. Balasingam and C. S. Kumar, "Refining Cosine Distance Features for Robust Speaker Verification," 2018 International Conference on Communication and Signal Processing (ICCSP), 2018, pp. 0152-0155.
[41] S. J. D. Prince and J. H. Elder, "Probabilistic Linear Discriminant Analysis for Inferences About Identity," 2007 IEEE 11th International Conference on Computer Vision, 2007, pp. 1-8, doi: 10.1109/ICCV.2007.
[42] S. Han, J. Pool, J. Tran, and W. Dally, “Learning both weights and connections for efficient neural network,” in Advances in neural information processing systems, 2015, pp. 1135–1143.
[43] J. Fernández-Marqués, Vincent W.-S. Tseng, Sourav Bhattachara, and Nicholas D. Lane.. “On-the-fly deterministic binary filters for memory efficient keyword spotting applications on embedded devices.” In Proceedings of the 2nd International Workshop on Embedded and Mobile Deep Learning (EMDL′18)
[44] P. Blouw, G. Malik, B. Morcos, A. R. Voelker, C. Eliasmith A. Akandeh and F. M. Salem, " Hardware Aware Training for Efficient Keyword Spotting on General Purpose and Specialized Hardware" in arXiv:2009.04465 [eess.AS], 2021
|