References
[1] N. Dehak, P. J. Kenny, R. Dehak, P. Dumouchel, and P. Ouellet, “Front-end factor analysis for speaker verification,” IEEE Trans. Audio, Speech Lang. Process., vol. 19, no. 4, pp. 788–798, 2011, doi: 10.1109/TASL.2010.2064307.
[2] S. Ioffe, “Probabilistic linear discriminant analysis,” in Proc. Eur. Conf. Comput. Vis. (ECCV), Lect. Notes Comput. Sci., vol. 3954, 2006, pp. 531–542, doi: 10.1007/11744085_41.
[3] P. Kenny, “Bayesian speaker verification with heavy-tailed priors,” in Proc. Odyssey Speaker Lang. Recognition Workshop, Brno, Czech Republic, 2010.
[4] E. Variani, X. Lei, E. McDermott, I. L. Moreno, and J. Gonzalez-Dominguez, “Deep neural networks for small footprint text-dependent speaker verification,” in Proc. IEEE Int. Conf. Acoust., Speech, Signal Process. (ICASSP), 2014, pp. 4052–4056, doi: 10.1109/ICASSP.2014.6854363.
[5] G. Heigold, I. Moreno, S. Bengio, and N. Shazeer, “End-to-end text-dependent speaker verification,” in Proc. IEEE Int. Conf. Acoust., Speech, Signal Process. (ICASSP), 2016, pp. 5115–5119.
[6] L. Wan, Q. Wang, A. Papir, and I. L. Moreno, “Generalized end-to-end loss for speaker verification,” in Proc. IEEE Int. Conf. Acoust., Speech, Signal Process. (ICASSP), 2018, pp. 4879–4883, doi: 10.1109/ICASSP.2018.8462665.
[7] D. Snyder, D. Garcia-Romero, D. Povey, and S. Khudanpur, “Deep neural network embeddings for text-independent speaker verification,” in Proc. Interspeech, 2017, pp. 999–1003, doi: 10.21437/Interspeech.2017-620.
[8] V. Peddinti, D. Povey, and S. Khudanpur, “A time delay neural network architecture for efficient modeling of long temporal contexts,” in Proc. Interspeech, 2015.
[9] D. Snyder, D. Garcia-Romero, G. Sell, A. McCree, D. Povey, and S. Khudanpur, “Speaker recognition for multi-speaker conversations using x-vectors,” in Proc. IEEE Int. Conf. Acoust., Speech, Signal Process. (ICASSP), 2019, pp. 5796–5800, doi: 10.1109/ICASSP.2019.8683760.
[10] B. Gu, W. Guo, L. Dai, and J. Du, “An improved deep neural network for modeling speaker characteristics at different temporal scales,” in Proc. IEEE Int. Conf. Acoust., Speech, Signal Process. (ICASSP), 2020, pp. 6814–6818, doi: 10.1109/ICASSP40776.2020.9054151.
[11] C.-P. Chen, S.-Y. Zhang, C.-T. Yeh, J.-C. Wang, T. Wang, and C.-L. Huang, “Speaker characterization using TDNN-LSTM based speaker embedding,” in Proc. IEEE Int. Conf. Acoust., Speech, Signal Process. (ICASSP), 2019, pp. 6211–6215.
[12] Q.-B. Hong, C.-H. Wu, H.-M. Wang, and C.-L. Huang, “Statistics pooling time delay neural network based on x-vector for speaker verification,” in Proc. IEEE Int. Conf. Acoust., Speech, Signal Process. (ICASSP), 2020, pp. 6849–6853, doi: 10.1109/ICASSP40776.2020.9054350.
[13] F. A. R. R. Chowdhury, Q. Wang, I. L. Moreno, and L. Wan, “Attention-based models for text-dependent speaker verification,” in Proc. IEEE Int. Conf. Acoust., Speech, Signal Process. (ICASSP), 2018, pp. 5359–5363, doi: 10.1109/ICASSP.2018.8461587.
[14] M. H. Rahman, I. Himawan, M. McLaren, C. Fookes, and S. Sridharan, “Employing phonetic information in DNN speaker embeddings to improve speaker recognition performance,” in Proc. Interspeech, 2018, pp. 3593–3597, doi: 10.21437/Interspeech.2018-1804.
[15] Y. Zhu, T. Ko, D. Snyder, B. Mak, and D. Povey, “Self-attentive speaker embeddings for text-independent speaker verification,” in Proc. Interspeech, 2018, pp. 3573–3577, doi: 10.21437/Interspeech.2018-1158.
[16] J. Chung, C. Gulcehre, K. Cho, and Y. Bengio, “Empirical evaluation of gated recurrent neural networks on sequence modeling,” 2014, arXiv:1412.3555. [Online]. Available: http://arxiv.org/abs/1412.3555.
[17] D. Snyder, D. Garcia-Romero, G. Sell, D. Povey, and S. Khudanpur, “X-vectors: Robust DNN embeddings for speaker recognition,” in Proc. IEEE Int. Conf. Acoust., Speech, Signal Process. (ICASSP), 2018, pp. 5329–5333.
[18] D. Povey et al., “Semi-orthogonal low-rank matrix factorization for deep neural networks,” in Proc. Interspeech, 2018, pp. 3743–3747, doi: 10.21437/Interspeech.2018-1417.
[19] G. Zhong, G. Yue, and X. Ling, “Recurrent attention unit,” arXiv, 2018.
[20] Y. Qin, D. Chen, S. Xiang, and C. Zhu, “Gated dual attention unit neural networks for remaining useful life prediction of rolling bearings,” IEEE Trans. Ind. Informat., early access, 2020, doi: 10.1109/TII.2020.2999442.
[21] J. S. Chung, A. Nagrani, and A. Zisserman, “VoxCeleb2: Deep speaker recognition,” in Proc. Interspeech, 2018, pp. 1086–1090, doi: 10.21437/Interspeech.2018-1929.
[22] D. Snyder, G. Chen, and D. Povey, “MUSAN: A music, speech, and noise corpus,” 2015, arXiv:1510.08484. [Online]. Available: http://arxiv.org/abs/1510.08484.
[23] D. Snyder, D. Garcia-Romero, A. McCree, G. Sell, D. Povey, and S. Khudanpur, “Spoken language recognition using x-vectors,” in Proc. Odyssey Speaker Lang. Recognition Workshop, 2018, pp. 105–111, doi: 10.21437/Odyssey.2018-15.
[24] A. Nagrani, J. S. Chung, and A. Zisserman, “VoxCeleb: A large-scale speaker identification dataset,” in Proc. Interspeech, 2017, pp. 2616–2620, doi: 10.21437/Interspeech.2017-950.
[25] M. McLaren, A. Lawson, L. Ferrer, D. Castán, and M. Graciarena, “The Speakers in the Wild speaker recognition challenge plan,” 2016, pp. 818–822.
[26] J. Villalba et al., “State-of-the-art speaker recognition with neural network embeddings in NIST SRE18 and Speakers in the Wild evaluations,” Comput. Speech Lang., vol. 60, 2020, doi: 10.1016/j.csl.2019.101026.
[27] M. Ravanelli, T. Parcollet, and Y. Bengio, “The PyTorch-Kaldi speech recognition toolkit,” in Proc. IEEE Int. Conf. Acoust., Speech, Signal Process. (ICASSP), 2019, pp. 6465–6469.