References
〔1〕Anguera, Xavier, et al. "Speaker diarization: A review of recent research." IEEE
Transactions on audio, speech, and language processing 20.2 (2012): 356-370.
〔2〕Bromley, Jane, et al. "Signature verification using a 'siamese' time delay neural
network." Advances in neural information processing systems 6 (1993).
〔3〕Garcia-Romero, Daniel, and Carol Y. Espy-Wilson. "Analysis of i-vector length
normalization in speaker recognition systems." Twelfth annual conference of the
international speech communication association. 2011.
〔4〕Hershey, John R., et al. "Deep clustering: Discriminative embeddings for segmentation
and separation." 2016 IEEE international conference on acoustics, speech and signal
processing (ICASSP). IEEE, 2016.
〔5〕Snyder, David, et al. "X-vectors: Robust dnn embeddings for speaker recognition." 2018
IEEE international conference on acoustics, speech and signal processing (ICASSP).
IEEE, 2018.
〔6〕Krishna, K., and M. Narasimha Murty. "Genetic K-means algorithm." IEEE Transactions
on Systems, Man, and Cybernetics, Part B (Cybernetics) 29.3 (1999): 433-439.
〔7〕Von Luxburg, Ulrike. "A tutorial on spectral clustering." Statistics and computing 17
(2007): 395-416.
〔8〕Dawalatabad, Nauman, et al. "Ecapa-tdnn embeddings for speaker diarization." arXiv
preprint arXiv:2104.01466 (2021).
〔9〕McInnes, Leland, John Healy, and James Melville. "Umap: Uniform manifold
approximation and projection for dimension reduction." arXiv preprint
arXiv:1802.03426 (2018).
〔10〕Van der Maaten, Laurens, and Geoffrey Hinton. "Visualizing data using t-SNE." Journal of machine learning research 9.11 (2008).
〔11〕Traag, Vincent A., Ludo Waltman, and Nees Jan Van Eck. "From Louvain to Leiden:
guaranteeing well-connected communities." Scientific reports 9.1 (2019): 5233.
〔12〕McInnes, Leland, John Healy, and Steve Astels. "hdbscan: Hierarchical density based
clustering." J. Open Source Softw. 2.11 (2017): 205.
〔13〕Wen, Wei, et al. "Learning structured sparsity in deep neural networks." Advances in
neural information processing systems 29 (2016).
〔14〕Waibel, Alexander, et al. "Phoneme recognition using time-delay neural
networks." IEEE transactions on acoustics, speech, and signal processing 37.3 (1989):
328-339.
〔15〕Povey, Daniel, et al. "Semi-orthogonal low-rank matrix factorization for deep neural
networks." Interspeech. 2018.
〔16〕Fujita, Yusuke, et al. "End-to-end neural speaker diarization with self-attention." 2019
IEEE Automatic Speech Recognition and Understanding Workshop (ASRU). IEEE, 2019.
〔17〕Griffin, Daniel, and Jae Lim. "Signal estimation from modified short-time Fourier
transform." IEEE Transactions on acoustics, speech, and signal processing 32.2 (1984):
236-243.
〔18〕Sahidullah, Md, and Goutam Saha. "Design, analysis and experimental evaluation of
block based transformation in MFCC computation for speaker recognition." Speech
communication 54.4 (2012): 543-565.
〔19〕Mikolov, Tomáš, et al. "Extensions of recurrent neural network language model." 2011
IEEE international conference on acoustics, speech and signal processing (ICASSP).
IEEE, 2011.
〔20〕Garofolo, John S., et al. "DARPA TIMIT acoustic-phonetic continuous speech corpus
CD-ROM. NIST speech disc 1-1.1." NASA STI/Recon technical report n 93 (1993):
27403.
〔21〕Panayotov, Vassil, et al. "Librispeech: an asr corpus based on public domain audiobooks." 2015 IEEE international conference on acoustics, speech and signal processing
(ICASSP). IEEE, 2015.
〔22〕Nagrani, Arsha, Joon Son Chung, and Andrew Zisserman. "Voxceleb: a large-scale
speaker identification dataset." arXiv preprint arXiv:1706.08612 (2017).
〔23〕Gardner, Matt W., and S. R. Dorling. "Artificial neural networks (the multilayer
perceptron)—a review of applications in the atmospheric sciences." Atmospheric
environment 32.14-15 (1998): 2627-2636.
〔24〕Nair, Vinod, and Geoffrey E. Hinton. "Rectified linear units improve restricted
boltzmann machines." Proceedings of the 27th international conference on machine
learning (ICML-10). 2010.
〔25〕Wu, Fei, et al. "Advances in Automatic Speech Recognition for Child Speech Using
Factored Time Delay Neural Network." Interspeech. 2019.
〔26〕Ioffe, Sergey, and Christian Szegedy. "Batch normalization: Accelerating deep network
training by reducing internal covariate shift." International conference on machine
learning. PMLR, 2015.
〔27〕Merris, Russell. "Laplacian matrices of graphs: a survey." Linear algebra and its
applications 197 (1994): 143-176.
〔28〕Danielsson, Per-Erik. "Euclidean distance mapping." Computer Graphics and image
processing 14.3 (1980): 227-248.
〔29〕Van Dongen, Stijn, and Anton J. Enright. "Metric distances derived from cosine
similarity and Pearson and Spearman correlations." arXiv preprint
arXiv:1208.3145 (2012).
〔30〕Goldberger, Jacob, Shiri Gordon, and Hayit Greenspan. "An Efficient Image Similarity
Measure Based on Approximations of KL-Divergence Between Two Gaussian
Mixtures." ICCV. Vol. 3. 2003.
〔31〕Schubert, Erich, et al. "DBSCAN revisited, revisited: why and how you should (still) use DBSCAN." ACM Transactions on Database Systems (TODS) 42.3 (2017): 1-21.
〔32〕Cheriton, David, and Robert Endre Tarjan. "Finding minimum spanning trees." SIAM
journal on computing 5.4 (1976): 724-742.
〔33〕Osipov, Vitaly, Peter Sanders, and Johannes Singler. "The filter-kruskal minimum
spanning tree algorithm." 2009 Proceedings of the Eleventh Workshop on Algorithm
Engineering and Experiments (ALENEX). Society for Industrial and Applied
Mathematics, 2009.
〔34〕Greenberg, Harvey J. "Greedy algorithms for minimum spanning tree." University of
Colorado at Denver (1998).
〔35〕Abdi, Hervé, and Lynne J. Williams. "Principal component analysis." Wiley
interdisciplinary reviews: computational statistics 2.4 (2010): 433-459.
〔36〕Balakrishnama, Suresh, and Aravind Ganapathiraju. "Linear discriminant analysis-a
brief tutorial." Institute for Signal and information Processing 18.1998 (1998): 1-8.
〔37〕Cieslak, Matthew C., et al. "t-Distributed Stochastic Neighbor Embedding (t-SNE): A
tool for eco-physiological transcriptomic analysis." Marine genomics 51 (2020): 100723.
〔38〕Wang, Tianlei, et al. "Hierarchical one-class classifier with within-class scatter-based
autoencoders." IEEE Transactions on Neural Networks and Learning Systems 32.8
(2020): 3770-3776.
〔39〕Sharma, Alok, and Kuldip K. Paliwal. "A new perspective to null linear discriminant
analysis method and its fast implementation using random matrix multiplication with
scatter matrices." Pattern Recognition 45.6 (2012): 2205-2213.
〔40〕Van Erven, Tim, and Peter Harremoës. "Rényi divergence and Kullback-Leibler
divergence." IEEE Transactions on Information Theory 60.7 (2014): 3797-3820.
〔41〕Do Carmo, Manfredo Perdigao, and J. Flaherty Francis. Riemannian geometry. Vol. 6.
Boston: Birkhäuser, 1992.
〔42〕Spanier, Edwin H. Algebraic topology. Springer Science & Business Media, 1989.
〔43〕Nene, Sameer A., Shree K. Nayar, and Hiroshi Murase. "Columbia object image library
(coil-20)." (1996): 7.
〔44〕Deng, Li. "The mnist database of handwritten digit images for machine learning
research [best of the web]." IEEE signal processing magazine 29.6 (2012): 141-142.
〔45〕Xiao, Han, Kashif Rasul, and Roland Vollgraf. "Fashion-mnist: a novel image dataset
for benchmarking machine learning algorithms." arXiv preprint
arXiv:1708.07747 (2017).
〔46〕Das, Abhinandan S., et al. "Google news personalization: scalable online collaborative
filtering." Proceedings of the 16th international conference on World Wide Web. 2007.
〔47〕Gao, Shang-Hua, et al. "Res2net: A new multi-scale backbone architecture." IEEE
transactions on pattern analysis and machine intelligence 43.2 (2019): 652-662.
〔48〕Santos, Cicero dos, et al. "Attentive pooling networks." arXiv preprint
arXiv:1602.03609 (2016).
〔49〕Szegedy, Christian, et al. "Inception-v4, inception-resnet and the impact of residual
connections on learning." Proceedings of the AAAI conference on artificial intelligence.
Vol. 31. No. 1. 2017.
〔50〕Hu, Jie, Li Shen, and Gang Sun. "Squeeze-and-excitation networks." Proceedings of the
IEEE conference on computer vision and pattern recognition. 2018.
〔51〕Nagrani, Arsha, Joon Son Chung, and Andrew Zisserman. "Voxceleb: a large-scale
speaker identification dataset." arXiv preprint arXiv:1706.08612 (2017).
〔52〕Carletta, Jean, et al. "The AMI meeting corpus: A pre-announcement." International
workshop on machine learning for multimodal interaction. Berlin, Heidelberg: Springer
Berlin Heidelberg, 2005.
〔53〕Paszke, Adam, et al. "Pytorch: An imperative style, high-performance deep learning
library." Advances in neural information processing systems 32 (2019).
〔54〕Kingma, Diederik P., and Jimmy Ba. "Adam: A method for stochastic optimization." arXiv preprint arXiv:1412.6980 (2014).
〔55〕Smith, Leslie N. "Cyclical learning rates for training neural networks." 2017 IEEE
winter conference on applications of computer vision (WACV). IEEE, 2017.
〔56〕Smith, Leslie N. "Cyclical learning rates for training neural networks." 2017 IEEE
winter conference on applications of computer vision (WACV). IEEE, 2017.
〔57〕Deng, Jiankang, et al. "Arcface: Additive angular margin loss for deep face
recognition." Proceedings of the IEEE/CVF conference on computer vision and pattern
recognition. 2019.
〔58〕"The 2009 (RT-09) rich transcription meeting recognition evaluation plan,"
http://www.itl.nist.gov/iad/mig/tests/rt/2009/docs/rt09-meeting-eval-plan-v2.pdf, 2009.
〔59〕Wang, Weiqing, Xiaoyi Qin, and Ming Li. "Cross-channel attention-based target
speaker voice activity detection: Experimental results for the m2met challenge." ICASSP
2022-2022 IEEE International Conference on Acoustics, Speech and Signal Processing
(ICASSP). IEEE, 2022.
〔60〕Singh, Prachi, Amrit Kaul, and Sriram Ganapathy. "Supervised Hierarchical Clustering
Using Graph Neural Networks for Speaker Diarization." ICASSP 2023-2023 IEEE
International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE,
2023.
〔61〕Snyder, David, et al. "Speaker recognition for multi-speaker conversations using x-vectors." ICASSP 2019-2019 IEEE International conference on acoustics, speech and
signal processing (ICASSP). IEEE, 2019.
〔62〕Scarselli, Franco, et al. "The graph neural network model." IEEE transactions on neural
networks 20.1 (2008): 61-80.
〔63〕Sztahó, Dávid, György Szaszák, and András Beke. "Deep learning methods in speaker
recognition: a review." arXiv preprint arXiv:1911.06615 (2019).
〔64〕Li, Hongze, et al. "Ultra-short-term load demand forecast model framework based on
〔65〕Desplanques, Brecht, Jenthe Thienpondt, and Kris Demuynck. "Ecapa-tdnn:
Emphasized channel attention, propagation and aggregation in tdnn based speaker
verification." arXiv preprint arXiv:2005.07143 (2020).
〔66〕From Louvain to Leiden: guaranteeing well-connected communities - Scientific Figure
on ResearchGate. [accessed 26 Jul, 2023]
〔67〕Manifold learning – scikit-learn 1.3.0 documentation
〔68〕How HDBSCAN works – hdbscan 0.8.1 documentation