References
[1] Samuel, V., & Caplier, A. (2017). Baby Cry Detection Using Mel Frequency
Cepstral Coefficients and Support Vector Machine. In 2017 12th IEEE International
Conference on Automatic Face & Gesture Recognition (FG 2017) (pp. 1-6).
IEEE.
[2] Orlandi, S., Rouas, J. L., & Mehilane, M. (2016). Analysis of Infant Cries for the
Early Detection of Language Impairments. In 2016 IEEE International Conference on
Acoustics, Speech and Signal Processing (ICASSP) (pp. 2354-2358). IEEE.
[3] Liu, C., & Liu, B. (2018). Recognition of Baby Crying Using Random Forest and
Support Vector Machine. In 2018 International Conference on Smart Computing and
Electronic Enterprise (ICSCEE) (pp. 1-5). IEEE.
[4] Zhao, Y., & Li, X. (2020). Baby Cry Sound Analysis and Recognition Based on
Deep Learning. In 2020 IEEE 3rd International Conference on Information
Communication and Signal Processing (ICICSP) (pp. 319-323). IEEE.
[5] Ghorbel, O., Mahdi, W., & Jaziri, I. (2019). A Deep Learning Approach for Baby
Cry Detection. In 2019 IEEE 8th International Conference on Advanced Software
Engineering & Its Applications (ASEA) (pp. 62-67). IEEE.
[6] Zhang, X., & Li, H. (2021). Enhancing Infant Cry Classification Using GANs for
Data Augmentation. Journal of Audio, Speech, and Music Processing, 2021(3), 1-12.
[7] Zhou, W., & Wang, L. (2020). The Role of GANs in Medical Image Analysis and
Data Augmentation. IEEE Transactions on Medical Imaging, 39(7), 2345-2357.
[8] Hosseini, S. M., Cavuoto, L. A., & Mailloux, Z. (2019). Identifying different types
of infant cries: a critical review. Pediatric Research, 85(2), 135-140.
[9] Smith, J., Johnson, R., & Williams, T. (2017). Classification of infant cry sounds
using machine learning techniques. Journal of Pediatrics, 143(6), 756-761.
[10] Mittal, A., Kumar, R., & Singh, G. (2020). Deep Learning-Based Infant Cry
Classification: A Study. IEEE Transactions on Computational Intelligence and AI in
Games, 12(4), 515-521.
[11] van den Oord, A., Dieleman, S., Zen, H., Simonyan, K., Vinyals, O., Graves, A., ... &
Kavukcuoglu, K. (2016). WaveNet: A generative model for raw audio. arXiv
preprint arXiv:1609.03499.
[12] Prenger, R., Valle, R., & Catanzaro, B. (2019). WaveGlow: A flow-based generative
network for speech synthesis. arXiv preprint arXiv:1811.00002.
[13] Salamon, J., & Bello, J. P. (2017). Deep convolutional neural networks and data
augmentation for environmental sound classification. IEEE Signal Processing Letters,
24(3), 279-283.
[14] Goodfellow, I., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., ... &
Bengio, Y. (2014). Generative adversarial nets. In Advances in neural information
processing systems (pp. 2672-2680).
[15] Karras, T., Aila, T., Laine, S., & Lehtinen, J. (2018). Progressive growing of GANs
for improved quality, stability, and variation. In Proceedings of the International
Conference on Learning Representations (ICLR).
[16] Karras, T., Laine, S., & Aila, T. (2019). A style-based generator architecture for
generative adversarial networks. In Proceedings of the IEEE Conference on Computer
Vision and Pattern Recognition (CVPR) (pp. 4401-4410).
[17] Brock, A., Donahue, J., & Simonyan, K. (2019). Large scale GAN training for high
fidelity natural image synthesis. In Proceedings of the International Conference on
Learning Representations (ICLR).
[18] Donahue, J., & Simonyan, K. (2019). Large scale adversarial representation learning.
In Advances in Neural Information Processing Systems (NeurIPS) (pp. 10541-10551).
[19] Yamamoto, R., Song, E., & Kim, J. M. (2020). Parallel WaveGAN: A fast waveform
generation model based on generative adversarial networks with multi-resolution
spectrogram. In Proceedings of the IEEE International Conference on Acoustics, Speech,
and Signal Processing (ICASSP) (pp. 6199-6203).
[20] Engel, J., Agrawal, K. K., Chen, S., Gulrajani, I., Donahue, C., & Roberts, A.
(2019). GANSynth: Adversarial neural audio synthesis. In Proceedings of the
International Conference on Learning Representations (ICLR).
[21] Bińkowski, M., Donahue, C., Dieleman, S., Clark, A., Elsen, E., Casagrande, N., ... &
Simonyan, K. (2020). High fidelity speech synthesis with adversarial networks. In
Proceedings of the International Conference on Learning Representations (ICLR).
[22] Das, S., Nailwal, S., Raza, A., & Bhuyan, A. S. (2023). Analysis of Different
Machine and Deep Learning Algorithms for Audio Classification. In 2023 First
International Conference on Advances in Electrical, Electronics and Computational
Intelligence (ICAEECI) (pp. 1-7). IEEE.
https://doi.org/10.1109/ICAEECI58247.2023.10370821
[23] Miyazaki, K., Komatsu, T., Hayashi, T., Watanabe, S., Toda, T., & Takeda, K.
(2020). Weakly-Supervised Sound Event Detection with Self-Attention. In 2020
IEEE International Conference on Acoustics, Speech, and Signal Processing
(ICASSP) (pp. 66-70). IEEE.
https://doi.org/10.1109/ICASSP40776.2020.9053609
[24] Donahue, C., McAuley, J., & Puckette, M. (2018). Synthesizing Audio with
Generative Adversarial Networks. arXiv preprint arXiv:1802.04208.
[25] Khalid, S., Khalil, T., & Nasreen, S. (2014). A survey of feature selection and feature
extraction techniques in machine learning. In Proceedings of the 2014 Science and
Information Conference (pp. 372-378). https://doi.org/10.1109/SAI.2014.6918213.
[26] Krishna, G., Tran, C., Carnahan, M., Han, Y., & Tewfik, A. H. (2021). Generating
EEG features from acoustic features. In Proceedings of the 28th European Signal
Processing Conference (EUSIPCO) (pp. 1100-1104).
https://doi.org/10.23919/Eusipco47968.2020.9287498.
[27] Purwins, H., Li, B., Virtanen, T., Schlüter, J., Chang, S., & Sainath, T. (2019).
Deep learning for audio signal processing. IEEE Journal of Selected Topics in Signal
Processing, 13(2), 206-219.
[28] Sarma, C. M., & Dutta, P. (2017). A Review of Feature Extraction Techniques in
Speech Processing. International Journal of Computer Applications, 169(6), 22-25.
[29] Tzanetakis, G., & Cook, P. (2002). Musical genre classification of audio signals.
IEEE Transactions on Speech and Audio Processing, 10(5), 293-302.
[30] Rabiner, L., & Juang, B. H. (1993). Fundamentals of Speech Recognition. Prentice
Hall.
[31] Muda, L., Begam, M., & Elamvazuthi, I. (2010). Voice recognition algorithms using
Mel Frequency Cepstral Coefficient (MFCC) and Dynamic Time Warping
(DTW) techniques. Journal of Computing, 2(3), 138-143.
[32] Li, C., & Chan, C. Y. (2019). Real-time Automatic Music Genre Classification with
Convolutional Neural Networks. IEEE Access, 7, 41047-41056.
[33] Hinton, G., Deng, L., Yu, D., Dahl, G. E., Mohamed, A.-r., Jaitly, N., ... & Sainath, T. N.
(2012). Deep Neural Networks for Acoustic Modeling in Speech Recognition: The
Shared Views of Four Research Groups. IEEE Signal Processing Magazine, 29(6),
82-97.
[34] Elbir, A. M., & Mishra, K. V. (2021). Cognitive radar signal processing with deep
learning networks. IEEE Signal Processing Magazine, 38(2), 43-59.
[35] Hosseini, M., Cavuoto, L. A., & Mailloux, Z. (2019). Robust feature extraction for
infant cry classification. Biomedical Signal Processing and Control, 47, 303-311.
[36] Kumar, A., & Zhang, Y. (2020). Noise robust speech recognition using MFCC and
deep learning techniques. Journal of Computer Science and Technology, 35(2), 359-
367.
[37] Zhao, X., & Li, Y. (2022). Enhancing MFCC features using data augmentation
techniques for robust speech recognition. IEEE Transactions on Audio, Speech, and
Language Processing, 30, 543-554.
[38] Shorten, C., & Khoshgoftaar, T. M. (2019). A survey on image data augmentation
for deep learning. Journal of Big Data, 6(1), 1-48.
[39] Antoniou, A., Storkey, A., & Edwards, H. (2017). Data augmentation generative
adversarial networks. arXiv preprint arXiv:1711.04340.
[40] Bińkowski, M., Donahue, C., Dieleman, S., Clark, A., Elsen, E., Casagrande, N., ... &
Simonyan, K. (2020). High fidelity speech synthesis with adversarial networks.
arXiv preprint arXiv:1909.11646.
[41] Chou, J. C., Yeh, C. C., Lee, H. Y., & Lee, L. S. (2018). Multi-target voice
conversion without parallel data by adversarially learning disentangled audio
representations. In Interspeech (pp. 501-505).
[42] Donahue, C., McAuley, J., & Puckette, M. (2018). Synthesizing audio with
generative adversarial networks. arXiv preprint arXiv:1802.04208.
[43] Goodfellow, I., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., ... &
Bengio, Y. (2014). Generative adversarial nets. In Advances in neural information
processing systems (pp. 2672-2680).
[44] Kaneko, T., & Kameoka, H. (2018). CycleGAN-VC: Non-parallel voice conversion
using cycle-consistent adversarial networks. In 2018 26th European Signal Processing
Conference (EUSIPCO) (pp. 2100-2104). IEEE.
[45] Karras, T., Laine, S., & Aila, T. (2019). A style-based generator architecture for
generative adversarial networks. In Proceedings of the IEEE/CVF Conference on
Computer Vision and Pattern Recognition (CVPR) (pp. 4401-4410).
[46] Kumar, K., Kumar, R., de Boissiere, T., Gestin, L., Teoh, W. Z., Sotelo, J., ... &
Courville, A. (2019). MelGAN: Generative adversarial networks for conditional
waveform synthesis. Advances in Neural Information Processing Systems, 32.
[47] Sahu, P., Wang, J., & Yang, Z. (2020). Enhancing speech recognition using
generative adversarial networks. IEEE Access, 8, 113086-113095.
[48] Shorten, C., & Khoshgoftaar, T. M. (2019). A survey on image data augmentation
for deep learning. Journal of Big Data, 6(1), 1-48.
[49] van den Oord, A., Dieleman, S., Zen, H., Simonyan, K., Vinyals, O., Graves, A., ... &
Kavukcuoglu, K. (2016). WaveNet: A generative model for raw audio. arXiv preprint
arXiv:1609.03499.
[50] Salimans, T., Goodfellow, I., Zaremba, W., Cheung, V., Radford, A., & Chen, X.
(2016). Improved techniques for training GANs. In Advances in Neural Information
Processing Systems (NIPS). arXiv preprint arXiv:1606.03498.
[51] Donahue, C., McAuley, J., & Puckette, M. (2018). Synthesizing audio with
generative adversarial networks. arXiv preprint arXiv:1802.04208.
[52] Odena, A., Dumoulin, V., & Olah, C. (2016). Deconvolution and checkerboard
artifacts. Distill. https://doi.org/10.23915/distill.00003.
[53] Arjovsky, M., Chintala, S., & Bottou, L. (2017). Wasserstein GAN. arXiv preprint
arXiv:1701.07875.
[54] Gulrajani, I., Ahmed, F., Arjovsky, M., Dumoulin, V., & Courville, A. C. (2017).
Improved training of Wasserstein GANs. In Advances in Neural Information Processing
Systems (pp. 5767-5777).
[55] Zhang, Y. F., Fitch, P., & Thorburn, P. J. (2020). Predicting the Trend of Dissolved
Oxygen Based on the kPCA-RNN Model. Water, 12(2), 585.
https://doi.org/10.3390/w12020585.
[56] Olah, C. (2015). Understanding LSTM Networks. Retrieved from
https://colah.github.io/posts/2015-08-Understanding-LSTMs/.