References
[1] Y.-J. Chen, "Suitable Data Input for Deep-Learning-Based Sign Language Recognition with a Small Training Dataset," National Central University, CSIE, 2022. [Online]. Available: https://hdl.handle.net/11296/4ybeup
[2] D. Li, C. Rodriguez, X. Yu, and H. Li, "Word-level deep sign language recognition from video: A new large-scale dataset and methods comparison," in Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, 2020, pp. 1459-1469.
[3] K. Simonyan and A. Zisserman, "Two-stream convolutional networks for action recognition in videos," Advances in Neural Information Processing Systems, vol. 27, 2014.
[4] K. Soomro, A. R. Zamir, and M. Shah, "UCF101: A dataset of 101 human actions classes from videos in the wild," arXiv preprint arXiv:1212.0402, 2012.
[5] H. Kuehne, H. Jhuang, E. Garrote, T. Poggio, and T. Serre, "HMDB: A large video database for human motion recognition," in 2011 International Conference on Computer Vision, 2011, pp. 2556-2563.
[6] H. Luqman, "An Efficient Two-Stream Network for Isolated Sign Language Recognition Using Accumulative Video Motion," IEEE Access, vol. 10, pp. 93785-93798, 2022.
[7] A. A. I. Sidig, H. Luqman, S. Mahmoud, and M. Mohandes, "KArSL: Arabic Sign Language Database," ACM Trans. Asian Low-Resour. Lang. Inf. Process., vol. 20, no. 1, Art. no. 14, 2021, doi: 10.1145/3423420.
[8] J. Donahue et al., "Long-term recurrent convolutional networks for visual recognition and description," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2015, pp. 2625-2634.
[9] L. Hu, L. Gao, and W. Feng, "Self-Emphasizing Network for Continuous Sign Language Recognition," arXiv preprint arXiv:2211.17081, 2022.
[10] J. Wang et al., "Deep high-resolution representation learning for visual recognition," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 43, no. 10, pp. 3349-3364, 2020.
[11] N. C. Camgoz, S. Hadfield, O. Koller, H. Ney, and R. Bowden, "Neural sign language translation," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2018, pp. 7784-7793.
[12] H. Zhou, W. Zhou, W. Qi, J. Pu, and H. Li, "Improving sign language translation with monolingual data by sign back-translation," in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2021, pp. 1316-1325.
[13] K. Hara, H. Kataoka, and Y. Satoh, "Can spatiotemporal 3D CNNs retrace the history of 2D CNNs and ImageNet?," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2018, pp. 6546-6555.
[14] L. Smaira, J. Carreira, E. Noland, E. Clancy, A. Wu, and A. Zisserman, "A short note on the Kinetics-700-2020 human action dataset," arXiv preprint arXiv:2010.10864, 2020.
[15] J. Deng, W. Dong, R. Socher, L.-J. Li, K. Li, and L. Fei-Fei, "ImageNet: A large-scale hierarchical image database," in 2009 IEEE Conference on Computer Vision and Pattern Recognition, 2009, pp. 248-255.
[16] W. Du, Y. Wang, and Y. Qiao, "RPAN: An end-to-end recurrent pose-attention network for action recognition in videos," in Proceedings of the IEEE International Conference on Computer Vision, 2017, pp. 3725-3734.
[17] M. Boháček and M. Hrúz, "Sign pose-based transformer for word-level sign language recognition," in Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, 2022, pp. 182-191.
[18] Z. Zhou, V. W. Tam, and E. Y. Lam, "SignBERT: A BERT-based deep learning framework for continuous sign language recognition," IEEE Access, vol. 9, pp. 161669-161682, 2021.
[19] Z. Liu et al., "Video Swin Transformer," in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022, pp. 3202-3211.