REFERENCES
[1] K. Emmorey, “Language, Cognition, and the Brain,” Language, Cognition, and the Brain, Nov.
2001, doi: 10.4324/9781410603982/LANGUAGE-COGNITION-BRAIN-KAREN-EMMOREY.
[2] M. Mukushev, A. Sabyrov, A. Imashev, K. Koishybay, V. Kimmelman, and A. Sandygulova,
“Evaluation of Manual and Non-manual Components for Sign Language Recognition,”
Proceedings of the 12th Language Resources and Evaluation Conference, pp. 6073–6078, 2020.
[3] N. B. Ibrahim, H. H. Zayed, and M. M. Selim, “Advances, Challenges and Opportunities in
Continuous Sign Language Recognition,” Journal of Engineering and Applied Sciences, vol. 15, no.
5, pp. 1205–1227, Dec. 2019, doi: 10.36478/JEASCI.2020.1205.1227.
[4] H. Zhou, W. Zhou, Y. Zhou, and H. Li, “Spatial-Temporal Multi-Cue Network for Continuous Sign
Language Recognition,” IEEE Transactions on Multimedia, vol. 24, pp. 768–779, 2022.
[5] K. Chen et al., “MMDetection: Open MMLab Detection Toolbox and Benchmark,” Jun. 2019, doi:
10.48550/arxiv.1906.07155.
[6] S. Jiang, B. Sun, L. Wang, Y. Bai, K. Li, and Y. Fu, “Skeleton Aware Multi-modal Sign Language
Recognition,” IEEE Computer Society Conference on Computer Vision and Pattern Recognition
Workshops, pp. 3408–3418, Mar. 2021.
[7] O. Aran, “Vision based sign language recognition: modeling and recognizing isolated signs with
manual and non-manual components,” Ph.D. dissertation, Boğaziçi University, 2008.
[8] P. Wang, W. Li, S. Liu, Y. Zhang, Z. Gao, and P. Ogunbona, “Large-scale Continuous Gesture
Recognition Using Convolutional Neural Networks,” Aug. 2016, doi: 10.48550/arxiv.1608.06338.
[9] J. Wan, S. Z. Li, Y. Zhao, S. Zhou, I. Guyon, and S. Escalera, “ChaLearn Looking at People RGB-D
Isolated and Continuous Datasets for Gesture Recognition,” IEEE Computer Society Conference on
Computer Vision and Pattern Recognition Workshops, pp. 761–769, Dec. 2016.
[10] D. Li, C. Rodriguez Opazo, X. Yu, and H. Li, “Word-level Deep Sign Language Recognition from
Video: A New Large-scale Dataset and Methods Comparison,” Proceedings - 2020 IEEE Winter
Conference on Applications of Computer Vision, WACV 2020, pp. 1448–1458, Oct. 2019.
[11] O. M. Sincan and H. Y. Keles, “AUTSL: A Large Scale Multi-modal Turkish Sign Language Dataset
and Baseline Methods,” IEEE Access, vol. 8, pp. 181340–181355, Aug. 2020, doi:
10.1109/ACCESS.2020.3028072.
[12] J. Carreira and A. Zisserman, “Quo Vadis, Action Recognition? A New Model
and the Kinetics Dataset,” Proceedings - 30th IEEE Conference on Computer Vision and Pattern
Recognition, CVPR 2017, vol. 2017-January, pp. 4724–4733, May 2017, doi:
10.48550/arxiv.1705.07750.
[13] K. Simonyan and A. Zisserman, “Very Deep Convolutional Networks for Large-Scale Image
Recognition,” 3rd International Conference on Learning Representations, ICLR 2015 - Conference
Track Proceedings, Sep. 2014, doi: 10.48550/arxiv.1409.1556.
[14] K. Cho et al., “Learning Phrase Representations using RNN Encoder–Decoder for Statistical
Machine Translation,” EMNLP 2014 - 2014 Conference on Empirical Methods in Natural Language
Processing, Proceedings of the Conference, pp. 1724–1734, 2014, doi: 10.3115/V1/D14-1179.
[15] O. Koller, J. Forster, and H. Ney, “Continuous sign language recognition: Towards large
vocabulary statistical recognition systems handling multiple signers,” Computer Vision and Image
Understanding, vol. 141, pp. 108–125, Dec. 2015, doi: 10.1016/J.CVIU.2015.09.013.
[16] S. Jin et al., “Whole-Body Human Pose Estimation in the Wild,” Lecture Notes in Computer Science
(including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics),
vol. 12354 LNCS, pp. 196–214, Jul. 2020, doi: 10.48550/arxiv.2007.11858.
[17] D. Tran, L. Bourdev, R. Fergus, L. Torresani, and M. Paluri, “Learning Spatiotemporal Features
with 3D Convolutional Networks,” Proceedings of the IEEE International Conference on Computer
Vision, ICCV 2015, pp. 4489–4497, Dec. 2015.
[18] L. Song, X. Guo, and Y. Fan, “Action recognition in video using human keypoint detection,” 15th
International Conference on Computer Science and Education, ICCSE 2020, pp. 465–470, Aug.
2020, doi: 10.1109/ICCSE49874.2020.9201857.
[19] J. Cai, N. Jiang, X. Han, K. Jia, and J. Lu, “JOLO-GCN: Mining Joint-Centered Light-Weight
Information for Skeleton-Based Action Recognition,” Proceedings - 2021 IEEE Winter Conference
on Applications of Computer Vision, WACV 2021, pp. 2734–2743, Nov. 2020.
[20] R. Yamashita, M. Nishio, R. K. G. Do, and K. Togashi, “Convolutional neural networks: an
overview and application in radiology,” Insights into Imaging, vol. 9, pp. 611–629, 2018, doi: 10.1007/s13244-018-0639-9.
[21] Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner, “Gradient-based learning applied to document
recognition,” Proceedings of the IEEE, vol. 86, no. 11, pp. 2278–2323, 1998, doi:
10.1109/5.726791.
[22] J. Deng, W. Dong, R. Socher, L.-J. Li, K. Li, and L. Fei-Fei, “ImageNet: A large-scale hierarchical
image database,” 2009 IEEE Conference on Computer Vision and Pattern Recognition, pp. 248–255, Jun. 2009, doi: 10.1109/CVPR.2009.5206848.
[23] K. He, X. Zhang, S. Ren, and J. Sun, “Deep Residual Learning for Image Recognition,” Proceedings
of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, vol. 2016-
December, pp. 770–778, Dec. 2015, doi: 10.48550/arxiv.1512.03385.
[24] K. Simonyan and A. Zisserman, “Very Deep Convolutional Networks for Large-Scale Image
Recognition,” 3rd International Conference on Learning Representations, ICLR 2015 - Conference
Track Proceedings, Sep. 2014, doi: 10.48550/arxiv.1409.1556.
[25] M. S. Islam, M. S. Sultana, U. K. Roy, and J. al Mahmud, “A review on Video Classification with
Methods, Findings, Performance, Challenges, Limitations and Future Work,” Jurnal Ilmiah Teknik
Elektro Komputer dan Informatika, vol. 6, no. 2, p. 47, Jan. 2021, doi:
10.26555/JITEKI.V6I2.18978.
[26] Y. Gao, “News Video Classification Model Based on ResNet-2 and Transfer Learning,” Security and
Communication Networks, vol. 2021, 2021, doi: 10.1155/2021/5865200.
[27] Z. C. Lipton, J. Berkowitz, and C. Elkan, “A Critical Review of Recurrent Neural Networks for
Sequence Learning,” arXiv:1506.00019, May 2015.
[28] S. Siami-Namini, N. Tavakoli, and A. S. Namin, “The Performance of LSTM and BiLSTM in
Forecasting Time Series,” Proceedings - 2019 IEEE International Conference on Big Data, Big Data
2019, pp. 3285–3292, Dec. 2019, doi: 10.1109/BIGDATA47090.2019.9005997.
[29] M. Schuster and K. K. Paliwal, “Bidirectional recurrent neural networks,” IEEE Transactions on
Signal Processing, vol. 45, no. 11, pp. 2673–2681, 1997, doi: 10.1109/78.650093.
[30] A. Hannun, “Sequence Modeling with CTC,” Distill, vol. 2, no. 11, p. e8, Nov. 2017, doi:
10.23915/DISTILL.00008.
[31] A. Graves, S. Fernández, F. Gomez, and J. Schmidhuber, “Connectionist Temporal
Classification: Labelling Unsegmented Sequence Data with Recurrent Neural Networks,” ACM
International Conference Proceeding Series, vol. 148, pp. 369–376, 2006.
[32] J. Summaira, A. Muhammad Shoib, O. Bourahla, L. Songyuan, and J. Abdul, “Recent Advances and
Trends in Multimodal Deep Learning: A Review,” arXiv:2105.11087, May 2021.
[33] P. Molchanov, X. Yang, S. Gupta, K. Kim, S. Tyree, and J. Kautz, “Online Detection and
Classification of Dynamic Hand Gestures with Recurrent 3D Convolutional Neural Networks,”
Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern
Recognition, vol. 2016-December, pp. 4207–4215, Dec. 2016, doi: 10.1109/CVPR.2016.456.
[34] R. Cui, H. Liu, and C. Zhang, “A Deep Neural Framework for Continuous Sign Language
Recognition by Iterative Training,” IEEE Transactions on Multimedia, vol. 21, no. 7, pp. 1880–
1891, Jul. 2019, doi: 10.1109/TMM.2018.2889563.
[35] A. Vaswani et al., “Attention Is All You Need,” Advances in Neural Information Processing
Systems, vol. 2017-December, pp. 5999–6009, Jun. 2017.
[36] J. Pu, W. Zhou, and H. Li, “Iterative Alignment Network for Continuous Sign Language
Recognition,” Proceedings of the IEEE Computer Society Conference on Computer Vision and
Pattern Recognition, vol. 2019-June, pp. 4160–4169, Jun. 2019.
[37] J. Huang, W. Zhou, Q. Zhang, H. Li, and W. Li, “Video-based Sign Language Recognition without
Temporal Segmentation,” 32nd AAAI Conference on Artificial Intelligence, AAAI 2018, pp. 2257–
2264, Jan. 2018, doi: 10.48550/arxiv.1801.10111.
[38] H. Zhou, W. Zhou, and H. Li, “Dynamic pseudo label decoding for continuous sign language
recognition,” Proceedings - IEEE International Conference on Multimedia and Expo, vol. 2019-July,
pp. 1282–1287, Jul. 2019, doi: 10.1109/ICME.2019.00223.
[39] Y. Min, A. Hao, X. Chai, and X. Chen, “Visual Alignment Constraint for Continuous Sign Language
Recognition,” Proceedings of the IEEE International Conference on Computer Vision, pp. 11522–
11531, Apr. 2021.
[40] R. Takahashi, T. Matsubara, and K. Uehara, “Data Augmentation using Random Image Cropping
and Patching for Deep CNNs,” IEEE Transactions on Circuits and Systems for Video Technology,
vol. 30, no. 9, pp. 2917–2931, Nov. 2018.
[41] T.-Y. Lin et al., “Microsoft COCO: Common Objects in Context,” ECCV 2014, vol. 8693 LNCS, no.
PART 5, pp. 740–755, May 2014.
[42] A. Sengupta, F. Jin, R. Zhang, and S. Cao, “mm-Pose: Real-Time Human Skeletal Posture
Estimation using mmWave Radars and CNNs,” IEEE Sensors Journal, vol. 20, no. 17, pp. 10032–
10044, Nov. 2019, doi: 10.1109/JSEN.2020.2991741.
[43] K. Sun, B. Xiao, D. Liu, and J. Wang, “Deep High-Resolution Representation Learning for Human
Pose Estimation,” Proceedings of the IEEE Computer Society Conference on Computer Vision and
Pattern Recognition, vol. 2019-June, pp. 5686–5696, Feb. 2019.
[44] S. Chen, M. Zhang, X. Yang, Z. Zhao, T. Zou, and X. Sun, “The Impact of Attention Mechanisms on
Speech Emotion Recognition,” Sensors, vol. 21, no. 22, p. 7530, Nov.
2021, doi: 10.3390/S21227530.
[45] A. A. Baffour, Z. Qin, Y. Wang, Z. Qin, and K. K. R. Choo, “Spatial self-attention network with
self-attention distillation for fine-grained image recognition,” Journal of Visual Communication and
Image Representation, vol. 81, p. 103368, Nov. 2021, doi: 10.1016/J.JVCIR.2021.103368.
[46] J. Huang, W. Zhou, Q. Zhang, H. Li, and W. Li, “Video-based Sign Language Recognition without
Temporal Segmentation,” 32nd AAAI Conference on Artificial Intelligence, AAAI 2018, pp. 2257–
2264, Jan. 2018, doi: 10.48550/arxiv.1801.10111.
[47] D. Guo, S. Wang, Q. Tian, and M. Wang, “Dense temporal convolution network for sign language
translation,” IJCAI International Joint Conference on Artificial Intelligence, vol. 2019-August, pp.
744–750, 2019, doi: 10.24963/IJCAI.2019/105.
[48] S. Wang, D. Guo, W. G. Zhou, Z. J. Zha, and M. Wang, “Connectionist temporal fusion for sign
language translation,” MM 2018 - Proceedings of the 2018 ACM Multimedia Conference, pp.
1483–1491, Oct. 2018, doi: 10.1145/3240508.3240671.
[49] N. C. Camgoz, S. Hadfield, O. Koller, and R. Bowden, “SubUNets: End-to-End Hand Shape and
Continuous Sign Language Recognition,” Proceedings of the IEEE International Conference on
Computer Vision, vol. 2017-October, pp. 3075–3084, Dec. 2017, doi: 10.1109/ICCV.2017.332.
[50] D. Guo, W. Zhou, H. Li, and M. Wang, “Hierarchical LSTM for Sign Language Translation,”
Proceedings of the AAAI Conference on Artificial Intelligence, vol. 32, no. 1, pp. 6845–6852, Apr.
2018, doi: 10.1609/AAAI.V32I1.12235.
[51] Z. Yang, Z. Shi, X. Shen, and Y.-W. Tai, “SF-Net: Structured Feature Network for Continuous Sign
Language Recognition,” Aug. 2019, doi: 10.48550/arxiv.1908.01341.
[52] K. L. Cheng, Z. Yang, Q. Chen, and Y.-W. Tai, “Fully Convolutional Networks for Continuous Sign
Language Recognition,” ECCV, Jul. 2020, doi: 10.1007/978-3-030-58586-0_41.
[53] O. Koller, H. Ney, and R. Bowden, “Deep Hand: How to Train a CNN on 1 Million Hand Images
When Your Data Is Continuous and Weakly Labelled,” Proceedings of the IEEE Computer Society
Conference on Computer Vision and Pattern Recognition, vol. 2016-December, pp. 3793–3802,
2016.
[54] O. Koller, S. Zargaran, H. Ney, and R. Bowden, “Deep Sign: Enabling Robust Statistical Continuous
Sign Language Recognition via Hybrid CNN-HMMs,” International Journal of Computer Vision, vol.
126, no. 12, pp. 1311–1325, Dec. 2018, doi: 10.1007/S11263-018-1121-3/TABLES/8.
[55] F. Ben Slimane and M. Bouguessa, “Context Matters: Self-Attention for Sign Language
Recognition,” Proceedings - International Conference on Pattern Recognition, pp. 7884–7891, Jan.
2021, doi: 10.48550/arxiv.2101.04632.
[56] Z. Niu and B. Mak, “Stochastic Fine-Grained Labeling of Multi-state Sign Glosses for Continuous
Sign Language Recognition,” ECCV 2020: Computer Vision – ECCV 2020, vol. 12361 LNCS, pp. 172–
186, 2020, doi: 10.1007/978-3-030-58517-4_11/TABLES/3.
[57] O. Koller, N. C. Camgoz, H. Ney, and R. Bowden, “Weakly Supervised Learning with Multi-Stream
CNN-LSTM-HMMs to Discover Sequential Parallelism in Sign Language Videos,” IEEE Transactions
on Pattern Analysis and Machine Intelligence, vol. 42, no. 9, pp. 2306–2320, Sep. 2020, doi:
10.1109/TPAMI.2019.2911077.
[58] R. Cui, H. Liu, and C. Zhang, “A Deep Neural Framework for Continuous Sign Language
Recognition by Iterative Training,” IEEE Transactions on Multimedia, vol. 21, no. 7, pp. 1880–
1891, Jul. 2019, doi: 10.1109/TMM.2018.2889563.
[59] J. Pu, W. Zhou, H. Hu, and H. Li, “Boosting Continuous Sign Language Recognition via Cross
Modality Augmentation,” MM 2020 - Proceedings of the 28th ACM International Conference on
Multimedia, pp. 1497–1505, Oct. 2020, doi: 10.1145/3394171.3413931.
[60] R. Zuo and B. Mak, “C2SLR: Consistency-enhanced Continuous Sign Language Recognition,” CVPR
2022, pp. 5131–5140, 2022.