References
[1] B. Shi, M. Yang, X. Wang, P. Lyu, C. Yao, and X. Bai, "ASTER: An attentional scene text recognizer with flexible rectification," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 41, no. 9, pp. 2035-2048, 2018.
[2] B. Shi, X. Bai, and C. Yao, "An end-to-end trainable neural network for image-based sequence recognition and its application to scene text recognition," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 39, no. 11, pp. 2298-2304, 2016.
[3] S. Fang, H. Xie, Y. Wang, Z. Mao, and Y. Zhang, "Read like humans: autonomous, bidirectional and iterative language modeling for scene text recognition," in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2021, pp. 7098-7107.
[4] Y. Wang, H. Xie, S. Fang, J. Wang, S. Zhu, and Y. Zhang, "From two to one: A new scene text recognizer with visual language modeling network," in Proceedings of the IEEE/CVF International Conference on Computer Vision, 2021, pp. 14194-14203.
[5] Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner, "Gradient-based learning applied to document recognition," Proceedings of the IEEE, vol. 86, no. 11, pp. 2278-2324, 1998.
[6] A. Graves, A.-r. Mohamed, and G. Hinton, "Speech recognition with deep recurrent neural networks," in 2013 IEEE International Conference on Acoustics, Speech and Signal Processing, 2013, pp. 6645-6649.
[7] A. Graves, S. Fernández, F. Gomez, and J. Schmidhuber, "Connectionist temporal classification: labelling unsegmented sequence data with recurrent neural networks," in Proceedings of the 23rd International Conference on Machine Learning, 2006, pp. 369-376.
[8] Z. Qiao, Y. Zhou, D. Yang, Y. Zhou, and W. Wang, "SEED: Semantics enhanced encoder-decoder framework for scene text recognition," in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020, pp. 13528-13537.
[9] D. Yu et al., "Towards accurate scene text recognition with semantic reasoning networks," in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020, pp. 12113-12122.
[10] A. Vaswani et al., "Attention is all you need," Advances in Neural Information Processing Systems, vol. 30, 2017.
[11] M. Jaderberg, K. Simonyan, and A. Zisserman, "Spatial transformer networks," Advances in Neural Information Processing Systems, vol. 28, 2015.
[12] A. Krizhevsky, I. Sutskever, and G. E. Hinton, "ImageNet classification with deep convolutional neural networks," Advances in Neural Information Processing Systems, vol. 25, 2012.
[13] K. Simonyan and A. Zisserman, "Very deep convolutional networks for large-scale image recognition," arXiv preprint arXiv:1409.1556, 2014.
[14] C. Szegedy et al., "Going deeper with convolutions," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2015, pp. 1-9.
[15] K. He, X. Zhang, S. Ren, and J. Sun, "Deep residual learning for image recognition," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016, pp. 770-778.
[16] J. Hu, L. Shen, and G. Sun, "Squeeze-and-excitation networks," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2018, pp. 7132-7141.
[17] S. Ioffe and C. Szegedy, "Batch normalization: Accelerating deep network training by reducing internal covariate shift," in International Conference on Machine Learning, 2015, pp. 448-456.
[18] S. Hochreiter and J. Schmidhuber, "Long short-term memory," Neural Computation, vol. 9, no. 8, pp. 1735-1780, 1997.
[19] R. Pascanu, T. Mikolov, and Y. Bengio, "On the difficulty of training recurrent neural networks," in International Conference on Machine Learning, 2013, pp. 1310-1318.
[20] J. Devlin, M.-W. Chang, K. Lee, and K. Toutanova, "BERT: Pre-training of deep bidirectional transformers for language understanding," arXiv preprint arXiv:1810.04805, 2018.
[21] A. Dosovitskiy et al., "An image is worth 16x16 words: Transformers for image recognition at scale," arXiv preprint arXiv:2010.11929, 2020.
[22] T. Mikolov, K. Chen, G. Corrado, and J. Dean, "Efficient estimation of word representations in vector space," arXiv preprint arXiv:1301.3781, 2013.
[23] J. Pennington, R. Socher, and C. D. Manning, "GloVe: Global vectors for word representation," in Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), 2014, pp. 1532-1543.
[24] S. Ilić, E. Marrese-Taylor, J. A. Balazs, and Y. Matsuo, "Deep contextualized word representations for detecting sarcasm and irony," arXiv preprint arXiv:1809.09795, 2018.
[25] A. Radford, K. Narasimhan, T. Salimans, and I. Sutskever, "Improving language understanding by generative pre-training," 2018.
[26] Y. Wu et al., "Google's neural machine translation system: Bridging the gap between human and machine translation," arXiv preprint arXiv:1609.08144, 2016.
[27] P. Vincent, H. Larochelle, Y. Bengio, and P.-A. Manzagol, "Extracting and composing robust features with denoising autoencoders," in Proceedings of the 25th International Conference on Machine Learning, 2008, pp. 1096-1103.
[28] F. L. Bookstein, "Thin-plate splines and the atlas problem for biomedical images," in Biennial International Conference on Information Processing in Medical Imaging, 1991, pp. 326-342.
[29] D. Bahdanau, K. Cho, and Y. Bengio, "Neural machine translation by jointly learning to align and translate," arXiv preprint arXiv:1409.0473, 2014.
[30] M. Tan and Q. Le, "EfficientNet: Rethinking model scaling for convolutional neural networks," in International Conference on Machine Learning, 2019, pp. 6105-6114.
[31] S. Zhang, H. Huang, J. Liu, and H. Li, "Spelling error correction with soft-masked BERT," arXiv preprint arXiv:2005.07421, 2020.
[32] R. Zhang et al., "ICDAR 2019 robust reading challenge on reading Chinese text on signboard," in 2019 International Conference on Document Analysis and Recognition (ICDAR), 2019, pp. 1577-1581.
[33] M. D. Zeiler, "ADADELTA: An adaptive learning rate method," arXiv preprint arXiv:1212.5701, 2012.
[34] S.-H. Wu, C.-L. Liu, and L.-H. Lee, "Chinese spelling check evaluation at SIGHAN Bake-off 2013," in SIGHAN@IJCNLP, 2013, pp. 35-42.
[35] Y. Bai, J. Tao, J. Yi, Z. Wen, and C. Fan, "CLMAD: A Chinese language model adaptation dataset," in 2018 11th International Symposium on Chinese Spoken Language Processing (ISCSLP), 2018, pp. 275-279.
[36] I. Loshchilov and F. Hutter, "Decoupled weight decay regularization," arXiv preprint arXiv:1711.05101, 2017.
[37] Y. Sun et al., "ICDAR 2019 competition on large-scale street view text with partial labeling - RRC-LSVT," in 2019 International Conference on Document Analysis and Recognition (ICDAR), 2019, pp. 1557-1562.
[38] B. Shi et al., "ICDAR2017 competition on reading Chinese text in the wild (RCTW-17)," in 2017 14th IAPR International Conference on Document Analysis and Recognition (ICDAR), 2017, vol. 1, pp. 1429-1434.