References
[1] J. Redmon, S. Divvala, R. Girshick, and A. Farhadi, "You only look once: Unified, real-time object detection," in Proceedings of the IEEE conference on computer vision and pattern recognition, 2016, pp. 779-788.
[2] W. Liu et al., "SSD: Single shot multibox detector," in Computer Vision–ECCV 2016: 14th European Conference, Amsterdam, The Netherlands, October 11–14, 2016, Proceedings, Part I, 2016, pp. 21-37: Springer.
[3] R. Girshick, "Fast R-CNN," in Proceedings of the IEEE international conference on computer vision, 2015, pp. 1440-1448.
[4] X. Zhou et al., "EAST: An efficient and accurate scene text detector," in Proceedings of the IEEE conference on computer vision and pattern recognition, 2017, pp. 5551-5560.
[5] Z. Tian, W. Huang, T. He, P. He, and Y. Qiao, "Detecting text in natural image with connectionist text proposal network," in Computer Vision–ECCV 2016: 14th European Conference, Amsterdam, The Netherlands, October 11-14, 2016, Proceedings, Part VIII, 2016, pp. 56-72: Springer.
[6] W. Wang et al., "PAN++: Towards efficient and accurate end-to-end spotting of arbitrarily-shaped text," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 44, no. 9, pp. 5349-5367, 2021.
[7] K. Simonyan and A. Zisserman, "Very deep convolutional networks for large-scale image recognition," arXiv preprint arXiv:1409.1556, 2014.
[8] A. Graves and J. Schmidhuber, "Framewise phoneme classification with bidirectional LSTM networks," in Proceedings of the 2005 IEEE International Joint Conference on Neural Networks, 2005, vol. 4, pp. 2047-2052: IEEE.
[9] K. He, X. Zhang, S. Ren, and J. Sun, "Deep residual learning for image recognition," in Proceedings of the IEEE conference on computer vision and pattern recognition, 2016, pp. 770-778.
[10] M. He et al., "MOST: A multi-oriented scene text detector with localization refinement," in Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, 2021, pp. 8813-8822.
[11] M. Ye, J. Zhang, S. Zhao, J. Liu, B. Du, and D. Tao, "DPText-DETR: Towards better scene text detection with dynamic points in transformer," in Proceedings of the AAAI Conference on Artificial Intelligence, 2023, vol. 37, no. 3, pp. 3241-3249.
[12] Q. Bu, S. Park, M. Khang, and Y. Cheng, "SRFormer: Text Detection Transformer with Incorporated Segmentation and Regression," in Proceedings of the AAAI Conference on Artificial Intelligence, 2024, vol. 38, no. 2, pp. 855-863.
[13] N. Carion, F. Massa, G. Synnaeve, N. Usunier, A. Kirillov, and S. Zagoruyko, "End-to-end object detection with transformers," in European conference on computer vision, 2020, pp. 213-229: Springer.
[14] J. Long, E. Shelhamer, and T. Darrell, "Fully convolutional networks for semantic segmentation," in Proceedings of the IEEE conference on computer vision and pattern recognition, 2015, pp. 3431-3440.
[15] O. Ronneberger, P. Fischer, and T. Brox, "U-Net: Convolutional networks for biomedical image segmentation," in Medical image computing and computer-assisted intervention–MICCAI 2015: 18th international conference, Munich, Germany, October 5-9, 2015, proceedings, part III, 2015, pp. 234-241: Springer.
[16] L.-C. Chen, G. Papandreou, I. Kokkinos, K. Murphy, and A. L. Yuille, "DeepLab: Semantic image segmentation with deep convolutional nets, atrous convolution, and fully connected CRFs," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 40, no. 4, pp. 834-848, 2017.
[17] H. Zhao, J. Shi, X. Qi, X. Wang, and J. Jia, "Pyramid scene parsing network," in Proceedings of the IEEE conference on computer vision and pattern recognition, 2017, pp. 2881-2890.
[18] K. Sun, B. Xiao, D. Liu, and J. Wang, "Deep high-resolution representation learning for human pose estimation," in Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, 2019, pp. 5693-5703.
[19] Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner, "Gradient-based learning applied to document recognition," Proceedings of the IEEE, vol. 86, no. 11, pp. 2278-2324, 1998.
[20] K. He, G. Gkioxari, P. Dollár, and R. Girshick, "Mask R-CNN," in Proceedings of the IEEE international conference on computer vision, 2017, pp. 2961-2969.
[21] M. Liao, Z. Wan, C. Yao, K. Chen, and X. Bai, "Real-time scene text detection with differentiable binarization," in Proceedings of the AAAI conference on artificial intelligence, 2020, vol. 34, no. 07, pp. 11474-11481.
[22] X. Qin et al., "Towards robust real-time scene text detection: From semantic to instance representation learning," in Proceedings of the 31st ACM International Conference on Multimedia, 2023, pp. 2025-2034.
[23] M. Huang et al., "SwinTextSpotter: Scene text spotting via better synergy between text detection and text recognition," in Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, 2022, pp. 4593-4603.
[24] Y. Baek, B. Lee, D. Han, S. Yun, and H. Lee, "Character region awareness for text detection," in Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, 2019, pp. 9365-9374.
[25] L. Xing, Z. Tian, W. Huang, and M. R. Scott, "Convolutional character networks," in Proceedings of the IEEE/CVF international conference on computer vision, 2019, pp. 9126-9136.
[26] A. Newell, K. Yang, and J. Deng, "Stacked hourglass networks for human pose estimation," in Computer Vision–ECCV 2016: 14th European Conference, Amsterdam, The Netherlands, October 11-14, 2016, Proceedings, Part VIII, 2016, pp. 483-499: Springer.
[27] W. Liu, C. Chen, and K.-Y. Wong, "Char-Net: A character-aware neural network for distorted scene text recognition," in Proceedings of the AAAI conference on artificial intelligence, 2018, vol. 32, no. 1.
[28] Y. Xu, Y. Wang, W. Zhou, Y. Wang, Z. Yang, and X. Bai, "TextField: Learning a deep direction field for irregular scene text detection," IEEE Transactions on Image Processing, vol. 28, no. 11, pp. 5566-5579, 2019.
[29] C. Xue et al., "Image-to-character-to-word transformers for accurate scene text recognition," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 45, no. 11, pp. 12908-12921, 2023.
[30] M. Jaderberg, K. Simonyan, A. Zisserman, and K. Kavukcuoglu, "Spatial transformer networks," Advances in Neural Information Processing Systems, vol. 28, 2015.
[31] A. Graves, S. Fernández, F. Gomez, and J. Schmidhuber, "Connectionist temporal classification: labelling unsegmented sequence data with recurrent neural networks," in Proceedings of the 23rd international conference on Machine learning, 2006, pp. 369-376.
[32] D. Karatzas et al., "ICDAR 2013 robust reading competition," in 2013 12th international conference on document analysis and recognition, 2013, pp. 1484-1493: IEEE.
[33] D. Karatzas et al., "ICDAR 2015 competition on robust reading," in 2015 13th international conference on document analysis and recognition (ICDAR), 2015, pp. 1156-1160: IEEE.
[34] N. Nayef et al., "ICDAR2017 robust reading challenge on multi-lingual scene text detection and script identification - RRC-MLT," in 2017 14th IAPR international conference on document analysis and recognition (ICDAR), 2017, vol. 1, pp. 1454-1459: IEEE.
[35] A. Gupta, A. Vedaldi, and A. Zisserman, "Synthetic data for text localisation in natural images," in Proceedings of the IEEE conference on computer vision and pattern recognition, 2016, pp. 2315-2324.
[36] C. K. Chng et al., "ICDAR2019 robust reading challenge on arbitrary-shaped text - RRC-ArT," in 2019 International Conference on Document Analysis and Recognition (ICDAR), 2019, pp. 1571-1576: IEEE.
[37] C. Barnes, E. Shechtman, A. Finkelstein, and D. B. Goldman, "PatchMatch: A randomized correspondence algorithm for structural image editing," ACM Transactions on Graphics, vol. 28, no. 3, p. 24, 2009.
[38] L.-Z. Chen and P.-C. Su, "A Pixel-Based Character Detection Scheme for Texts with Arbitrary Orientations in Natural Scenes," in 2023 IEEE 12th Global Conference on Consumer Electronics (GCCE), 2023, pp. 961-962: IEEE.
[39] S. Xie, R. Girshick, P. Dollár, Z. Tu, and K. He, "Aggregated residual transformations for deep neural networks," in Proceedings of the IEEE conference on computer vision and pattern recognition, 2017, pp. 1492-1500.
[40] L.-C. Chen et al., "Searching for efficient multi-scale architectures for dense image prediction," Advances in Neural Information Processing Systems, vol. 31, 2018.
[41] C. K. Ch'ng and C. S. Chan, "Total-Text: A comprehensive dataset for scene text detection and recognition," in 2017 14th IAPR international conference on document analysis and recognition (ICDAR), 2017, vol. 1, pp. 935-942: IEEE.
[42] J. Ye, Z. Chen, J. Liu, and B. Du, "TextFuseNet: Scene Text Detection with Richer Fused Features," in IJCAI, 2020, vol. 20, pp. 516-522.
[43] D. Bautista and R. Atienza, "Scene text recognition with permuted autoregressive sequence models," in European conference on computer vision, 2022, pp. 178-196: Springer.