參考文獻 |
[1] X. Zhou et al., "East: an efficient and accurate scene text detector," in Proceedings of the IEEE conference on Computer Vision and Pattern Recognition, 2017, pp. 5551-5560.
[2] O. Ronneberger, P. Fischer, and T. Brox, "U-net: Convolutional networks for biomedical image segmentation," in International Conference on Medical image computing and computer-assisted intervention, 2015: Springer, pp. 234-241.
[3] M. He et al., "MOST: A multi-oriented scene text detector with localization refinement," in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2021, pp. 8813-8822.
[4] J. Dai et al., "Deformable convolutional networks," in Proceedings of the IEEE international conference on computer vision, 2017, pp. 764-773.
[5] P. Dai, S. Zhang, H. Zhang, and X. Cao, "Progressive contour regression for arbitrary-shape scene text detection," in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2021, pp. 7393-7402.
[6] Y. Wang, H. Xie, Z.-J. Zha, M. Xing, Z. Fu, and Y. Zhang, "Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection," in proceedings of the IEEE/CVF conference on computer vision and pattern recognition, 2020, pp. 11753-11762.
[7] P. Wang et al., "A single-shot arbitrarily-shaped text detector based on context attended multi-task learning," in Proceedings of the 27th ACM international conference on multimedia, 2019, pp. 1277-1285.
[8] Y. Baek, B. Lee, D. Han, S. Yun, and H. Lee, "Character region awareness for text detection," in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2019, pp. 9365-9374.
[9] S. Long, J. Ruan, W. Zhang, X. He, W. Wu, and C. Yao, "Textsnake: A flexible representation for detecting text of arbitrary shapes," in Proceedings of the European conference on computer vision (ECCV), 2018, pp. 20-36.
[10] E. Xie, Y. Zang, S. Shao, G. Yu, C. Yao, and G. Li, "Scene text detection with supervised pyramid context network," in Proceedings of the AAAI conference on artificial intelligence, 2019, vol. 33, no. 01, pp. 9038-9045.
[11] K. He, G. Gkioxari, P. Dollár, and R. Girshick, "Mask r-cnn," in Proceedings of the IEEE international conference on computer vision, 2017, pp. 2961-2969.
[12] Y. Liu, H. Chen, C. Shen, T. He, L. Jin, and L. Wang, "Abcnet: Real-time scene text spotting with adaptive bezier-curve network," in proceedings of the IEEE/CVF conference on computer vision and pattern recognition, 2020, pp. 9809-9818.
[13] Y. Zhu, J. Chen, L. Liang, Z. Kuang, L. Jin, and W. Zhang, "Fourier contour embedding for arbitrary-shaped text detection," in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2021, pp. 3123-3131.
[14] S.-X. Zhang et al., "Deep relational reasoning graph network for arbitrary shape text detection," in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020, pp. 9699-9708.
[15] A. Shrivastava, A. Gupta, and R. Girshick, "Training region-based object detectors with online hard example mining," in Proceedings of the IEEE conference on computer vision and pattern recognition, 2016, pp. 761-769.
[16] Z. Wang, L. Zheng, Y. Li, and S. Wang, "Linkage based face clustering via graph convolution network," in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2019, pp. 1117-1125.
[17] J. Ye, Z. Chen, J. Liu, and B. Du, "TextFuseNet: Scene Text Detection with Richer Fused Features," in IJCAI, 2020, pp. 516-522.
[18] W. Wang et al., "Shape robust text detection with progressive scale expansion network," in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2019, pp. 9336-9345.
[19] B. R. Vatti, "A generic solution to polygon clipping," Communications of the ACM, vol. 35, no. 7, pp. 56-63, 1992.
[20] W. Wang et al., "Efficient and accurate arbitrary-shaped text detection with pixel aggregation network," in Proceedings of the IEEE/CVF International Conference on Computer Vision, 2019, pp. 8440-8449.
[21] M. Liao, Z. Wan, C. Yao, K. Chen, and X. Bai, "Real-time scene text detection with differentiable binarization," in Proceedings of the AAAI Conference on Artificial Intelligence, 2020, vol. 34, no. 07, pp. 11474-11481.
[22] W. Wang et al., "PAN++: towards efficient and accurate End-to-End spotting of arbitrarily-shaped text," IEEE Transactions on Pattern Analysis and Machine Intelligence, 2021.
[23] K. He, X. Zhang, S. Ren, and J. Sun, "Deep residual learning for image recognition," in Proceedings of the IEEE conference on computer vision and pattern recognition, 2016, pp. 770-778.
[24] G. Huang, Z. Liu, L. Van Der Maaten, and K. Q. Weinberger, "Densely connected convolutional networks," in Proceedings of the IEEE conference on computer vision and pattern recognition, 2017, pp. 4700-4708.
[25] A. Vaswani et al., "Attention is all you need," Advances in neural information processing systems, vol. 30, 2017.
[26] Y.-M. Zhang, C.-C. Lee, J.-W. Hsieh, and K.-C. Fan, "CSL-YOLO: A New Lightweight Object Detection System for Edge Computing," arXiv preprint arXiv:2107.04829, 2021.
[27] T.-Y. Lin, P. Dollár, R. Girshick, K. He, B. Hariharan, and S. Belongie, "Feature pyramid networks for object detection," in Proceedings of the IEEE conference on computer vision and pattern recognition, 2017, pp. 2117-2125.
[28] Y.-M. Zhang, J.-W. Hsieh, C.-C. Lee, and K.-C. Fan, "SFPN: Synthetic FPN for Object Detection," arXiv preprint arXiv:2203.02445, 2022.
[29] A. G. Howard et al., "Mobilenets: Efficient convolutional neural networks for mobile vision applications," arXiv preprint arXiv:1704.04861, 2017.
[30] M. Sandler, A. Howard, M. Zhu, A. Zhmoginov, and L.-C. Chen, "Mobilenetv2: Inverted residuals and linear bottlenecks," in Proceedings of the IEEE conference on computer vision and pattern recognition, 2018, pp. 4510-4520.
[31] C.-Y. Wang, H.-Y. M. Liao, Y.-H. Wu, P.-Y. Chen, J.-W. Hsieh, and I.-H. Yeh, "CSPNet: A new backbone that can enhance learning capability of CNN," in Proceedings of the IEEE/CVF conference on computer vision and pattern recognition workshops, 2020, pp. 390-391.
[32] K. Han, Y. Wang, Q. Tian, J. Guo, C. Xu, and C. Xu, "Ghostnet: More features from cheap operations," in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020, pp. 1580-1589.
[33] A. Gupta, A. Vedaldi, and A. Zisserman, "Synthetic data for text localisation in natural images," in Proceedings of the IEEE conference on computer vision and pattern recognition, 2016, pp. 2315-2324.
[34] C. K. Ch′ng and C. S. Chan, "Total-text: A comprehensive dataset for scene text detection and recognition," in 2017 14th IAPR international conference on document analysis and recognition (ICDAR), 2017, vol. 1: IEEE, pp. 935-942.
[35] S. Woo, J. Park, J.-Y. Lee, and I. S. Kweon, "Cbam: Convolutional block attention module," in Proceedings of the European conference on computer vision (ECCV), 2018, pp. 3-19. |