References
[1] E. Borovikov, “A survey of modern optical character recognition techniques,” arXiv:1412.4183, 2014. [Online]. Available: https://arxiv.org/abs/1412.4183
[2] A. Bissacco, M. Cummins, Y. Netzer, and H. Neven, “PhotoOCR: Reading Text in Uncontrolled Conditions,” in Proc. IEEE International Conference on Computer Vision (ICCV), 2013, pp. 785–792.
[3] X. Chen, L. Jin, Y. Zhu, C. Luo, and T. Wang, “Text recognition in the wild: A survey,” ACM Computing Surveys (CSUR), vol. 54, no. 2, pp. 1–35, 2021.
[4] S. Long, X. He, and C. Yao, “Scene text detection and recognition: The deep learning era,” International Journal of Computer Vision, vol. 129, no. 1, pp. 161–184, 2021.
[5] Z. Tian, W. Huang, T. He, P. He, and Y. Qiao, “Detecting text in natural image with connectionist text proposal network,” in Proc. European Conference on Computer Vision (ECCV), 2016, pp. 56–72.
[6] X. Zhou, C. Yao, H. Wen, Y. Wang, S. Zhou, W. He, and J. Liang, “EAST: An efficient and accurate scene text detector,” in Proc. IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2017, pp. 2642–2651.
[7] Y. Baek, B. Lee, D. Han, S. Yun, and H. Lee, “Character region awareness for text detection,” in Proc. IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2019, pp. 9365–9374.
[8] B. Shi, X. Bai, and S. Belongie, “Detecting oriented text in natural images by linking segments,” in Proc. IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2017, pp. 3482–3490.
[9] C. Zhang, B. Liang, Z. Huang, M. En, J. Han, E. Ding, and X. Ding, “Look more than once: An accurate detector for text of arbitrary shapes,” in Proc. IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2019, pp. 10544–10553.
[10] S. Long, J. Ruan, W. Zhang, X. He, W. Wu, and C. Yao, “TextSnake: A flexible representation for detecting text of arbitrary shapes,” in Proc. European Conference on Computer Vision (ECCV), 2018, pp. 20–36.
[11] E. Xie, Y. Zang, S. Shao, G. Yu, C. Yao, and G. Li, “Scene text detection with supervised pyramid context network,” Proceedings of the AAAI Conference on Artificial Intelligence, vol. 33, no. 1, pp. 9038–9045, 2019.
[12] W. Wang, E. Xie, X. Li, W. Hou, T. Lu, G. Yu, and S. Shao, “Shape robust text detection with progressive scale expansion network,” in Proc. IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2019, pp. 9328–9337.
[13] W. Wang, E. Xie, X. Song, Y. Zang, W. Wang, T. Lu, G. Yu, and C. Shen, “Efficient and Accurate Arbitrary-Shaped Text Detection with Pixel Aggregation Network,” in Proc. IEEE/CVF International Conference on Computer Vision (ICCV), 2019, pp. 8439–8448.
[14] M. Liao, Z. Wan, C. Yao, K. Chen, and X. Bai, “Real-time scene text detection with differentiable binarization,” Proceedings of the AAAI Conference on Artificial Intelligence, vol. 34, no. 7, pp. 11474–11481, 2020.
[15] B. Shi, X. Bai, and C. Yao, “An end-to-end trainable neural network for image-based sequence recognition and its application to scene text recognition,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 39, no. 11, pp. 2298–2304, 2017.
[16] B. Shi, X. Wang, P. Lyu, C. Yao, and X. Bai, “Robust Scene Text Recognition with Automatic Rectification,” in Proc. IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016, pp. 4168–4176.
[17] C.-Y. Lee and S. Osindero, “Recursive Recurrent Nets with Attention Modeling for OCR in the Wild,” in Proc. IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016, pp. 2231–2239.
[18] J. Wang and X. Hu, “Gated recurrent convolution neural network for OCR,” in Proc. Advances in Neural Information Processing Systems (NIPS), 2017, pp. 335–344.
[19] W. Liu, C. Chen, K.-Y. K. Wong, Z. Su, and J. Han, “STAR-Net: A SpaTial Attention Residue Network for Scene Text Recognition,” in Proc. British Machine Vision Conference (BMVC), 2016, pp. 43.1–43.13.
[20] F. Borisyuk, A. Gordo, and V. Sivakumar, “Rosetta: Large scale system for text detection and recognition in images,” in Proc. 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, 2018, pp. 71–79.
[21] J. Baek, G. Kim, J. Lee, S. Park, D. Han, S. Yun, S. J. Oh, and H. Lee, “What is wrong with scene text recognition model comparisons? Dataset and model analysis,” in Proc. IEEE/CVF International Conference on Computer Vision (ICCV), 2019, pp. 4714–4722.
[22] R. Smith, “An Overview of the Tesseract OCR Engine,” in Proc. Ninth International Conference on Document Analysis and Recognition (ICDAR), 2007, pp. 629–633.
[23] K. Simonyan and A. Zisserman, “Very deep convolutional networks for large-scale image recognition,” in Proc. International Conference on Learning Representations (ICLR), 2015, pp. 1–14.
[24] K. He, X. Zhang, S. Ren, and J. Sun, “Deep residual learning for image recognition,” in Proc. IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016, pp. 770–778.
[25] S. Hochreiter and J. Schmidhuber, “Long short-term memory,” Neural Computation, vol. 9, no. 8, pp. 1735–1780, 1997.
[26] A. Graves, S. Fernández, F. Gomez, and J. Schmidhuber, “Connectionist temporal classification: labelling unsegmented sequence data with recurrent neural networks,” in Proc. International Conference on Machine Learning (ICML), 2006, pp. 369–376.
[27] A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, Ł. Kaiser, and I. Polosukhin, “Attention is all you need,” arXiv:1706.03762, 2017. [Online]. Available: https://arxiv.org/abs/1706.03762
[28] J. Lee, S. Park, J. Baek, S. J. Oh, S. Kim, and H. Lee, “On Recognizing Texts of Arbitrary Shapes with 2D Self-Attention,” in Proc. IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), 2020, pp. 2326–2335.
[29] D. Yu, X. Li, C. Zhang, J. Han, J. Liu, and E. Ding, “Towards Accurate Scene Text Recognition With Semantic Reasoning Networks,” in Proc. IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2020, pp. 12110–12119.
[30] A. Dosovitskiy, L. Beyer, A. Kolesnikov, D. Weissenborn, X. Zhai, T. Unterthiner, M. Dehghani, M. Minderer, G. Heigold, S. Gelly, J. Uszkoreit, and N. Houlsby, “An image is worth 16x16 words: Transformers for image recognition at scale,” arXiv:2010.11929, 2020. [Online]. Available: https://arxiv.org/abs/2010.11929
[31] R. Atienza, “Vision transformer for fast and efficient scene text recognition,” in Proc. International Conference on Document Analysis and Recognition (ICDAR), 2021, pp. 319–334.
[32] Y. Du, Z. Chen, C. Jia, X. Yin, T. Zheng, C. Li, Y. Du, and Y.-G. Jiang, “SVTR: Scene text recognition with a single visual model,” in Proc. Thirty-First International Joint Conference on Artificial Intelligence (IJCAI), 2022, pp. 884–890.
[33] S. Fang, H. Xie, Y. Wang, Z. Mao, and Y. Zhang, “Read Like Humans: Autonomous, Bidirectional and Iterative Language Modeling for Scene Text Recognition,” in Proc. IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2021, pp. 7094–7103.
[34] M. Mora, O. Adelakun, S. Galvan-Cruz, and F. Wang, “Impacts of IDEF0-Based Models on the Usefulness, Learning, and Value Metrics of Scrum and XP Project Management Guides,” Engineering Management Journal, vol. 34, no. 4, pp. 574–590, 2022.
[35] M. Jaderberg, K. Simonyan, A. Zisserman, and K. Kavukcuoglu, “Spatial transformer networks,” in Proc. Advances in Neural Information Processing Systems (NIPS), 2015, pp. 2017–2025.
[36] G. Xu, Y. Meng, X. Qiu, Z. Yu, and X. Wu, “Sentiment analysis of comment texts based on BiLSTM,” IEEE Access, vol. 7, pp. 51522–51532, 2019.
[37] J. Devlin, M.-W. Chang, K. Lee, and K. Toutanova, “BERT: Pre-training of deep bidirectional transformers for language understanding,” in Proc. Conference of the North American Chapter of the Association for Computational Linguistics (NAACL), 2019, pp. 4171–4186.
[38] C.-H. Chen, M.-Y. Lin, and X.-C. Guo, “High-level modeling and synthesis of smart sensor networks for Industrial Internet of Things,” Computers & Electrical Engineering, vol. 61, pp. 48–66, 2017.
[39] R. Julius, T. Trenner, A. Fay, J. Neidig, and X. L. Hoang, “A meta-model based environment for GRAFCET specifications,” in Proc. IEEE International Systems Conference (SysCon), 2019, pp. 1–7.
[40] Y.-C. Chen, Y.-C. Chang, Y.-C. Chang, and Y.-R. Yeh, “Traditional Chinese synthetic datasets verified with labeled data for scene text recognition,” arXiv:2111.13327, 2021. [Online]. Available: https://arxiv.org/abs/2111.13327
[41] Y. Sun, Z. Ni, C.-K. Chng, Y. Liu, C. Luo, C. C. Ng, J. Han, E. Ding, J. Liu, D. Karatzas, et al., “ICDAR 2019 Competition on Large-scale Street View Text with Partial Labeling–RRC-LSVT,” in Proc. International Conference on Document Analysis and Recognition (ICDAR), 2019, pp. 1557–1562.