Abstract (English)
Historical maps are essential resources for understanding the geography, culture, and socio-political landscapes of the past. However, the manual interpretation of these maps presents significant challenges for researchers due to its labor-intensive and time-consuming nature. This difficulty is compounded by the lack of annotated datasets of geographical names for Chinese historical maps, making it even more challenging to extract and analyze the information contained within these historical maps.
Current methods often fall short in effectively handling the complexities of historical maps. Our experiments show that manual annotation can take 1–2 days per Chinese historical map, which not only hampers research productivity but also leads to errors stemming from fatigue and the irregular spacing of geographical names. Additionally, existing Optical Character Recognition (OCR) systems are typically optimized for contemporary texts and struggle with the unique characteristics of historical maps, such as handwritten annotations and grayscale imagery.
To address these shortcomings, this study introduces a five-stage automated process for extracting and recognizing geographical names from Chinese historical maps. The method encompasses character detection, character recognition, character reintegration, and character grouping into geographical names, leveraging OCR to enhance both accuracy and efficiency. To expand the training dataset for character detection and recognition, data augmentation techniques, specifically HSV (Hue, Saturation, Value) transformations, are employed. These augmentations improve the model's ability to handle the distinctive features of historical maps.
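The HSV augmentation mentioned above can be illustrated with a minimal sketch. This is not the study's implementation (which would typically operate on whole map images with a library such as OpenCV); it is a standard-library-only illustration of the idea, with the jitter parameters (`d_hue`, `sat_scale`, `val_scale`) chosen as hypothetical defaults:

```python
import colorsys

def hsv_augment(pixels, d_hue=0.05, sat_scale=1.2, val_scale=0.9):
    """Jitter hue, saturation, and value of RGB pixels (components in [0, 1]).

    Shifting colors this way synthesizes extra training variants of a map
    image, helping a detector cope with faded ink and varied scan tones.
    """
    out = []
    for r, g, b in pixels:
        h, s, v = colorsys.rgb_to_hsv(r, g, b)
        h = (h + d_hue) % 1.0                    # rotate hue on the color wheel
        s = min(1.0, max(0.0, s * sat_scale))    # scale saturation, clamp to [0, 1]
        v = min(1.0, max(0.0, v * val_scale))    # scale brightness, clamp to [0, 1]
        out.append(colorsys.hsv_to_rgb(h, s, v))
    return out
```

Applying several such parameter combinations to each training map multiplies the effective dataset size without new annotation effort.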
One major challenge tackled in this research is the irregular spacing of geographical names, which complicates automatic grouping. To resolve this, Delaunay triangulation is utilized to group spatially related characters into complete geographical names. We used topographic maps of Hebei, Liaoning, and Shanxi provinces from the 1930s as training datasets. In the overall system evaluation, using topographic maps of Hebei Province from the 1930s as a test dataset, our system achieved 70% accuracy in extracting correct geographical names while also detecting additional scattered character boxes.
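The Delaunay-based grouping can be sketched as follows. This is an assumed formulation, not the thesis's exact algorithm: it triangulates the centroids of detected character boxes, prunes triangulation edges longer than a threshold `max_edge` (a hypothetical parameter), and treats the remaining connected components as candidate geographical names:

```python
import numpy as np
from scipy.spatial import Delaunay

def group_characters(centroids, max_edge):
    """Group character-box centroids into candidate place names.

    Builds a Delaunay triangulation of the centroids, keeps only edges
    no longer than `max_edge`, and returns the connected components as
    lists of point indices.
    """
    pts = np.asarray(centroids, dtype=float)
    tri = Delaunay(pts)

    # Union-find over the surviving triangulation edges.
    parent = list(range(len(pts)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path compression
            i = parent[i]
        return i

    for s in tri.simplices:
        for a, b in ((s[0], s[1]), (s[1], s[2]), (s[0], s[2])):
            if np.linalg.norm(pts[a] - pts[b]) <= max_edge:
                parent[find(a)] = find(b)

    groups = {}
    for i in range(len(pts)):
        groups.setdefault(find(i), []).append(i)
    return list(groups.values())
```

Because Delaunay edges connect each point to its natural neighbors regardless of direction, this tolerates the irregular, non-linear spacing of characters on historical maps better than fixed-distance line scanning would.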
In contrast to manual annotation, our proposed Chinese Historical MapOCR system completes the extraction process in just 7–10 minutes, significantly reducing both time and labor costs. This substantial improvement in efficiency provides an invaluable tool for historians and scholars working with large collections of historical maps.