

    Please use this permanent URL to cite or link to this item: http://ir.lib.ncu.edu.tw/handle/987654321/96291


    Title: 中文歷史地圖字符偵測與地名文字群集技術:以1930 年代中國地形圖為例;Character-Level Detection and Grouping Techniques for Geographical Names in Chinese Historical Maps: Case Study of 1930s Chinese Topographical Maps
    Author: 陳浩瑋;Chen, Hao-Wei
    Contributor: Department of Computer Science and Information Engineering
    Keywords: Optical Character Recognition (OCR);Chinese historical maps;character grouping for geographical names;automatic system
    Date: 2024-11-15
    Upload time: 2025-04-09 17:36:36 (UTC+8)
    Publisher: National Central University
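    Abstract: Geographical names on historical maps are an important resource for understanding the geography, culture, and socio-political landscape of the past. For researchers, however, annotating these names by hand is both time-consuming and labor-intensive, which makes it a major challenge. Annotated datasets of geographical names from Chinese historical maps are also very scarce, which makes extracting information from these valuable maps even more difficult and complex.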

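    Our experiments show that manually annotating the geographical names on one complete Chinese historical map takes one to two days. This low efficiency slows research progress and can lead to annotation errors caused by fatigue from long working sessions. In addition, existing optical character recognition (OCR) systems are trained mainly on modern text and struggle with the distinctive drawing style of historical maps, such as handwritten lettering, abundant hand-drawn map symbols, and black-and-white imagery.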

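    To address these problems, we propose a five-stage automated system designed specifically to extract geographical names from Chinese historical maps. The system comprises character detection, character recognition, result merging, and character grouping for geographical names, and uses OCR techniques to improve accuracy and efficiency. Because annotated Chinese historical map data is scarce, we use HSV (hue, saturation, value) augmentation to expand the training dataset so that the model better learns the particular features of historical maps.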

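    We also address the uneven character spacing typical of Chinese geographical names. We detect and recognize text at the character level so that every character is extracted accurately, then group the characters to restore complete geographical names. For the grouping step, we apply Delaunay triangulation to find the relationships between character boxes and group them into complete geographical names. We use 1930s topographic maps of parts of Hebei, Liaoning, and Shanxi provinces as training data. In the overall system evaluation, using 1930s topographic maps of Hebei Province as the test dataset, our automated system reaches 70% accuracy in extracting correct geographical names and also detects other scattered character boxes.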

    Compared with the one to two days needed to manually annotate the geographical names on one complete Chinese historical map, our proposed OCR system for Chinese historical maps completes the extraction in only seven to ten minutes, greatly reducing time and labor costs. It gives historians and scholars who need to process large numbers of historical maps a very practical automated tool.

    Historical maps are essential resources for understanding the geography, culture, and socio-political landscapes of the past. However, the manual interpretation of these maps presents significant challenges for researchers due to its labor-intensive and time-consuming nature. This difficulty is compounded by the lack of annotated datasets of geographical names for Chinese historical maps, making it even more challenging to extract and analyze the information contained within these historical maps.

    Current methods often fall short in handling the complexities of historical maps. Our experiments show that manual annotation can take 1–2 days per Chinese historical map, which not only hampers research productivity but also leads to errors stemming from fatigue and the irregular spacing of geographical names. Additionally, existing Optical Character Recognition (OCR) systems are typically optimized for contemporary texts and struggle with the unique characteristics of historical maps, such as handwritten annotations and grayscale imagery.

    To address these shortcomings, this study introduces a five-stage automated process for extracting and recognizing geographical names from Chinese historical maps. This method encompasses character detection, character recognition, character reintegration, and character grouping for geographical names, leveraging Optical Character Recognition (OCR) to enhance both accuracy and efficiency. To expand the training dataset for character detection and recognition, data augmentation techniques, specifically HSV (Hue, Saturation, Value) transformations, are employed. These augmentations improve the model's ability to manage the distinctive features of historical maps.
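    The HSV augmentation mentioned above can be sketched as follows. This is a minimal illustration using only the Python standard library, not the thesis implementation. The function name `hsv_jitter`, the gain values, and the choice of one random shift per image are assumptions, since the abstract does not state the augmentation parameters.

```python
import colorsys
import random


def hsv_jitter(pixels, h_gain=0.02, s_gain=0.5, v_gain=0.4, rng=None):
    """Return a copy of `pixels` with one random HSV shift applied.

    `pixels` is a list of rows, each row a list of (r, g, b) ints in 0..255.
    The gains are illustrative assumptions, not values from the thesis.
    """
    rng = rng or random.Random()
    # Draw one shift per image so every pixel is changed consistently.
    dh = rng.uniform(-h_gain, h_gain)      # additive hue shift (hue is cyclic)
    ks = 1 + rng.uniform(-s_gain, s_gain)  # multiplicative saturation factor
    kv = 1 + rng.uniform(-v_gain, v_gain)  # multiplicative value factor

    out = []
    for row in pixels:
        new_row = []
        for r, g, b in row:
            h, s, v = colorsys.rgb_to_hsv(r / 255, g / 255, b / 255)
            h = (h + dh) % 1.0
            s = min(1.0, max(0.0, s * ks))  # clamp to the valid range
            v = min(1.0, max(0.0, v * kv))
            r2, g2, b2 = colorsys.hsv_to_rgb(h, s, v)
            new_row.append((round(r2 * 255), round(g2 * 255), round(b2 * 255)))
        out.append(new_row)
    return out


# Hypothetical 2x2 patch: a paper-coloured pixel, a gray stroke, black, white.
patch = [[(200, 180, 150), (128, 128, 128)], [(0, 0, 0), (255, 255, 255)]]
augmented = hsv_jitter(patch, rng=random.Random(1))
```

    For purely gray pixels the saturation is zero, so the hue and saturation shifts have no effect and only the value (brightness) factor changes the result. In a sketch like this, scanned black-and-white map sheets would therefore vary mostly in brightness.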

    One major challenge tackled in this research is the irregular spacing of geographical names, which complicates automatic grouping. To resolve this, Delaunay triangulation is used to find the relationships between neighboring character boxes and group them into complete geographical names. We used topographic maps of Hebei, Liaoning, and Shanxi provinces from the 1930s as training datasets. In the overall system evaluation, using topographic maps of Hebei Province from the 1930s as a test dataset, our system achieved 70% accuracy in extracting correct geographical names while also detecting additional scattered character boxes.
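    One way to picture the Delaunay-based grouping step is the sketch below. It is an illustration under stated assumptions, not the method from the thesis: the abstract does not say how the triangulation is turned into groups, so the edge-length threshold `max_gap`, the union-find merging, and the function names `delaunay_edges` and `group_characters` are all hypothetical. The triangulation itself is a small pure-Python Bowyer-Watson routine that assumes distinct points and uses a finite super-triangle, so edges near the outer hull are only approximate. A library routine such as `scipy.spatial.Delaunay` would be the usual choice in practice.

```python
import math
from itertools import combinations


def _in_circumcircle(tri, p, pts):
    """True if point p lies strictly inside the circumcircle of triangle tri."""
    (ax, ay), (bx, by), (cx, cy) = (pts[i] for i in tri)
    d = 2 * (ax * (by - cy) + bx * (cy - ay) + cx * (ay - by))
    if d == 0:  # degenerate (collinear) triangle has no circumcircle
        return False
    a2, b2, c2 = ax * ax + ay * ay, bx * bx + by * by, cx * cx + cy * cy
    ux = (a2 * (by - cy) + b2 * (cy - ay) + c2 * (ay - by)) / d
    uy = (a2 * (cx - bx) + b2 * (ax - cx) + c2 * (bx - ax)) / d
    return math.hypot(p[0] - ux, p[1] - uy) < math.hypot(ax - ux, ay - uy)


def delaunay_edges(points):
    """Bowyer-Watson triangulation; returns edges as index pairs (a, b), a < b."""
    n = len(points)
    if n < 2:
        return set()
    pts = [(float(x), float(y)) for x, y in points]
    xs, ys = [p[0] for p in pts], [p[1] for p in pts]
    span = 20 * max(max(xs) - min(xs), max(ys) - min(ys), 1.0)
    mx, my = (min(xs) + max(xs)) / 2, (min(ys) + max(ys)) / 2
    # Large helper triangle that encloses every input point.
    pts += [(mx - span, my - span), (mx + span, my - span), (mx, my + span)]
    triangles = [(n, n + 1, n + 2)]
    for i in range(n):
        bad = [t for t in triangles if _in_circumcircle(t, pts[i], pts)]
        # Edges used by exactly one bad triangle form the cavity boundary.
        count = {}
        for t in bad:
            for a, b in ((t[0], t[1]), (t[1], t[2]), (t[2], t[0])):
                key = (min(a, b), max(a, b))
                count[key] = count.get(key, 0) + 1
        triangles = [t for t in triangles if t not in bad]
        triangles += [(a, b, i) for (a, b), c in count.items() if c == 1]
    edges = set()
    for t in triangles:
        for a, b in combinations(sorted(t), 2):
            if b < n:  # skip edges that touch the helper triangle
                edges.add((a, b))
    return edges


def group_characters(centers, max_gap):
    """Merge characters joined by a Delaunay edge no longer than max_gap."""
    parent = list(range(len(centers)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for a, b in delaunay_edges(centers):
        if math.dist(centers[a], centers[b]) <= max_gap:
            parent[find(a)] = find(b)
    groups = {}
    for i in range(len(centers)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


# Hypothetical box centres: a horizontal name, a vertical name, one stray box.
centers = [(0, 0), (10, 0), (20, 0), (100, 50), (100, 62), (100, 74), (60, 200)]
groups = group_characters(centers, max_gap=15)
```

    In this made-up layout the characters within a name are 10 or 12 units apart, while the closest pair across names is about 94 units apart, so a threshold of 15 keeps each name together and leaves the stray box on its own. A fixed threshold is the simplest possible rule. Uneven spacing on real maps would likely call for a threshold scaled by character box size, or some other criterion.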

    Whereas manual annotation takes 1–2 days per map, our proposed OCR system for Chinese historical maps completes the extraction process in just 7–10 minutes, significantly reducing both time and labor costs. This substantial improvement in efficiency provides an invaluable tool for historians and scholars working with large collections of historical maps.
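    Appears in collections: [Graduate Institute of Computer Science and Information Engineering] Doctoral and master's theses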

    Files in this item:

    File          Description    Size    Format    Views
    index.html                   0Kb     HTML      19


    All items in NCUIR are protected by the original copyright.
