

    Please use this permanent URL to cite or link to this item: https://ir.lib.ncu.edu.tw/handle/987654321/98219


    Title: A Study on Centroid-Enhanced Contrastive Learning for Invoice Content Recognition
    Author: LEE, YA-YUN (李雅芸)
    Contributor: Department of Information Management
    Keywords: invoice content recognition; contrastive learning; centroid-enhanced loss
    Date: 2025-07-02
    Uploaded: 2025-10-17 12:30:30 (UTC+8)
    Publisher: National Central University
    Abstract: In this study, we introduce CCL-OCR, a self-supervised OCR model that unites contrastive
    learning with a centroid-enhanced strategy under a momentum-update regime. First, we apply
    a rich suite of data augmentations to invoice images, simulating real-world noise such as blur,
    distortion, and occlusion, to enhance robustness. At the same time, we preserve the strengths of
    momentum contrastive learning by employing a momentum encoder and a queue mechanism
    to harvest a diverse pool of negative samples, ensuring stable and consistent feature
    representations. We propose a centroid-enhanced loss that pulls features of the same class
    toward their centroid while pushing apart centroids of different classes. This dual objective
    markedly improves intra-class compactness and inter-class separability. A single
    hyperparameter λ balances the conventional InfoNCE loss against the centroid loss.
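    The combined objective can be sketched as follows. This is a minimal NumPy illustration, not the authors' implementation: the function names, the squared-distance form of the centroid terms, and the input shapes are all assumptions; only the overall structure (InfoNCE plus a λ-weighted centroid term that tightens classes and separates centroids) follows the abstract.

    ```python
    import numpy as np

    def info_nce(query, pos, negatives, tau=0.09):
        """InfoNCE loss for one L2-normalized query vector against one
        positive and a pool of negative row vectors (all L2-normalized)."""
        logits = np.concatenate(([query @ pos], negatives @ query)) / tau
        logits -= logits.max()  # numerical stability before exponentiation
        return -np.log(np.exp(logits[0]) / np.exp(logits).sum())

    def centroid_loss(features, labels):
        """Pull samples toward their class centroid; push centroids apart."""
        classes = np.unique(labels)
        centroids = np.stack([features[labels == c].mean(axis=0) for c in classes])
        # intra-class term: mean squared distance of samples to their centroid
        intra = np.mean([np.sum((features[labels == c] - centroids[i]) ** 2)
                         / max(len(features[labels == c]), 1)
                         for i, c in enumerate(classes)])
        # inter-class term: reward large pairwise distances between centroids
        dists = [np.linalg.norm(centroids[i] - centroids[j])
                 for i in range(len(classes)) for j in range(i + 1, len(classes))]
        inter = -np.mean(dists) if dists else 0.0
        return intra + inter

    def total_loss(query, pos, negatives, features, labels, lam=0.7):
        """Combined objective: InfoNCE balanced against the centroid term by λ."""
        return info_nce(query, pos, negatives) + lam * centroid_loss(features, labels)
    ```

    Tightly clustered features with well-separated centroids drive the centroid term down, which is the behavior the λ-weighted objective rewards.
    
    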
    We evaluated CCL-OCR on a dataset of 74,481 standardized character images cropped from
    eight categories of real invoices, covering various printing technologies, font styles, and
    typesetting variations. Compared against VGG, ResNet, SimCLR, and MoCo, CCL-OCR
    achieved the highest accuracy at 96.5%, outperforming ResNet (95.28%), MoCo (90.65%),
    SimCLR (85.20%), and VGG (65.22%). It also surpassed 96% in precision (96.57%), recall
    (96.13%), and F1-score (96.35%), significantly exceeding the benchmarks set by the other
    models. These gains stem from the centroid loss’s ability to compress intra-class variance and
    sharpen discrimination between visually similar characters.
    Sensitivity analysis found the optimal settings to be λ = 0.7, momentum = 0.7, and τ = 0.09.
    Extending training to 50 epochs with a learning rate of 0.1 and batch size of 256 further boosted
    performance. Case studies confirm that CCL-OCR robustly handles low-contrast, cropped, and
    complex-stroke Chinese characters, overcoming the fine-detail and high-frequency interference
    issues common in traditional OCR.
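    The momentum-update regime and queue mechanism described above can be sketched as an exponential moving average over encoder weights plus a fixed-size FIFO of encoded negatives. This is an illustration under stated assumptions, not the thesis code: the encoder is abstracted to a plain weight matrix, the queue size is arbitrary, and only the momentum value m = 0.7 comes from the sensitivity analysis.

    ```python
    import numpy as np
    from collections import deque

    def momentum_update(query_weights, key_weights, m=0.7):
        """EMA update of the key (momentum) encoder: W_k <- m*W_k + (1-m)*W_q."""
        return m * key_weights + (1.0 - m) * query_weights

    # Fixed-size queue of negative features; the oldest batch is dropped first.
    queue = deque(maxlen=4)  # illustrative size; MoCo-style queues are far larger
    for step in range(6):
        batch_features = np.full((2, 3), float(step))  # stand-in for encoded keys
        queue.append(batch_features)
    negatives = np.concatenate(list(queue))  # pooled negatives for InfoNCE
    ```

    Because the key encoder moves slowly (m close to 1 in the original MoCo; 0.7 here per the sensitivity analysis), features already in the queue stay consistent with freshly encoded keys, which is what keeps the negative pool usable across steps.
    
    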
    Appears in Collections: [Graduate Institute of Information Management] Theses and Dissertations

    Files in This Item:

    File         Description   Size   Format   Views
    index.html                 0Kb    HTML     1       View/Open


    All items in NCUIR are protected by copyright.
