Abstract: In this study, we introduce CCL-OCR, a self-supervised OCR model that unites contrastive learning with a centroid-enhanced strategy under a momentum-update regime. First, we apply a rich suite of data augmentations to invoice images, simulating real-world noise such as blur, distortion, and occlusion, to enhance robustness. At the same time, we preserve the strengths of momentum contrastive learning by employing a momentum encoder and a queue mechanism to harvest a diverse pool of negative samples, ensuring stable and consistent feature representations. We propose a centroid-enhanced loss that pulls features of the same class toward their centroid while pushing apart centroids of different classes. This dual objective markedly improves intra-class compactness and inter-class separability. A single hyperparameter λ balances the conventional InfoNCE loss against the centroid loss. We evaluated CCL-OCR on a dataset of 74,481 standardized character images cropped from eight categories of real invoices, covering various printing technologies, font styles, and typesetting variations.
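The momentum encoder and negative-sample queue described above follow the MoCo pattern: the key encoder's parameters are an exponential moving average of the query encoder's, and key features are pushed into a fixed-size FIFO queue that serves as the negative pool. The sketch below is a minimal illustration of those two mechanisms only; the class name, the stand-in parameter vectors, and the default sizes are hypothetical, not taken from the paper.

```python
import numpy as np
from collections import deque


class MomentumQueue:
    """Minimal sketch of a MoCo-style momentum update and negative queue."""

    def __init__(self, dim=128, queue_size=4096, momentum=0.7):
        self.momentum = momentum
        self.queue = deque(maxlen=queue_size)     # FIFO pool of key features
        rng = np.random.default_rng(0)
        self.theta_q = rng.normal(size=dim)       # query-encoder params (stand-in)
        self.theta_k = self.theta_q.copy()        # key encoder starts as a copy

    def update_key_encoder(self):
        # theta_k <- m * theta_k + (1 - m) * theta_q  (no gradient through theta_k)
        self.theta_k = self.momentum * self.theta_k + (1 - self.momentum) * self.theta_q

    def enqueue(self, keys):
        # oldest keys are evicted automatically once queue_size is reached
        for k in keys:
            self.queue.append(np.asarray(k))

    def negatives(self):
        # stack the queued keys into a (n_neg, dim) matrix for the contrastive loss
        if not self.queue:
            return np.empty((0, len(self.theta_k)))
        return np.stack(self.queue)
```

A real implementation would hold full network weights rather than a single vector, but the update rule and queue semantics are the same; the sensitivity analysis reports the momentum coefficient peaking near 0.7, which is used as the default here.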
Compared against VGG, ResNet, SimCLR, and MoCo, CCL-OCR achieved the highest accuracy at 96.5%, outperforming ResNet (95.28%), MoCo (90.65%), SimCLR (85.20%), and VGG (65.22%). It also surpassed 96% in precision (96.57%), recall (96.13%), and F1-score (96.35%), significantly exceeding the benchmarks set by the other models. These gains stem from the centroid loss's ability to compress intra-class variance and sharpen discrimination between visually similar characters. Sensitivity analysis found the optimal settings to be λ = 0.7, momentum = 0.7, and τ = 0.09. Extending training to 50 epochs with a learning rate of 0.1 and batch size of 256 further boosted performance. Case studies confirm that CCL-OCR robustly handles low-contrast, cropped, and complex-stroke Chinese characters, overcoming the fine-detail and high-frequency interference issues common in traditional OCR.
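The objective described above combines an InfoNCE term over a positive key and queued negatives with a centroid term that tightens classes and separates their centroids, weighted by λ. The abstract does not specify the exact functional forms, so the sketch below is an assumption: it uses the standard cosine-similarity InfoNCE with temperature τ, a squared-distance-to-centroid pull, a hinge-style push between centroid pairs (the `margin` parameter is hypothetical), and the additive combination L = L_InfoNCE + λ·L_centroid.

```python
import numpy as np


def info_nce(query, positive, negatives, tau=0.09):
    """InfoNCE loss for one query against a positive key and queued negatives."""
    q = query / np.linalg.norm(query)
    k_pos = positive / np.linalg.norm(positive)
    k_neg = negatives / np.linalg.norm(negatives, axis=1, keepdims=True)
    logits = np.concatenate([[q @ k_pos], k_neg @ q]) / tau
    logits -= logits.max()                      # numerical stability
    probs = np.exp(logits) / np.exp(logits).sum()
    return -np.log(probs[0])                    # positive key sits at index 0


def centroid_loss(features, labels, margin=1.0):
    """Pull each feature toward its class centroid; push centroids apart."""
    classes = list(np.unique(labels))
    centroids = np.stack([features[labels == c].mean(axis=0) for c in classes])
    # intra-class term: mean squared distance to the sample's own centroid
    intra = np.mean([np.sum((f - centroids[classes.index(l)]) ** 2)
                     for f, l in zip(features, labels)])
    # inter-class term: hinge penalty on centroid pairs closer than `margin`
    inter, pairs = 0.0, 0
    for i in range(len(classes)):
        for j in range(i + 1, len(classes)):
            d = np.linalg.norm(centroids[i] - centroids[j])
            inter += max(0.0, margin - d) ** 2
            pairs += 1
    return intra + inter / max(pairs, 1)


def combined_loss(query, positive, negatives, features, labels,
                  lam=0.7, tau=0.09):
    """Assumed additive combination: L = L_InfoNCE + lam * L_centroid."""
    return info_nce(query, positive, negatives, tau) + lam * centroid_loss(features, labels)
```

The defaults λ = 0.7 and τ = 0.09 mirror the best settings reported in the sensitivity analysis; whether λ scales the centroid term or interpolates between the two losses is not stated in the abstract, so the additive form here is one plausible reading.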