基於Transformer架構之繁體中文場景文字辨識系統

DC field	Value	Language
DC.contributor	資訊工程學系在職專班	zh_TW
DC.creator	蔡維庭	zh_TW
DC.creator	Wei-Ting Tsai	en_US
dc.date.accessioned	2023-06-27T07:39:07Z
dc.date.available	2023-06-27T07:39:07Z
dc.date.issued	2023
dc.identifier.uri	http://ir.lib.ncu.edu.tw:88/thesis/view_etd.asp?URN=110552030
dc.contributor.department	資訊工程學系在職專班	zh_TW
DC.description	國立中央大學	zh_TW
DC.description	National Central University	en_US
dc.description.abstract	在繁體中文場景的文字辨識任務中，系統須同時具備處理圖像和文字兩種模態的能力。由於繁體中文的字符結構複雜、字元數量眾多，為了確保文字能夠準確辨識，辨識模型和系統架構設計往往變得複雜，而且通常需要大量計算資源。為了讓硬體資源有限的邊緣設備能運作即時繁體中文辨識，本研究提出一個能動態調整架構的辨識系統。此系統由一個辨識與校正子系統所組成，辨識子系統包含輕量化辨識模型SVTR，校正子系統主要為雙向克漏字語言模型，兩者分別基於Transformer編碼器與解碼器架構而設計，透過注意力機制與多重下採樣運算讓輸出特徵能關注不同尺度的資訊，局部特徵關注字符結構與筆劃，全局特徵關注字元之間的語義資訊。因此模型架構能簡化，從而減少參數量。在訓練階段，我們將模型的梯度傳遞過程分離，以確保模型能夠獨立運作。在運行階段，系統根據不同規模的硬體環境調整配置，將參數量較少的辨識子系統運行於硬體資源有限的機器上，而讓包含校正子系統的完整系統佈署於有較高計算資源的伺服器上。從實驗中可得知，辨識子系統的參數大小只有11.45(MB)，準確率可達到 71%。結合校正子系統後，準確率則可提升至77%。	zh_TW
dc.description.abstract	Scene text recognition for Traditional Chinese requires a system that can process both the image and text modalities. Because Traditional Chinese characters are structurally complex and the character set is large, accurate recognition tends to require complex models and system architectures, which often demand substantial computational resources. To enable real-time Traditional Chinese recognition on edge devices with limited hardware resources, this research proposes a recognition system with a dynamically adjustable architecture. The system consists of a recognition subsystem and a correction subsystem. The recognition subsystem contains the lightweight recognition model SVTR, while the correction subsystem is mainly a bidirectional cloze language model. The two are designed on the Transformer encoder and decoder architectures, respectively. Through attention mechanisms and multiple down-sampling operations, the output features attend to information at different scales: local features capture character structure and strokes, while global features capture semantic information between characters. Consequently, the model architecture can be simplified, reducing the number of parameters. During training, we separate the gradient propagation of the two models so that each can operate independently. At inference time, the system adjusts its configuration to the scale of the hardware environment: the recognition subsystem, which has fewer parameters, runs on machines with limited hardware resources, while the complete system, including the correction subsystem, is deployed on servers with more computational resources. Experiments show that the recognition subsystem is only 11.45 MB in parameter size and reaches 71% accuracy; combined with the correction subsystem, accuracy rises to 77%.	en_US
DC.subject	繁體中文辨識	zh_TW
DC.subject	Transformer架構	zh_TW
DC.subject	場景文字辨識	zh_TW
DC.subject	Traditional Chinese recognition	en_US
DC.subject	Transformer Architecture	en_US
DC.subject	Scene text recognition	en_US
DC.title	基於Transformer架構之繁體中文場景文字辨識系統	zh_TW
dc.language.iso	zh-TW	zh-TW
DC.title	Traditional Chinese Scene Text Recognition based on Transformer Architecture	en_US
DC.type	博碩士論文	zh_TW
DC.type	thesis	en_US
DC.publisher	National Central University	en_US

Electronic theses and dissertations 110552030: full metadata record