Thesis 110552030: Full Metadata Record

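DC Field | Value | Language
DC.contributor | 資訊工程學系在職專班 (Department of Computer Science and Information Engineering, in-service program) | zh_TW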
DC.creator | 蔡維庭 | zh_TW
DC.creator | Wei-Ting Tsai | en_US
dc.date.accessioned | 2023-06-27T07:39:07Z
dc.date.available | 2023-06-27T07:39:07Z
dc.date.issued | 2023
dc.identifier.uri | http://ir.lib.ncu.edu.tw:88/thesis/view_etd.asp?URN=110552030
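dc.contributor.department | 資訊工程學系在職專班 (Department of Computer Science and Information Engineering, in-service program) | zh_TW
DC.description | 國立中央大學 (National Central University) | zh_TW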
DC.description | National Central University | en_US
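dc.description.abstract | In text recognition for Traditional Chinese scenes, a system must be able to process both the image and the text modality. Because Traditional Chinese characters are structurally complex and very numerous, recognition models and system architectures tend to become complex in order to ensure accurate recognition, and they usually require substantial computational resources. To let edge devices with limited hardware resources run real-time Traditional Chinese recognition, this study proposes a recognition system whose architecture can be adjusted dynamically. The system is composed of a recognition subsystem and a correction subsystem. The recognition subsystem contains the lightweight recognition model SVTR, and the correction subsystem is mainly a bidirectional cloze language model; the two are designed on the Transformer encoder and decoder architectures, respectively. Attention mechanisms and multiple down-sampling operations let the output features attend to information at different scales: local features focus on character structure and strokes, and global features focus on the semantic information between characters. The model architecture can therefore be simplified, reducing the number of parameters. In the training phase, we separate the models' gradient propagation so that each model can operate independently. In the run-time phase, the system adjusts its configuration to hardware environments of different scales: the recognition subsystem, which has fewer parameters, runs on machines with limited hardware resources, while the complete system, including the correction subsystem, is deployed on servers with more computational resources. Experiments show that the recognition subsystem's parameter size is only 11.45 MB, with an accuracy of 71%; combined with the correction subsystem, the accuracy increases to 77%. | zh_TW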
dc.description.abstract | In the task of text recognition in Traditional Chinese scenes, the system must process both image and text modalities. Given the complex character structure and extensive character set of Traditional Chinese, accurate text recognition typically calls for complex recognition models and system architectures, which often demand significant computational resources. To enable real-time Traditional Chinese recognition on edge devices with limited hardware resources, this research proposes a recognition system with a dynamically adjustable architecture. The system consists of a recognition subsystem and a correction subsystem. The recognition subsystem incorporates a lightweight recognition model called SVTR, while the correction subsystem is mainly a bidirectional cloze language model. The two subsystems are built on the Transformer encoder and decoder architectures, respectively. Through attention mechanisms and multiple down-sampling operations, the output features focus on information at different scales: local features attend to character structure and strokes, while global features capture semantic information between characters. Consequently, the model architecture can be simplified and the number of parameters reduced. During training, we separate the gradient propagation of the models so that they can operate independently. During inference, the system adjusts its configuration to the scale of the hardware environment: the recognition subsystem, which has fewer parameters, runs on hardware-limited machines, while the complete system incorporating the correction subsystem is deployed on servers with greater computational resources. Experimental results show that the recognition subsystem has a parameter size of only 11.45 MB and achieves an accuracy of 71%. With the correction subsystem added, accuracy improves to 77%. | en_US
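DC.subject | 繁體中文辨識 (Traditional Chinese recognition) | zh_TW
DC.subject | Transformer架構 (Transformer architecture) | zh_TW
DC.subject | 場景文字辨識 (scene text recognition) | zh_TW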
DC.subject | Traditional Chinese recognition | en_US
DC.subject | Transformer Architecture | en_US
DC.subject | Scene text recognition | en_US
DC.title | 基於Transformer架構之繁體中文場景文字辨識系統 (A Transformer-Based Traditional Chinese Scene Text Recognition System) | zh_TW
dc.language.iso | zh-TW | zh-TW
DC.title | Traditional Chinese Scene Text Recognition based on Transformer Architecture | en_US
DC.type | 博碩士論文 (thesis or dissertation) | zh_TW
DC.type | thesis | en_US
DC.publisher | National Central University | en_US
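
The abstract says attention and multiple down-sampling steps let some features stay local (character structure and strokes) while others become global (relations between characters). A minimal sketch of that distinction, assuming patch tokens on a row-major grid: local mixing restricts each token to a small window, global mixing lets every token see every other token, and a height-halving merge step makes the same window cover more of the original image. The function names `mixing_mask` and `merge_height`, the 8×32 grid, the (3, 5) window and the halving rule are all hypothetical illustrations, not values or code from the thesis.

```python
def mixing_mask(h, w, window=None):
    """Boolean attention mask over an h x w grid of patch tokens in row-major order.

    window=None      -> global mixing: every token may attend to every token.
    window=(kh, kw)  -> local mixing: a token attends only to tokens inside a
                        kh x kw neighbourhood centred on it, clipped at the borders.
    """
    n = h * w
    if window is None:
        return [[True] * n for _ in range(n)]
    kh, kw = window
    mask = []
    for i in range(n):
        r1, c1 = divmod(i, w)
        row = []
        for j in range(n):
            r2, c2 = divmod(j, w)
            row.append(abs(r1 - r2) <= kh // 2 and abs(c1 - c2) <= kw // 2)
        mask.append(row)
    return mask


def merge_height(h, w):
    """Hypothetical down-sampling step: halve the grid height, keep the width."""
    return max(1, h // 2), w


if __name__ == "__main__":
    h, w = 8, 32                    # hypothetical patch grid for one text-line image
    centre = 4 * w + 16             # token at row 4, column 16
    local = mixing_mask(h, w, window=(3, 5))
    global_ = mixing_mask(h, w)
    print(sum(local[centre]), sum(global_[centre]))  # 15 256
    h2, w2 = merge_height(h, w)     # (4, 32): each token now summarises 2 original rows,
    print(h2, w2)                   # so the same 3-row window spans 6 original rows
```

A window-limited mask keeps a token's view to nearby strokes, while the all-true mask lets it relate to every character position in the line.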

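The correction subsystem described in the abstract is a Transformer-based bidirectional cloze language model. The toy below swaps that model for plain character-bigram counts, purely to show the cloze idea: blank out one position, then refill it with the candidate best supported by both its left and its right neighbour. The helper names (`build_bigrams`, `support`, `cloze_correct`), the confusion table and the two-line corpus are hypothetical; this is a sketch under those assumptions, not the thesis model.

```python
from collections import Counter


def build_bigrams(corpus):
    """Count adjacent character pairs in a toy corpus (stand-in for a trained language model)."""
    counts = Counter()
    for line in corpus:
        counts.update(zip(line, line[1:]))
    return counts


def support(chars, i, candidate, bigrams):
    """Score a candidate for position i using BOTH its left and right neighbours."""
    left = bigrams[(chars[i - 1], candidate)] if i > 0 else 0
    right = bigrams[(candidate, chars[i + 1])] if i + 1 < len(chars) else 0
    return left + right


def cloze_correct(text, confusions, bigrams):
    """Blank out each position in turn and refill it with the best-supported candidate.

    confusions maps a character to visually similar alternatives (hypothetical input).
    """
    chars = list(text)
    for i, original in enumerate(chars):
        candidates = [original] + [c for c in confusions.get(original, []) if c != original]
        # max() returns the first best candidate, so ties keep the recognizer's character
        chars[i] = max(candidates, key=lambda c: support(chars, i, c, bigrams))
    return "".join(chars)


bigrams = build_bigrams(["中央大學", "大學圖書館"])
print(cloze_correct("中夬大學", {"夬": ["央"]}, bigrams))  # 中央大學
```

Here the visually similar misread 夬 is replaced because 央 is supported on both sides (中央 and 央大), which a left-to-right-only model scoring 中 alone could not confirm as strongly.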
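
The abstract's run-time behaviour (recognition-only on resource-limited machines, the complete system on servers) can be sketched as a placement rule. Only the 11.45 MB recognizer size comes from the abstract; the `Device` type, the `select_stages` and `run` functions, the correction-subsystem size passed in as `correction_mb`, and the 2× memory `headroom` rule are hypothetical assumptions for illustration.

```python
from dataclasses import dataclass

RECOGNIZER_MB = 11.45  # recognition-subsystem parameter size reported in the abstract


@dataclass
class Device:
    name: str
    free_memory_mb: float  # hypothetical: memory the device can devote to model weights


def select_stages(device, correction_mb, headroom=2.0):
    """Choose which subsystems a device runs.

    correction_mb and headroom are hypothetical; the abstract reports neither the
    correction subsystem's size nor a concrete placement rule.
    """
    if device.free_memory_mb >= (RECOGNIZER_MB + correction_mb) * headroom:
        return ["recognition", "correction"]  # complete system, e.g. on a server
    if device.free_memory_mb >= RECOGNIZER_MB * headroom:
        return ["recognition"]  # recognition only, e.g. on an edge device
    raise ValueError(f"{device.name}: too little memory for the recognition subsystem")


def run(stages, image, recognize, correct):
    """Run the selected stages; without correction the recognizer's text is returned as-is."""
    text = recognize(image)
    if "correction" in stages:
        text = correct(text)
    return text


edge = Device("edge", free_memory_mb=64)
server = Device("server", free_memory_mb=4096)
print(select_stages(edge, correction_mb=40))    # ['recognition']
print(select_stages(server, correction_mb=40))  # ['recognition', 'correction']
```

Returning the recognizer's text unchanged when correction is not selected relies on the abstract's point that the subsystems are trained with separated gradients, so the recognizer is usable on its own.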