Applying Deep Learning OCR in Children's Reading Management

Name

陳緯庭 (Wei-Ting Chen)

Department

Department of Communication Engineering, In-Service Master's Program

Thesis Title

應用深度學習OCR於兒童閱讀管理
(Applying Deep Learning OCR in Children's Reading Management)

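Related Theses

★ 街景招牌文字辨識與導盲應用 (Street-View Signboard Text Recognition and Its Application to Guiding the Visually Impaired)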

Files

Full text available in the system from 2029-2-1 onward

Abstract (Chinese)

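This thesis investigates the feasibility of applying deep learning optical character recognition (OCR) to children's reading management. Specifically, we used PaddleOCR with transfer learning to build a model dedicated to recognizing the covers of children's books. Traditional OCR methods can struggle with these covers because children's books often use distinctive fonts and illustration styles. To address this, we trained the model with PaddleOCR and applied transfer learning so that it adapts to the particular characteristics of children's books. In experiments on children's book covers, the model achieved an average recognition rate of 74.5%. This result suggests that deep learning has potential in children's reading management and could provide useful tools for education and library management. There is still room for improvement, including enlarging the dataset to improve model performance and exploring other deep learning techniques to further optimize recognition results. This thesis offers a reference for applying deep learning OCR to children's reading management and a foundation for future research.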
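
Transfer learning with PaddleOCR starts from annotated images. As a minimal sketch, the snippet below builds training label lines in the tab-separated formats described in the PaddleOCR documentation: a detection line pairs an image path with a JSON list of text regions (a `transcription` plus four corner `points`), and a recognition line pairs a cropped-image path with its text. The file paths, coordinates, helper names, and the title used here are hypothetical examples, not data or code from this thesis.

```python
import json


def det_label_line(image_path, regions):
    """Build one text-detection label line: image path, a tab, then a JSON list.

    `regions` is a list of (transcription, points) pairs; `points` holds the
    four [x, y] corners of the text box, clockwise from the top-left corner.
    """
    items = [{"transcription": text, "points": points} for text, points in regions]
    # ensure_ascii=False keeps Chinese titles human-readable in the label file
    return image_path + "\t" + json.dumps(items, ensure_ascii=False)


def rec_label_line(crop_path, text):
    """Build one text-recognition label line: cropped-image path, a tab, the text."""
    return crop_path + "\t" + text


# Hypothetical cover image with a single annotated title region
det_line = det_label_line(
    "covers/book_001.jpg",
    [("小王子", [[120, 40], [380, 40], [380, 110], [120, 110]])],
)
# Hypothetical crop of that title region for the recognition model
rec_line = rec_label_line("crops/book_001_0.jpg", "小王子")

print(det_line)
print(rec_line)
```

Lines like these are usually collected one per image into a UTF-8 text file that the PaddleOCR training configuration references.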

Abstract (English)

This thesis explores the potential of applying deep learning optical character recognition (OCR) to children's reading management. Specifically, we used PaddleOCR with transfer learning to develop a model specialized in recognizing the covers of children's books. The model achieved an average recognition rate of 74.5%.
Traditional OCR methods can struggle with children's book covers because of their distinctive fonts and illustration styles. To overcome these difficulties, we adopted a deep learning approach, training the model with PaddleOCR and using transfer learning to adapt it to the distinctive features of children's books. Experiments on children's book covers confirmed the average recognition rate of 74.5%.
This outcome indicates that deep learning methods have potential in children's reading management and could provide useful tools for education and library management. The study still leaves room for improvement, including expanding the dataset to enhance model performance and exploring other deep learning techniques to further optimize recognition results. This thesis provides a reference for applying deep learning OCR to children's reading management and lays a foundation for future research.
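
The abstract reports an average recognition rate of 74.5% but does not define it; the thesis's list of tables separates a book-title recognition rate (Table 4-4) from a character recognition rate (Table 4-5). The sketch below shows one common way to compute each, assuming a title counts as correct only on an exact string match and character accuracy is 1 minus the Levenshtein edit distance divided by the ground-truth length, floored at 0. These definitions, the helper names, and the sample titles are assumptions for illustration, not necessarily what the thesis used.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions, and substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free when equal)
            ))
        prev = curr
    return prev[-1]


def title_accuracy(preds, truths):
    """Fraction of titles recognized exactly (assumed definition)."""
    return sum(p == t for p, t in zip(preds, truths)) / len(truths)


def char_accuracy(preds, truths):
    """Mean per-title 1 - edit distance / ground-truth length, floored at 0 (assumed)."""
    scores = [max(0.0, 1 - levenshtein(p, t) / len(t)) for p, t in zip(preds, truths)]
    return sum(scores) / len(scores)


# Hypothetical ground-truth titles and OCR outputs
truths = ["小王子", "好餓的毛毛蟲", "野獸國"]
preds = ["小王子", "好餓的毛蟲", "野默國"]
print(title_accuracy(preds, truths))
print(round(char_accuracy(preds, truths), 3))
```

Exact-match title accuracy is much stricter than character accuracy: a six-character title with one missing character scores about 0.83 at the character level but 0 at the title level.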

Keywords (Chinese)

★ Optical Recognition
★ Deep Learning
★ Machine Learning
★ Artificial Intelligence

Keywords (English)

★ Optical Recognition
★ Deep Learning
★ Machine Learning
★ Artificial Intelligence

Table of Contents

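Abstract (Chinese)
Abstract (English)
Acknowledgments
Table of Contents
List of Tables
Chapter 1. Introduction
1.1 Research Background
1.2 Research Objectives
1.3 Thesis Organization
Chapter 2. Technical Review
2.1 Optical Character Recognition
2.2 PaddleOCR
2.3 Text Detection
2.4 Text Recognition
2.5 Convolutional Neural Network (CNN)
2.6 Recurrent Neural Network (RNN)
2.7 Data Collection and Labeling
2.8 Dataset Splitting
2.9 Loss Function
2.10 Binary Cross Entropy Loss
2.11 Connectionist Temporal Classification (CTC) Loss
2.12 Transfer Learning
2.13 Backpropagation
Chapter 3. Design of the Children's Reading Management System
3.1 MIAT System Design Methodology
3.1.1 IDEF0 Hierarchical Architecture
3.1.2 GRAFCET Discrete-Event Modeling
3.2 Architecture of the Children's Reading Management System
3.2.1 Book Title Detection
3.2.2 Book Title Recognition
3.2.3 Book Title Matching and CSV File Generation
Chapter 4. Experimental Results and Analysis
4.1 Experimental Environment
4.2 Experimental Dataset
4.3 Experimental Dataset Annotation
4.4 Transfer Learning of the Models
4.4.1 Text Detection Model
4.4.2 Text Recognition Model
4.5 Children's Reading Management System
Chapter 5. Conclusions and Future Work
5.1 Conclusions
5.2 Future Work
References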
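
Section 3.2.3 lists book-title matching and CSV file generation as the last stage of the system. A minimal sketch of that stage, assuming recognized strings are matched against a known list of titles with Python's standard `difflib.get_close_matches` (which ranks candidates by `SequenceMatcher.ratio()`) and the results are written with `csv.DictWriter`. The catalog, the 0.6 cutoff, the column names, and the helper names `match_titles` and `to_csv` are hypothetical, not the thesis's implementation.

```python
import csv
import difflib
import io


def match_titles(ocr_texts, catalog, cutoff=0.6):
    """Map each OCR string to its most similar catalog title ('' if none clears cutoff)."""
    rows = []
    for text in ocr_texts:
        # get_close_matches returns up to n candidates with ratio >= cutoff, best first
        best = difflib.get_close_matches(text, catalog, n=1, cutoff=cutoff)
        rows.append({"ocr_text": text, "matched_title": best[0] if best else ""})
    return rows


def to_csv(rows):
    """Serialize matched rows to CSV text (open real files with newline='')."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=["ocr_text", "matched_title"])
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


# Hypothetical reading list and OCR results
catalog = ["小王子", "好餓的毛毛蟲", "野獸國"]
rows = match_titles(["好餓的毛蟲", "野默國", "月亮"], catalog)
print(to_csv(rows))
```

Fuzzy matching lets a title with a single misrecognized or missing character, such as 野默國 for 野獸國, still map to the right catalog entry; raising `cutoff` reduces false matches at the cost of leaving more titles unmatched.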

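List of Figures
Figure 2-1 OCR road-sign recognition
Figure 2-2 OCR application scenario: body text
Figure 2-3 Application to article body text
Figure 2-4 PaddleOCR architecture
Figure 2-5 Example of identified text regions
Figure 2-6 Text recognition architecture
Figure 2-7 Traditional process of digitizing an image
Figure 2-8 Composition of an image
Figure 2-9 Extracting local image features
Figure 2-10 Pooling-layer sampling process
Figure 2-11 Operation of the fully connected layer
Figure 2-12 RNN structure
Figure 2-13 Recurrent layer structure
Figure 2-14 Text-region recognition training set: annotations and data
Figure 2-15 Text-region recognition training set data
Figure 2-16 Text recognition training set annotations
Figure 2-17 Text recognition training set data
Figure 2-18 Model training and performance evaluation
Figure 2-19 K-fold cross-validation
Figure 2-20 Application of CTC in text recognition
Figure 2-21 Principle of model transfer learning
Figure 2-22 Transfer learning scenario 1
Figure 2-23 Transfer learning scenario 2
Figure 2-24 Transfer learning scenario 3
Figure 2-25 Neuron propagation structure
Figure 2-26 Forward propagation
Figure 2-27 Backpropagation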
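Figure 3-1 Architecture of the MIAT system design methodology
Figure 3-2 IDEF0 functional module architecture
Figure 3-3 IDEF0 functional modules of the children's reading management system
Figure 3-4 GRAFCET of the children's reading management system
Figure 3-5 IDEF0 functional module for book title detection
Figure 3-6 GRAFCET for book title detection
Figure 3-7 IDEF0 functional module for book title recognition
Figure 3-8 GRAFCET for book title recognition
Figure 3-9 IDEF0 functional module for book title matching and CSV file generation
Figure 3-10 GRAFCET for book title matching and CSV file generation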
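Figure 4-1 Images from the children's book dataset
Figure 4-2 Text detection annotation
Figure 4-3 Text detection annotation data
Figure 4-4 Text recognition annotation
Figure 4-5 Text recognition annotation data
Figure 4-6 Children's book recognition system
Figure 4-7 Cataloging the titles of children's reading books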

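List of Tables
Table 3-1 Description of basic elements
Table 3-2 GRAFCET functional description of the children's reading management system
Table 3-3 GRAFCET functional description for book title detection
Table 3-4 GRAFCET functional description for book title recognition
Table 3-5 GRAFCET functional description for book title matching and CSV file generation
Table 4-1 Experimental environment
Table 4-2 Comparison of text-detection region match
Table 4-3 Comparison of text detection model results
Table 4-4 Comparison of book title recognition rates
Table 4-5 Comparison of character recognition rates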

References

[1] R. U. Islam, M. S. Hossain, and K. Andersson, "A Deep Learning Inspired Belief Rule-Based Expert System," IEEE Access, vol. 8, pp. 190637-190651, 2020.
[2] J. Memon, M. Sami, R. A. Khan, and M. Uddin, "Handwritten Optical Character Recognition (OCR): A Comprehensive Systematic Literature Review (SLR)," IEEE Access, vol. 8, pp. 142642-142668, 2020.
[3] PaddlePaddle, "PaddleOCR," [Online]. Available: https://github.com/PaddlePaddle/PaddleOCR.
[4] M. Liao, Z. Zou, Z. Wan, C. Yao, and X. Bai, "Real-Time Scene Text Detection With Differentiable Binarization and Adaptive Scale Fusion," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 45, no. 1, pp. 919-931, Jan. 2023.
[5] J. Memon, M. Sami, R. A. Khan, and M. Uddin, "Handwritten Optical Character Recognition (OCR): A Comprehensive Systematic Literature Review (SLR)," IEEE Access, vol. 8, pp. 142642-142668, 2020.
[6] Z. Li, F. Liu, W. Yang, S. Peng, and J. Zhou, "A Survey of Convolutional Neural Networks: Analysis, Applications, and Prospects," IEEE Transactions on Neural Networks and Learning Systems, vol. 33, no. 12, pp. 6999-7019, 2021.
[7] Z. Wu, S. Pan, F. Chen, G. Long, C. Zhang, and P. S. Yu, "A Comprehensive Survey on Graph Neural Networks," IEEE Transactions on Neural Networks and Learning Systems, vol. 32, no. 1, pp. 4-24, 2021.
[8] "史上最詳細迴圈神經網路講解（RNN/LSTM/GRU） [The most detailed explanation of recurrent neural networks (RNN/LSTM/GRU)]," [Online]. Available: https://zhuanlan.zhihu.com/p/123211148.
[9] "訓練集、驗證集、測試集的定義與劃分 [Definitions and splitting of training, validation, and test sets]," [Online]. Available: https://cynthiachuang.github.io/What-is-the-Difference-between-Training-Validation-and-Test-Dataset/.
[10] "訓練集、驗證集、測試集的定義與劃分 [Definitions and splitting of training, validation, and test sets]," [Online]. Available: https://cynthiachuang.github.io/What-is-the-Difference-between-Training-Validation-and-Test-Dataset/.
[11] M. Liao, Z. Zou, Z. Wan, C. Yao, and X. Bai, "Real-Time Scene Text Detection With Differentiable Binarization and Adaptive Scale Fusion," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 45, no. 1, pp. 919-931, Jan. 2023.
[12] M. Liao, Z. Zou, Z. Wan, C. Yao, and X. Bai, "Real-Time Scene Text Detection With Differentiable Binarization and Adaptive Scale Fusion," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 45, no. 1, pp. 919-931, Jan. 2023.
[13] "二元交叉熵 binary cross entropy," [Online]. Available: https://blog.csdn.net/weixin_44569973/article/details/122466624.
[14] A. A. Chandio, M. Asikuzzaman, M. R. Pickering, and M. Leghari, "Cursive Text Recognition in Natural Scene Images Using Deep Convolutional Recurrent Neural Network," IEEE Access, vol. 10, pp. 10062-10078, 2022.
[15] "【AI實戰】手把手教你文字識別（識別篇：LSTM+CTC, CRNN, chineseocr方法） [AI in practice: a hands-on guide to text recognition (recognition part: LSTM+CTC, CRNN, and chineseocr methods)]," [Online]. Available: https://www.gushiciku.cn/pl/2eJg/zh-tw.
[16] "遷移學習 [Transfer learning]," [Online]. Available: https://paddlehub.readthedocs.io/zh-cn/release-v2.1/transfer_learning_index.html.
[17] H. Li, N. Wang, X. Ding, X. Yang, and X. Gao, "Adaptively Learning Facial Expression Representation via C-F Labels and Distillation," IEEE Transactions on Image Processing, vol. 30, pp. 2016-2028, 2021.
[18] S. Surana, K. Pathak, M. Gagnani, V. Shrivastava, M. T. R, and S. M. G, "Text Extraction and Detection from Images using Machine Learning Techniques: A Research Review," in 2022 International Conference on Electronics and Renewable Systems (ICEARS), Tuticorin, India, 16-18 March 2022.
[19] "深度學習 | 反向傳播詳解 [Deep learning: backpropagation explained in detail]," [Online]. Available: https://zhuanlan.zhihu.com/p/115571464.

Advisors

陳永芳 (Yung-Fang Chen), 陳慶瀚 (Ching-Han Chen)

Approval Date

2024-1-22
