An MLP-Mixer-Based Image Recognition Platform and Its Applications

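Name

廖彥勳 (Yen-Hsun Liao)

Department

In-service Master Program, Department of Computer Science and Information Engineering

Thesis Title

An MLP-Mixer-Based Image Recognition Platform and Its Applications (基於MLP-Mixer之影像辨識平台與應用)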

Files

Full text viewable in the system (open after 2026-10-01)

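Abstract (Chinese, translated)

In recent years, demand for image recognition applications based on deep learning has grown steadily, and the burden on developers has multiplied with it. This thesis therefore designs an image recognition platform with low-code characteristics to enable rapid development, and adopts MLP-Mixer, a neural network model introduced in 2021, as the system's network architecture. The study develops a graphical human-machine interface that lets users quickly train and test neural network models, and runs experiments and analysis on three different image datasets, reaching accuracies of 85%, 96.5%, and 89.6%, respectively, which verifies that the platform can support image recognition applications across different datasets. Across the whole process of training, model testing, and classification prediction, the proposed MLP-Mixer low-code image recognition development platform only requires selecting folders and entering the relevant parameters, after which everything completes automatically. This low-code property lets general users who are not experts operate it with ease.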
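The abstract names MLP-Mixer as the network architecture but this record does not show the thesis's implementation. As a rough illustration only, the following is a minimal pure-Python sketch of a single Mixer block as described by Tolstikhin et al. [4]: a token-mixing MLP applied across patches, then a channel-mixing MLP applied across channels, each preceded by layer normalization and wrapped in a skip connection. All function names, the omission of biases and learnable normalization parameters, and the tiny dimensions are assumptions made for readability, not details taken from the thesis.

```python
import math
import random


def gelu(v):
    """Exact GELU activation."""
    return 0.5 * v * (1.0 + math.erf(v / math.sqrt(2.0)))


def layer_norm(row, eps=1e-6):
    """Normalize one patch's channels (learnable scale/shift omitted)."""
    mean = sum(row) / len(row)
    var = sum((v - mean) ** 2 for v in row) / len(row)
    return [(v - mean) / math.sqrt(var + eps) for v in row]


def matmul(a, b):
    """Plain matrix product of two lists of rows."""
    cols = list(zip(*b))
    return [[sum(p * q for p, q in zip(row, col)) for col in cols] for row in a]


def transpose(m):
    return [list(col) for col in zip(*m)]


def add(a, b):
    """Element-wise sum, used for the skip connections."""
    return [[p + q for p, q in zip(ra, rb)] for ra, rb in zip(a, b)]


def mlp(x, w1, w2):
    """Two linear layers with GELU in between (biases omitted)."""
    hidden = [[gelu(v) for v in row] for row in matmul(x, w1)]
    return matmul(hidden, w2)


def mixer_block(x, tok_w1, tok_w2, ch_w1, ch_w2):
    """One Mixer block on x of shape (patches S, channels C).

    tok_w1: (S, D_S), tok_w2: (D_S, S) mix information across patches.
    ch_w1:  (C, D_C), ch_w2:  (D_C, C) mix information across channels.
    """
    # Token mixing: transpose so the MLP acts along the patch dimension.
    normed = [layer_norm(row) for row in x]
    token_mixed = transpose(mlp(transpose(normed), tok_w1, tok_w2))
    x = add(x, token_mixed)
    # Channel mixing: the MLP acts on each patch's channels independently.
    normed = [layer_norm(row) for row in x]
    return add(x, mlp(normed, ch_w1, ch_w2))


def random_matrix(rows, cols, rng):
    """Hypothetical helper: small random weights for the demo below."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


if __name__ == "__main__":
    rng = random.Random(0)
    S, C, D_S, D_C = 4, 3, 8, 6  # illustrative sizes only
    x = random_matrix(S, C, rng)
    out = mixer_block(
        x,
        random_matrix(S, D_S, rng), random_matrix(D_S, S, rng),
        random_matrix(C, D_C, rng), random_matrix(D_C, C, rng),
    )
    print(len(out), len(out[0]))  # output keeps the (S, C) shape of the input
```

Because both sub-layers are residual, the block preserves the input shape, which is what allows a full model to stack many such blocks before a final pooling and classification layer.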

Abstract (English)

In recent years, the demand for image recognition applications based on deep learning has continued to increase, and the burden on developers has grown accordingly. This thesis therefore designs a low-code image recognition platform for rapid development, using MLP-Mixer, a neural network model introduced in 2021, as the system's network architecture. A graphical human-machine interface was developed that allows users to quickly train and test neural network models. Experiments and analysis on three different image datasets yielded accuracy rates of 85%, 96.5%, and 89.6%, respectively, verifying that the platform can realize image recognition applications on different datasets. With the proposed MLP-Mixer low-code image recognition development platform, the entire process of training, model testing, and classification prediction completes automatically once the user selects the folders and enters the relevant parameters. This low-code property allows non-expert users to operate the platform easily.
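The folder-driven workflow described above can be pictured with a small sketch. This is not the thesis's code: it assumes a hypothetical layout with one sub-folder per class under the selected dataset folder, and the function name `split_dataset` and the parameters `test_ratio` and `seed` are illustrative choices standing in for the "relevant parameters" a user would enter.

```python
import random
from pathlib import Path

# Assumed set of image extensions; adjust to the dataset at hand.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png", ".bmp"}


def split_dataset(root, test_ratio=0.2, seed=0):
    """Split a folder-per-class image dataset into train and test lists.

    Returns two lists of (image_path, class_name) pairs. The split is done
    per class so every class keeps roughly the same train/test proportion.
    """
    rng = random.Random(seed)  # seeded so the split is reproducible
    train, test = [], []
    class_dirs = sorted(p for p in Path(root).iterdir() if p.is_dir())
    for class_dir in class_dirs:
        images = sorted(
            f for f in class_dir.iterdir()
            if f.is_file() and f.suffix.lower() in IMAGE_SUFFIXES
        )
        rng.shuffle(images)
        n_test = int(len(images) * test_ratio)
        test += [(f, class_dir.name) for f in images[:n_test]]
        train += [(f, class_dir.name) for f in images[n_test:]]
    return train, test
```

A caller would pass the chosen folder and ratio, for example `train, test = split_dataset("dataset", test_ratio=0.2)`, and feed the resulting lists to whatever loading and training code follows. Sorting before the seeded shuffle keeps the split identical across runs and operating systems.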

Keywords (Chinese)

★ Image Recognition (影像辨識)
★ Deep Learning (深度學習)
★ Low-code (低代碼)

Keywords (English)

★ Image Recognition
★ MLP-Mixer
★ Deep Learning
★ Low-code

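Table of Contents

Abstract (Chinese)
Abstract (English)
Acknowledgments
Contents
List of Figures
List of Tables
Chapter 1. Introduction
1.1 Research Background
1.2 Research Objectives
1.3 Thesis Organization
Chapter 2. Technical Review
2.1 Introduction to Deep Learning
2.1.1 Convolutional Neural Network (CNN)
2.1.2 Vision Transformer (ViT)
2.2 MLP-Mixer
2.2.1 MLP-Mixer Architecture
2.2.2 MLP-Mixer Operating Principles
2.2.3 MLP-Mixer Applications
Chapter 3. System Architecture Design
3.1 Architecture Design of the MLP-Mixer Image Recognition Platform
3.2 MLP-Mixer-Based Image Recognition System
3.2.1 Main Architecture of the MLP-Mixer-Based Image Recognition System
3.2.2 Image Dataset Selection and Splitting Module
3.2.3 Data Loading and Preprocessing Module
3.2.4 MLP-Mixer Training Module
3.2.5 Model Testing and Classification Prediction Module
3.3 Graphical Human-Machine Interface
Chapter 4. Experimental Results
4.1 Experimental Development Environment
4.1.1 Model Training
4.2 Performance Evaluation Metrics
4.3 Introduction to the Low-Code Development Platform
4.4 MLP-Mixer Image Recognition Low-Code Development Platform: Pit Bull Image Dataset
4.4.1 Experimental Procedure
4.4.2 Performance Evaluation and Discussion
4.5 MLP-Mixer Image Recognition Low-Code Development Platform: Fish Image Dataset
4.5.1 Experimental Procedure
4.5.2 Performance Evaluation and Discussion
4.6 MLP-Mixer Image Recognition Low-Code Development Platform: Seed Image Dataset
4.6.1 Experimental Procedure
4.6.2 Performance Evaluation and Discussion
Chapter 5. Conclusions and Future Work
5.1 Conclusions
5.2 Future Prospects
References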

References

[1] L. Jiao, F. Zhang, F. Liu, S. Yang, L. Li, Z. Feng, and R. Qu, “A Survey of Deep Learning-based Object Detection,” IEEE Access, Vol. 7, pp. 128837-128868, 2019.
[2] B. Meena, K. V. Rao, and S. Chittineni, “A Survey On Deep Learning Methods and Tools in Image Processing,” International Journal of Scientific & Technology Research, Vol. 9, Issue 2, pp. 1057-1062, 2020.
[3] A. Dosovitskiy, L. Beyer, A. Kolesnikov, D. Weissenborn, X. Zhai, T. Unterthiner, M. Dehghani, M. Minderer, G. Heigold, S. Gelly, J. Uszkoreit, and N. Houlsby, “An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale,” ICLR, 2021.
[4] I. Tolstikhin, N. Houlsby, A. Kolesnikov, L. Beyer, X. Zhai, T. Unterthiner, J. Yung, A. Steiner, D. Keysers, J. Uszkoreit, M. Lucic, and A. Dosovitskiy, “MLP-Mixer: An all-MLP Architecture for Vision,” arXiv preprint arXiv:2105.01601, 2021.
[5] A. Kolesnikov, L. Beyer, X. Zhai, J. Puigcerver, J. Yung, S. Gelly, and N. Houlsby, “Big Transfer (BiT): General Visual Representation Learning,” ECCV, Vol. 5, pp. 491-507, 2020.
[6] A. Vaswani, P. Ramachandran, A. Srinivas, N. Parmar, B. Hechtman, and J. Shlens, “Scaling Local Self-Attention for Parameter Efficient Visual Backbones,” CVPR, pp. 12894-12904, 2021.
[7] A. Brock, S. De, S. L. Smith, and K. Simonyan, “High-Performance Large-Scale Image Recognition Without Normalization,” arXiv preprint arXiv:2102.06171, 2021.
[8] C.-H. Chen, C.-M. Kuo, C.-Y. Chen, and J.-H. Dai, “The Design and Synthesis Using Hierarchical Robotic Discrete-Event Modeling,” Journal of Vibration and Control, Vol. 19, pp. 1603-1613, 2013.
[9] Y. LeCun, Y. Bengio, and G. Hinton, “Deep Learning,” Nature, Vol. 521, No. 7553, pp. 436-444, 2015.
[10] S. B. Maind and P. Wankar, “Research Paper on Basic of Artificial Neural Network,” International Journal on Recent and Innovation Trends in Computing and Communication, Vol. 2, Issue 1, pp. 96-101, 2014.
[11] W. Wang, Y. Yang, X. Wang, W. Wang, and J. Li, “Development of Convolutional Neural Network and Its Application in Image Classification: A Survey,” Optical Engineering, Vol. 58, No. 4, Article ID 040901, 2019.
[12] D. H. Hubel and T. N. Wiesel, “Receptive Fields, Binocular Interaction and Functional Architecture in the Cat's Visual Cortex,” The Journal of Physiology, Vol. 160, No. 1, pp. 106-154, 1962.
[13] K. Fukushima, “Neocognitron: A Self-Organizing Neural Network Model for a Mechanism of Pattern Recognition Unaffected by Shift in Position,” Biological Cybernetics, Vol. 36, pp. 193-202, 1980.
[14] D. E. Rumelhart, G. E. Hinton, and R. J. Williams, “Learning Internal Representations by Error Propagation,” Parallel Distributed Processing: Explorations in the Microstructure of Cognition, Vol. 1, pp. 318-362, 1986.
[15] Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner, “Gradient-Based Learning Applied to Document Recognition,” Proceedings of the IEEE, Vol. 86, Issue 11, pp. 2278-2324, 1998.
[16] G. E. Hinton, S. Osindero, and Y. W. Teh, “A Fast Learning Algorithm for Deep Belief Nets,” Neural Computation, Vol. 18, No. 7, pp. 1527-1554, 2006.
[17] A. Krizhevsky, I. Sutskever, and G. E. Hinton, “ImageNet Classification with Deep Convolutional Neural Networks,” Communications of the ACM, Vol. 60, Issue 6, pp. 84-90, 2017.
[18] K. Simonyan and A. Zisserman, “Very Deep Convolutional Networks for Large-Scale Image Recognition,” arXiv preprint arXiv:1409.1556, 2014.
[19] C. Szegedy, W. Liu, Y. Jia, P. Sermanet, S. Reed, D. Anguelov, D. Erhan, V. Vanhoucke, and A. Rabinovich, “Going Deeper with Convolutions,” CVPR, pp. 1-9, 2015.
[20] A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, L. Kaiser, and I. Polosukhin, “Attention Is All You Need,” Advances in Neural Information Processing Systems, pp. 6000-6010, 2017.
[21] D. Hendrycks and K. Gimpel, “Gaussian Error Linear Units (GELUs),” arXiv preprint arXiv:1606.08415, 2016.
[22] F. Chollet, “Xception: Deep Learning With Depthwise Separable Convolutions,” CVPR, pp. 1800-1807, 2017.
[23] L. Sifre, “Rigid-Motion Scattering For Image Classification,” PhD thesis, Ecole Polytechnique, 2014.
[24] J. L. Ba, J. R. Kiros, and G. E. Hinton, “Layer Normalization,” arXiv preprint arXiv:1607.06450, 2016.
[25] R. Wightman, PyTorch Image Models (timm), 2020 [Online]. Available: https://github.com/rwightman/pytorch-image-models.

Advisor

陳慶瀚 (Pierre Chen)

Approval Date

2021-10-18
