Applying Feature Enhancement Strategies in MLP-Mixer Image Classifier

Name

Tzu-Yu Tung (童子祐)

Department

Department of Computer Science and Information Engineering

Thesis Title

Applying Feature Enhancement Strategies in MLP-Mixer Image Classifier
(使用特徵增強策略在 MLP-Mixer 影像分類器)

Files

Browse the thesis in the system (available after 2027-7-13)

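Abstract (Chinese, translated)

Because convolutional neural network models involve large and complex computations, their inner workings are regarded as a black box that cannot be reasonably explained or analyzed. This study therefore proposes combining image feature enhancement with an MLP-Mixer classifier to improve the interpretability and accuracy of the whole recognition system. The architecture is applied to fish, seed, and Central European forest species recognition datasets. Each image first undergoes shape, texture, and color feature enhancement. The resulting feature-enhanced images (FEI) are then fed to the MLP-Mixer classifier, which outputs a Top-5 for each of the three enhancement methods. These three Top-5 outputs serve as the input to decision fusion, and multinomial logistic regression produces the final decision. The method achieves a 99% recognition rate on a 40-class fish dataset, compared with 96% for an MLP-Mixer classifier without feature enhancement; 90.65% on a 560-class seed dataset, compared with 70.23% for a hybrid neural network (ResNet-50 + Siamese); and 97.91% on the 153-class Central European forest dataset, compared with 93.4% for a single convolutional neural network architecture.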
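The texture branch of the feature enhancement step can be illustrated with a basic local binary pattern (LBP) transform. The following is a minimal sketch, assuming the common 3x3 neighbourhood formulation of LBP. The function name `lbp_image`, the neighbour ordering, and the zeroed border are illustrative choices and are not taken from the thesis, whose exact enhancement procedure is not given in this record.

```python
def lbp_image(gray):
    """Return the basic 3x3 LBP code of every interior pixel.

    `gray` is a list of rows of intensities (hypothetical input format).
    Border pixels have no full neighbourhood and are left at 0.
    """
    h, w = len(gray), len(gray[0])
    # Eight neighbours, clockwise from the top-left corner.
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    out = [[0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            center = gray[y][x]
            code = 0
            for bit, (dy, dx) in enumerate(offsets):
                # A neighbour at least as bright as the centre sets its bit.
                if gray[y + dy][x + dx] >= center:
                    code |= 1 << bit
            out[y][x] = code
    return out


patch = [[10, 20, 30],
         [40, 50, 60],
         [70, 80, 90]]
# Only the right, bottom-right, bottom, and bottom-left neighbours are >= 50,
# so bits 3 to 6 are set: 8 + 16 + 32 + 64 = 120.
print(lbp_image(patch)[1][1])
```

The LBP code image replaces raw intensities with a description of local texture, which is the kind of feature-enhanced image a classifier could take as input.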

Abstract (English)

Because convolutional neural network models involve large and complex computations, their operation is regarded as a black box that cannot be reasonably explained or analyzed. This study therefore proposes combining image feature enhancement with an MLP-Mixer classifier to increase the interpretability and accuracy of the overall recognition system. The architecture is applied to fish, seed, and Central European forest species recognition datasets. First, the shape, texture, and color features of each image are enhanced. The feature-enhanced images are then used as the input to the MLP-Mixer classifier, which outputs a Top-5 for each of the three feature enhancement methods. These three Top-5 outputs serve as the input to decision fusion, and the final decision is produced by multinomial logistic regression. The method achieves a recognition rate of 99% on a 40-class fish dataset, which is better than the 96% recognition rate of the MLP-Mixer classifier without feature enhancement. It achieves 90.65% on the 560-class seed dataset, which is better than the 70.23% recognition rate of the hybrid neural network. It achieves 97.91% on the 153-class Central European forest dataset, which is better than the 93.4% recognition rate obtained with a single convolutional neural network architecture.
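The decision-fusion step described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the thesis implementation: it assumes each branch's Top-k output is encoded as a class-score vector with everything outside the Top-k zeroed, that the three vectors are concatenated, and that the multinomial logistic regression is fitted by plain batch gradient descent. All function names, the encoding, and the toy scores are hypothetical.

```python
import math


def top_k_vector(scores, k=5):
    """Keep the k largest class scores and zero out the rest."""
    keep = set(sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k])
    return [s if i in keep else 0.0 for i, s in enumerate(scores)]


def fuse_features(shape_scores, texture_scores, color_scores, k=5):
    """Concatenate the Top-k vectors of the three enhancement branches."""
    return (top_k_vector(shape_scores, k) + top_k_vector(texture_scores, k)
            + top_k_vector(color_scores, k))


def softmax(z):
    m = max(z)  # subtract the maximum for numerical stability
    e = [math.exp(v - m) for v in z]
    total = sum(e)
    return [v / total for v in e]


def predict_proba(W, b, x):
    """Class probabilities of the multinomial logistic regression model."""
    return softmax([sum(w * v for w, v in zip(row, x)) + bias
                    for row, bias in zip(W, b)])


def predict(W, b, x):
    p = predict_proba(W, b, x)
    return max(range(len(p)), key=lambda c: p[c])


def train_softmax_regression(X, y, n_classes, lr=0.5, epochs=300):
    """Fit multinomial logistic regression by batch gradient descent."""
    n_features = len(X[0])
    W = [[0.0] * n_features for _ in range(n_classes)]
    b = [0.0] * n_classes
    for _ in range(epochs):
        gW = [[0.0] * n_features for _ in range(n_classes)]
        gb = [0.0] * n_classes
        for x, label in zip(X, y):
            p = predict_proba(W, b, x)
            for c in range(n_classes):
                # Gradient of the cross-entropy loss w.r.t. the class-c logit.
                err = p[c] - (1.0 if c == label else 0.0)
                gb[c] += err
                for j in range(n_features):
                    gW[c][j] += err * x[j]
        for c in range(n_classes):
            b[c] -= lr * gb[c] / len(X)
            for j in range(n_features):
                W[c][j] -= lr * gW[c][j] / len(X)
    return W, b


# Toy data with 3 classes, so the Top-2 is kept instead of the Top-5.
# Each row: (shape scores, texture scores, color scores, true label).
samples = [
    ([0.7, 0.2, 0.1], [0.6, 0.3, 0.1], [0.5, 0.1, 0.4], 0),
    ([0.2, 0.7, 0.1], [0.3, 0.6, 0.1], [0.1, 0.5, 0.4], 1),
    ([0.1, 0.2, 0.7], [0.3, 0.1, 0.6], [0.4, 0.1, 0.5], 2),
]
X = [fuse_features(s, t, c, k=2) for s, t, c, _ in samples]
y = [label for *_, label in samples]
W, b = train_softmax_regression(X, y, n_classes=3)
print([predict(W, b, x) for x in X])
```

Fusing at the score level keeps each branch inspectable on its own: the shape, texture, and color Top-5 lists can be examined separately before the regression combines them.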

Keywords (Chinese, translated)

★ Image feature enhancement
★ Histogram of Oriented Gradients
★ Local Binary Patterns
★ Single-Scale Retinex

Keywords (English)

★ image feature enhancement
★ Histogram of Oriented Gradients
★ Local Binary Patterns
★ Single-Scale Retinex
★ MLP-Mixer

Table of Contents

Abstract (Chinese)
Abstract
Acknowledgments
Table of Contents
List of Figures
List of Tables
Chapter 1: Introduction
1.1 Research Background
1.2 Research Objectives
1.3 Thesis Organization
Chapter 2: Image Classification
2.1 Canny Edge Detection
2.2 Histogram of Oriented Gradients (HOG)
2.3 Local Binary Patterns (LBP)
2.4 Single-Scale Retinex (SSR)
2.5 Deep Learning
2.6 MLP-Mixer
Chapter 3: Image Recognition and Classification System
3.1 Classification System Architecture
3.2 Discrete-Event Modeling of the Classifier System
Chapter 4: System Integration and Verification
4.1 Experimental Development Environment
4.2 Experimental Datasets
4.3 Verification of Feature Enhancement Strategies in MLP-Mixer Image Classification
Chapter 5: Conclusions and Future Work
5.1 Conclusions
5.2 Future Work
References

Advisor

Ching-Han Chen (陳慶瀚)

Approval Date

2022-7-30
