Abstract: This study presents a systematic comparison of CNN-based and Transformer-based semantic segmentation models for surface defect detection and classification on specular car body panels using deflectometric imaging (deflectometry). A high-resolution acquisition setup was established, comprising an image acquisition device, a motorized moving platform, and an illumination system with controlled lighting, to enhance the quality of the deflectometric fringe patterns and accurately capture surface variations. With this setup, a custom deflectometric dataset was constructed, containing annotated point-, line-, and patch-type defects; these categories were defined to enable a detailed comparison of each model's strengths and limitations when handling defects of different geometries.

Four semantic segmentation models were evaluated using five-fold cross-validation and then validated on an independent test set: Cascade Mask R-CNN with ResNet and ResNeXt backbones, a hybrid model integrating a Swin Transformer backbone into the Cascade Mask R-CNN framework, and a pure Transformer architecture represented by SegFormer. Among these, the Swin Transformer combined with Cascade Mask R-CNN achieved the highest mIoU of 64.90% on the independent test set. When the segmentation task was further simplified to a binary distinction between defect and non-defect regions, the hybrid model attained an IoU of 92.26%, a Precision of 100%, a Recall of 95.97%, and an F1 Score of 97.94%, demonstrating excellent detection accuracy and strong potential for practical deployment.

Overall, the hybrid model achieved the best performance in this study while keeping inference time below 0.2 s per image, comparable to the other architectures, showing that high segmentation accuracy and real-time feasibility can be achieved simultaneously even with limited data. Notably, although SegFormer mit-b5 underperformed the CNN-based models in quantitative metrics under small-sample conditions, its segmentation visualizations exhibited good structural delineation and a degree of generalization potential, indicating promising applicability with larger-scale datasets.
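For reference (this check is not part of the original abstract), the reported binary-task F1 Score is consistent with the standard harmonic-mean relation between Precision and Recall, using the values stated above:

% Consistency check with the reported Precision P = 1.0000 and Recall R = 0.9597
\[
F_1 = \frac{2PR}{P + R}
    = \frac{2 \times 1.0000 \times 0.9597}{1.0000 + 0.9597}
    \approx 0.9794 \quad (97.94\%),
\]

which matches the reported F1 Score of 97.94%.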