Few-Shot Data Augmentation for Industrial Defect Detection: A Generative Approach with MLLM and Inpainting


Please use this permanent URL to cite or link to this item: https://ir.lib.ncu.edu.tw/handle/987654321/99368

Title:	Few-Shot Data Augmentation for Industrial Defect Detection: A Generative Approach with MLLM and Inpainting
Author:	Chang, Ting-Yu (昌廷宇)
Contributor:	Department of Computer Science and Information Engineering
Keywords:	Data Augmentation; Image Inpainting; Multimodal Large Language Model
Date:	2025-12-10
Upload time:	2026-03-06 18:49:31 (UTC+8)
Publisher:	National Central University
Abstract:	Deep learning models for industrial defect detection rely heavily on sufficient training data. However, on actual production lines, defect samples are scarce and annotation costs are high, which limits model performance. To address the data scarcity problem, Generative Adversarial Networks (GANs) have been applied to industrial defect detection.
GANs can learn the distribution of real defects and generate synthetic samples to augment training datasets, thereby improving detection accuracy when defect samples are limited. However, with insufficient training samples, GANs are prone to mode collapse or training failure. In recent years, Diffusion Models have made significant progress in few-shot image generation: even with limited training samples, they can maintain stable training and high generation quality by building on pre-trained models. Nevertheless, Diffusion Model methods face three challenges when applied to Mura defect detection. First, real defect samples typically number only 5-10, which is insufficient for training. Second, the visual differences between defects and panel backgrounds are extremely subtle, making defect features difficult to capture. Finally, defects appear only within specific inspection regions, but conventional Diffusion Models cannot precisely control where defects are generated. To tackle these issues, this study proposes a few-shot defect generation method that combines Multimodal Large Language Models (MLLMs) with image inpainting. The method first uses an MLLM to analyze real defect features and generate descriptive prompts. Simultaneously, an algorithm automatically identifies inspection regions and generates defect masks within reasonable bounds. The prompts and masks are then combined with defect-free panel images, and synthetic defect images are generated through patch-based inpainting. This study conducts experiments on both the Mura panel dataset and the GC10-DET dataset [1], comparing results with two existing defect generation methods: DefectDiffu [2] and DefFiller [3]. In the image generation quality assessment, our method achieves the best scores on both datasets (lower is better for both FID and CMMD). On the Mura dataset, it obtains FID 137 and CMMD 0.360, outperforming DefectDiffu [2] (FID: 219, CMMD: 0.930) and DefFiller [3] (FID: 144, CMMD: 1.723).
On the GC10-DET dataset [1], it achieves FID 105 and CMMD 0.614, outperforming DefectDiffu [2] (FID: 300, CMMD: 0.874) and DefFiller [3] (FID: 250, CMMD: 0.784), demonstrating that the generated defect images are closer to real samples in both semantic features and statistical distribution. Moreover, our method uses an algorithm to automatically identify inspection regions, ensuring that defects are generated only within valid areas and avoiding the position-control problem of conventional approaches. In practical application validation, the synthetic defects generated by our method can be used effectively for supervised model training. Experimental results show that models trained on datasets augmented with generated data outperform those trained solely on real data in both Precision and Recall. On the Mura dataset, Precision improves from 0.55 to 0.94 and Recall from 0.73 to 0.87; on the GC10-DET dataset [1], Precision increases from 0.43 to 0.92 and Recall from 0.79 to 0.87. These results confirm that the synthetic defects generated by our method can effectively substitute for real samples under few-shot conditions, alleviating the shortage of training data. In summary, the proposed method addresses the challenge of few-shot defect generation in industrial defect detection. By integrating an MLLM with image inpainting, it generates high-quality synthetic defect images from only 2-5 real samples. It automatically identifies inspection regions and controls where defects are generated, ensuring that defects appear only within reasonable areas and thereby solving the problem of uncontrollable generation positions in conventional Diffusion Models. Finally, the generated synthetic defects can be used effectively for supervised model training, addressing the training difficulties caused by insufficient real defect samples.
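
The abstract does not say how inspection regions are detected or how defect masks are drawn. As a rough illustration of the constraint it describes (defect masks placed only inside the inspection region), the minimal Python sketch below assumes the region can be found by simple brightness thresholding and uses a random ellipse as the mask shape. Both choices, and the names `find_inspection_region` and `random_defect_mask`, are hypothetical and not taken from the thesis. Plain nested lists stand in for image arrays to keep the sketch dependency-free.

```python
import random


def find_inspection_region(image, threshold=0.1):
    """Return (top, left, bottom, right) of the pixels brighter than threshold.

    Assumption for this sketch: the inspectable panel area is brighter than
    the surrounding border. bottom and right are exclusive bounds.
    """
    rows = [y for y, row in enumerate(image) if any(v > threshold for v in row)]
    cols = [x for x in range(len(image[0]))
            if any(row[x] > threshold for row in image)]
    if not rows or not cols:
        raise ValueError("no inspection region found")
    return rows[0], cols[0], rows[-1] + 1, cols[-1] + 1


def random_defect_mask(height, width, region, rng, max_radius=8):
    """Return a height x width boolean mask holding one random ellipse.

    The radii and centre are sampled so that the whole ellipse stays inside
    the given region; every pixel outside the region is False.
    """
    top, left, bottom, right = region
    # Largest radii that still leave room for a centre inside the region.
    ry_max = min(max_radius, (bottom - top - 1) // 2)
    rx_max = min(max_radius, (right - left - 1) // 2)
    if ry_max < 1 or rx_max < 1:
        raise ValueError("inspection region too small for a defect mask")
    ry = rng.randint(1, ry_max)
    rx = rng.randint(1, rx_max)
    # randint is inclusive, so the ellipse spans at most [top, bottom - 1].
    cy = rng.randint(top + ry, bottom - 1 - ry)
    cx = rng.randint(left + rx, right - 1 - rx)
    return [
        [((y - cy) / ry) ** 2 + ((x - cx) / rx) ** 2 <= 1.0 for x in range(width)]
        for y in range(height)
    ]


if __name__ == "__main__":
    # Toy "panel": a bright rectangle on a dark 64 x 96 background.
    panel = [[1.0 if 10 <= y < 50 and 20 <= x < 80 else 0.0 for x in range(96)]
             for y in range(64)]
    region = find_inspection_region(panel)
    mask = random_defect_mask(64, 96, region, random.Random(0))
    print("region:", region, "masked pixels:", sum(map(sum, mask)))
```

In a full pipeline, a mask like this would be passed together with the text prompt and the defect-free image to an inpainting model; that step is omitted here because the abstract does not name the model or its interface.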
Appears in Collections:	[Graduate Institute of Computer Science and Information Engineering] Theses and Dissertations

