NCU Institutional Repository (中大機構典藏) — theses, past exams, journal articles, and research projects: Item 987654321/98575


    Please use this identifier to cite or link to this item: https://ir.lib.ncu.edu.tw/handle/987654321/98575


    Title: FINE: Fine-Grained Image Understanding through Multimodal Contrastive Embedding Learning
    Authors: Lin, Tse-Hua (林澤華)
    Contributors: Department of Computer Science and Information Engineering
    Keywords: Unsupervised Learning; Contrastive Learning; Fine-Grained Clustering; Multi-modal Feature Fusion
    Date: 2025-08-16
    Issue Date: 2025-10-17 12:56:51 (UTC+8)
    Publisher: National Central University (國立中央大學)
    Abstract: In recent years, the rapid development of deep learning techniques has led to significant progress in computer vision tasks such as classification, detection, and segmentation. However, most existing approaches still heavily rely on large amounts of annotated data. This dependency becomes a major limitation in application scenarios where annotation is costly, such as medical image analysis or industrial product inspection. To alleviate the reliance on manual labeling, unsupervised learning has gained increasing attention. Among such methods, contrastive learning, an effective unsupervised approach, has been shown to learn discriminative feature representations.
    Nevertheless, conventional contrastive learning methods face significant challenges when applied to "tiny and fine-grained" data. Such data typically consist of small targets embedded in complex backgrounds with subtle inter-class differences, making it difficult for models to extract meaningful features and thereby degrading overall performance. To address this challenge, we propose FINE: Fine-Grained Image Understanding through Multimodal Contrastive Embedding Learning, specifically designed for "tiny and fine-grained" data. Our method adopts an encoder-decoder architecture to generate auxiliary images that emphasize small target regions, thereby facilitating more effective feature extraction.
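The role of the auxiliary image can be illustrated with a toy sketch. This is not the thesis's learned encoder-decoder generator; it merely shows the intended effect, assuming a rough target mask is available: re-weight the input so the small target region dominates the background.

```python
import numpy as np

def auxiliary_image(image, mask, background_weight=0.2):
    # Toy stand-in for the learned generator: keep the target region at full
    # intensity and attenuate everything else, so the tiny target dominates.
    weights = np.where(mask > 0, 1.0, background_weight)
    return image * weights

img = np.ones((8, 8))          # uniform dummy image
mask = np.zeros((8, 8))
mask[3:5, 3:5] = 1.0           # a tiny 2x2 "target" region
aux = auxiliary_image(img, mask)
```

In the real method, the encoder-decoder learns where the target is; this sketch only makes the downstream benefit concrete: the feature extractor sees an image in which the background no longer dominates.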
    Additionally, we design a Multi-modal Contrastive Learning Feature Extract Block (MCLFE), which integrates a multi-branch feature extraction module, an attention module, and a feature fusion module. This block, together with a contrastive learning strategy, jointly optimizes the feature extractor and the clustering centers using Instance Loss (IL) and Center Loss (CL), respectively.
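As a hedged sketch of how the two losses could look (the exact formulations in the thesis may differ; this assumes an NT-Xent-style instance loss over two augmented views and a squared-distance center loss over current cluster assignments):

```python
import numpy as np

def instance_loss(z1, z2, temperature=0.5):
    # NT-Xent-style instance-level loss: each sample's embedding in view 1
    # should be most similar to the same sample's embedding in view 2.
    # z1, z2: (N, D) L2-normalized embeddings; positives sit on the diagonal.
    sim = (z1 @ z2.T) / temperature
    sim -= sim.max(axis=1, keepdims=True)                # numerical stability
    log_prob = sim - np.log(np.exp(sim).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_prob))

def center_loss(z, centers, assignments):
    # Mean squared distance of each embedding to its assigned cluster center;
    # minimizing it pulls embeddings toward compact clusters.
    return np.mean(np.sum((z - centers[assignments]) ** 2, axis=1))

rng = np.random.default_rng(0)
z = rng.normal(size=(8, 16))
z /= np.linalg.norm(z, axis=1, keepdims=True)
il = instance_loss(z, z)       # identical views, as a smoke test
centers = np.stack([z[:4].mean(axis=0), z[4:].mean(axis=0)])
cl = center_loss(z, centers, np.array([0, 0, 0, 0, 1, 1, 1, 1]))
```

IL shapes the embedding space instance by instance, while CL anchors the learned clustering centers, which matches the division of labor described above.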
    In our experiments, we evaluate the proposed method on a private production-line panel dataset and four public datasets: the private panel dataset A19, the Retina fundus dataset [1], the NEU surface defect dataset [2], the MVTec AD industrial anomaly dataset [3], and the CIFAR-10 dataset [4]. On the Retina dataset, our method improves clustering accuracy by more than 25% when provided with high-quality auxiliary images, demonstrating its effectiveness in handling tiny and fine-grained features in medical imaging. On the general classification dataset, although our method improves ARI by only about 4%, it still outperforms most mainstream approaches, indicating stable and consistent clustering capability even without additional priors. On the NEU dataset, our method achieves the best performance across all three evaluation metrics, further validating its ability to identify prominent anomalies effectively.
    Moreover, we evaluate our method on two distinct defect types, structural and textural, within the MVTec AD dataset. The results show that our method outperforms all baseline approaches on structural defects, showcasing its robustness and discriminative power in clustering structural anomalies. Although its performance on textural defects is slightly inferior to the best-performing method, due to the challenges posed by texture anomalies and the limitations of the auxiliary modality, the overall results remain consistent and promising, underscoring the method's potential in diverse industrial inspection scenarios.
    Finally, on the private production-line panel dataset, our method, when combined with auxiliary images, surpasses existing methods in NMI, ARI, and ACC. While the improvement margins are modest, the performance is notably more stable, highlighting the practical applicability and potential of our approach in real-world industrial environments.
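For reference, the ACC reported above is clustering accuracy under the best one-to-one mapping between predicted cluster ids and ground-truth labels. It is usually computed with the Hungarian algorithm; the brute-force version below is an illustrative sketch that works for a small number of clusters.

```python
import numpy as np
from itertools import permutations

def clustering_accuracy(y_true, y_pred):
    # Clustering ACC: accuracy under the best one-to-one relabeling of the
    # predicted cluster ids. Brute force over permutations; fine for small k.
    labels = np.unique(y_true)
    clusters = np.unique(y_pred)
    best = 0.0
    for perm in permutations(labels, len(clusters)):
        mapping = dict(zip(clusters, perm))
        acc = np.mean([mapping[p] == t for p, t in zip(y_pred, y_true)])
        best = max(best, acc)
    return best

y_true = np.array([0, 0, 1, 1, 2, 2])
y_pred = np.array([1, 1, 0, 0, 2, 2])   # same partition, permuted cluster ids
acc = clustering_accuracy(y_true, y_pred)
```

Because cluster ids carry no inherent meaning, a perfect partition with permuted ids still scores 1.0, which is why ACC (rather than raw accuracy) is the standard clustering metric alongside NMI and ARI.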
    Appears in Collections: [Graduate Institute of Computer Science and Information Engineering] Electronic Thesis & Dissertation



    All items in NCUIR are protected by copyright, with all rights reserved.
