

    Please use this identifier to cite or link to this item: http://ir.lib.ncu.edu.tw/handle/987654321/95554


    Title: Multimodal Composed Image Retrieval Using Querying-Transformer
    Authors: 楊歷恆;Yang, Alex Li-Heng
    Contributors: Department of Computer Science and Information Engineering
    Keywords: image search;Composed Image Retrieval;deep learning;attention
    Date: 2024-07-23
    Issue Date: 2024-10-09 17:00:39 (UTC+8)
    Publisher: National Central University
    Abstract: Composed Image Retrieval (CIR) systems are crucial because they
    enable users to find specific images using both visual references and
    descriptive text, addressing the limitations of traditional text-only
    search methods. In this thesis, we propose a system that utilizes the
    Querying-Transformer (Qformer) to address the limitations of traditional
    image retrieval methods. The Qformer integrates image and text data
    through a transformer-based architecture, adeptly capturing complex
    relationships between the two modalities. By incorporating the
    Image-Text Matching (ITM) loss function, our system significantly
    enhances the accuracy of image-text matching, ensuring superior
    alignment between visual and textual representations. We also employ
    residual learning techniques within the Qformer model to preserve
    essential visual information, thereby maintaining the quality and
    features of the original images throughout the learning process.
    To confirm the efficacy of our approach, we performed experiments on
    the FashionIQ and CIRR datasets. The results show that our proposed
    system significantly outperforms existing models, achieving superior
    recall metrics across various categories. The experimental results
    demonstrate the potential of our system in practical applications,
    offering robust improvements in the precision and relevance of image
    retrieval tasks.
    Appears in Collections:[Graduate Institute of Computer Science and Information Engineering] Electronic Thesis & Dissertation


    All items in NCUIR are protected by copyright, with all rights reserved.
