NCU Institutional Repository: Item 987654321/95504


    Please use this identifier to cite or link to this item: http://ir.lib.ncu.edu.tw/handle/987654321/95504


    Title: Using Large Multi-modality Language Model for Outfit Recommendation
    Authors: Chiang, Tao-Shuan (姜道宣)
    Contributors: Department of Information Management
    Keywords: Large Multi-modal Models; Large Language Models; Outfit Compatibility; Outfit Recommendation
    Date: 2024-07-16
    Issue Date: 2024-10-09 16:54:36 (UTC+8)
    Publisher: National Central University
    Abstract: Outfit coordination is one of the most direct ways for people to express themselves. However, judging the compatibility between a top and a bottom requires weighing multiple factors such as color and style; doing so is time-consuming and error-prone. In recent years, the development of large language models and large multi-modal models has transformed many application domains. This study explores how to leverage large multi-modal models to achieve breakthroughs in fashion outfit recommendation.
    This research combines the keyword response text produced by the large language model Gemini in a Vision Question Answering (VQA) task with the deep feature fusion of the large multi-modal model BEiT-3. Users need only supply clothing images to obtain a compatibility score for a top and bottom pairing. Our proposed model, Large Multi-modality Language Model for Outfit Recommendation (LMLMO), outperforms previously proposed models on the FashionVC and Evaluation3 datasets. Moreover, experimental results show that different types of keyword responses affect the model in different ways, offering new directions and insights for future research.
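    The abstract outlines a two-stage pipeline: Gemini answers visual questions about each garment with keyword text, a BEiT-3-style encoder fuses each garment image with its keywords into a joint embedding, and a small head scores top-bottom compatibility. The PyTorch sketch below illustrates only that scoring idea under stated assumptions; gemini_vqa, the encoder interface, and the head architecture are hypothetical placeholders, not the thesis implementation.

        # Illustrative sketch only; the actual LMLMO architecture, prompts,
        # and training procedure are described in the thesis itself.
        import torch
        import torch.nn as nn

        class CompatibilityHead(nn.Module):
            """Scores a (top, bottom) pair from fused image+keyword embeddings."""
            def __init__(self, dim: int = 768):
                super().__init__()
                self.mlp = nn.Sequential(
                    nn.Linear(2 * dim, dim),
                    nn.ReLU(),
                    nn.Linear(dim, 1),
                )

            def forward(self, top_feat: torch.Tensor, bottom_feat: torch.Tensor) -> torch.Tensor:
                # Concatenate the two garment embeddings and map to a [0, 1] score.
                pair = torch.cat([top_feat, bottom_feat], dim=-1)
                return torch.sigmoid(self.mlp(pair))

        def garment_feature(image, keywords_text, multimodal_encoder):
            """Fuse a garment image with its VQA keyword text.

            `multimodal_encoder` stands in for BEiT-3 here; it is assumed to
            accept an image plus text and return one joint embedding vector.
            """
            return multimodal_encoder(image=image, text=keywords_text)

        # Hypothetical usage, assuming gemini_vqa() wraps a Gemini VQA call:
        #   kw_top = gemini_vqa(top_image, "Describe this garment's color and style in keywords.")
        #   kw_bot = gemini_vqa(bottom_image, "Describe this garment's color and style in keywords.")
        #   score = CompatibilityHead()(garment_feature(top_image, kw_top, beit3),
        #                               garment_feature(bottom_image, kw_bot, beit3))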
    Appears in Collections: [Graduate Institute of Information Management] Electronic Thesis & Dissertation

    Files in This Item:

    File        Description    Size    Format
    index.html                 0Kb     HTML


    All items in NCUIR are protected by copyright, with all rights reserved.

