Outfit coordination is one of the most direct ways for people to express themselves. However, judging the compatibility between tops and bottoms requires weighing multiple factors such as color and style, a process that is both time-consuming and prone to error. In recent years, the development of large language models and large multi-modal models has transformed many application fields. This study explores how large multi-modal models can be leveraged to achieve breakthroughs in fashion outfit recommendation. Our approach combines the keyword response text produced by the large language model Gemini in a Vision Question Answering (VQA) task with the deep feature fusion of the large multi-modal model Beit3, so that users only need to provide clothing images to obtain a compatibility score between a top and a bottom, making the system convenient to use. The proposed model, Large Multi-modality Language Model for Outfit Recommendation (LMLMO), outperforms previously proposed models on the FashionVC and Evaluation3 datasets. Moreover, experimental results show that different types of keyword responses affect the model to different degrees, offering new directions and insights for future research.
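To make the described pipeline concrete, the sketch below illustrates one plausible form of the final fusion-and-scoring step: pre-extracted image features for a top and a bottom are concatenated with a text embedding of the VQA keyword responses and passed through a small scoring head. This is a minimal illustration only, not the authors' implementation; the encoders (Beit3 for images, an embedding of Gemini's keyword text) are abstracted away, and all dimensions, module names, and the concatenation-plus-MLP design are assumptions.

```python
# Minimal sketch (not the authors' code): score top-bottom compatibility
# from pre-extracted features. Encoder outputs (e.g., Beit3 image features
# and an embedding of Gemini's VQA keyword responses) are assumed to be
# given as tensors; dimensions and module names are illustrative only.
import torch
import torch.nn as nn

class CompatibilityHead(nn.Module):
    def __init__(self, img_dim: int = 768, txt_dim: int = 768, hidden: int = 512):
        super().__init__()
        # Fuse top image, bottom image, and keyword-text features by concatenation.
        self.mlp = nn.Sequential(
            nn.Linear(2 * img_dim + txt_dim, hidden),
            nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, top_feat, bottom_feat, keyword_feat):
        fused = torch.cat([top_feat, bottom_feat, keyword_feat], dim=-1)
        # Sigmoid maps the logit to a compatibility score in [0, 1].
        return torch.sigmoid(self.mlp(fused)).squeeze(-1)

# Toy usage with random stand-ins for the encoder outputs.
head = CompatibilityHead()
top = torch.randn(4, 768)       # assumed Beit3 features of the top image
bottom = torch.randn(4, 768)    # assumed Beit3 features of the bottom image
keywords = torch.randn(4, 768)  # assumed embedding of Gemini keyword text
print(head(top, bottom, keywords))  # four compatibility scores in [0, 1]
```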