Master's/Doctoral Thesis 111423072: Detailed Record




Author: Tao-Shuan Chiang (姜道宣)    Department: Department of Information Management
Thesis title: Using Large Multi-modality Language Model for Outfit Recommendation
(以大型多模態語言模型進行穿搭推薦)
Related theses
★ 零售業商業智慧之探討
★ 有線電話通話異常偵測系統之建置
★ 資料探勘技術運用於在學成績與學測成果分析 -以高職餐飲管理科為例
★ 利用資料採礦技術提昇財富管理效益 -以個案銀行為主
★ 晶圓製造良率模式之評比與分析-以國內某DRAM廠為例
★ 商業智慧分析運用於學生成績之研究
★ 運用資料探勘技術建構國小高年級學生學業成就之預測模式
★ 應用資料探勘技術建立機車貸款風險評估模式之研究-以A公司為例
★ 績效指標評估研究應用於提升研發設計品質保證
★ 基於文字履歷及人格特質應用機械學習改善錄用品質
★ 以關係基因演算法為基礎之一般性架構解決包含限制處理之集合切割問題
★ 關聯式資料庫之廣義知識探勘
★ 考量屬性值取得延遲的決策樹建構
★ 從序列資料中找尋偏好圖的方法 - 應用於群體排名問題
★ 利用分割式分群演算法找共識群解群體決策問題
★ 以新奇的方法有序共識群應用於群體決策問題
Files: full text viewable in the repository system after 2027-07-01
Abstract (Chinese) 衣著的穿搭是人們表現自我最直接的方式。然而,人們在判斷上衣和下著之間的適配性時需要從顏色、風格等多方面進行考慮,這不僅花費大量時間,也需要承受失誤的風險。近年來,隨著大型語言模型與大型多模態模型的發展,許多應用領域發生了變革,本研究旨在探討如何利用大型多模態模型在服裝時尚搭配領域達成推薦的突破。本研究結合大型語言模型Gemini於VQA(Vision Question Answering)任務中的關鍵字回覆文本,與大型多模態模型BEiT-3的深層特徵融合技術,讓使用者僅需提供衣物影像資料,即可對上衣和下著的適配性進行評分,方便使用者便捷利用。我們提出的模型Large Multi-modality Language Model for Outfit Recommendation (LMLMO) 在FashionVC和Evaluation3兩個資料集中的表現優於以往提出的模型。此外,實驗結果顯示,不同種類的關鍵字回覆對於模型的影響存在差異,這為未來的研究提供了新的方向和思考。
Abstract (English) Outfit coordination is the most direct way for people to express themselves. However, judging the compatibility between tops and bottoms requires weighing multiple factors such as color and style, a process that is time-consuming and prone to error. In recent years, with the development of large language models and large multi-modal models, many application fields have undergone transformations. This study explores how to leverage large multi-modal models to achieve breakthroughs in fashion outfit recommendation.
This research combines the keyword response text produced by the large language model Gemini in the Vision Question Answering (VQA) task with the deep feature fusion of the large multi-modal model BEiT-3. By providing only images of the clothing, users can evaluate the compatibility of tops and bottoms, making the system convenient to use. Our proposed model, Large Multi-modality Language Model for Outfit Recommendation (LMLMO), outperforms previously proposed models on the FashionVC and Evaluation3 datasets. Moreover, experimental results show that different types of keyword responses have varying impacts on the model, offering new directions and insights for future research.
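The abstract outlines a two-stage pipeline: Gemini answers VQA-style keyword questions about each garment image, BEiT-3 fuses the image with that keyword text into deep features, and a scoring component rates the compatibility of a top/bottom pair. The PyTorch sketch below illustrates that flow under stated assumptions: ask_gemini_keywords and encode_item are hypothetical placeholders for the Gemini VQA step and the BEiT-3 fusion respectively, and the MLP scoring head is a generic stand-in, not the thesis's actual architecture.

# Minimal, hypothetical sketch of the LMLMO-style scoring flow; the two
# encoder functions are placeholders, not the thesis's implementation.
import torch
import torch.nn as nn

EMB_DIM = 768  # assumed fused-feature width

def ask_gemini_keywords(image_path: str) -> str:
    # Stand-in for the VQA step: prompt Gemini with the garment image and a
    # question such as "List this item's color and style keywords."
    return "navy, denim, casual"

def encode_item(image_path: str, keywords: str) -> torch.Tensor:
    # Stand-in for BEiT-3's deep image-text fusion; returns a random vector.
    return torch.randn(EMB_DIM)

class CompatibilityHead(nn.Module):
    # Maps a concatenated (top, bottom) feature pair to a scalar score.
    def __init__(self, dim: int = EMB_DIM):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(2 * dim, dim),
            nn.GELU(),
            nn.Linear(dim, 1),
        )

    def forward(self, top: torch.Tensor, bottom: torch.Tensor) -> torch.Tensor:
        return self.mlp(torch.cat([top, bottom], dim=-1)).squeeze(-1)

# Usage: score one top/bottom pair from images alone, as the abstract describes.
top_feat = encode_item("top.jpg", ask_gemini_keywords("top.jpg"))
bottom_feat = encode_item("bottom.jpg", ask_gemini_keywords("bottom.jpg"))
score = CompatibilityHead()(top_feat.unsqueeze(0), bottom_feat.unsqueeze(0))
print(f"compatibility: {score.item():.3f}")

Concatenating the two fused representations and scoring them with an MLP is only one plausible design; the thesis's own fusion and scoring details appear in sections 3-3 and 3-4 of the table of contents below.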
Keywords (Chinese) ★ 大型多模態模型
★ 大型語言模型
★ 穿搭適配性
★ 穿搭推薦
Keywords (English) ★ Large Multi-modal Models
★ Large Language Models
★ Outfit Compatibility
★ Outfit Recommendation
Table of Contents
摘要 i
ABSTRACT ii
List of Figures iv
List of Tables v
1. Introduction 1
2. Related work 10
2-1 Fashion outfit recommendation system 10
2-2 Large Multi-modality Model 11
2-3 Multi-modal representation learning 12
3. Proposed approach 14
3-1 Model structure 14
3-2 Data preprocessing 16
3-3 Deep multimodality fusion and feature extraction 19
3-4 Outfit compatibility calculation 24
3-5 Loss function 25
4. Experiments and results 26
4-1 Datasets 26
4-2 Baseline model 27
4-3 Evaluation metrics 27
4-4 Experimental setting 29
4-5 Experimental results 31
4-6 Sensitivity analysis 33
4-7 Ablation study 36
5. Conclusion 38
5-1 Limitations and future work 39
References 41
Advisor: Yen-Liang Chen (陳彥良)    Approval date: 2024-07-16