IDEA: IDentity-aware and Efficient Alignment for Text-to-Image Person Retrieval

Please use this permanent URL to cite or link to this document: https://ir.lib.ncu.edu.tw/handle/987654321/98232

Title:	IDEA: IDentity-aware and Efficient Alignment for Text-to-Image Person Retrieval (original title: 身分感知與高效對齊之文字到圖像人物檢索)
Author:	Hu, Yi-Nuo (胡以諾)
Contributor:	Department of Computer Science and Information Engineering
Keywords:	Text-to-Image Person Retrieval; Contrastive Learning; Vision-Language Pre-training
Date:	2025-07-14
Upload time:	2025-10-17 12:31:27 (UTC+8)
Publisher:	National Central University
Abstract:	With the development of cross-modal retrieval, Text-to-Image Person Retrieval (TIPR) has become a key task in public safety and multimedia search. The task aims to find the person images in a large image database that match a natural-language description. Existing methods, however, still struggle with fine-grained semantic alignment, mislabeled weak positive samples, intra-modal learning that concentrates too heavily on the self-sample, and training efficiency, all of which limit retrieval performance and generalization. To address these issues, this thesis proposes IDEA (IDentity-aware and Efficient Alignment), a retrieval framework that improves both semantic alignment and training efficiency. First, it introduces Identity-Aware Image-Text Contrastive learning (IDA-ITC), which combines contrastive objectives in three directions (Text-to-Image, Image-to-Text, and Image-to-Image) with an identity-aware sampling strategy. With the self-sample masked, this strategy guarantees that every mini-batch contains positive samples other than the anchor itself, which mitigates the tendency of same-modality features to concentrate on the self-sample. Second, to avoid mislabeling multiple correct descriptions of the same image as weak positive samples, the thesis proposes Deduplicated Relation-Aware (DRA), which improves image-text matching accuracy and strengthens the model's ability to distinguish fine-grained semantic differences. Finally, mixed-precision training is adopted to improve training efficiency; experiments show that it significantly reduces computational resource consumption while maintaining retrieval performance. Experimental results show that the proposed method outperforms existing state-of-the-art methods on the RSTPReid dataset, improving R@1 by 2.67%, and achieves competitive results on CUHK-PEDES and ICFG-PEDES.
Appears in collections:	[Graduate Institute of Computer Science and Information Engineering] Theses and Dissertations
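
The abstract describes identity-aware sampling combined with self-sample masking but gives no implementation details. The following is a minimal sketch of how those two ideas could fit together for the Image-to-Image direction only. It is not the thesis's code: the function names `identity_aware_batches` and `masked_image_to_image_loss`, the PK-style rule of drawing a fixed number of samples per identity, the supervised-contrastive form of the loss, and the temperature value of 0.07 are all assumptions made for illustration.

```python
import math
import random
from collections import defaultdict


def identity_aware_batches(person_ids, ids_per_batch=4, samples_per_id=2, seed=0):
    """Yield lists of dataset indices in which every chosen identity appears
    `samples_per_id` times (assumed PK-style rule, not the thesis's exact one)."""
    rng = random.Random(seed)
    by_id = defaultdict(list)
    for index, pid in enumerate(person_ids):
        by_id[pid].append(index)
    # An identity with too few samples cannot supply a non-self positive.
    usable = [pid for pid, idxs in by_id.items() if len(idxs) >= samples_per_id]
    rng.shuffle(usable)
    for start in range(0, len(usable) - ids_per_batch + 1, ids_per_batch):
        batch = []
        for pid in usable[start:start + ids_per_batch]:
            batch.extend(rng.sample(by_id[pid], samples_per_id))
        yield batch


def masked_image_to_image_loss(features, person_ids, temperature=0.07):
    """Contrastive loss over L2-normalised image features in which each
    anchor's similarity to itself is masked out (assumed loss form)."""
    n = len(features)
    total, counted = 0.0, 0
    for i in range(n):
        # Scaled dot products to every other sample; j == i is skipped (masked).
        logits = {
            j: sum(a * b for a, b in zip(features[i], features[j])) / temperature
            for j in range(n) if j != i
        }
        positives = [j for j in logits if person_ids[j] == person_ids[i]]
        if not positives:
            continue  # no non-self positive, so this anchor contributes nothing
        # Numerically stable log-sum-exp over the non-self logits.
        max_logit = max(logits.values())
        log_denominator = max_logit + math.log(
            sum(math.exp(value - max_logit) for value in logits.values())
        )
        total += -sum(logits[j] - log_denominator for j in positives) / len(positives)
        counted += 1
    return total / counted if counted else 0.0
```

With `samples_per_id` set to at least 2, every anchor in a batch has at least one positive that is not itself, so masking the self-sample never leaves an anchor without a positive. The Text-to-Image and Image-to-Text directions, DRA, and mixed-precision training mentioned in the abstract are outside the scope of this sketch.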

