Despite the powerful generative capabilities of diffusion models, how they use training data during generation remains largely opaque. To address this, we propose a new explainability framework that analyzes the latent trajectories of input samples along the model’s internal reverse diffusion process to identify how well each sample aligns semantically with the distribution the model has learned. We define a set of anchor samples and extract representations at different timesteps from an intermediate layer of the U-Net to construct trajectories in latent space. By comparing these trajectories with those of the model’s generated outputs using intrinsic and extrinsic distance measures, we infer how familiar the model is with each anchor. The method yields interpretability measures from both an internal and an external perspective and supports post hoc attribution of training data without access to the original training set or additional supervision. We validate the effectiveness of the method on models trained on LSUN-bedroom and ImageNet, and further extend the analysis to Consistency Models, demonstrating the scalability and generality of our explanation framework. This work reveals intrinsic properties of the semantic structure learned by diffusion models and offers a new direction for the data attribution problem in generative AI.

Despite their remarkable generative capabilities, diffusion models remain largely opaque with respect to how they utilize training data during generation. In this work, we propose a novel explainability framework that analyzes the internal reverse diffusion trajectories of input samples to identify their semantic alignment with the model’s learned distribution. We define a set of anchor samples and extract intermediate representations from the bottleneck of the U-Net across all timesteps to form latent trajectories. By comparing these trajectories to those of generated outputs using both intrinsic (latent-space) and extrinsic (output-space) distance metrics, we infer the degree of model familiarity for each anchor. Our method provides a dual-perspective interpretability measure and enables post hoc identification of training-data influence without requiring access to the original training set or additional supervision. We demonstrate the effectiveness of our approach on models trained on CIFAR-10 and ImageNet, and further extend our analysis to one-step Consistency Models, highlighting the scalability and generality of our interpretability framework. Our findings offer insights into the semantic structure learned by diffusion models and open new directions for data attribution in generative AI.
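As an illustration of the trajectory-based comparison described above, the sketch below collects bottleneck features along the reverse process and measures an intrinsic (latent-space) and an extrinsic (output-space) distance. It is a minimal sketch, assuming a diffusers-style scheduler interface and a hypothetical `capture_bottleneck` helper (e.g., a forward hook on the U-Net mid-block); it is not the exact implementation used in this work.

```python
# Minimal sketch of the trajectory-comparison idea (illustrative, not the thesis implementation).
# Assumptions: a diffusers-style scheduler exposing `timesteps` and `step(...).prev_sample`,
# and a hypothetical `capture_bottleneck(unet, x_t, t)` helper that returns the noise
# prediction together with the U-Net bottleneck activation at timestep t.
import torch


def latent_trajectory(unet, scheduler, x_T, capture_bottleneck):
    """Run the reverse process from noise x_T; record one bottleneck feature per timestep."""
    x_t = x_T
    feats = []
    for t in scheduler.timesteps:  # e.g. T-1, ..., 0
        eps_pred, bottleneck = capture_bottleneck(unet, x_t, t)
        feats.append(bottleneck.flatten(start_dim=1))        # (B, D) feature per timestep
        x_t = scheduler.step(eps_pred, t, x_t).prev_sample   # one reverse-diffusion step
    return torch.stack(feats), x_t                           # (T, B, D) trajectory, final sample


def intrinsic_distance(traj_a, traj_b):
    """Latent-space view: mean per-timestep L2 distance between two trajectories."""
    return (traj_a - traj_b).norm(dim=-1).mean()


def extrinsic_distance(x0_a, x0_b):
    """Output-space view: L2 distance between the final denoised samples."""
    return (x0_a - x0_b).flatten(start_dim=1).norm(dim=-1).mean()
```

Under this sketch, an anchor whose trajectory stays close to those of generated samples under both distances would be read as more familiar to the model, mirroring the dual-perspective measure described in the abstract.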