PXGen: A Post-hoc Explainable Method for Generative Models

Name

Yen-Lung Huang (黃彥龍)

Department

Department of Computer Science and Information Engineering, In-service Master Program

Thesis title

PXGen: A Post-hoc Explainable Method for Generative Models

Files

Full text available for browsing in the system after 2027-7-31.

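Abstract (translated from Chinese)

With the rapid growth of generative AI across numerous applications, explainable AI (XAI)
plays a crucial role in the development and deployment of generative AI technologies,
enabling users to understand, trust, and effectively use these powerful tools while
minimizing potential risks and biases. In recent years, XAI has made notable progress and
seen widespread application, reflecting a collective effort to improve the transparency,
interpretability, and credibility of AI systems. Recent research emphasizes that a mature
XAI method should adhere to a set of criteria focused mainly on two key areas. First, it
should ensure the quality and fluency of explanations, covering aspects such as
faithfulness, plausibility, completeness, and tailoring to individual needs. Second, the
design principles of the XAI system or mechanism should cover factors such as reliability,
resilience, the verifiability of its outputs, and the transparency of its algorithm.
However, XAI research for generative models is relatively scarce, and there has been little
exploration of how such methods can effectively meet these criteria in that domain.
In this thesis, we propose PXGen, a post-hoc explainable method for generative models.
Given a model to be explained, PXGen prepares two items for the explanation: an "anchor
set" and "intrinsic and extrinsic criteria". These items can be customized according to the
user's purpose and needs. Computing each criterion gives each anchor a set of feature
values, and PXGen provides example-based explanations from the feature values of all
anchors, presenting and visualizing them to the user through tractable algorithms such as
k-dispersion or k-center. Under this framework, PXGen addresses the requirements above and
provides additional benefits such as low execution time and no need to intervene in model
training. Our evaluation shows that, compared with state-of-the-art methods, PXGen finds
representative training samples well.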

Abstract (English)

With the rapid growth of generative AI in numerous applications, explainable AI (XAI)
plays a crucial role in ensuring the responsible development and deployment of generative
AI technologies, empowering users to understand, trust, and effectively utilize these
powerful tools while minimizing potential risks and biases. XAI has
undergone notable advancements and widespread adoption in recent years, reflecting a
concerted push to enhance the transparency, interpretability, and credibility of AI systems.
Recent research emphasizes that a proficient XAI method should adhere to a set
of criteria, primarily focusing on two key areas. Firstly, it should ensure the quality and
fluidity of explanations, encompassing aspects like faithfulness, plausibility, completeness,
and tailoring to individual needs. Secondly, the design principles of the XAI system or
mechanism should cover factors such as reliability, resilience, the verifiability
of its outputs, and the transparency of its algorithm. However, research in XAI for
generative models remains relatively scarce, with little exploration into how such methods
can effectively meet these criteria in that domain.
In this work, we propose PXGen, a post-hoc explainable method for generative models.
Given a model that needs to be explained, PXGen prepares two materials for the
explanation: the anchor set and the intrinsic and extrinsic criteria. These materials are
customizable by users according to their purpose and requirements. Computing each
criterion gives each anchor a set of feature values. PXGen then provides example-based
explanations from the feature values of all the anchors, which are illustrated and
visualized for users via tractable algorithms such as k-dispersion or k-center. Under this
framework, PXGen addresses the abovementioned desiderata and provides additional
benefits such as low execution time and no additional access requirement. Our evaluation
shows that PXGen finds representative training samples well compared with the
state-of-the-art.
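
The abstract names k-center as one tractable way to pick which anchors to show. As a rough illustration only, and not the thesis's implementation, the sketch below uses farthest-first traversal, a standard greedy 2-approximation for metric k-center. The feature values, the function name `farthest_first`, and the choice of Euclidean distance are all assumptions made for the example.

```python
import math
from typing import List, Sequence


def farthest_first(features: Sequence[Sequence[float]], k: int, start: int = 0) -> List[int]:
    """Pick k anchor indices by farthest-first traversal (greedy k-center)."""
    n = len(features)
    if not 0 < k <= n:
        raise ValueError("k must be between 1 and the number of anchors")
    chosen = [start]
    # nearest[i] is the distance from anchor i to its closest chosen anchor.
    nearest = [math.dist(features[i], features[start]) for i in range(n)]
    while len(chosen) < k:
        # The next representative is the anchor farthest from all chosen ones.
        nxt = max(range(n), key=lambda i: nearest[i])
        chosen.append(nxt)
        for i in range(n):
            nearest[i] = min(nearest[i], math.dist(features[i], features[nxt]))
    return chosen


# Hypothetical feature values: one row per anchor, one column per criterion.
anchor_features = [
    [0.0, 0.0],
    [0.1, 0.0],
    [5.0, 5.0],
    [5.1, 5.0],
    [0.0, 9.0],
]
representatives = farthest_first(anchor_features, k=3)
print(representatives)
```

With these made-up values the selection takes one anchor from each of the three visibly separated groups, which is the kind of spread a k-center style selection is meant to give. A different distance or starting anchor would change which indices are returned.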

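Keywords (translated from Chinese)

★ Explainable artificial intelligence
★ Generative artificial intelligence
★ Variational autoencoder
★ Post-hoc explanation

Keywords (English)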

★ XAI
★ generative AI
★ VAE
★ post-hoc explanation

Table of Contents

1 Introduction
2 Related Work
3 Framework
3.1 Preparation Phase
3.2 Analysis Phase
3.3 Discovery Phase
4 Demonstration: Variational Autoencoder
4.1 Primer of VAE
4.2 The Model, Anchor Set, and Criteria
4.3 Analysis Phase: Classifying Anchors and Analysis
4.4 Discovery Phase: Visualizing the Most Characteristic Anchors
5 Demonstration: Soft-IntroVAE
5.1 The Model, Anchor Set, and Criteria
5.2 Analysis Phase: Classifying Anchors and Analysis
5.3 Discovery Phase: Visualizing the Most Characteristic Anchors
6 Evaluation
6.1 Finding Representative Training Samples
6.2 Investigating Authorship
7 Conclusion
Bibliography

Advisor

Hao-Tsung Yang (楊晧琮)

Approval date

2024-7-31