Towards robustness of post hoc Explainable AI methods
| Title | Towards robustness of post hoc Explainable AI methods |
|---|---|
| Summary | [[OneLineSummary::Post hoc explanation methods such as LIME [1] and SHAP [2] are, due to their internal perturbation mechanisms, susceptible to adversarial attacks [3, 4]. This means, for example, that a biased model can be maliciously modified so that it fools these explanation methods and appears unbiased [5]. Furthermore, attacks tailored to the specific weaknesses of Partial Dependence Plots (PDP) [6] and gradient-based approaches [7] have also been proposed [8, 9]. Almost every industrial sector is moving towards adopting AI, but a lack of trust remains a barrier. Explainable AI can help overcome this barrier, which makes it all the more important that XAI methods themselves are robust to adversarial attacks. This project aims to explore a chosen post hoc XAI method and equip it with a mechanism that makes it robust to adversarial attacks [10].]] (An illustrative sketch of the kind of attack described in [5] is shown below the table.) |
| Keywords | Explainable AI, Robustness, Adversarial attacks |
| TimeFrame | Fall 2023 |
| References | [[References::1] M. T. Ribeiro, S. Singh, and C. Guestrin, “‘Why should I trust you?’: Explaining the predictions of any classifier,” in Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2016, pp. 1135–1144. [2] S. M. Lundberg and S.-I. Lee, “A unified approach to interpreting model predictions,” Advances in Neural Information Processing Systems, vol. 30, 2017. [3] U. Aïvodji, S. Hara, M. Marchand, F. Khomh et al., “Fooling SHAP with stealthily biased sampling,” in The Eleventh International Conference on Learning Representations, 2023. [4] G. Laberge, U. Aïvodji, S. Hara, F. Khomh et al., “Fool SHAP with stealthily biased sampling,” arXiv preprint arXiv:2205.15419, 2022. [5] D. Slack, S. Hilgard, E. Jia, S. Singh, and H. Lakkaraju, “Fooling LIME and SHAP: Adversarial attacks on post hoc explanation methods,” in Proceedings of the AAAI/ACM Conference on AI, Ethics, and Society, 2020, pp. 180–186. [6] J. H. Friedman, “Greedy function approximation: A gradient boosting machine,” Annals of Statistics, pp. 1189–1232, 2001. [7] K. Simonyan, A. Vedaldi, and A. Zisserman, “Deep inside convolutional networks: Visualising image classification models and saliency maps,” arXiv preprint arXiv:1312.6034, 2013. [8] H. Baniecki, W. Kretowicz, and P. Biecek, “Fooling partial dependence via data poisoning,” in Joint European Conference on Machine Learning and Knowledge Discovery in Databases. Springer, 2022, pp. 121–136. [9] B. Dimanov, U. Bhatt, M. Jamnik, and A. Weller, “You shouldn’t trust me: Learning models which conceal unfairness from multiple explanation methods,” in ECAI 2020. IOS Press, 2020, pp. 2473–2480. [10] S. Saito, E. Chua, N. Capel, and R. Hu, “Improving LIME robustness with smarter locality sampling,” arXiv preprint arXiv:2006.12302, 2020.]] |
| Prerequisites | |
| Author | |
| Supervisor | Parisa Jamshidi, Peyman Mashhadi, Jens Lundström |
| Level | Master |
| Status | Open |
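
For context, the sketch below illustrates the kind of attack described in [5]: an adversarial wrapper that behaves in a biased way on real, in-distribution inputs but switches to an innocuous model on the off-manifold perturbations that LIME and SHAP generate, so that perturbation-based explanations report the innocuous behaviour. This is a minimal sketch on synthetic data, not code from the cited paper; the names used here (`biased_rule`, `innocuous_rule`, `AdversarialModel`, and so on) are illustrative assumptions.

```python
# Minimal sketch of a scaffolding-style attack in the spirit of [5].
# All names and the perturbation model are illustrative assumptions.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)

# Synthetic tabular data; column 0 plays the role of a sensitive feature.
X_real = rng.normal(size=(1000, 5))

def biased_rule(X):
    """Deliberately biased: decision depends only on the sensitive feature."""
    return (X[:, 0] > 0).astype(int)

def innocuous_rule(X):
    """Innocuous stand-in: decision depends only on a non-sensitive feature."""
    return (X[:, 1] > 0).astype(int)

# LIME/SHAP-style perturbations tend to fall off the data manifold;
# emulate that here by adding independent feature-wise noise.
X_perturbed = X_real + rng.normal(scale=2.0, size=X_real.shape)

# Train an in-distribution vs. perturbed detector (the "scaffold").
X_det = np.vstack([X_real, X_perturbed])
y_det = np.concatenate([np.ones(len(X_real)), np.zeros(len(X_perturbed))])
detector = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_det, y_det)

class AdversarialModel:
    """Biased on real inputs, innocuous on suspected explainer perturbations."""

    def predict(self, X):
        X = np.asarray(X)
        is_real = detector.predict(X).astype(bool)
        out = innocuous_rule(X)
        out[is_real] = biased_rule(X)[is_real]
        return out

    def predict_proba(self, X):
        # Probability-style interface of the kind perturbation-based explainers query.
        y = self.predict(X)
        return np.column_stack([1 - y, y])

adv = AdversarialModel()
# On real data the wrapper essentially reproduces the biased model ...
print("agreement with biased rule on real data:",
      (adv.predict(X_real) == biased_rule(X_real)).mean())
# ... while on perturbation-style queries it mimics the innocuous model.
print("agreement with innocuous rule on perturbed data:",
      (adv.predict(X_perturbed) == innocuous_rule(X_perturbed)).mean())
```

Under this setup, a perturbation-based explainer queried through `adv.predict_proba` would tend to attribute the predictions to the non-sensitive feature, which is exactly the behaviour a robust post hoc XAI method should be able to detect or resist.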