Towards robustness of post hoc Explainable AI methods

Title: Towards robustness of post hoc Explainable AI methods
Summary: [[OneLineSummary::Post hoc explanation methods such as LIME [1] and SHAP [2] are, due to their internal perturbation mechanisms, susceptible to adversarial attacks [3, 4]. For example, a biased model can be maliciously manipulated so that explanation methods are fooled into presenting it as unbiased [5] (a minimal sketch of this idea is given at the end of this page). There are also methods for fooling Partial Dependence Plots (PDP) [6] and gradient-based approaches [7], with attacks tailored to each method's specific weaknesses [8, 9]. Almost every industrial sector is moving towards adopting AI, but a lack of trust remains a barrier that Explainable AI (XAI) can help alleviate; it is therefore of great importance to make XAI methods robust to adversarial attacks. This project aims to explore a chosen post hoc XAI method and equip it with a mechanism that makes it robust to adversarial attacks [10].]]
Keywords: Explainable AI, Robustness, Adversarial attacks
TimeFrame: Fall 2023
References: [[References::
[1] M. T. Ribeiro, S. Singh, and C. Guestrin, “‘Why should I trust you?’: Explaining the predictions of any classifier,” in Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2016, pp. 1135–1144.
[2] S. M. Lundberg and S.-I. Lee, “A unified approach to interpreting model predictions,” Advances in Neural Information Processing Systems, vol. 30, 2017.
[3] U. Aïvodji, S. Hara, M. Marchand, F. Khomh et al., “Fooling SHAP with stealthily biased sampling,” in The Eleventh International Conference on Learning Representations, 2022.
[4] G. Laberge, U. Aïvodji, S. Hara, F. Khomh et al., “Fool SHAP with stealthily biased sampling,” arXiv preprint arXiv:2205.15419, 2022.
[5] D. Slack, S. Hilgard, E. Jia, S. Singh, and H. Lakkaraju, “Fooling LIME and SHAP: Adversarial attacks on post hoc explanation methods,” in Proceedings of the AAAI/ACM Conference on AI, Ethics, and Society, 2020, pp. 180–186.
[6] J. H. Friedman, “Greedy function approximation: A gradient boosting machine,” Annals of Statistics, pp. 1189–1232, 2001.
[7] K. Simonyan, A. Vedaldi, and A. Zisserman, “Deep inside convolutional networks: Visualising image classification models and saliency maps,” arXiv preprint arXiv:1312.6034, 2013.
[8] H. Baniecki, W. Kretowicz, and P. Biecek, “Fooling partial dependence via data poisoning,” in Joint European Conference on Machine Learning and Knowledge Discovery in Databases, Springer, 2022, pp. 121–136.
[9] B. Dimanov, U. Bhatt, M. Jamnik, and A. Weller, “You shouldn’t trust me: Learning models which conceal unfairness from multiple explanation methods,” in ECAI 2020, IOS Press, 2020, pp. 2473–2480.
[10] S. Saito, E. Chua, N. Capel, and R. Hu, “Improving LIME robustness with smarter locality sampling,” arXiv preprint arXiv:2006.12302, 2020.]]

Prerequisites:
Author:
Supervisor: Parisa Jamshidi, Peyman Mashhadi, Jens Lundström
Level: Master
Status: Open
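
Illustrative sketch: to make concrete how the attack described in [5] hides bias from perturbation-based explainers, a minimal toy version of its adversarial wrapper ("scaffolding") is sketched below in Python with scikit-learn. All names (biased_model, innocuous_model, ood_detector, adversarial_model) and the toy data are assumptions for illustration only, and the explainer's perturbation distribution is crudely approximated with Gaussian noise rather than LIME's or SHAP's actual sampling.

<syntaxhighlight lang="python">
# Toy sketch of the adversarial wrapper ("scaffolding") idea from Slack et al. [5].
# All names and data are illustrative; this is not the authors' implementation.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)

# Toy data: column 0 plays the role of a "sensitive" feature, column 1 an innocuous one.
X = rng.normal(size=(1000, 2))

def biased_model(X):
    # Decisions depend only on the sensitive feature.
    return (X[:, 0] > 0).astype(int)

def innocuous_model(X):
    # Decisions depend only on the harmless feature.
    return (X[:, 1] > 0).astype(int)

# LIME/SHAP query the model on synthetic perturbations that look different from
# real data, so train a detector to separate real points from such perturbations
# (here simulated by adding Gaussian noise to the training data).
perturbed = X + rng.normal(scale=1.0, size=X.shape)
ood_X = np.vstack([X, perturbed])
ood_y = np.hstack([np.zeros(len(X)), np.ones(len(perturbed))])
ood_detector = RandomForestClassifier(n_estimators=50, random_state=0).fit(ood_X, ood_y)

def adversarial_model(X):
    """Route real-looking inputs to the biased model and suspected explainer
    perturbations to the innocuous one, hiding the bias from the explainer."""
    is_perturbation = ood_detector.predict(X).astype(bool)
    out = np.empty(len(X), dtype=int)
    out[~is_perturbation] = biased_model(X[~is_perturbation])
    out[is_perturbation] = innocuous_model(X[is_perturbation])
    return out
</syntaxhighlight>

Explaining adversarial_model with LIME or SHAP would then largely attribute its predictions to the harmless feature, even though real inputs are scored by the biased model; a post hoc method made robust in the sense targeted by this project should not be misled by such routing.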