Towards robustness of post hoc Explainable AI methods
| Title | Towards robustness of post hoc Explainable AI methods |
|---|---|
| Summary | [[OneLineSummary::Post hoc explanation methods such as LIME [1] and SHAP [2] are, due to their internal perturbation mechanisms, susceptible to adversarial attacks [3, 4]. This means, for example, that a biased model can be maliciously modified so that it fools these explanation methods and appears unbiased [5]. Furthermore, attacks tailored to the specific weaknesses of Partial Dependence Plots (PDP) [6] and gradient-based approaches [7] have also been proposed [8, 9]. Almost every industrial sector is moving towards adopting AI, but a lack of trust remains a barrier. Explainable AI can help overcome this barrier, which makes it all the more important that XAI methods themselves are robust to adversarial attacks. This project aims to explore a chosen post hoc XAI method and equip it with a mechanism that makes it robust to adversarial attacks [10].]] (An illustrative sketch of the kind of attack described in [5] is shown below the table.) |
| Keywords | Explainable AI, Robustness, Adversarial attacks |
| TimeFrame | Fall 2023 |
| References | [[References::1] M. T. Ribeiro, S. Singh, and C. Guestrin, “‘Why should I trust you?’: Explaining the predictions of any classifier,” in Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2016, pp. 1135–1144. [2] S. M. Lundberg and S.-I. Lee, “A unified approach to interpreting model predictions,” Advances in Neural Information Processing Systems, vol. 30, 2017. [3] U. Aïvodji, S. Hara, M. Marchand, F. Khomh et al., “Fooling SHAP with stealthily biased sampling,” in The Eleventh International Conference on Learning Representations, 2023. [4] G. Laberge, U. Aïvodji, S. Hara, F. Khomh et al., “Fool SHAP with stealthily biased sampling,” arXiv preprint arXiv:2205.15419, 2022. [5] D. Slack, S. Hilgard, E. Jia, S. Singh, and H. Lakkaraju, “Fooling LIME and SHAP: Adversarial attacks on post hoc explanation methods,” in Proceedings of the AAAI/ACM Conference on AI, Ethics, and Society, 2020, pp. 180–186. [6] J. H. Friedman, “Greedy function approximation: A gradient boosting machine,” Annals of Statistics, pp. 1189–1232, 2001. [7] K. Simonyan, A. Vedaldi, and A. Zisserman, “Deep inside convolutional networks: Visualising image classification models and saliency maps,” arXiv preprint arXiv:1312.6034, 2013. [8] H. Baniecki, W. Kretowicz, and P. Biecek, “Fooling partial dependence via data poisoning,” in Joint European Conference on Machine Learning and Knowledge Discovery in Databases. Springer, 2022, pp. 121–136. [9] B. Dimanov, U. Bhatt, M. Jamnik, and A. Weller, “You shouldn’t trust me: Learning models which conceal unfairness from multiple explanation methods,” in ECAI 2020. IOS Press, 2020, pp. 2473–2480. [10] S. Saito, E. Chua, N. Capel, and R. Hu, “Improving LIME robustness with smarter locality sampling,” arXiv preprint arXiv:2006.12302, 2020.]] |
| Prerequisites | |
| Author | |
| Supervisor | Parisa Jamshidi, Peyman Mashhadi, Jens Lundström |
| Level | Master |
| Status | Open |
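
For context, the sketch below illustrates the kind of attack described in [5]: an adversarial wrapper that behaves in a biased way on real, in-distribution inputs but switches to an innocuous model on the off-manifold perturbations that LIME and SHAP generate, so that perturbation-based explanations report the innocuous behaviour. This is a minimal sketch on synthetic data, not code from the cited paper; the names used here (`biased_rule`, `innocuous_rule`, `AdversarialModel`, and so on) are illustrative assumptions.

```python
# Minimal sketch of a scaffolding-style attack in the spirit of [5].
# All names and the perturbation model are illustrative assumptions.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)

# Synthetic tabular data; column 0 plays the role of a sensitive feature.
X_real = rng.normal(size=(1000, 5))

def biased_rule(X):
    """Deliberately biased: decision depends only on the sensitive feature."""
    return (X[:, 0] > 0).astype(int)

def innocuous_rule(X):
    """Innocuous stand-in: decision depends only on a non-sensitive feature."""
    return (X[:, 1] > 0).astype(int)

# LIME/SHAP-style perturbations tend to fall off the data manifold;
# emulate that here by adding independent feature-wise noise.
X_perturbed = X_real + rng.normal(scale=2.0, size=X_real.shape)

# Train an in-distribution vs. perturbed detector (the "scaffold").
X_det = np.vstack([X_real, X_perturbed])
y_det = np.concatenate([np.ones(len(X_real)), np.zeros(len(X_perturbed))])
detector = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_det, y_det)

class AdversarialModel:
    """Biased on real inputs, innocuous on suspected explainer perturbations."""

    def predict(self, X):
        X = np.asarray(X)
        is_real = detector.predict(X).astype(bool)
        out = innocuous_rule(X)
        out[is_real] = biased_rule(X)[is_real]
        return out

    def predict_proba(self, X):
        # Probability-style interface of the kind perturbation-based explainers query.
        y = self.predict(X)
        return np.column_stack([1 - y, y])

adv = AdversarialModel()
# On real data the wrapper essentially reproduces the biased model ...
print("agreement with biased rule on real data:",
      (adv.predict(X_real) == biased_rule(X_real)).mean())
# ... while on perturbation-style queries it mimics the innocuous model.
print("agreement with innocuous rule on perturbed data:",
      (adv.predict(X_perturbed) == innocuous_rule(X_perturbed)).mean())
```

Under this setup, a perturbation-based explainer queried through `adv.predict_proba` would tend to attribute the predictions to the non-sensitive feature, which is exactly the behaviour a robust post hoc XAI method should be able to detect or resist.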