Towards robustness of post hoc Explainable AI methods
| Title | Towards robustness of post hoc Explainable AI methods |
|---|---|
| Summary | [[OneLineSummary::Post hoc explanation methods such as LIME [1] and SHAP [2] are, due to their internal perturbation mechanisms, susceptible to adversarial attacks [3, 4]. This means, for example, that a biased model can be maliciously altered to fool explanation methods so that it appears unbiased [5] (an illustrative sketch of such an attack is given below the table). Similar attacks exist for Partial Dependence Plots (PDP) [6] and gradient-based approaches [7], each tailored to the specific weaknesses of the method it targets [8, 9]. Almost every industrial sector is moving towards adopting AI, but a lack of trust remains a barrier that Explainable AI can help alleviate; it is therefore of great importance to make XAI methods robust to adversarial attacks. This project aims at exploring a chosen post hoc XAI method and equipping it with a mechanism that makes it robust to adversarial attacks [10].]] |
| Keywords | Explainable AI, Robustness, Adversarial attacks |
| TimeFrame | Fall 2023 |
| References | [[References::[1] Ribeiro, Marco Tulio, Sameer Singh, and Carlos Guestrin. ""Why should I trust you?" Explaining the predictions of any classifier." Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. 2016.
[2] Lundberg, Scott M., and Su-In Lee. "A unified approach to interpreting model predictions." arXiv preprint arXiv:1705.07874 (2017).
[3] Aïvodji, Ulrich, et al. "Fooling SHAP with Stealthily Biased Sampling." The Eleventh International Conference on Learning Representations. 2023.
[4] Laberge, Gabriel, et al. "Fool SHAP with Stealthily Biased Sampling." arXiv preprint arXiv:2205.15419 (2022).
[5] Slack, Dylan, et al. "Fooling LIME and SHAP: Adversarial attacks on post hoc explanation methods." Proceedings of the AAAI/ACM Conference on AI, Ethics, and Society. 2020.
[6] Friedman, Jerome H. "Greedy function approximation: A gradient boosting machine." Annals of Statistics (2001): 1189-1232.
[7] Simonyan, Karen, Andrea Vedaldi, and Andrew Zisserman. "Deep inside convolutional networks: Visualising image classification models and saliency maps." arXiv preprint arXiv:1312.6034 (2013).
[8] Baniecki, Hubert, Wojciech Kretowicz, and Przemyslaw Biecek. "Fooling partial dependence via data poisoning." Joint European Conference on Machine Learning and Knowledge Discovery in Databases. Cham: Springer Nature Switzerland, 2022.
[9] Dimanov, Botty, et al. "You shouldn't trust me: Learning models which conceal unfairness from multiple explanation methods." ECAI 2020. IOS Press, 2020. 2473-2480.
[10] Saito, Sean, et al. "Improving LIME robustness with smarter locality sampling." arXiv preprint arXiv:2006.12302 (2020).]] |
| Prerequisites | |
| Author | |
| Supervisor | Parisa Jamshidi, Peyman Mashhadi, Jens Lundström |
| Level | Master |
| Status | Open |
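
The attack referenced in the summary (Slack et al. [5]) can be illustrated with a short sketch: an adversary wraps a biased model in a "scaffold" that detects whether a query looks like real data or like an explainer's perturbation, and only answers innocuously in the latter case. The code below is a minimal, assumption-laden illustration of that idea, not the authors' implementation; the function names (`scaffold_predict`, `biased_model`, `innocuous_model`), the toy data, and the random-forest off-distribution detector are all hypothetical choices.

```python
# Minimal sketch of a scaffolding-style attack on perturbation-based explainers,
# in the spirit of Slack et al. [5]. All names and data here are illustrative.

import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)

# Toy tabular data: column 0 plays the role of a sensitive attribute.
X_real = rng.normal(size=(500, 5))

# LIME/SHAP-style perturbations tend to fall off the data manifold,
# so a simple classifier can learn to separate them from real rows.
X_pert = X_real + rng.normal(scale=1.0, size=X_real.shape)
ood_detector = RandomForestClassifier(n_estimators=100, random_state=0)
ood_detector.fit(
    np.vstack([X_real, X_pert]),
    np.hstack([np.zeros(len(X_real)), np.ones(len(X_pert))]),
)

def biased_model(X):
    # Decision depends only on the sensitive feature.
    return (X[:, 0] > 0).astype(int)

def innocuous_model(X):
    # Decision depends on a harmless feature instead.
    return (X[:, 1] > 0).astype(int)

def scaffold_predict(X):
    """Behave biased on real-looking inputs, innocuous on explainer perturbations."""
    on_manifold = ood_detector.predict(X) == 0
    return np.where(on_manifold, biased_model(X), innocuous_model(X))
```

Because LIME and SHAP construct their explanations from exactly those perturbed queries, they mostly observe the innocuous branch and assign little importance to the sensitive feature, even though it drives the predictions on real data; this is the kind of vulnerability that robust sampling schemes such as [10] aim to close.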