Machine Unlearning Faithfulness with XAI

Title Machine Unlearning Faithfulness with XAI
Summary Evaluating Machine Unlearning Faithfulness in Deep Neural Networks using Explainable AI
Keywords ML, XAI, forget!
TimeFrame Spring 2026
References Beierle, Christoph, and Ingo J. Timm. "Intentional forgetting: An emerging field in AI and beyond." KI-Künstliche Intelligenz 33.1 (2019): 5-8.

Bourtoule, Lucas, et al. "Machine unlearning." 2021 IEEE Symposium on Security and Privacy (SP). IEEE, 2021.

Cadet, Xavier F., et al. "Deep Unlearn: Benchmarking Machine Unlearning for Image Classification." 2025 IEEE 10th European Symposium on Security and Privacy (EuroS&P). IEEE, 2025.

Golatkar, Aditya, Alessandro Achille, and Stefano Soatto. "Eternal Sunshine of the Spotless Net: Selective Forgetting in Deep Networks." Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2020.

Vidal, Àlex Pujol, et al. "Verifying Machine Unlearning with Explainable AI." International Conference on Pattern Recognition. Cham: Springer Nature Switzerland, 2024.

Prerequisites
Author
Supervisor Grzegorz J. Nalepa and Peyman Mashhadi
Level Master
Status Open


As deep learning models and their training datasets grow larger, the concept of machine unlearning (MU) [Beierle et al.] is becoming more relevant. Machine unlearning refers to removing the influence of specific data samples that should no longer be part of the training data, and consequently of the trained model, for reasons such as privacy, stale knowledge, copyrighted material, or toxic/unsafe content.

While the ultimate goal of unlearning is for the unlearned model to be indistinguishable from a model retrained from scratch on the retained data (the target model), evaluating this indistinguishability rigorously remains difficult [Golatkar et al.]. Current evaluation methods primarily focus on utility (accuracy retention) and privacy against Membership Inference Attacks (MIA) [Cadet et al.].
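As a concrete illustration of this standard evaluation protocol, the sketch below (in PyTorch, assuming data loaders for the retain, forget, and test splits are already available) compares an unlearned model against the retrain-from-scratch target using accuracy on each split plus a simple confidence-gap proxy for a membership inference attack. The function names and the gap measure are illustrative assumptions for this sketch, not a method prescribed by the cited papers.

import torch
import torch.nn.functional as F

@torch.no_grad()
def accuracy(model, loader, device="cpu"):
    # Top-1 accuracy of the model over a data loader.
    model.eval()
    correct, total = 0, 0
    for x, y in loader:
        x, y = x.to(device), y.to(device)
        correct += (model(x).argmax(dim=1) == y).sum().item()
        total += y.numel()
    return correct / total

@torch.no_grad()
def confidence_gap(model, forget_loader, test_loader, device="cpu"):
    # Mean true-class confidence on the forget split minus the same quantity on
    # the held-out test split. A gap near zero suggests the forgotten samples are
    # no longer distinguishable from unseen data (a crude MIA-style signal).
    def mean_conf(loader):
        confs = []
        for x, y in loader:
            x, y = x.to(device), y.to(device)
            probs = F.softmax(model(x), dim=1)
            confs.append(probs.gather(1, y.unsqueeze(1)).mean().item())
        return sum(confs) / len(confs)
    model.eval()
    return mean_conf(forget_loader) - mean_conf(test_loader)

def evaluate_unlearning(unlearned, retrained, loaders, device="cpu"):
    # Report utility and the MIA-style gap for both models; the unlearned model
    # is considered successful when its numbers track the retrained target.
    report = {}
    for name, model in [("unlearned", unlearned), ("retrained", retrained)]:
        report[name] = {
            "retain_acc": accuracy(model, loaders["retain"], device),
            "forget_acc": accuracy(model, loaders["forget"], device),
            "test_acc": accuracy(model, loaders["test"], device),
            "mia_gap": confidence_gap(model, loaders["forget"], loaders["test"], device),
        }
    return report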

This thesis proposes to go a step beyond traditional accuracy and privacy metrics by leveraging Explainable AI (XAI) tools to quantify the structural and functional similarity (or "faithfulness") between the unlearned model and the target model [Beierle et al., Vidal et al.].
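One way such a faithfulness measure could be instantiated is sketched below: plain input-gradient saliency maps are computed for both the unlearned model and the target model and compared via cosine similarity. This is only a minimal illustration, assuming PyTorch classifiers; the choice of attribution method and the function names (saliency, attribution_similarity) are assumptions of this sketch rather than the approach of Vidal et al.

import torch
import torch.nn.functional as F

def saliency(model, x):
    # Gradient of the predicted-class logit w.r.t. the input, one map per sample.
    model.eval()
    x = x.clone().requires_grad_(True)
    logits = model(x)
    top_logits = logits.gather(1, logits.argmax(dim=1, keepdim=True)).sum()
    grads, = torch.autograd.grad(top_logits, x)
    return grads.flatten(start_dim=1)  # shape: (batch, num_input_features)

def attribution_similarity(unlearned, retrained, x):
    # Mean cosine similarity between the two models' attribution maps; values
    # close to 1 indicate the unlearned model relies on the input in the same
    # way as the retrain-from-scratch target model does.
    s_u = saliency(unlearned, x)
    s_r = saliency(retrained, x)
    return F.cosine_similarity(s_u, s_r, dim=1).mean().item()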

The thesis has both conceptual and practical motivations. Due to privacy considerations or legal requirements (e.g., the GDPR), business stakeholders may in fact soon require providers and developers of ML models to effectively remove their data samples from training datasets. Developers, in turn, will be interested in understanding the impact of data removal on the models, as well as in finding ways to compensate for it, e.g., by supplementing additional data. Finally, on the societal level, "the right to be forgotten" is becoming increasingly relevant to new generations, digital ethics, and the so-called Delete Culture.

It is encouraged that this thesis result in scientific publications, possibly developed in collaboration with external stakeholders.