Forgetting the Past: Targeted Unlearning in Pretrained Deep Networks

Zero-Shot Class Unlearning for Computer Vision

Yalla Mahanth

M.Tech Artificial Intelligence

Indian Institute of Science, Bengaluru

Abstract

The "right to be forgotten" presents a significant challenge for deep learning models, where removing the influence of specific data typically requires costly retraining from scratch. This paper addresses the problem of targeted class unlearning in a practical yet challenging zero-shot setting, where the original training data is entirely unavailable. We introduce a novel and scalable framework that leverages data-free class impressions โ€” synthetic data proxies that capture the essence of a class โ€” to guide a differential unlearning process. Our method achieves near-perfect forgetting (โ‰ˆ0% accuracy) on target classes while maintaining or improving performance on retained classes across models from LeNet5 to ResNet50 on ImageNet.

Key Contributions

Zero-Shot Unlearning

No access to original training data required. Works entirely with the pretrained model and class labels.

Class Impressions

Generate data-free proxies by maximizing class-specific activations, serving as synthetic stand-ins for real data.

Differential Loss

Simultaneous gradient ascent on forget classes and gradient descent on retain classes for selective unlearning.

Scalability

Demonstrated effectiveness from MNIST to ImageNet, including AlexNet and ResNet50 architectures.

The Problem

With regulations like GDPR and India's DPDPA, the "right to be forgotten" is now a legal imperative. When users request data removal, models must verifiably erase the data's influence. The naive solution, retraining from scratch, is computationally prohibitive for large models.

Zero-Shot Challenge

The most practical scenario is also the hardest: the original training data is completely unavailable due to privacy policies or data retention limits. Our method operates entirely without it, following the steps below.

Our Method

1. Generate Class Impressions

For each class, optimize a random noise input to maximize the class logit, creating a synthetic representation that the model considers quintessential for that class.

x*_c = argmax_x [ f_θ(x)_c - β·||x||² ]
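A minimal PyTorch sketch of this step; the optimizer choice, step count, learning rate, beta, and input shape are illustrative assumptions, not the paper's exact settings:

```python
import torch

def generate_class_impression(model, target_class,
                              input_shape=(1, 3, 224, 224),
                              steps=500, lr=0.05, beta=1e-4):
    """Optimize random noise so the logit for `target_class` is maximal.

    Hyperparameters are illustrative; the paper's settings may differ.
    """
    model.eval()
    x = torch.randn(input_shape, requires_grad=True)  # start from noise
    optimizer = torch.optim.Adam([x], lr=lr)

    for _ in range(steps):
        optimizer.zero_grad()
        logits = model(x)
        # Minimizing the negated objective performs gradient ascent on the
        # target logit; the L2 penalty (beta * ||x||^2) keeps x bounded.
        loss = -logits[0, target_class] + beta * x.pow(2).sum()
        loss.backward()
        optimizer.step()

    return x.detach()
```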
2. Partition the Model

Freeze the early layers (the head), which capture general features, and update only the tail layers, where class-specific features emerge. This keeps the unlearning step efficient and stable.
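A minimal sketch of such a split on a torchvision ResNet50; the exact cut point between head and tail is an assumed design choice, not prescribed by the method description:

```python
import torch
import torchvision

model = torchvision.models.resnet50(weights="IMAGENET1K_V1")

# Freeze the head: the stem and early residual stages hold general features.
for module in (model.conv1, model.bn1, model.layer1, model.layer2):
    for p in module.parameters():
        p.requires_grad = False

# Only the tail (later stages and the classifier) receives unlearning updates.
tail_params = [p for p in model.parameters() if p.requires_grad]
optimizer = torch.optim.SGD(tail_params, lr=1e-3)
```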

3. Apply Differential Loss

Optimize a combined objective that performs gradient ascent on forget class impressions while performing gradient descent on retain class impressions.

L_total = w_f · L_forget - w_r · L_retain
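A hedged sketch of one update step under this objective, using cross-entropy over class-impression batches; the weights w_f and w_r, the batch construction, and the function name are illustrative assumptions:

```python
import torch
import torch.nn.functional as F

def unlearning_step(model, optimizer, forget_x, forget_y,
                    retain_x, retain_y, w_f=1.0, w_r=1.0):
    """One differential-unlearning update on class-impression batches.

    The optimizer is assumed to cover only the unfrozen tail parameters.
    """
    optimizer.zero_grad()
    loss_forget = F.cross_entropy(model(forget_x), forget_y)
    loss_retain = F.cross_entropy(model(retain_x), retain_y)

    # Minimizing -L_total maximizes L_total = w_f*L_forget - w_r*L_retain:
    # gradient ascent on the forget loss, descent on the retain loss.
    total = -w_f * loss_forget + w_r * loss_retain
    total.backward()
    optimizer.step()
    return loss_forget.item(), loss_retain.item()
```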

Results: State-of-the-Art Performance

Summary Table

Model        Dataset    Original Acc  Forget Acc  Retain Acc
LeNet5       MNIST      99.07%        0.00%       99.44% ↑
KarpathyNet  CIFAR-10   72.36%        0.00%       80.23% ↑
AlexNet      ImageNet   56.55%        0.01%       57.11% ↑
ResNet50     ImageNet   76.14%        0.01%       76.83% ↑

(↑ = retain accuracy improved over the original model.)
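For context, a minimal sketch of how forget and retain accuracy could be measured, assuming a standard test loader and a known set of forgotten class indices (both are assumptions here):

```python
import torch

@torch.no_grad()
def forget_retain_accuracy(model, loader, forget_classes, device="cpu"):
    """Split test accuracy into forget-class and retain-class subsets."""
    model.eval()
    forget = torch.tensor(sorted(forget_classes), device=device)
    stats = {"forget": [0, 0], "retain": [0, 0]}  # [correct, total]

    for x, y in loader:
        x, y = x.to(device), y.to(device)
        preds = model(x).argmax(dim=1)
        mask = torch.isin(y, forget)  # True for forget-class samples
        for name, m in (("forget", mask), ("retain", ~mask)):
            stats[name][0] += (preds[m] == y[m]).sum().item()
            stats[name][1] += int(m.sum())

    return {name: correct / total if total else float("nan")
            for name, (correct, total) in stats.items()}
```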

Detailed Performance

LeNet5 - MNIST

Forgotten Classes: 1, 2, 3, 4
Forget Accuracy: 0.00%
Retain Accuracy: 99.44%
Retain Accuracy Change: +0.37%

KarpathyNet - CIFAR-10

Forgotten Classes: 3, 4, 8
Forget Accuracy: 0.00%
Retain Accuracy: 80.23%
Retain Accuracy Change: +7.87%

AlexNet - ImageNet

Forgotten Classes: 100 classes
Forget Accuracy: 0.01%
Retain Accuracy: 57.11%
Retain Accuracy Change: +0.56%

ResNet50 - ImageNet

Forgotten Classes: 100 classes
Forget Accuracy: 0.01%
Retain Accuracy: 76.83%
Retain Accuracy Change: +0.69%

Key Observations

In every setting, forget accuracy drops to effectively zero while retain accuracy improves over the original model, with the largest gain on CIFAR-10 (+7.87%). The approach scales from small networks on MNIST to ResNet50 on ImageNet, all without access to any training data.

Future Directions

Vision Transformers

Adapt the framework to attention-based architectures such as ViT and Swin Transformer, investigating how class impressions manifest in patch-based models.

Theoretical Guarantees

Develop formal proofs that unlearned models are statistically indistinguishable from gold-standard models retrained from scratch.

Fine-Grained Unlearning

Extend from class-level to instance-level and attribute-level forgetting, enabling removal of specific individuals or features.
