Forgetting the Past: Targeted Unlearning in Pretrained Deep Networks

Zero-Shot Class Unlearning for Computer Vision

Yalla Mahanth

M.Tech Artificial Intelligence

Indian Institute of Science, Bengaluru

Abstract

The "right to be forgotten" presents a significant challenge for deep learning models, where removing the influence of specific data typically requires costly retraining from scratch. This paper addresses the problem of targeted class unlearning in a practical yet challenging zero-shot setting, where the original training data is entirely unavailable. We introduce a novel and scalable framework that leverages data-free class impressions โ€” synthetic data proxies that capture the essence of a class โ€” to guide a differential unlearning process. Our method achieves near-perfect forgetting (โ‰ˆ0% accuracy) on target classes while maintaining or improving performance on retained classes across models from LeNet5 to ResNet50 on ImageNet.

Key Contributions

Zero-Shot Unlearning

No access to original training data required. Works entirely with the pretrained model and class labels.

Class Impressions

Generate data-free proxies by maximizing class-specific activations, serving as synthetic stand-ins for real data.

Differential Loss

Simultaneous gradient ascent on forget classes and gradient descent on retain classes for selective unlearning.

Scalability

Demonstrated effectiveness from MNIST to ImageNet, including AlexNet and ResNet50 architectures.

The Problem

With regulations like GDPR and India's DPDPA, the "right to be forgotten" is now a legal imperative. When users request data removal, models must verifiably erase the data's influence. The naive solution, retraining from scratch, is computationally prohibitive for large models.

Zero-Shot Challenge

The most practical scenario is also the hardest: the original training data is completely unavailable due to privacy policies or data retention limits. Our method operates entirely without it, following the steps below.

Our Method

1. Generate Class Impressions

For each class, optimize a random noise input to maximize the class logit, creating a synthetic representation that the model considers quintessential for that class.

x*_c = argmax_x [ f_θ(x)_c - β·||x||² ]
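A minimal PyTorch sketch of this step; the optimizer choice, step count, learning rate, beta, and input shape are illustrative assumptions, not the paper's exact settings:

```python
import torch

def generate_class_impression(model, target_class,
                              input_shape=(1, 3, 224, 224),
                              steps=500, lr=0.05, beta=1e-4):
    """Optimize random noise so the logit for `target_class` is maximal.

    Hyperparameters are illustrative; the paper's settings may differ.
    """
    model.eval()
    x = torch.randn(input_shape, requires_grad=True)  # start from noise
    optimizer = torch.optim.Adam([x], lr=lr)

    for _ in range(steps):
        optimizer.zero_grad()
        logits = model(x)
        # Minimizing the negated objective performs gradient ascent on the
        # target logit; the L2 penalty (beta * ||x||^2) keeps x bounded.
        loss = -logits[0, target_class] + beta * x.pow(2).sum()
        loss.backward()
        optimizer.step()

    return x.detach()
```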
2. Partition the Model

Freeze the early layers (the head), which capture general features, and update only the tail layers, where class-specific features emerge. This keeps the unlearning step efficient and stable.
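A minimal sketch of such a split on a torchvision ResNet50; the exact cut point between head and tail is an assumed design choice, not prescribed by the method description:

```python
import torch
import torchvision

model = torchvision.models.resnet50(weights="IMAGENET1K_V1")

# Freeze the head: the stem and early residual stages hold general features.
for module in (model.conv1, model.bn1, model.layer1, model.layer2):
    for p in module.parameters():
        p.requires_grad = False

# Only the tail (later stages and the classifier) receives unlearning updates.
tail_params = [p for p in model.parameters() if p.requires_grad]
optimizer = torch.optim.SGD(tail_params, lr=1e-3)
```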

3. Apply Differential Loss

Optimize a combined objective that performs gradient ascent on forget class impressions while performing gradient descent on retain class impressions.

L_total = w_f · L_forget - w_r · L_retain
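A hedged sketch of one update step under this objective, using cross-entropy over class-impression batches; the weights w_f and w_r, the batch construction, and the function name are illustrative assumptions:

```python
import torch
import torch.nn.functional as F

def unlearning_step(model, optimizer, forget_x, forget_y,
                    retain_x, retain_y, w_f=1.0, w_r=1.0):
    """One differential-unlearning update on class-impression batches.

    The optimizer is assumed to cover only the unfrozen tail parameters.
    """
    optimizer.zero_grad()
    loss_forget = F.cross_entropy(model(forget_x), forget_y)
    loss_retain = F.cross_entropy(model(retain_x), retain_y)

    # Minimizing -L_total maximizes L_total = w_f*L_forget - w_r*L_retain:
    # gradient ascent on the forget loss, descent on the retain loss.
    total = -w_f * loss_forget + w_r * loss_retain
    total.backward()
    optimizer.step()
    return loss_forget.item(), loss_retain.item()
```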

Results: State-of-the-Art Performance

Summary Table

Model        Dataset    Original Acc  Forget Acc  Retain Acc
LeNet5       MNIST      99.07%        0.00%       99.44% ↑
KarpathyNet  CIFAR-10   72.36%        0.00%       80.23% ↑
AlexNet      ImageNet   56.55%        0.01%       57.11% ↑
ResNet50     ImageNet   76.14%        0.01%       76.83% ↑

(↑ = retain accuracy improved over the original model.)
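For context, a minimal sketch of how forget and retain accuracy could be measured, assuming a standard test loader and a known set of forgotten class indices (both are assumptions here):

```python
import torch

@torch.no_grad()
def forget_retain_accuracy(model, loader, forget_classes, device="cpu"):
    """Split test accuracy into forget-class and retain-class subsets."""
    model.eval()
    forget = torch.tensor(sorted(forget_classes), device=device)
    stats = {"forget": [0, 0], "retain": [0, 0]}  # [correct, total]

    for x, y in loader:
        x, y = x.to(device), y.to(device)
        preds = model(x).argmax(dim=1)
        mask = torch.isin(y, forget)  # True for forget-class samples
        for name, m in (("forget", mask), ("retain", ~mask)):
            stats[name][0] += (preds[m] == y[m]).sum().item()
            stats[name][1] += int(m.sum())

    return {name: correct / total if total else float("nan")
            for name, (correct, total) in stats.items()}
```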

Detailed Performance

LeNet5 - MNIST

Forgotten Classes: 1, 2, 3, 4
Forget Accuracy: 0.00%
Retain Accuracy: 99.44%
Retain Accuracy Change: +0.37%

KarpathyNet - CIFAR-10

Forgotten Classes: 3, 4, 8
Forget Accuracy: 0.00%
Retain Accuracy: 80.23%
Retain Accuracy Change: +7.87%

AlexNet - ImageNet

Forgotten Classes: 100 classes
Forget Accuracy: 0.01%
Retain Accuracy: 57.11%
Retain Accuracy Change: +0.56%

ResNet50 - ImageNet

Forgotten Classes: 100 classes
Forget Accuracy: 0.01%
Retain Accuracy: 76.83%
Retain Accuracy Change: +0.69%

Key Observations

In every setting, forget accuracy drops to effectively zero while retain accuracy improves over the original model, with the largest gain on CIFAR-10 (+7.87%). The approach scales from small networks on MNIST to ResNet50 on ImageNet, all without access to any training data.

Future Directions

Vision Transformers

Adapt the framework to attention-based architectures such as ViT and Swin Transformer, investigating how class impressions manifest in patch-based models.

Theoretical Guarantees

Develop formal proofs that unlearned models are statistically indistinguishable from gold-standard models retrained from scratch.

Fine-Grained Unlearning

Extend from class-level to instance-level and attribute-level forgetting, enabling removal of specific individuals or features.
