Abstract
Key Contributions
Zero-Shot Unlearning
No access to original training data required. Works entirely with the pretrained model and class labels.
Class Impressions
Generate data-free proxies by maximizing class-specific activations, serving as synthetic stand-ins for real data.
Differential Loss
Simultaneous gradient ascent on forget classes and gradient descent on retain classes for selective unlearning.
Scalability
Demonstrated effectiveness from MNIST to ImageNet, including AlexNet and ResNet50 architectures.
The Problem
With regulations like GDPR and India's DPDPA, the "right to be forgotten" is now a legal imperative. When users request data removal, models must verifiably erase the data's influence. The naive solution, retraining from scratch, is computationally prohibitive for large models.
Zero-Shot Challenge
The most practically relevant scenario is one in which the original training data is completely unavailable, whether due to privacy policies or data retention limits. Our method solves this by:
- Working with only the pretrained model and class indices to forget
- Generating synthetic proxies that represent class-specific features
- Selectively modifying only the tail-end of the network
Our Method
Generate Class Impressions
For each class, optimize a random noise input to maximize the class logit, creating a synthetic representation that the model considers quintessential for that class.
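A minimal sketch of this step in PyTorch, assuming a standard classifier `model`; the optimizer choice, step count, learning rate, and input shape are illustrative assumptions, not values taken from the paper:

```python
import torch

def generate_class_impression(model, target_class,
                              input_shape=(1, 3, 224, 224),
                              num_steps=500, lr=0.1):
    """Optimize Gaussian noise so the logit for `target_class` is maximized."""
    model.eval()
    x = torch.randn(input_shape, requires_grad=True)  # start from random noise
    optimizer = torch.optim.Adam([x], lr=lr)          # only x is updated
    for _ in range(num_steps):
        optimizer.zero_grad()
        logits = model(x)
        # Negative target logit: minimizing it maximizes the class activation.
        loss = -logits[:, target_class].mean()
        loss.backward()
        optimizer.step()
    return x.detach()
```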
Partition the Model
Freeze early layers (head) that contain general features. Only update the tail layers where class-specific features emerge, ensuring efficiency and stability.
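As a concrete illustration, one way to implement the head/tail partition for a torchvision ResNet50; treating `layer4` and `fc` as the trainable tail is an assumption made for the example, not necessarily the paper's exact split point:

```python
import torch
import torchvision

model = torchvision.models.resnet50(weights="IMAGENET1K_V1")

# Freeze the head: early layers holding general, class-agnostic features.
for param in model.parameters():
    param.requires_grad = False

# Unfreeze only the tail, where class-specific features emerge.
for name, param in model.named_parameters():
    if name.startswith(("layer4", "fc")):
        param.requires_grad = True

# The optimizer only ever sees (and updates) the tail parameters.
tail_params = [p for p in model.parameters() if p.requires_grad]
optimizer = torch.optim.SGD(tail_params, lr=1e-3, momentum=0.9)
```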
Apply Differential Loss
Optimize a combined objective that performs gradient ascent on forget class impressions while performing gradient descent on retain class impressions.
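A sketch of how the combined objective and unlearning loop could look, assuming batches of class impressions with their labels have already been generated; the weighting `alpha`, step count, and batch-cycling scheme are illustrative assumptions:

```python
import torch
import torch.nn.functional as F

def differential_loss(model, x_forget, y_forget, x_retain, y_retain, alpha=1.0):
    """Combined objective over forget and retain class impressions."""
    ce_forget = F.cross_entropy(model(x_forget), y_forget)
    ce_retain = F.cross_entropy(model(x_retain), y_retain)
    # Minimizing (retain CE - alpha * forget CE) performs gradient descent
    # on the retain classes and gradient ascent on the forget classes at once.
    return ce_retain - alpha * ce_forget

def unlearn(model, optimizer, forget_batches, retain_batches, num_steps=100):
    """Unlearning loop over pre-generated (impressions, labels) batches."""
    model.train()
    for step in range(num_steps):
        x_f, y_f = forget_batches[step % len(forget_batches)]
        x_r, y_r = retain_batches[step % len(retain_batches)]
        optimizer.zero_grad()
        loss = differential_loss(model, x_f, y_f, x_r, y_r)
        loss.backward()
        optimizer.step()  # only the unfrozen tail parameters change
```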
Results: State-of-the-Art Performance
Summary Table
| Model | Dataset | Original Acc | Forget Acc | Retain Acc |
|---|---|---|---|---|
| LeNet5 | MNIST | 99.07% | 0.00% | 99.44% ↑ |
| KarpathyNet | CIFAR-10 | 72.36% | 0.00% | 80.23% ↑ |
| AlexNet | ImageNet | 56.55% | 0.01% | 57.11% ↑ |
| ResNet50 | ImageNet | 76.14% | 0.01% | 76.83% ↑ |
Detailed Performance

Per-model breakdowns: LeNet5 on MNIST, KarpathyNet on CIFAR-10, AlexNet on ImageNet, and ResNet50 on ImageNet.
Key Observations
- Perfect Forgetting: Achieved ≈0% accuracy on all forgotten classes across all architectures
- Knowledge Preservation: Not only maintained but often improved performance on retained classes
- Regularization Effect: Unlearning appears to reduce inter-class confusion, most visibly in the +7.87-point retain-accuracy gain on CIFAR-10
- Scalability Proven: Successfully scaled to ImageNet with 100 classes removed from deep architectures
Future Directions
Vision Transformers
Adapt the framework to attention-based architectures like ViT and the Swin Transformer, investigating how class impressions manifest in patch-based models.
Theoretical Guarantees
Develop formal proofs that unlearned models are statistically indistinguishable from gold-standard models retrained from scratch.
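One plausible formalization, borrowed from the certified-unlearning literature rather than established by this work, bounds how far the unlearned model's distribution may deviate from a retrained one:

```latex
% (epsilon, delta)-indistinguishability between an unlearning procedure U
% and full retraining R, for every measurable set of models S:
\Pr\big[\, U(M_D, D_f) \in S \,\big]
  \;\le\; e^{\epsilon}\, \Pr\big[\, R(D \setminus D_f) \in S \,\big] + \delta
```

Here $M_D$ is the model trained on dataset $D$ and $D_f \subseteq D$ is the forget set; the symmetric bound with $U$ and $R$ exchanged must also hold, and $\epsilon = \delta = 0$ recovers exact unlearning.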
Fine-Grained Unlearning
Extend from class-level to instance-level and attribute-level forgetting, enabling removal of specific individuals or features.