CyLab Seminar: Seth Neel
April 14, 2025
12:00 p.m. ET
Zoom or CIC room 4105, Panther Hollow
*Please note: this CyLab seminar is open only to partners and Carnegie Mellon University faculty, students, and staff.
Speaker:
Seth Neel
Senior Research Scientist
Google Research
Talk Title:
Machine Unlearning is as "easy" as Data Attribution
Abstract:
Machine unlearning—efficiently removing the effect of a small "forget set" of training data on a trained machine learning model—has attracted significant research interest due to potential applications in privacy, model editing, safety training and more. Despite this, recent work shows that existing techniques are ineffective for non-convex models; remnants of the removed points can still be reliably detected after "unlearning." In this talk, we'll first briefly review the earlier work on gradient-based machine unlearning in the convex setting that motivates many of these more recent algorithms, building intuition for why these methods may fail with non-convexity. We will then introduce a new machine unlearning technique that exhibits strong empirical performance when removing data from neural networks, and is well-motivated theoretically. Our meta-algorithm, which we call Data Model Matching (DMM), leverages recent advances in data attribution to predict the output of the model if it were re-trained on all but the forget set points, and then fine-tunes the model to match these predicted outputs. We show that in a simple convex setting DMM converges faster than prior unlearning algorithms, and in non-convex settings (ResNets trained on CIFAR and ImageNet), DMM achieves superior empirical unlearning performance. An added benefit of DMM is that it is a meta-algorithm, meaning that future advances in data attribution will translate directly into better unlearning algorithms, pointing to a clear direction for future progress in this field.
Bio:
Seth Neel is currently a Research Scientist at Google Research focused on value of data, synthetic data, and data privacy questions as they pertain to training generative models. He joined Google from Harvard, where he was an Assistant Professor at Harvard Business School and faculty affiliate in Computer Science. He received his Ph.D. in 2020 from the University of Pennsylvania, under the supervision of Aaron Roth and Michael Kearns.