CyLab Seminar: Daphne Ippolito

February 03, 2025

12:00 p.m. ET

Zoom or CIC room 4105, Panther Hollow

*Please note: this CyLab seminar is open only to partners and Carnegie Mellon University faculty, students, and staff.

Speaker:
Daphne Ippolito
Assistant Professor
Carnegie Mellon University Language Technologies Institute

Talk Title:
Troubles with Training Data for Large Language Models

Abstract:
Modern large language models (LLMs) derive their capabilities from the data used to train their underlying neural networks. While this data is the source of LLMs’ strength, it also creates fallibilities. Though the companies releasing LLMs aim to hide their training data from users, we demonstrate that it is surprisingly difficult to keep malicious, or even typical, users from accessing long strings of text that LLMs have memorized from the source data. Furthermore, most training data is derived from large-scale crawls of the Internet. We investigate whether, by poisoning portions of the Internet, an adversary can insert backdoors or otherwise change the behaviour of the LLMs trained on this data.

Bio:
Daphne Ippolito is an assistant professor at the Language Technologies Institute at Carnegie Mellon University and a senior research scientist at Google DeepMind. Among other topics, she studies privacy and security issues around language generation systems, strategies for better evaluation of language models, and customizability of language models for different real-world applications.