With the rapid development of Generative AI, ensuring the safety, security, and trustworthiness of these systems is paramount. In response, researchers and practitioners have proposed red teaming to identify such risks and enable their mitigation. Red teaming refers to adversarial tactics employed to identify flaws in GenAI-based systems, such as security vulnerabilities, harmful or discriminatory outputs, privacy breaches, and copyright violations. While several recent works have proposed comprehensive evaluation frameworks for AI models, the rapid evolution of AI necessitates ongoing updates to benchmarks to prevent them from becoming outdated as models are excessively tailored to them. Moreover, such evaluations must also incorporate the latest findings from AI safety research, which consistently expose new vulnerabilities in generative models.
In response to the findings from red teaming exercises, researchers have taken action to curb undesirable behaviors in AI models through various methods. These include aligning the models with ethical standards, defending against jailbreak attempts, preventing the generation of untruthful content, erasing undesired concepts from the models, and even leveraging adversaries for beneficial purposes. Despite these efforts, a multitude of risks remain unresolved, underscoring the importance of continued research into the challenges identified through red teaming. The goal of this workshop is to bring together leading researchers on AI safety to discuss pressing real-world challenges faced by ever-evolving generative models. We place special emphasis on red teaming and quantitative evaluations for probing the limitations of these models. Some fundamental questions that this workshop will address include
We invite submissions from researchers in the field of AI security and safety. Our topics of interest include, but are not limited to:
Submission URL: Please submit your work via OpenReview. To help maintain the quality of the review process, we kindly ask you to nominate a potential reviewer by providing their email address in the OpenReview submission.
Length and Formatting: Submitted papers must be 4 to 9 pages long, including figures and tables, in PDF format, using the NeurIPS 2024 Style Files or the ICLR 2025 Style Files. Authors may upload unlimited supplementary materials and references with their submissions. We will use a double-blind review process.
Important Dates:
If you have any questions, please email us at redteaminggenai@gmail.com.
| Time | Session |
| --- | --- |
| 9:00 - 9:30 | Coffee break |
| 9:30 - 9:35 | Opening remarks |
| 9:35 - 10:00 | Invited Talk 1: Andy Zou (with Q&A) |
| 10:00 - 10:45 | Invited Talk 2: Danqi Chen (with Q&A) |
| 10:45 - 10:55 | Contributed Talk 1: iART - Imitation guided Automated Red Teaming |
| 10:55 - 11:05 | Contributed Talk 2: Failures to Find Transferable Image Jailbreaks Between Vision-Language Models |
| 11:05 - 11:15 | Contributed Talk 3: LLM Defenses Are Not Robust to Multi-Turn Human Jailbreaks Yet |
| 11:20 - 12:00 | Panel discussion |
| 12:00 - 13:00 | Lunch break |
| 12:00 - 13:50 | Poster session |
| 13:50 - 14:15 | Invited Talk 3: Niloofar Mireshghallah (with Q&A) |
| 14:15 - 14:40 | Invited Talk 4: Gowthami Somepalli (with Q&A) |
| 14:45 - 14:55 | Contributed Talk 4: Rethinking LLM Memorization through the Lens of Adversarial Compression |
| 14:55 - 15:30 | Coffee break |
| 15:30 - 16:15 | Invited Talk 5: Vitaly Shmatikov (with Q&A) |
| 16:20 - 16:30 | Contributed Talk 5: A Realistic Threat Model for Large Language Model Jailbreaks |
| 16:30 - 16:40 | Contributed Talk 6: Infecting LLM Agents via Generalizable Adversarial Attack |
| 16:40 - 16:50 | Closing remarks |
To ensure the accessibility of our workshop for virtual attendees, we will stream all presentations and take questions from online attendees via Rocket.Chat.
UK AISI
Carnegie Mellon University
Princeton University
ELLIS Institute & MPI-IS
University of Washington
Wild Moose
University of Chicago
Anthropic
Robust Intelligence
Amazon AWS AI/ML
New York University
Columbia University
Hebrew University
New York University
Massachusetts Institute of Technology
Amazon AWS AI/ML
Tencent
University of Illinois Urbana-Champaign
University of Chicago
Peking University
Stanford University
New York University