NeurIPS 2024 Workshop on Red Teaming GenAI: What Can We Learn from Adversaries?

Overview

With the rapid development of Generative AI, ensuring their safety, security, and trustworthiness is paramount. In response, researchers and practitioners have proposed red teaming to identify such risks, enabling their mitigation. Red teaming refers to adversarial tactics employed to identify flaws in GenAI-based systems, such as security vulnerabilities, harmful or discriminating outputs, privacy breaches, and copyright law violations.While several recent works proposed comprehensive evaluation frameworks for AI models, the rapid evolution of AI necessitates ongoing updates to benchmarks to avoid them from becoming outdated due to models being excessively tailored to these benchmarks. Moreover, such evaluations must also incorporate the latest findings from AI safety research, which consistently expose new breaches in generative models.

In response to the findings from red teaming exercises, researchers have taken action to curb undesirable behaviors in AI models through various methods. These include aligning the models with ethical standards, defending against jailbreak attempts, preventing the generation of untruthful content, erasing undesired concepts from the models, and even leveraging adversaries for beneficial purposes. Despite these efforts, a multitude of risks remain unresolved, underscoring the importance of continuous research in addressing the challenges identified through red teaming. The goal of this workshop is to bring leading researchers on AI safety together to discuss pressing real-world challenges faced by ever-evolving generative models. We put a special emphasis on red teaming and quantitative evaluations towards probing the limitations of our models. Some fundamental questions that this workshop will address include

What are new security and safety risks in foundation models?
How do we discover and quantitatively evaluate harmful capabilities of these models?
How can we mitigate risks found through red teaming?
What are the limitations of red teaming?
Can we make safety guarantees?

Call for Papers

We invite submissions from researchers in the field of AI security and safety. Our topics of interest include, but are not limited to:

Empirical investigations into both new and existing security and safety risks in generative AI models.
Evaluation frameworks and benchmarks for red-teaming generative AI models.
Theoretical foundations for safety and security in generative AI models.
Novel risk mitigation strategies and safety mechanisms for generative AI models.
Discussions on the key safety and security challenges in generative AI, best practices in red teaming, and its limitations.

Submission URL: Please submit your work via OpenReview. To help maintain the quality of the review process, we kindly ask you to nominate a potential reviewer by providing their email address in the OpenReview submission.

Length and Formatting: Submitted papers must be between 4 - 9 pages in PDF format using the NeurIPS 2024 Style Files or ICLR 2025 Style Files including figures and tables. Authors are permitted to upload unlimited supplementary materials and references with their submissions. We will use a double-blind review process.

Important Dates:

Paper Submission Deadline: ~~Sep 14 AOE, 2024~~ Sep 19 AOE, 2024
Author Notification: Oct 9, 2024
Camera-ready Deadline: Nov 9, 2024

If you have any questions, please send us an email at redteaminggenai@gmail.com

Accepted Papers: The list of accepted papers can be found here

Schedule

Morning Session

9:00 - 9:30	Coffee break
9:30 - 9:35	Opening remarks
9:35 - 10:00	Invited talk 1: Andy Zou and Q&A
10:00 - 10:45	Invited talk 2: Danqi Chen and Q&A
10:45 - 10:55	Contributed Talk 1: iART - Imitation guided Automated Red Teaming
10:55 - 11:05	Contributed Talk 2: Failures to Find Transferable Image Jailbreaks Between Vision-Language Models
11:05 - 11:15	Contributed Talk 3: LLM Defenses Are Not Robust to Multi-Turn Human Jailbreaks Yet
11:20 - 12:00	Panel Discussion
12:00 - 13:00	Lunch Break

Afternoon Session

12:00 - 13:50	Poster session
13:50 - 14:15	Invited talk 3: Niloofar Mireshghallah and Q&A
14:15 - 15:00	Invited talk 4: Jonas Geiping and Q&A
15:00 - 15:30	Coffee Break
15:30 - 16:15	Invited talk 5: Vitaly Shmatikov and Q&A
16:15 - 16:30	Invited talk 6: Gowthami Somepalli and Q&A
16:30 - 16:40	Contributed Talk 4: Rethinking LLM Memorization through the Lens of Adversarial Compression
16:40 - 16:50	Contributed Talk 6: Infecting LLM Agents via Generalizable Adversarial Attack
16:50 - 17:00	Contributed Talk 7: A Realistic Threat Model for Large Language Model Jailbreaks
17:00 - 17:20	Invited talk 6: Max Kaufmann and Q&A
17:20 - 17:30	Closing Remarks

To ensure the accessibility of our workshop for virtual attendees, we will stream all presentations and facilitate questions from online attendees via Rocketchat.

Overview

Call for Papers

Schedule

Morning Session

Afternoon Session

Invited Speakers

Max Kaufmann

Andy Zou

Danqi Chen

Jonas Geiping

Niloofar Mireshghallah

Gowthami Somepalli

Vitaly Shmatikov

Panelists

Roei Schuster

Bo Li

Alex Tamkin

Yaron Singer

Workshop Organizers

Valeriia Cherepanova

Niv Cohen

Micah Goldblum

Avital Shafran

Minh Pham

Yifei Wang

Nil-Jana Akpinar

Yang Bai

Zhen Xiang

Advisors

Bo Li

Yisen Wang

James Zou

Chinmay Hegde