NeurIPS 2024 Workshop on

Red Teaming GenAI: What Can We Learn from Adversaries?






Overview

With the rapid development of generative AI, ensuring the safety, security, and trustworthiness of these systems is paramount. In response, researchers and practitioners have proposed red teaming to identify risks and enable their mitigation. Red teaming refers to adversarial tactics employed to identify flaws in GenAI-based systems, such as security vulnerabilities, harmful or discriminatory outputs, privacy breaches, and copyright law violations. While several recent works have proposed comprehensive evaluation frameworks for AI models, the rapid evolution of AI necessitates ongoing updates to benchmarks so that they do not become outdated as models are excessively tailored to them. Moreover, such evaluations must also incorporate the latest findings from AI safety research, which consistently expose new breaches in generative models.


In response to the findings from red teaming exercises, researchers have taken action to curb undesirable behaviors in AI models through various methods. These include aligning the models with ethical standards, defending against jailbreak attempts, preventing the generation of untruthful content, erasing undesired concepts from the models, and even leveraging adversaries for beneficial purposes. Despite these efforts, a multitude of risks remain unresolved, underscoring the importance of continuous research in addressing the challenges identified through red teaming.

The goal of this workshop is to bring together leading researchers on AI safety to discuss pressing real-world challenges faced by ever-evolving generative models. We put a special emphasis on red teaming and quantitative evaluations for probing the limitations of our models. Some fundamental questions that this workshop will address include:

  • What are new security and safety risks in foundation models?
  • How do we discover and quantitatively evaluate harmful capabilities of these models?
  • How can we mitigate risks found through red teaming?
  • What are the limitations of red teaming?
  • Can we make safety guarantees?

Call for Papers

We invite submissions from researchers in the field of AI security and safety. Our topics of interest include, but are not limited to:

  • Empirical investigations into both new and existing security and safety risks in generative AI models.
  • Evaluation frameworks and benchmarks for red-teaming generative AI models.
  • Theoretical foundations for safety and security in generative AI models.
  • Novel risk mitigation strategies and safety mechanisms for generative AI models.
  • Discussions on the key safety and security challenges in generative AI, best practices in red teaming, and its limitations.

Submission URL: Please submit your work via OpenReview. To help maintain the quality of the review process, we kindly ask you to nominate a potential reviewer by providing their email address in the OpenReview submission.

Length and Formatting: Submitted papers must be between 4 and 9 pages, including figures and tables, in PDF format using the NeurIPS 2024 Style Files or ICLR 2025 Style Files. Authors are permitted to upload unlimited supplementary materials and references with their submissions. We will use a double-blind review process.

Important Dates:

  • Paper Submission Deadline: Sep 19 AOE, 2024 (previously Sep 14 AOE, 2024)
  • Author Notification: Oct 9, 2024
  • Camera-ready Deadline: Nov 9, 2024

If you have any questions, please send us an email at redteaminggenai@gmail.com


Schedule

Morning Session


9:00 - 9:30 Coffee break
9:30 - 9:35 Opening remarks
9:35 - 10:00 Invited talk 1: Andy Zou and Q&A
10:00 - 10:45 Invited talk 2: Danqi Chen and Q&A
10:45 - 10:55 Contributed Talk 1: iART - Imitation guided Automated Red Teaming
10:55 - 11:05 Contributed Talk 2: Failures to Find Transferable Image Jailbreaks Between Vision-Language Models
11:05 - 11:15 Contributed Talk 3: LLM Defenses Are Not Robust to Multi-Turn Human Jailbreaks Yet
11:20 - 12:00 Panel Discussion
12:00 - 13:00 Lunch Break

Afternoon Session


12:00 - 13:50 Poster session
13:50 - 14:15 Invited talk 4: Niloofar Mireshghallah and Q&A
14:15 - 14:40 Invited talk 5: Gowthami Somepalli and Q&A
14:45 - 14:55 Contributed Talk 4: Rethinking LLM Memorization through the Lens of Adversarial Compression
14:55 - 15:30 Coffee Break
15:30 - 16:15 Invited talk 7: Vitaly Shmatikov and Q&A
16:20 - 16:30 Contributed Talk 5: A Realistic Threat Model for Large Language Model Jailbreaks
16:30 - 16:40 Contributed Talk 6: Infecting LLM Agents via Generalizable Adversarial Attack
16:40 - 16:50 Closing Remarks
 

To ensure the accessibility of our workshop for virtual attendees, we will stream all presentations and facilitate questions from online attendees via Rocketchat.

Invited Speakers




Andy Zou

Carnegie Mellon University

Danqi Chen

Princeton University

Jonas Geiping

ELLIS Institute & MPI-IS

Niloofar Mireshghallah

University of Washington



Gowthami Somepalli

University of Maryland

Vitaly Shmatikov

Cornell Tech

Panelists




Roei Schuster

Wild Moose

Rowan Cheung

Arthur AI

Workshop Organizers




Valeriia Cherepanova

Amazon AWS AI/ML

Niv Cohen

New York University

Micah Goldblum

Columbia University

Avital Shafran

Hebrew University

Minh Pham

New York University



Yifei Wang

Massachusetts Institute of Technology

Nil-Jana Akpinar

Amazon AWS AI/ML

Yang Bai

Tencent

Zhen Xiang

University of Illinois Urbana-Champaign

Advisors




Bo Li

University of Chicago

Yisen Wang

Peking University

James Zou

Stanford University

Chinmay Hegde

New York University