NeurIPS 2024 Workshop on

Red Teaming GenAI: What Can We Learn from Adversaries?






Overview

With the rapid development of generative AI, ensuring its safety, security, and trustworthiness is paramount. In response, researchers and practitioners have proposed red teaming to identify such risks so that they can be mitigated. Red teaming refers to adversarial tactics employed to uncover flaws in GenAI-based systems, such as security vulnerabilities, harmful or discriminatory outputs, privacy breaches, and copyright violations. While several recent works have proposed comprehensive evaluation frameworks for AI models, the rapid evolution of AI necessitates ongoing updates to benchmarks to prevent them from becoming outdated as models are excessively tailored to them. Moreover, such evaluations must also incorporate the latest findings from AI safety research, which consistently expose new vulnerabilities in generative models.


In response to the findings from red teaming exercises, researchers have taken action to curb undesirable behaviors in AI models through various methods. These include aligning the models with ethical standards, defending against jailbreak attempts, preventing the generation of untruthful content, erasing undesired concepts from the models, and even leveraging adversaries for beneficial purposes. Despite these efforts, a multitude of risks remain unresolved, underscoring the importance of continued research into the challenges identified through red teaming. The goal of this workshop is to bring together leading researchers in AI safety to discuss pressing real-world challenges faced by ever-evolving generative models. We place a special emphasis on red teaming and on quantitative evaluations that probe the limitations of these models. Some fundamental questions that this workshop will address include:

  • What are new security and safety risks in foundation models?
  • How do we discover and quantitatively evaluate harmful capabilities of these models?
  • How can we mitigate risks found through red teaming?
  • What are the limitations of red teaming?
  • Can we make safety guarantees?

Call for Papers

We invite submissions from researchers in the field of AI security and safety. Our topics of interest include, but are not limited to:

  • Empirical investigations into both new and existing security and safety risks in generative AI models.
  • Evaluation frameworks and benchmarks for red-teaming generative AI models.
  • Theoretical foundations for safety and security in generative AI models.
  • Novel risk mitigation strategies and safety mechanisms for generative AI models.
  • Discussions on the key safety and security challenges in generative AI, best practices in red teaming, and its limitations.

Submission URL: Please submit your work via OpenReview. To help maintain the quality of the review process, we kindly ask you to nominate a potential reviewer by providing their email address in the OpenReview submission.

Length and Formatting: Submitted papers must be between 4 and 9 pages, including figures and tables, in PDF format using the NeurIPS 2024 Style Files. Authors may upload unlimited supplementary materials and references with their submissions. We will use a double-blind review process.

Important Dates:

  • Paper Submission Deadline: Sep 14, 2024
  • Author Notification: Oct 9, 2024
  • Camera-ready Deadline: Nov 9, 2024

If you have any questions, please email us at redteaminggenai@gmail.com.


Schedule

This is a tentative workshop schedule. All times are provided in Central European Time (CET).

Morning Session


8:30 - 8:50 Informal coffee session
8:50 - 9:00 Introduction and opening remarks
9:00 - 9:40 Invited talk 1: Katherine Lee and Q&A
9:40 - 10:00 Invited talk 2: Andy Zou and Q&A
10:00 - 10:40 Invited talk 3: Danqi Chen and Q&A
10:40 - 11:00 Coffee Break
11:00 - 11:45 Contributed Talks
11:45 - 12:30 Panel Discussion
12:30 - 13:30 Lunch Break

Afternoon Session


13:30 - 15:00 Poster session
15:00 - 15:40 Invited talk 4: Jonas Geiping and Q&A
15:40 - 16:00 Invited talk 5: Niloofar Mireshghallah and Q&A
16:00 - 16:40 Competition results and contributed talks
16:40 - 17:00 Invited talk 6: Gowthami Somepalli and Q&A
17:00 - 17:40 Invited talk 7: Vitaly Shmatikov and Q&A
17:40 - 18:00 Closing notes
 

To ensure the accessibility of our workshop for virtual attendees, we will stream all presentations and facilitate questions from online attendees via Rocketchat.

Invited Speakers

  • Katherine Lee, Google DeepMind
  • Andy Zou, Carnegie Mellon University
  • Danqi Chen, Princeton University
  • Jonas Geiping, ELLIS Institute & MPI-IS
  • Niloofar Mireshghallah, University of Washington
  • Gowthami Somepalli, University of Maryland
  • Vitaly Shmatikov, Cornell Tech

Panelists

  • Aleksander Madry, Massachusetts Institute of Technology
  • Roei Schuster, Wild Moose
  • Rowan Cheung, Arthur AI
  • Michael Kearns, University of Pennsylvania

Workshop Organizers

  • Valeriia Cherepanova, Amazon AWS AI/ML
  • Niv Cohen, New York University
  • Micah Goldblum, Columbia University
  • Avital Shafran, Hebrew University
  • Minh Pham, New York University
  • Yifei Wang, Massachusetts Institute of Technology
  • Nil-Jana Akpinar, Amazon AWS AI/ML
  • Yang Bai, Tencent
  • Zhen Xiang, University of Illinois Urbana-Champaign

Advisors

  • Bo Li, University of Chicago
  • Yisen Wang, Peking University
  • James Zou, Stanford University
  • Chinmay Hegde, New York University