AI Diplomacy

Introduction:	AI Diplomacy is an experimental platform that pits leading large language models against each other in the classic strategy game Diplomacy to observe and benchmark their negotiation, alliance, and deception capabilities.
Recorded in:	6/5/2025

Tags:	AI, Large Language Models, Benchmark, Strategy Game, Research, LLM Evaluation, Twitch Stream, Open Source

What is AI Diplomacy?

AI Diplomacy is a research project and live experiment built on the classic strategy game Diplomacy, in which the seven Great Powers of 1901 Europe are steered by large language models instead of human players. Its primary purpose is to provide a game environment for evaluating and benchmarking advanced AI models, specifically their ability to negotiate, form alliances, and engage in complex social behavior such as deception and betrayal. The project aims to offer insight into AI trustworthiness and strategic thinking, and to serve as an accessible benchmark that tests several capabilities at once and stays relevant as LLMs improve. It is open source and streamed live on Twitch, so anyone can observe how the models behave.

How to use AI Diplomacy

The main way to engage with AI Diplomacy is to watch the live Twitch stream (twitch.tv/ai_diplomacy), where the AI models compete in real time. The project is also open source on GitHub (github.com/Alx-AI/AI_Diplomacy), so researchers and developers can read and contribute to the code. No registration requirement or pricing is mentioned for watching or taking part in the experiment, which is run as a public benchmark. The parent platform, Every, offers a subscription for its content and other AI tools, but that subscription is separate from AI Diplomacy.

AI Diplomacy's core features

AI models competing in the game of Diplomacy

Evaluation of LLM negotiation and strategic behavior

Live streaming of AI games on Twitch

Open-sourced project for research and development

Benchmarking LLM capabilities in complex social interactions

Observation of AI traits like deception, alliance formation, and betrayal

Multifaceted testing environment with various paths to success

Generation of game data that can be used to train future AI models

Evolutionary benchmark that adapts as models improve

First-hand learning about AI behavior by watching games unfold

Use cases of AI Diplomacy

Researchers studying advanced AI model behavior and capabilities

AI developers seeking new benchmarks for LLM evaluation

Academics and students exploring AI's strategic and social intelligence

Gaming enthusiasts interested in AI-driven strategy and emergent gameplay

Content creators and journalists covering AI advancements

Individuals curious about the trustworthiness and strategic depth of AI

Developers looking for open-source projects to contribute to in the AI space

Educators demonstrating real-world applications and limitations of LLMs