Can AI Agents Agree?

Preprint 2026

F. Berdoz, L. Rugli, R. Wattenhofer

ETH Zurich, Switzerland

multi-agent, consensus, byzantine-fault-tolerance, language-models

Abstract

Large language models are increasingly deployed as cooperating agents, yet their behavior in adversarial consensus settings has not been systematically studied. We evaluate LLM-based agents on a Byzantine consensus game over scalar values using a synchronous all-to-all simulation. We test consensus in a no-stake setting where agents have no preferences over the final value, so evaluation focuses on agreement rather than value optimality. Across hundreds of simulations spanning model sizes, group sizes, and Byzantine fractions, we find that valid agreement is not reliable even in benign settings and degrades as group size grows. Introducing a small number of Byzantine agents further reduces success. Failures are dominated by loss of liveness, such as timeouts and stalled convergence, rather than subtle value corruption. Overall, the results suggest that reliable agreement is not yet a dependable emergent capability of current LLM-agent groups even in no-stake settings, raising caution for deployments that rely on robust coordination.

Overview

Can AI agents reliably agree when some of them are adversarial? This project studies Byzantine consensus in groups of large language model (LLM) agents. We design a no-stake scalar agreement game where agents repeatedly propose values and exchange messages over a synchronous all-to-all network. Each agent is implemented as an LLM-driven policy that receives a compact textual history and outputs a proposal, justification, and termination decision. Across hundreds of simulations spanning model sizes, group sizes, and Byzantine fractions, we show that valid agreement is not reliable even in benign settings and degrades sharply as groups grow and adversaries are added.

Highlights

  • Synchronous Byzantine consensus game with LLM agents on scalar values.
  • A2A-Sim simulator providing structured, reproducible message passing.
  • Systematic evaluation across model sizes, group sizes, and threat models.
  • Failures dominated by liveness loss (timeouts, stalled convergence), not subtle value corruption.

Method

We consider $N$ agents communicating over synchronous, all-to-all rounds $t = 1, \ldots, T_{\max}$, with a Byzantine fraction $f \in [0, 1/3]$. Honest agents maintain scalar proposals $v_i^{(t)} \in [0, 50]$ initialized independently from a fixed distribution, while Byzantine agents may send arbitrary values and justifications. A custom simulator (A2A-Sim) delivers structured messages and enforces the round structure; each round, agents read a compact summary of the previous interaction history and query an LLM to produce a new proposal, a natural-language explanation, and a termination decision.
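The round structure above can be sketched as a small simulation loop. This is an illustrative stand-in, not the A2A-Sim implementation: the `llm_policy` function below replaces the actual LLM call with a simple mean-seeking heuristic, and the round cap `T_MAX = 30` is an assumed value, purely so the proposal/justification/stop-vote interface and the two-thirds termination rule are concrete.

```python
import random

T_MAX = 30          # assumed round cap, not a value from the paper
N = 8
V_LO, V_HI = 0, 50  # scalar proposal range used in the paper

def llm_policy(history, my_value):
    """Stand-in for the LLM call. Returns (proposal, justification, stop_vote).
    The real agent queries a model with a compact textual history; here a
    mean-seeking heuristic merely illustrates the interface."""
    if not history:
        return my_value, "initial proposal", False
    last = [v for _, v in history[-1]]
    proposal = round(sum(last) / len(last), 1)
    stop = len(set(last)) == 1          # vote to stop once everyone matched
    return proposal, "moving toward the group mean", stop

def simulate(n=N, t_max=T_MAX, seed=0):
    """Synchronous all-to-all rounds: every agent reads all messages from
    the previous round, then all agents update simultaneously."""
    rng = random.Random(seed)
    values = [round(rng.uniform(V_LO, V_HI), 1) for _ in range(n)]
    history = []
    for t in range(1, t_max + 1):
        msgs, votes = [], []
        for i, v in enumerate(values):
            p, _, stop = llm_policy(history, v)
            msgs.append((i, p))
            votes.append(stop)
        history.append(msgs)
        values = [p for _, p in msgs]
        if 3 * sum(votes) >= 2 * n:     # at least 2/3 of agents vote to stop
            return values, t
    return values, t_max
```

With this cooperative heuristic the group collapses onto the round-one mean and terminates quickly; the point of the paper is that real LLM policies often do not.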

The simulator declares termination once at least two thirds of all agents vote to stop; otherwise it times out at $t = T_{\max}$. At termination we classify outcomes into:

  • Valid consensus – all honest agents hold the same value taken from the initial honest proposals.
  • Invalid consensus – agreement is reached but the value violates validity.
  • No consensus – the protocol times out without agreement.
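A minimal classifier for these three outcome categories might look as follows; the function names and argument layout are illustrative assumptions, but the decision logic mirrors the definitions above (validity requires the agreed value to come from the initial honest proposals).

```python
def classify(final_values, honest_idx, initial_honest, timed_out):
    """Classify a finished run into one of the three outcome categories.

    final_values   -- last proposal of every agent
    honest_idx     -- indices of the honest agents
    initial_honest -- the honest agents' initial proposals (validity set)
    timed_out      -- True if the run hit the round cap without termination
    """
    honest = [final_values[i] for i in honest_idx]
    if timed_out or len(set(honest)) != 1:
        return "no consensus"
    if honest[0] in initial_honest:
        return "valid consensus"
    return "invalid consensus"
```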

Byzantine agents can adapt to the history and send arbitrary proposals, but they cannot equivocate, forge identities, or drop messages, so every recipient sees the same adversarial message in a given round. This restricted threat model already suffices to stress-test liveness.
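The no-equivocation constraint can be made concrete with a short sketch. The `byzantine_policy` and `deliver` helpers below are hypothetical names: the adversary's value is arbitrary and may depend on the history, but delivery hands the identical message to every recipient, matching the restricted threat model described above.

```python
import random

def byzantine_policy(history, rng):
    """Adversarial sender sketch: the proposal may be arbitrary and
    history-dependent, and the agent never votes to terminate."""
    v = round(rng.uniform(0, 50), 1)    # arbitrary value, chosen adversarially
    return v, "plausible-sounding justification", False

def deliver(sender_id, msg, recipients):
    """All-to-all delivery with no equivocation: every recipient
    receives the exact same message from a given sender this round."""
    return {r: (sender_id, msg) for r in recipients}
```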

Quantitative results

Consensus outcomes without Byzantine agents

Consensus performance without Byzantine agents for Qwen3‑8B and Qwen3‑14B across different group sizes and prompt variants.

Effect of Byzantine agents

Effect of adding Byzantine agents on Qwen3‑14B consensus with eight honest agents.

We first run 600 simulations without Byzantine agents, varying $N \in \{4, 8, 16\}$, two Qwen3 model sizes (8B and 14B), and prompt variants that either mention or omit potential Byzantine peers. Even in these benign settings, only around $42\%$ of runs end in valid consensus, and performance worsens for larger groups; Qwen3‑14B substantially outperforms Qwen3‑8B but still exhibits frequent timeouts. Removing any mention of Byzantine agents from the prompt improves both success rate and convergence speed, indicating that merely warning about adversaries can harm liveness.

Next, we fix eight honest Qwen3‑14B agents and introduce up to one third Byzantine agents. Invalid consensus remains rare, but the share of valid consensus quickly collapses, with many configurations degenerating into no‑consensus timeouts instead of subtle value corruption.

Trajectory examples

Representative proposal trajectories

Representative proposal trajectories for Qwen3‑14B with eight honest agents under different awareness and adversary settings.

Representative proposal trajectories show that threat-aware prompts and adversaries both tend to slow down or stall convergence, even when validity is preserved.

Citation

@misc{berdoz2026can,
  author = {Berdoz, F. and Rugli, L. and Wattenhofer, R.},
  title = {{Can AI Agents Agree?}},
  note = {arXiv:2603.01213},
  year = {2026}
}