Frontier large language models fail in predictable and significant ways when required to serve multiple users at once, according to a new study that formalises this underexplored problem for the first time.

The paper, posted to arXiv in April 2025, argues that virtually all current AI assistant systems are built around a single-user assumption — one person gives instructions, and the model tries to satisfy them. But as LLMs are embedded into team workflows, enterprise tools, and organisational platforms, that assumption increasingly breaks down. The researchers describe this gap as the shift from a "single-principal" to a "multi-principal" problem, in which a single agent must balance the competing interests, authority levels, and privacy expectations of several people at once.

Why the Single-User Design Assumption Creates Real Problems

The distinction matters practically. Imagine an AI assistant deployed inside a company where a manager, a junior employee, and an external contractor all interact with the same system. Each has different permissions, different priorities, and information the others should not see. A model trained to simply follow the most recent or most forceful instruction has no framework for navigating that complexity.

According to the researchers, this creates three categories of failure: inconsistent prioritisation when user objectives conflict, privacy violations that accumulate over multi-turn conversations, and coordination bottlenecks when the agent must gather information iteratively across users.

In the authors' words: "Frontier LLMs frequently fail to maintain stable prioritisation under conflicting user objectives, exhibit increasing privacy violations over multi-turn interactions, and suffer from efficiency bottlenecks when coordination requires iterative information gathering."

The team formalised multi-user interaction as a multi-principal decision problem — a framework borrowed from economics and organisational theory, in which a single agent must act on behalf of multiple stakeholders with potentially conflicting interests. According to the authors, theirs is the first systematic application of this framework to LLM agents.
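The multi-principal framing can be made concrete with a small sketch. Everything in the toy model below is illustrative rather than drawn from the paper, and the "defer to the highest authority" rule is just one naive policy that such a formalisation lets you state explicitly and compare against alternatives:

```python
from dataclasses import dataclass


@dataclass
class Principal:
    """One stakeholder the agent acts for (fields illustrative, not from the paper)."""
    name: str
    authority: int  # higher value = more authority


@dataclass
class MultiPrincipalProblem:
    """A single agent balancing several principals with conflicting interests."""
    principals: list[Principal]

    def resolve(self, requests: dict[str, str]) -> str:
        # Naive policy: act on the request from the highest-authority principal.
        authority = {p.name: p.authority for p in self.principals}
        winner = max(requests, key=lambda name: authority[name])
        return requests[winner]
```

The value of writing the problem down this way is that the policy inside `resolve` becomes a visible, swappable object of study instead of an implicit behaviour buried in a model's weights.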

How the Researchers Tested Current Models

To evaluate how well current models handle these conditions, the team designed three targeted stress-testing scenarios, each probing a different failure mode. The first tested instruction following under conflicting directives — situations where different users issue contradictory instructions with different implied levels of authority. The second examined privacy preservation across extended, multi-turn conversations where sensitive information from one user might inadvertently surface in responses to another. The third scenario tested coordination efficiency, looking at how models performed when completing a task required iteratively gathering information from multiple users.
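A simple way to picture the first scenario is as an invariance check: does the model's choice survive reordering of the conflicting instructions? The harness below is a hypothetical sketch, not the paper's benchmark; `decide` stands in for a call to the model under test:

```python
import itertools


def prioritisation_is_stable(decide, instructions):
    """Return True if `decide` picks the same outcome for every presentation
    order of the conflicting instructions. `decide` is a stand-in for
    querying the model under test (hypothetical, not the paper's harness)."""
    outcomes = {decide(order) for order in itertools.permutations(instructions)}
    return len(outcomes) == 1
```

A recency-biased model (one that follows whoever spoke last) fails this check, while a model applying a fixed authority hierarchy passes it, which mirrors the instability the researchers report.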

The paper does not name which specific frontier models were tested, referring to them collectively as "frontier LLMs." The benchmark results are reported by the researchers themselves and have not undergone independent peer review at the time of writing — a standard caveat for arXiv preprints.

The Three Failure Modes in Detail

On prioritisation, the models showed instability: when users issued conflicting instructions, models did not apply a consistent or principled hierarchy. The ordering of instructions, their phrasing, or incidental features of the conversation influenced which user's objectives the model favoured — a pattern the researchers describe as systematic rather than random.

On privacy, the failure mode was cumulative. Individual responses might not reveal protected information, but over longer conversations, models increasingly allowed details shared by one user to leak into interactions with another. This is particularly concerning in enterprise settings where role-based access to information is a compliance requirement, not merely a preference.
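One defence against this cumulative leakage is to filter the agent's accumulated conversation memory through an explicit role-based policy before every reply. The sketch below is an assumption about how such a boundary could look; the paper does not prescribe an implementation, and the tags and roles are invented:

```python
def visible_facts(recipient_role, facts, access_policy):
    """Keep only the accumulated facts the recipient's role may see.
    `facts` is a list of (fact, tag) pairs; `access_policy` maps a tag to
    the set of roles cleared for it (all names illustrative)."""
    return [fact for fact, tag in facts
            if recipient_role in access_policy.get(tag, set())]
```

The point of the filter is that it is applied on every turn, so leakage cannot accumulate the way it does when each response is judged in isolation.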

On coordination, models struggled with efficiency when tasks required assembling information from multiple users sequentially. Rather than developing strategies to minimise back-and-forth, models tended toward repetitive or redundant information-gathering behaviour.
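A minimal strategy for this third failure mode is to batch all outstanding questions per user, so that no one is contacted twice. This is an illustrative fix, not the coordination mechanism the paper proposes:

```python
from collections import defaultdict


def batch_questions(needed_info):
    """Group outstanding questions by the user who can answer them, turning
    repeated back-and-forth into one round per user. `needed_info` is a
    list of (user, question) pairs (shape illustrative)."""
    rounds = defaultdict(list)
    for user, question in needed_info:
        rounds[user].append(question)
    return dict(rounds)
```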

What a Unified Protocol Would Look Like

As part of their contribution, the researchers propose a unified multi-user interaction protocol — a structured framework for how LLM agents should handle requests when multiple principals are involved. The protocol is designed to make authority hierarchies explicit, define how conflicts should be resolved, and build in privacy boundaries from the start rather than as an afterthought.
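Reading between the lines of that description, such a protocol might thread its three elements together as a pipeline: resolve the conflict via the declared hierarchy, then enforce the privacy boundary before responding. The sketch below is one interpretation under stated assumptions; the structure, field names, and resolution rule are not taken from the paper:

```python
from dataclasses import dataclass


@dataclass
class ProtocolConfig:
    authority: dict[str, int]           # explicit, pre-declared hierarchy
    access_policy: dict[str, set[str]]  # fact tag -> users cleared to see it


def handle(cfg, requests, shared_facts, recipient):
    """One hypothetical reading of a unified multi-user protocol: conflicts
    resolve by declared authority, and privacy boundaries are enforced on
    every reply rather than bolted on afterwards."""
    winner = max(requests, key=lambda user: cfg.authority.get(user, 0))
    visible = [fact for fact, tag in shared_facts
               if recipient in cfg.access_policy.get(tag, set())]
    return {"acting_on": requests[winner], "visible_to_recipient": visible}
```

The design choice worth noting is that the hierarchy and the access policy are configuration supplied up front, so the agent never has to infer authority or privacy rules from conversational cues.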

The researchers stop short of claiming their protocol solves the problem. Instead, they position it as a foundation for future work, and as a diagnostic tool: the stress-testing scenarios are designed to be reusable by other researchers evaluating new models or systems.

The study contributes to a growing body of work examining AI alignment in complex, real-world settings — moving beyond the relatively clean problem of satisfying a single user toward messier, more realistic conditions. Related work on multi-agent systems has explored how multiple AI models coordinate with each other, but the specific question of how one model serves multiple human users with different roles and interests has received comparatively little formal attention.

What This Means

Any organisation deploying AI assistants across teams or departments should treat multi-user handling as an active risk, not a solved problem — today's frontier models lack the principled architecture to manage conflicting authority, protect information boundaries, or coordinate efficiently across users without dedicated design work to address these gaps.