Artificial agents that develop their own communication protocol achieve 50.5% higher task efficiency than agents constrained to a human-like symbolic language, according to a new computational study that uses the finding to directly challenge one of cognitive science's foundational theories.

The paper, posted to arXiv (cs.AI), targets the Language of Thought (LoT) hypothesis — a theory associated most prominently with philosopher Jerry Fodor, which holds that mental processes operate over structured, language-like symbolic representations. The authors set up a scenario they call the 'AI Private Language' thought experiment: if optimal cognition were inherently symbolic, then forcing agents to use a symbolic system should not meaningfully hurt performance. Their results suggest otherwise.

What the Experiment Actually Tested

The researchers trained pairs of AI agents with multi-agent reinforcement learning (MARL) on a cooperative navigation task under partial observability — a standard benchmark in which agents must coordinate to reach targets without full knowledge of each other's state. One group of agents was free to develop its own emergent communication protocol; a second group was constrained to communicate using a pre-defined, human-comprehensible symbolic protocol designed to mimic the structured nature of human language.
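The preprint does not publish its architecture, but the contrast between the two conditions can be sketched in miniature. In the toy encoders below (all names, shapes, and the vocabulary are illustrative assumptions, not the paper's code), the unconstrained channel emits a dense real-valued vector, while the constrained channel collapses the same encoder output to one token from a fixed, human-readable vocabulary:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical human-readable vocabulary for the constrained condition.
VOCAB = ["go-left", "go-right", "wait", "target-seen"]

def emergent_message(obs, W):
    """Unconstrained channel: a dense vector the agents shape freely during training."""
    return np.tanh(W @ obs)

def symbolic_message(obs, W):
    """Constrained channel: the encoder output is collapsed to a single
    discrete token drawn from a fixed, auditable vocabulary."""
    logits = W @ obs
    return VOCAB[int(np.argmax(logits[: len(VOCAB)]))]

obs = rng.normal(size=8)      # partial observation of the environment
W = rng.normal(size=(8, 8))   # toy encoder weights (learned jointly via MARL in the study)

print(emergent_message(obs, W).shape)  # an opaque 8-dim vector
print(symbolic_message(obs, W))        # one human-readable token
```

The key structural difference is bandwidth and shape, not intelligence: the symbolic bottleneck forces every observation through a small discrete codebook, which is exactly the constraint the paper measures the cost of.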

The emergent-protocol agents achieved 50.5% higher efficiency than their symbolically constrained counterparts, according to the paper. The authors label this performance gap the Efficiency Attenuation Phenomenon (EAP) — the measurable cost of imposing human-like linguistic structure on a system whose natural computational tendencies are sub-symbolic.
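The preprint does not spell out how the 50.5% figure is computed. Under one common convention — relative gain over the constrained baseline — the arithmetic looks like this (the efficiency scores below are hypothetical values chosen only to reproduce the reported gap):

```python
def relative_gain(emergent_eff: float, symbolic_eff: float) -> float:
    """Percentage efficiency gain of emergent-protocol agents over the
    symbolically constrained baseline. One plausible convention; the
    preprint's exact metric may differ."""
    return 100.0 * (emergent_eff - symbolic_eff) / symbolic_eff

# Hypothetical scores: any pair with this ratio yields the reported gap.
print(relative_gain(1.505, 1.0))  # approximately 50.5
```

Note that the same underlying scores would read differently under other conventions (e.g., attenuation of the emergent score), which is one reason reviewers typically ask for the raw per-condition numbers.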

In the authors' framing, the EAP suggests that optimal collaborative cognition in these systems is not mediated by symbolic structures but is naturally coupled with sub-symbolic computation.

The paper is a preprint: its benchmark results are self-reported by the authors and have not yet undergone peer review.

Why the Language of Thought Is Worth Challenging

The LoT hypothesis has been influential across cognitive science, linguistics, and AI for decades. Its core claim — that thinking is, in some fundamental sense, a form of computation over mental symbols that share structural properties with language — has shaped everything from classical AI architectures to theories of human concept formation.

The hypothesis predicts that symbolic representations should be at least as expressive and efficient as non-symbolic ones for any genuine cognitive task. The EAP, if it holds up under scrutiny, introduces a computational counterexample: a class of tasks where imposing symbolic structure actively degrades performance rather than preserving it.

The authors are careful to frame this not as a wholesale refutation of LoT but as evidence for cognitive pluralism — the view that different tasks and different systems may require or naturally adopt different computational formats, some symbolic, some not.

What 'Inscrutable' Communication Actually Means Here

A key feature of the emergent protocol in the experiment is that it is inscrutable — neither human-readable nor straightforwardly interpretable. This is a common finding in emergent communication research: agents allowed to develop their own signaling systems under task pressure tend to produce compact, efficient codes that resist easy human interpretation.
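The difference in auditability can be made concrete. A symbolic protocol ships with an explicit token-to-meaning table a human can inspect; in an emergent protocol, whatever a message "means" lives only in the jointly trained speaker and listener weights. The sketch below is a hypothetical illustration of that asymmetry, not the paper's system:

```python
import numpy as np

rng = np.random.default_rng(1)

# Symbolic protocol: meaning is given by an explicit, auditable table.
SYMBOL_TABLE = {"go-left": (-1, 0), "go-right": (1, 0), "wait": (0, 0)}

# Emergent protocol: no table exists; "meaning" is implicit in the weights.
W_speaker = rng.normal(size=(4, 8))
W_listener = rng.normal(size=(2, 4))

def speak(obs):
    """Compress an observation into an opaque 4-dim code."""
    return np.tanh(W_speaker @ obs)

def listen(msg):
    """Only a listener trained alongside this speaker can decode the message."""
    return W_listener @ msg

obs = rng.normal(size=8)
msg = speak(obs)
action = listen(msg)

# A human auditor sees only a dense vector with no dictionary entry.
print(np.round(msg, 2))
```

An auditor can verify every entry of `SYMBOL_TABLE` by inspection; auditing `msg` requires reverse-engineering the listener, which is the interpretability burden the paper flags.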

This inscrutability is itself philosophically significant. If the most efficient form of machine cognition is one humans cannot directly read or audit, it raises immediate questions about AI transparency and alignment. The paper acknowledges this tension, flagging implications for AI ethics — specifically, the difficulty of overseeing systems whose internal representational formats are deliberately or incidentally opaque.

The authors are not arguing that inscrutability is desirable from a safety standpoint. Rather, they are pointing out that forcing legibility onto these systems may carry a measurable performance cost — a trade-off that engineers, ethicists, and policymakers will need to navigate explicitly.

Where This Fits in Emergent Communication Research

The study sits within a growing body of work on emergent communication in multi-agent systems. Prior work in OpenAI and DeepMind multi-agent environments has shown that agents develop surprisingly efficient private protocols, but explicitly framing such results against a specific philosophical hypothesis, LoT, is a relatively novel contribution.

The cooperative navigation setup used here is well-established in the MARL literature, which lends methodological credibility to the experimental design. However, the paper's broader philosophical conclusions rest on a single task domain, and it is not yet clear how generalizable the EAP is across different types of cognitive labor — particularly those that may inherently favor symbolic decomposition, such as logical reasoning or mathematical proof.

The preprint does not report ablations testing whether the performance gap persists across varied task complexity, agent architecture, or communication bandwidth — lines of follow-up work that reviewers are likely to request.

What This Means

If the Efficiency Attenuation Phenomenon survives peer review and replication, it will add rigorous computational weight to arguments that symbolic cognition is not a universal requirement for intelligent behavior — with direct consequences for both AI architecture design and the ethics of building systems whose reasoning resists human interpretation.