An AI system called GrandCode has become the first to consistently win live competitive programming contests, taking first place in three consecutive Codeforces rounds held in March 2026, according to a preprint published on arXiv.
Competitive programming has long been considered one of the last domains where elite humans held a reliable advantage over AI. Prior to GrandCode, the strongest result belonged to Google's Gemini 3 Deep Think, which reached 8th place — and notably, that result was not achieved under live competition conditions. GrandCode changes the picture entirely.
GrandCode is the first AI system to consistently outperform all human participants in live competitive programming contests, including those holding Codeforces' highest title, Legendary Grandmaster.
How GrandCode Works: Agents Talking to Agents
GrandCode is not a single model but a multi-agent reinforcement learning system: a collection of specialised AI modules that work together. These modules handle distinct parts of the problem-solving process: one proposes hypotheses about the problem structure, another writes solution code, a third generates test cases to probe that code, and a summarisation module distils findings across attempts.
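The division of labour described above can be sketched as a simple pipeline. Everything here is illustrative: the module names, interfaces, and the stubbed logic are assumptions for the sake of the sketch, not the paper's actual design.

```python
from dataclasses import dataclass, field

@dataclass
class Attempt:
    hypothesis: str
    code: str
    passed: bool

@dataclass
class Memory:
    attempts: list = field(default_factory=list)

    def summarise(self) -> str:
        # Summarisation module: distil findings across attempts.
        failed = sum(1 for a in self.attempts if not a.passed)
        return f"{len(self.attempts)} attempts, {failed} failed"

def propose_hypothesis(problem: str, memory: Memory) -> str:
    # Hypothesis module: guess the problem structure (stubbed).
    return f"greedy approach for: {problem}"

def write_solution(hypothesis: str) -> str:
    # Coding module: emit candidate code for the hypothesis (stubbed).
    return f"# solution based on: {hypothesis}"

def run_tests(code: str) -> bool:
    # Test-generation module: probe the candidate (stubbed as always-pass).
    return True

def solve(problem: str, max_attempts: int = 3) -> Attempt:
    # Orchestrate the modules, keeping shared memory across attempts.
    memory = Memory()
    for _ in range(max_attempts):
        h = propose_hypothesis(problem, memory)
        c = write_solution(h)
        attempt = Attempt(h, c, run_tests(c))
        memory.attempts.append(attempt)
        if attempt.passed:
            return attempt
    return memory.attempts[-1]
```

The point of the sketch is the shared `Memory` object: each module reads and writes it, which is what makes joint training across modules meaningful in the first place.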
The key insight is that these modules are trained jointly, not in isolation. The system improves through two complementary mechanisms: post-training (learning from accumulated experience offline) and online test-time RL (continuing to learn and adapt during the actual competition). This combination allows GrandCode to refine its approach even as it competes.
The researchers also introduce a new training algorithm called Agentic GRPO, designed for the challenges that arise when reinforcement learning is applied to multi-stage agent pipelines. Standard RL methods struggle in this setting for two reasons: rewards are delayed, since you don't know whether an early decision paid off until several stages later, and the trajectories the system learns from were generated by earlier versions of its own policy, a mismatch known as off-policy drift. Agentic GRPO is the team's answer to both.
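For context, standard GRPO (Group Relative Policy Optimization) handles delayed rewards by giving every step of a rollout the same advantage, normalised against a group of rollouts for the same problem, and limits off-policy drift with a PPO-style clipped ratio. The sketch below shows only these two standard ingredients; how Agentic GRPO extends them to multi-stage pipelines is the paper's contribution and is not reproduced here.

```python
import statistics

def group_relative_advantages(rewards):
    # Each rollout earns a single delayed reward; its advantage is that
    # reward normalised against the group sampled for the same problem.
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards) or 1.0  # guard against zero variance
    return [(r - mean) / std for r in rewards]

def clipped_objective(ratio, advantage, eps=0.2):
    # PPO-style clipping caps the update when the policy that generated
    # the data has drifted from the current one (off-policy drift).
    clipped = max(min(ratio, 1 + eps), 1 - eps)
    return min(ratio * advantage, clipped * advantage)

# Four rollouts of one problem: two solved (reward 1.0), two failed.
advs = group_relative_advantages([1.0, 0.0, 0.0, 1.0])
```

Because the advantage is relative to the group, no separate value network is needed, which is part of why GRPO-family methods are attractive for large agent systems.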
Three Wins in a Row: The Live Competition Results
The paper reports that GrandCode placed first in the three most recent Codeforces rounds: Round 1087 on March 21, 2026, Round 1088 on March 28, and Round 1089 on March 29. All three were live contests, meaning GrandCode competed in real time under the same conditions as human participants, with no special accommodations.
This distinction matters. Previous AI milestones in competitive programming — including the Gemini result — were achieved outside live competition settings, raising questions about whether the performance would hold under realistic time pressure and against active human competition. GrandCode's wins are claimed under live conditions, which the authors argue makes this a categorically stronger result.
Codeforces is one of the world's most prestigious competitive programming platforms, attracting participants holding titles such as International Master and Grandmaster, awarded on the basis of contest rating. Placing first in this field consistently, not just once, is what the authors point to as the system's defining achievement.
What the Research Claims — and What Needs Scrutiny
It is worth noting that these results are self-reported in a preprint that has not yet undergone peer review. The paper had not been independently verified by Codeforces or any external body at the time of publication. Readers should treat the specific competition placements as the authors' claims pending third-party confirmation.
The broader technical claims — particularly around Agentic GRPO and the system's architecture — will face scrutiny from the research community in the coming weeks. The core algorithmic contribution, if it holds up, addresses a genuine and well-documented problem in applying RL to complex, multi-step agent systems. Whether the specific design choices are optimal or whether simpler approaches could achieve similar results is an open question that future work will explore.
The paper does not disclose which underlying language model or models power GrandCode's individual agents, which limits independent analysis of where the performance gains originate — from the base model, the agent architecture, or the training procedure.
What This Means
If GrandCode's results hold up to independent verification, competitive programming, one of the last domains where human experts held a reliable edge over AI, has fallen. The architecture that achieved it also points toward a broader shift in how AI systems are built: not as single models, but as coordinated networks of specialised agents trained to improve together.