The 2026 AI Index Report, published this week by Stanford University's Institute for Human-Centered Artificial Intelligence (HAI), reports that the performance gap between the top American and Chinese AI models has narrowed to 2.7%, down from a range of 17.5 to 31.6 percentage points across major benchmarks in May 2023.

According to the report, Anthropic's Claude Opus 4.6 leads the Arena leaderboard with a score of 1,503 as of March 2026, while ByteDance's Dola-Seed-2.0-Preview sits at 1,464. The report states that DeepSeek's R1 reasoning model matched the top US model in February 2025, and that American and Chinese models have traded the lead multiple times since. TNW summarized the findings from the 423-page Stanford document. Source: https://thenextweb.com/news/stanford-ai-index-2026-china-us-performance-gap

Investment disparity and its caveats

The Stanford report states that US private AI investment reached $285.9 billion in 2025, compared with $12.4 billion in China — a ratio of roughly 23 to 1. California alone accounted for $218 billion, more than 75% of the US total, according to the report.
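The headline ratio follows from simple division of the reported totals. A quick sanity check in Python, using only the figures above:

```python
# Reported 2025 private AI investment figures from the Stanford AI Index.
us_total = 285.9      # US, $ billions
china_total = 12.4    # China, $ billions
california = 218.0    # California alone, $ billions

ratio = us_total / china_total
ca_share = california / us_total

print(f"US-to-China ratio: {ratio:.1f} to 1")           # about 23 to 1
print(f"California share of US total: {ca_share:.0%}")  # about 76%
```

The arithmetic confirms both claims: roughly 23 to 1, with California contributing just over three-quarters of the US total.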

The report notes that private investment data "likely understates" China's actual AI spending, because the Chinese government channels resources through guidance funds and state-initiated investment vehicles that do not appear in private capital databases. The authors caution that the headline spending ratio may therefore overstate the true gap.

On model output, the report records that US companies produced 50 notable AI models in the most recent reporting period, compared with 30 from China. China's count doubled from 15 the previous year, the report says, while the US count grew more modestly. The US hosts 5,427 data centres, more than ten times as many as any other country, according to Stanford's figures.

Where the report says China leads

Stanford's authors report that Chinese researchers produced 23.2% of global AI publications and 20.6% of citations in the reporting period, compared with 12.6% for the US. Chinese entities filed 69.7% of all AI patents worldwide, the report states.

According to the report, the number of AI scholars moving to the United States has dropped 89% since 2017, with 80% of that decline occurring in the past year alone.

The report also records that China installed 295,000 industrial robots in the most recent reporting period, compared with 34,200 in the United States. China's electricity reserve margin has not dropped below 80%, the authors write, while they identify the US power grid as a potential infrastructure bottleneck for AI growth.

On talent, the report describes the drop in AI researcher migration to the US as "precipitous." It ranks Switzerland first in the world for AI researchers and developers per capita.

Benchmark methodology and what a 2.7% gap means

The 2.7% figure is anchored to Chatbot Arena, a crowdsourced human-preference benchmark in which users compare paired model outputs. DeepBrief was unable to obtain independent researcher comment on the Stanford report's methodology before publication and will update this article if outside analysts weigh in on how Arena score differences map to capability gaps in deployed systems.
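For context, Arena uses an Elo-style rating, under which a score difference maps to an expected head-to-head win rate via a logistic curve on a 400-point scale. The sketch below uses the scores the report gives; treating the 2.7% figure as the relative score difference between the two leaders is our assumption, since the report's exact derivation is not spelled out here.

```python
# Expected win probability under a standard Elo model
# (logistic curve, 400-point scale, as used in Elo-style ratings).
def elo_win_probability(score_a: float, score_b: float) -> float:
    return 1.0 / (1.0 + 10 ** ((score_b - score_a) / 400))

claude_opus = 1503   # Claude Opus 4.6 (report, March 2026)
dola_seed = 1464     # Dola-Seed-2.0-Preview (report, March 2026)

win_rate = elo_win_probability(claude_opus, dola_seed)
gap = (claude_opus - dola_seed) / dola_seed  # assumed source of the 2.7%

print(f"Expected head-to-head win rate: {win_rate:.1%}")  # about 56%
print(f"Relative score difference: {gap:.1%}")            # 2.7%
```

On this reading, a 39-point Arena gap means the leading model would be preferred in roughly 56 of 100 pairwise comparisons: a modest edge in aggregate human preference, not necessarily a large difference in deployed capability.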

Arena scores reflect aggregate human preferences on open-ended prompts and are not a direct measure of performance on held-out technical benchmarks such as MMLU, GPQA, or SWE-bench. The Stanford report itself documents divergent results across benchmarks: on SWE-bench, a software engineering benchmark, the report says model performance rose from 60% to near 100% in a single year. On graduate-level science questions, the report records model accuracy of 93%, above the expert human validator baseline of 81.2%.

Readers tracking related evaluation questions can see DeepBrief's earlier coverage of lightweight graph parsers outperforming LLMs on complex relation extraction and what makes preference training data work for AI reasoning, both of which bear on how benchmark numbers translate into capability claims.

The "jagged frontier"

The report describes what it calls a "jagged frontier" of capability. It states that the top model reads analog clocks correctly only 50.1% of the time. Robotic manipulation systems achieve 89.4% success in simulation, according to the report, but only 12% in real household tasks.

The authors also write that nearly half of more than 500 clinical AI studies reviewed used exam-style questions rather than real patient data, and that only 5% used actual clinical records. This echoes findings in DeepBrief's reporting on systematic failures when AI agents serve multiple users, where benchmark success did not translate to deployment reliability.

Adoption, trust, and regulation

The Stanford report states that generative AI reached 53% population adoption within three years of launch. Eighty-eight percent of organizations report using AI, according to the report, and four in five university students use generative AI tools. The US ranks 24th globally in adoption at 28.3%, behind Singapore at 61% and the UAE at 54%, the authors write.

On trust, the report states that 31% of Americans trust their government to regulate AI, the lowest figure among countries surveyed and below the global average of 54%. It records that 73% of AI experts expect a positive impact on jobs, compared with 23% of the general public.

The report says 47 countries now have active AI legislation, but only 12 have enforcement mechanisms. Documented enforcement actions rose from 43 in 2024 to 156 in 2025, according to Stanford's figures.

Environmental footprint

The report states that training xAI's Grok 4 produced 72,816 tonnes of CO2 equivalent, which it compares to the emissions of driving 17,000 cars for a year. AI data centre power capacity reached 29.6 gigawatts globally, the report says.
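The car comparison implies a per-vehicle figure that can be checked directly. A quick back-of-the-envelope calculation, assuming the report's totals:

```python
total_co2_tonnes = 72_816  # reported training emissions for Grok 4, tonnes CO2e
cars = 17_000              # reported car-year equivalent

per_car = total_co2_tonnes / cars
print(f"Implied emissions per car-year: {per_car:.1f} tonnes CO2e")  # about 4.3
```

The implied 4.3 tonnes per car per year is close to commonly cited estimates of a typical passenger vehicle's annual CO2 emissions, so the comparison is internally consistent.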

The full 2026 AI Index Report is published by Stanford HAI at hai.stanford.edu/ai-index/2026-ai-index-report. DeepBrief has not independently verified the underlying benchmark data or investment figures cited in the report.