Google DeepMind has introduced a cognitive framework providing structured, measurable criteria for tracking progress toward artificial general intelligence, paired with a public Kaggle hackathon designed to crowdsource the evaluations needed to put that framework into practice.
The announcement addresses one of the most contested methodological problems in AI research: how to determine, with scientific precision, whether a system is genuinely advancing toward AGI or merely improving on narrow, task-specific metrics. Existing benchmarks — including those measuring language comprehension, mathematical reasoning, and coding ability — have repeatedly been criticised for being gameable, saturating too quickly, or failing to capture the breadth of capabilities implied by the term "general intelligence."
Decomposing Intelligence Into Measurable Dimensions
DeepMind's framework approaches the problem by decomposing cognitive progress into defined dimensions, allowing researchers to situate a given AI system along a structured continuum rather than applying a binary AGI or non-AGI label. The specific dimensions draw on cognitive science and prior theoretical work on intelligence measurement, mapping AI capabilities against a broader model of what general reasoning actually requires.
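To make the idea of a "structured continuum" concrete: since DeepMind has not published the framework's actual dimensions or scoring methodology, the following Python sketch is purely illustrative, with hypothetical dimension names and scores, showing how a multi-dimensional capability profile differs from a binary AGI verdict.

```python
from dataclasses import dataclass

# Hypothetical dimension names and scores for illustration only;
# DeepMind has not disclosed the framework's real dimensions or scoring.
@dataclass
class CapabilityProfile:
    """A system's position along several cognitive dimensions, each scored 0-1."""
    scores: dict[str, float]

    def summary(self) -> str:
        # Report per-dimension progress rather than a single AGI / non-AGI label.
        return ", ".join(f"{dim}: {score:.2f}" for dim, score in sorted(self.scores.items()))

profile = CapabilityProfile(scores={
    "abstract_reasoning": 0.62,      # illustrative values only
    "long_horizon_planning": 0.41,
    "transfer_learning": 0.55,
})
print(profile.summary())
```

The point of the structure, under this reading, is that two systems can be compared dimension by dimension over time, rather than argued about as "AGI or not".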
DeepMind has not characterised any existing system — including its own — as having reached AGI under the new framework. Instead, the framework is positioned as a research and evaluation instrument: a shared vocabulary and methodology the field can use to make comparisons across systems and time periods more meaningful.
"Measuring progress toward AGI" is a phrase the field has used loosely for years, but DeepMind's publication represents one of the more formal attempts by a major AI laboratory to operationalise it.
Other organisations, including OpenAI and Anthropic, have published their own internal definitions and capability thresholds for AGI, though these vary significantly in structure and in the criteria they emphasise.
A Hackathon to Distribute the Burden of Benchmark Construction
The Kaggle hackathon component is designed to extend the framework's reach beyond DeepMind's internal teams. By inviting the global data science and AI research community to build evaluations consistent with the framework, DeepMind is effectively distributing the task of benchmark construction — a recognition that no single team can anticipate all the ways general intelligence might manifest or fail in AI systems.
Hackathon participants will design and submit evaluation tasks testing the specific cognitive dimensions identified in the framework. Winning submissions could inform future benchmark suites used to assess frontier AI models, giving the effort practical downstream relevance beyond the competition itself.
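DeepMind has not described the actual Kaggle submission format, so the sketch below is a hypothetical illustration of what a framework-aligned evaluation task could look like in Python: a prompt and a scoring rule tied to one named cognitive dimension. The dimension name, task, and scorer are all invented for the example.

```python
from typing import Callable

# Purely illustrative; not the actual hackathon submission format.
# An evaluation task ties a prompt and a scoring rule to one cognitive dimension.
class EvalTask:
    def __init__(self, dimension: str, prompt: str, scorer: Callable[[str], float]):
        self.dimension = dimension   # which framework dimension this task probes
        self.prompt = prompt         # input presented to the model under test
        self.scorer = scorer         # maps the model's response to a score in [0, 1]

    def run(self, model_fn: Callable[[str], str]) -> float:
        # model_fn stands in for whatever system is being evaluated.
        return self.scorer(model_fn(self.prompt))

# Toy example: a transfer-style word problem with an exact-match scorer.
task = EvalTask(
    dimension="transfer_learning",
    prompt="If 3 glorps equal 2 blims, how many blims are 12 glorps?",
    scorer=lambda response: 1.0 if "8" in response else 0.0,
)
print(task.run(lambda prompt: "12 glorps are 8 blims."))  # -> 1.0
```

Whatever the real format turns out to be, the practical challenge for entrants is the scoring rule: it has to reward the underlying capability rather than surface patterns a model could exploit.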
This community-based approach mirrors strategies used in other high-stakes scientific measurement efforts, where independent replication and diverse test construction reduce the risk of Goodhart's Law effects — the tendency for a measure to lose its validity once it becomes a target. If the AI field converges on a shared set of AGI evaluations, developers face stronger incentives to optimise for those specific tests rather than for the underlying capabilities the tests are meant to capture. DeepMind's framework does not claim to resolve this tension entirely, but structuring evaluation development as a community process introduces at least some degree of independence from the organisations whose systems will eventually be assessed.
Why Governments and Researchers Are Paying Attention
The timing of the announcement reflects broader industry dynamics. The accelerating pace of capability improvement in large language models and multimodal systems has intensified public and regulatory interest in AGI timelines. Governments in the United States, United Kingdom, and European Union have each referenced AGI — or systems approaching it — in recent AI policy documents, though most acknowledge the absence of a consensus scientific definition.
Without agreed-upon measurement tools, debates about whether current systems are "close" to AGI remain largely rhetorical. DeepMind's framework is an explicit attempt to shift that conversation onto more empirical ground, providing external researchers, policymakers, and the public with a consistent basis for interpreting capability claims.
The framework's reception will depend in part on whether competing laboratories and independent academic researchers adopt it as a shared standard or treat it as one proposal among many. AI evaluation has historically been fragmented, with individual organisations releasing proprietary benchmarks alongside published research, making cross-system comparisons difficult.
DeepMind has not yet disclosed the full technical details of the framework's cognitive dimensions or its scoring methodology — these are expected to accompany a formal academic publication. The Kaggle hackathon provides a practical mechanism for stress-testing whether the framework can actually be operationalised before those details are finalised or widely adopted.
What This Means
If major laboratories and independent researchers adopt DeepMind's framework as a shared standard, the AI field will have its first broadly accepted scientific basis for evaluating — and publicly debating — how close any given system actually is to AGI.
