A research team has developed an AI system capable of inferring exact mathematical equations from images of physical fields — recovering the kind of analytical solutions that scientists normally derive through prolonged manual work — and demonstrated that an open-weight model can outperform leading closed-source AI on the task.

The capability, called visual-to-symbolic analytical solution inference (ViSA), addresses a gap in AI-assisted science: most current models can describe what a visualisation looks like, but cannot extract the underlying mathematical law it encodes. The paper, posted to arXiv in April 2025, focuses on two-dimensional linear steady-state physical fields — think temperature distributions, electrostatic potentials, or laminar fluid flows — where the governing equations are well-defined but visually hidden.

From Pixels to Equations: What the System Actually Does

Given an image of a physical field — a colour-mapped plot showing how a quantity varies across space — plus optional images of its first-order derivatives and a small amount of metadata, the team's model, ViSA-R2, must output a single, fully specified mathematical expression. The expression is produced in SymPy, a Python library for symbolic mathematics, meaning it is executable and verifiable, not just a textual description.
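To make the "executable and verifiable" point concrete, here is a minimal sketch of the kind of output the paper describes. The expression itself is illustrative (a textbook Laplace solution, not taken from the paper), but it shows why symbolic output matters: the prediction can be checked against the governing equation directly.

```python
import sympy as sp

x, y = sp.symbols("x y")

# Illustrative candidate solution for a 2D steady-state field:
# u(x, y) = sin(pi*x) * sinh(pi*y) / sinh(pi)
u = sp.sin(sp.pi * x) * sp.sinh(sp.pi * y) / sp.sinh(sp.pi)

# Because the output is symbolic, it is directly verifiable:
# check that u satisfies Laplace's equation, u_xx + u_yy = 0.
laplacian = sp.diff(u, x, 2) + sp.diff(u, y, 2)
print(sp.simplify(laplacian))  # -> 0
```

A textual description of the same plot ("a sinusoidal pattern that grows toward the top edge") admits no such check; the symbolic form does.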

This is harder than it sounds. The model must not only identify the general shape of the solution — what the researchers call the ansatz, or structural hypothesis — but also pin down every numerical constant within it. A solution that gets the structure right but the constants wrong is physically useless.
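The ansatz/constants split can be sketched in a few lines. This is our illustration, not the paper's code: the solution family and the boundary values "read off the image" are hypothetical, but the two-step logic — fix the structure, then solve for the constants — is the one described above.

```python
import sympy as sp

x, y, A, k = sp.symbols("x y A k", positive=True)

# Step 1 (ansatz): a structural hypothesis — here, a separable
# Laplace solution family with free constants A and k.
ansatz = A * sp.sin(k * x) * sp.sinh(k * y)

# Step 2 (constants): hypothetical observations read off the image.
# The field's first zero along x sits at x = 1, which fixes k = pi;
# a sampled value u(1/2, 1) = 1 then pins down A.
sol_k = sp.pi
eq = sp.Eq(ansatz.subs({k: sol_k, x: sp.Rational(1, 2), y: 1}), 1)
sol_A = sp.solve(eq, A)[0]
print(sol_A)
```

Getting step 1 right but step 2 wrong yields an expression with the correct shape and the wrong values everywhere — the failure mode the paragraph above calls physically useless.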


To handle this, the team designed a reasoning pipeline called self-verifying, solution-centric chain-of-thought. It walks through four stages modelled on how a physicist would approach the problem: recognising structural patterns in the image, forming a hypothesis about which family of solutions fits, deriving the specific parameters, and then checking the result for internal consistency. This staged approach is intended to reduce the hallucination of plausible-but-wrong equations.
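The final consistency-check stage can be pictured as a numerical residual test. The sketch below is our reading of what such a check could look like, not the paper's implementation: a candidate expression is evaluated on a grid and compared against values recovered from the field image (stood in for here by synthetic data) before being accepted.

```python
import numpy as np
import sympy as sp

x, y = sp.symbols("x y")

# Candidate produced by the earlier reasoning stages (illustrative).
candidate = sp.sin(sp.pi * x) * sp.sinh(sp.pi * y) / sp.sinh(sp.pi)

# Compile the symbolic expression into a fast numerical function.
f = sp.lambdify((x, y), candidate, "numpy")
xs, ys = np.meshgrid(np.linspace(0, 1, 50), np.linspace(0, 1, 50))

# Stand-in for field values recovered from the input image.
observed = np.sin(np.pi * xs) * np.sinh(np.pi * ys) / np.sinh(np.pi)

# Self-verification: reject the candidate if it disagrees with the field.
residual = np.max(np.abs(f(xs, ys) - observed))
assert residual < 1e-9, "candidate fails self-verification"
```

A hallucinated but plausible-looking equation would typically fail this kind of check, which is the point of making verification an explicit stage rather than trusting the derivation.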

ViSA-Bench: A Standardised Test for a New Capability

Alongside the model, the researchers released ViSA-Bench, a synthetic benchmark designed to evaluate this capability systematically. It covers 30 linear steady-state physical scenarios with verified analytical and symbolic ground-truth annotations, making it possible to score predictions objectively.

The benchmark evaluates models on three dimensions: numerical accuracy (are the constants right?), expression-structure similarity (does the equation have the correct mathematical form?), and character-level accuracy (does the symbolic output match the ground truth string?). The combination is intentional — each metric catches different failure modes that the others might miss.
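The complementarity of the three metrics can be sketched with a prediction that has the right structure but a slightly wrong constant. The scoring functions below are our crude stand-ins (the paper's exact formulas may differ), but they show how each dimension catches what the others miss.

```python
import difflib
import numpy as np
import sympy as sp

x, y = sp.symbols("x y")
truth = sp.sin(sp.pi * x) * sp.exp(-sp.pi * y)
pred = sp.sin(sp.pi * x) * sp.exp(-3.14 * y)  # right form, constant off

# 1. Numerical accuracy: mean absolute error on a sample grid.
f_t = sp.lambdify((x, y), truth, "numpy")
f_p = sp.lambdify((x, y), pred, "numpy")
xs, ys = np.meshgrid(np.linspace(0, 1, 20), np.linspace(0, 1, 20))
num_err = np.mean(np.abs(f_t(xs, ys) - f_p(xs, ys)))

# 2. Expression-structure similarity: crude Jaccard overlap of the
#    node types appearing in each expression tree.
def heads(e):
    return {type(a).__name__ for a in sp.preorder_traversal(e)}

struct_sim = len(heads(truth) & heads(pred)) / len(heads(truth) | heads(pred))

# 3. Character-level accuracy: string similarity of the symbolic forms.
char_acc = difflib.SequenceMatcher(None, sp.srepr(truth), sp.srepr(pred)).ratio()

print(num_err, struct_sim, char_acc)
```

Here the numerical error is small and the structure score high, yet the prediction is not the ground truth — exactly the case where the character-level comparison flags a discrepancy the other two metrics would let slide.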

The benchmark is described as VLM-ready, meaning it is formatted for direct use with vision-language models, lowering the barrier for other research groups to run their own evaluations under a standardised protocol.

Open Model Performance on the Benchmark

The core model, ViSA-R2, is built on Qwen3-VL, an open-weight vision-language model with 8 billion parameters developed by Alibaba. According to the authors' evaluation, ViSA-R2 outperforms both strong open-source baselines and the closed-source frontier vision-language models it was tested against. These benchmark results are self-reported by the authors and have not yet been independently replicated.

The choice of an 8B open model is significant. It suggests the performance gains come primarily from the specialised reasoning pipeline and training approach rather than raw model scale. If that holds under independent scrutiny, it points toward a practical path: domain-specific reasoning architectures may be more efficient than simply using larger general-purpose models for scientific tasks.

The paper does not name which closed-source models were tested in the comparison, referring instead to "evaluated closed-source frontier VLMs" — a detail that independent reviewers will likely want clarified.

Why Scientific Visualisation Is a Hard Problem for AI

Current vision-language models are generally trained to describe images in natural language or answer factual questions about them. Scientific visualisations present a different challenge: the meaningful information is encoded in gradients, contours, symmetries, and spatial relationships that must be translated into formal mathematical structure, not words.

Previous approaches to equation discovery — a field sometimes called symbolic regression — typically operate on raw numerical data rather than images. ViSA-R2 is designed to work when only a rendered visualisation is available, which is often the case when reading published research papers, textbooks, or archived datasets where the underlying data is no longer accessible.

This positions the technology as a potential tool for automated literature mining: feeding scientific figures into a system that can extract and formalise the physics encoded in them, rather than requiring a human expert to do so.

What This Means

If ViSA-R2's performance holds under independent evaluation, it represents a concrete step toward AI systems that can extract quantitative scientific knowledge directly from the visual record of published research — compressing work that currently takes expert hours into seconds.