A new academic paper demonstrates that an intermediary can profit by intelligently routing AI queries across competing model providers, achieving net margins of up to 40% — without training or owning a single model.
The research, published on arXiv under cs.AI and titled "Computational Arbitrage in AI Model Markets," introduces a framework borrowed directly from financial markets: an "arbitrageur" sits between customers and model providers, allocating inference budgets across providers to deliver verifiable results at a lower price than any single provider offers. The study uses GPT-4 mini and DeepSeek v3.2 as representative models, tested against SWE-bench, a widely used benchmark for autonomous GitHub issue resolution.
How the Arbitrage Works
The core mechanism is straightforward. Customers submit a problem and specify the maximum they will pay for a verified solution. The arbitrageur selects which model — or combination of models — to use, optimizing for cost while meeting the verification threshold. Because different models have different strengths and price points, an intermediary with visibility across providers can consistently outperform what any single provider offers at a given price.
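The selection logic described above can be sketched in a few lines. This is an illustrative toy, not the paper's actual strategy: the provider names, prices, and success rates below are invented, and the model assumes a provider is simply retried until its output verifies.

```python
# Hypothetical routing sketch: pick the provider with the lowest expected
# cost per *verified* solution, then check it clears the customer's ceiling.
# All provider names, prices, and success probabilities are illustrative.

def expected_cost(price_per_attempt: float, p_success: float) -> float:
    # With independent retries until verification, expected attempts = 1 / p,
    # so expected spend per verified solution = price / p.
    return price_per_attempt / p_success

def route(providers: dict, max_price: float):
    """Return (name, expected_cost) of the cheapest viable provider, or None."""
    best = min(providers.items(), key=lambda kv: expected_cost(*kv[1]))
    name, (price, p) = best
    cost = expected_cost(price, p)
    # Only take the job if the expected cost leaves room under the bid.
    return (name, cost) if cost <= max_price else None

providers = {
    "frontier": (1.00, 0.90),  # expensive per attempt, usually verifies
    "budget":   (0.20, 0.40),  # cheap per attempt, often fails
}
print(route(providers, max_price=0.80))
```

Under these made-up numbers the cheap model wins (expected cost 0.20 / 0.40 = 0.50 versus roughly 1.11 for the frontier model), which is the intuition behind the margin: the arbitrageur charges up to the customer's ceiling while paying the cheaper expected cost.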
In the SWE-bench case study, simple arbitrage strategies produced profit margins of up to 40%. The paper also tests "robust" strategies designed to generalize beyond a single benchmark domain, finding these remain profitable across varied problem types.
An arbitrageur can undercut the market and create a competitive offering with no model-development risk.
This is a significant finding for the AI industry's commercial structure. It suggests that a new category of business, the AI inference broker, need not make the billions of dollars of compute and research investment required to build frontier models, yet can still capture meaningful margin from the market those models created.
The Distillation Angle Adds a Strategic Wrinkle
The paper goes further, examining model distillation as a related arbitrage vector. Distillation — the process of training a smaller, cheaper model to mimic a larger one — already underpins products from several major labs. The researchers argue that distillation can itself be an arbitrage strategy: by using outputs from an expensive "teacher" model to train a cheaper alternative, an arbitrageur can eventually reduce their reliance on the teacher entirely, eroding that provider's revenue over time.
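The economics of that erosion reduce to a break-even calculation: a one-off distillation outlay pays off once the per-query saving has covered it. The sketch below is a back-of-the-envelope illustration under invented numbers, not an analysis from the paper.

```python
# Hypothetical break-even sketch for distillation-as-arbitrage.
# All costs here are illustrative assumptions, not figures from the paper.

def breakeven_queries(teacher_cost: float,
                      student_cost: float,
                      distill_cost: float) -> float:
    """Queries after which serving the distilled student beats the teacher.

    teacher_cost / student_cost: price per served query on each model.
    distill_cost: one-off cost of teacher queries for training data + training.
    """
    saving_per_query = teacher_cost - student_cost
    if saving_per_query <= 0:
        return float("inf")  # the student must be cheaper to serve at all
    return distill_cost / saving_per_query

# e.g. $0.01/query teacher, $0.001/query student, $50k one-off distillation
print(breakeven_queries(0.01, 0.001, 50_000))
```

With these assumed numbers the crossover sits around 5.6 million queries; past that point every query routed to the student is revenue the teacher's provider no longer sees, which is the erosion dynamic the researchers describe.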
This finding has direct implications for how frontier model providers think about API access. Providing cheap, unrestricted query access may inadvertently fund competitors building distilled substitutes — a tension that companies including OpenAI and Anthropic have already begun addressing through terms of service restrictions.
Market Structure Effects Cut Both Ways
The paper's economic analysis is notably nuanced. It does not frame arbitrage purely as a threat to model providers. Instead, it identifies a dual effect on market structure.
On one hand, multiple competing arbitrageurs drive down consumer prices and compress the marginal revenue of model providers — the classic outcome of intermediary competition in any commodity market. Providers that have relied on pricing power derived from capability differentiation could see that advantage eroded.
On the other hand, arbitrage also reduces market segmentation by enabling smaller or newer model providers to generate revenue earlier than they otherwise could. An arbitrageur might route a subset of queries to a lower-cost, less-capable model where that model's output is sufficient — giving that provider revenue that a direct customer relationship would not have generated. This could lower barriers to entry across the model provider tier.
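One simple way to realize that routing is a cascade: attempt the cheap model first and escalate to the stronger one only when verification fails. The following sketch uses invented prices and success rates, and assumes one cheap attempt followed by retries on the strong model until the output verifies.

```python
# Hypothetical cascade sketch: the cheap provider earns revenue on every
# query, and the strong provider is paid only on escalation. Numbers are
# illustrative assumptions, not figures from the paper.

def cascade_expected_cost(cheap_price: float, cheap_p: float,
                          strong_price: float, strong_p: float) -> float:
    # Always pay for one cheap attempt; on failure (prob 1 - cheap_p),
    # fall back to the strong model, retried until it verifies.
    return cheap_price + (1 - cheap_p) * (strong_price / strong_p)

strong_only = 1.00 / 0.90            # expected cost using the strong model alone
cascade = cascade_expected_cost(0.20, 0.40, 1.00, 0.90)
print(strong_only, cascade)
```

Under these assumptions the cascade is cheaper than using the strong model alone (roughly 0.87 versus 1.11 per verified solution), and the smaller provider collects revenue on queries it would never have won in a direct customer relationship.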
The SWE-bench Case Study as Proof of Concept
The choice of SWE-bench as the empirical testbed is deliberate. The benchmark involves resolving real GitHub issues, and critically, solutions are verifiable — code either passes the test suite or it does not. Verifiability is load-bearing for the entire arbitrage framework: without an objective way to confirm that the delivered solution meets the customer's standard, the arbitrageur cannot reliably guarantee quality or price accordingly.
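The gating role of verification can be shown in miniature. In this sketch the "test suite" is a plain callable standing in for running a real test harness against a candidate patch; the pricing and the example check are illustrative, not the paper's setup.

```python
# Minimal sketch of why verifiability is load-bearing: the arbitrageur
# charges only for a solution that passes an objective check.

def deliver(solution, test_suite, price: float):
    """Charge `price` only if the solution passes verification, else nothing."""
    return price if test_suite(solution) else None

# Illustrative check: does the candidate function sort a list correctly?
suite = lambda fn: fn([3, 1, 2]) == [1, 2, 3]

print(deliver(sorted, suite, 0.50))           # verified: customer is charged
print(deliver(lambda xs: xs, suite, 0.50))    # fails: no charge, no delivery
```

Without an objective pass/fail signal like this, the intermediary cannot price against a guarantee, which is why the framework currently fits code and proofs better than open-ended generation.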
The researchers acknowledge this constraint. Arbitrage, as they define it, is most viable in domains where outputs can be checked programmatically. This currently includes coding tasks, mathematical proofs, and certain data processing workflows. It is less immediately applicable to open-ended generation tasks such as marketing copy or strategic advice, where quality judgment remains subjective.
This limitation also defines the near-term opportunity. As AI applications increasingly move toward agentic and verifiable workflows — a direction most major labs are actively pursuing — the addressable market for computational arbitrage expands in parallel.
A New Business Model Hiding in Plain Sight
The infrastructure for this kind of routing already exists in nascent form. Services such as OpenRouter aggregate access to multiple model APIs and allow developers to select models by cost or capability. The paper's contribution is to formalize the economic logic underpinning such services and demonstrate that optimized arbitrage strategies, rather than simple manual selection, can generate substantial and consistent margins.
The research does not quantify the absolute revenue potential of this market or project adoption timelines, but it establishes the theoretical and empirical basis for treating computational arbitrage as a legitimate and scalable business model.
What This Means
Any company building on top of AI model APIs, and any model provider selling access to them, now has empirical evidence that sophisticated intermediaries can profitably extract value from the gap between competing models. That pressure will bear on both pricing strategies and API access policies across the industry.