The performance gains that once made each new generation of large language models a headline event are flattening out, and organizations waiting for the next breakthrough from a foundation model provider may be waiting too long.
For the past several years, enterprises have followed a straightforward AI strategy: wait for OpenAI, Google, Anthropic, or another frontier lab to release a more capable general-purpose model, then layer applications on top. That approach delivered results when each model generation brought 10x improvements in reasoning and coding capability. According to MIT Technology Review, those jumps have now compressed into incremental gains — with one significant exception.
Where Step-Change Improvements Still Happen
Domain-specialized AI — models trained or fine-tuned on industry-specific data, workflows, and terminology — continues to deliver the kind of dramatic capability leaps that general models no longer produce. The gap between a general-purpose model answering a clinical question and one fine-tuned on a hospital system's patient records, treatment protocols, and physician notes is not marginal. It is, in many real-world deployments, the difference between a useful tool and a trusted one.
This distinction matters because it reframes the question organizations should be asking. The relevant comparison is no longer GPT-4 versus GPT-5. It is a generic model versus a model fused with an organization's own institutional knowledge.
Domain-specialized intelligence is now where step-function improvements are still the norm — and that changes what AI strategy has to look like.
Customization as Infrastructure, Not Experiment
The shift MIT Technology Review describes is architectural, not merely tactical. Treating AI customization as an experiment — a proof-of-concept to be evaluated alongside other innovation initiatives — misreads the competitive dynamic now taking shape. Organizations that have embedded their proprietary data into model behavior are not simply getting better answers. They are creating systems that understand context, terminology, and decision logic that no public model can replicate.
This has practical consequences for enterprise AI investment. Fine-tuning, retrieval-augmented generation (RAG), and, increasingly, full model customization pipelines depend on infrastructure choices made early. Data governance, labeling workflows, and model evaluation frameworks are not features that can be bolted on after a product ships. They are foundations that either exist or do not.
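To make the infrastructure point concrete, here is a minimal, hypothetical sketch of one small piece of a customization pipeline: turning expert-reviewed internal records into supervised fine-tuning examples. The record schema, field names, and chat-style output format are illustrative assumptions, not a prescription for any particular platform.

```python
# Minimal, hypothetical sketch: converting reviewed internal records into
# chat-style fine-tuning examples. Field names (question, approved_answer,
# reviewer) are illustrative; the real schema depends on an organization's
# own data governance and labeling workflow.
import json

internal_records = [
    {
        "question": "What is our escalation path for a P1 incident?",
        "approved_answer": "Page the on-call lead, open a bridge, and notify the duty manager.",
        "reviewer": "ops-team",  # populated only after expert review
    },
]

def to_training_example(record):
    """Wrap a reviewed record as a user/assistant message pair."""
    return {
        "messages": [
            {"role": "user", "content": record["question"]},
            {"role": "assistant", "content": record["approved_answer"]},
        ]
    }

with open("train.jsonl", "w") as f:
    for record in internal_records:
        if record.get("reviewer"):  # keep only examples that passed review
            f.write(json.dumps(to_training_example(record)) + "\n")
```

Even this toy version makes the dependency visible: the quality of the training file is set upstream, by whoever decides which answers count as approved and how that approval is recorded.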
The businesses moving fastest are those that recognized this 12 to 18 months ago and built the data pipelines and infrastructure necessary to make customization work at scale.
What the Plateau Means for AI Vendors
The flattening of general model capability also carries implications for the commercial AI market. If the marginal gains from foundation model upgrades are shrinking, the value proposition of simply accessing a more powerful API weakens. Vendors that compete primarily on raw model capability face commoditization pressure that specialized, vertically integrated competitors do not.
This is already visible in enterprise procurement conversations. Buyers increasingly ask not just what a model can do, but how well it can be adapted to their specific environment — and how much control they retain over that process. The ability to customize, audit, and own the behavior of an AI system is becoming a procurement criterion alongside cost and latency.
For AI infrastructure companies, this represents an opportunity. Tools that simplify fine-tuning pipelines, manage training data at scale, or enable organizations to build evaluation benchmarks tailored to their domain are positioned to capture value that once flowed primarily to the frontier model builders.
The Organizational Capability Gap
The challenge is that meaningful model customization requires capabilities most organizations do not yet have. Curating high-quality training data from internal systems is technically demanding and organizationally complex. It requires cooperation between data engineering, legal and compliance, and the domain experts whose knowledge the model needs to absorb.
Many enterprises have stalled at the retrieval-augmented generation stage — connecting a general model to internal documents via search — because it requires less infrastructure than true fine-tuning. RAG produces real improvements, but it does not produce the kind of deep behavioral adaptation that comes from training a model on domain-specific examples over time. Organizations that treat RAG as a destination rather than a stepping stone may find themselves outpaced by competitors willing to make the harder infrastructure investments.
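For readers who want to see the distinction rather than take it on faith, the sketch below shows the retrieval step in its simplest possible form, assuming a toy word-overlap relevance score and a handful of invented documents; production systems typically use embedding-based vector search instead.

```python
# Minimal sketch of retrieval-augmented generation's retrieval step: find the
# internal documents most relevant to a query and prepend them to the prompt.
# The word-overlap score and sample documents are purely illustrative.

def score(query: str, doc: str) -> int:
    """Toy relevance score: number of words shared by query and document."""
    return len(set(query.lower().split()) & set(doc.lower().split()))

def retrieve(query: str, docs: list[str], k: int = 2) -> list[str]:
    """Return the k highest-scoring documents."""
    return sorted(docs, key=lambda d: score(query, d), reverse=True)[:k]

internal_docs = [
    "Treatment protocol: administer medication X within two hours of admission.",
    "Billing codes for outpatient imaging are listed in appendix C.",
    "Escalation path: page the on-call physician for any P1 incident.",
]

query = "What is the treatment protocol after admission?"
context = "\n".join(retrieve(query, internal_docs))

# The general-purpose model only ever sees this prompt; its weights are untouched.
prompt = (
    "Answer using only the context below.\n\n"
    f"Context:\n{context}\n\n"
    f"Question: {query}"
)
print(prompt)
```

The underlying model is never modified here, which is precisely why RAG stops short of deep behavioral adaptation: fine-tuning changes what the model does even when no context is supplied.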
The talent requirement is also real. Building and maintaining customized AI systems requires machine learning engineers who understand both model behavior and the specific domain the model serves — a combination that remains scarce.
What This Means
For any organization still treating AI customization as optional, the architectural window to catch up is narrowing. Companies that have spent the past year building proprietary training pipelines and domain-adapted models are establishing leads that a better foundation model release will not automatically erase.
