Microsoft launched Copilot Health in late March 2026, integrating AI-powered medical record queries directly into its Copilot app — a move that arrived days after Amazon broadened access to its large language model-based Health AI tool beyond paid One Medical members.

The near-simultaneous announcements underscore how quickly major technology companies are moving to claim space in consumer health, a sector that has seen a significant influx of AI-driven products over the past two years. What has not kept pace, according to reporting by MIT Technology Review, is robust, independent evidence that these tools actually improve health outcomes.

From Restricted Pilots to Mass-Market Products

Amazon's Health AI was previously available only to subscribers of One Medical, its primary-care membership service. Opening it to a broader user base sharply raises both the scale of use and the stakes. Microsoft's Copilot Health, meanwhile, lets users link their personal medical records and ask specific health questions, a feature that puts sensitive clinical data directly inside a general-purpose AI assistant.

Both products arrive in a market already crowded with AI symptom checkers, mental health chatbots, medication management apps, and diagnostic support tools. The common promise is the same: faster, more personalized health information at a fraction of the cost of a clinical visit. The common problem is also the same — few of these tools have been subjected to the kind of rigorous, peer-reviewed clinical evaluation that would be expected of a medical device.

What the Research Actually Shows

The studies that do exist paint a mixed picture. A 2024 review published in npj Digital Medicine, which examined 51 AI-powered health chatbots across a range of conditions, found that while many tools performed well on accuracy benchmarks built from curated test questions, real-world performance degraded significantly when users asked ambiguous or emotionally complex questions. Only 12 of the 51 tools had been evaluated in any form of prospective clinical trial.

A separate Stanford-led study of 1,400 patients using an AI mental health support app found modest improvements in self-reported anxiety scores over eight weeks, but noted that users with more severe presentations often disengaged from the tool entirely — meaning the population most likely to benefit may be the least likely to be reached.

The pattern is consistent: AI health tools tend to perform well in controlled evaluations and far less predictably in the hands of a general public that spans a much wider range of literacy levels, health conditions, and expectations.

Regulation Has Not Caught Up

The regulatory landscape remains fragmented. In the United States, the Food and Drug Administration classifies some AI health software as medical devices subject to premarket review, but many consumer-facing tools, particularly those framed as offering information rather than clinical decision support, fall outside that framework. The result is a large and growing category of products that carry significant health implications yet face limited mandatory oversight before reaching users.

In the European Union, the AI Act and the Medical Device Regulation together create a stricter pathway for high-risk AI health applications, but implementation is still in progress and enforcement capacity remains uneven across member states.

Microsoft and Amazon have not published clinical trial data for their respective health AI products, according to MIT Technology Review. Both companies describe their tools as informational aids rather than diagnostic instruments — a framing that, intentionally or not, positions the products outside the most demanding regulatory categories.

The Human Cost of Getting This Wrong

For patients, the stakes are not abstract. A person who receives an inaccurate AI-generated interpretation of their blood test results, or who is reassured by a chatbot when symptoms warrant urgent care, may delay treatment in ways that carry serious consequences. Conversely, tools that perform well could meaningfully expand access to health information for the approximately 100 million Americans who, according to the Kaiser Family Foundation, live in areas with a shortage of primary care providers.

The equity dimension is significant. If AI health tools are primarily adopted by people who are already health-literate, digitally connected, and engaged with the formal healthcare system, they risk widening existing disparities rather than closing them. If they genuinely reach underserved populations, they could represent a meaningful expansion of access — but only if they are accurate and safe for those users.

Healthcare professionals have responded with a mix of cautious interest and concern. Some clinicians see potential in tools that help patients arrive at appointments better prepared and better informed. Others worry that consumer AI health products will generate a new wave of anxious patients presenting with AI-derived self-diagnoses that clinicians must then spend scarce appointment time unpicking.

What Comes Next

Pressure for clearer standards is building. Several health systems in the United Kingdom and United States are reportedly developing internal evaluation frameworks for AI tools that interact with their patients, independent of what regulators require. Academic medical centers are beginning to publish prospective studies, though the pipeline of rigorous evidence remains far smaller than the pipeline of new products.

Microsoft and Amazon are not alone. Google, Apple, and a growing roster of health-focused startups are all competing in the same space. The commercial logic is compelling — healthcare is a multi-trillion-dollar sector with well-documented inefficiencies that AI could, in principle, help address. The scientific logic requires more patience.

What This Means

Patients and clinicians now face a market in which the sophistication of AI health tools has outrun the evidence needed to judge them. Choosing wisely requires scrutiny that most users are not equipped, and should not be expected, to perform alone.