Cloudflare announced the private beta of Agent Memory, a managed service that, according to the company, extracts information from agent conversations and surfaces it on demand without consuming the model's context window. The announcement was published on the Cloudflare blog on April 17, 2026.

According to Cloudflare, the service is designed to give AI agents persistent memory so they can "recall what matters, forget what doesn't, and get smarter over time." The company frames the product as a response to a tension it describes between keeping full conversation history in context — which it says degrades output quality — and aggressively pruning, which it says risks discarding information the agent needs later.

The Problem Cloudflare Says It Is Addressing

Cloudflare writes that even as context windows extend past one million tokens, a degradation pattern it calls "context rot" remains unsolved. The blog post links to research hosted by Chroma on the topic, though Cloudflare does not claim authorship of the term in the post itself.

The company states that agents running "for weeks or months against real codebases and production systems" expose gaps that existing memory approaches do not fully address. Cloudflare argues such agents need fast ingestion, retrieval that "doesn't block the conversation," and models that keep per-query costs reasonable.

"Agent Memory is a managed service with an opinionated API and retrieval-based architecture. We've carefully considered the alternatives, and we believe this combination is the right default for most production workloads."

That framing, taken directly from the Cloudflare post, positions the product against two alternative designs the company describes: self-hosted pipelines where developers run extraction themselves, and approaches that give models raw filesystem or database access to design their own memory queries.

How the API Is Structured

Cloudflare describes a profile-based model in which memories are stored under a named profile and accessed through a set of operations. According to the post, those operations include ingest (bulk extraction from a conversation, typically triggered when a harness compacts context), remember (storing a single memory explicitly), recall (running the full retrieval pipeline and returning a synthesized answer), list, and forget.

A code sample in the post shows the service being accessed via a Workers binding named MEMORY, with profile.ingest() taking an array of role-tagged messages and a sessionId. The recall method, per the example, returns a result field containing a synthesized answer rather than raw memory records.
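Based on that description, the shape of the binding can be sketched as follows. The operation names (ingest, recall) and the sessionId/result fields come from the post; the TypeScript types and the in-memory stub are illustrative assumptions standing in for the real service, not Cloudflare's actual SDK.

```typescript
// Illustrative sketch only: operation names come from Cloudflare's post;
// the type shapes and this in-memory stub are assumptions.
type Message = { role: "user" | "assistant"; content: string };

interface MemoryProfile {
  ingest(opts: { messages: Message[]; sessionId: string }): Promise<void>;
  recall(query: string): Promise<{ result: string }>;
}

// Minimal stub standing in for the real MEMORY binding so the sketch runs.
function stubProfile(): MemoryProfile {
  const memories: string[] = [];
  return {
    async ingest({ messages }) {
      // The real service would extract durable facts; we just keep user text.
      for (const m of messages) {
        if (m.role === "user") memories.push(m.content);
      }
    },
    async recall(query) {
      // The real pipeline retrieves and synthesizes an answer; we do a
      // naive substring match over stored memories.
      const hits = memories.filter((m) =>
        m.toLowerCase().includes(query.toLowerCase())
      );
      return { result: hits.join("; ") };
    },
  };
}

(async () => {
  // In a Worker this might instead be obtained from the MEMORY binding.
  const profile = stubProfile();
  await profile.ingest({
    messages: [
      { role: "user", content: "Our deploy target is Cloudflare Workers" },
      { role: "assistant", content: "Noted." },
    ],
    sessionId: "session-1",
  });
  const { result } = await profile.recall("deploy");
  console.log(result); // "Our deploy target is Cloudflare Workers"
})();
```

The key point the post emphasizes survives even in this toy version: recall returns a synthesized answer string, not the raw memory records themselves.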

Cloudflare says the service is accessible as a binding from any Cloudflare Worker and via a REST API for agents running outside Workers. The company states that the service integrates with the Cloudflare Agents SDK as the reference implementation for the memory portion of its Sessions API.
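For agents outside Workers, a REST call might be assembled along these lines. Cloudflare has not published endpoint paths or payload schemas, so the host, path, and body below are placeholders for illustration only.

```typescript
// Hypothetical request shape: the URL path and JSON body are invented
// placeholders; Cloudflare's announcement does not document the REST schema.
function buildRecallRequest(accountId: string, profileName: string, query: string) {
  return {
    url: `https://api.example.com/v0/accounts/${accountId}/agent-memory/${profileName}/recall`,
    method: "POST" as const,
    headers: {
      Authorization: "Bearer <API_TOKEN>",
      "Content-Type": "application/json",
    },
    body: JSON.stringify({ query }),
  };
}

const req = buildRecallRequest("acct-123", "user-42", "preferred deploy target?");
// The resulting object could be dispatched with fetch(req.url, req) from
// any runtime, which is the scenario the REST surface is meant to cover.
```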

Target Workloads

Cloudflare outlines three categories of deployment it says Agent Memory is designed to support. The first is memory for individual agents, including coding agents such as Claude Code and OpenCode, self-hosted frameworks, and managed services including Anthropic's Managed Agents.

The second is custom agent harnesses, including what Cloudflare describes as background agents running without a human in the loop. The post cites publicly documented systems from Ramp, Stripe, and Spotify as examples of this pattern.

The third is shared memory across agents, people, and tools. Cloudflare writes that "a memory profile doesn't have to belong to a single agent," and describes a scenario in which a team of engineers shares a profile so that coding conventions and architectural decisions learned by one person's agent are available to others.
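The shared-profile idea can be made concrete with a small sketch. The remember and list operation names come from the post's API description; the profile-naming scheme ("team:payments") and the store itself are illustrative assumptions, not Cloudflare's implementation.

```typescript
// Illustrative stand-in: profiles are keyed by name, not bound to one agent,
// so anything one agent remembers is visible to others using the same name.
type Memory = { text: string; author: string };

class SharedProfileStore {
  private profiles = new Map<string, Memory[]>();

  profile(name: string) {
    if (!this.profiles.has(name)) this.profiles.set(name, []);
    const memories = this.profiles.get(name)!;
    return {
      remember(text: string, author: string) {
        memories.push({ text, author });
      },
      list(): Memory[] {
        return [...memories];
      },
    };
  }
}

const store = new SharedProfileStore();
// Alice's coding agent records a team convention...
store
  .profile("team:payments")
  .remember("DB columns use snake_case", "alice-coding-agent");
// ...and Bob's agent, bound to the same profile name, can see it.
const seenByBob = store.profile("team:payments").list();
console.log(seenByBob[0].text); // "DB columns use snake_case"
```

The design choice being illustrated is that the profile name, not the agent, is the unit of ownership, which is what lets conventions learned by one person's agent carry over to a teammate's.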

Architecture Choices Cloudflare Defends

The post contrasts Agent Memory with other approaches in the agentic memory space. Cloudflare acknowledges that the space is "one of the fastest-moving" areas in AI infrastructure, with "new open-source libraries, managed services, and research prototypes launching on a near-weekly basis."

The company references three benchmarks — LongMemEval, LoCoMo, and BEAM — and cautions that such benchmarks can lead developers to build systems that overfit for a specific evaluation and underperform in production. Cloudflare does not publish its own benchmark results in the announcement.

On architecture, the company states that "tighter ingestion and retrieval pipelines are superior to giving agents raw filesystem access," citing cost, performance, and support for reasoning tasks such as temporal logic, supersession, and instruction following. Cloudflare adds that it "will likely expose data for programmatic querying down the road," but expects that capability to serve edge cases rather than common ones.

Pricing and Availability

The announcement describes Agent Memory as being in private beta. The post does not disclose pricing, general availability timing, rate limits, or beta access criteria. DeepBrief was unable to locate independent reporting or third-party evaluations of the service at the time of publication; readers should treat the capability and design descriptions above as Cloudflare's own characterizations.

Primary source: https://blog.cloudflare.com/introducing-agent-memory/