Amazon Web Services has detailed a hybrid RAG architecture that combines semantic and keyword-based search, using Amazon Bedrock, Amazon Bedrock AgentCore, Strands Agents, and Amazon OpenSearch Service to power a generative AI agentic assistant.

Retrieval-augmented generation has become a standard pattern for grounding large language model outputs in real data, but most implementations rely on a single retrieval method. Pure vector search captures meaning well but can miss exact matches; pure keyword search finds precise terms but lacks contextual understanding. Hybrid RAG addresses both weaknesses by running both retrieval strategies in parallel and combining their results before passing context to the language model.

Why Hybrid Retrieval Broadens Coverage

The core problem with single-method retrieval is coverage. A vector-only system might retrieve a semantically similar document while missing the one containing the exact product code or regulation number a user asked about. A keyword-only system returns literal matches but fails when users phrase queries differently from the source documents.

Hybrid RAG closes that coverage gap by running semantic and keyword retrieval in parallel and fusing the two ranked result lists, typically via score normalization or reciprocal rank fusion, before any text reaches the language model.
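Reciprocal rank fusion is one widely used way to merge the two result lists. A minimal sketch, with illustrative document IDs and the conventional k=60 smoothing constant (neither taken from the AWS blueprint):

```python
def rrf_merge(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Fuse several ranked lists of document IDs into one ranking.

    Each document earns 1 / (k + rank) per list it appears in;
    documents found by both retrievers accumulate a higher score.
    """
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Sort document IDs by combined score, best first
    return sorted(scores, key=scores.get, reverse=True)

keyword_hits = ["doc-7", "doc-2", "doc-9"]   # BM25 order
semantic_hits = ["doc-2", "doc-4", "doc-7"]  # k-NN order
merged = rrf_merge([keyword_hits, semantic_hits])
# doc-2 and doc-7 appear in both lists, so they rise to the top
```

Because RRF works on ranks rather than raw scores, it sidesteps the problem that BM25 scores and vector similarities live on incompatible scales.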

Amazon OpenSearch supports both retrieval modes natively — dense vector search through its k-NN engine and BM25-based full-text search — making it a practical single store for hybrid pipelines. The AWS blueprint uses OpenSearch as the unified retrieval layer, avoiding the operational overhead of maintaining separate vector and keyword databases.
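OpenSearch exposes this dual capability through its `hybrid` query type, which runs keyword and k-NN sub-queries together (score combination is handled by a search pipeline with a normalization processor). The field names `text` and `embedding` below are assumptions about the index mapping, not names from the AWS post:

```python
def build_hybrid_query(query_text: str, query_vector: list[float], k: int = 10) -> dict:
    """Build an OpenSearch hybrid query body combining BM25 and k-NN sub-queries."""
    return {
        "size": k,
        "query": {
            "hybrid": {
                "queries": [
                    # Lexical sub-query: exact terms, product codes, IDs
                    {"match": {"text": {"query": query_text}}},
                    # Semantic sub-query: nearest neighbors in embedding space
                    {"knn": {"embedding": {"vector": query_vector, "k": k}}},
                ]
            }
        },
    }

body = build_hybrid_query("error code E1234", [0.1, 0.2, 0.3], k=5)
```

The resulting body is passed to the standard search API against a single index, which is what lets one OpenSearch domain serve as the unified retrieval layer.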

How the Agentic Layer Fits In

The architecture goes beyond a static retrieval pipeline by incorporating an agentic layer. Strands Agents, AWS's open-source agent framework, coordinates the retrieval steps and decides when to query the search index, when to call other tools, and how to assemble a final response. Amazon Bedrock AgentCore provides the managed runtime that hosts and scales the agent, handling session management and tool orchestration without requiring developers to build that infrastructure themselves.

This agentic design matters for enterprise use cases where a single retrieval call is rarely sufficient. Complex questions may require the agent to decompose a query, retrieve context from multiple searches, and synthesize an answer across several documents. A static RAG pipeline cannot adapt mid-task; an agent can.
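The decompose-retrieve-synthesize loop can be sketched in plain Python. Everything below is illustrative control flow: `decompose_query`, `hybrid_search`, and the string-matching "retrieval" are stand-ins for the Bedrock model calls and OpenSearch queries a Strands agent would actually make, and none of these names come from the AWS blueprint:

```python
def decompose_query(question: str) -> list[str]:
    # In the real architecture a Bedrock model splits a complex question;
    # here we fake it for a simple compound "and" question.
    return [part.strip() for part in question.split(" and ")]

def hybrid_search(sub_query: str, corpus: dict[str, str]) -> list[str]:
    # Stand-in for the parallel keyword + vector search against OpenSearch.
    return [doc_id for doc_id, text in corpus.items()
            if sub_query.lower() in text.lower()]

def answer(question: str, corpus: dict[str, str]) -> dict:
    sub_queries = decompose_query(question)
    evidence: list[str] = []
    for sq in sub_queries:
        # A static pipeline would stop after one retrieval; the agent
        # loops until every sub-question has supporting context.
        evidence.extend(hybrid_search(sq, corpus))
    return {"sub_queries": sub_queries, "evidence": sorted(set(evidence))}

corpus = {
    "doc-1": "Refund policy for enterprise plans",
    "doc-2": "SLA terms for the enterprise tier",
}
result = answer("refund policy and SLA terms", corpus)
```

The point of the sketch is the shape of the loop: each retrieval step can depend on what the previous step found, which is exactly what a fixed single-pass pipeline cannot do.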

Developer Experience and Integration Complexity

For teams already operating within the AWS ecosystem, the integration path is relatively contained. OpenSearch Service is a managed offering, so developers do not administer the underlying cluster directly. Bedrock provides access to foundation models — including embedding models needed to generate vectors for semantic search — through a single API, removing the need to manage separate model endpoints.
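Generating the vectors is a single Bedrock call per chunk. A sketch of the request payload, assuming the Titan text embeddings model (the blueprint does not mandate a specific embedding model, so the model ID here is one plausible choice):

```python
import json

EMBED_MODEL_ID = "amazon.titan-embed-text-v2:0"  # assumed; any Bedrock embedding model works

def build_embedding_request(chunk: str) -> dict:
    """Return the keyword arguments for a bedrock-runtime invoke_model call."""
    return {
        "modelId": EMBED_MODEL_ID,
        "body": json.dumps({"inputText": chunk}),
    }

request = build_embedding_request("Reset procedure for model XR-200")
# With AWS credentials configured, the actual call would be roughly:
#   client = boto3.client("bedrock-runtime")
#   response = client.invoke_model(**request)
#   vector = json.loads(response["body"].read())["embedding"]
```

Because every foundation model sits behind the same `invoke_model` interface, swapping embedding models is a payload change rather than a new endpoint to operate.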

Strands Agents is open source, which means teams can inspect, extend, and test agent logic locally before deploying to Bedrock AgentCore. This reduces lock-in risk at the orchestration layer even when the surrounding infrastructure is AWS-native. The framework handles tool definitions, memory, and chain-of-thought steps through a structured SDK.

The practical workflow for a development team would involve: indexing source documents into OpenSearch with both keyword and vector representations, defining retrieval tools in Strands Agents that query both retrieval modes, configuring a Bedrock foundation model to act as the reasoning core, and deploying the agent runtime via Bedrock AgentCore. AWS has not published explicit per-query cost estimates for the combined stack in this blueprint, but costs would accrue across OpenSearch instance hours, Bedrock model inference tokens, and AgentCore runtime usage, all billed separately according to standard AWS pricing.
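The indexing step above amounts to one OpenSearch index carrying both representations. A sketch of a possible setup, where the index name `kb-hybrid`, the field names, and the 1024-dimension vector are all assumptions rather than values from the AWS post:

```python
# Index definition: enable k-NN and declare one text field plus one vector field.
index_body = {
    "settings": {"index.knn": True},  # turn on the k-NN engine for this index
    "mappings": {
        "properties": {
            "text": {"type": "text"},  # analyzed field, searched with BM25
            "embedding": {"type": "knn_vector", "dimension": 1024},
        }
    },
}

# Each source document is then indexed with both representations at once:
document = {
    "text": "Warranty terms for the XR-200 router",
    "embedding": [0.0] * 1024,  # placeholder; real vectors come from Bedrock
}
```

Keeping both fields in one document is what makes the later hybrid query possible without a second data store or a cross-system join.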

What Separates This from Existing RAG Toolkits

Several open-source frameworks, including LangChain and LlamaIndex, already support hybrid retrieval patterns without tying teams to AWS. The differentiation AWS is offering here is operational rather than algorithmic: managed infrastructure, unified IAM security, and tight service integration reduce the engineering effort required to move from prototype to production.

For organisations with existing AWS commitments, the managed path lowers the barrier. For teams without those commitments, the architecture introduces meaningful vendor dependency across four AWS services simultaneously. The Strands Agents open-source layer partially mitigates that at the orchestration level, but the retrieval, embedding, and runtime components remain AWS-hosted.

AWS published the blueprint as a technical blog post rather than a generally available product feature, meaning teams need to implement the pattern themselves using the described components. There is no single-click deployment; the post functions as an architectural reference and implementation guide.

What This Means

Developers building enterprise search or question-answering systems on AWS now have a documented, production-oriented blueprint for hybrid RAG that combines the accuracy benefits of dual retrieval with a managed agentic runtime — reducing the infrastructure work that has historically made these architectures complex to operate at scale.