A new reinforcement learning method can rewrite documents so that smaller, cheaper AI retrieval models match or exceed the performance of larger, more expensive ones — without requiring access to the retrieval system's internal weights.
Retrieval-augmented generation and enterprise search systems depend heavily on the quality of document indexing. A long-standing technique called document expansion — adding extra text to documents before they are indexed — was designed to improve how well retrievers match queries to relevant content. Recent research, however, showed that document expansion often backfires with modern neural retrievers, injecting noise that confuses rather than clarifies. The new paper, published on arXiv in April 2025, reframes the problem entirely.
From Expansion to Optimisation: A Subtle but Important Shift
Instead of simply appending text, the researchers treat document preparation as an optimisation problem. A language model — or a vision-language model for image-heavy documents — is fine-tuned to transform documents into versions that align better with the kinds of queries a target retriever is likely to receive. The training signal comes from the retriever's own ranking behaviour: if a transformed document ranks higher for relevant queries, the model is rewarded.
The method requires only black-box access to retrieval ranks, meaning it can be applied to commercial APIs and proprietary systems without any special permissions.
The reward mechanism uses GRPO (Group Relative Policy Optimisation), a reinforcement learning algorithm that has gained traction in language model training. Because the system only needs to observe whether rankings improve or worsen — not inspect any internal model parameters — it is compatible with closed APIs such as OpenAI's embedding models, as well as open retrieval systems.
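The core idea can be sketched in a few lines. The snippet below is an illustrative reconstruction, not the paper's code: it assumes a group of candidate rewrites of the same document, uses reciprocal rank as the reward (the paper's exact reward function may differ), and computes the group-relative advantages that GRPO feeds back into training. Note that only the black-box ranks are needed.

```python
import statistics

def group_relative_advantages(ranks):
    """Turn black-box retrieval ranks for a group of candidate rewrites
    into GRPO-style advantages.

    `ranks` holds the 1-based rank each rewritten document achieved for
    its relevant query; a lower rank means better retrieval.
    """
    # Reward: reciprocal rank, so rank 1 -> 1.0, rank 10 -> 0.1.
    # (An illustrative choice; the paper's reward may be defined differently.)
    rewards = [1.0 / r for r in ranks]
    mean = statistics.mean(rewards)
    std = statistics.pstdev(rewards) or 1.0  # guard against a zero-variance group
    # Group-relative advantage: each reward is normalised against its own
    # group, so no value model or internal retriever weights are required.
    return [(r - mean) / std for r in rewards]

# Four candidate rewrites of one document, ranked by the target retriever:
advs = group_relative_advantages([1, 3, 8, 20])
```

Because the advantage is relative to the group, the rewrite that ranked first receives a positive training signal and the one that ranked twentieth a negative one, even though the system never sees a gradient or embedding from the retriever itself.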
Smaller Models Closing the Gap on Larger Ones
The practical results are significant. Applying document optimisation to OpenAI's text-embedding-3-small model raised its nDCG@5 score (a standard retrieval accuracy metric) from 58.7 to 66.8 on a code retrieval benchmark, and from 53.3 to 57.6 on visual document retrieval. These figures, self-reported by the researchers, match or slightly exceed the scores of text-embedding-3-large (66.3 on code, 57.0 on visual document retrieval), despite the larger model costing 6.5 times more per API call, according to the paper.
nDCG@5 scores the top five results a system returns, weighting relevant documents more heavily when they appear in higher positions and normalising against the best possible ordering. A higher score means relevant documents appear earlier in search results.
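For readers who want the arithmetic, here is one common formulation of the metric (binary relevance with linear gain; other gain variants exist):

```python
import math

def dcg_at_k(relevances, k=5):
    """Discounted cumulative gain over the top-k results: each result's
    relevance is divided by log2(position + 1), so earlier positions
    contribute more."""
    return sum(rel / math.log2(i + 2) for i, rel in enumerate(relevances[:k]))

def ndcg_at_5(ranked_relevances):
    """nDCG@5: the system's DCG divided by the DCG of the ideal
    (best possible) ordering of the same relevance judgments."""
    ideal_dcg = dcg_at_k(sorted(ranked_relevances, reverse=True))
    return dcg_at_k(ranked_relevances) / ideal_dcg if ideal_dcg else 0.0

# Relevance judgments (1 = relevant, 0 = not) in the order returned:
score = ndcg_at_5([0, 1, 1, 0, 0])
```

A system that puts both relevant documents in the top two positions would score 1.0; burying them at positions two and three, as above, yields roughly 0.69.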
The approach works across three retrieval architectures: single-vector models (which compress a document into one embedding), multi-vector models (which retain richer representations), and lexical retrievers (which rely on keyword matching). This breadth makes the technique applicable to a wide range of deployed systems.
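The first two architectures differ in how a query is scored against a document. A minimal sketch of that contrast, using plain cosine similarity and the ColBERT-style MaxSim operator (illustrative only, not the paper's implementation):

```python
import math

def cosine(u, v):
    """Cosine similarity between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm

def single_vector_score(query_emb, doc_emb):
    """Single-vector retrieval: the whole document is one embedding,
    so scoring is a single similarity computation."""
    return cosine(query_emb, doc_emb)

def multi_vector_score(query_embs, doc_embs):
    """Multi-vector (ColBERT-style) MaxSim: every query token embedding
    is matched against its best-scoring document token embedding, and
    the maxima are summed."""
    return sum(max(cosine(q, d) for d in doc_embs) for q in query_embs)
```

The multi-vector score preserves token-level detail, which is why ColBERT-style models tend to be more accurate but cost more to store and serve; lexical retrievers, the third family, skip embeddings entirely and match on keyword statistics.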
When You Can Access the Retriever Directly
For organisations using open-source or self-hosted retrievers, the picture improves further. When retriever weights are accessible, the researchers found that document optimisation alone is often competitive with directly fine-tuning the retriever — a more computationally expensive process. Combining both approaches produced strong results across most settings.
On the Jina-ColBERT-V2 retriever, a multi-vector model, combining document optimisation with retriever fine-tuning improved the nDCG score from 55.8 to 63.3 on visual document retrieval and from 48.6 to 61.8 on code retrieval. ColBERT-style models store multiple embeddings per document and are known for high accuracy but also higher storage and compute costs.
The method also addresses a practical bottleneck: document transformation happens offline, before indexing, so there is no added latency at query time. This is a meaningful operational advantage for production search systems that must respond in milliseconds.
What This Means
Organisations using commercial embedding APIs can now apply reinforcement learning to their document preparation pipeline — with no model access required — to reduce the performance gap between budget and premium retrieval tiers, potentially reducing costs without sacrificing search quality.