Researchers Use LLM-as-Judge to Train Smaller Language Models Without Labeled Data
James Okafor
AI Research Correspondent | arXiv cs.CL | Verified across 1 source
The Brief
A new reinforcement learning framework uses a large language model as an evaluator to train smaller LLMs on unlabeled data, eliminating the need for ground-truth labels. The approach derives an efficient reward from a single judge token and, when combined with verifiable rewards, improved performance on math reasoning benchmarks, showing that LLM-based evaluators can provide effective training signals.
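The core idea can be sketched in a few lines. This is a hedged illustration, not the paper's implementation: `mock_judge` stands in for a real judge-LLM call, and the yes/no prompt format and probability-as-reward mapping are assumptions for the sake of the example.

```python
# Sketch of an LLM-as-judge reward: the judge is asked a yes/no question
# about a candidate response and the probability it assigns to the single
# "yes" token is used directly as the scalar reward for RL training.
# `mock_judge` is a toy stand-in for a large-model API call.

def mock_judge(prompt: str, response: str) -> dict:
    """Stand-in for a judge LLM: returns next-token probabilities
    for the judge prompt 'Is this response correct? Answer yes or no.'"""
    correct = "4" in response  # toy correctness check for the demo prompt
    p_yes = 0.9 if correct else 0.1
    return {"yes": p_yes, "no": 1.0 - p_yes}

def single_token_reward(prompt: str, response: str) -> float:
    """Reward = probability mass the judge places on the 'yes' token.
    Only one token is generated, so evaluation cost stays low."""
    return mock_judge(prompt, response)["yes"]

rewards = [single_token_reward("What is 2+2?", r) for r in ["4", "5"]]
print(rewards)  # the judged-correct response receives the higher reward
```

Because the judge emits only one token per candidate, the reward call is far cheaper than generating a full critique, which is what makes this signal practical at RL scale.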
Sources