Researchers have proposed a graph-based theoretical framework that explains hallucinations in large language models as the product of two distinct mechanisms — one that emerges early in training, and one that solidifies later — offering what the authors describe as a unified account of a widely observed but poorly understood failure mode.

The paper, posted to arXiv in April 2025, addresses one of the most persistent problems in deploying AI language systems: outputs that are fluent and confident yet factually wrong or inconsistent with the information the model was given. Despite the frequency of these failures, the internal mechanics of how decoder-only Transformer models — the architecture underlying systems like GPT-4 and Claude — generate them have remained largely opaque.

Reasoning as a Search Problem on a Graph

The authors reframe next-token prediction — the core process by which language models generate text — as a graph search problem. In their model, entities such as concepts, facts, and named objects correspond to nodes, while the relationships and transitions the model has learned during training form the edges connecting them. When a model reasons through a question, it is effectively navigating paths across this graph.
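The graph-navigation framing can be made concrete with a toy sketch. This is not the paper's formalism — the graph, entity names, and the use of breadth-first search here are illustrative assumptions — but it shows the basic idea of reasoning as finding a path between entity nodes along learned relation edges:

```python
from collections import deque

# Hypothetical knowledge graph: nodes are entities, directed edges are
# relations the model has learned. All names here are invented for
# illustration, not taken from the paper.
GRAPH = {
    "Paris": ["France"],
    "France": ["Europe", "Euro"],
    "Euro": ["currency"],
}

def find_path(graph, start, goal):
    """Breadth-first search as a stand-in for the model navigating
    entity-to-entity transitions while answering a question."""
    queue = deque([[start]])
    visited = {start}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == goal:
            return path
        for neighbour in graph.get(node, []):
            if neighbour not in visited:
                visited.add(neighbour)
                queue.append(path + [neighbour])
    return None  # no reasoning chain connects the two entities

print(find_path(GRAPH, "Paris", "Euro"))  # ['Paris', 'France', 'Euro']
```

Under this view, a correct answer corresponds to a valid path, and a hallucination corresponds to traversing edges the current question does not actually license.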

The framework draws a distinction between two types of reasoning. Intrinsic reasoning occurs when the model is given context — a document, a conversation, a set of facts — and must navigate a constrained subgraph derived from that input. Extrinsic reasoning occurs when no such context is provided, and the model relies entirely on structures memorised during training.
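The distinction can be sketched as a subgraph restriction. In this illustrative snippet (the edge names and the `context_subgraph` helper are assumptions, not the authors' construction), intrinsic reasoning filters the memorised graph down to edges the provided context actually supports, while extrinsic reasoning would search the full graph:

```python
# Memorised graph from training; edge names are invented for this sketch.
MEMORISED = {
    "drug_X": ["treats_condition_Y", "treats_condition_Z"],
}

def context_subgraph(graph, allowed_edges):
    """Intrinsic reasoning: keep only edges licensed by the prompt's
    context. Extrinsic reasoning would use `graph` unchanged."""
    sub = {}
    for src, dsts in graph.items():
        kept = [d for d in dsts if (src, d) in allowed_edges]
        if kept:
            sub[src] = kept
    return sub

# A document that supports only one of the two memorised claims.
allowed = {("drug_X", "treats_condition_Y")}
print(context_subgraph(MEMORISED, allowed))
# {'drug_X': ['treats_condition_Y']}
```

A hallucination of the first kind, in this picture, is the model stepping along a memorised edge that the context subgraph has excluded.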

The paper attributes reasoning hallucinations to two fundamental mechanisms: Path Reuse, in which memorised knowledge overrides contextual constraints, a pattern that dominates early in training; and Path Compression, in which frequently traversed multi-step paths collapse into shortcut edges, a pattern that solidifies later in training.

What Path Reuse and Path Compression Actually Mean

Path Reuse describes what happens when a model defaults to a memorised reasoning route even when the current context calls for a different one. Think of it as a well-worn trail: the model has travelled a particular logical path so many times during training that it defaults to it automatically, even when the question at hand requires a detour. According to the paper, this tends to dominate during early training, when the model has not yet learned to respect the boundaries imposed by specific contexts.
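A minimal way to picture the "well-worn trail" is a scoring rule that trades off edge frequency from training against adherence to the context. Everything here is a toy assumption — the frequency table, the `context_weight` knob, and the entity names are invented — but it captures why a heavily reinforced edge can win even when the context licenses a different one:

```python
# Hypothetical edge frequencies from training. In this fictional
# scenario, the provided document asserts the answer is "Lyon", but the
# memorised edge to "Paris" is far more strongly reinforced.
edge_freq = {
    ("capital_in_this_document", "Paris"): 0.95,
    ("capital_in_this_document", "Lyon"): 0.05,
}
context_allowed = {("capital_in_this_document", "Lyon")}

def next_node(src, candidates, freq, allowed, context_weight):
    """Pick the next node by combining memorised edge frequency with a
    bonus for edges the context licenses. A low context_weight models
    the early-training regime where Path Reuse dominates."""
    def score(dst):
        s = freq.get((src, dst), 0.0)
        if (src, dst) in allowed:
            s += context_weight
        return s
    return max(candidates, key=score)

# Weak context adherence: the memorised path is reused over the context.
print(next_node("capital_in_this_document", ["Paris", "Lyon"],
                edge_freq, context_allowed, context_weight=0.1))  # Paris
# Strong context adherence overrides the memorised edge.
print(next_node("capital_in_this_document", ["Paris", "Lyon"],
                edge_freq, context_allowed, context_weight=1.0))  # Lyon
```

The point of the sketch is the asymmetry: the memorised edge needs no support from the context at all, so unless context adherence is weighted heavily enough, the model answers from memory.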

Path Compression is a subtler, later-stage phenomenon. When the model repeatedly navigates a multi-step reasoning chain — say, moving from A to B to C — it can effectively learn a direct shortcut from A to C, collapsing the intermediate step. This makes the model faster and more fluent, but it also means it can skip reasoning steps that might be critical in edge cases. The shortcut works most of the time, which is precisely what makes it dangerous: the model appears to reason correctly, but the underlying logic has been hollowed out.
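The shortcut-formation idea can also be sketched directly. This toy `compress` function (an illustrative assumption, not the paper's mechanism) adds a direct A-to-C edge once the two-hop chain A-to-B-to-C has been traversed often enough, which is exactly what lets later traversals skip the intermediate step:

```python
def compress(graph, traversal_counts, threshold):
    """Toy Path Compression: if the two-hop chain a -> b -> c has been
    traversed at least `threshold` times, add a direct edge a -> c."""
    compressed = {k: list(v) for k, v in graph.items()}
    for (a, b, c), n in traversal_counts.items():
        if n >= threshold and c not in compressed.get(a, []):
            compressed.setdefault(a, []).append(c)
    return compressed

graph = {"A": ["B"], "B": ["C"]}
counts = {("A", "B", "C"): 1000}  # chain seen many times in training
print(compress(graph, counts, threshold=100))
# {'A': ['B', 'C'], 'B': ['C']}  -- shortcut A -> C bypasses B
```

Once the shortcut exists, any case where the intermediate node B carried a condition that should have blocked the inference is silently skipped — the "hollowed out" logic the section describes.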

The authors argue these two mechanisms together account for a wide range of hallucination behaviours that researchers and practitioners have documented in real-world applications.

Why Existing Explanations Have Fallen Short

Previous attempts to explain hallucinations have tended to focus on surface-level phenomena: insufficient training data, exposure to misinformation, or simple statistical associations between tokens. While these factors matter, they do not offer a structural account of when and why hallucinations emerge at specific points in the training process.

The graph perspective allows the authors to make more precise claims. Path Reuse and Path Compression are not random errors — they are predictable consequences of how gradient-based learning optimises a model over time. This framing suggests that hallucinations are not bugs introduced by bad data alone, but emergent properties of the training dynamic itself.

It is worth noting that the findings presented are theoretical and mechanistic in nature. The paper does not report empirical benchmark results, and no external validation from independent research groups is available at this stage.

Connections to Known Downstream Behaviours

The authors claim their framework connects to several well-documented behaviours in deployed language models. Path Reuse maps neatly onto cases where a model contradicts an explicitly provided document — the model simply overrides the context with what it "knows." Path Compression helps explain why models sometimes skip logical steps in multi-hop reasoning tasks, producing answers that feel right but rest on incomplete chains of inference.

Both failure types have been catalogued extensively in evaluation literature. What has been missing is a unified theoretical account tying them to specific phases of the training process. If the framework holds up under empirical scrutiny, it could have practical implications for how models are trained, fine-tuned, and evaluated.

What This Means

If validated, this framework gives researchers and engineers a more precise target for reducing hallucinations — not just cleaner data or more parameters, but interventions designed to counteract Path Reuse during early training and prevent destructive Path Compression from forming later.