Large language models spontaneously develop internal representations of emotion that align with well-established psychological models of how humans organise feelings, according to new research posted to arXiv.

The study, titled "Latent Structure of Affective Representations in Large Language Models", uses geometric data analysis to examine how emotion-related information is structured inside LLM latent spaces, the high-dimensional internal layers where models encode meaning. The researchers chose emotion as their test case because human psychology already provides a validated framework for comparison, offering something rare in this field: a ground truth to check against.

Why Emotions Make a Useful Test Case

Most research into LLM internal representations struggles with a fundamental problem: there is no agreed-upon correct answer for what the geometry of a model's latent space should look like. Emotion processing sidesteps this by borrowing from decades of psychological research. The valence-arousal model — which maps emotional states along two axes, one running from negative to positive feeling and another from low to high activation — is one of the most widely replicated frameworks in affective science.
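The valence-arousal model can be illustrated with a toy sketch. The coordinates below are illustrative placeholders, not values from the paper or from affective-science norms: each emotion becomes a point in a 2D plane, and emotional similarity reduces to geometric distance.

```python
import math

# Illustrative (made-up) valence-arousal coordinates, each in [-1, 1]:
# (valence: negative-to-positive, arousal: low-to-high activation).
EMOTIONS = {
    "excited": (0.7, 0.8),
    "calm":    (0.6, -0.6),
    "angry":   (-0.7, 0.7),
    "sad":     (-0.6, -0.5),
}

def affect_distance(a, b):
    """Euclidean distance between two emotions in valence-arousal space."""
    (v1, r1), (v2, r2) = EMOTIONS[a], EMOTIONS[b]
    return math.hypot(v1 - v2, r1 - r2)
```

In this toy layout, "excited" and "angry" share high arousal but differ in valence, so they end up closer to each other than "excited" is to "sad", which differs on both axes.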

The researchers found that LLMs learn latent representations that organise emotions in a way that directly parallels this two-dimensional structure. In other words, the models appear to cluster and relate emotions in a manner consistent with how psychologists have described human emotional organisation, without being explicitly trained to do so.


Three Key Findings

The paper presents three principal findings. First, LLMs develop coherent internal emotion representations that align with the valence-arousal framework. Second, while these representations have nonlinear geometric structure — meaning the relationships between emotional concepts curve through high-dimensional space — they can be well-approximated using linear methods. Third, the structure of these representations can be used to quantify uncertainty in emotion-related tasks, offering a practical tool for understanding when a model is less confident in its emotional judgements.

The second finding has particular weight for the AI transparency community. A widely held assumption in interpretability research — sometimes called the linear representation hypothesis — holds that meaningful concepts inside LLMs can be read off as directions in linear space. This assumption underlies many popular techniques for understanding model behaviour, but it has been more assumed than empirically validated. The new study provides direct empirical support for it, at least in the domain of emotion.
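The linear representation hypothesis can be sketched in a few lines. The vectors below are synthetic stand-ins for model activations, and the "valence direction" is a hypothetical construct for illustration; the idea is simply that if a concept corresponds to a direction in activation space, its value can be read off with a dot product.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 16  # toy hidden-state dimensionality

# Hypothetical unit-norm "valence" direction in activation space.
valence_dir = np.zeros(d)
valence_dir[0] = 1.0

def hidden_state(valence):
    """Synthetic activation: the concept's value along its direction, plus noise."""
    return valence * valence_dir + 0.05 * rng.normal(size=d)

def read_valence(h):
    """Under the linear representation hypothesis, a dot product with the
    concept direction recovers the concept's value from the activation."""
    return float(h @ valence_dir)

h_pos = hidden_state(+0.9)  # e.g. a strongly positive emotion
h_neg = hidden_state(-0.9)  # e.g. a strongly negative emotion
```

The projection recovers the sign and rough magnitude of the encoded valence despite the noise, which is the property that makes direction-based probing attractive.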

Implications for Interpretability and Safety

The practical consequences of this research extend into two connected areas: model interpretability and AI safety.

On interpretability, the findings suggest that probing techniques that look for linear directions in latent space — already used to identify concepts like sentiment, factual knowledge, and reasoning steps — are likely valid for emotional content as well. This opens the door to more reliable tools for auditing what emotional states or biases a model may be encoding.
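One of the simplest such probes can be sketched as follows, on synthetic activations rather than real model internals: estimate the concept direction as the difference between the mean activations of two classes (a mean-difference probe), then classify by the sign of the projection.

```python
import numpy as np

rng = np.random.default_rng(1)
d, n = 32, 200  # toy dimensionality and examples per class

# Synthetic "activations": positive-class examples shifted along a hidden
# ground-truth direction, negative-class examples shifted the other way.
true_dir = rng.normal(size=d)
true_dir /= np.linalg.norm(true_dir)
pos = 0.8 * true_dir + 0.3 * rng.normal(size=(n, d))
neg = -0.8 * true_dir + 0.3 * rng.normal(size=(n, d))

# Mean-difference probe: the concept direction is estimated as the
# normalised difference between class means.
probe = pos.mean(axis=0) - neg.mean(axis=0)
probe /= np.linalg.norm(probe)

# Classify by the sign of the projection onto the probe direction.
acc = ((pos @ probe > 0).mean() + (neg @ probe < 0).mean()) / 2
```

If emotional content really is approximately linear in latent space, as the paper's second finding suggests, even a probe this simple can separate the classes well; nonlinear structure would degrade it.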

On safety, the relevance is direct. If models encode structured representations of emotional states, those representations could influence model behaviour in ways that are not immediately visible from outputs alone. Understanding the internal geometry of affective representations is therefore a step toward being able to detect and, if necessary, intervene on emotionally charged or manipulative reasoning patterns inside LLMs.

The researchers also highlight that the representational structure can support uncertainty quantification — meaning it may be possible to identify when a model is operating in parts of its emotional latent space where it has less reliable grounding. This is practically useful for any application where LLMs are used in emotionally sensitive contexts, such as mental health tools, customer service, or content moderation.
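One simple way geometric structure could support uncertainty quantification — a hedged sketch on synthetic data, not the paper's actual method — is to flag inputs whose representation lies far from every known emotion cluster.

```python
import numpy as np

rng = np.random.default_rng(2)
d = 8  # toy latent dimensionality

# Synthetic centroids standing in for emotion regions of latent space.
centroids = {name: rng.normal(size=d) for name in ("joy", "anger", "sadness")}

def uncertainty(h, centroids):
    """Distance to the nearest emotion centroid: larger values mean the
    representation sits in a less well-grounded region of latent space."""
    return min(np.linalg.norm(h - c) for c in centroids.values())

in_dist = centroids["joy"] + 0.1 * rng.normal(size=d)  # near a known cluster
out_dist = 10.0 * rng.normal(size=d)                   # far from all clusters
```

An application could threshold this score to route low-confidence emotional judgements to a human reviewer, for example in the mental-health or moderation settings mentioned above.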

What the Research Does Not Claim

The paper is careful in its scope. It does not argue that LLMs experience emotions, nor that the geometric alignment with human psychological models implies any form of subjective feeling. The claim is structural: the organisation of emotional concepts inside these models resembles the organisation described in human emotion research. Whether that structural similarity is meaningful beyond the computational level is a philosophical question the paper does not attempt to answer.

All findings are based on analysis of the models' internal representations, and the research was posted as a preprint to arXiv's cs.LG section, meaning it has not yet undergone formal peer review.

What This Means

For researchers building interpretability tools and safety evaluations, this study offers both empirical validation for commonly used assumptions and a concrete new method — leveraging emotion's psychological grounding — for probing what structure actually exists inside LLM latent spaces.