Researchers have published PAMELA, a framework that personalises text-to-image generation to individual aesthetic preferences, moving beyond models that optimise for what the average user finds appealing.
Most text-to-image systems today are guided by reward models trained to maximise broad human approval — a design choice that smooths over the fundamental subjectivity of taste. If one person prefers minimalist design and another gravitates toward cinematic drama, a single averaged preference signal cannot serve both well. The PAMELA research, posted to arXiv in April 2025, directly addresses this gap.
70,000 Ratings Built Around Subjective Disagreement
The team assembled a dataset of 5,000 images generated by two state-of-the-art models — Flux 2 and Nano Banana — spanning domains including art, design, fashion, and cinematic photography. Each image was rated by 15 unique users, producing 70,000 total ratings. The deliberate use of multiple raters per image is key: it captures the spread of opinion rather than collapsing responses into a single consensus score.
This approach acknowledges something most AI image research sidesteps — that disagreement between raters is not noise to be eliminated, but signal to be modelled.
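The difference between consensus and disagreement is easy to see in a minimal sketch. The snippet below uses entirely hypothetical image IDs, scores, and scale; it only illustrates why multiple ratings per image matter: the mean collapses opinion into one number, while the spread preserves the information that personalisation needs.

```python
from statistics import mean, stdev

# Hypothetical ratings: image_id -> per-user scores (1-10 scale assumed).
ratings = {
    "img_001": [9, 8, 9, 9, 8],   # broad agreement: a consensus score is informative
    "img_002": [2, 9, 3, 10, 2],  # strong disagreement: averaging hides taste clusters
}

for image_id, scores in ratings.items():
    # A single-score reward model keeps only the mean; the spread is the
    # "disagreement signal" that a personalised model can exploit.
    print(image_id, "mean =", round(mean(scores), 2),
          "spread =", round(stdev(scores), 2))
```

Both images here have a middling-to-high mean, but only the second carries a strong per-user signal, which an averaged reward model would discard.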
The model predicts an individual's liking more accurately than most current state-of-the-art methods predict population-level preferences.
The personalised reward model was trained jointly on this new dataset and existing aesthetic assessment benchmarks, allowing it to leverage prior work while incorporating the richer individual-level signal the new data provides. According to the authors, the model's per-user accuracy exceeds what leading methods achieve even when those methods are asked only to predict average, population-level preferences, a notable result given that predicting individual taste is a significantly harder problem.
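One common way to make a reward model user-specific is to pair a shared image encoder with lightweight per-user parameters. The sketch below is a generic illustration of that pattern, not PAMELA's actual architecture; the encoder, user vectors, and scoring rule are all stand-ins.

```python
import zlib

import numpy as np

rng = np.random.default_rng(0)

def encode_image(image_id: str, dim: int = 8) -> np.ndarray:
    """Stand-in for a shared aesthetic feature extractor (hypothetical):
    deterministically maps an image ID to a feature vector."""
    seed = zlib.crc32(image_id.encode())
    return np.random.default_rng(seed).standard_normal(dim)

# Per-user parameters; in practice these would be fit on each user's
# historical ratings rather than sampled at random.
user_vectors = {
    "alice": rng.standard_normal(8),
    "bob": rng.standard_normal(8),
}

def predict_liking(user: str, image_id: str) -> float:
    # Individual score = user-specific projection of shared image features,
    # so the same image can score very differently for different users.
    return float(user_vectors[user] @ encode_image(image_id))

print(predict_liking("alice", "img_001"), predict_liking("bob", "img_001"))
```

The design point this illustrates is the practical one: the expensive shared encoder is trained once, while adapting to a new user only requires fitting a small per-user component.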
From Prediction to Generation: Steering Images Toward Personal Preference
The researchers did not stop at building a better evaluator. They used the personalised predictor to drive prompt optimisation — automatically refining the text prompts fed into image generation models so that the outputs align more closely with a specific user's demonstrated preferences.
This closes a practical loop: a user's prior ratings inform a preference model, which then shapes the prompts used to generate new images. The result is a system that gets more personally relevant over time, without requiring the user to manually describe what they want in aesthetic terms.
The method relies on relatively simple prompt optimisation techniques rather than fine-tuning the underlying generative model itself, which has meaningful practical implications. Retraining large diffusion models for each user would be computationally prohibitive; adjusting prompts is far more tractable and deployable.
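In its simplest form, preference-guided prompt optimisation is a propose-score-select loop. The sketch below is a generic best-of-N illustration, not the paper's algorithm: the candidate suffixes and the toy scoring rule standing in for the personalised reward model are both hypothetical.

```python
# Hedged sketch of preference-guided prompt refinement. In a real system,
# each candidate prompt would be rendered by the image model and the
# resulting image scored by the personalised reward model.
STYLE_SUFFIXES = ["", ", minimalist", ", cinematic lighting", ", high contrast"]

def predicted_reward(user: str, prompt: str) -> float:
    """Stand-in for the personalised reward model (toy scoring rule)."""
    score = 0.0
    if user == "alice" and "minimalist" in prompt:
        score += 1.0  # pretend Alice's history shows a minimalist preference
    if user == "bob" and "cinematic" in prompt:
        score += 1.0  # pretend Bob's history favours cinematic drama
    return score

def refine_prompt(user: str, base_prompt: str) -> str:
    # Propose candidate rewrites, score each, keep the highest-reward one.
    candidates = [base_prompt + suffix for suffix in STYLE_SUFFIXES]
    return max(candidates, key=lambda p: predicted_reward(user, p))

print(refine_prompt("alice", "a mountain cabin at dusk"))
```

Note that the generative model's weights are never touched: only the prompt changes, which is what keeps the approach cheap enough to run per user.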
Why Existing Aesthetic Models Fall Short
Current reward models used to guide image generation — such as those used in reinforcement learning from human feedback pipelines — typically aggregate ratings into a single preference score. This works reasonably well for filtering out low-quality outputs but performs poorly when the goal is to satisfy a particular individual.
The PAMELA team argues that data quality and personalisation are two underappreciated levers in this space. Many existing aesthetic datasets were collected at scale but with limited raters per image, making it impossible to characterise the distribution of taste for any given visual. By ensuring 15 raters per image, the new dataset makes individual preference modelling statistically feasible.
The research also highlights domain specificity as a factor. Aesthetic preferences are not uniform across categories — someone who appreciates a particular photographic style may have entirely different standards for illustrated or typographic work. The dataset's coverage across multiple visual domains allows the model to account for this variation.
Public Release and Standardisation Goals
The authors have released both the dataset and the trained model, with an explicit goal of enabling standardised research in personalised text-to-image alignment and subjective visual quality assessment. This matters because the field currently lacks shared benchmarks for evaluating how well a system serves individual users rather than aggregate ones.
By providing a common evaluation foundation, the release could accelerate progress from multiple research groups working on similar problems — a pattern that has historically proven effective in computer vision and natural language processing.
It is worth noting that all benchmark comparisons cited in the paper are self-reported by the authors; the results had not been independently replicated as of publication.
What This Means
For anyone building or using AI image generation tools, PAMELA signals a concrete path toward systems that adapt to individual taste rather than generic approval — making personalised creative assistance an engineering problem that can be addressed in the near term.