A team of researchers has found that teaching people how AI systems are trained — by having them act as the AI itself — measurably reduces their susceptibility to AI-driven persuasion, offering a potential alternative to passive defences like content labels and detection tools.
The study, posted to arXiv in April 2025, introduces LLMimic, a gamified, role-play-based tutorial in which users step into the role of a large language model and experience a simplified version of the three-stage training pipeline used to build modern AI systems: pretraining, supervised fine-tuning (SFT), and reinforcement learning from human feedback (RLHF). The researchers designed it as a proactive literacy intervention, one that builds understanding from the inside out rather than warning users after the fact.
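For readers who want the pipeline in concrete terms, the sketch below compresses all three stages into a toy you can run. It is an illustration under loose assumptions, not the paper's code: the "model" is just a table of word-pair counts, and a small rate() function stands in for human raters.

```python
from collections import Counter

def pretrain(corpus):
    """Stage 1: absorb raw text statistics (next-word counts, in spirit)."""
    model = Counter()
    for text in corpus:
        words = text.split()
        model.update(zip(words, words[1:]))
    return model

def supervised_fine_tune(model, demonstrations, weight=5):
    """Stage 2 (SFT): upweight word pairs seen in curated example responses."""
    for text in demonstrations:
        words = text.split()
        for pair in zip(words, words[1:]):
            model[pair] += weight
    return model

def rlhf(model, rate):
    """Stage 3 (RLHF): reinforce the pairs that simulated raters reward."""
    for pair in list(model):
        model[pair] += rate(pair)  # rate() stands in for human feedback
    return model

model = pretrain(["the hotel is great", "the hotel is fine"])
model = supervised_fine_tune(model, ["the hotel is great"])
model = rlhf(model, rate=lambda pair: 3 if pair == ("is", "great") else 0)
print(model.most_common(1))  # [(('is', 'great'), 9)]: rewarded patterns dominate
```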
Why Passive Defences May Not Be Enough
Most current approaches to protecting people from AI-generated influence rely on what the researchers call passive mechanisms: AI-content disclaimers, detector tools, and platform labels. These treat users as recipients of information rather than active participants in their own defence. The concern is practical: as LLMs become more capable and widely deployed, the volume and sophistication of AI-generated persuasion grow faster than labelling systems can keep up.
The LLMimic team argues that understanding how AI persuasion works — at the level of training incentives and data — gives people a more durable form of scepticism.
Teaching people to think like an AI, rather than just warning them about AI, appears to change how they respond to it.
How the Study Was Designed
The researchers ran a 2×3 between-subjects experiment (intervention × scenario) with 274 participants. Half watched a conventional video about AI history (the control condition); the other half completed the LLMimic tutorial. Each participant then encountered one of three realistic AI persuasion scenarios: a charity donation request, a malicious money solicitation, or a hotel recommendation. The scenarios were chosen to represent a range of real-world persuasive contexts, from benign to potentially harmful.
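The snippet below shows one plausible way to realise that assignment, spreading 274 participants across the six cells (2 interventions × 3 scenarios). The condition names are assumptions for illustration; the paper may label them differently.

```python
import random

INTERVENTIONS = ["control_video", "llmimic_tutorial"]
SCENARIOS = ["charity_donation", "malicious_solicitation", "hotel_recommendation"]

def assign(n_participants, seed=0):
    """Randomly assign each participant to one of the 2x3 = 6 cells,
    keeping cells as balanced as the participant count allows."""
    rng = random.Random(seed)
    cells = [(i, s) for i in INTERVENTIONS for s in SCENARIOS]
    pool = cells * (n_participants // len(cells) + 1)  # enough slots for everyone
    rng.shuffle(pool)
    return {f"P{k:03d}": pool[k] for k in range(n_participants)}

assignment = assign(274)
print(assignment["P000"])  # e.g. ('llmimic_tutorial', 'hotel_recommendation')
```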
This design allowed the team to test whether LLMimic's effects generalised across different types of persuasion, not just a single contrived situation.
What the Results Showed
According to the researchers, LLMimic produced statistically significant improvements across several measures. Participants who completed the tutorial scored meaningfully higher on AI literacy assessments (p < .001). Across all three persuasion scenarios, those participants were less likely to be successfully persuaded (p < .05). In the hotel recommendation scenario specifically, LLMimic also improved participants' truthfulness and social responsibility ratings, a measure of how critically they evaluated the AI-generated content (p < .01).
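For readers unfamiliar with these statistics, the snippet below shows the kind of test that typically sits behind a claim like "less likely to be persuaded (p < .05)". The counts are invented for illustration and are not the paper's data.

```python
from scipy.stats import chi2_contingency

# Rows: persuaded / resisted. Columns: control video / LLMimic tutorial.
# These counts are hypothetical, NOT taken from the study.
table = [[52, 31],
         [85, 106]]

chi2, p, dof, _ = chi2_contingency(table)
print(f"chi2 = {chi2:.2f}, dof = {dof}, p = {p:.4f}")
```

A p-value below the threshold means a gap this large between conditions would be unlikely to arise by chance alone if the tutorial had no effect.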
These are self-reported results from a single pre-registered study and have not yet undergone peer review, though the preprint is publicly available. The sample size of 274 is modest for claims about generalised human behaviour, and replication across different populations and cultural contexts would strengthen the findings considerably.
What LLMimic Actually Involves
The tutorial is designed to be accessible to non-technical users. Participants do not need to write code or understand machine learning mathematics. Instead, they engage with interactive, gamified tasks that mirror the logic of each training stage. During the pretraining phase, for example, participants process large amounts of text to build a sense of pattern recognition. In the RLHF stage, they receive simulated human feedback on outputs and adjust accordingly.
The goal is conceptual transfer: once a person understands that an AI's persuasive outputs are the product of optimisation pressure — shaped by what humans rewarded during training — they are better positioned to ask why a model might be saying what it says.
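One way to see that optimisation pressure directly is a toy reward loop, sketched below under simple assumptions: a "model" that only chooses a reply style, and simulated raters who happen to reward persuasive-sounding replies. It is not LLMimic's implementation, but it shows the mechanic the tutorial dramatises.

```python
import random

STYLES = ["hedged", "neutral", "persuasive"]

def simulated_feedback(style):
    # Assumption for this sketch: raters tend to reward persuasive replies.
    return {"hedged": 0.2, "neutral": 0.5, "persuasive": 0.9}[style]

def rlhf_toy(rounds=200, lr=0.1, seed=0):
    rng = random.Random(seed)
    weights = {s: 1.0 for s in STYLES}
    for _ in range(rounds):
        style = rng.choices(STYLES, weights=[weights[s] for s in STYLES])[0]
        weights[style] += lr * simulated_feedback(style)  # reinforce what was rewarded
    return weights

print(rlhf_toy())  # 'persuasive' ends up with the largest weight
```

Nothing in the loop "wants" to persuade; the drift toward persuasive replies falls out of what the feedback rewarded, which is the intuition the tutorial aims to transfer.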
Implications for AI Literacy Education
The broader field of AI literacy has grown significantly as a research concern, but most interventions focus on factual knowledge: what AI is, what it can and cannot do. LLMimic represents a different approach — experiential understanding over declarative knowledge. The researchers argue this makes the intervention more scalable and more human-centred, because it does not require users to stay continuously updated as AI capabilities change.
If the results hold up under further scrutiny, the model could inform how schools, platforms, and public institutions approach AI education — particularly for populations most vulnerable to online persuasion. The gamified format also suggests it could be deployed at scale without requiring expert facilitators.
There are open questions. The study does not measure how long the protective effect lasts — whether resistance to AI persuasion fades after days or weeks is unknown. It also does not address whether LLMimic remains effective against more sophisticated persuasion than participants encountered in the experiment.
What This Means
If role-playing as an AI builds durable resistance to its influence, LLMimic points toward a shift in how we think about AI safety at the human level — from warning labels applied after deployment to literacy tools built before exposure.