Researchers at Vidoc Security say they replicated safety-relevant findings from Anthropic's Mythos research using publicly available AI models rather than the proprietary systems Anthropic originally tested, according to a report published by Decrypt on April 17, 2026.

The Decrypt article frames the replication as relevant to a longstanding question in AI safety: whether behaviors documented in closed frontier models appear in open or widely accessible systems that any developer can download and run. DeepBrief has not independently verified the Vidoc Security paper, its author list, institutional affiliations, or peer-review status, and the Decrypt report is the sole source for the claims described below.

What Decrypt reports about the replication

According to Decrypt, the Vidoc Security researchers reproduced, using models available outside Anthropic's ecosystem, the core behavioral patterns that Anthropic had earlier described under the Mythos label. Decrypt's headline characterizes Anthropic's original Mythos findings as "alarming," a framing that is the outlet's own; the Vidoc Security paper's characterization of severity is not quoted directly in the portions of the Decrypt piece reviewed for this article.

Decrypt reports that the replication relied on "off-the-shelf" models, which the outlet uses to mean systems a third-party researcher can obtain without special access agreements with a frontier lab. The specific models tested, parameter counts, evaluation harness, and prompt sets are not reproduced in the Decrypt summary available to DeepBrief.


DeepBrief notes that the Decrypt article does not specify whether the Vidoc Security work has been submitted to a peer-reviewed venue, posted as a preprint on arXiv or a comparable server, or released only as a blog post or technical report. Readers should treat the replication claim as unverified pending publication of the underlying methodology.

Context on Anthropic's Mythos work

Anthropic has published a series of safety and interpretability studies examining how large language models behave under adversarial prompting, alignment stress tests, and scenarios designed to elicit deceptive or unsafe outputs. The "Mythos" label referenced by Decrypt refers to one such line of Anthropic research; DeepBrief was not able to locate, within the Decrypt article, a direct link to the original Anthropic paper or blog post describing Mythos by that name.

DeepBrief reached out to Anthropic's communications team for comment on the Vidoc Security replication and has not received a response as of publication. Vidoc Security had not responded to a request for on-record comment about the methodology, model selection, or peer-review plans by the time this article was filed.

Why replication matters in AI safety research

Decrypt frames the story around reproducibility, a recurring concern raised by academic researchers who argue that AI safety findings reported by frontier labs are difficult to audit when the underlying models are not publicly available. When a safety-relevant behavior is observed only inside a proprietary system, external researchers cannot independently test the claim, vary the experimental conditions, or check whether the behavior depends on specific training data or post-training interventions that the lab has not disclosed.

A replication using publicly available models, if methodologically sound, would address part of that concern by demonstrating that the behavior is not unique to one lab's training pipeline. It would not, on its own, establish that every detail of Anthropic's original findings generalizes, nor would it speak to the severity or real-world exploitability of the behaviors involved. Decrypt's report does not claim otherwise.

Outstanding questions

Several elements of the story remain unresolved on the basis of the Decrypt report alone. The paper's authors and institutional affiliations beyond the Vidoc Security byline are not enumerated in the outlet's summary. The specific public models used — whether open-weight releases from Meta, Mistral, Alibaba, or other providers, or API-accessible models from additional frontier labs — are not named in the excerpt available. The evaluation protocol, sample sizes, and statistical treatment are likewise not described in detail.

Decrypt is the only outlet DeepBrief has identified reporting on the Vidoc Security replication at the time of publication. No independent corroborating coverage from other journalism outlets, academic commentary, or statements from Anthropic or Vidoc Security has surfaced in DeepBrief's monitoring. DeepBrief will update this article if the underlying paper is published, if Anthropic or Vidoc Security issue on-record statements, or if additional outlets report on the work.

Sources:

  • Decrypt, "Anthropic's Alarming Mythos Findings Replicated With Off-the-Shelf AI, Researchers Say," April 17, 2026: https://decrypt.co/364744/anthropic-mythos-replicated-public-models-vidoc-security