-
Claude Mythos Preview
Today we announced Claude Mythos Preview as part of Project Glasswing. It scores 77.8% on SWE-bench Pro, up from 53.4% for Opus 4.6.
My reliability team was asked for feedback on it. From page 204 of the model card:
From a reliability engineering perspective, the model still cannot be left alone in a production environment to use generic mitigations. It frequently mistakes correlation with causation and it is not able to course-correct for different hypotheses. When asked to write incident retrospectives, more often than not it focuses on a single root cause and does not consider multiple contributing factors. However, we’ve found this model to be a step change in two areas. The first is signal gathering and initial analysis, where, by the time an engineer has opened two dashboards, the model has already found the outliers and what’s breaking. The second case is navigating ambiguity when there is a clearly defined outcome. For example, due to time zone differences, the reliability team in London was asked to stand up a model in a production environment with different constraints, and the engineers were unfamiliar with both the task and the constraints. Claude Mythos Preview was able to work step-by-step, fixing each error by observing other environments, checking any breadcrumbs that were left in previous commits, and reading documentation.
The London team in question was us.
-
One year at Anthropic: $2B to $30B run-rate
From the Google and Broadcom partnership announcement:
Our run-rate revenue has now surpassed $30 billion—up from approximately $9 billion at the end of 2025.
I joined Anthropic on April 1, 2025. Around that time, CNBC reported:
Annualized revenue reached $2 billion in the first quarter, the company confirmed, more than doubling from a $1 billion rate in the prior period.
$2 billion to $30 billion. 15x in the year I have been here.
-
Claude Code source leak
VentureBeat has the story. The Anthropic statement:
Earlier today, a Claude Code release included some internal source code. No sensitive customer data or credentials were involved or exposed. This was a release packaging issue caused by human error, not a security breach. We’re rolling out measures to prevent this from happening again.
I tweeted:
I repeat this to every new joiner at Anthropic but it’s worth repeating in public – we have a blameless culture and no single individual is at fault when bespoke complex systems break at scale
My colleague Jake Eaton sent me the long version of that argument. The NTSB asks why, not who, and that’s why you’ve never been in a plane crash.
On a smaller note, the April 1st surprise got spoiled too.