Claude Fable 5 vs Opus 4.8 vs GPT-5.5: First Public Mythos Model (June 2026)

Twelve days ago we wrote that Mythos was coming in mid-July at the earliest, somewhere between four and eight weeks out. Anthropic shipped it this morning.

It is called Claude Fable 5, the public face of the Mythos family, and the answer to whether it changes anything for trades businesses is yes — but probably not in the way most of the AI commentary online is going to tell you.

Here is what actually shipped, what the numbers say, and what to do about it on Monday morning.

The release

Anthropic dropped Claude Fable 5 and Claude Mythos 5 together. Fable 5 is the version you can buy. Mythos 5 is the unrestricted version, available only to the Project Glasswing consortium: the same forty-plus organizations (Microsoft, Apple, Google, AWS, Cisco, Nvidia, Broadcom, the Linux Foundation, et al.) that we covered in the Opus 4.8 piece. Fable 5 is Mythos with a safety harness clipped on. Same brain, with a classifier layer on top that redirects sensitive queries (cybersecurity, bio, chemistry, model distillation) over to Claude Opus 4.8.

The model ID is claude-fable-5. Available today on the Claude API, Amazon Bedrock, Google Vertex AI, and Microsoft Foundry. Through June 22, it is included at no extra cost in Pro, Max, Team, and seat-based Enterprise plans.

For frame of reference: Anthropic told the world Mythos was coming in "the coming weeks." Twelve days. Whatever timeline you thought the AI race was running on, divide by three.

The benchmarks

This is not a 5-to-10% iteration. This is the category jump we said was coming.

SWE-bench Verified. The headline coding benchmark: can the AI fix real bugs in real codebases? Opus 4.8 hit 88.6% twelve days ago. Fable 5 just scored 95.0%. That is a 6.4-point jump in twelve days, roughly the size of the jump from GPT-4 to GPT-5 a year ago. Anthropic is now compressing what used to be year-over-year progress into single-digit weeks.

SWE-bench Pro. The harder version, designed specifically to break models. Opus 4.8 was at 69.2%. Fable 5 hit 80.3%. GPT-5.5 is still at 58.6%. Gemini 3.1 Pro at 54.2%. That is a 22-point gap to OpenAI on the test that matters most for autonomous agent work, and a 26-point gap to Google.

GDPval-AA, a professional knowledge-work benchmark, climbed from 1890 Elo to 1932. The model also led both subsets of FrontierCode (Diamond and Main) and hit 72.9% on CursorBench at max effort.

The cleanest translation we can give you: on SWE-bench Pro, Fable 5 is now solving roughly 37% more real-world coding tasks than the best OpenAI model in production (80.3 divided by 58.6). The race we said was spreading? It just sprinted.
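For anyone who wants to check the gap arithmetic, here is a minimal sketch using only the SWE-bench Pro scores quoted above. The function names are ours, chosen for illustration. It separates the two ways of stating a gap: percentage points (a subtraction) and relative gain (a ratio).

```python
# SWE-bench Pro scores quoted in this article, in percent.
SWE_BENCH_PRO = {
    "Fable 5": 80.3,
    "Opus 4.8": 69.2,
    "GPT-5.5": 58.6,
    "Gemini 3.1 Pro": 54.2,
}


def point_gap(model_a, model_b):
    """Absolute gap between two models, in percentage points."""
    return round(SWE_BENCH_PRO[model_a] - SWE_BENCH_PRO[model_b], 1)


def relative_gain(model_a, model_b):
    """How many more tasks model_a solves than model_b, as a percentage."""
    ratio = SWE_BENCH_PRO[model_a] / SWE_BENCH_PRO[model_b]
    return round((ratio - 1) * 100, 1)


if __name__ == "__main__":
    for rival in ("GPT-5.5", "Gemini 3.1 Pro"):
        print(rival, point_gap("Fable 5", rival), relative_gain("Fable 5", rival))
```

Against GPT-5.5 this gives a 21.7-point gap and a relative gain of about 37%. The two numbers answer different questions, so it is worth knowing which one a vendor is quoting.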

The cost story

Standard API pricing is $10 per million input tokens, $50 per million output. That is exactly double what Opus 4.8 runs at the standard tier.

Batch pricing, for bulk async work like reprocessing a year of historical jobs overnight, is $5 input and $25 output per million tokens, the same as Opus 4.8's standard rate. So if your vendor is wiring up nightly bulk operations against your data, running them on Fable 5 batch costs no more than running them on Opus 4.8 standard did last week.

Run the math on a real trades-business use case. Say you want to use Fable 5 to read every email that hit your inbox today, classify it, match it against your client roster, draft the right reply, and update the pipeline. Call it 8,000 input tokens and 3,000 output tokens per email. On Fable 5 standard pricing that is 23 cents per email. On Opus 4.8 it was 11.5 cents. A shop seeing 50 inbound emails a day went from $5.75 in compute to $11.50.
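The per-email math can be reproduced with a few lines. This is a minimal sketch that assumes the standard-tier prices quoted in this section and the 8,000-in, 3,000-out token estimate. The keys "fable-5" and "opus-4.8" are labels for this example, not API model IDs.

```python
from decimal import Decimal

# USD per million tokens as (input, output), standard tier, as quoted above.
PRICES = {
    "fable-5": (Decimal("10"), Decimal("50")),
    "opus-4.8": (Decimal("5"), Decimal("25")),
}
MILLION = Decimal(1_000_000)


def cost_per_call(model, input_tokens, output_tokens):
    """Cost in USD of one call with the given token counts."""
    price_in, price_out = PRICES[model]
    return (input_tokens * price_in + output_tokens * price_out) / MILLION


def daily_cost(model, calls, input_tokens=8_000, output_tokens=3_000):
    """Cost in USD of a day's worth of identical calls."""
    return calls * cost_per_call(model, input_tokens, output_tokens)


if __name__ == "__main__":
    for model in PRICES:
        print(model, cost_per_call(model, 8_000, 3_000), daily_cost(model, 50))
```

Swap in your own email volume and token estimates to see where your shop lands. Decimal is used instead of floats so the cents come out exact.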

That sounds like a price hike. It's not. It is a different product. The right question is not "Fable or Opus" — it is "which job am I asking the model to do?"

High-volume, routine work (classify the inbox, advance a stage, route a lead): Opus 4.8 still wins on cost per call. Use that.
Long, complex, high-stakes work (write a fifty-line estimate against historical job data, run a multi-step proposal flow, pull together the full client history before drafting a difficult reply): Fable 5. The Opus tier will get to the wrong answer faster. The Fable tier will get to the right answer.

Your software vendor's job is to pick the right tier for the right job. If they are using one model for everything, they're either overpaying for routine work or underpowering for complex work. There is no third option.
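What "the right tier for the right job" might look like in practice is a small routing rule. The sketch below is one possible shape, not how any particular vendor does it. The job kinds, the step threshold, and the Opus model ID are all assumptions made for illustration; only claude-fable-5 comes from the release.

```python
from dataclasses import dataclass

FABLE = "claude-fable-5"   # model ID given in the release
OPUS = "claude-opus-4-8"   # hypothetical ID, assumed for this sketch

# Hypothetical names for the high-volume routine work described above.
ROUTINE_KINDS = {"classify_inbox", "advance_stage", "route_lead"}
MAX_ROUTINE_STEPS = 3      # assumed cutoff between "routine" and "long"


@dataclass
class Job:
    kind: str
    steps: int = 1
    high_stakes: bool = False


def pick_model(job):
    """Send long or high-stakes work to Fable, known routine work to Opus."""
    if job.high_stakes or job.steps > MAX_ROUTINE_STEPS:
        return FABLE
    if job.kind in ROUTINE_KINDS:
        return OPUS
    # Unknown work: this sketch errs toward capability over cost.
    return FABLE


if __name__ == "__main__":
    print(pick_model(Job("classify_inbox")))
    print(pick_model(Job("estimate", steps=12, high_stakes=True)))
```

The default for unrecognized work is a judgment call. A cost-sensitive vendor might reasonably flip it and send unknown jobs to the cheaper tier.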

The honesty mechanism (this is the new thing)

The most underrated part of today's release is not the benchmark numbers — it is the architecture of the safety classifier.

Fable 5 ships with what Anthropic calls a "trusted controls" layer. When you ask Fable 5 a question that lands in a high-risk category (cyber, bio, chemistry, model distillation), the system does not refuse. It does not give you a generic "I can't help with that." It silently falls back to Opus 4.8 and routes the answer through that model instead, while staying inside the same conversation thread.

Anthropic's own number is that at least 95% of Fable sessions run entirely on Fable, meaning the routing layer is invisible most of the time. The 5% or fewer where it does fire is the difference between an AI that says "no" to legitimate trades questions about, say, gas-line work or chemical storage permits, and an AI that quietly handles them with the slightly more conservative model.
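The fall-back-instead-of-refuse pattern is easy to picture in code. The sketch below is a toy illustration of the general idea only. It says nothing about how Anthropic actually built its classifier: the keyword lists stand in for a real safety model, and the Opus model ID is a hypothetical placeholder.

```python
PRIMARY = "claude-fable-5"    # model ID given in the release
FALLBACK = "claude-opus-4-8"  # hypothetical ID, assumed for this sketch

# Toy stand-in for a real safety classifier: a few keywords per category.
RISK_KEYWORDS = {
    "cyber": ("malware", "exploit"),
    "bio": ("pathogen",),
    "distillation": ("distill",),
}


def risk_category(prompt):
    """Return the first matching risk category, or None if nothing matches."""
    text = prompt.lower()
    for category, keywords in RISK_KEYWORDS.items():
        if any(word in text for word in keywords):
            return category
    return None


def route_turn(prompt):
    """Pick the model for one turn; flagged turns fall back instead of refusing."""
    return FALLBACK if risk_category(prompt) else PRIMARY


def route_session(prompts):
    """Route each turn and report whether the session stayed on the primary model."""
    models = [route_turn(p) for p in prompts]
    return models, all(m == PRIMARY for m in models)


if __name__ == "__main__":
    print(route_session(["Draft a permit checklist for a gas-line reroute"]))
```

The point of the pattern is in route_turn: a flagged turn still gets an answer, just from a different model, and the conversation carries on in the same thread.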

For trades businesses this matters more than it should. Service work runs into edge cases — hazmat handling, regulated installations, chemical disposal — that get false-flagged by clumsy safety classifiers all the time. The "graceful degradation" pattern is the first time we have seen an AI vendor solve the over-refusal problem without dumbing down the base model.

If your software's AI used to refuse to help with the actual hard part of your job, that bug just got fixed.

Long-horizon agents

This is the part that is going to quietly rewrite what software does for you.

Fable 5 was built for what Anthropic calls "long-horizon agentic work": runs that last hours or days, with the model planning, delegating to sub-agents, executing tool calls, checking its own work, and self-correcting along the way. The launch materials use the phrase "multi-day autonomous sessions."

Translate that to your shop:

The estimate that builds itself. Customer submits a quote request Friday afternoon. The agent pulls fifteen similar jobs from your history, prices the materials against three live supplier feeds, checks your crew availability for the proposed timeline, drafts the proposal, queues the followup, and has the whole thing waiting in your draft folder Monday morning. It ran for sixty hours, but you didn't.
The proposal that re-prices itself. A material cost spikes Sunday night. The agent re-runs every open quote against the new pricing, flags the ones where margin slipped under threshold, drafts the customer notification for your review, and updates the pipeline — all before you finish your coffee.
The followup that never forgets. Three months after a job closes, the agent reads the file, sees you mentioned a future maintenance need, and queues the outreach automatically — with context-specific copy, not a template.
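The re-pricing scenario above boils down to a margin check that an agent would run across every open quote. Here is a minimal sketch of that one step. The Quote fields, the material names, and the 20% threshold are all hypothetical, chosen only to make the logic concrete.

```python
from dataclasses import dataclass


@dataclass
class Quote:
    quote_id: str
    price: float       # price quoted to the customer, USD
    labor_cost: float  # estimated labor cost, USD
    materials: dict    # material name -> quantity needed


def margin(quote, material_prices):
    """Gross margin of a quote as a fraction of its price."""
    material_cost = sum(
        qty * material_prices[name] for name, qty in quote.materials.items()
    )
    total_cost = quote.labor_cost + material_cost
    return (quote.price - total_cost) / quote.price


def flag_thin_quotes(quotes, material_prices, threshold=0.20):
    """Return IDs of open quotes whose margin falls below the threshold."""
    return [q.quote_id for q in quotes if margin(q, material_prices) < threshold]


if __name__ == "__main__":
    open_quotes = [
        Quote("Q1", 10_000, 4_000, {"copper_ft": 1_000}),
        Quote("Q2", 10_000, 4_000, {"pvc_ft": 1_000}),
    ]
    # Copper spikes overnight; PVC holds steady.
    print(flag_thin_quotes(open_quotes, {"copper_ft": 4.5, "pvc_ft": 1.0}))
```

In a real workflow the flagged list would feed the next steps described above: drafting the customer notification for review and updating the pipeline.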

None of this is sci-fi. Cognition (the Devin team) shipped multi-day Claude-driven coding runs last quarter. The capability exists. The only question is whether your software vendor wires it up.

The OpenAI response window

Twelve days ago we said GPT-5.6 was probably landing the week of June 9. That week starts today. The Polymarket odds were 80-89% before today's release. They will be higher tonight.

Three things to watch for in the OpenAI counter:

A context window bigger than 1M. They have to win on something. The leaks all pointed at 1.5M, and shipping anything at or under 1M now would be admitting they got out-engineered. Watch for a number with a "5" in it.
Aggressive pricing cuts on GPT-5.5. They cannot match Fable 5 on benchmarks for at least another release cycle, so they cut the floor out of the existing tier and try to win on cost-per-call.
A big agentic announcement that is suspiciously vague on benchmarks. If OpenAI ships a "long-horizon" feature without head-to-head Fable 5 numbers, that's the tell — they shipped to be in the news cycle, not to win on the technical scoreboard.

If GPT-5.6 ships inside seven days, OpenAI is keeping pace. If it slips past June 16, that is the first time in the 2026 release cycle that OpenAI has been more than two weeks behind Anthropic. Worth watching.

What you actually do about this

There are two answers, and they're both worth your time this week.

Answer one — use Fable 5 yourself, today. Through June 22, Fable 5 is included at no extra cost in every paid Claude plan. Open the Claude app, switch the model to Claude Fable 5, and put it on the kind of work that other AI models break on.

The important thing to understand is where Fable 5 actually wins. For "write a Yelp response" or "summarize this email," every model from the last two years is fine. The Fable tier is built for the hard work — the long pastes, the multi-step problems, the questions where the right answer is buried under three other plausible-sounding wrong answers. Five things you will feel inside an afternoon:

Bigger pastes that hold together. Other models start losing the thread after the first 20,000 tokens. Fable 5 holds depth across the whole customer file. Paste every email, every change order, every photo caption from your one difficult relationship — three years of it — and ask "what does this customer actually want from us that we keep missing." Other models pattern-match the first few messages. Fable 5 finds the thing buried on page 40.
Multi-step tasks where each step has to be right. "Read these 15 customer files and draft a personalized followup for each." Most models give you three real ones and twelve copies of the same template. Fable 5 actually does the work for all 15 — different references, different tones, different specifics — because the SWE-bench jump translates directly into "follows multi-step instructions without dropping any." This is the under-reported difference.
Analysis with logic in it. Pricing is logic. Job costing is logic. "Given these material costs, this labor rate, this scope, and these three contingencies, what is the right number, and what assumptions am I making that I would not make if I had seen ten of these?" That is what 95% on SWE-bench Verified actually means in plain English. Use Fable 5 where the math has to be right, not just close.
Hard ambiguity. Most models, when a request is unclear, pick one interpretation and confidently barrel through. Fable 5 asks you the right one or two clarifying questions before writing, then writes. That sounds small. It is not. It is the difference between three paragraphs you have to throw away and one paragraph you can send.
The P&L gut-check that catches the weird line item. Upload a P&L or a job costing report. Ask "where is margin actually leaking, broken down by trade, season, and client size, and what is the biggest leak I would not have spotted without seeing the data laid out." Every model can read a PDF. Fable 5 is the one that looks at the whole picture before answering.

For the technically adventurous: Fable 5 is the model behind Claude Code, Anthropic's command-line agent. If you have any kind of spreadsheet, scripting, or scheduling workflow you have been wanting to automate, it can sit on your desktop and do it in multi-day autonomous mode. Not a path for most owners, but it exists.

Answer two — pressure the software you already pay for. The shops that come out of summer 2026 ahead are the ones whose vendors moved fastest to wire Fable 5 in. The question worth asking your software providers this week fits in one sentence:

"When are you on Fable 5 for the heavy work, and Opus 4.8 for the rest?"

If the answer is "we shipped both this morning, here is how we route" — keep paying them. If the answer is "we will look at it next quarter" — start looking around. The gap between vendors who treat new model releases as urgent and vendors who treat them as research projects is now the dominant factor in software quality. It is bigger than feature sets, bigger than UX, bigger than pricing.

Where OPS is

We just told you to pressure your software vendors to be on Fable 5 for the heavy work. Fair to ask the same of us. Here is the honest read.

Heavy work — the kind Fable 5 is built for — does not actually happen inside the base OPS app. The base app runs inbox classification, lead matching, pipeline auto-advance, project comments, photo galleries, calendar. High-volume routine work, hundreds or thousands of calls per company per day. Opus 4.8 is the right model for that, and the cost-per-call math says it should stay there. Putting a Fable-tier model behind a button that fires four hundred times a day per company would be lighting money on fire to add capability the work does not need. We are not doing that, and we will say so even when it is inconvenient.

The heavy work for a trades-business owner shows up in two places, and Fable 5 is the right answer in both.

You, in the Claude app. That is Answer One above. For the hard one-off questions — the P&L gut-check, the customer-history paste, the multi-step pricing problem, the scope-of-work breakdown — open Claude on your phone. That is where the depth lives, that is where the bigger pastes work, that is where the model you would pay for in the API is already included in your subscription through June 22.

Us, in OPS Spec. Spec is our done-for-you build offering — discovery call, scope document, multi-week custom build, walkthrough, support window, retainer. The engagement where a shop walks in with an existing operations problem (a custom estimating workflow, a regulated-trade intake process, a multi-stage approval flow that doesn't fit anyone else's software) and walks out with a tailored OPS configuration plus the bespoke code to make it work. That work is long. The context is huge. The cost per engagement is high enough that the model spend is a rounding error against the outcome.

That is exactly the kind of work Fable 5 is built to supercharge. It gives our team meaningfully more leverage at every stage — from how quickly we can turn a discovery conversation into a tight scope, to how much depth we can hold while we build, to how thoroughly the work gets reviewed before it ships to your shop. Faster turnarounds, sharper scope docs, a higher quality bar on what lands in production.

If you have a Spec engagement on deck, or one you have been thinking about, this is the week the conversation got more interesting. opsapp.co/spec.

Whether you run an electrical shop, an HVAC outfit, a plumbing crew, a general contracting operation, or a painting, landscaping, concrete, or roofing business — the pace of the race is the only thing that matters now. For the routine work, pick vendors that ship new releases the day they drop. For the hard work, use Fable 5 directly in the Claude app, or talk to us about Spec.

The free window closes June 22.

Built by trades, for trades.
