Should I cancel ChatGPT and switch to Claude after the Opus 4.8 launch?

Honestly, probably not. The Reddit power-user consensus right now is to pay for both at $40/month combined and route work to whichever model handles it best. Claude is meaningfully ahead on coding (a 10-point gap on the hardest test) and on holding long, complex instructions without drifting. ChatGPT is still ahead on image generation, voice, and ecosystem breadth (plugins, agents, search). For a trades-business owner, the practical answer is simpler: stay with whatever your software vendor is using. They're the ones calling the model under the hood. Your job is to make sure they're calling the best one — which today is Opus 4.8.

Is Opus 4.8 actually better than GPT-5.5 for real-world work, or just on benchmarks?

It's better on real work, not just benchmarks — but the gap depends on what you're doing. On real coding tasks (SWE-bench Pro, which uses actual GitHub bugs from production codebases), Opus 4.8 solves 69.2% versus GPT-5.5's 58.6%. That's an 11-point gap and it shows up in practice. On structured analysis, long-document reasoning, and following multi-step instructions without losing the plot, Claude is clearly ahead. On image generation, voice features, web search, and broad multimodal stuff, GPT-5.5 still wins. So for the kind of work that powers trades software — drafting emails, parsing customer requests, generating estimates, analyzing photos for issues — Opus 4.8 is the better engine. For consumer chatbot use, it's closer to a tie.

Anthropic keeps shipping a new Claude model every six weeks. Should I even bother keeping up?

You shouldn't. The whole point of using AI through a software vendor (instead of paying for Claude or ChatGPT directly) is that the vendor handles model upgrades for you. When a new model drops, good vendors swap it in within days. You don't see anything except features getting slightly sharper and slightly cheaper. The only thing you need to keep track of is whether the software vendors you pay are actually doing the work. If a feature you depend on hasn't improved in 90 days, your vendor is either ignoring the model releases or sitting on an old contract. That's the only signal that matters.

What is actually new in Opus 4.8 vs Opus 4.7?

Three big things. First, benchmark gains across coding (SWE-bench Pro up 4.9 points to 69.2%), reasoning (GPQA Diamond at 93.6%), and browser-agent work (84% on Online-Mind2Web). Second, fast mode pricing dropped 3x — from $30/$150 to $10/$50 per million tokens — at the same 2.5x speed multiplier. Third, the model is roughly 4x less likely to let code flaws slip through without flagging them, plus two specific tool-calling regressions from 4.7 are fixed. The 1M context window, 128K max output, and $5/$25 standard pricing all carry over unchanged.

Is fast mode actually worth the higher cost?

For most workflows, yes. The math: fast mode runs the same model 2.5x faster at 2x the price ($10/$50 vs $5/$25 per million tokens). So you pay roughly 80% more for tokens but each query returns in less than half the time. For anything customer-facing — instant lead replies, a real-time response to a quote request, an estimate confirmation that needs to land before the customer closes the tab — speed wins and fast mode pays for itself in close rates. For batch overnight work (sorting yesterday's inbox into pipeline, scoring last week's leads, generating tomorrow's daily summary), standard mode is fine and saves money. Good software vendors should be routing automatically based on what the task is.

When will GPT-5.6 be released?

OpenAI hasn't announced it. The Codex logs briefly surfaced a gpt-5.6 identifier earlier this month before being scrubbed, and canary tags called iris-alpha, ember-alpha, and beacon-alpha appeared in build artifacts. Polymarket traders are pricing 80-89% odds of release before June 30. Our read: 14 days from today (week of June 9), based on OpenAI's 7-day response time after the Opus 4.7 launch in April. The leaks point to a 1.5 million token context window (50% bigger than Opus 4.8) and improved frontend code generation from sparse prompts.

What is Project Glasswing and Claude Mythos?

Mythos is Anthropic's next model tier above Opus — internal codename Capybara, reportedly a 10 trillion parameter model trained on Nvidia Blackwell hardware. It leaked accidentally on March 27, 2026 through a CMS misconfiguration that exposed about 3,000 unpublished assets. Anthropic responded by quietly releasing Mythos Preview on April 8 to a closed consortium called Project Glasswing — 40+ companies including Microsoft, Apple, Google, Amazon Web Services, Cisco, Nvidia, Broadcom, and the Linux Foundation — running Mythos against their own codebases to find and patch security vulnerabilities before general release. Anthropic said in the Opus 4.8 launch that Mythos comes to all customers in the coming weeks.

Will Opus 4.8 replace my office manager or my estimator?

No, and not in the way the headlines suggest. What it does replace is the parts of those jobs nobody actually wants to do — manually copying job details into the CRM, retyping the same customer info three times, building a quote from scratch when you've already done 200 similar ones, drafting the same followup email for the eighteenth time this week. Your office manager and your estimator still make the judgment calls, still talk to the customers, still catch the things software can't see. They just stop spending half their day on admin work that should have been automated five years ago. Shops that get this right end up needing the same headcount but doing 2-3x the volume.

Claude Opus 4.8 vs GPT-5.5 vs Mythos: Full Technical Breakdown (May 2026)

Short answer: yes. The longer answer involves a $65 billion fundraise, four major model releases in 90 days, and an OpenAI engineering team that has roughly seven days to figure out what to ship next.

Anthropic released Claude Opus 4.8 this morning, which is the fourth Opus model they've shipped in 90 days. For context, that's faster than Apple ships iOS point releases, and Apple's stuff doesn't have to learn anything along the way.

The pattern, if you actually look at it, is a little nuts. 4.5 hit in February, 4.6 in March, 4.7 on April 16, 4.8 today on May 28. Six weeks between releases, every time, and the gaps aren't getting longer — they're getting tighter. Anthropic raised $65 billion last week. They shipped this six days later. Whatever brakes existed on the AI release cycle no longer apply.

And this release matters, because the thing they shipped is genuinely the best AI model on earth right now, costs the same as the version before it, runs faster, lies less, and is about to quietly upgrade every piece of software your business uses without you lifting a finger. Whether you run an electrical shop, an HVAC outfit, a plumbing crew, or a general contracting operation, this is what you need to know.

The benchmarks

The headline number is SWE-bench Verified, the standard test for whether an AI can fix real bugs in real codebases. Opus 4.8 scored 88.6%, up from 87.6% on Opus 4.7. That's a fine bump on its own — nothing to write home about.

The interesting one is SWE-bench Pro. Same test, harder problems, designed specifically to break AI models. Opus 4.7 scored 64.3% on that one in April. Today's release jumped to 69.2%. For reference, OpenAI's flagship GPT-5.5, which they shipped a week after Opus 4.7 on April 23, scored 58.6%.

Translate that to plain English and Claude is now solving roughly 18% more real-world software bugs than the best OpenAI model. Six weeks ago that gap was 5 points. Today it's 11 points. The race isn't tightening, it's spreading.

The rest of the report card, for completeness:

82.2% on SWE-bench Multilingual (same coding test, ten programming languages, up from 77.3%)
74.6% on Terminal-Bench 2.1 (whether the AI can drive a Linux shell end-to-end)
93.6% on GPQA Diamond (PhD-level science reasoning)
1890 Elo on GDPval-AA (Anthropic's professional knowledge-work benchmark)
84% on Online-Mind2Web (real browser-agent tasks)
First AI model ever to break 10% on the all-pass version of the Legal Agent Benchmark

The one benchmark OpenAI still leads is Terminal-Bench, where GPT-5.5 sits at 82.7% on the older 2.0 version of the test. Worth noting, but it's the only place they're still holding a flag.

The cost story (the part your vendor cares about)

Standard API pricing held flat at $5 per million input tokens, $25 per million output. Nothing changed there.

The story is fast mode — the same model running 2.5x faster at premium pricing. That dropped from $30/$150 per million tokens on Opus 4.7 to $10/$50 today. A 3x cost reduction overnight at the exact same speed.

Run the math on a typical AI-assisted estimate, call it 4,000 input tokens and 2,000 output per job. Yesterday on 4.7 fast mode that was 12 cents per estimate. Today it's 4 cents. A shop running 200 estimates a month went from $24 in compute to $8. By itself, that's not going to fix your gross margin. But multiply it across every AI feature in every piece of software you touch, and the economics of building AI tools just got rewritten between dinner last night and breakfast this morning.

The reason this matters to you specifically: features your software vendor couldn't afford to run yesterday became economical today. Things like reading every email that hits your inbox in real time and routing the real leads into pipeline, matching new inquiries against your existing client roster every time a message lands, or running deeper checks before a stage auto-advance. These were too expensive to leave on full-time last week. They're realistic this week.

The honesty upgrade

This is the part that's going to be hard to overstate. Anthropic's own number is that Opus 4.8 is roughly four times less likely than Opus 4.7 to let code flaws slip through without flagging them. The team at Cognition, who build Devin (the autonomous coding agent that's been all over engineering Twitter for the last year), reported separately that 4.8 fixed two specific regressions that had been driving them crazy. The 4.7 model used to write three paragraphs of explanation when one line would do, and it would occasionally just decide that a tool call wasn't necessary when the task literally required it. Both fixed.

"Less likely to skip a required tool call" is the difference between intelligent classification that actually moves a real quote request into your pipeline versus one that confidently tells you the inbox is sorted and quietly buries three leads in the wrong folder.

Anthropic also let slip that Opus 4.8 scored close to their unreleased frontier model — Mythos — on internal alignment evaluations. That's the first time a generally-available model has hit those marks, and it's a hint about what comes next. We'll get to Mythos in a minute.

The 1M context window (still underrated)

Opus 4.8 ships with the same 1 million token context window that arrived on Opus 4.7 six weeks ago — default on the Claude API, Amazon Bedrock, and Google Vertex AI, capped at 200K on Microsoft Foundry. Max output is 128,000 tokens per response.

A million tokens is roughly 750,000 words, somewhere around eight novels worth of text. In trades-business terms, that means you can hand the model the entire three-year history of a customer account — every job, every invoice, every text message, every photo caption, every change order — and ask it a question, and it holds all of that in its head at once. You can feed it your complete historical bid database alongside a fresh set of plans and ask "what did we miss on similar jobs?" and it can actually answer with specifics from your own data.

128K output means it can write the full proposal back to you in a single shot. No chunking, no "continue," no copy-pasting between three browser tabs.

Dynamic workflows

The other shipping news under the model announcement is a research-preview feature called dynamic workflows, which lets Opus 4.8 spin up hundreds of subagents running in parallel inside a single command. Anthropic's launch demo showed it executing a refactor across "hundreds of thousands of lines" of code in one orchestrated pass.

For trades software, translate that to one trigger and twenty things happening at once. A customer submits a quote request, and the AI parses the request, pulls your historical jobs for similar scopes, prices materials against three supplier feeds in parallel, generates the proposal, drafts the followup email, schedules the next checkpoint, posts the lead to the CRM, and updates dispatch — all fanned out as parallel work instead of executed one step at a time.

This is the foundation for software that does the work end-to-end, instead of software that asks you for the next instruction every fifteen seconds.

How to actually wield it

For a trades-business owner, the honest truth is you don't wield dynamic workflows directly. It's a Claude Code research-preview feature aimed at developers and software vendors building agentic apps. You wield it the same way you wield Opus 4.8 itself — by making sure the software you pay for is actually using it.

Three signals to look for over the next sixty days that tell you your vendor is on top of it:

Features that suddenly do five things in one click. A button that used to send a quote now sends the quote, schedules the followup, logs the activity, and advances the pipeline stage — all at once, in under three seconds. That's dynamic workflows under the hood. If a click still triggers one thing and then waits, your vendor is chaining single-shot calls the old way.
Bulk operations that don't take all afternoon. "Re-score every lead in pipeline" or "regenerate every estimate against current material costs" should run in seconds, not behind a coffee-break-length progress bar. Parallel subagents are the only way that math works.
Status messages that read like a checklist, not a single sentence. "Done: parsed request. Done: matched client. Done: priced materials. Done: drafted summary." The granular play-by-play is the giveaway — that's a parallel workflow reporting in.

The one question worth asking your software vendor over the next thirty days, the one that separates the shops who get ahead from the ones who don't:

"Are you using Opus 4.8's dynamic workflows, or are you still chaining single-shot API calls?"

If they don't know the difference, that's the answer. If they say "we're testing it," check back in three weeks. If they say "yes, here's what we shipped last week" — keep paying them.

The OpenAI response window

Here's the part that's actually fun to think about. Opus 4.7 dropped on April 16. GPT-5.5 dropped on April 23. Seven days. OpenAI watched the launch, scrambled the engineering team, and shipped a counter in a week.

Today is May 28. The clock just started again.

We know a few things. Someone at OpenAI accidentally pushed a gpt-5.6 identifier into the Codex API logs earlier this month before scrubbing it. Internal canary tags called iris-alpha, ember-alpha, and beacon-alpha appeared in build artifacts. Polymarket traders are pricing 80-89% odds of a GPT-5.6 release before June 30. The leaks consistently point to a 1.5 million token context window (50% bigger than Opus 4.8) and noticeably cleaner frontend UI generation from sparse prompts.

Our read is that GPT-5.6 lands inside 14 days, probably the week of June 9. OpenAI has now lost the SWE-bench race two releases in a row, which means they almost certainly lead this launch with the two things they can still win on — context length (1.5M versus 1M) and aggressive price cuts. They have to. The benchmark leaderboard isn't a place they can keep showing up to lose.

Mythos is the real event

Anthropic said in today's announcement that Mythos-class models are coming to all customers "in the coming weeks." That sentence is doing a lot of work, and it's worth slowing down to understand what's actually being teased.

Mythos is Anthropic's next tier above Opus. The internal codename is Capybara, and the reporting suggests it's a 10 trillion parameter model trained on Nvidia's newest Blackwell hardware. The world learned about it on March 27 when Anthropic's CMS misconfigured for a few hours and exposed roughly 3,000 unpublished assets — documentation, marketing pages, internal references to the model. Anthropic responded a couple weeks later by quietly releasing Mythos Preview on April 8 to a closed consortium they're calling Project Glasswing — forty-plus companies including Microsoft, Apple, Google, Amazon Web Services, Cisco, Nvidia, Broadcom, and the Linux Foundation. The whole group has been running Mythos against their own codebases to find and patch security vulnerabilities before broader release.

For context on where the demand for all these models is actually coming from, the same hyperscalers running Project Glasswing are the ones pouring $88 billion into data center construction over the next six months. The AI race and the trades labor demand are two sides of the same story.

When Anthropic says "coming weeks" in a launch post, historically that's meant somewhere between 4 and 8 weeks. The detail that Opus 4.8 has already hit alignment parity with Mythos Preview is the tell — they're effectively signaling that the safety case for general release is mostly built. Our projection is mid-July through early August for first general API availability, with vendor product integrations rolling out into September. Zvi Mowshowitz, who tracks this stuff seriously, has a published forecast of September for GA — the conservative end of the same window.

Opus 4.8 is a meaningful upgrade. Mythos is a category jump. The difference matters because the Opus generations have followed the pattern of "5-10% better at most things." A new tier means something closer to "this can now do work that the previous tier fundamentally could not." We won't know exactly what that work is until people start using it, but the early read from the Project Glasswing companies is that it's a leap, not a step.

What you actually do about any of this

Honestly? Nothing.

You don't install anything, you don't learn anything, you don't pay more. The software you already use is going to get sharper, faster, and meaningfully cheaper to operate over the next 30 days, and exactly the same is true for your competitor.

The shops that come out ahead in the back half of 2026 aren't going to be the ones who picked the best AI tools — they're going to be the ones whose software vendors moved fastest to ship the new models on the day they dropped. That's the one question worth asking your providers this week, and it fits in one sentence:

"When are you on Opus 4.8?"

If the answer is "we shipped this morning," you've got the right vendor. If the answer is "we're evaluating," fine, check back in two weeks and see if they actually pulled the trigger. If the answer is "what's Opus?", you're using the wrong software.

Where OPS is

We shipped Opus 4.8 to our beta testers an hour ago. The model now sits underneath our intelligent classification — every email that hits your Gmail or Microsoft 365 inbox gets read, sorted, and matched against your client roster faster and with fewer mistakes than yesterday. Pipeline auto-advance is sharper at reading the activity log. Lead matching against existing clients is more accurate. Whether you're a painting, landscaping, concrete, or roofing shop in the beta, the inbox-to-pipeline flow you already use is doing more for you under the same hood. General rollout follows once the beta cohort has put a few thousand jobs through it.

And we're working on supercharging your workflows with AI. Opus 4.8 is the engine under what comes next.

When Mythos lands, we'll be on it inside 48 hours. That's not a marketing line. That's the standard.

Built by trades, for trades.

OPUS 4.8: DOES THIS PUT CLAUDE BACK IN THE LEAD?