Claude Sonnet 4.5 a month in: was "best coding model in the world" true?

Anthropic launched Sonnet 4.5 on September 29 with the bold claim of being the best coding model in the world. After a month of daily use, here is the honest measurement against that claim.

Charles Lin · October 30, 2025

Anthropic shipped Sonnet 4.5 on September 29, and the launch announcement on r/ClaudeAI (1890 upvotes) led with the claim “the best coding model in the world.” That is a high bar to clear in October 2025: GPT-5 had been out for nearly two months, Codex CLI had matured significantly, and Anthropic was nominally behind on cost-per-token and trailing on certain agentic benchmarks. A month in, the honest measurement against the launch claim is more nuanced than either the YouTube reviewers’ “yes obviously” or the Reddit dissenters’ “marketing copy.”

This piece is a month-in retrospective on what Sonnet 4.5 actually changed in daily Claude Code use, where the model lives up to the launch claim, and the more reasonable framing that the Reddit consensus had settled on by mid-October.

What the three most-watched launch-day videos actually said

Theo’s “Sonnet 4.5 is the best coding model in the world” (33 min, 123K views, September 30) is the most-cited launch-day video and worth quoting carefully because his frame is more honest than the title suggests. Within the first two minutes he names the strategic context: “There’s been a huge shift away from Claude models and tools in favor of OpenAI stuff, largely because of Claude Code getting dumber and then GPT-5 with Codex getting really, really good. As things have shifted, it’s clear Anthropic is a bit scared, more so than usual. They’ve been very quiet since the GPT-5 release.” That is the launch context: Sonnet 4.5 dropped as a counterpunch, not a coronation. Theo’s actual hands-on take after a full day of testing landed at “Sonnet 4.5 from my experimentation so far has been really good” — qualified, not declarative — and his specific finding was that on UI generation it was “not much better than it was before” while GPT-5 “still just makes stunning UIs.” The headline supports the model’s launch claim. The body of the review pulls in the opposite direction.

James Montemagno’s “Hello Claude Sonnet 4.5! This thing is a BEAST!” (7 min, 30K views, September 30) is the cleaner positive review and the one that captures what the model genuinely is good at. He runs his standard test — a PRD-from-an-idea task followed by a landing-page implementation for a pet-friendly-locations app — through the new Claude VS Code extension. The PRD comes back well-structured with version number, business goals, success metrics, and explicit out-of-scope notes. The landing page renders as “one of the best landing pages, if not the best landing page that I’ve seen ever created for this specific prompt” with subtle animations, hover effects, working light/dark theme toggle, and a sensible navigation. That is the regime Sonnet 4.5 is genuinely strongest in: the well-specified task with a clean PRD as input. Montemagno’s positive take is not wrong; it is just narrower than “best coding model in the world” implies.

Cole Medin’s “Claude Sonnet 4.5 - The New Coding King? (Sonnet 4.5 vs. GPT 5 Codex)” (11 min, 38K views, September 30) is the live A/B test that makes the comparison concrete. He runs the same PRP (product requirements prompt), which adds a Stripe integration to an existing agentic chat application, in parallel in Claude Code (Sonnet 4.5) and Codex CLI (GPT-5 Codex), live, with no dry run. The framing he opens with is the one the Reddit dataset would converge on a week later: “a lot of people have been switching over to Codex from Claude Code. And so I’m really curious if Sonnet 4.5 is enough to bring everyone back.” His video does not declare a clean winner. Neither does the Reddit data that followed.

What launched on September 29

Three things shipped together — and conflating them is the most common source of “did the launch live up to it?” confusion:

The Sonnet 4.5 model itself. A new model checkpoint with claimed improvements on coding, agents, computer use, reasoning, and math.
A new Claude Code interface. Refreshed UX with better diff visualization, plan-mode improvements, and more flexible permission flows.
A first-party Claude VS Code extension. Anthropic’s own VS Code integration, no longer relying entirely on third parties like Cline or Continue.

When users say “Sonnet 4.5 is great” or “Sonnet 4.5 disappointed me,” they are often partly evaluating the model and partly evaluating the new Claude Code surface. The honest analysis has to separate these.

The model: real gains, real ceilings

On the model itself, my one-month measurement on a mix of personal and client codebases:

Where Sonnet 4.5 visibly outperforms Sonnet 4:

Long-context reasoning. Tasks that required reading 8+ files to understand the issue now feel less context-fragile. Sonnet 4 would drift around the 70k-token mark; Sonnet 4.5 holds up notably better through 100k+.
Plan mode discipline. When you explicitly ask for a plan before implementation, Sonnet 4.5 produces meaningfully cleaner plans — fewer hand-wavy steps, more concrete file paths and function signatures called out by name. Montemagno’s PRD demo lands exactly on this strength.
Refactor scope discipline. Sonnet 4 had a tendency to “fix” things adjacent to what you asked for. Sonnet 4.5 stays more on-task. Smaller diffs for the same prompts.
Computer use. The benchmark that Anthropic led with at launch. For users doing browser-automation workflows via the Claude API, this is the biggest gain — but it is invisible to most coding users.

Where Sonnet 4.5 did not visibly change much:

Raw SWE-bench-style task completion. On well-specified single-file tasks, Sonnet 4.5 and Sonnet 4 produce nearly identical output. The gap is in the multi-file long-context regime.
Speed. Roughly comparable latency to Sonnet 4. Not noticeably faster.
Cost. Same per-token pricing as Sonnet 4.
UI generation quality. Theo’s finding holds — for greenfield UI work, GPT-5 still tends to produce more polished output on first attempt.

Where the launch claim oversells:

The “best coding model in the world” framing. In honest single-task benchmark comparisons (SWE-bench Verified specifically), Sonnet 4.5 is competitive with GPT-5 but not decisively better. The benchmark numbers Anthropic led with were on harder agentic tasks where the long-context reasoning matters more — which is real, but not the same as “best on coding everywhere.”

The Reddit reaction in the 138-upvote head-to-head A/B thread that landed a week after Sonnet 4.5’s launch captured the dissent cleanly. The OP built two parallel implementations of the same e-commerce monorepo using Claude Code + Sonnet 4.5 and Codex CLI + GPT-5-codex and concluded “they each won at different things.” A top comment summarized: “If I need speed/quick edits/easy fixes I use Sonnet 4.5. If I need longer-term thinking/debugging/feature planning I’ll use GPT-5 Codex.” That is not “best model in the world.” That is “shaped right for a specific kind of work.”

The new Claude Code interface: bigger than the model upgrade

Honestly, the more significant change for daily users was not the model — it was the Claude Code UI refresh that shipped alongside. New diff visualization with cleaner change indicators, plan mode that actually lets you edit the plan inline before approving, and finer-grained permission prompts (auto-allow this command, auto-deny that one, prompt for the third). These are workflow improvements that compound across hundreds of daily interactions, in a way that a 2% model-quality bump does not.

Several of the top replies under the launch thread were specifically about the new interface, not the model: “Lets go! Who else was using claude code when this popped up?” By my read, at least half of the hype was about the UI surface rather than the model checkpoint. On the same day Sonnet 4.5 dropped, Anthropic also shipped the Claude Usage Limit Meter (1248 ups), describing the weekly cap as one “we expect fewer than 2% of users to reach.” The most-upvoted reply under that announcement was a dry “They were indeed listening” (342 ups), followed immediately by “Please investigate how I already reached 17% weekly usage in less than 2 full sessions on a max 20x plan! (circa 4 hrs of total usage)” (104 ups). The transparency was real and the limits were real, and the gap between the company’s “2% of users” framing and the heavy-user experience on the threads was the first crack in the launch’s positive reception.
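To put the quoted complaint in concrete terms, here is a rough back-of-the-envelope sketch. It assumes usage accrues linearly with hours of use, which is my simplification and not something Anthropic has stated; the function name is hypothetical, and the only inputs are the two figures from the reply (about 4 hours, 17% of the weekly allowance).

```python
def hours_until_cap(hours_used: float, fraction_used: float) -> float:
    """Linearly extrapolate how many hours of use would exhaust the weekly cap.

    Assumption (not from Anthropic): the meter rises in proportion to hours used.
    """
    if not 0 < fraction_used <= 1:
        raise ValueError("fraction_used must be in (0, 1]")
    return hours_used / fraction_used


# Figures from the quoted reply: roughly 4 hours of use consumed 17% of the week.
total_hours = hours_until_cap(hours_used=4.0, fraction_used=0.17)
per_day = total_hours / 7  # spread evenly across a seven-day week

print(f"Cap reached after ~{total_hours:.1f} h of use (~{per_day:.1f} h/day)")
# prints: Cap reached after ~23.5 h of use (~3.4 h/day)
```

If that linear assumption is even roughly right, a user working at that intensity for a little over three hours a day would hit the weekly cap, which helps explain why heavy users read the “fewer than 2%” framing skeptically.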

This is worth flagging because it is a pattern: in 2025 the model-vs-tool boundary has eroded for AI coding users. The model is part of the product, not the whole product. When Anthropic ships a Sonnet upgrade together with a Claude Code refresh and a usage-meter rollout, those changes are inseparable in practice. The same is true of OpenAI’s Codex CLI improvements, which have landed in parallel with GPT-5-codex releases.

The VS Code extension: long-overdue, mostly fine

Anthropic shipping a first-party VS Code extension was the third big launch piece. Before this, VS Code users went through Cline, Continue, or other third-party wrappers to get Claude integration. The first-party extension is cleaner, integrates better with Anthropic’s account system, and provides MCP ecosystem hookups directly. It is not yet at feature parity with Claude Code in the terminal (sub-agents and slash commands are still terminal-only), but for users who want to stay in VS Code, it is the new default. Montemagno’s video runs entirely inside the new extension, and the workflow holds up.

The honest competitive read: Anthropic shipped this extension because GitHub Copilot’s bundling of Sonnet 4.5 made the VS Code integration question urgent. With Copilot Pro+ at $39/month including 1,500 premium requests that can be spent on Sonnet 4.5 inside VS Code proper, Anthropic needed its own first-party VS Code surface to keep direct-billing users from defecting. The extension is a real product, but the timing was reactive.

Where the Reddit consensus landed by mid-October

By the third week after launch, the modal Reddit position on Sonnet 4.5 looked like this:

Real improvements on long-context and multi-file work. Acknowledged across the board.
Not a step-change improvement over Sonnet 4 for short-task work. Acknowledged with some grumbling.
The launch claim oversold. The “best coding model in the world” line aged poorly within 10 days as users ran A/B tests against GPT-5.
The Claude Code UI refresh is the real win. Underemphasized in marketing, overdelivered for daily users.
The usage-meter rollout pulled trust in both directions. Praise for the transparency, frustration that the underlying weekly caps felt tighter than the “fewer than 2% of users” framing suggested.
A persistent worry about quality degradation. The Reddit dataset has a recurring complaint that mid-September Sonnet 4 felt degraded — Anthropic does not acknowledge this but the complaints are too consistent to ignore. Sonnet 4.5 has mostly reset trust on this, but the lingering caution is real and shows up in stack-decision threads like the 23-upvote “my current stack” post.

The OP of that stack post is candid: “I bought a $20 sub to claude, to use claude code. CC became my go to implement my changes. But soon it became really stupid, not following directions and degraded quality overall.” That is the kind of quote that does not show up in YouTube reviews but is repeated across enough Reddit threads to be a real signal. Sonnet 4.5’s release window mostly addressed it; whether the underlying issue (whatever it was) has been permanently fixed is something we will only know in another month.

The YouTube vs Reddit gap on this launch

YouTube reviews of Sonnet 4.5 in the first 48 hours were almost uniformly positive — but, importantly, the more detailed videos (Theo’s, Cole Medin’s) were already hedging within those first 48 hours, exactly because the creators were running live A/B tests against GPT-5. The shorter, more enthusiastic videos repeated the “best coding model in the world” claim with various levels of hedging. Reddit’s response was slower and more skeptical — the launch thread is enthusiastic, but the follow-up threads asking “Sonnet 4.5 vs GPT-5” within 7-10 days were doing real A/B testing and landing on nuanced answers.

This is a consistent 2025 pattern. YouTube captures the launch enthusiasm. Reddit captures the settled-state usage a week or two later. Both are honest signals from different stages of the adoption curve. A reader who only watches YouTube reviews would think Sonnet 4.5 was a step-change improvement; a reader who only reads Reddit threads would think it was an iterative bump. The truth is in the middle: a meaningful improvement on the specific dimensions Anthropic invested in (long-context, agentic, computer use), a marginal change everywhere else.

What changed in my own workflow

A month in, my own daily Claude Code use:

I use plan mode more. The improved planning quality makes “plan first, implement second” actually save time rather than feel like overhead.
I trust longer-context tasks more. Things I would have manually decomposed into smaller subtasks under Sonnet 4, I now hand to Sonnet 4.5 as one task.
I switch to Opus less often. Sonnet 4.5 handles enough of what I previously needed Opus for that the Opus reach-rate has dropped meaningfully. Cost savings are real.
I do not see a need to switch from Claude Code as the daily driver for “I know what I want, just do it cleanly” tasks. But I still complement it with Codex CLI for the long-debugging-session work.

The verdict at one month

Sonnet 4.5 is a real upgrade. The launch marketing oversold the “best coding model in the world” framing by enough that the Reddit hot-take cycle bounced it back to a more accurate “best for these specific things, comparable elsewhere” position. The Claude Code UI refresh that shipped alongside is the more important product change for daily users, and is not sufficiently credited because it is harder to put on a benchmark chart.

The honest one-line summary: Sonnet 4.5 keeps Claude Code competitive with GPT-5 + Codex CLI for the kind of work Anthropic users do, and adds enough long-context capability to handle harder tasks without falling back to Opus as often. It is not a step-change over Sonnet 4. It is a polished iteration that makes Claude Code a clearly defensible daily-driver choice in a market that no longer has a single obvious leader.

The next test for Anthropic is whether Sonnet 4.6 (presumably late November or December) lands with the same cadence as OpenAI’s GPT-5 minor revisions or whether the shipping velocity gap widens. The 2026 question — and the one r/ChatGPTCoding has been chewing on for weeks — is whether Anthropic’s “fewer, more polished releases” cadence beats OpenAI’s “ship more surface area faster” cadence on a 12-month horizon. Sonnet 4.5 does not answer that. It just keeps Anthropic in the race.

Sources

Every reference behind this piece. If we make a claim, it's because at least one of these said so — or we lived it ourselves.

Firsthand: One month of daily Claude Code + Sonnet 4.5 on personal and client codebases
Docs: Anthropic, “Introducing Claude Sonnet 4.5”
YouTube: Theo - t3.gg, “Sonnet 4.5 is the best coding model in the world”
YouTube: James Montemagno, “Hello Claude Sonnet 4.5! This thing is a BEAST!”
YouTube: Cole Medin, “Claude Sonnet 4.5 - The New Coding King? (Sonnet 4.5 vs. GPT 5 Codex)”
Reddit: r/ClaudeAI, “Introducing Claude Sonnet 4.5”
Reddit: r/ClaudeAI, “Introducing Claude Usage Limit Meter”
Reddit: r/ChatGPTCoding, “Codex CLI + GPT-5-codex still a more effective duo than Claude Code + Sonnet 4.5”
Reddit: r/ChatGPTCoding, “My experience in AI coding: brief summary of tools currently using”