Cursor Composer 2.5: the price-per-task model that may have just reset the workhorse tier
Composer 2.5 lands at ~$0.50 per Cursor-bench task against Opus 4.7 at ~$11. The model is not the smartest. It is the right shape for the regime most teams actually operate in.
Cursor shipped Composer 2.5 the same week Google held I/O 2026 and Anthropic teased the run-up to Opus 4.8. By the metric most working teams actually care about — cost per finished task on Cursor’s own bench — Composer 2.5 lands as the most disruptive launch of the three. The model is not the smartest. It is the right shape for the regime most teams actually operate in, and the gap between “absolute best” and “Composer 2.5” is now small enough that a lot of daily coding work should not be running on Opus tier anymore.
This piece is the working read after a week of running Composer 2.5 as my primary daily driver, cross-checked against the Cursor-bench numbers Matthew Berman highlighted on launch day and the Reddit signal from the Ultra-plan power users who got their hands on the new model first.
What Cursor actually shipped
Composer 2.5 is the next iteration of Cursor’s homegrown “workhorse” model, the one the company actually wants you to use for the bulk of your sessions because it owns the inference economics. The series is based on the Kimi open-source family with Cursor’s own post-training. Version 2.5 is positioned as a dot upgrade from 2.0, but the price-performance shift makes it more interesting than an incremental release.
The headline launch features per Cursor’s blog:
- Cursor-only model — not available via API or in any third-party tool
- Sustained-work optimisation — meant for long-running agent loops more than single completions
- Doubled included usage for the launch week — promotional but real
- Better instruction following on multi-step complex tasks — the area where the prior 2.0 had visible weakness
The point of the launch is not “we made the smartest model.” It is “we made the model that finishes the most work per dollar inside Cursor.”
Matthew Berman’s price-per-task chart, in one paragraph
The clearest single artifact summarising why Composer 2.5 is being talked about is Matthew Berman’s launch-day reaction “Cursor just beat EVERYONE” (17 min, May 26). The Cursor-bench chart he walks through (cost per task on the X-axis, Cursor-bench score on the Y-axis) plots the entire frontier in a way that makes the regime question concrete:
- Opus 4.7 max sits at the top right — roughly 65% on Cursor-bench, around $11 per task
- GPT-5.5 family clusters in the middle — competitive intelligence, around $4 per task at the high end
- Composer 2.5 sits just below at ~64% — at roughly $0.50 per task
Berman’s framing is the one that lands the strategic point: “Imagine how much more you’re going to be able to get done for the budget. I think a lot of people, when they’re looking at these benchmarks and when they’re looking at the overall score, they just think everybody has unlimited budget. And that is not the case for most individuals and certainly not the case for most companies. Price per intelligence ratio is incredibly important to a lot of companies. Not everybody is token maxing.”
That is the central reframing of the workhorse-class story. The leader on quality (Opus 4.7 max, now Opus 4.8) charges roughly 22x what Composer 2.5 does (about $11 versus $0.50 per task) for about one percentage point of additional Cursor-bench accuracy. For tasks that benefit from raw frontier reasoning (the hardest debugging, novel architectural decisions, ambiguous spec interpretation) that premium is worth it. For everything else, and “everything else” is most of what working engineers actually do, the math has shifted decisively.
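The arithmetic behind that ratio is simple enough to check by hand, but a short sketch makes the budget implication concrete. The scores and per-task costs below are the approximate figures quoted from the chart; the $100 budget and all function names are illustrative assumptions, not anything Cursor or Berman published.

```python
# Approximate figures as quoted from the Cursor-bench chart above.
MODELS = {
    "Opus 4.7 max": {"score_pct": 65.0, "cost_per_task_usd": 11.00},
    "Composer 2.5": {"score_pct": 64.0, "cost_per_task_usd": 0.50},
}


def price_ratio(expensive: str, cheap: str) -> float:
    """How many times more the expensive model costs per task."""
    return (
        MODELS[expensive]["cost_per_task_usd"]
        / MODELS[cheap]["cost_per_task_usd"]
    )


def cost_per_extra_point(expensive: str, cheap: str) -> float:
    """Extra dollars per task paid for each extra percentage point of score."""
    extra_cost = (
        MODELS[expensive]["cost_per_task_usd"]
        - MODELS[cheap]["cost_per_task_usd"]
    )
    extra_points = MODELS[expensive]["score_pct"] - MODELS[cheap]["score_pct"]
    return extra_cost / extra_points


def tasks_within_budget(model: str, budget_usd: float) -> int:
    """Whole tasks that fit in a budget at the quoted per-task cost."""
    return int(budget_usd // MODELS[model]["cost_per_task_usd"])


if __name__ == "__main__":
    budget = 100.0  # hypothetical budget, chosen only for illustration
    print(price_ratio("Opus 4.7 max", "Composer 2.5"))           # 22.0
    print(cost_per_extra_point("Opus 4.7 max", "Composer 2.5"))  # 10.5
    print(tasks_within_budget("Opus 4.7 max", budget))           # 9
    print(tasks_within_budget("Composer 2.5", budget))           # 200
```

On these numbers, the same hypothetical $100 covers 9 Opus-tier tasks or 200 Composer 2.5 tasks, and the premium works out to about $10.50 per task for the one extra percentage point.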
What the Ultra-plan users on Reddit are reporting
The Reddit signal lines up with the chart. The most useful single thread is r/cursor “Blessed without a 5h window” (37 ups, June 7). The OP, who had just upgraded to the Cursor Ultra plan, reports running Composer 2.5 in @ fast mode for “8+ hours w/ no brakes” and not coming anywhere near their quota. The closing line is the one that captures the reframe: “Honestly been loving it here more than Claude & Codex.”
That is not a benchmark observation. It is a workflow observation — the gap from a 5-hour rate-limit cycle (the standard Anthropic / OpenAI pattern) to “I can steer agents all day” is its own productivity unlock that has nothing to do with model intelligence and everything to do with how the meter feels under sustained load.
A parallel thread, r/cursor “Composer 2.5 usage with ultra” (3 ups), is smaller but tells the same story from the entry side: prospective Ultra subscribers asking “how much usage actually fits in this” and getting confirmations from current users that the answer is “more than you’ll hit if you are primarily on Composer 2.5”. That is the price-per-task chart cashed out as user experience.
The OP also mentions “AI psychosis”, which is half a joke and half a real flag worth quoting: when the meter stops being the constraint, the constraint becomes your own discipline about what to dispatch and what to review.
The contra-take Theo landed the same week
The complication landed on June 3 with Theo’s “More Prompts = Worse Code?” (8 min). His argument is not specifically about Composer 2.5 but it is directly relevant to the workhorse-model thesis: there is a real risk that cheaper-per-task pricing pushes engineers to dispatch more prompts than they should, with each one slightly less thought through, and the aggregate quality of output goes down even though the per-prompt model is fine. The classic “ten cheap shots beats one expensive shot” framing breaks when the ten shots are also less carefully framed.
This is the discipline question the price-per-task chart does not address. Composer 2.5 makes it economically rational to fire off agents you would have deferred when the per-shot cost was $5. Whether those agents produce work worth your review time is a separate question. Theo’s video does not answer it definitively; it just names the risk. The early r/cursor mood — “Blessed without a 5h window” — has more enthusiasm than discipline-talk in it. The next month will tell whether the workhorse tier creates real leverage or just lower-quality output at higher volume.
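One way to reason about that risk is to price in review time and the share of agent output you actually keep. The sketch below is a toy model under loudly stated assumptions: the review minutes, hourly rate, and acceptance rates are invented for illustration and are not measurements from Theo, Cursor, or my own week of use. Only the $0.50 and $11 per-task costs come from the chart.

```python
def cost_per_accepted_task(
    model_cost_usd: float,
    review_minutes: float,
    hourly_rate_usd: float,
    acceptance_rate: float,
) -> float:
    """Model cost plus review cost, spread over the tasks that get accepted.

    acceptance_rate is the fraction of dispatched tasks whose output is kept.
    """
    if not 0 < acceptance_rate <= 1:
        raise ValueError("acceptance_rate must be in (0, 1]")
    review_cost_usd = review_minutes * hourly_rate_usd / 60
    return (model_cost_usd + review_cost_usd) / acceptance_rate


if __name__ == "__main__":
    # All inputs except the per-task model costs are hypothetical.
    REVIEW_MINUTES = 10
    HOURLY_RATE_USD = 90

    careful_cheap = cost_per_accepted_task(0.50, REVIEW_MINUTES, HOURLY_RATE_USD, 0.5)
    hasty_cheap = cost_per_accepted_task(0.50, REVIEW_MINUTES, HOURLY_RATE_USD, 0.25)
    careful_frontier = cost_per_accepted_task(11.00, REVIEW_MINUTES, HOURLY_RATE_USD, 0.5)

    print(careful_cheap)     # 31.0
    print(hasty_cheap)       # 62.0
    print(careful_frontier)  # 52.0
```

With these made-up inputs, review time dominates the model cost, so a hastily framed cheap prompt that is accepted half as often ends up costing more per kept change than a carefully framed frontier prompt. The real numbers will differ per team; the point is only that acceptance rate belongs in the denominator.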
Where Composer 2.5 actually fits in a 2026 stack
After a week of running it as the default, my honest segmentation:
Composer 2.5 should be your default for:
- Routine refactors with a clear spec (rename a function across the codebase, switch a Postgres column type and propagate)
- Test generation from existing code
- Boilerplate and scaffolding (typed clients from an OpenAPI spec, CRUD endpoints, basic React components)
- First-pass implementations of well-specified features where you plan to review carefully
- Anything that previously felt economically marginal at Opus pricing
Reach for Opus 4.8 (or GPT-5.5 high) when:
- The task is debugging something whose root cause you do not know
- Architectural decisions where the model’s reasoning will influence design choices that survive the PR
- Ambiguous spec interpretation where you want the model’s interpretation to be as careful as yours would be
- Multi-file refactors where the wrong abstraction propagates expensively
Skip Composer 2.5 entirely if:
- You are not in Cursor (it is Cursor-only)
- Your workload is mostly outside the routine refactor / test / scaffold envelope
- You are paid by quality of output, not volume of output, and the Opus premium pays for itself
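The segmentation above can be written down as a small routing rule. This is a sketch of my own heuristic, not a Cursor feature or API: the task labels, the `Tier` enum, and `pick_tier` are all hypothetical names introduced for illustration.

```python
from enum import Enum


class Tier(Enum):
    WORKHORSE = "Composer 2.5"
    FRONTIER = "Opus 4.8 or GPT-5.5 high"


# Hypothetical task labels mirroring the lists above.
WORKHORSE_TASKS = {
    "routine_refactor",
    "test_generation",
    "scaffolding",
    "first_pass_feature",
}
FRONTIER_TASKS = {
    "unknown_root_cause_debug",
    "architecture_decision",
    "ambiguous_spec",
    "risky_multi_file_refactor",
}


def pick_tier(task_kind: str, in_cursor: bool = True) -> Tier:
    """Pick a model tier for a task, following the segmentation above."""
    if not in_cursor:
        # Composer 2.5 is Cursor-only, so it is not an option elsewhere.
        return Tier.FRONTIER
    if task_kind in FRONTIER_TASKS:
        return Tier.FRONTIER
    if task_kind in WORKHORSE_TASKS:
        return Tier.WORKHORSE
    # Assumption: anything outside the routine envelope goes to the
    # frontier tier rather than being guessed at.
    return Tier.FRONTIER


if __name__ == "__main__":
    for kind in ("test_generation", "ambiguous_spec", "data_migration_plan"):
        print(kind, "->", pick_tier(kind).value)
```

Sending unrecognised task kinds to the frontier tier is a deliberately conservative choice; a team that trusts its review process could reasonably flip that default to the workhorse tier.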
Creator POV vs Reddit dissent
The YouTube creators’ read on Composer 2.5 is overwhelmingly positive in week one — Berman called it “the best coding model on the planet” qualified with “for the price-per-intelligence regime”, which is the right hedge. The framing other channels picked up — “Cursor finally won the workhorse fight against Anthropic and OpenAI on their own ground” — is broadly correct.
Reddit’s mood is more textured. The Ultra-plan users are happy because the rate-limit ceiling moved. The Pro-plan users on the standard tier are split: some report Composer 2.5 fitting within their existing quotas just fine, while others report the included usage running out faster than they expected once the promotional doubled-usage window ends. The discipline question Theo raised is essentially absent from the early threads. There are very few “I dispatched too many agents and the integration step ate my afternoon” posts yet, which is exactly the kind of late-cycle observation that takes two to three weeks to surface on Reddit.
Three weeks from now, the modal r/cursor post will tell you whether Composer 2.5 actually delivered on the workhorse thesis or just made it cheaper to burn through the same productivity ceiling more dramatically.
The honest one-week verdict
Composer 2.5 is a real change to the cost curve, and the cost curve was overdue for a change. The default coding model for the majority of routine work in 2026 should not be running at Opus pricing; it should be running at workhorse pricing. Composer 2.5 is the first widely available answer that gets the price right without making the quality so much worse that the savings disappear into rework.
The strategic implication for Anthropic, OpenAI, and the broader market is the more interesting story. Cursor now has a moat that is not “we are the best IDE” — it is “we have the workhorse model with the best price-per-task on our own bench, available only inside our product.” That is the kind of moat that compounds with usage. The next twelve months are going to be about whether the frontier labs match the price-per-task or cede the workhorse tier entirely. My read is they cede it, then try to recapture it at the next major release with whatever the next compression technique is.
For working engineers right now: try Composer 2.5 for a week on routine work, keep Opus tier available for the hard cases, and pay attention to whether your aggregate output quality holds up under the new economics. That is the only honest way to evaluate a workhorse model that just reset the curve.
Sources
Every reference behind this piece. If we make a claim, it's because at least one of these said so — or we lived it ourselves.
- Firsthand: One week of running Composer 2.5 as primary daily driver across personal and client projects
- Docs: Cursor, “Introducing Composer 2.5”
- YouTube: Matthew Berman, “Cursor just beat EVERYONE.”
- YouTube: Theo - t3.gg, “More Prompts = Worse Code?”
- YouTube: Matthew Berman, “Anthropic just dropped Opus 4.8... (WOAH)”
- Reddit: r/cursor, “Blessed without a 5h window” (37 ups)
- Reddit: r/cursor, “Composer 2.5 usage with ultra” (3 ups)