Anthropic vs OpenAI API pricing: the actual math at typical coding workloads
Both API providers iterated pricing through 2025. The honest "which is cheaper" answer depends entirely on workload shape. Here is the working math.
API pricing comparisons usually publish the per-million-token rates and call it analysis. Real cost depends on what you’re doing and on which models you’re routing to. The figures below come from modelling actual spend across multiple workloads on both providers.
The headline rates in mid-2025
Anthropic Claude:
| Model | Input $/M | Output $/M | Context |
|---|---|---|---|
| Claude 3.7 Sonnet | $3.00 | $15.00 | 200K |
| Claude 3.5 Haiku | $0.80 | $4.00 | 200K |
| Claude 3 Opus (legacy) | $15.00 | $75.00 | 200K |
OpenAI:
| Model | Input $/M | Output $/M | Context |
|---|---|---|---|
| GPT-4o | $2.50 | $10.00 | 128K |
| GPT-4o-mini | $0.15 | $0.60 | 128K |
| GPT-4.1 (newer) | varies | varies | 1M |
| o1 (reasoning) | $15.00 | $60.00 | 200K |
| o1-mini | $3.00 | $12.00 | 128K |
The raw rates favour OpenAI on both tiers. GPT-4o-mini at $0.15/$0.60 is cheaper than Claude 3.5 Haiku at $0.80/$4.00, and GPT-4o at $2.50/$10 undercuts Claude 3.7 Sonnet at $3/$15 on both input and output. Any Anthropic cost advantage at the premium tier has to come from somewhere other than the rate card.
The dirty secret: token ratios matter more than rates
For typical coding workloads, the input:output ratio is roughly 3:1: you send a lot of context (existing code, prompts, examples) and the model writes less code back. Even so, output tokens are priced at 4-5× input, so the output rate still accounts for more than half of the total cost.
Worked example: 1M tokens in, 300K out:
- Claude 3.7 Sonnet: 1M × $3 + 0.3M × $15 = $7.50
- GPT-4o: 1M × $2.50 + 0.3M × $10 = $5.50
- Claude Haiku: 1M × $0.80 + 0.3M × $4 = $2.00
- GPT-4o-mini: 1M × $0.15 + 0.3M × $0.60 = $0.33
GPT-4o is cheaper than Sonnet here. GPT-4o-mini is dramatically cheaper than Haiku.
But — and this matters — the cheaper-tier models produce more tokens and lower-quality output. The cost-per-completed-task is not the same as cost-per-token.
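The per-token arithmetic in the worked example can be reproduced with a few lines of Python. This is a minimal sketch: the rates are copied from the tables above, and the function names (`workload_cost`, `output_share`) are mine, not part of any provider SDK.

```python
# Rates in USD per million tokens (input, output), copied from the tables above.
RATES = {
    "Claude 3.7 Sonnet": (3.00, 15.00),
    "GPT-4o": (2.50, 10.00),
    "Claude 3.5 Haiku": (0.80, 4.00),
    "GPT-4o-mini": (0.15, 0.60),
}


def workload_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Return the USD cost of one workload at the listed per-million rates."""
    input_rate, output_rate = RATES[model]
    return (input_tokens * input_rate + output_tokens * output_rate) / 1_000_000


def output_share(model: str, input_tokens: int, output_tokens: int) -> float:
    """Return the fraction of the total cost that comes from output tokens."""
    _, output_rate = RATES[model]
    output_cost = output_tokens * output_rate / 1_000_000
    return output_cost / workload_cost(model, input_tokens, output_tokens)


if __name__ == "__main__":
    # Same shape as the worked example: 1M tokens in, 300K tokens out.
    for model in RATES:
        cost = workload_cost(model, 1_000_000, 300_000)
        share = output_share(model, 1_000_000, 300_000)
        print(f"{model}: ${cost:.2f} ({share:.0%} of cost from output)")
```

Swapping in your own token counts is the quickest way to see how far your workload sits from the 1M-in, 300K-out shape used here. None of this captures retries or quality differences, which is the caveat above.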
Where Anthropic actually wins on cost
Task completion rate on hard tasks. Claude 3.7 Sonnet finishes complex multi-file refactors more often than GPT-4o in my testing. A “$5 task that needs 2 retries on GPT-4o” can be a “$7 task that completes first try on Claude.” The retry math flips the headline rate.
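The retry math is easy to sketch. The snippet below assumes every attempt costs the same and, for the probabilistic version, that attempts succeed independently with a fixed probability. Both are simplifications, and any success rates you plug in are your own estimates, not measurements from this article.

```python
def fixed_attempts_cost(cost_per_attempt: float, retries: int) -> float:
    """Total spend when a task needs a known number of retries after the first attempt."""
    if retries < 0:
        raise ValueError("retries must be non-negative")
    return cost_per_attempt * (1 + retries)


def expected_cost_per_completed_task(cost_per_attempt: float, success_rate: float) -> float:
    """Expected spend until the first success, assuming independent attempts.

    With a fixed success probability p per attempt, the number of attempts
    follows a geometric distribution with mean 1 / p.
    """
    if not 0 < success_rate <= 1:
        raise ValueError("success_rate must be in (0, 1]")
    return cost_per_attempt / success_rate


if __name__ == "__main__":
    # The example from the text: a $5 task with 2 retries vs a $7 first-try task.
    print(fixed_attempts_cost(5.0, retries=2))
    print(fixed_attempts_cost(7.0, retries=0))
```

With the numbers from the paragraph above, the $5 task with two retries costs $15 in total, against $7 for the task that completes first try.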
Prompt caching. Anthropic ships prompt caching that reduces input cost by ~90% for repeated context. For workflows that re-send the same system prompt + tool definitions repeatedly (i.e., every coding session), this matters a lot. OpenAI now offers prompt caching as well; check its current cached-input discount against Anthropic’s before assuming the savings are identical.
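To see what caching does to an input bill, here is a rough sketch. It assumes a flat discount on cached tokens (0.90, mirroring the ~90% figure above) and ignores cache-write surcharges and cache expiry, so treat the result as an upper bound on savings. The function name and parameters are illustrative only.

```python
def input_cost_with_caching(
    input_tokens: int,
    rate_per_million: float,
    cached_fraction: float,
    cached_discount: float = 0.90,
) -> float:
    """USD input cost when part of the prompt is served from cache.

    cached_fraction: share of input tokens that hit the cache (0 to 1).
    cached_discount: fractional price reduction on cached tokens; 0.90 is an
    assumption based on the ~90% figure in the text. Cache-write surcharges
    and cache expiry are ignored in this sketch.
    """
    if not 0 <= cached_fraction <= 1:
        raise ValueError("cached_fraction must be in [0, 1]")
    cached_tokens = input_tokens * cached_fraction
    fresh_tokens = input_tokens - cached_tokens
    cost = fresh_tokens * rate_per_million
    cost += cached_tokens * rate_per_million * (1 - cached_discount)
    return cost / 1_000_000


if __name__ == "__main__":
    # 1M input tokens at Sonnet's $3/M, with 80% of them served from cache.
    print(f"${input_cost_with_caching(1_000_000, 3.00, 0.80):.2f}")
```

Under these assumptions, 1M input tokens at $3/M with 80% cached cost about $0.84 instead of $3.00, roughly 3.6× cheaper. The full 10× only appears as the cached fraction approaches 100%, and output tokens are unaffected either way.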
Output predictability. Claude tends to be more concise than GPT-4o for coding output. Lower output token count = lower total cost.
Where OpenAI actually wins on cost
Bulk / volume workloads. GPT-4o-mini at $0.15/$0.60 is cheaper than anything Anthropic offers. For automation, content processing, batch work — OpenAI wins.
Reasoning-heavy tasks at scale. o1-mini at $3/$12 is priced close to Anthropic’s Sonnet ($3/$15), though o1 itself at $15/$60 is not. Reasoning models often complete tasks Sonnet would need multiple iterations on.
Longer context for cheap. GPT-4.1 has 1M context at competitive pricing. Anthropic’s 200K is workable but smaller.
The community signal
The 358-upvote r/ChatGPTCoding “Value of $200/month AI users” thread reflects the heavy-user reality: serious coding users are spending $100-300/month total across both providers + subscription tiers. The cost question isn’t “which is cheaper per token” — it’s “which combination delivers the value at the right total cost.”
The 118-upvote “Anthropic is lagging on cheap fast models” thread captures the real Anthropic weakness — Haiku is competitive but not cheap-tier-dominant. OpenAI’s mini tier is the best in class for bulk work.
The 148-upvote “PSA for anyone using Cursor — wasting most of your AI requests” thread is the canonical “you’re paying for tokens you don’t need” piece — pointing out that wrapping AI in IDE tools often burns tokens on things that don’t need premium models.
The working multi-provider pattern
What heavy users actually do in 2025:
- Claude 3.7 Sonnet via Anthropic API or Claude Pro for default coding work
- GPT-4o-mini via OpenAI API for bulk / cheap routing (the cheapest credible “good enough” model)
- DeepSeek V3 via DeepSeek API for cost-conscious bulk where China-hosting is acceptable
- OpenAI o-series for hard reasoning when speed isn’t the goal
- Gemini 2.5 Pro for long-context analysis
Total monthly spend for a heavy user: $50-200/month across providers. The pattern wins because each model handles what it’s best at — no single-provider strategy lands as well.
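In code, the routing pattern above can be as small as a lookup table. This is a sketch only: the task categories are my own labels, and the model strings are display names from the list above, not real API model identifiers.

```python
from enum import Enum


class TaskKind(Enum):
    """Hypothetical task categories; adapt these to your own workloads."""

    DEFAULT_CODING = "default coding"
    BULK = "bulk"
    HARD_REASONING = "hard reasoning"
    LONG_CONTEXT = "long context"


# Display names taken from the list above; these are NOT API model identifiers.
ROUTES = {
    TaskKind.DEFAULT_CODING: "Claude 3.7 Sonnet",
    TaskKind.BULK: "GPT-4o-mini",
    TaskKind.HARD_REASONING: "OpenAI o-series",
    TaskKind.LONG_CONTEXT: "Gemini 2.5 Pro",
}


def pick_model(kind: TaskKind, china_hosting_ok: bool = False) -> str:
    """Return the model to route a task to, following the pattern above."""
    # DeepSeek V3 only replaces the bulk route when China hosting is acceptable.
    if kind is TaskKind.BULK and china_hosting_ok:
        return "DeepSeek V3"
    return ROUTES[kind]


if __name__ == "__main__":
    print(pick_model(TaskKind.BULK))
    print(pick_model(TaskKind.BULK, china_hosting_ok=True))
```

A static table like this is deliberately dumb. The hard part in practice is classifying a request into the right category, which this sketch leaves to the caller.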
Picking by your specific workload
Pick
- Default coding driver → Claude 3.7 Sonnet (best quality-per-dollar at the premium tier)
- Bulk / automation → GPT-4o-mini (cheapest credible quality)
- Hard reasoning → OpenAI o3-mini (cheaper o-series option)
- Long context → Gemini 2.5 Pro (1M context at a low price)
- Cost-optimised bulk → DeepSeek V3 (cheapest, China-hosted)
- Prompt-caching workflows → Anthropic (mature caching at the premium tier)
Avoid
- Don’t lock yourself into a single provider if you can avoid it; multi-model routing is the better pattern
- Don’t over-route at low volume — switching providers has overhead
- Don’t use o-series for routine work — slow and expensive for simple tasks
- Don’t use Haiku for cheap-bulk — GPT-4o-mini is cheaper at similar quality for that tier
- Don’t use Sonnet for bulk — overpriced for routine work
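- Don’t ignore prompt caching: for tooling that resends context, it can cut the cost of the cached input by up to 10× (the ~90% discount applies only to cached tokens, not to fresh input or output)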
How to actually optimise
For an engineer running $50-100/month on AI APIs:
- Audit your token usage — most providers offer breakdowns. Find the workload chunks that dominate cost.
- Route those to cheaper models — bulk to GPT-4o-mini, reasoning to o3-mini, default to Sonnet
- Enable prompt caching where supported
- Use the subscription paths (Claude Pro, ChatGPT Pro) for personal use — often cheaper than equivalent API spend for individual users
Savings of around 30% are achievable with about two hours of routing setup. Worth it.
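The audit and re-routing steps above can be roughed out offline before touching any production routing. The sketch below uses made-up usage records and rates from the tables earlier in this piece. It also assumes token counts stay the same after switching models, which is optimistic given that cheaper models can be wordier.

```python
from collections import defaultdict

# USD per million tokens (input, output), from the rate tables in this article.
RATES = {
    "Claude 3.7 Sonnet": (3.00, 15.00),
    "GPT-4o-mini": (0.15, 0.60),
}

# Hypothetical usage export: (workload label, model, input tokens, output tokens).
EXAMPLE_USAGE = [
    ("log summarisation", "Claude 3.7 Sonnet", 2_000_000, 200_000),
    ("refactoring", "Claude 3.7 Sonnet", 1_000_000, 300_000),
]


def record_cost(model, input_tokens, output_tokens, rates):
    """USD cost of a single usage record."""
    input_rate, output_rate = rates[model]
    return (input_tokens * input_rate + output_tokens * output_rate) / 1_000_000


def cost_by_workload(records, rates):
    """Total USD cost per workload label, most expensive first."""
    totals = defaultdict(float)
    for workload, model, tokens_in, tokens_out in records:
        totals[workload] += record_cost(model, tokens_in, tokens_out, rates)
    return dict(sorted(totals.items(), key=lambda item: item[1], reverse=True))


def savings_if_rerouted(records, rates, workload, new_model):
    """USD saved by moving one workload to new_model, assuming unchanged token counts."""
    saved = 0.0
    for label, model, tokens_in, tokens_out in records:
        if label == workload:
            saved += record_cost(model, tokens_in, tokens_out, rates)
            saved -= record_cost(new_model, tokens_in, tokens_out, rates)
    return saved


if __name__ == "__main__":
    print(cost_by_workload(EXAMPLE_USAGE, RATES))
    print(savings_if_rerouted(EXAMPLE_USAGE, RATES, "log summarisation", "GPT-4o-mini"))
```

Replace `EXAMPLE_USAGE` with whatever breakdown your provider dashboards export. The point is to find the one or two workloads that dominate the bill before deciding what to move.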
For the broader model landscape, see our Claude vs GPT vs Gemini comparison. For the cheap-tier deep dive, DeepSeek V3 review.
Sources
Every reference behind this piece. If we make a claim, it's because at least one of these said so — or we lived it ourselves.
- Firsthand: Modelled API spend across multiple workloads on both providers
- Docs: Anthropic API pricing (Anthropic)
- Docs: OpenAI API pricing (OpenAI)
- Blog: r/ChatGPTCoding, value of $200/month AI users (358 ups)
- Blog: r/ChatGPTCoding, Anthropic lagging on cheap fast models (118 ups)
- Blog: r/ChatGPTCoding, Cursor users wasting AI requests (148 ups)
- YouTube: Independent API pricing breakdowns (various)