Pillar
LLM Platforms
API providers, model comparisons, and pricing analysis across the major foundation-model platforms.
DeepSeek V4: the 1M-context + 75%-cheaper launch that made everyone else look slow
V4 ships native 1M context, two new attention mechanisms, and pricing 10-100x cheaper than the closed frontier. The technical report is a primer on what compounded efficiency actually looks like.
Claude Opus 4.8 launch: the dynamic-workflows update is the real story, the model is the bonus
Opus 4.8 dropped May 28 with SWE-bench Pro at 69.2% and honesty improvements. The Claude Code dynamic-workflows feature that shipped alongside is the change that actually moves daily use.
Google I/O 2026 and the Antigravity 3.0 follow-up: the agentic Gemini era is the actual product
I/O 2026 shipped Gemini Omni, Flash 3.5, the TPU split, and a redesigned Antigravity IDE focused on agent management. The pivot from chatbot to agent runtime is now Google's primary thesis.
M5 Max + Gemma 4 — IndyDevDan's "local stack kills providers" thesis and the dissent
IndyDevDan ran his April stack on an M5 Max with Gemma 4 via MLX and claims it kills hosted providers. The thesis is partly right and partly very wrong.
DeepSeek's Engram architecture — the March 2026 persistent-memory breakthrough
bycloud broke down Engram, DeepSeek's third major architectural contribution in six months, on March 24. The DualPath paper landed February 26; r/LocalLLaMA validated it within weeks.
Recursive Language Models — the "death of RAG" framing and what it actually means
A new paper proposes Recursive Language Models where an LLM calls itself to traverse context. The "RAG is dead" headline overshoots; the underlying pattern is genuinely interesting.
DeepSeek "adds parameters where there were none" — the February 2026 conditional-activation move
bycloud's Feb 17 video unpacked DeepSeek's next architectural innovation: virtual parameters via conditional activation. With V4 looming and GLM-5 already shipped, the open-frontier race compresses.
Opus 4.6 + Sonnet 4.6: Anthropic's February pair, and what "Fennec" actually shipped as
Anthropic's February pair: Opus 4.6 (Feb 5) and Sonnet 4.6 (Feb 17). The leaked "Fennec" codename shipped as Sonnet 4.6, and Opus 4.6 drew a post-launch safety-tuning controversy. Two weeks of routing between them.
The LLM billion-dollar problem — bycloud frames the AI economics tension
bycloud's Feb 10 video maps the structural cost problem facing frontier LLM development. r/MachineLearning's "elephant in the room" thread and the AI Futures forecast capture how the field is responding.
Meituan LongCat and the Chinese open-source AI trifecta: the January 2026 lab landscape
bycloud shipped two January 2026 videos surveying the Chinese open-source AI labs. Meituan's LongCat is the surprise; the broader pattern is the more important story.
The RL irony in LLMs: why LoRA fine-tuning is the practical 2026 RL story
bycloud published a January 21 video on the "RL irony" — RL is noisy and hurts generalization, yet it remains essential. LoRA-based RL emerges as the practical compromise.
OpenAI in "CODE RED" after Gemini 3: the December competitive reset
Theo posted a December 4 video framing OpenAI's post-Gemini-3 posture as "CODE RED." Sam Altman's public statements that week confirm something shifted. What the reset actually means.
DeepSeek V3.2 and Sparse Attention: how a small lab keeps undercutting frontier model pricing
DeepSeek V3.2 shipped early December with a new sparse-attention mechanism (DSA) that explains the absurd pricing. The technical story and why it matters for engineers.
Claude Opus 4.5 launch: Anthropic punches back at Gemini 3 with a model for engineers
Opus 4.5 dropped November 24 priced at $5/$25 per million tokens. IndyDevDan called it "the model for engineers." Honest measurement against Gemini 3 Pro after two weeks of daily use.
Gemini 3 Pro launch: dominates benchmarks, but the model is not the moat anymore
Google shipped Gemini 3 Pro in November with benchmark numbers that should have been a knockout. The shipped reality: the model wins, but the agentic stack still belongs to Anthropic.
Claude Haiku 4.5 and the cheap-tier coding arms race: November 2025
Anthropic shipped Haiku 4.5 in mid-October matching Sonnet 4 at one-third the cost. By mid-November the cheap-tier race had three players. The honest measurement.
Claude Sonnet 4.5 a month in: was "best coding model in the world" true?
Anthropic launched Sonnet 4.5 on September 29 with the bold claim of being the best coding model in the world. After a month of daily use, here is the honest measurement against that claim.
GPT-5 two months in: from launch-day backlash to coding-by-default
GPT-5 shipped on August 7 to a wall of skepticism — "all this hype just to match Opus" went to 979 upvotes the same day. Two months later the narrative has flipped. Here is what actually happened.
Anthropic vs OpenAI API pricing: the actual math at typical coding workloads
Both API providers iterated pricing through 2025 and Claude Code added weekly limits. The honest "which is cheaper" answer depends entirely on workload shape. Here is the working math.
Qwen3-Coder vs DeepSeek V3-Coder — the Chinese OSS frontier coding shootout
Qwen3-Coder dropped Jul 22; DeepSeek V3 held the prior crown. r/LocalLLaMA launch threads (1928 + 1693 upvotes) frame the shootout. After running both extensively, here is the honest head-to-head.
Grok 4 for coding: separating the claims from the reality
Elon Musk claimed Grok 4 beats Cursor. Theo, Fireship, and Matthew Berman piled in within 24 hours; r/singularity called it disappointing within four days. Here is the working read after testing.
DeepSeek V3 for coding: the cheap-and-good model that changed the cost equation
DeepSeek V3 lands frontier-adjacent coding quality at roughly one-tenth the API price of Claude or GPT-4o. After six weeks of daily use, here is where it actually fits.
Claude vs GPT vs Gemini for coding in 2025: the API-tier shootout
Three frontier model families compete for your coding token spend. After six months running them across real workloads, here is which API actually deserves which job.
Claude 3.7 Sonnet on real coding tasks — benchmarks vs daily-use reality
Anthropic's Claude 3.7 posted strong SWE-bench numbers in February. AI Jason's "reduced 90% errors" workflow, IndyDevDan's starter pack, and the r/ClaudeAI "85% problem" thread frame the daily-use picture.
Gemini 2.5 Pro — Google's "Thinking Family" reboot and the "best AI for coding" claim
Google shipped Gemini 2.5 Pro on March 25, 2025 with native reasoning. The community read landed within a day: "Damn Google really cooked this time." Then the caveats showed up.
Manus AI — viral Chinese agent that turned out to be Claude Sonnet + 29 tools
In early March 2025 Manus AI went viral as the "next DeepSeek." Within days the local-LLM community reverse-engineered it: Claude Sonnet + a tool harness. The hype was the product.
Qwen QwQ-32B — the best local reasoning model joins the open frontier (March 2025)
Qwen released QwQ-32B in early March 2025 — a 32B reasoning model that competes with DeepSeek R1 at a fraction of the parameters. Local LLM coders had a new daily driver.