Agent sandboxes are the new parallel-coding primitive — E2B, Modal, Daytona, and why November 2025 matters

Running multiple AI agents in parallel started as a hack with git worktrees. November 2025 was when ready-made agent sandbox products became the obvious primitive instead.

Charles Lin · November 19, 2025

In mid-October a small but heavily upvoted r/ChatGPTCoding post titled “PSA: Do NOT use YOLO mode in Codex without isolating it!” captured a pattern engineers had been quietly running into. Codex CLI in YOLO mode (or Claude Code with --dangerously-skip-permissions) was deleting files, changing system configs, and generally behaving like an unsupervised intern with sudo. The OP’s recommendation: “Create a dev container for your project. Then codex will be isolated properly and can work autonomously.”

A month later, IndyDevDan published “E2B Agent Sandboxes: The Space to Place your Claude Agents” — and the framing shifted. The container isn’t just a safety primitive. It’s the scaling primitive. If you can run an agent in an isolated container, you can run nine of them in parallel. If you can run nine, you can deploy a “best-of-N” strategy where the same problem is solved multiple ways and you take the best result. That’s a different category of capability than “I keep my single agent from breaking my machine.”

November 2025 was the month this transition happened in public. This piece tracks what changed, why E2B / Modal / Daytona suddenly mattered, and how the pattern fits into a 2026 daily-driver workflow.

What the E2B video actually argues

Dan’s framing for the E2B Agent Sandboxes video, in his own words:

“Reddit ROASTED my landing page — and they were absolutely right. So I did what any self-respecting AI engineer would do: I deployed NINE parallel agent sandboxes to fix it. This is the FUTURE of agentic coding and it’s happening RIGHT NOW.”

The mechanic: he took a single problem (improve a landing page), spun up nine isolated E2B sandboxes, each with its own Claude Code agent + repository checkout, and let them all solve the problem in parallel with different approaches. Then he reviewed the nine results and merged the best one. The total wall-clock time was less than waiting for one agent to iterate nine times serially. The cost was about 9x a single agent’s tokens, but the throughput was dramatically higher.

This is the “best-of-N” pattern that the video walks through in detail. You’re not asking one agent to do better. You’re asking many agents to each do their thing, then you cherry-pick. Each sandbox is genuinely isolated — file system, network, install state — so the agents can’t trip on each other. The orchestration is the engineering work; the per-agent compute is commodity.
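The fan-out and cherry-pick steps can be sketched in a few lines of Python. This is a minimal sketch, not the tooling from the video: `run_agent_in_sandbox` is a hypothetical stand-in for "create a sandbox, check out the repo, run one agent", and the review scores are invented for the example. In the video the review step was done by hand.

```python
import asyncio
from dataclasses import dataclass


@dataclass
class Candidate:
    sandbox_id: int
    approach: str
    output: str


async def run_agent_in_sandbox(sandbox_id: int, approach: str) -> Candidate:
    """Hypothetical stand-in for: create sandbox, check out repo, run agent.

    A real version would call a sandbox provider's SDK and an agent CLI here.
    """
    await asyncio.sleep(0.01)  # simulates the agent working
    return Candidate(sandbox_id, approach, output=f"result of {approach}")


async def fan_out(approaches: list[str]) -> list[Candidate]:
    # Start one isolated run per approach; gather waits for all of them.
    runs = [run_agent_in_sandbox(i, a) for i, a in enumerate(approaches)]
    return list(await asyncio.gather(*runs))


def pick_best(candidates: list[Candidate], score) -> Candidate:
    # The cherry-pick step. `score` is whatever review you trust:
    # a human rating, a test suite, or a judge model.
    return max(candidates, key=score)


approaches = [f"approach-{i}" for i in range(9)]
candidates = asyncio.run(fan_out(approaches))

# Toy review scores, invented for this example only.
toy_scores = {i: (i * 7) % 9 for i in range(9)}
best = pick_best(candidates, score=lambda c: toy_scores[c.sandbox_id])
print(f"{len(candidates)} candidates, picked sandbox {best.sandbox_id}")
```

Keeping the scoring function separate from the fan-out mirrors the point above: the per-agent runs are interchangeable, and the judgment about which result to merge is where the engineering effort goes.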

In Dan’s follow-up Gemini 3 Pro video on November 24, he scaled this further — 15 agent sandboxes split across Gemini 3 Pro, Claude Sonnet 4.5, and Codex 5.1 Max simultaneously. The cost goes up linearly; the wall-clock time stays roughly constant; the quality ceiling rises because you’re sampling more of the model distribution.
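To make the "linear cost, constant wall-clock" claim concrete, here is a small sketch. It assumes an even round-robin split of the 15 sandboxes across the three models and a uniform 20-minute run time; neither detail comes from the video, and both are placeholders.

```python
from collections import Counter
from itertools import cycle, islice

MODELS = ["Gemini 3 Pro", "Claude Sonnet 4.5", "Codex 5.1 Max"]


def assign_models(n_sandboxes: int, models: list[str]) -> list[str]:
    # Round-robin: sandbox i gets models[i % len(models)].
    return list(islice(cycle(models), n_sandboxes))


def summarize(run_minutes: list[float]) -> tuple[float, float]:
    # Parallel runs: you wait for the slowest one, but pay for all of them.
    wall_clock = max(run_minutes)
    billed = sum(run_minutes)
    return wall_clock, billed


assignment = assign_models(15, MODELS)
per_model = Counter(assignment)

# Assumed durations: every run takes 20 minutes (illustrative only).
wall_clock, billed = summarize([20.0] * len(assignment))
print(per_model)
print(f"wall clock: {wall_clock} min, billed: {billed} agent-minutes")
```

Adding a sandbox adds its full run time to the bill but only affects wall-clock time if it turns out to be the slowest run.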

The three sandbox products competing in November

By mid-November the agent sandbox space had three credible products:

E2B — Cloud sandboxes optimized for code execution and agent workflows. Pre-built environments for common stacks. Direct integration with major LLM provider APIs. Pricing is roughly $0.10-0.30 per agent-hour depending on instance size. Used by Dan as the example throughout his videos. Open source on the runtime side, hosted service for the orchestration.

Modal — Originally a Python-first serverless compute platform, Modal has been positioning itself aggressively as the agent-sandbox alternative. Their safe code execution example is the canonical reference. Pricing is billed per second of compute; it is competitive with E2B for short-lived sandboxes and cheaper for sustained ones.

Daytona — A newer entrant focused on dev-environment-as-a-service. Their value prop is the “declarative builder” — describe the environment you want in YAML and Daytona materializes it. Less mature than E2B/Modal for pure-agent workflows but stronger for “spin up a full dev environment with services” use cases.

Plus a fourth option that’s worth flagging: roll your own with Docker + a small VPS. For users on tight budgets, a Hetzner CCX22 running 5-10 Docker containers each with a Claude Code or Codex CLI session is a viable budget version. Less polished than E2B; cheaper if you’re running constantly. The October “cancelled Cursor, built multi-agent swarms with Claude Code” Reddit thread describes essentially this pattern with home-rolled tooling.
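A rough sketch of what the roll-your-own version might look like: one container per agent, each with its own checkout mounted at /workspace. The image name `agent-sandbox:latest`, the `/srv/agents` directory layout, and the resource limits are all hypothetical. The script only builds and prints the `docker run` commands; it does not execute them.

```python
import shlex


def docker_run_command(
    index: int,
    base_dir: str,
    image: str = "agent-sandbox:latest",  # hypothetical image with the agent CLI installed
    agent_cmd: tuple[str, ...] = ("sleep", "infinity"),  # placeholder for the agent command
) -> list[str]:
    return [
        "docker", "run", "--rm", "-d",
        "--name", f"agent-{index}",
        "--memory", "2g",  # assumed limits; tune to the VPS
        "--cpus", "1",
        # Each container gets its own checkout so agents cannot touch each other's files.
        "-v", f"{base_dir}/checkout-{index}:/workspace",
        "-w", "/workspace",
        image,
        *agent_cmd,
    ]


commands = [docker_run_command(i, "/srv/agents") for i in range(5)]
for cmd in commands:
    print(shlex.join(cmd))
```

Network access is left at Docker's default here because the agent still has to reach its model API. That means this setup isolates the file system and install state, not outbound traffic.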

The pattern emerging across creator content

IndyDevDan’s four November videos taken together tell a coherent story:

November 3: “The One Agent to RULE them ALL” — orchestrator pattern, single coordinator dispatches sub-agents
November 10: “Why are top engineers DITCHING MCP Servers?” — context economy matters; lean on CLI tools and Skills, not MCP-heavy stacks
November 17: “E2B Agent Sandboxes” — sandboxes are how you scale beyond your local machine
November 24: “I gave Gemini 3 Pro its own computer” — sandboxes + model diversity = quality ceiling rises

The arc is consistent: AI coding is moving from “one engineer + one agent + one chat” to “one engineer + many parallel agents + sandboxes for isolation.” Dan is the loudest voice making this argument, but the pattern shows up across other creators too: AI Jason on Gemini 3-specific use cases, Cole Medin on Claude Code workflows, and the broader r/ChatGPTCoding community on multi-agent swarms.

The pattern matches what I wrote about earlier in October — parallel agents as an emerging workflow. November is when the tooling caught up. Git worktrees were a hack. E2B sandboxes are a product.

What Reddit users are actually reporting

The Reddit dataset on agent sandboxes in late October and November is messier than the YouTube creator narrative. The friction points engineers run into:

Safety isn’t optional. The PSA thread captures the most common entry path — engineers granting YOLO mode without isolation, then learning the hard way:

“Or --dangerously-skip-permissions my homie Claude doesn’t fuck my os up. Codex cooked my pc with out even bypassing safety. I had codex work on a problem on my Linux install last night. Just trying to make hibernate work. So codex wants to change boot config to do it.” — 10 upvotes

“People make fun of folks who have Claude Code rm -rf stuff because someone just gave it unrestricted Bash access. But the reality is that if you even ‘just’ give an LLM the option to execute python without your approval, you are already fucked.” — 3 upvotes

That’s the gateway problem. Engineers want autonomy from their agents; they don’t want to lose their machine to a bad command. Sandboxes solve this cleanly. The question is whether they solve it cheaply enough to be the default.

Cost is real. Running 9 sandboxes in parallel for a 20-minute task is roughly the same compute as running one agent for 3 hours. If you have a fixed monthly budget (most $200 Pro / Max subscribers do), parallel sandboxes will burn through it dramatically faster than serial usage. The “best-of-N” pattern is genuinely powerful but is not free.
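The arithmetic behind that comparison, as a quick sketch. The hourly rates are the rough E2B range quoted earlier in this piece, and they cover sandbox compute only. Model tokens are billed separately and are usually the larger share of the bill.

```python
def agent_hours(n_sandboxes: int, minutes_per_task: float) -> float:
    # Total billed time is the sum over all sandboxes, not the wall-clock time.
    return n_sandboxes * minutes_per_task / 60


hours = agent_hours(9, 20)  # 9 sandboxes x 20 minutes = 3.0 agent-hours

# Rough per-agent-hour range from the E2B description above (USD).
rate_low, rate_high = 0.10, 0.30
cost_low, cost_high = hours * rate_low, hours * rate_high

print(f"{hours} agent-hours")
print(f"sandbox compute: ${cost_low:.2f} to ${cost_high:.2f} (excludes model tokens)")
```

The sandbox line item is small. The budget pressure comes from the roughly 9x token usage, which is what a subscription's rate limits actually meter.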

Orchestration is unsolved. Dan’s nine-agent landing page demo worked because the task was naturally parallel (each agent owns its own variant). For tasks where the agents need to coordinate, the patterns are still rough. Shared state, dependency ordering, conflict resolution — these are open problems with experimental solutions, not solved primitives.
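Dependency ordering is the one piece of this with a well-known partial answer: a topological sort tells you which tasks can run side by side and which have to wait. The sketch below uses Python's standard-library `graphlib` on a made-up task graph. It says nothing about shared state or conflict resolution, which remain the hard parts.

```python
from graphlib import TopologicalSorter

# Hypothetical task graph: each task maps to the set of tasks it depends on.
tasks = {
    "schema": set(),
    "api": {"schema"},
    "frontend": {"api"},
    "docs": {"api"},
    "landing-page": set(),
}


def parallel_waves(graph: dict[str, set[str]]) -> list[list[str]]:
    """Group tasks into waves; every task in a wave can run in parallel."""
    sorter = TopologicalSorter(graph)
    sorter.prepare()  # raises graphlib.CycleError if the graph has a cycle
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # tasks whose dependencies are all done
        waves.append(ready)
        sorter.done(*ready)  # mark the whole wave finished, unlocking the next one
    return waves


waves = parallel_waves(tasks)
for number, wave in enumerate(waves, start=1):
    print(f"wave {number}: {wave}")
```

In this toy graph the landing page is independent of everything else, which is exactly why it parallelizes cleanly, while the frontend cannot start until the API exists.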

The reconciliation: where this is heading

YouTube creators are presenting agent sandboxes as the new paradigm. Reddit users are reporting them as a useful-but-expensive primitive with real friction. Both are correct.

For an engineer in November 2025, the working stance:

Use sandboxes for autonomous overnight or background work. Anything where you wouldn’t want to babysit and where the cost of “agent breaks my machine” is non-trivial. E2B or Modal are the right tools.
Use parallel sandboxes for the “best-of-N” pattern when you have a self-contained task. Landing page variants, README generation, alternative implementations of a function — these benefit dramatically from sampling. The 9x cost is worth it when the quality ceiling matters.
Do not use parallel sandboxes for tightly coupled tasks. The coordination cost will exceed the parallelism win.
Be honest about the budget. Parallel sandbox patterns can burn $200/month subscription budgets in days. If you’re rate-limit-bound today, you’ll be rate-limit-bound worse with this pattern.
For one-off interactive work, stay on local Claude Code or Codex CLI. The latency and friction of round-tripping to a remote sandbox isn’t worth it for the “fix this bug while I watch” workflow.

The bigger arc: agent sandboxes are now a first-class primitive in the AI coding stack. They sit alongside “the model,” “the agent loop,” “the IDE/CLI,” and “the orchestration layer” as a distinct architectural concern. Six months ago they were a hack engineers improvised with Docker; in November they became a product category with real competition (E2B, Modal, Daytona) and real venture investment.

The 2026 prediction worth flagging

The thing I’m watching for through Q1 2026:

First-party sandboxing from the model labs. Anthropic launching their own equivalent of E2B specifically for Claude agents. OpenAI doing the same for Codex CLI. The current “third-party sandbox” market is large enough that the labs will want to own that surface.
Better orchestration primitives for coordinated agents. Right now “best-of-N on isolated tasks” works; “5 agents coordinating on a shared codebase” is unsolved. Whichever lab or startup ships a usable primitive for the latter wins a meaningful piece of the workflow.
The cost of parallel-sandbox patterns coming down. Either through smaller-but-good-enough models (Haiku 4.5 is the start of this), through cheaper compute, or through smarter sampling strategies that don’t require 9 full agent runs.

For November 2025: agent sandboxes are the unlock for parallel AI coding workflows beyond the local-machine ceiling. Dan’s videos are the cleanest articulation of the pattern. The Reddit experience reports surface the cost and orchestration friction. The reconciliation — that this is a real new primitive with real tradeoffs — is more useful than either narrative alone.

If you’re not using agent sandboxes yet, you probably will be in Q1 2026. The runtime products are ready. The cost economics are within reach. The workflows are documented enough to copy. The only remaining barrier is the muscle memory of “I run agents on my machine” — and that’s the same barrier engineers had before they ran Docker on their machine, and before they ran services on cloud VPS instead of physical servers. It always falls. It’s falling now.

Sources

Every reference behind this piece. If we make a claim, it's because at least one of these said so — or we lived it ourselves.

YouTube: IndyDevDan, "E2B Agent Sandboxes: The Space to Place your Claude Agents"
YouTube: IndyDevDan, "I gave Gemini 3 Pro its own computer" (sandboxes in practice)
YouTube: IndyDevDan, "I finally CRACKED Claude Agent Skills" (skills + sandboxes)
Docs: E2B, open-source code interpreter and agent sandboxes
Reddit: r/ChatGPTCoding, "PSA: Do NOT use YOLO mode in Codex without isolating it!" (56 ups, late October)
Reddit: r/ChatGPTCoding, "I cancelled my Cursor subscription. I built multi-agent swarms with Claude Code instead" (31 ups, July)
Firsthand: Three months running parallel agents on E2B sandboxes for personal projects