The Sandbox Is a Harness

James Phoenix

When code becomes the interface between users and systems, the sandbox stops being a security primitive. It becomes a harness for intent.

Author: James Phoenix | Date: April 2026

Inspired by Sunil Pai’s After WIMP, which argues that capability-scoped sandboxes are the missing primitive for the next era of human-computer interaction.

The Programmer-User Line Is Dissolving

Sunil Pai makes a sharp observation in his recent essay. For decades, there were two classes of people who used computers. Programmers could issue procedures. Everyone else chose from menus. The WIMP paradigm (Windows, Icons, Menus, Pointer) was designed for the second group. It gave non-technical users a way to express intent without writing code.

LLMs erase that line. A user who says “show me all orders over $500 from the last week, grouped by region” is issuing a procedure. The model translates natural language into executable code. The user does not need to know the language. They just need to know what they want.
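To make that concrete, here is a minimal sketch of the kind of procedure a model might emit for that request. It is illustrative only: the order records, the field names (region, total, placed_at), and the function name are assumptions, not part of any real system.

```python
from collections import defaultdict
from datetime import datetime, timedelta

# Hypothetical order records; the shape is assumed for illustration.
ORDERS = [
    {"id": 1, "region": "EU", "total": 720.0, "placed_at": datetime(2026, 4, 6)},
    {"id": 2, "region": "EU", "total": 120.0, "placed_at": datetime(2026, 4, 7)},
    {"id": 3, "region": "US", "total": 980.0, "placed_at": datetime(2026, 4, 3)},
    {"id": 4, "region": "US", "total": 1500.0, "placed_at": datetime(2026, 3, 20)},
]


def large_recent_orders_by_region(orders, now, min_total=500.0, days=7):
    """Return {region: [order ids]} for orders over min_total in the last `days` days."""
    cutoff = now - timedelta(days=days)
    grouped = defaultdict(list)
    for order in orders:
        if order["total"] > min_total and order["placed_at"] >= cutoff:
            grouped[order["region"]].append(order["id"])
    return dict(grouped)


result = large_recent_orders_by_region(ORDERS, now=datetime(2026, 4, 8))
print(result)
```

The user never sees this code. They only see the grouped result.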

This is not hypothetical. Cloudflare’s “code mode” showed that models are often better at writing code to interact with a system than they are at navigating bespoke tool-calling interfaces. The practical result was a 99.9% token reduction in one implementation. But the deeper result is what matters: the model inhabits the system’s state machine rather than generating new applications from scratch.

Pai’s Missing Primitive

Pai identifies the critical gap: where does this user-generated code run safely?

His answer is capability-scoped sandboxes. Not general-purpose machines with security layered on top, but isolates that start with zero authority and explicitly grant capabilities. You don’t lock down an open system. You start locked and open specific doors.
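The "start locked, open specific doors" idea can be sketched in a few lines. This is a toy model of the authority pattern only, not Pai's implementation and not real isolation: an in-process Python object is not a security boundary, and the Sandbox class, CapabilityError, and the capability names are all hypothetical.

```python
class CapabilityError(PermissionError):
    """Raised when code asks for a capability the sandbox never granted."""


class Sandbox:
    def __init__(self):
        # Zero authority by default: nothing is callable until it is granted.
        self._capabilities = {}

    def grant(self, name, fn):
        """Open one specific door by registering a named capability."""
        self._capabilities[name] = fn

    def call(self, name, *args, **kwargs):
        """Invoke a capability, refusing anything that was not granted."""
        if name not in self._capabilities:
            raise CapabilityError(f"capability not granted: {name}")
        return self._capabilities[name](*args, **kwargs)


sandbox = Sandbox()
sandbox.grant("orders.list", lambda: ["A-1", "A-2"])

print(sandbox.call("orders.list"))
try:
    sandbox.call("orders.cancel", "A-1")
except CapabilityError as err:
    print(err)
```

The default answer is "no". Every "yes" is something someone chose to add.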

This is correct. But I think there is a deeper pattern underneath it, one that connects directly to how we already think about agent-driven development.

The sandbox is a harness.

Harnesses Are Not Just for Developers

In The Harness Is Cheaper Now, I argued that the harness (specs, tests, types, linting, observability) has become cheap enough to build first. The economics flipped because LLMs made implementation cheap but debugging expensive. The harness catches errors before they compound.

That argument was about developer workflows: agents writing code inside a constrained environment, guided by structural rules and validated by tests.

Pai’s sandbox is the same pattern, extended to end users.

Developer Harness	User Sandbox
ESLint rules constrain agent output	Capability scopes constrain user-generated code
Test suites validate correctness	Runtime checks validate safety
Type systems prevent structural errors	Permission boundaries prevent unauthorized access
Specs define what “done” looks like	User intent defines what “useful” looks like

The shape is identical. Constraints first, then execution. The difference is who is expressing the intent: a developer writing a spec, or a user speaking natural language.

Capability Scoping Is a Control Plane

Pai’s capability-scoped isolates follow the same principle as owning your control plane. The capabilities you grant to a sandbox define what code can do. Add a capability, and you open a new category of actions. Remove one, and you close it. The set of capabilities is not a security config. It is a product decision.

Consider Pai’s e-commerce example. A user says “cancel my last three orders and reorder just the organic items.” In a WIMP interface, this requires navigating several screens, clicking through confirmation dialogs, and manually filtering a list. In a sandbox model, the LLM writes code that calls orders.cancel() and orders.create() with the right filters.

But only if the sandbox grants those capabilities. The product team decides which operations are available, what data is accessible, what rate limits apply. This is the control plane. The user expresses intent. The model writes code. The sandbox constrains execution. The control plane defines the boundaries.
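Here is a rough sketch of that split between control plane and generated code. Everything in it is assumed for illustration: the POLICY table, the ScopedOrders wrapper, the list_recent helper, and the sample data are hypothetical, and the per-request call limits stand in for whatever rate limiting a real product would use.

```python
class PolicyViolation(Exception):
    """Raised when generated code exceeds what the control plane allows."""


# Hypothetical control plane: operation name -> max calls per request.
POLICY = {"orders.list_recent": 1, "orders.cancel": 3, "orders.create": 1}

# Hypothetical order history, oldest first.
STORE = [
    {"id": "A-1", "items": [{"sku": "oats", "organic": True}]},
    {"id": "A-2", "items": [{"sku": "soda", "organic": False}]},
    {"id": "A-3", "items": [{"sku": "kale", "organic": True},
                            {"sku": "chips", "organic": False}]},
    {"id": "A-4", "items": [{"sku": "rice", "organic": True}]},
]


class ScopedOrders:
    """The only object the generated code receives."""

    def __init__(self, policy, store):
        self._policy, self._store, self._used = policy, store, {}
        self.log = []  # records side effects instead of performing them

    def _check(self, op):
        limit = self._policy.get(op, 0)  # unlisted operations get no budget
        used = self._used.get(op, 0)
        if used >= limit:
            raise PolicyViolation(f"{op}: limit of {limit} reached")
        self._used[op] = used + 1

    def list_recent(self, n):
        self._check("orders.list_recent")
        return self._store[-n:]

    def cancel(self, order_id):
        self._check("orders.cancel")
        self.log.append(("cancel", order_id))

    def create(self, skus):
        self._check("orders.create")
        self.log.append(("create", tuple(skus)))


def generated_program(orders):
    """What a model might write for the user's request."""
    recent = orders.list_recent(3)
    for order in recent:
        orders.cancel(order["id"])
    organic = [i["sku"] for o in recent for i in o["items"] if i["organic"]]
    orders.create(organic)


orders = ScopedOrders(POLICY, STORE)
generated_program(orders)
print(orders.log)
```

Changing what users can do means editing POLICY, not shipping a new screen. A fourth cancel in the same request would raise PolicyViolation regardless of what the model wrote.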

This is ESLint Rules Are Programs applied to user-facing systems. The constraints are the real product.

After WIMP Is Not After Constraints

Pai makes an important distinction: after WIMP is not after UI. Some interfaces will remain hand-authored. Others will be generated. The shift is that the behaviour of the system no longer has to be fully predetermined before the user arrives.

I’d reframe this slightly. After WIMP is not after constraints. It is after predetermined paths.

WIMP interfaces encode a finite set of paths through a system. Click this, then this, then this. Every path was designed by a product team. If your intent doesn’t map to an existing path, you’re stuck.

Sandbox-based interfaces encode constraints, not paths. The paths are generated at runtime by the model. But the constraints (what data is accessible, what operations are allowed, what invariants must hold) are designed by the product team. The user gets infinite paths within finite boundaries.
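One way to picture "constraints, not paths" is a runner that accepts any generated program but commits its effects only if the invariants still hold afterwards. This is a sketch under assumed names: run_with_invariants, the state shape, and the two invariants are hypothetical and chosen only to show the idea.

```python
import copy

# Hypothetical invariants designed by the product team.
INVARIANTS = {
    "balance_non_negative": lambda s: s["balance"] >= 0,
    "at_most_five_open_orders": lambda s: len(s["open_orders"]) <= 5,
}


def run_with_invariants(state, program, invariants):
    """Run program on a copy of state; commit only if every invariant holds.

    Returns (new_state, broken_invariant_names).
    """
    draft = copy.deepcopy(state)
    program(draft)
    broken = [name for name, holds in invariants.items() if not holds(draft)]
    if broken:
        return state, broken  # reject: the original state is untouched
    return draft, []


# Two paths nobody designed in advance.
def modest(state):
    state["balance"] -= 40
    state["open_orders"].append("tea")


def overspend(state):
    state["balance"] -= 400
    state["open_orders"].append("tv")


start = {"balance": 100, "open_orders": []}
after_modest, broken_modest = run_with_invariants(start, modest, INVARIANTS)
after_overspend, broken_overspend = run_with_invariants(start, overspend, INVARIANTS)
print(after_modest, broken_modest)
print(after_overspend, broken_overspend)
```

The runner has no list of allowed paths. It only knows what must remain true.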

This is the same insight behind Auto-Harness Synthesis. A well-constrained small model beats an unconstrained large model. A well-constrained sandbox serving user-generated code beats an unconstrained general-purpose runtime. The constraints are not limitations. They are what make the system trustworthy enough to be useful.

The Compound Angle

What excites me about Pai’s framing is the compounding potential. Today, each user-generated program is disposable. You ask, the model writes code, it runs, you get the result.

But what if user-generated programs persist? What if they accumulate? A user who routinely cancels and reorders organic items gets a standing program that runs on a schedule. A power user builds a library of personal automations, each one a small program living in their sandbox, each one scoped to exactly the capabilities they need.

This is where the sandbox-as-harness framing pays off. The harness compounds. Each constraint you add prevents a class of errors for every future program, not just the current one. Each capability you scope reduces the attack surface for every program that runs in that sandbox. The infrastructure investment pays dividends across all user-generated code, not just the next feature.
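A small sketch of why scoping compounds: if each saved program declares the capabilities it needs, one change to the granted set applies to the whole library at once. The SavedProgram type, the program names, and the capability names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SavedProgram:
    name: str
    needs: frozenset  # capabilities this program was scoped to


def runnable(programs, granted):
    """Names of the saved programs whose needs are all currently granted."""
    return [p.name for p in programs if p.needs <= granted]


library = [
    SavedProgram(
        "weekly-organic-reorder",
        frozenset({"orders.list", "orders.cancel", "orders.create"}),
    ),
    SavedProgram("spend-report", frozenset({"orders.list"})),
]

granted = frozenset({"orders.list", "orders.cancel", "orders.create"})
print(runnable(library, granted))

# Revoking one capability re-scopes every stored program, not just the next one.
print(runnable(library, granted - {"orders.cancel"}))
```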

The product teams that figure this out first will have a structural advantage. Not because their models are better, but because their harnesses are.
