Buy Orchestration, Own Semantics

James Phoenix

The five layers of an agent system are not mutually exclusive. You do not need alpha in all of them. But you need to know which ones are hard and which ones are solved.


The Five Layers

Every agent-powered development system decomposes into five layers. They compound when combined, but each stands alone. You can outsource some and invest deeply in others.

Layer | What It Does | Example Tools
1. Orchestration | Schedules agents, manages budgets, parallelises, routes, retries | Paperclip, OpenClaw, custom DAGs
2. Tasks | Defines the work queue, dependencies, state transitions, claims | tx, Linear, custom task graphs
3. Specs | Defines what “correct” means, acceptance criteria, invariants, PRDs | tx specs, design docs, typed schemas
4. Boilerplate / Harness | The codebase environment agents operate inside: folder structure, typed boundaries, ESLint rules, test harness, patterns | tx-agent-kit, CLAUDE.md, custom starters
5. Runtime | Actually executes work: workflows, persistence, domain adapters, tool access | Effect, Temporal, Drizzle, domain code

These layers are not mutually exclusive. You could outsource orchestration entirely to Paperclip and focus all your energy on making specs incredibly precise. You could build a boilerplate with a world-class test harness and use someone else’s task system. You could go deep on runtime and treat specs as lightweight markdown files. The layers compound when combined, but you pick where to invest based on where you have the most leverage.


Orchestration Is Mainly Solved

Agent orchestration is converging to commodity. The core primitives are well-understood:

  • Task queues with priority and claim semantics
  • DAGs for dependency resolution
  • Heartbeats for liveness detection
  • Schedulers for cron, event-driven, and on-demand dispatch
  • Budget controls for token/cost limits
  • Dashboards for visibility

These are distributed systems problems with known solutions. Multiple frameworks already solve them well. The orchestration layer is important, but it is not where the difficulty lives.
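To see how well-trodden these primitives are, here is a minimal sketch of a priority queue with claim semantics and heartbeat-based liveness. All names are illustrative, not taken from any specific framework; real orchestrators add persistence and atomic claims on top of the same shape.

```typescript
// A task can be claimed by an agent; a claim goes stale if the agent
// stops heartbeating, at which point another agent may reclaim it.
type Task = { id: string; priority: number; claimedBy?: string; lastBeat?: number };

class TaskQueue {
  private tasks: Task[] = [];
  constructor(private staleMs = 30_000) {}

  enqueue(id: string, priority: number): void {
    this.tasks.push({ id, priority });
  }

  // Claim the highest-priority task that is unclaimed or whose claim is stale.
  claim(agent: string, now = Date.now()): Task | undefined {
    const available = this.tasks
      .filter(t => !t.claimedBy || (t.lastBeat !== undefined && now - t.lastBeat > this.staleMs))
      .sort((a, b) => b.priority - a.priority);
    const task = available[0];
    if (task) {
      task.claimedBy = agent;
      task.lastBeat = now;
    }
    return task;
  }

  // Agents heartbeat to keep their claim; silence past staleMs releases it.
  heartbeat(agent: string, taskId: string, now = Date.now()): void {
    const t = this.tasks.find(x => x.id === taskId && x.claimedBy === agent);
    if (t) t.lastBeat = now;
  }
}
```

The point is not this implementation; it is that everything here is a textbook distributed-systems pattern you can buy rather than build.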

You can buy orchestration and lose very little. Paperclip, OpenClaw, or even a custom cron + task queue will get you 90% of the way. The marginal return on building your own orchestrator is low unless orchestration itself is your product.


Boilerplate and Specs Are the Hard Layers

The hardest problem is not “make an agent do something.” It is: make an agent do the right thing, repeatedly, inside a complex evolving system.

That difficulty lives almost entirely in two places: the boilerplate/harness and the specs.

Why Boilerplate Is Hard

Consider the agent’s job as navigating a state space S (all possible repo states) by choosing actions A (edits, refactors, commands). Without strong boilerplate, the agent does blind search over S. That is hopeless.

Your boilerplate is a reduction of the search space: S → S′, where S′ is smaller, more structured, and more predictable.

When you use Effect layers, repository patterns, strict ESLint rules, consistent folder structure, typed errors, and deterministic interfaces, you collapse the action space. Valid actions become obvious. Invalid actions get rejected by the compiler or linter before the agent even finishes generating them. The agent is not “intelligent.” It is operating in a heavily regularized space.
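A back-of-envelope illustration of why this matters (the numbers are made up, only the exponential shape is the point): shrinking the per-step action space shrinks the total search space exponentially.

```typescript
// A 10-step task where each step picks one action from the action space.
const steps = 10;

// Unconstrained repo: say ~50 plausible edits at each step.
const blindSearch = Math.pow(50, steps);    // ~9.8e16 candidate paths

// With types, lint rules, and fixed patterns: ~5 valid moves per step.
const harnessedSearch = Math.pow(5, steps); // ~9.8e6 candidate paths

// The harness makes the search space ten billion times smaller.
const reduction = blindSearch / harnessedSearch;
```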

This is why an engineer with strong boilerplate barely writes code themselves and still ships. The environment does the work.

Examples of boilerplate that makes agents dramatically better:

  • A test harness that catches regressions before merge
  • ESLint rules that enforce architectural boundaries
  • Typed error hierarchies that prevent silent failures
  • Repository patterns that make data access predictable
  • CLAUDE.md files that encode domain knowledge into agent context

Why Specs Are Hard

Specs define the target. Without precise specs, even a perfect agent in a perfect harness will build the wrong thing.

The difficulty is not writing a document. The difficulty is:

  • Defining “correct” precisely enough that a machine can verify it
  • Decomposing ambiguous requirements into testable invariants
  • Mapping business intent to acceptance criteria an agent can check
  • Keeping specs updated as the system evolves

Specs and boilerplate form a feedback loop. Better specs tell agents what to build. Better boilerplate constrains how they build it. Together they collapse the search space from “anything an LLM could generate” to “the small set of correct implementations.”
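What “correct, precisely enough for a machine” can look like in practice: a spec expressed as a set of invariants rather than prose. Everything here is a hypothetical sketch; the pattern, not the names, is the point.

```typescript
// A spec is a list of named predicates the implementation must satisfy.
type Invariant<T> = { name: string; holds: (value: T) => boolean };

// Spec for a URL slug, written as checkable invariants instead of
// prose like "slugs should look clean".
const slugSpec: Invariant<string>[] = [
  { name: "lowercase", holds: s => s === s.toLowerCase() },
  { name: "url-safe", holds: s => /^[a-z0-9-]+$/.test(s) },
  { name: "no edge dashes", holds: s => !s.startsWith("-") && !s.endsWith("-") },
];

// The agent's candidate implementation:
function slugify(title: string): string {
  return title
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, "-")
    .replace(/^-+|-+$/g, "");
}

// Verification: return the names of any invariants the implementation violates.
function verify(inputs: string[]): string[] {
  return slugSpec
    .filter(inv => !inputs.every(t => inv.holds(slugify(t))))
    .map(inv => inv.name);
}
```

This is the feedback loop in miniature: the spec says what must hold, the harness runs the check, and the agent iterates until the violation list is empty.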

Difficulty Comparison

Layer | Difficulty | Why
Orchestration | 5/10 | Solved problem: tasks + DAGs + heartbeats + schedulers
Tasks | 6/10 | State machines, dependency graphs; well-understood patterns
Specs | 9/10 | Defining truth precisely, mapping to testable invariants
Boilerplate / Harness | 10/10 | Correctness under ambiguity, evolving constraints, long-range dependencies
Runtime | 7/10 | Domain-specific but built on proven foundations (Effect, Temporal, etc.)

You Do Not Need Alpha in All Five

This is the key insight. The layers compound, but you can build a strong position by going deep on just one or two.

Scenario A: Spec-focused. You build the best spec tooling in a niche. Your PRDs, design docs, and acceptance criteria are so precise that any agent framework can execute them reliably. The specs are the product.

Scenario B: Harness-focused. You build a boilerplate/starter with an incredible test harness, typed boundaries, and ESLint rules. Any agent dropped into this environment immediately becomes more productive. The harness is the product.

Scenario C: Full-stack. You invest in specs, boilerplate, and runtime together. They compound. This is the most powerful position but the most expensive to build.

Scenario D: Orchestration-focused. You build the best coordinator, dashboard, budget system. This is where Paperclip and OpenClaw live. It is a valid business, but it is the most commoditized layer.


The strategic question is: where do you have the most alpha? Invest there. Outsource the rest.


The Human Loop Is the Alpha

The most important layer is not in the table above. It is the human in the loop.

Agents optimize for local completion. They finish the task in front of them. They do not know whether the task matters. They do not know whether the feature should exist. They do not know whether the market shifted last week. They cannot tell the difference between “technically correct” and “strategically right.”

The human provides:

  • Taste. Which of the ten valid implementations is the one customers actually want?
  • Strategy. What to build next, what to stop building, what to ignore entirely.
  • Judgment under ambiguity. When the spec is unclear, agents guess. Humans ask the right question.
  • Risk calibration. Agents treat all tasks equally. Humans know which ones are load-bearing.
  • Narrative. Why this product exists, who it serves, what story it tells.

Do not try to build a zero-human company. That is not the goal. The goal is to make the human’s time absurdly leveraged. One hour of human direction should produce 10-20 hours of agent output. That ratio is already achievable today with good specs and a strong harness.

The fantasy of “agents run everything while I am on a beach” collapses for the same reason fully autonomous vehicles keep stalling. The long tail of edge cases is where all the value and all the danger lives. Agents handle the 80% of predictable work brilliantly. The 20% that requires judgment, context, and taste is where humans earn their keep.

The right mental model: you are the executive function. Agents are the workforce. An executive who disappears entirely gets a company that drifts. An executive who shows up for 2 focused hours a day and directs a tireless workforce gets disproportionate output.


Day-to-Day vs Holiday Mode

Agents should run differently depending on whether a human is in the loop or not. The difference is not capability. It is permission scope.

Day-to-Day (Human in the Loop)

You are actively directing work. Agents are extensions of your hands. This is where most of the value is created.

Activity | How It Works
Feature development | You write specs or PRDs. Agents implement, you review and merge. Tight feedback loop.
Bug fixes | You triage and prioritize. Agents draft fixes, run tests. You approve.
Refactoring | You define the target architecture. Agents execute the migration in branches.
Code review | Agents run swarms for security, performance, maintainability. You make the call.
Spec refinement | You iterate on specs with agents. Back-and-forth until acceptance criteria are precise.
Exploratory work | Agents research, prototype, summarize options. You pick the direction.

Day-to-day is high-trust, high-bandwidth. Agents can merge, deploy, and make changes because you are watching. The human provides judgment, taste, and strategy. The agents provide speed and thoroughness. This mode is where the real compounding happens, because human direction keeps the system on the right trajectory.

Holiday Mode (No Human in the Loop)

You are away. Agents operate inside a constrained policy. The goal is not “run the company.” The goal is to keep momentum, prevent chaos, and compound prepared work.

Activity | Allowed | Why
Triage backlog | Yes | Low risk, high value on return
Run test suites | Yes | Catches regressions early
Draft PRs (do not merge) | Yes | Work is prepared for review
Prepare refactors behind flags | Yes | Reversible, non-destructive
Write specs from known templates | Yes | Queues future work
Summarize emails, issues, commits | Yes | Reduces catch-up time
Produce daily ops digests | Yes | Surfaces anomalies
Dependency bumps (minor/patch) | Yes | Low risk with test gates
Merge to main | No | Irreversible without review
Deploy to prod | No | Blast radius too high
Change billing/pricing | No | Financial commitment
Email customers automatically | No | Reputation risk
Change infra without approval | No | Hard to reverse
Publish content | No | Brand/accuracy risk
Delete data | No | Irreversible
Mutate core schemas | No | Cascading breakage
Architecture decisions | No | Strategic, requires taste

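The table above reduces to a small piece of policy code. This is a sketch with invented action names: autonomy becomes an allowlist question, where anything not explicitly permitted in holiday mode is queued for human review.

```typescript
// Actions agents may take with no human in the loop (illustrative names).
const HOLIDAY_ALLOWED = new Set([
  "triage_backlog",
  "run_tests",
  "draft_pr",
  "prepare_flagged_refactor",
  "write_spec_from_template",
  "summarize_activity",
  "bump_minor_dependency",
]);

type Verdict = { allowed: boolean; reason: string };

// The gate is about permission scope, not capability: the same agent
// gets a wider scope only because a human is watching.
function gate(action: string, humanPresent: boolean): Verdict {
  if (humanPresent) {
    return { allowed: true, reason: "day-to-day mode: human is reviewing" };
  }
  return HOLIDAY_ALLOWED.has(action)
    ? { allowed: true, reason: "holiday mode: in allowlist" }
    : { allowed: false, reason: `holiday mode: '${action}' queued for review` };
}
```

Note the default: an unknown action is denied, not allowed. That single design choice is most of what separates “constrained operations” from chaos.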
Escalation triggers (stop autonomous action, queue for human):

  • Ambiguous requirement with no clear spec
  • Failing integration test that was previously green
  • Conflicting specs or contradictory acceptance criteria
  • Increased error rate in monitoring
  • Unbounded retries or cost spike
  • Customer-facing impact
  • Any auth, billing, or data model touch
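The triggers above can likewise be encoded as a pure predicate over observed signals (field names and thresholds here are hypothetical). Any single trigger halts autonomous action and queues the task for the returning human.

```typescript
// Signals an agent observes before acting autonomously.
type Signals = {
  specIsAmbiguous: boolean;
  previouslyGreenTestFailing: boolean;
  errorRateDelta: number;        // fractional increase vs baseline
  costSpikeUsd: number;          // spend above the daily budget
  customerFacingImpact: boolean;
  touchesSensitiveArea: boolean; // auth, billing, or data model
};

// One true trigger is enough: stop and escalate rather than guess.
function shouldEscalate(s: Signals): boolean {
  return (
    s.specIsAmbiguous ||
    s.previouslyGreenTestFailing ||
    s.errorRateDelta > 0.1 ||
    s.costSpikeUsd > 0 ||
    s.customerFacingImpact ||
    s.touchesSensitiveArea
  );
}
```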

The Return

The measure of a good holiday mode system is what you come back to. An elite return looks like:

  • 8-12 reviewable PRs ready for merge
  • Cleaned and prioritized issue queue
  • Updated specs with open questions flagged
  • Ranked risks with supporting evidence
  • Daily digests summarizing what happened
  • A few blocked escalations with clear context

That is not “autonomous company.” That is high-trust constrained operations. The agents kept the factory warm. You come back and immediately ship.


The Deeper Principle

You are not building an “agent system.” You are building a programming language for agents to operate inside.

Your boilerplate defines:

  • Grammar (what is allowed)
  • Syntax (how things are structured)
  • Semantics (what things mean)
  • Type system (what is valid)
  • Runtime (how things execute)

LLMs are just probabilistic interpreters of that language.
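The “language” framing can be made literal. In this sketch (all names invented), the harness defines a closed grammar of actions and a tiny interpreter over it; anything outside the grammar cannot even be expressed, so a noisy generator is steered toward valid programs.

```typescript
// Grammar: the only sentences the agent's "language" can form.
// Dangerous actions (e.g. "drop_database") are unrepresentable by construction.
type Action =
  | { kind: "create_file"; path: string }
  | { kind: "run_tests" }
  | { kind: "open_pr"; title: string };

// Semantics: what each sentence means, as a deterministic interpreter.
// The exhaustive switch is the type system enforcing the grammar.
function interpret(actions: Action[]): string[] {
  return actions.map(a => {
    switch (a.kind) {
      case "create_file": return `created ${a.path}`;
      case "run_tests": return "tests passed";
      case "open_pr": return `PR opened: ${a.title}`;
    }
  });
}
```

An LLM emitting `Action` values is a probabilistic interpreter of this language; the narrower the grammar, the less room there is to be wrong.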

The agents that seem magical are not smarter. They are operating in environments where the boilerplate and specs have collapsed the search space so aggressively that correct output is the path of least resistance.

Orchestration scales agents. But boilerplate and specs make them correct. The first is a commodity; the other two are where the real engineering lives.

