The five layers of an agent system are separable investments. You do not need alpha in all of them. But you need to know which ones are hard and which ones are already solved.
The Five Layers
Every agent-powered development system decomposes into five layers. They compound when combined, but each stands alone. You can outsource some and invest deeply in others.
| Layer | What It Does | Example Tools |
|---|---|---|
| 1. Orchestration | Schedules agents, manages budgets, parallelises, routes, retries | Paperclip, OpenClaw, custom DAGs |
| 2. Tasks | Defines the work queue, dependencies, state transitions, claims | tx, Linear, custom task graphs |
| 3. Specs | Defines what “correct” means, acceptance criteria, invariants, PRDs | tx specs, design docs, typed schemas |
| 4. Boilerplate / Harness | The codebase environment agents operate inside: folder structure, typed boundaries, ESLint rules, test harness, patterns | tx-agent-kit, CLAUDE.md, custom starters |
| 5. Runtime | Actually executes work: workflows, persistence, domain adapters, tool access | Effect, Temporal, Drizzle, domain code |
These layers are separable investments. You could outsource orchestration entirely to Paperclip and focus all your energy on making specs incredibly precise. You could build a boilerplate with a world-class test harness and use someone else’s task system. You could go deep on runtime and treat specs as lightweight markdown files. The layers compound when combined, but you pick where to invest based on where you have the most leverage.
Orchestration Is Mainly Solved
Agent orchestration is converging to commodity. The core primitives are well-understood:
- Task queues with priority and claim semantics
- DAGs for dependency resolution
- Heartbeats for liveness detection
- Schedulers for cron, event-driven, and on-demand dispatch
- Budget controls for token/cost limits
- Dashboards for visibility
These are distributed systems problems with known solutions. Multiple frameworks already solve them well. The orchestration layer is important, but it is not where the difficulty lives.
You can buy orchestration and lose very little. Paperclip, OpenClaw, or even a custom cron + task queue will get you 90% of the way. The marginal return on building your own orchestrator is low unless orchestration itself is your product.
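Those primitives fit in a few dozen lines. Here is a minimal sketch in TypeScript; every name (`TaskQueue`, `claim`, the timeout value) is illustrative rather than drawn from any existing framework:

```typescript
// Minimal sketch of the orchestration primitives above: a priority queue
// with claim semantics and heartbeat-based liveness detection.
// All names and values are illustrative, not from any real framework.

type TaskStatus = "pending" | "claimed" | "done";

interface Task {
  id: string;
  priority: number;       // higher runs first
  status: TaskStatus;
  claimedBy?: string;
  lastHeartbeat?: number; // ms epoch, used for liveness detection
}

class TaskQueue {
  private tasks: Task[] = [];
  constructor(private heartbeatTimeoutMs = 30_000) {}

  enqueue(id: string, priority: number): void {
    this.tasks.push({ id, priority, status: "pending" });
  }

  // Claim semantics: hand the highest-priority pending task to a worker.
  claim(workerId: string, now = Date.now()): Task | undefined {
    this.reapStale(now); // requeue tasks whose worker stopped heartbeating
    const next = this.tasks
      .filter((t) => t.status === "pending")
      .sort((a, b) => b.priority - a.priority)[0];
    if (!next) return undefined;
    next.status = "claimed";
    next.claimedBy = workerId;
    next.lastHeartbeat = now;
    return next;
  }

  heartbeat(id: string, now = Date.now()): void {
    const t = this.tasks.find((x) => x.id === id);
    if (t?.status === "claimed") t.lastHeartbeat = now;
  }

  complete(id: string): void {
    const t = this.tasks.find((x) => x.id === id);
    if (t) t.status = "done";
  }

  // Liveness detection: a claimed task with a stale heartbeat goes back to pending.
  private reapStale(now: number): void {
    for (const t of this.tasks) {
      if (t.status === "claimed" && now - (t.lastHeartbeat ?? 0) > this.heartbeatTimeoutMs) {
        t.status = "pending";
        t.claimedBy = undefined;
      }
    }
  }
}
```

The point is not that this toy is production-ready. It is that the whole design space is this well-understood: a real orchestrator adds persistence, retries, and budgets, but no new ideas.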
Boilerplate and Specs Are the Hard Layers
The hardest problem is not “make an agent do something.” It is: make an agent do the right thing, repeatedly, inside a complex evolving system.
That difficulty lives almost entirely in two places: the boilerplate/harness and the specs.
Why Boilerplate Is Hard
Consider the agent’s job as navigating a state space S (all possible repo states) by choosing actions A (edits, refactors, commands). Without strong boilerplate, the agent does blind search over S. That is hopeless.
Your boilerplate is a reduction of the search space: S → S′, where S′ is smaller, more structured, more predictable.
When you use Effect layers, the repository pattern, strict ESLint rules, a consistent folder structure, typed errors, and deterministic interfaces, you collapse the action space. Valid actions become obvious. Invalid actions get rejected by the compiler or linter before the agent even finishes the change. The agent is not “intelligent.” It is operating in a heavily regularized space.
This is why an engineer with strong boilerplate barely writes code themselves and still ships. The environment does the work.
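One concrete instance of the environment doing the work: ESLint’s built-in `no-restricted-imports` rule can make an architectural boundary mechanically enforceable. The folder names below are illustrative; the rule and its `patterns` option are real ESLint core features.

```javascript
// .eslintrc.cjs (sketch) -- forbid reaching past the repository layer.
// Any agent edit that imports db internals directly fails lint before review.
module.exports = {
  rules: {
    "no-restricted-imports": ["error", {
      patterns: [{
        group: ["**/db/internal/*"], // illustrative path convention
        message: "Import via the repository layer (src/repos), not db internals.",
      }],
    }],
  },
};
```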
Examples of boilerplate that makes agents dramatically better:
- A test harness that catches regressions before merge
- ESLint rules that enforce architectural boundaries
- Typed error hierarchies that prevent silent failures
- Repository patterns that make data access predictable
- CLAUDE.md files that encode domain knowledge into agent context
Why Specs Are Hard
Specs define the target. Without precise specs, even a perfect agent in a perfect harness will build the wrong thing.
The difficulty is not writing a document. The difficulty is:
- Defining “correct” precisely enough that a machine can verify it
- Decomposing ambiguous requirements into testable invariants
- Mapping business intent to acceptance criteria an agent can check
- Keeping specs updated as the system evolves
Specs and boilerplate form a feedback loop. Better specs tell agents what to build. Better boilerplate constrains how they build it. Together they collapse the search space from “anything an LLM could generate” to “the small set of correct implementations.”
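What “correct, precisely enough that a machine can verify it” looks like in practice: acceptance criteria expressed as executable predicates rather than prose. This is a hypothetical shape, not any particular spec tool’s API; `Spec`, `checkSpec`, and the sample invariants are all illustrative.

```typescript
// Sketch: a spec as machine-checkable acceptance criteria.
// Each criterion is a predicate an agent (or CI) can run, not a sentence
// a reviewer has to interpret.

interface Spec<T> {
  name: string;
  criteria: Array<{ description: string; holds: (subject: T) => boolean }>;
}

// Returns the descriptions of every violated criterion (empty = pass).
function checkSpec<T>(spec: Spec<T>, subject: T): string[] {
  return spec.criteria
    .filter((c) => !c.holds(subject))
    .map((c) => c.description);
}

// Example: a spec for a discount function an agent is asked to implement.
const discountSpec: Spec<(price: number) => number> = {
  name: "applyDiscount",
  criteria: [
    { description: "never negative", holds: (f) => f(5) >= 0 },
    { description: "never increases price", holds: (f) => f(100) <= 100 },
    { description: "zero stays zero", holds: (f) => f(0) === 0 },
  ],
};
```

A spec in this form closes the loop: the agent does not ask a human whether its implementation is correct, it runs the criteria.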
Difficulty Comparison
| Layer | Difficulty | Why |
|---|---|---|
| Orchestration | 5/10 | Mostly solved. Tasks + DAGs + heartbeats + schedulers |
| Tasks | 6/10 | State machines, dependency graphs. Well-understood patterns |
| Specs | 9/10 | Defining truth precisely, mapping to testable invariants |
| Boilerplate / Harness | 10/10 | Correctness under ambiguity, evolving constraints, long-range dependencies |
| Runtime | 7/10 | Domain-specific but built on proven foundations (Effect, Temporal, etc.) |
You Do Not Need Alpha in All Five
This is the key insight. The layers compound, but you can build a strong position by going deep on just one or two.
Scenario A: Spec-focused. You build the best spec tooling in a niche. Your PRDs, design docs, and acceptance criteria are so precise that any agent framework can execute them reliably. The specs are the product.
Scenario B: Harness-focused. You build a boilerplate/starter with an incredible test harness, typed boundaries, and ESLint rules. Any agent dropped into this environment immediately becomes more productive. The harness is the product.
Scenario C: Full-stack. You invest in specs, boilerplate, and runtime together. They compound. This is the most powerful position but the most expensive to build.
Scenario D: Orchestration-focused. You build the best coordinator, dashboard, budget system. This is where Paperclip and OpenClaw live. It is a valid business, but it is the most commoditized layer.
The strategic question is: where do you have the most alpha? Invest there. Outsource the rest.
The Human Loop Is the Alpha
The most important layer is not in the table above. It is the human in the loop.
Agents optimize for local completion. They finish the task in front of them. They do not know whether the task matters. They do not know whether the feature should exist. They do not know whether the market shifted last week. They cannot tell the difference between “technically correct” and “strategically right.”
The human provides:
- Taste. Which of the ten valid implementations is the one customers actually want?
- Strategy. What to build next, what to stop building, what to ignore entirely.
- Judgment under ambiguity. When the spec is unclear, agents guess. Humans ask the right question.
- Risk calibration. Agents treat all tasks equally. Humans know which ones are load-bearing.
- Narrative. Why this product exists, who it serves, what story it tells.
Do not try to build a zero-human company. That is not the goal. The goal is to make the human’s time absurdly leveraged. One hour of human direction should produce 10-20 hours of agent output. That ratio is already achievable today with good specs and a strong harness.
The fantasy of “agents run everything while I am on a beach” collapses for the same reason fully autonomous vehicles keep stalling. The long tail of edge cases is where all the value and all the danger lives. Agents handle the 80% of predictable work brilliantly. The 20% that requires judgment, context, and taste is where humans earn their keep.
The right mental model: you are the executive function. Agents are the workforce. An executive who disappears entirely gets a company that drifts. An executive who shows up for 2 focused hours a day and directs a tireless workforce gets disproportionate output.
Day-to-Day vs Holiday Mode
Agents should run differently depending on whether a human is in the loop. The difference is not capability. It is permission scope.
Day-to-Day (Human in the Loop)
You are actively directing work. Agents are extensions of your hands. This is where most of the value is created.
| Activity | How It Works |
|---|---|
| Feature development | You write specs or PRDs. Agents implement, you review and merge. Tight feedback loop. |
| Bug fixes | You triage and prioritize. Agents draft fixes, run tests. You approve. |
| Refactoring | You define the target architecture. Agents execute the migration in branches. |
| Code review | Agents run swarms for security, performance, maintainability. You make the call. |
| Spec refinement | You iterate on specs with agents. Back-and-forth until acceptance criteria are precise. |
| Exploratory work | Agents research, prototype, summarize options. You pick the direction. |
Day-to-day is high-trust, high-bandwidth. Agents can merge, deploy, and make changes because you are watching. The human provides judgment, taste, and strategy. The agents provide speed and thoroughness. This mode is where the real compounding happens, because human direction keeps the system on the right trajectory.
Holiday Mode (No Human in the Loop)
You are away. Agents operate inside a constrained policy. The goal is not “run the company.” The goal is to keep momentum, prevent chaos, and compound prepared work.
| Activity | Allowed | Why |
|---|---|---|
| Triage backlog | Yes | Low risk, high value on return |
| Run test suites | Yes | Catches regressions early |
| Draft PRs (do not merge) | Yes | Work is prepared for review |
| Prepare refactors behind flags | Yes | Reversible, non-destructive |
| Write specs from known templates | Yes | Queues future work |
| Summarize emails, issues, commits | Yes | Reduces catch-up time |
| Produce daily ops digests | Yes | Surfaces anomalies |
| Dependency bumps (minor/patch) | Yes | Low risk with test gates |
| Merge to main | No | Irreversible without review |
| Deploy to prod | No | Blast radius too high |
| Change billing/pricing | No | Financial commitment |
| Email customers automatically | No | Reputation risk |
| Change infra without approval | No | Hard to reverse |
| Publish content | No | Brand/accuracy risk |
| Delete data | No | Irreversible |
| Mutate core schemas | No | Cascading breakage |
| Architecture decisions | No | Strategic, requires taste |
Escalation triggers (stop autonomous action, queue for human):
- Ambiguous requirement with no clear spec
- Failing integration test that was previously green
- Conflicting specs or contradictory acceptance criteria
- Increased error rate in monitoring
- Unbounded retries or cost spike
- Customer-facing impact
- Any auth, billing, or data model touch
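The mode tables and escalation triggers above reduce to a small policy gate. A sketch, with every name (`Mode`, `Action`, `authorize`, the signal fields) illustrative:

```typescript
// Sketch of the permission policy: capability is constant across modes,
// only the permission scope changes. Escalation triggers override both modes.

type Mode = "day-to-day" | "holiday";

type Action =
  | "triage_backlog" | "run_tests" | "draft_pr" | "dependency_bump"
  | "merge_main" | "deploy_prod" | "email_customers" | "delete_data";

// Holiday mode: reversible, low-blast-radius work only.
const HOLIDAY_ALLOWED: ReadonlySet<Action> = new Set<Action>([
  "triage_backlog", "run_tests", "draft_pr", "dependency_bump",
]);

interface Decision {
  allowed: boolean;
  escalate: boolean; // queue for the human instead of acting
  reason: string;
}

function authorize(
  mode: Mode,
  action: Action,
  signals: { costSpike?: boolean; customerFacing?: boolean } = {},
): Decision {
  // Escalation triggers fire in either mode: stop and queue for a human.
  if (signals.costSpike || signals.customerFacing) {
    return { allowed: false, escalate: true, reason: "escalation trigger fired" };
  }
  if (mode === "day-to-day") {
    return { allowed: true, escalate: false, reason: "human in the loop" };
  }
  if (HOLIDAY_ALLOWED.has(action)) {
    return { allowed: true, escalate: false, reason: "within holiday policy" };
  }
  return { allowed: false, escalate: true, reason: "outside holiday policy" };
}
```

The deny branch escalates rather than silently dropping the task, which is what produces the “blocked escalations with clear context” you want waiting on your return.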
The Return
The measure of a good holiday-mode system is what you come back to. An elite return looks like:
- 8-12 reviewable PRs ready for merge
- Cleaned and prioritized issue queue
- Updated specs with open questions flagged
- Ranked risks with supporting evidence
- Daily digests summarizing what happened
- A few blocked escalations with clear context
That is not “autonomous company.” That is high-trust constrained operations. The agents kept the factory warm. You come back and immediately ship.
The Deeper Principle
You are not building an “agent system.” You are building a programming language for agents to operate inside.
Your boilerplate defines:
- Grammar (what is allowed)
- Syntax (how things are structured)
- Semantics (what things mean)
- Type system (what is valid)
- Runtime (how things execute)
LLMs are just probabilistic interpreters of that language.
The agents that seem magical are not smarter. They are operating in environments where the boilerplate and specs have collapsed the search space so aggressively that correct output is the path of least resistance.
Orchestration scales agents. But boilerplate and specs make them correct. The first is a commodity. The last two are where the real engineering lives.
Related
- Own Your Control Plane – The deeper argument for owning primitives
- Agent-Driven Development – The workflow these layers serve
- Orchestration Patterns – Coordinator, swarm, pipeline patterns
- Building the Factory – Meta-infrastructure for compounding
- Constraint-First Development – Constraints as specification
- Invariants in Programming and LLM Generation – Why typed boundaries reduce agent search space
- Agent-Native Architecture – Designing for agents as first-class citizens
- Infrastructure Principles – Compound infrastructure philosophy

