A workflow where AI agents execute development tasks from structured specs, with humans controlling the task layer rather than writing code directly.
The Three Pillars
You must control:
- Tasks – The work queue (what gets done)
- Orchestration – The loop (how work flows)
- Memory – Context persistence (optional but powerful)
Control the task layer and you can endlessly add work. Spin up 1-3 workers grinding tasks while you focus on specs and review.
The Core Loop
PRD + Design Doc → Worker → Tests
- Write PRD (what)
- Write design doc (how, interfaces, testing strategy)
- Worker generates implementation
- Tests validate correctness
- Human reviews and iterates
This is “best iteration guess” mode. You front-load thinking into specs, agent does the grind.
Two-Phase Workflow
Phase 1: Spec-Driven Implementation
PRD + Design Doc → Worker → Tests → Review
Get the first working iteration from structured specs.
Phase 2: Agent Swarms for Polish
Agent Swarm on services/<service> → Bug Detection → Fixes
Later, run agent swarms on globs to find bugs, inconsistencies, and cleanup opportunities. Target specific directories:
- `services/<service-name>/`
- `src/components/`
- `lib/`
Swarms do the tedious polish work humans skip.
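The fan-out can be sketched as a function that turns a list of target directories into one polish task per directory. This is an illustrative sketch: the directory names and the task shape are assumptions, not a real tx schema.

```typescript
// Hedged sketch: generate one polish/review task per target directory.
// The PolishTask shape and example directories are hypothetical.
interface PolishTask {
  dir: string;    // directory the swarm agent is scoped to
  prompt: string; // the instruction the agent receives
}

function swarmTasks(dirs: string[]): PolishTask[] {
  return dirs.map(dir => ({
    dir,
    prompt: `Audit ${dir} for bugs, inconsistencies, and cleanup opportunities; fix or file a task per finding.`,
  }));
}

// Example: one task per glob target
const tasks = swarmTasks(["services/billing", "src/components", "lib"]);
```

Each task is narrow by construction, which keeps swarm agents from wandering outside their assigned directory.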
PRD Quality vs Trajectory
Better specs reduce random search. The relationship follows a log curve:
Trajectory Accuracy
↑
│ day 4+ ← Extremely accurate trajectory
│ day 3 ·
│ day 2 ·
│ ·
│ ·
│ ·
│ ·
│· day 1
│
└─────────────────────────────────→ Days spent on PRD + Design Doc
↑
Not accurate
trajectory
| Time Investment | Trajectory Alignment |
|---|---|
| 1 day | ~60% |
| 3 days | ~80% |
| 5 days | ~90% |
Key insight: The first day of spec work provides massive alignment gains. After that, diminishing returns. But skipping specs entirely means agents wander randomly.
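The curve above can be modeled with a simple log function fitted to the table. The 0.6 baseline, 0.186 slope, and 0.2 "no specs" floor are illustrative numbers chosen to match the table, not measured data.

```typescript
// Illustrative diminishing-returns model fitted to the table above.
// Constants are assumptions for the sketch, not empirical measurements.
function trajectoryAlignment(daysOnSpec: number): number {
  if (daysOnSpec < 1) return 0.2; // skipping specs: agents wander (assumed floor)
  return Math.min(0.95, 0.6 + 0.186 * Math.log(daysOnSpec));
}
```

The shape is the point: the jump from zero to one day dominates, while day 3 to day 5 buys only ~10 more points of alignment.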
Same principle applies to humans: better specs = less wasted iteration. This is real computer science, not an agent limitation.
PRD vs Design Doc Split
PRD (What):
- Feature requirements
- User stories
- Architectural constraints (“Redis for X because of latency, Postgres for Y because of ACID”)
- Success criteria
Design Doc (How):
- Interfaces and types
- Implementation approach
- Testing strategy
- File structure
Link everything in an index.md. Your task system can then create tasks directly from the docs.
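One way the index might look (file names and the single feature entry are illustrative):

```markdown
# Project Index

## Feature: Billing
- [PRD](prds/billing.md) – what: requirements, user stories, constraints
- [Design Doc](design/billing.md) – how: interfaces, testing strategy
- Tasks: generated from the two docs above
```

A task system that can read this index gets both the "what" and the "how" for every feature from one entry point.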
Why You Must Own the Task Layer
If the entire stack is prompts all the way down, then whoever controls the task layer controls what gets built. The orchestration flow encodes your domain knowledge. If a tool dictates the flow, it’s not a tool. It’s a competitor.
This is why frameworks are the wrong abstraction. You need primitives.
Headless vs Framework
Framework approach: Opinionated flow, batteries included, less flexibility. The framework owns your orchestration. You’re renting.
Headless approach (recommended): Just provide tasks + memory primitives (like TanStack does for state). You own your orchestration. You’re building.
Headless wins because:
- Every codebase has different needs
- Orchestration patterns vary by domain
- You want control, not magic
- The orchestration IS your product. You can’t outsource it.
tx: The Control Plane
tx is the concrete implementation of this philosophy. Primitives for AI agents, not a framework. Headless infrastructure for memory, tasks, and orchestration.
Core primitives:
| Primitive | What it does |
|---|---|
| `tx ready` | Get next workable task |
| `tx claim` | Lease-based claim so parallel agents don’t collide |
| `tx done` | Complete a task |
| `tx block` | Declare dependencies between tasks |
| `tx handoff` | Transfer work with context |
| `tx context` | Retrieve relevant learnings/memory |
These are the minimal building blocks. You compose them into whatever orchestration your domain needs. No opinions on flow, no magic, just primitives you wire together.
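The semantics of `ready`/`claim`/`done`/`block` can be sketched as a small in-memory queue. This is a model of the behavior described above, not the real tx API: the class name, lease duration, and method signatures are assumptions for illustration.

```typescript
// In-memory model of the task primitives: ready, claim (lease), done, block.
// A sketch of the semantics, not the real tx implementation.
type TaskId = string;

interface Task {
  id: TaskId;
  blockedBy: Set<TaskId>; // "tx block": dependencies that must finish first
  leasedUntil?: number;   // "tx claim": lease expiry (ms since epoch)
  done: boolean;          // "tx done": completed
}

class TaskQueue {
  private tasks = new Map<TaskId, Task>();

  add(id: TaskId, blockedBy: TaskId[] = []): void {
    this.tasks.set(id, { id, blockedBy: new Set(blockedBy), done: false });
  }

  // "tx ready": next task with no unfinished dependencies and no live lease
  ready(now = Date.now()): Task | undefined {
    for (const t of this.tasks.values()) {
      if (t.done) continue;
      if (t.leasedUntil !== undefined && t.leasedUntil > now) continue;
      const blocked = [...t.blockedBy].some(d => !this.tasks.get(d)?.done);
      if (!blocked) return t;
    }
    return undefined;
  }

  // "tx claim": lease the task so parallel agents don't collide
  claim(id: TaskId, leaseMs = 60_000, now = Date.now()): boolean {
    const t = this.tasks.get(id);
    if (!t || t.done) return false;
    if (t.leasedUntil !== undefined && t.leasedUntil > now) return false;
    t.leasedUntil = now + leaseMs;
    return true;
  }

  // "tx done": complete the task, unblocking its dependents
  done(id: TaskId): void {
    const t = this.tasks.get(id);
    if (t) t.done = true;
  }
}
```

A worker loop is then just `ready → claim → do the work → done`, repeated; the lease is what makes running two or three workers in parallel safe.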
tx-agent-kit is the full-stack starter (Effect, Temporal, Next.js, Drizzle) that shows one way to compose these primitives into a working system.
Getting Started
Burn $100-200/month on a project just to get used to having a worker constantly running. This builds intuition for:
- How to structure specs for agents
- What tasks to automate vs. do manually
- How to review agent output efficiently
- When to intervene vs. let it run
This is the mainstream future. The skill is learning to direct agents, not compete with them.
It’s Prompts All the Way Down
Everything in agent-driven development reduces to the same primitive: a prompt that produces work. The entire stack is just prompts invoking prompts.
Vision (prompt)
→ PRDs (prompts that define what to build)
→ Design Docs (prompts that define how)
→ Tasks (prompts that agents execute)
→ Sub-tasks (prompts spawned by tasks)
→ Code, tests, docs (output)
There is no magic layer. A PRD is a prompt. A task is a prompt. A design doc is a prompt. An agent’s system message is a prompt. The only difference is scope and audience. Each layer is a prompt that generates the layer below it.
The Task Layer is Self-Recursive
The task layer is special because it can feed itself. A task is just a prompt, and a prompt can say “generate more prompts.”
meta-task (prompt) → 10-50 concrete tasks (prompts)
├── task A (prompt → code)
├── task B (prompt → code)
├── ...
└── task N (prompt → more prompts → more tasks)
This is not a hack. It’s the natural consequence of the stack being prompts all the way down. Any layer can generate any other layer.
Why This Works
When agents run on a cron or loop, task completion can outpace task creation. The queue drains and the system stalls.
Meta-tasks fix this. They are tasks whose only job is to generate more tasks. The queue feeds itself. “Grow outcomes from outcomes.”
Examples:
- “Read these 10 PRDs and create implementation tasks for each”
- “Audit the codebase for missing tests and create a task per gap”
- “Break this epic into 20 sub-tasks with acceptance criteria”
When to Use
- Cron-based agent loops where completion rate > creation rate
- Bootstrapping a new project (e.g. “generate tasks for all 10 PRDs”)
- Expanding scope without manual intervention (side projects, experiments)
Guardrails
| Risk | Mitigation |
|---|---|
| Runaway expansion | Cap meta-task depth or ratio (e.g. 1 in 10 generated tasks is a meta-task) |
| Quality decay | Tasks drift from intent the further they sit from the root; anchor meta-tasks to PRDs/specs |
| Cron bottleneck | If hourly cron still can’t keep up, increase frequency or batch size |
Best suited for side projects and experiments where full human-in-the-loop isn’t needed. For production systems, keep the human in the loop on task creation.
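The depth-cap guardrail can be sketched as a recursive expansion that refuses to expand meta-tasks past a fixed depth. The task shape and the `generate` callback (standing in for "an agent turns a meta-task into tasks") are illustrative assumptions.

```typescript
// Sketch of the runaway-expansion guardrail: meta-tasks may generate more
// tasks, but expansion stops past maxDepth. QueueTask and generate() are
// hypothetical stand-ins, not a real tx schema.
interface QueueTask {
  prompt: string;
  isMeta: boolean; // meta-tasks generate tasks instead of code
  depth: number;   // distance from the root meta-task
}

function expand(
  task: QueueTask,
  generate: (t: QueueTask) => QueueTask[], // agent call: meta-task -> tasks
  maxDepth = 2,
): QueueTask[] {
  if (!task.isMeta) return [task];          // concrete task: keep as-is
  if (task.depth >= maxDepth) return [];    // cap: stop runaway expansion
  return generate(task).flatMap(child => expand(child, generate, maxDepth));
}
```

The same structure accommodates the ratio guardrail: have `generate` emit at most one meta-task per batch of concrete tasks.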
The Meta
- It’s prompts all the way down. PRDs, design docs, tasks, agent configs. Same primitive, different scope.
- Control the task layer. That’s where leverage lives.
- The task layer is self-recursive. Tasks can create tasks. Prompts can create prompts.
- Structure specs in linked PRD + design docs (see index pattern)
- Workers grind the implementation
- Agent swarms clean up
- Humans set vision, review, and handle edge cases
You become the architect. Agents become the workforce. The task queue is the machine.