A workflow where AI agents execute development tasks from structured specs, with humans controlling the task layer rather than writing code directly.
The Three Pillars
You must control:
- Tasks – The work queue (what gets done)
- Orchestration – The loop (how work flows)
- Memory – Context persistence (optional but powerful)
Control the task layer and you can endlessly add work. Spin up 1-3 workers grinding tasks while you focus on specs and review.
The Core Loop
PRD + Design Doc → Worker → Tests
- Write PRD (what)
- Write design doc (how, interfaces, testing strategy)
- Worker generates implementation
- Tests validate correctness
- Human reviews and iterates
This is “best iteration guess” mode. You front-load thinking into specs, agent does the grind.
Two-Phase Workflow
Phase 1: Spec-Driven Implementation
PRD + Design Doc → Worker → Tests → Review
Get the first working iteration from structured specs.
Phase 2: Agent Swarms for Polish
Agent Swarm on services/<service> → Bug Detection → Fixes
Later, run agent swarms on globs to find bugs, inconsistencies, and cleanup opportunities. Target specific directories:
- `services/<service-name>/`
- `src/components/`
- `lib/`
Swarms do the tedious polish work humans skip.
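The fan-out can be sketched as a function that turns a list of target directories into one polish task per directory. This is an illustrative sketch: the directory names and the task shape are assumptions, not a real tx schema.

```typescript
// Hedged sketch: generate one polish/review task per target directory.
// The PolishTask shape and example directories are hypothetical.
interface PolishTask {
  dir: string;    // directory the swarm agent is scoped to
  prompt: string; // the instruction the agent receives
}

function swarmTasks(dirs: string[]): PolishTask[] {
  return dirs.map(dir => ({
    dir,
    prompt: `Audit ${dir} for bugs, inconsistencies, and cleanup opportunities; fix or file a task per finding.`,
  }));
}

// Example: one task per glob target
const tasks = swarmTasks(["services/billing", "src/components", "lib"]);
```

Each task is narrow by construction, which keeps swarm agents from wandering outside their assigned directory.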
PRD Quality vs Trajectory
Better specs reduce random search. The relationship follows a log curve:
Trajectory Accuracy
↑
│ day 4+ ← Extremely accurate trajectory
│ day 3 ·
│ day 2 ·
│ ·
│ ·
│ ·
│ ·
│· day 1
│
└─────────────────────────────────→ Days spent on PRD + Design Doc
↑
Not accurate
trajectory
| Time Investment | Trajectory Alignment |
|---|---|
| 1 day | ~60% |
| 3 days | ~80% |
| 5 days | ~90% |
Key insight: The first day of spec work provides massive alignment gains. After that, diminishing returns. But skipping specs entirely means agents wander randomly.
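The curve above can be modeled with a simple log function fitted to the table. The 0.6 baseline, 0.186 slope, and 0.2 "no specs" floor are illustrative numbers chosen to match the table, not measured data.

```typescript
// Illustrative diminishing-returns model fitted to the table above.
// Constants are assumptions for the sketch, not empirical measurements.
function trajectoryAlignment(daysOnSpec: number): number {
  if (daysOnSpec < 1) return 0.2; // skipping specs: agents wander (assumed floor)
  return Math.min(0.95, 0.6 + 0.186 * Math.log(daysOnSpec));
}
```

The shape is the point: the jump from zero to one day dominates, while day 3 to day 5 buys only ~10 more points of alignment.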
Same principle applies to humans: better specs = less wasted iteration. This is real computer science, not an agent limitation.
PRD vs Design Doc Split
PRD (What):
- Feature requirements
- User stories
- Architectural constraints (“Redis for X because of latency, Postgres for Y because of ACID”)
- Success criteria
Design Doc (How):
- Interfaces and types
- Implementation approach
- Testing strategy
- File structure
Link everything in an index.md. Your task system can then create tasks directly from the docs.
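One way the index might look (file names and the single feature entry are illustrative):

```markdown
# Project Index

## Feature: Billing
- [PRD](prds/billing.md) – what: requirements, user stories, constraints
- [Design Doc](design/billing.md) – how: interfaces, testing strategy
- Tasks: generated from the two docs above
```

A task system that can read this index gets both the "what" and the "how" for every feature from one entry point.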
Why You Must Own the Task Layer
If the entire stack is prompts all the way down, then whoever controls the task layer controls what gets built. The orchestration flow encodes your domain knowledge. If a tool dictates the flow, it’s not a tool. It’s a competitor.
This is why frameworks are the wrong abstraction. You need primitives.
Headless vs Framework
Framework approach: Opinionated flow, batteries included, less flexibility. The framework owns your orchestration. You’re renting.
Headless approach (recommended): Just provide tasks + memory primitives (like TanStack does for state). You own your orchestration. You’re building.
Headless wins because:
- Every codebase has different needs
- Orchestration patterns vary by domain
- You want control, not magic
- The orchestration IS your product. You can’t outsource it.
tx: The Control Plane
tx is the concrete implementation of this philosophy. Primitives for AI agents, not a framework. Headless infrastructure for memory, tasks, and orchestration.
Core primitives:
| Primitive | What it does |
|---|---|
| `tx ready` | Get next workable task |
| `tx claim` | Lease-based claim so parallel agents don’t collide |
| `tx done` | Complete a task |
| `tx block` | Declare dependencies between tasks |
| `tx handoff` | Transfer work with context |
| `tx context` | Retrieve relevant learnings/memory |
These are the minimal building blocks. You compose them into whatever orchestration your domain needs. No opinions on flow, no magic, just primitives you wire together.
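The semantics of `ready`/`claim`/`done`/`block` can be sketched as a small in-memory queue. This is a model of the behavior described above, not the real tx API: the class name, lease duration, and method signatures are assumptions for illustration.

```typescript
// In-memory model of the task primitives: ready, claim (lease), done, block.
// A sketch of the semantics, not the real tx implementation.
type TaskId = string;

interface Task {
  id: TaskId;
  blockedBy: Set<TaskId>; // "tx block": dependencies that must finish first
  leasedUntil?: number;   // "tx claim": lease expiry (ms since epoch)
  done: boolean;          // "tx done": completed
}

class TaskQueue {
  private tasks = new Map<TaskId, Task>();

  add(id: TaskId, blockedBy: TaskId[] = []): void {
    this.tasks.set(id, { id, blockedBy: new Set(blockedBy), done: false });
  }

  // "tx ready": next task with no unfinished dependencies and no live lease
  ready(now = Date.now()): Task | undefined {
    for (const t of this.tasks.values()) {
      if (t.done) continue;
      if (t.leasedUntil !== undefined && t.leasedUntil > now) continue;
      const blocked = [...t.blockedBy].some(d => !this.tasks.get(d)?.done);
      if (!blocked) return t;
    }
    return undefined;
  }

  // "tx claim": lease the task so parallel agents don't collide
  claim(id: TaskId, leaseMs = 60_000, now = Date.now()): boolean {
    const t = this.tasks.get(id);
    if (!t || t.done) return false;
    if (t.leasedUntil !== undefined && t.leasedUntil > now) return false;
    t.leasedUntil = now + leaseMs;
    return true;
  }

  // "tx done": complete the task, unblocking its dependents
  done(id: TaskId): void {
    const t = this.tasks.get(id);
    if (t) t.done = true;
  }
}
```

A worker loop is then just `ready → claim → do the work → done`, repeated; the lease is what makes running two or three workers in parallel safe.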
tx-agent-kit is the full-stack starter (Effect, Temporal, Next.js, Drizzle) that shows one way to compose these primitives into a working system.
Getting Started
Burn $100-200/month on a project just to get used to having a worker constantly running. This builds intuition for:
- How to structure specs for agents
- What tasks to automate vs. do manually
- How to review agent output efficiently
- When to intervene vs. let it run
This is the mainstream future. The skill is learning to direct agents, not compete with them.
It’s Prompts All the Way Down
Everything in agent-driven development reduces to the same primitive: a prompt that produces work. The entire stack is just prompts invoking prompts.
Vision (prompt)
→ PRDs (prompts that define what to build)
→ Design Docs (prompts that define how)
→ Tasks (prompts that agents execute)
→ Sub-tasks (prompts spawned by tasks)
→ Code, tests, docs (output)
There is no magic layer. A PRD is a prompt. A task is a prompt. A design doc is a prompt. An agent’s system message is a prompt. The only difference is scope and audience. Each layer is a prompt that generates the layer below it.
The Task Layer is Self-Recursive
The task layer is special because it can feed itself. A task is just a prompt, and a prompt can say “generate more prompts.”
meta-task (prompt) → 10-50 concrete tasks (prompts)
├── task A (prompt → code)
├── task B (prompt → code)
├── ...
└── task N (prompt → more prompts → more tasks)
This is not a hack. It’s the natural consequence of the stack being prompts all the way down. Any layer can generate any other layer.
Why This Works
When agents run on a cron or loop, task completion can outpace task creation. The queue drains and the system stalls.
Meta-tasks fix this. They are tasks whose only job is to generate more tasks. The queue feeds itself. “Grow outcomes from outcomes.”
Examples:
- “Read these 10 PRDs and create implementation tasks for each”
- “Audit the codebase for missing tests and create a task per gap”
- “Break this epic into 20 sub-tasks with acceptance criteria”
When to Use
- Cron-based agent loops where completion rate > creation rate
- Bootstrapping a new project (e.g. “generate tasks for all 10 PRDs”)
- Expanding scope without manual intervention (side projects, experiments)
Guardrails
| Risk | Mitigation |
|---|---|
| Runaway expansion | Cap meta-task depth or ratio (e.g. 1 in 10 generated tasks is a meta-task) |
| Quality decay | Tasks drift from intent the further they sit from the root; anchor meta-tasks to PRDs/specs |
| Cron bottleneck | If hourly cron still can’t keep up, increase frequency or batch size |
Best suited for side projects and experiments where full human-in-the-loop isn’t needed. For production systems, keep the human in the loop on task creation.
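The depth-cap guardrail can be sketched as a recursive expansion that refuses to expand meta-tasks past a fixed depth. The task shape and the `generate` callback (standing in for "an agent turns a meta-task into tasks") are illustrative assumptions.

```typescript
// Sketch of the runaway-expansion guardrail: meta-tasks may generate more
// tasks, but expansion stops past maxDepth. QueueTask and generate() are
// hypothetical stand-ins, not a real tx schema.
interface QueueTask {
  prompt: string;
  isMeta: boolean; // meta-tasks generate tasks instead of code
  depth: number;   // distance from the root meta-task
}

function expand(
  task: QueueTask,
  generate: (t: QueueTask) => QueueTask[], // agent call: meta-task -> tasks
  maxDepth = 2,
): QueueTask[] {
  if (!task.isMeta) return [task];          // concrete task: keep as-is
  if (task.depth >= maxDepth) return [];    // cap: stop runaway expansion
  return generate(task).flatMap(child => expand(child, generate, maxDepth));
}
```

The same structure accommodates the ratio guardrail: have `generate` emit at most one meta-task per batch of concrete tasks.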
The Meta
- It’s prompts all the way down. PRDs, design docs, tasks, agent configs. Same primitive, different scope.
- Control the task layer. That’s where leverage lives.
- The task layer is self-recursive. Tasks can create tasks. Prompts can create prompts.
- Structure specs in linked PRD + design docs (see index pattern)
- Workers grind the implementation
- Agent swarms clean up
- Humans set vision, review, and handle edge cases
You become the architect. Agents become the workforce. The task queue is the machine.