You are not “coding with an LLM.” You are running a compute fabric for reasoning, then constraining it based on observed failures. That is online learning applied to software production.
Author: James Phoenix | Date: February 2026
Summary
The agentic development loop (worker churns, you observe failures, you add constraints) is formally online learning. The worker is a policy generating actions. You observe counterexamples. You add constraints that shrink the action space. Each constraint is proof you found an invariant. The sentence test: “In this codebase, X must never happen, because it causes Y.” If you cannot write that crisply, the constraint is premature. This reframing explains why the loop compounds, why it breaks, and what the human’s actual job is.
The Loop Formalized
```
┌────────────────────────────────────────────┐
│                                            │
│  Worker generates changes (policy)         │
│               │                            │
│               ▼                            │
│  You observe failures (counterexamples)    │
│               │                            │
│               ▼                            │
│  You add constraints (shrink action space) │
│               │                            │
│               ▼                            │
│  Worker operates in smaller space          │
│               │                            │
│  └──────────── loop ────────────┘          │
│                                            │
└────────────────────────────────────────────┘
```
In ML terms:
| Software Loop | ML Equivalent |
|---|---|
| Worker generates code | Policy produces actions |
| You catch a problem | Observe negative reward signal |
| You add a constraint | Update the policy (shrink action space) |
| Worker produces better code | Policy improves in the constrained space |
This is not pair programming. It is not code review. It is stochastic optimization where the human provides the loss signal and the constraint updates.
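The mapping can be sketched in a few lines of TypeScript. This is a toy model, not a real agent API: `Action`, `Constraint`, and the sample action space are all illustrative names invented for this sketch.

```typescript
// Toy model of the loop: a "policy" samples from an action space,
// and each constraint permanently removes a class of actions.
type Action = string;
type Constraint = (a: Action) => boolean; // true = action is still allowed

// The full action space A(0), before any constraints exist.
const actionSpace: Action[] = [
  "raw-sql-query",
  "tenant-scoped-query",
  "unhandled-promise",
  "awaited-promise",
];

// Applying the current constraint set yields the shrunken space A(t).
function constrain(space: Action[], constraints: Constraint[]): Action[] {
  return space.filter((a) => constraints.every((c) => c(a)));
}

// Two loop cycles: each observed failure becomes a constraint.
const constraints: Constraint[] = [];
constraints.push((a) => a !== "raw-sql-query");     // counterexample: tenant leak
constraints.push((a) => a !== "unhandled-promise"); // counterexample: swallowed error

const smaller = constrain(actionSpace, constraints);
console.log(smaller); // the worker now samples from a strictly smaller space
```

The point of the sketch is the shape: the human never writes the actions, only the predicates that rule classes of actions out.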
The Unit of Work Shift
This loop changes what “engineering” means.
Before
- Unit of progress: "I type correct code"
- Constraint: human working memory + keystrokes + time
- Bottleneck: execution speed

After
- Unit of progress: "I specify, evaluate, and constrain systems"
- Constraint: clarity of intent + quality of invariants + feedback loops
- Bottleneck: thinking, not typing
You are no longer competing on speed of fingers. You are competing on:
- Quality of specifications
- Decomposition skill
- Ability to detect and correct drift
- Ability to define loss functions for systems
That shift feels like cheating because most engineers are still rewarded for the old unit. But it is engineering in the original sense: defining constraints that a system must satisfy.
Each Constraint Is Proof of an Invariant
When you add a constraint (type, primitive, lint rule, test), you are encoding a discovered invariant.
The sentence test: Every constraint should map to a sentence of the form:
“In this codebase, X must never happen, because it causes Y.”
| Constraint | Invariant Sentence |
|---|---|
| Branded TenantId type | “A tenant ID must never be confused with a user ID, because it causes data leakage.” |
| `tenantDb.query()` primitive | “A database query must never skip tenant scoping, because it exposes other tenants’ data.” |
| `no-floating-promises` lint rule | “A promise must never be left unhandled, because it silently swallows errors.” |
| Idempotency test | “A webhook handler must never double-process, because it causes duplicate charges.” |
If you cannot write the sentence crisply, the constraint is premature. You have not yet identified the invariant. The problem might be a one-off mistake, not a pattern.
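The first two rows of the table can be encoded directly. The branding pattern below is standard TypeScript; the `tenantQuery` function is a hypothetical stand-in for the `tenantDb.query()` primitive, and all identifiers are illustrative.

```typescript
// Branded IDs: both are strings at runtime, but the phantom brand makes
// them incompatible at the type level, so mixing them is a compile error.
type TenantId = string & { readonly __brand: "TenantId" };
type UserId = string & { readonly __brand: "UserId" };

const asTenantId = (s: string) => s as TenantId;
const asUserId = (s: string) => s as UserId;

// Hypothetical primitive: every query *must* receive a TenantId, so a
// tenant-unscoped query is unrepresentable, not merely discouraged.
function tenantQuery(tenant: TenantId, sql: string): string {
  return `${sql} WHERE tenant_id = '${tenant}'`;
}

const t = asTenantId("t_123");
const u = asUserId("u_456");

tenantQuery(t, "SELECT * FROM invoices");    // OK
// tenantQuery(u, "SELECT * FROM invoices"); // compile error: UserId is not TenantId
void u;
```

Note that the invariant is enforced before the code ever runs: the counterexample class (“query scoped by the wrong ID”) no longer type-checks.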
Why This Is Stochastic Optimization
Each cycle of the loop is a noisy gradient step:
1. Worker samples from the action space A(t)
2. Some actions produce bugs (positive loss)
3. You observe which actions caused positive loss
4. You remove those actions from the space: A(t+1) ⊂ A(t)
5. Worker samples from the smaller space A(t+1)
Over time:
|A(0)| > |A(1)| > |A(2)| > ... > |A(n)|
The action space shrinks monotonically. The worker’s output quality increases because the set of possible outputs has fewer bad options.
This is the same principle as quality gates reducing state space, but applied to the development process itself, not just the code.
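A toy simulation of the shrinkage, under the simplifying assumption that each cycle observes and removes one buggy action class (names like `"a2-buggy"` are placeholders, not real actions):

```typescript
// Model the action space as a set; each cycle deletes the classes of
// actions that produced bugs, so |A(t+1)| <= |A(t)| by construction.
const space = new Set(["a1", "a2-buggy", "a3", "a4-buggy", "a5"]);
const sizes: number[] = [space.size];

for (let t = 0; t < 3; t++) {
  // "Observe" counterexamples: here, any action flagged as buggy.
  const counterexamples = [...space].filter((a) => a.endsWith("-buggy"));
  // "Add a constraint": remove one observed bug class per cycle.
  for (const a of counterexamples.slice(0, 1)) space.delete(a);
  sizes.push(space.size);
}

console.log(sizes); // non-increasing: the space never grows back
```

Once the counterexamples run out, the size plateaus — which is exactly the convergence signal the next section is about.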
Constraint Explosion: When the Loop Breaks
The loop breaks when constraints grow faster than understanding.
Healthy signal:
- New constraints per week: 2–3
- Issue recurrence rate: decreasing
- Worker productivity: stable or increasing
- Time to understand a new constraint: under 5 minutes

Unhealthy signal:
- New constraints per week: 10+
- Issue recurrence rate: flat (not decreasing)
- Worker productivity: declining (fighting constraints)
- Time to understand a new constraint: over 15 minutes
If rule count goes up but your “caught something” rate stays flat, you are not converging. You are just moving the mess around.
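One way to operationalize the “converging vs. moving the mess around” check. The `WeekStats` shape, the field names, and the thresholds (3 rules/week, last 3 weeks) are all illustrative assumptions, not a prescribed metric:

```typescript
interface WeekStats {
  newConstraints: number; // rules added this week (hypothetical metric)
  recurrences: number;    // previously-seen bug classes that came back
}

// Converging: rule growth is modest AND recurrences trend downward.
// Not converging: rules pile up while recurrences stay flat.
function isConverging(weeks: WeekStats[]): boolean {
  const recent = weeks.slice(-3);
  const avgNew =
    recent.reduce((sum, w) => sum + w.newConstraints, 0) / recent.length;
  const first = recent[0].recurrences;
  const last = recent[recent.length - 1].recurrences;
  return avgNew <= 3 && last < first;
}

const healthy: WeekStats[] = [
  { newConstraints: 2, recurrences: 5 },
  { newConstraints: 3, recurrences: 3 },
  { newConstraints: 2, recurrences: 1 },
];
const stuck: WeekStats[] = [
  { newConstraints: 12, recurrences: 4 },
  { newConstraints: 10, recurrences: 4 },
  { newConstraints: 11, recurrences: 4 },
];

console.log(isConverging(healthy)); // true
console.log(isConverging(stuck));   // false
```

The exact thresholds matter less than tracking both series: rule count alone tells you nothing about convergence.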
Root Causes of Constraint Explosion
- Wrong layer: Adding lint rules for problems that need types or primitives. See the escalation ladder.
- Too specific: Each rule handles one instance instead of a class. Constraints should target bug classes, not individual bugs.
- No deletion: Old rules accumulate without review. Rules must pay rent.
- Agent routing around: The worker learns to satisfy constraints mechanically without understanding the intent. Constraints become a prison instead of a guide.
The Competence Test
If Claude disappeared tomorrow, would I understand the system well enough to fix it?
If yes, you are engineering. The worker is amplifying your capability.
If no, you are outsourcing thinking. The worker is building something you do not own.
This test distinguishes leverage from dependency:
| Leverage | Dependency |
|---|---|
| You define constraints, worker executes | Worker decides, you approve |
| You understand every invariant | Some invariants are mysterious |
| You can explain the architecture | “It works, I think” |
| Removing the worker slows you down | Removing the worker stops you |
The loop should make you more capable over time, not less. Each cycle teaches you an invariant about your system. If you are not learning from the failures you observe, you are not engineering.
Why This Makes Good Engineers Terrifying
This loop does not make bad engineers good. It amplifies existing capability.
The loop punishes:
- Vague thinking (constraints require precision)
- Bad abstractions (the worker amplifies them at scale)
- Missing domain knowledge (you cannot constrain what you do not understand)
- Ignoring feedback (the loop only works if you act on counterexamples)
The loop rewards:
- Clear specification
- Deep domain modeling
- Invariant thinking
- Willingness to tighten constraints permanently
An engineer who:
- owns their primitives
- defines constraints precisely
- learns from every counterexample
- tightens the action space each cycle
…produces output that looks “unfair” to someone operating in the old paradigm. It is not unfair. It is a different game on a different layer.
The Historical Pattern
Every real tooling step-change follows this pattern:
1. Early adopters feel guilty ("this feels like cheating")
2. Incumbents dismiss it ("not real engineering")
3. Tooling becomes table stakes
4. The bar for "competent" moves up
5. The gap widens permanently
Assembly to C. Manual memory to garbage collection. Hand-rolled servers to cloud. Imperative spaghetti to typed FP. Human coding to constraint-based agent development.
The discomfort is just the sound of the ladder being kicked away behind you.
Practical Implementation
Daily Loop
1. Worker churns through task queue
2. At end of cycle, review diffs and issues
3. For each problem found:
a. Is this a one-off? → Fix locally
b. Is this a recurring class? → Add constraint at the right layer
c. Is this a new invariant? → Write the sentence, encode it
4. Update loss metrics
5. Repeat
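The triage in step 3 can be written as a small decision function. The `Problem` shape and the recurrence threshold are illustrative, not a fixed rule:

```typescript
type Triage = "fix-locally" | "add-constraint" | "encode-invariant";

interface Problem {
  occurrences: number;        // how many times this bug class has appeared
  invariantSentence?: string; // "X must never happen, because Y" — only if crisp
}

// If you can write the sentence, you found an invariant: encode it.
// If it recurs but the sentence is not yet crisp, constrain at the right layer.
// Otherwise it is a one-off: fix it locally and move on.
function triage(p: Problem): Triage {
  if (p.invariantSentence) return "encode-invariant";
  if (p.occurrences > 1) return "add-constraint";
  return "fix-locally";
}

console.log(triage({ occurrences: 1 })); // "fix-locally"
console.log(triage({ occurrences: 4 })); // "add-constraint"
console.log(
  triage({
    occurrences: 4,
    invariantSentence:
      "A webhook handler must never double-process, because it causes duplicate charges.",
  })
); // "encode-invariant"
```

The ordering encodes the sentence test: a crisp invariant always beats a generic constraint, and a missing sentence caps how far you escalate.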
Constraint Log
Maintain a log of constraints added and their impact:
| Date | Constraint | Layer | Invariant | Recurrence After |
|------|-----------|-------|-----------|-----------------|
| 2026-01-15 | Branded TenantId | Type | "Never confuse tenant and user IDs" | 0 |
| 2026-01-18 | tenantDb primitive | Primitive | "Never query without tenant scope" | 0 |
| 2026-01-22 | no-floating-promises | Lint | "Never leave promises unhandled" | 1 (fixed) |
| 2026-01-25 | Idempotency property test | Test | "Never double-process webhooks" | 0 |
If “Recurrence After” is consistently zero, your constraints are effective. If a constraint’s recurrence stays nonzero, it sits at the wrong layer and should be escalated.
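That check can be run mechanically over log entries. The `LogEntry` shape below mirrors the table columns; parsing the log file itself is assumed, not implemented:

```typescript
interface LogEntry {
  date: string;
  constraint: string;
  layer: "Type" | "Primitive" | "Lint" | "Test";
  recurrenceAfter: number; // how often the bug class recurred after the constraint
}

// Entries whose bug class recurred suggest the constraint sits at the
// wrong layer and should be escalated (e.g. Lint -> Type or Primitive).
function needsEscalation(log: LogEntry[]): LogEntry[] {
  return log.filter((e) => e.recurrenceAfter > 0);
}

const log: LogEntry[] = [
  { date: "2026-01-15", constraint: "Branded TenantId", layer: "Type", recurrenceAfter: 0 },
  { date: "2026-01-22", constraint: "no-floating-promises", layer: "Lint", recurrenceAfter: 1 },
];

console.log(needsEscalation(log).map((e) => e.constraint)); // ["no-floating-promises"]
```

Running this weekly turns the log from a diary into an input for the next constraint decision.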
Key Insight
You are not coding. You are running a stochastic optimizer and constraining it with discovered invariants. The output looks like code. The process is optimization. The skill is knowing which constraints to add, at which layer, and when to stop.
Related
- Constraint Escalation Ladder – Choosing the right layer for each constraint
- Constraint-First Development – The philosophical foundation
- Synthetic Loss Functions – What the optimizer is minimizing
- Quality Gates as Information Filters – Gates as constraint mechanisms
- Swarm Convergence Theory – Why this loop converges (when it does)
- The Meta-Engineer Identity – Building systems that build systems
- Learning Loops – Encoding discoveries into prevention
- Skill Atrophy – What to keep sharp in the new paradigm
- Highest Leverage Points – Plans and validation as the human’s real job

