Online Learning via Constraints: The Worker-Observe-Constrain Loop

James Phoenix

You are not “coding with an LLM.” You are running a compute fabric for reasoning, then constraining it based on observed failures. That is online learning applied to software production.

Author: James Phoenix | Date: February 2026


Summary

The agentic development loop (worker churns, you observe failures, you add constraints) is formally online learning. The worker is a policy generating actions. You observe counterexamples. You add constraints that shrink the action space. Each constraint is proof you found an invariant. The sentence test: “In this codebase, X must never happen, because it causes Y.” If you cannot write that crisply, the constraint is premature. This reframing explains why the loop compounds, why it breaks, and what the human’s actual job is.


The Loop Formalized

┌──────────────────────────────────────────────┐
│                                              │
│   Worker generates changes (policy)          │
│              │                               │
│              ▼                               │
│   You observe failures (counterexamples)     │
│              │                               │
│              ▼                               │
│   You add constraints (shrink action space)  │
│              │                               │
│              ▼                               │
│   Worker operates in smaller space           │
│              │                               │
│              └───────────── loop ────────────┘
│
└──────────────────────────────────────────────┘

In ML terms:

| Software loop | ML equivalent |
|---|---|
| Worker generates code | Policy produces actions |
| You catch a problem | Observe negative reward signal |
| You add a constraint | Update the policy (shrink action space) |
| Worker produces better code | Policy improves in the constrained space |

This is not pair programming. It is not code review. It is stochastic optimization where the human provides the loss signal and the constraint updates.


The Unit of Work Shift

This loop changes what “engineering” means.

Before

Unit of progress:  "I type correct code"
Constraint:        Human working memory + keystrokes + time
Bottleneck:        Execution speed

After

Unit of progress:  "I specify, evaluate, and constrain systems"
Constraint:        Clarity of intent + quality of invariants + feedback loops
Bottleneck:        Thinking, not typing

You are no longer competing on speed of fingers. You are competing on:

  • Quality of specifications
  • Decomposition skill
  • Ability to detect and correct drift
  • Ability to define loss functions for systems

That shift feels like cheating because most engineers are still rewarded for the old unit. But it is engineering in the original sense: defining constraints that a system must satisfy.


Each Constraint Is Proof of an Invariant

When you add a constraint (type, primitive, lint rule, test), you are encoding a discovered invariant.

The sentence test: Every constraint should map to a sentence of the form:

“In this codebase, X must never happen, because it causes Y.”

| Constraint | Invariant sentence |
|---|---|
| Branded TenantId type | “A tenant ID must never be confused with a user ID, because it causes data leakage.” |
| tenantDb.query() primitive | “A database query must never skip tenant scoping, because it exposes other tenants’ data.” |
| no-floating-promises lint rule | “A promise must never be left unhandled, because it silently swallows errors.” |
| Idempotency test | “A webhook handler must never double-process, because it causes duplicate charges.” |
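The first two rows can be encoded directly in TypeScript. What follows is a minimal sketch, assuming an in-memory row store; `asTenantId`, `asUserId`, `Row`, and this `tenantDb` signature are hypothetical illustrations, not the author's implementation. The branded types turn the first invariant into a compile-time error, and the primitive makes the second one hard to violate because the tenant filter lives inside the only query path.

```typescript
// Branded types: both are strings at runtime, but the phantom __brand field
// makes the compiler treat them as distinct, mutually incompatible types.
type TenantId = string & { readonly __brand: "TenantId" };
type UserId = string & { readonly __brand: "UserId" };

// Hypothetical constructors: the only sanctioned way to mint each ID type.
// Real code would validate the raw string here.
const asTenantId = (raw: string): TenantId => raw as TenantId;
const asUserId = (raw: string): UserId => raw as UserId;

interface Row {
  tenantId: TenantId;
  id: string;
  body: string;
}

// Hypothetical tenant-scoped primitive over an in-memory table. The tenant is
// bound once and every query is filtered by it, so callers cannot skip scoping.
function tenantDb(tenantId: TenantId, rows: readonly Row[]) {
  return {
    query(predicate: (row: Row) => boolean): Row[] {
      return rows.filter((row) => row.tenantId === tenantId && predicate(row));
    },
  };
}

const rows: Row[] = [
  { tenantId: asTenantId("t1"), id: "a", body: "tenant one" },
  { tenantId: asTenantId("t2"), id: "b", body: "tenant two" },
];

const visible = tenantDb(asTenantId("t1"), rows).query(() => true);
console.log(visible.map((r) => r.id).join(",")); // prints "a"

const someUser: UserId = asUserId("u1");
// tenantDb(someUser, rows); // does not compile: UserId is not assignable to TenantId
```

The brand costs nothing at runtime; its only job is to make the confusion unrepresentable in code the compiler accepts.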

If you cannot write the sentence crisply, the constraint is premature. You have not yet identified the invariant. The problem might be a one-off mistake, not a pattern.
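The sentence test can be enforced at the moment a constraint is recorded. What follows is a minimal TypeScript sketch; the `ConstraintRecord` shape and the `invariantSentence` helper are hypothetical. No code can judge crispness, so it only checks that both clauses exist, but that is enough to refuse a constraint whose cause you cannot yet name.

```typescript
// Hypothetical record for a proposed constraint. Both clauses are required:
// what must never happen (X) and the harm it causes (Y).
interface ConstraintRecord {
  name: string;
  mustNever: string; // X: the forbidden behavior
  because: string; // Y: the concrete consequence
}

// Render the sentence test, or reject the constraint as premature
// when either clause is missing or blank.
function invariantSentence(c: ConstraintRecord): string {
  const x = c.mustNever.trim();
  const y = c.because.trim();
  if (x === "" || y === "") {
    throw new Error(`Constraint "${c.name}" is premature: no invariant identified yet`);
  }
  return `In this codebase, ${x} must never happen, because it causes ${y}.`;
}

console.log(
  invariantSentence({
    name: "Branded TenantId",
    mustNever: "confusing a tenant ID with a user ID",
    because: "data leakage",
  }),
);
// prints: In this codebase, confusing a tenant ID with a user ID must never happen, because it causes data leakage.
```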


Why This Is Stochastic Optimization

Each cycle of the loop is a noisy update step, loosely analogous to a stochastic gradient step:

1. Worker samples from the action space A(t)
2. Some actions produce bugs (positive loss)
3. You observe which actions caused positive loss
4. You remove those actions from the space: A(t+1) ⊂ A(t)
5. Worker samples from the smaller space A(t+1)

Over time:

|A(0)| > |A(1)| > |A(2)| > ... > |A(n)|

The action space shrinks monotonically. Provided each constraint removes only bad actions, the worker’s output quality increases, because the remaining set of possible outputs contains fewer bad options.

This is the same principle as quality gates reducing state space, but applied to the development process itself, not just the code.
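A toy simulation makes the recurrence concrete. What follows is a minimal sketch, assuming a finite action space where each action is labeled buggy or not, a perfect observer standing in for the human reviewer, and constraints modeled as predicates that forbid one action each; it illustrates the A(t+1) ⊂ A(t) step, not a real agent. In this model the size never grows, and it strictly drops only in cycles that sample at least one bug, since every buggy action still in A(t) is by construction not yet forbidden.

```typescript
// Toy model: each action is either fine or produces a bug when taken.
interface Action {
  id: number;
  buggy: boolean;
}

// A constraint forbids actions; modeled as a predicate (true = forbidden).
type Constraint = (a: Action) => boolean;

// Small deterministic PRNG (linear congruential) so runs are reproducible.
function lcg(seed: number): () => number {
  let s = seed >>> 0;
  return () => {
    s = (Math.imul(s, 1664525) + 1013904223) >>> 0;
    return s / 2 ** 32;
  };
}

// Runs the worker-observe-constrain loop and returns |A(0)|, |A(1)|, ..., |A(cycles)|.
function runLoop(initial: Action[], cycles: number, samplesPerCycle: number, seed = 42): number[] {
  const rand = lcg(seed);
  const constraints: Constraint[] = [];
  const allowed = (): Action[] => initial.filter((a) => !constraints.some((c) => c(a)));
  const sizes: number[] = [allowed().length];

  for (let t = 0; t < cycles; t++) {
    const space = allowed(); // A(t)
    if (space.length === 0) break;
    // 1. Worker samples actions from A(t).
    const samples = Array.from({ length: samplesPerCycle }, () => space[Math.floor(rand() * space.length)]);
    // 2-3. Observe which sampled actions produced bugs (counterexamples).
    const failures = samples.filter((a) => a.buggy);
    // 4. Forbid each observed counterexample, so A(t+1) is a subset of A(t).
    for (const f of failures) {
      constraints.push((a) => a.id === f.id);
    }
    sizes.push(allowed().length); // |A(t+1)|
  }
  return sizes;
}

// 100 actions, every fifth one buggy; bad options are pruned cycle by cycle.
const actions: Action[] = Array.from({ length: 100 }, (_, id) => ({ id, buggy: id % 5 === 0 }));
const sizes = runLoop(actions, 10, 20);
console.log(sizes.join(" >= "));
```

Because only observed bugs are removed, the 80 good actions always survive, so the share of bad options in the space can only fall.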


Constraint Explosion: When the Loop Breaks

The loop breaks when constraints grow faster than understanding.

Healthy signal:

New constraints per week:           2-3
Issue recurrence rate:              Decreasing
Worker productivity:                Stable or increasing
Time to understand new constraint:  < 5 minutes

Unhealthy signal:

New constraints per week:           10+
Issue recurrence rate:              Flat (not decreasing)
Worker productivity:                Declining (fighting constraints)
Time to understand new constraint:  > 15 minutes

If rule count goes up but your “caught something” rate stays flat, you are not converging. You are just moving the mess around.
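The two signal lists can be turned into a weekly check. What follows is a minimal TypeScript sketch; the `LoopMetrics` shape and `assessLoop` function are hypothetical, the thresholds are copied from the lists above, and two choices are assumptions: recurrence is judged by comparing only the first and last week, and any single unhealthy signal flags the whole loop, with values between the two bands reported as "watch".

```typescript
type Trend = "up" | "flat" | "down";

interface LoopMetrics {
  newConstraintsPerWeek: number;
  // Recurrence of already-seen issue classes, one value per week, oldest first.
  recurrenceByWeek: number[];
  productivityTrend: Trend;
  minutesToUnderstandNewConstraint: number;
}

type Verdict = "healthy" | "watch" | "unhealthy";

// Thresholds mirror the healthy and unhealthy signals listed above.
function assessLoop(m: LoopMetrics): { verdict: Verdict; warnings: string[] } {
  const weeks = m.recurrenceByWeek;
  // Simplification: compare the first and last week only.
  const recurrenceFalling = weeks.length >= 2 && weeks[weeks.length - 1] < weeks[0];

  const warnings: string[] = [];
  if (m.newConstraintsPerWeek >= 10) warnings.push("10+ new constraints per week");
  if (!recurrenceFalling) warnings.push("issue recurrence is not decreasing: not converging");
  if (m.productivityTrend === "down") warnings.push("worker productivity declining (fighting constraints)");
  if (m.minutesToUnderstandNewConstraint > 15) warnings.push("new constraints take >15 minutes to understand");

  const inHealthyBand =
    m.newConstraintsPerWeek <= 3 &&
    recurrenceFalling &&
    m.productivityTrend !== "down" && // stable or increasing
    m.minutesToUnderstandNewConstraint < 5;

  const verdict: Verdict = warnings.length > 0 ? "unhealthy" : inHealthyBand ? "healthy" : "watch";
  return { verdict, warnings };
}

// Usage: a loop that is converging.
const report = assessLoop({
  newConstraintsPerWeek: 3,
  recurrenceByWeek: [5, 3, 1],
  productivityTrend: "flat",
  minutesToUnderstandNewConstraint: 4,
});
console.log(report.verdict); // prints "healthy"
```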

Root Causes of Constraint Explosion

  1. Wrong layer: Adding lint rules for problems that need types or primitives. See the escalation ladder.
  2. Too specific: Each rule handles one instance instead of a class. Constraints should target bug classes, not individual bugs.
  3. No deletion: Old rules accumulate without review. Rules must pay rent.
  4. Agent routing around constraints: The worker learns to satisfy constraints mechanically without serving their intent. Constraints become a prison instead of a guide.

The Competence Test

If Claude disappeared tomorrow, would I understand the system well enough to fix it?

If yes, you are engineering. The worker is amplifying your capability.

If no, you are outsourcing thinking. The worker is building something you do not own.

This test distinguishes leverage from dependency:

| Leverage | Dependency |
|---|---|
| You define constraints, worker executes | Worker decides, you approve |
| You understand every invariant | Some invariants are mysterious |
| You can explain the architecture | “It works, I think” |
| Removing the worker slows you down | Removing the worker stops you |

The loop should make you more capable over time, not less. Each cycle teaches you an invariant about your system. If you are not learning from the failures you observe, you are not engineering.


Why This Makes Good Engineers Terrifying

This loop does not make bad engineers good. It amplifies existing capability.

The loop punishes:

  • Vague thinking (constraints require precision)
  • Bad abstractions (the worker amplifies them at scale)
  • Missing domain knowledge (you cannot constrain what you do not understand)
  • Ignoring feedback (the loop only works if you act on counterexamples)

The loop rewards:

  • Clear specification
  • Deep domain modeling
  • Invariant thinking
  • Willingness to tighten constraints permanently

An engineer who:

  • owns their primitives
  • defines constraints precisely
  • learns from every counterexample
  • tightens the action space each cycle

…produces output that looks “unfair” to someone operating in the old paradigm. It is not unfair. It is a different game on a different layer.


The Historical Pattern

Every real tooling step-change follows this pattern:

1. Early adopters feel guilty ("this feels like cheating")
2. Incumbents dismiss it ("not real engineering")
3. Tooling becomes table stakes
4. The bar for "competent" moves up
5. The gap widens permanently

Assembly to C. Manual memory to garbage collection. Hand-rolled servers to cloud. Imperative spaghetti to typed FP. Human coding to constraint-based agent development.

The discomfort is just the sound of the ladder being kicked away behind you.


Practical Implementation

Daily Loop

1. Worker churns through task queue
2. At end of cycle, review diffs and issues
3. For each problem found:
   a. Is this a one-off? → Fix locally
   b. Is this a recurring class? → Add constraint at the right layer
   c. Is this a new invariant? → Write the sentence, encode it
4. Update loss metrics
5. Repeat
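Step 3 is a small decision procedure. What follows is a minimal TypeScript sketch; `Finding`, `Decision`, `triage`, and `runCycle` are hypothetical names, the four layers are the ones used in the constraint log, and one branch is an interpretation: a recurring problem with no writable sentence yet is held for observation, following the rule that a constraint without a crisp invariant is premature.

```typescript
// The four layers that appear in the constraint log.
type Layer = "type" | "primitive" | "lint" | "test";

interface Finding {
  description: string;
  // Has this class of problem shown up before?
  recurring: boolean;
  // "In this codebase, X must never happen, because it causes Y.", if you can write it.
  invariant?: string;
  // Where the constraint would live if one is added.
  layer?: Layer;
}

type Decision =
  | { kind: "fix-locally"; description: string }
  | { kind: "encode-constraint"; invariant: string; layer: Layer }
  | { kind: "keep-observing"; description: string };

function triage(f: Finding): Decision {
  // 3b/3c: a class of problem with a crisp invariant gets encoded at its layer.
  if (f.invariant && f.layer) {
    return { kind: "encode-constraint", invariant: f.invariant, layer: f.layer };
  }
  // Recurring but no sentence yet: a constraint would be premature.
  if (f.recurring) {
    return { kind: "keep-observing", description: f.description };
  }
  // 3a: a one-off gets a local fix.
  return { kind: "fix-locally", description: f.description };
}

// Step 4: tally each cycle so the loss metrics can be tracked over time.
function runCycle(findings: Finding[]): Record<Decision["kind"], number> {
  const tally: Record<Decision["kind"], number> = { "fix-locally": 0, "encode-constraint": 0, "keep-observing": 0 };
  for (const f of findings) tally[triage(f).kind] += 1;
  return tally;
}

const tally = runCycle([
  { description: "typo in error message", recurring: false },
  {
    description: "webhook handler ran twice on a retried delivery",
    recurring: true,
    invariant: "In this codebase, double-processing a webhook must never happen, because it causes duplicate charges.",
    layer: "test",
  },
  { description: "flaky retry timing", recurring: true },
]);
console.log(JSON.stringify(tally));
```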

Constraint Log

Maintain a log of constraints added and their impact:

## Constraint Log

| Date | Constraint | Layer | Invariant | Recurrence After |
|------|-----------|-------|-----------|-----------------|
| 2026-01-15 | Branded TenantId | Type | "Never confuse tenant and user IDs" | 0 |
| 2026-01-18 | tenantDb primitive | Primitive | "Never query without tenant scope" | 0 |
| 2026-01-22 | no-floating-promises | Lint | "Never leave promises unhandled" | 1 (fixed) |
| 2026-01-25 | Idempotency property test | Test | "Never double-process webhooks" | 0 |

If “Recurrence After” is consistently zero, your constraints are effective. If it is nonzero, the constraint most likely sits at the wrong layer, or targets an individual bug rather than its class.
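Because the log is a plain pipe table, the recurrence check can be automated. What follows is a minimal TypeScript sketch, assuming the log is kept as a markdown string with exactly the columns shown, one row per line, and no pipes inside cells; `parseConstraintLog` and `needsReview` are hypothetical helpers.

```typescript
interface LogEntry {
  date: string;
  constraint: string;
  layer: string;
  invariant: string;
  recurrenceAfter: number;
}

// Parse the pipe table: keep lines starting with "|", drop the separator
// row and the header row, then split each row into trimmed cells.
function parseConstraintLog(markdown: string): LogEntry[] {
  return markdown
    .split("\n")
    .map((line) => line.trim())
    .filter((line) => line.startsWith("|") && !/^\|\s*-/.test(line)) // drop separator row
    .slice(1) // drop header row
    .map((line) => {
      const cells = line.slice(1, -1).split("|").map((cell) => cell.trim());
      return {
        date: cells[0],
        constraint: cells[1],
        layer: cells[2],
        invariant: cells[3],
        // parseInt stops at the first non-digit, so "1 (fixed)" becomes 1.
        recurrenceAfter: parseInt(cells[4], 10),
      };
    });
}

// Nonzero recurrence is the signal to revisit where the constraint lives.
function needsReview(entries: LogEntry[]): string[] {
  return entries.filter((e) => e.recurrenceAfter > 0).map((e) => `${e.constraint} (${e.layer})`);
}

const log = `
## Constraint Log

| Date | Constraint | Layer | Invariant | Recurrence After |
|------|-----------|-------|-----------|-----------------|
| 2026-01-15 | Branded TenantId | Type | "Never confuse tenant and user IDs" | 0 |
| 2026-01-18 | tenantDb primitive | Primitive | "Never query without tenant scope" | 0 |
| 2026-01-22 | no-floating-promises | Lint | "Never leave promises unhandled" | 1 (fixed) |
| 2026-01-25 | Idempotency property test | Test | "Never double-process webhooks" | 0 |
`;

console.log(needsReview(parseConstraintLog(log)).join(", ")); // prints "no-floating-promises (Lint)"
```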


Key Insight

You are not coding. You are running a stochastic optimizer and constraining it with discovered invariants. The output looks like code. The process is optimization. The skill is knowing which constraints to add, at which layer, and when to stop.


Topics: Action Space, Constraints, Counterexample, Invariant, Leverage, Meta Engineering, Online Learning, Policy, Stochastic Optimization, Unit Of Work
