Synthetic Loss Functions for Agent Swarms: Treating Software Production as Optimization


Traditional development measures progress by output (features shipped). The right measure is whether system error decreases over time.

Author: James Phoenix | Date: February 2026


Summary

In ML, you do not ask “is the model good?” You define a loss function, then optimise it down over time. Apply the same principle to software production with agent swarms. Define a composite scalar L_total from measurable terms (spec violations, test failures, architectural drift, type errors, regressions, operational gaps, unknowns). Track it per cycle. Stop when it is low, flat, and bounce-free. This turns “is the system done?” from a vibes question into a measurable one.


Why Software Needs a Scalar Loss Function

Agent swarms can produce enormous output. Output volume is not a signal of progress. A swarm that generates 10,000 lines of code per cycle might be making the system worse.

The problem:

"the code feels messy"not quantified
"this is risky"not quantified
"the architecture is drifting"not quantified
"we should refactor"not quantified

None of these are continuously optimised. Progress is measured indirectly via features shipped, not system health.

ML systems solved this decades ago with an explicit loss function. The system is judged by whether loss decreases over time. Software production needs the same thing.


The Formula

Define total system loss as a weighted sum of 7 measurable terms:

L_total = w₁·L_spec + w₂·L_tests + w₃·L_arch + w₄·L_types + w₅·L_reg + w₆·L_ops + w₇·L_unknown

Where each term is a non-negative scalar counting unresolved issues, and each weight wᵢ encodes how much you care about that category.

The system objective:

Minimise L_total over time

L_total(t+1) ≤ L_total(t)

If loss does not trend downward, the system is not improving, regardless of how many features are shipped.


The 7 Loss Terms

1. Spec Loss (L_spec)

Divergence between PRDs/design docs and the actual codebase.

| Signal | Example |
| --- | --- |
| Feature implemented differently from spec | Auth flow uses sessions when spec says JWT |
| Missing required behaviour | Rate limiting not implemented |
| Extra behaviour not in spec | Unauthorized admin endpoint added |
L_spec = Σᵢ w_specᵢ    (weighted by severity)

2. Test Loss (L_tests)

Failing, flaky, or missing tests.

| Signal | Example |
| --- | --- |
| Failing tests | 3 unit tests broken after refactor |
| Flaky tests | Payment test passes 80% of runs |
| Missing critical path coverage | No test for session expiry |
L_tests = Σᵢ w_testᵢ

Analogous to training error in ML.

3. Architectural Loss (L_arch)

Structural decay that compounds future loss.

| Signal | Example |
| --- | --- |
| Layer violations | UI component imports from database layer |
| Circular dependencies | Module A depends on B depends on A |
| Broken domain boundaries | Payment logic leaks into auth module |
L_arch = Σᵢ w_archᵢ

This term is the most dangerous because it compounds. A layer violation today means 10 bugs tomorrow.

4. Type and Invariant Loss (L_types)

Violations of static types, runtime invariants, Effect error contracts, ESLint domain rules.

| Signal | Example |
| --- | --- |
| TypeScript errors | 12 type errors after dependency update |
| Effect channel violations | Unhandled error in Effect pipeline |
| Lint violations | 8 architectural lint rule breaches |
L_types = Σᵢ w_typeᵢ

This term collapses the error search space. It is the cheapest loss to reduce.


5. Regression Loss (L_reg)

Previously fixed issues that reappear. Regressions indicate unstable convergence.

| Signal | Example |
| --- | --- |
| Reopened issues | Bug #342 reintroduced by refactor |
| Reverted fixes | Rate limiter broken after auth changes |
| Test destabilisation | Stable test now flaky |
L_reg = Σᵢ w_regᵢ

A rising L_reg means the system is not learning. Weight this term heavily.

6. Operational Loss (L_ops)

Runtime health signals.

| Signal | Example |
| --- | --- |
| Performance regressions | p99 latency doubled |
| Memory leaks | Heap grows 5MB/hour |
| Observability gaps | No metrics on payment endpoint |
| Missing alerts | No alert for database connection failures |
L_ops = Σᵢ w_opsᵢ

7. Unknown Loss (L_unknown)

Uncertainty about system correctness. This is the term most people miss.

| Signal | Example |
| --- | --- |
| Code changed without tests | 200 lines in payment module, no test |
| High-risk module with low observability | Auth has no structured logging |
| New code paths without spec references | Endpoint added with no design doc |
L_unknown = Σᵢ w_unknownᵢ

You want L_unknown near zero before calling anything “done.” It is a penalty for “we do not know if this is safe.”


Weighting Strategies

Weights encode your priorities. There is no universal weighting, but a reasonable starting point:

| Term | Weight | Rationale |
| --- | --- | --- |
| L_reg | 3.0 | Regressions destroy trust in convergence |
| L_spec | 2.0 | Spec violations mean the wrong thing was built |
| L_arch | 2.0 | Architectural decay compounds future loss |
| L_types | 1.5 | Types are cheap to fix, expensive to ignore |
| L_tests | 1.0 | Baseline correctness signal |
| L_unknown | 1.0 | Uncertainty penalty |
| L_ops | 0.5 | Important but less urgent pre-release |

Adjust per phase. During a polish phase, increase L_reg weight. During early growth, tolerate higher L_unknown temporarily.
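As a worked example, the starting weights above applied to one cycle's unresolved-issue counts (the counts are invented for illustration):

```typescript
// Starting weights from the table above.
const weights = { spec: 2.0, tests: 1.0, arch: 2.0, types: 1.5, reg: 3.0, ops: 0.5, unknown: 1.0 };

// Hypothetical unresolved-issue counts for one cycle.
const counts = { spec: 2, tests: 4, arch: 1, types: 6, reg: 1, ops: 3, unknown: 2 };

// L_total = Σ wᵢ·Lᵢ over the seven terms.
const lTotal =
  weights.spec * counts.spec +
  weights.tests * counts.tests +
  weights.arch * counts.arch +
  weights.types * counts.types +
  weights.reg * counts.reg +
  weights.ops * counts.ops +
  weights.unknown * counts.unknown;

console.log(lTotal); // 4 + 4 + 2 + 9 + 3 + 1.5 + 2 = 25.5
```

Note how one regression (weight 3.0) costs as much as three failing tests; that asymmetry is the point of the weighting.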


The Stop Condition

Do not stop when L_total is low once. Stop when it is low and stable.

Stop if ALL of:
  1. L_total < T                          (below threshold)
  2. slope(L_total, last K cycles) ≈ 0    (no meaningful improvement left)
  3. regressions in last K cycles == 0    (no bounce)
  4. L_unknown < U                        (uncertainty is low)

In words: low, flat, and no bounce.

This avoids the classic failure: loss dips, you stop, then it pops back up next cycle.
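The four conditions translate directly into code. A minimal sketch, with illustrative (not prescriptive) values for the threshold T, uncertainty bound U, window K, and flatness tolerance:

```typescript
interface CyclePoint {
  lTotal: number;
  lUnknown: number;
  regressions: number;
}

// Stop only when loss is low, flat, bounce-free, and uncertainty is low.
// T, U, K, and the 0.5 flatness tolerance are example values.
function shouldStop(history: CyclePoint[], T = 10, U = 2, K = 5): boolean {
  if (history.length < K) return false; // not enough evidence yet
  const recent = history.slice(-K);
  const last = recent[recent.length - 1];

  const belowThreshold = last.lTotal < T;                        // 1. low
  const avgDelta = (last.lTotal - recent[0].lTotal) / (K - 1);   // crude per-cycle slope
  const flat = Math.abs(avgDelta) < 0.5;                         // 2. no improvement left
  const noBounce = recent.every(p => p.regressions === 0);       // 3. no bounce
  const lowUncertainty = last.lUnknown < U;                      // 4. uncertainty is low

  return belowThreshold && flat && noBounce && lowUncertainty;
}
```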

Stop Condition Visualized

Healthy (ready to stop):

L_total
  │╲
  │ ╲
  │  ╲___________    flat, below threshold
  │
  └───────────────── cycles


Unhealthy (not ready):

L_total
  │╲  ╱╲  ╱╲
  │ ╲╱  ╲╱  ╲  ╱╲    sawtooth, regressions
  │          ╲╱
  └───────────────── cycles


Chaotic (no convergence):

L_total
  │
  │ ╱╲╱╲╱╲╱╲╱╲╱╲    noisy flat, no progress
  │
  └───────────────── cycles

Implementation: Per-Cycle Tracking

Store per cycle:

interface CycleMetrics {
  cycle: number;
  timestamp: string;

  // Raw counts per loss term
  specViolations: number;
  failingTests: number;
  flakyTests: number;
  archViolations: number;
  typeErrors: number;
  lintViolations: number;
  regressions: number;
  opsIssues: number;
  unknownRiskAreas: number;

  // Computed
  L_total: number;

  // Deltas (what changed this cycle)
  issuesClosed: string[];
  issuesOpened: string[];
  regressionEvents: string[];

  // Churn metrics
  filesChanged: number;
  linesAdded: number;
  linesRemoved: number;
}

Compute L_total each cycle:

interface Weights {
  spec: number;
  tests: number;
  arch: number;
  types: number;
  reg: number;
  ops: number;
  unknown: number;
}

function computeLoss(m: CycleMetrics, weights: Weights): number {
  return (
    weights.spec * m.specViolations +
    weights.tests * (m.failingTests + m.flakyTests) +
    weights.arch * m.archViolations +
    weights.types * (m.typeErrors + m.lintViolations) +
    weights.reg * m.regressions +
    weights.ops * m.opsIssues +
    weights.unknown * m.unknownRiskAreas
  );
}

Tracking the Trend

// THRESHOLD and linearRegressionSlope are assumed to be defined elsewhere.
function isConverging(history: CycleMetrics[], windowSize: number): boolean {
  const recent = history.slice(-windowSize);
  const losses = recent.map(m => m.L_total);

  // Downward (or flat) trend, no regression events, and currently below threshold.
  const slope = linearRegressionSlope(losses);
  const noRegressions = recent.every(m => m.regressions === 0);
  const belowThreshold = losses[losses.length - 1] < THRESHOLD;

  return slope <= 0 && noRegressions && belowThreshold;
}
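The slope helper is not spelled out above; a minimal ordinary-least-squares version could look like this:

```typescript
// Ordinary least-squares slope of y values against their indices 0..n-1.
// A negative slope means loss is trending down across the window.
function linearRegressionSlope(ys: number[]): number {
  const n = ys.length;
  if (n < 2) return 0; // a single point has no trend
  const meanX = (n - 1) / 2;
  const meanY = ys.reduce((a, b) => a + b, 0) / n;
  let num = 0;
  let den = 0;
  for (let i = 0; i < n; i++) {
    num += (i - meanX) * (ys[i] - meanY);
    den += (i - meanX) ** 2;
  }
  return num / den;
}
```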

Loss Curve Patterns

Smooth Decay (Healthy)

Loss decreases monotonically. Swarm is converging. Constraints are biting. Regressions are rare.

What this looks like in practice:

  • Issue count drops each cycle
  • Same bug class does not reappear
  • Diff sizes shrink over time
  • Agent output becomes more repetitive (running out of things to fix)

Sawtooth (Regressions)

Loss drops then spikes repeatedly. The swarm is fixing things and breaking things at the same rate.

Root causes:

  • Insufficient regression tests
  • Agents rewriting modules without understanding dependencies
  • No accept/reject gate on changes

Fix: Increase regression penalty weight, add regression memory, tighten acceptance gate.
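One way to implement regression memory: record a fingerprint for every issue ever closed, then classify any new failure that matches a past fingerprint as a regression (L_reg) rather than a fresh bug. The fingerprint scheme here is illustrative:

```typescript
// Tracks previously fixed issues so their reappearance is counted
// as a regression (L_reg) instead of a new failure.
class RegressionMemory {
  private fixed = new Set<string>();

  // Fingerprint is illustrative: a stable hash of test name plus
  // failure signature works better than a raw string in practice.
  recordFix(fingerprint: string): void {
    this.fixed.add(fingerprint);
  }

  isRegression(fingerprint: string): boolean {
    return this.fixed.has(fingerprint);
  }
}
```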

Noisy Flat (No Progress)

Loss oscillates around a fixed value. No real improvement despite high activity.

Root causes:

  • Agents performing random walks (no gradient signal)
  • Constraints too weak to guide behaviour
  • Loss function missing a key term
  • Goodharting on proxy metrics

Fix: Review whether all loss terms are actually being measured. Add the missing term. Tighten constraints.

Monotonic Rise (Diverging)

Loss increases each cycle. The swarm is making things actively worse.

Root causes:

  • No acceptance gate (all changes applied)
  • Agents introducing complexity faster than they resolve it
  • Architectural violations compounding

Fix: Stop the swarm. Audit constraints. Add an explicit accept/reject gate.


Integration with Existing Quality Gates

This framework does not replace quality gates. It wraps them. Each quality gate contributes to reducing one or more loss terms:

| Quality Gate | Loss Term Reduced |
| --- | --- |
| Type checker | L_types |
| ESLint | L_types, L_arch |
| Unit tests | L_tests |
| Integration tests | L_tests, L_spec |
| E2E tests | L_spec |
| Security scan | L_ops |
| Architecture linter | L_arch |
| Spec review | L_spec |
| Coverage check | L_unknown |

The loss function aggregates what gates measure into a single trend line.
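In code, that aggregation starts with a mapping from each gate to the loss terms its findings feed. The gate names mirror the table above; the mapping itself is illustrative:

```typescript
// Maps each quality gate to the loss terms its findings feed.
// Gate names mirror the table above; adapt to your own pipeline.
const gateToLossTerms: Record<string, string[]> = {
  typeChecker: ["L_types"],
  eslint: ["L_types", "L_arch"],
  unitTests: ["L_tests"],
  integrationTests: ["L_tests", "L_spec"],
  e2eTests: ["L_spec"],
  securityScan: ["L_ops"],
  architectureLinter: ["L_arch"],
  specReview: ["L_spec"],
  coverageCheck: ["L_unknown"],
};
```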


Key Insight

Agents do not need to be correct. They need to be biased toward negative delta-L. Individual agents can be wrong as long as the aggregate signal drives loss down.

This is why parallelism works. Each agent samples a noisy estimate of where loss can be reduced. The orchestrator applies only the updates that actually reduce loss. The swarm converges not because any single agent is smart, but because the accept/reject gate filters for improvement.
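The accept/reject gate can be sketched directly: apply a candidate change only if it strictly reduces total loss. The `evaluateLoss` callback is an assumed hook into your own metrics pipeline, not a real API:

```typescript
// Apply a candidate change only if it strictly reduces total loss.
// evaluateLoss is an assumed hook that computes L_total for a state.
function acceptIfImproves<S>(
  state: S,
  candidate: (s: S) => S,
  evaluateLoss: (s: S) => number,
): S {
  const next = candidate(state);
  // Individual agents may be wrong; the gate filters for delta-L < 0.
  return evaluateLoss(next) < evaluateLoss(state) ? next : state;
}
```

Running every agent proposal through a gate like this is what turns noisy parallel output into a downward loss trend.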

