Traditional development measures progress by output (features shipped). The right measure is whether system error decreases over time.
Author: James Phoenix | Date: February 2026
Summary
In ML, you do not ask “is the model good?” You define a loss function, then optimise it down over time. Apply the same principle to software production with agent swarms. Define a composite scalar L_total from measurable terms (spec violations, test failures, architectural drift, type errors, regressions, operational gaps, unknowns). Track it per cycle. Stop when it is low, flat, and bounce-free. This turns “is the system done?” from a vibes question into a measurable one.
Why Software Needs a Scalar Loss Function
Agent swarms can produce enormous output. Output volume is not a signal of progress. A swarm that generates 10,000 lines of code per cycle might be making the system worse.
The problem:
"the code feels messy" → not quantified
"this is risky" → not quantified
"the architecture is drifting" → not quantified
"we should refactor" → not quantified
None of these are continuously optimised. Progress is measured indirectly via features shipped, not system health.
ML systems solved this decades ago with an explicit loss function. The system is judged by whether loss decreases over time. Software production needs the same thing.
The Formula
Define total system loss as a weighted sum of 7 measurable terms:
L_total = w₁·L_spec + w₂·L_tests + w₃·L_arch + w₄·L_types + w₅·L_reg + w₆·L_ops + w₇·L_unknown
Where each term is a non-negative scalar counting unresolved issues, and each weight wᵢ encodes how much you care about that category.
The system objective:
Minimise L_total over time
L_total(t+1) ≤ L_total(t)
If loss does not trend downward, the system is not improving, regardless of how many features are shipped.
The 7 Loss Terms
1. Spec Loss (L_spec)
Divergence between PRDs/design docs and the actual codebase.
| Signal | Example |
|---|---|
| Feature implemented differently from spec | Auth flow uses sessions when spec says JWT |
| Missing required behaviour | Rate limiting not implemented |
| Extra behaviour not in spec | Unauthorized admin endpoint added |
L_spec = Σᵢ w_specᵢ (weighted by severity)
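As a sketch, the severity-weighted sum reads: each open spec violation contributes a weight that scales with how severe it is. The severity tiers and values below are illustrative, not prescribed by the formula:

```typescript
// Illustrative severity tiers and weights; tune these per project.
type Severity = "low" | "medium" | "high";
const severityWeight: Record<Severity, number> = { low: 1, medium: 3, high: 9 };

// L_spec = Σᵢ w_specᵢ: sum of severity weights over open spec violations.
function specLoss(openViolations: Severity[]): number {
  return openViolations.reduce((sum, s) => sum + severityWeight[s], 0);
}
```

The same shape applies to the other six terms; only the issue source and the weights differ.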
2. Test Loss (L_tests)
Failing, flaky, or missing tests.
| Signal | Example |
|---|---|
| Failing tests | 3 unit tests broken after refactor |
| Flaky tests | Payment test passes 80% of runs |
| Missing critical path coverage | No test for session expiry |
L_tests = Σᵢ w_testᵢ
Analogous to training error in ML.
3. Architectural Loss (L_arch)
Structural decay that compounds future loss.
| Signal | Example |
|---|---|
| Layer violations | UI component imports from database layer |
| Circular dependencies | Module A depends on B depends on A |
| Broken domain boundaries | Payment logic leaks into auth module |
L_arch = Σᵢ w_archᵢ
This term is the most dangerous because it compounds. A layer violation today means 10 bugs tomorrow.
4. Type and Invariant Loss (L_types)
Violations of static types, runtime invariants, Effect error contracts, ESLint domain rules.
| Signal | Example |
|---|---|
| TypeScript errors | 12 type errors after dependency update |
| Effect channel violations | Unhandled error in Effect pipeline |
| Lint violations | 8 architectural lint rule breaches |
L_types = Σᵢ w_typeᵢ
This term collapses the error search space. It is the cheapest loss to reduce.
5. Regression Loss (L_reg)
Previously fixed issues that reappear. Regressions indicate unstable convergence.
| Signal | Example |
|---|---|
| Reopened issues | Bug #342 reintroduced by refactor |
| Reverted fixes | Rate limiter broken after auth changes |
| Test destabilisation | Stable test now flaky |
L_reg = Σᵢ w_regᵢ
A rising L_reg means the system is not learning. Weight this term heavily.
6. Operational Loss (L_ops)
Runtime health signals.
| Signal | Example |
|---|---|
| Performance regressions | p99 latency doubled |
| Memory leaks | Heap grows 5MB/hour |
| Observability gaps | No metrics on payment endpoint |
| Missing alerts | No alert for database connection failures |
L_ops = Σᵢ w_opsᵢ
7. Unknown Loss (L_unknown)
Uncertainty about system correctness. This is the term most people miss.
| Signal | Example |
|---|---|
| Code changed without tests | 200 lines in payment module, no test |
| High-risk module with low observability | Auth has no structured logging |
| New code paths without spec references | Endpoint added with no design doc |
L_unknown = Σᵢ w_unknownᵢ
You want L_unknown near zero before calling anything “done.” It is a penalty for “we do not know if this is safe.”
Weighting Strategies
Weights encode your priorities. There is no universal weighting, but a reasonable starting point:
| Term | Weight | Rationale |
|---|---|---|
| L_reg | 3.0 | Regressions destroy trust in convergence |
| L_spec | 2.0 | Spec violations mean the wrong thing was built |
| L_arch | 2.0 | Architectural decay compounds future loss |
| L_types | 1.5 | Types are cheap to fix, expensive to ignore |
| L_tests | 1.0 | Baseline correctness signal |
| L_unknown | 1.0 | Uncertainty penalty |
| L_ops | 0.5 | Important but less urgent pre-release |
Adjust per phase. During a polish phase, increase L_reg weight. During early growth, tolerate higher L_unknown temporarily.
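As a sketch, the starting point above can be encoded directly, with phase presets layered on top. The preset values are illustrative, not from the table; the object shape matches the `weights` argument that `computeLoss` takes later in this article:

```typescript
// Suggested starting weights from the table above.
const baseWeights = {
  reg: 3.0,     // regressions destroy trust in convergence
  spec: 2.0,    // spec violations mean the wrong thing was built
  arch: 2.0,    // architectural decay compounds future loss
  types: 1.5,   // cheap to fix, expensive to ignore
  tests: 1.0,   // baseline correctness signal
  unknown: 1.0, // uncertainty penalty
  ops: 0.5,     // important but less urgent pre-release
};

// Phase presets (illustrative values): polish emphasises regressions,
// early growth temporarily tolerates more uncertainty.
const polishWeights = { ...baseWeights, reg: 5.0 };
const growthWeights = { ...baseWeights, unknown: 0.5 };
```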
The Stop Condition
Do not stop when L_total is low once. Stop when it is low and stable.
Stop if ALL of:
1. L_total < T (below threshold)
2. slope(L_total, last K cycles) ≈ 0 (no meaningful improvement left)
3. regressions in last K cycles == 0 (no bounce)
4. L_unknown < U (uncertainty is low)
In words: low, flat, and no bounce.
This avoids the classic failure: loss dips, you stop, then it pops back up next cycle.
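The four criteria can be sketched as a single check. T, U, and K are the thresholds from the criteria above; the flatness tolerance (1% of T per cycle) and the endpoint-based slope estimate are illustrative simplifications:

```typescript
// Hypothetical stop-condition check over per-cycle loss history.
function shouldStop(
  lossHistory: number[],       // L_total per cycle
  regressionHistory: number[], // regression count per cycle
  unknownLoss: number,         // current L_unknown
  T: number, U: number, K: number,
): boolean {
  if (lossHistory.length < K) return false;
  const recent = lossHistory.slice(-K);
  const latest = recent[recent.length - 1];
  // 1. Low: below threshold
  if (latest >= T) return false;
  // 2. Flat: no meaningful improvement across the window
  const slope = (latest - recent[0]) / (K - 1);
  if (Math.abs(slope) > 0.01 * T) return false;
  // 3. No bounce: zero regressions in the window
  if (regressionHistory.slice(-K).some(r => r > 0)) return false;
  // 4. Uncertainty is low
  return unknownLoss < U;
}
```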
Stop Condition Visualized
Healthy (ready to stop):

```
L_total
│
│╲
│ ╲
│  ╲
│   ╲___________   ← flat, below threshold
│
└───────────────── cycles
```

Unhealthy (not ready):

```
L_total
│
│╲  ╱╲  ╱╲
│ ╲╱  ╲╱  ╲    ← sawtooth, regressions
│          ╲╱
└───────────────── cycles
```

Chaotic (no convergence):

```
L_total
│
│ ╱╲╱╲╱╲╱╲╱╲╱╲   ← noisy flat, no progress
│
└───────────────── cycles
```
Implementation: Per-Cycle Tracking
Store per cycle:
```typescript
interface CycleMetrics {
  cycle: number;
  timestamp: string;

  // Raw counts per loss term
  specViolations: number;
  failingTests: number;
  flakyTests: number;
  archViolations: number;
  typeErrors: number;
  lintViolations: number;
  regressions: number;
  opsIssues: number;
  unknownRiskAreas: number;

  // Computed
  L_total: number;

  // Deltas (what changed this cycle)
  issuesClosed: string[];
  issuesOpened: string[];
  regressionEvents: string[];

  // Churn metrics
  filesChanged: number;
  linesAdded: number;
  linesRemoved: number;
}
```
Compute L_total each cycle:
```typescript
// One weight per loss term (the wᵢ in the formula).
interface Weights {
  spec: number;
  tests: number;
  arch: number;
  types: number;
  reg: number;
  ops: number;
  unknown: number;
}

function computeLoss(m: CycleMetrics, weights: Weights): number {
  return (
    weights.spec * m.specViolations +
    weights.tests * (m.failingTests + m.flakyTests) +
    weights.arch * m.archViolations +
    weights.types * (m.typeErrors + m.lintViolations) +
    weights.reg * m.regressions +
    weights.ops * m.opsIssues +
    weights.unknown * m.unknownRiskAreas
  );
}
```
Tracking the Trend
```typescript
function isConverging(history: CycleMetrics[], windowSize: number): boolean {
  const recent = history.slice(-windowSize);
  const losses = recent.map(m => m.L_total);

  // Flat or decreasing: least-squares slope over the window is non-positive
  const slope = linearRegressionSlope(losses);
  // No bounce: zero regressions anywhere in the window
  const noRegressions = recent.every(m => m.regressions === 0);
  // Low: latest loss is below the stop threshold T
  const belowThreshold = losses[losses.length - 1] < THRESHOLD;

  return slope <= 0 && noRegressions && belowThreshold;
}
```
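The `linearRegressionSlope` helper referenced above is not defined in this article; a minimal least-squares version might look like:

```typescript
// Least-squares slope of y-values against their index (0, 1, 2, ...).
// A negative slope means loss is trending down across the window.
function linearRegressionSlope(ys: number[]): number {
  const n = ys.length;
  if (n < 2) return 0;
  const xMean = (n - 1) / 2;
  const yMean = ys.reduce((a, b) => a + b, 0) / n;
  let num = 0;
  let den = 0;
  for (let x = 0; x < n; x++) {
    num += (x - xMean) * (ys[x] - yMean);
    den += (x - xMean) * (x - xMean);
  }
  return num / den;
}
```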
Loss Curve Patterns
Smooth Decay (Healthy)
Loss decreases monotonically. Swarm is converging. Constraints are biting. Regressions are rare.
What this looks like in practice:
- Issue count drops each cycle
- Same bug class does not reappear
- Diff sizes shrink over time
- Agent output becomes more repetitive (running out of things to fix)
Sawtooth (Regressions)
Loss drops then spikes repeatedly. The swarm is fixing things and breaking things at the same rate.
Root causes:
- Insufficient regression tests
- Agents rewriting modules without understanding dependencies
- No accept/reject gate on changes
Fix: Increase regression penalty weight, add regression memory, tighten acceptance gate.
Noisy Flat (No Progress)
Loss oscillates around a fixed value. No real improvement despite high activity.
Root causes:
- Agents performing random walks (no gradient signal)
- Constraints too weak to guide behaviour
- Loss function missing a key term
- Goodharting on proxy metrics
Fix: Review whether all loss terms are actually being measured. Add the missing term. Tighten constraints.
Monotonic Rise (Diverging)
Loss increases each cycle. The swarm is making things actively worse.
Root causes:
- No acceptance gate (all changes applied)
- Agents introducing complexity faster than they resolve it
- Architectural violations compounding
Fix: Stop the swarm. Audit constraints. Add an explicit accept/reject gate.
Integration with Existing Quality Gates
This framework does not replace quality gates. It wraps them. Each quality gate contributes to reducing one or more loss terms:
| Quality Gate | Loss Term Reduced |
|---|---|
| Type checker | L_types |
| ESLint | L_types, L_arch |
| Unit tests | L_tests |
| Integration tests | L_tests, L_spec |
| E2E tests | L_spec |
| Security scan | L_ops |
| Architecture linter | L_arch |
| Spec review | L_spec |
| Coverage check | L_unknown |
The loss function aggregates what gates measure into a single trend line.
Key Insight
Agents do not need to be correct. They need to be biased toward negative delta-L. Individual agents can be wrong as long as the aggregate signal drives loss down.
This is why parallelism works. Each agent samples a noisy estimate of where loss can be reduced. The orchestrator applies only the updates that actually reduce loss. The swarm converges not because any single agent is smart, but because the accept/reject gate filters for improvement.
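A minimal sketch of that gate, under stated assumptions: the `Change` type and the `evaluate` callback are placeholders, and in practice `evaluate` would apply the candidate in a sandbox and recompute L_total from real metrics. Candidates are evaluated sequentially against the running baseline:

```typescript
// Accept/reject gate: keep a candidate change only if it reduces total
// loss relative to the current accepted baseline.
function acceptIfImproving<Change>(
  currentLoss: number,
  candidates: Change[],
  evaluate: (c: Change) => number, // L_total if the change were applied
): { accepted: Change[]; loss: number } {
  const accepted: Change[] = [];
  let loss = currentLoss;
  for (const c of candidates) {
    const newLoss = evaluate(c);
    if (newLoss < loss) {
      // Negative delta-L: keep the change, move the baseline down.
      accepted.push(c);
      loss = newLoss;
    }
    // Otherwise reject: a noisy agent proposal that did not help.
  }
  return { accepted, loss };
}
```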
Related
- Quality Gates as Information Filters – Gates that reduce state space (individual loss term reducers)
- Agent Swarm Patterns – How to run swarms (this article explains what to measure)
- Constraint-First Development – Constraints define the feasible region for loss minimization
- Swarm Convergence Theory – Why swarms converge or diverge
- Goodharting Prevention – When agents optimize the proxy instead of the real objective
- Growth vs Polish Phases – Adjusting loss weights per development phase
- Compounding Effects of Quality Gates – Why layered gates produce exponential quality improvement

