## The Core Idea
Instead of treating observability as a passive monitoring layer, turn it into an active feedback controller for code quality.
```
Service exercised under load
             ↓
Telemetry captured
             ↓
Constraints evaluated
             ↓
Agent proposes refactor
             ↓
Tests + load rerun
             ↓
Cycle repeats until constraints met
```
This is control theory applied to software development.
## The Mental Model Shift
```
OLD: Write code → measure → debug → fix
     (reactive, human-driven)

NEW: Define constraints → system alters code until satisfied
     (proactive, agent-driven)
```
The system continuously measures real behavior, treats those measurements as hard constraints, and drives an automated agent pipeline that refactors iteratively until every constraint is satisfied.
## Key Components
### 1. Telemetry Capture Layer
Standardize the metrics that matter:
| Metric | What It Catches |
|---|---|
| Memory high-watermark | Peak memory usage |
| Retained heap growth | Memory leaks |
| Latency percentiles (p50, p90, p99) | Performance distribution |
| CPU saturation | Compute bottlenecks |
| Unique cardinality counters | Label-cardinality explosions (e.g., in Prometheus) |
| Request/throughput metrics | Capacity limits |
| Error budgets | Reliability thresholds |
Captured during:
- Unit tests
- Boundary tests
- Load tests (realistic traffic windows)
These become signals, not dashboards.
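
For concreteness, here is a minimal sketch of what such a capture step could look like: it drives a handler under synthetic load, records per-request latency and the traced-memory high-watermark, and emits JSON in the shape the later examples assume. The schema, the helper names, and the use of `tracemalloc` are illustrative assumptions, not a prescribed harness.

```python
# capture_metrics.py - minimal telemetry capture sketch (schema is illustrative)
import json
import time
import tracemalloc
from statistics import quantiles

def run_load_test(handler, requests=1000):
    """Drive handler under load; record per-request latency and peak memory."""
    tracemalloc.start()
    latencies_ms = []
    for i in range(requests):
        start = time.perf_counter()
        handler(i)                                 # the code path under test
        latencies_ms.append((time.perf_counter() - start) * 1000)
    _, peak_bytes = tracemalloc.get_traced_memory()
    tracemalloc.stop()

    cuts = quantiles(latencies_ms, n=100)          # 99 percentile cut points
    return {
        "memory": {"high_watermark_mb": peak_bytes / 1e6},
        "latency": {"p50_ms": cuts[49], "p90_ms": cuts[89], "p99_ms": cuts[98]},
    }

if __name__ == "__main__":
    metrics = run_load_test(lambda i: sum(range(1000)))
    print(json.dumps(metrics, indent=2))           # piped to metrics.json in CI
```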
### 2. Constraint Specification
Define explicit limits as mathematical invariants:
```yaml
# performance-constraints.yaml
constraints:
  memory:
    max_mb: 300
    tolerance_percent: 5
    sustained_load_minutes: 15

  heap:
    max_retained_growth_slope: 0    # No positive slope after 20 min
    observation_window_minutes: 20

  latency:
    p99_max_ms: 120
    p90_max_ms: 80
    p50_max_ms: 40

  cardinality:
    max_unique_labels: 10000
    growth_rate: bounded            # No unbounded growth

  errors:
    budget_percent: 0.1             # 99.9% success rate
```
These constraints are treated like type signatures for runtime behavior.
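
To make "type signatures for runtime behavior" concrete, a checker can load the YAML, compare each limit against the captured metrics, and exit nonzero on any violation, which is all a CI gate needs. A minimal sketch of one plausible shape for the `check-constraints.sh` step referenced below (file layout and key names are assumptions matching the examples in this note):

```python
# check_constraints.py - sketch of a constraint gate (names are illustrative)
import json
import sys
import yaml  # PyYAML

def violations(metrics, constraints):
    """Return a list of human-readable constraint violations."""
    found = []
    for key, limit in constraints.get("latency", {}).items():
        pct = key.replace("_max_ms", "")           # "p99_max_ms" -> "p99"
        actual = metrics["latency"][f"{pct}_ms"]
        if actual > limit:
            found.append(f"latency.{pct}: {actual:.1f}ms > {limit}ms")
    mem = constraints.get("memory", {})
    if "max_mb" in mem:
        actual = metrics["memory"]["high_watermark_mb"]
        if actual > mem["max_mb"]:
            found.append(f"memory: {actual:.1f}MB > {mem['max_mb']}MB")
    return found

if __name__ == "__main__":
    metrics = json.load(open(sys.argv[1]))                    # metrics.json
    spec = yaml.safe_load(open(sys.argv[2]))["constraints"]   # constraints.yaml
    errs = violations(metrics, spec)
    for e in errs:
        print(f"VIOLATION: {e}")
    sys.exit(1 if errs else 0)  # nonzero exit fails the CI gate
```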
### 3. Agent-Driven Refactoring Loop
```
[Generate Diagnostics]
          ↓
[Infer Root Causes]
          ↓
[Agent Proposes Refactor]
          ↓
[Apply Patch]
          ↓
[Run Tests + Load]
          ↓
[Score vs Constraints]
          ↓
     ┌────┴────┐
     │         │
   FAIL       PASS
     │         │
     ↓         ↓
 Loop back   Accept + Commit
```
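
The orchestration around these stages stays small. Here is a sketch of the outer loop, assuming hypothetical `run_load_test`, `violations`, and `propose_and_apply_fix` helpers (the last one standing in for an agent invocation), with a hard iteration cap so the controller cannot spin forever:

```python
# optimization_loop.py - sketch of the outer control loop (helpers are hypothetical)
MAX_ITERATIONS = 5

def optimization_loop(constraints):
    """Run the measure -> score -> refactor cycle until constraints pass."""
    for _ in range(MAX_ITERATIONS):
        metrics = run_load_test()                 # exercise the service
        errs = violations(metrics, constraints)   # score against constraints
        if not errs:
            return True                           # PASS: accept + commit
        # FAIL: hand the violations to the agent and let it patch the code,
        # e.g. by shelling out to `claude --agent optimizer ...`
        propose_and_apply_fix(errs, metrics)
    return False                                  # budget exhausted: escalate to a human
```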
### 4. CI/CD Integration
The loop becomes part of the dev workflow:
```yaml
# .github/workflows/performance-optimization.yml
name: Closed-Loop Optimization

on:
  push:
    branches: [main]
  schedule:
    - cron: '0 2 * * *'   # Nightly regression sweeps

jobs:
  optimize:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4   # check out the repo so the scripts exist
      - name: Run load tests
        run: ./scripts/load-test.sh
      - name: Capture telemetry
        run: ./scripts/capture-metrics.sh > metrics.json
      - name: Evaluate constraints
        run: ./scripts/check-constraints.sh metrics.json constraints.yaml
      - name: Agent optimization loop
        if: failure()
        run: |
          claude --agent optimizer \
            --constraints constraints.yaml \
            --metrics metrics.json \
            --max-iterations 5
```
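
One detail worth noting: `if: failure()` makes the agent step run only when an earlier step in the job has failed, so the optimization loop engages exactly when the constraint check rejects the build and stays out of the way otherwise.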
## Implementation: The Optimization Agent
`.claude/agents/performance-optimizer.md`:

```markdown
---
name: performance-optimizer
description: Optimizes code until performance constraints are met
tools: Read, Edit, Bash, Grep, Glob
model: sonnet
---

You are a performance optimization agent. You receive:

1. A set of performance constraints (memory, latency, etc.)
2. Current metrics from a load test
3. The constraint violations

Your job:

1. Analyze the metrics to identify root causes
2. Propose targeted code changes to fix violations
3. Apply the changes
4. Re-run tests to verify improvement

Process:

1. Read the constraint violations
2. Identify the hotspots (use profiler output, traces)
3. Propose minimal changes that address the root cause
4. Apply changes
5. Run: `./scripts/load-test.sh && ./scripts/check-constraints.sh`
6. If still failing, iterate with a different approach
7. If passing, report success

Rules:

- Minimal changes only (don't refactor unrelated code)
- Each iteration must improve at least one metric
- After 3 failed iterations, escalate to a human
- Document what you tried and why
```
## Example: Memory Leak Detection & Fix
Constraint violated:

```
heap.retained_growth_slope > 0 after 20 minutes
Current: +18MB over 20 minutes
```
Agent diagnosis:

```
Analyzing heap snapshots...
Found: List accumulation in `EventProcessor.process()` at line 142
Pattern: Events appended but never cleared
Root cause: Missing cleanup after batch processing
```
Agent fix:

```python
# Before
class EventProcessor:
    def __init__(self):
        self.events = []

    def process(self, event):
        self.events.append(event)
        if len(self.events) >= 100:
            self._flush()

# After
class EventProcessor:
    def __init__(self):
        self.events = []

    def process(self, event):
        self.events.append(event)
        if len(self.events) >= 100:
            self._flush()
            self.events.clear()  # ← Fix: clear after flush
```
Re-run verification:

```
heap.retained_growth_slope = 0.0 ✓
```

Constraint satisfied.
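
How might `retained_growth_slope` be computed in the first place? One plausible approach, illustrative rather than taken from any particular harness, is to sample retained memory at fixed intervals under sustained load and fit an ordinary least-squares line; a slope at or near zero means no leak:

```python
# growth_slope.py - estimate retained-memory growth via least squares (sketch)
import time
import tracemalloc

def retained_growth_slope(run_step, minutes=20, samples=40):
    """Sample traced memory while exercising run_step; return slope in MB/minute."""
    tracemalloc.start()
    interval = (minutes * 60) / samples
    xs, ys = [], []
    for i in range(samples):
        run_step()                                # one unit of sustained load
        current, _ = tracemalloc.get_traced_memory()
        xs.append(i * interval / 60)              # minutes elapsed
        ys.append(current / 1e6)                  # MB currently retained
        time.sleep(interval)
    tracemalloc.stop()
    # Ordinary least-squares slope: cov(x, y) / var(x)
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var = sum((x - mx) ** 2 for x in xs)
    return cov / var  # > 0 suggests a leak

# A slope of ~0.9 MB/min sustained over 20 minutes would reproduce the
# +18MB finding in the diagnosis above.
```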
## Why This Is Novel
You’re merging:
| Domain | Contribution |
|---|---|
| Control theory | Closed-loop feedback systems |
| Observability | OTEL, Prometheus, profilers |
| Automated code generation | Claude agents |
| Constraint-solving | Telemetry as inequality constraints |
The insight: agents are widely used to write and run tests, but telemetry is rarely used as a control input in a feedback loop that automatically optimizes a running service.
## The Control Theory View
```
┌──────────────┐     ┌──────────────┐     ┌──────────────┐
│ Constraints  │────▶│  Controller  │────▶│    Plant     │
│ (Setpoints)  │     │   (Agent)    │     │  (Service)   │
└──────────────┘     └──────▲───────┘     └──────┬───────┘
                            │                    │
                     ┌──────┴───────┐            │
                     │    Sensor    │◀───────────┘
                     │ (Telemetry)  │
                     └──────────────┘
```
- Setpoint: Performance constraints
- Plant: The service under optimization
- Sensor: OTEL, Prometheus, profilers
- Controller: The optimization agent
- Error signal: Constraint violations
## Benefits
- Eliminates most performance regressions – caught automatically before review
- Catches leaks, pathological complexity, and throughput cliffs – before they reach production
- Slashes human debugging cost – the agent does the investigation
- Produces higher-quality code over time – optimization runs continuously
- Bridges intent and reality – constraints express what you want; the system delivers it
## The Future
This will be standard in 3-5 years:
```
Today:    CI runs tests → human debugs failures
Tomorrow: CI runs tests → agent fixes failures → human approves
Future:   CI runs tests → agent fixes → auto-merges if constraints met
```
You’re building the future of engineering now.
## Related
- Building the Harness – Layer 4: Telemetry-driven optimization
- Constraint-First Development – Defining constraints
- Agent Capabilities – Telemetry as eyes
- The Verification Ladder – Quality gates

