Error Registry for Agents: Own the Primitives

James Phoenix

Summary

Agents repeat errors they have no memory of. ERRORS.md files help, but they are flat, unstructured, and require manual curation. The next step is a proper error registry: a structured, queryable store of every error an agent has encountered, with fingerprinting, deduplication, resolution history, and prevention rules. Think Sentry, but built for agents, where you own every primitive. When agents can read and write to a shared error registry before and after every task, recurring failures drop to near zero.

The Problem

There are three levels of agent error memory today, and most teams are stuck at level one.

Level 0: No memory. The agent hits the same error every session. You fix it manually each time. This is the default state for every LLM interaction.

Level 1: Flat files. You maintain an ERRORS.md that documents common mistakes. This works, but it has structural problems:

  • No fingerprinting. Two instances of “missing await” in different files are stored as separate entries or, worse, only one gets recorded.
  • No queryability. You grep for keywords. The agent cannot ask “have I seen an error like this before?” and get a structured answer.
  • No resolution graph. You may know the current fix, but not which fixes were tried and failed, or which other errors a fix introduced.
  • No automatic ingestion. Every entry requires a human to write it. Errors that happen at 2 AM go unrecorded.
  • Context window cost. Including the full ERRORS.md in every prompt wastes tokens on irrelevant entries.

Level 2: External tools. You send errors to Sentry or Datadog. These tools are designed for human developers scrolling dashboards. They have no API surface optimized for agent consumption. The error data exists, but agents cannot act on it.

The gap between Level 1 and what agents actually need is where the error registry sits.

The Solution

Build a structured error registry that agents can read from, write to, and query before every task. Own every primitive:

┌──────────────────────────────────────────────────────────┐
│                      ERROR REGISTRY                      │
├──────────────────────────────────────────────────────────┤
│ Fingerprinting   → Deduplicate errors by structure       │
│ Classification   → Categorize by root cause type         │
│ Resolution Log   → Track what fixes work (and don't)     │
│ Prevention Rules → Machine-readable prevention policies  │
│ Agent API        → Query interface for agent consumption │
│ Auto-Ingestion   → Capture errors without human effort   │
└──────────────────────────────────────────────────────────┘

The registry is not a dashboard. It is an agent-facing knowledge store where every error becomes a permanent lesson.

Why Own the Primitives

Sentry is excellent for human operators. But agents need something different:

Capability           | Sentry (Human-First)      | Error Registry (Agent-First)
---------------------|---------------------------|--------------------------------------
Error format         | Stack traces, breadcrumbs | Structured JSON with code patterns
Resolution tracking  | “Resolve” button          | Resolution log with what was tried
Prevention           | Alert thresholds          | Machine-readable prevention rules
Query interface      | Web dashboard             | Programmatic API or local file query
Context delivery     | Full error page           | Minimal, relevant context snippet
Ingestion            | SDK auto-capture          | Agent tool calls + CI pipeline hooks
Deduplication        | Stack trace hashing       | AST-level structural fingerprinting

When you own the primitives, you control:

  1. What gets stored. Not just stack traces, but the code pattern that caused the error, the bad output the agent generated, and the correct output.
  2. How it is queried. Agents get structured responses, not HTML pages. Token-efficient, relevant context.
  3. How prevention works. Prevention rules are code, not alerts. They feed directly into CLAUDE.md, hooks, or CI checks.
  4. How resolution evolves. A fix that worked once is a candidate. A fix that worked ten times is a rule. A fix that was tried and failed is marked as such.

Architecture

Core Schema

Every error entry has a consistent structure:

interface ErrorEntry {
  // Identity
  id: string;
  fingerprint: string;           // Structural hash for deduplication
  title: string;                 // Human-readable summary

  // Classification
  category: "context" | "model" | "rules" | "testing" | "quality-gate";
  severity: "critical" | "high" | "medium" | "low";
  tags: string[];                // e.g., ["async", "database", "zod"]

  // Evidence
  symptom: string;               // What the developer or agent observes
  badPattern: CodePattern;       // The code that causes the error
  correctPattern: CodePattern;   // The code that fixes it
  rootCause: string;             // Why this happens
  relatedFiles: string[];        // Where this has occurred

  // Tracking
  occurrences: Occurrence[];     // Every time this error appeared
  firstSeen: Date;
  lastSeen: Date;
  frequency: number;             // Total count

  // Resolution
  resolutions: Resolution[];     // Fixes tried, with success/failure
  currentFix: Resolution | null; // The active fix
  preventionRules: PreventionRule[];

  // Status
  status: "active" | "mitigated" | "prevented" | "archived";
}

interface CodePattern {
  language: string;
  code: string;
  description: string;
}

interface Occurrence {
  timestamp: Date;
  file: string;
  context: string;               // What task was being performed
  agentSession: string;          // Which session produced this
}

interface Resolution {
  id: string;
  description: string;
  appliedAt: Date;
  success: boolean;
  sideEffects: string[];         // Other errors this fix introduced
}

interface PreventionRule {
  type: "lint-rule" | "hook" | "test" | "claude-md" | "ci-check";
  description: string;
  implementation: string;        // The actual rule or config
  addedAt: Date;
  effectiveness: number;         // 0-1, tracked over time
}
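
As a concrete illustration, here is a hypothetical, fully populated entry under this schema (all values are invented; dates are shown as ISO strings for readability, while the interface uses Date objects):

```typescript
// Hypothetical entry: an agent that repeatedly omits await on async calls.
const missingAwaitEntry = {
  id: "err-0042",
  fingerprint: "3f9a1c7b2d8e4a51",
  title: "Missing await on async database call",
  category: "testing",
  severity: "high",
  tags: ["async", "database"],
  symptom: "Tests pass locally but data is missing in assertions",
  badPattern: {
    language: "typescript",
    code: "const user = db.users.findById(id);",
    description: "Promise assigned without await",
  },
  correctPattern: {
    language: "typescript",
    code: "const user = await db.users.findById(id);",
    description: "Await the query before using the result",
  },
  rootCause: "The call looks synchronous, so the agent omits await",
  relatedFiles: ["src/api/users.ts"],
  occurrences: [
    {
      timestamp: "2025-01-10T02:14:00Z",
      file: "src/api/users.ts",
      context: "Implementing user lookup endpoint",
      agentSession: "session-118",
    },
  ],
  firstSeen: "2025-01-10T02:14:00Z",
  lastSeen: "2025-01-14T09:30:00Z",
  frequency: 3,
  resolutions: [
    {
      id: "res-1",
      description: "Added await to the query call",
      appliedAt: "2025-01-10T02:20:00Z",
      success: true,
      sideEffects: [],
    },
  ],
  currentFix: {
    id: "res-1",
    description: "Added await to the query call",
    appliedAt: "2025-01-10T02:20:00Z",
    success: true,
    sideEffects: [],
  },
  preventionRules: [],
  status: "active",
};
```

With no prevention rules yet, the entry stays "active" even though a working fix is on record.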

Fingerprinting: The Key Primitive

The most important primitive is fingerprinting. Two errors are the same error if their structural pattern matches, even if they occur in different files, at different times, by different agents.

function fingerprintError(error: RawError): string {
  // Level 1: Exact match on error message template.
  // Collapse paths before numbers so a path like "users2.ts" does not
  // split into "<PATH><NUM>.ts".
  const messageTemplate = error.message
    .replace(/['"][^'"]*['"]/g, "'<STRING>'")  // Normalize strings
    .replace(/\/[\w./]+/g, "<PATH>")           // Normalize paths
    .replace(/\d+/g, "<NUM>");                 // Normalize numbers

  // Level 2: Code pattern hash
  const patternHash = hashCodePattern(error.codeSnippet);

  // Level 3: Error category + affected construct
  const structural = `${error.errorType}:${error.affectedConstruct}`;

  return hash(`${messageTemplate}|${patternHash}|${structural}`);
}

// Example: These two errors produce the same fingerprint
// Error A: "Cannot read property 'email' of null" in src/api/users.ts
// Error B: "Cannot read property 'name' of null" in src/api/posts.ts
// Both are: null-property-access on a database query result

Fingerprinting turns a stream of individual incidents into a registry of error classes. Each class accumulates evidence, resolutions, and prevention rules over time.
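
A minimal runnable sketch of the message-template level, using Node's built-in crypto (the `RawErrorLike` shape and field names are illustrative, not a fixed API):

```typescript
import { createHash } from "node:crypto";

// Illustrative input shape: just the fields the template level needs.
interface RawErrorLike {
  message: string;
  errorType: string;
  affectedConstruct: string;
}

function normalizeMessage(message: string): string {
  return message
    .replace(/['"][^'"]*['"]/g, "<STRING>") // collapse string literals
    .replace(/\/[\w./-]+/g, "<PATH>")       // collapse file paths
    .replace(/\d+/g, "<NUM>");              // collapse numbers
}

function fingerprint(error: RawErrorLike): string {
  const template = normalizeMessage(error.message);
  const structural = `${error.errorType}:${error.affectedConstruct}`;
  // Short, stable hash of the normalized template plus structural info.
  return createHash("sha256")
    .update(`${template}|${structural}`)
    .digest("hex")
    .slice(0, 16);
}
```

Under this sketch, "Cannot read property 'email' of null" and "Cannot read property 'name' of null" normalize to the same template and, given the same error type and construct, the same fingerprint.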

Storage Options

The registry can live at different levels of sophistication:

Option A: Structured JSON file (simplest)

project/
└── .errors/
    ├── registry.json       # All error entries
    ├── index.json          # Fingerprint → entry ID lookup
    └── rules/
        ├── lint-rules.json # Generated lint rules
        └── hooks.json      # Generated hook configs

Good for single-developer projects. Version-controlled, human-readable, no infrastructure.
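
The core of Option A is keeping registry.json and index.json in sync. A sketch of the upsert logic, shown in memory (in practice both objects would be read from and written back to `.errors/` with fs; the entry shape is trimmed for brevity):

```typescript
// registry.json holds entries; index.json maps fingerprint → entry id.
interface RegistryFile {
  entries: Record<string, { id: string; fingerprint: string; frequency: number }>;
  index: Record<string, string>;
}

function upsert(reg: RegistryFile, fingerprint: string, id: string): void {
  const existingId = reg.index[fingerprint];
  if (existingId) {
    // Known error class: bump the count on the existing entry.
    reg.entries[existingId].frequency += 1;
  } else {
    // New error class: create the entry and keep the lookup in sync.
    reg.entries[id] = { id, fingerprint, frequency: 1 };
    reg.index[fingerprint] = id;
  }
}
```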

Option B: SQLite database

CREATE TABLE errors (
  id TEXT PRIMARY KEY,
  fingerprint TEXT UNIQUE,
  title TEXT,
  category TEXT,
  severity TEXT,
  bad_pattern TEXT,
  correct_pattern TEXT,
  root_cause TEXT,
  frequency INTEGER DEFAULT 1,
  first_seen TEXT,
  last_seen TEXT,
  status TEXT DEFAULT 'active'
);

CREATE TABLE occurrences (
  id TEXT PRIMARY KEY,
  error_id TEXT REFERENCES errors(id),
  timestamp TEXT,
  file TEXT,
  context TEXT,
  agent_session TEXT
);

CREATE TABLE resolutions (
  id TEXT PRIMARY KEY,
  error_id TEXT REFERENCES errors(id),
  description TEXT,
  applied_at TEXT,
  success BOOLEAN,
  side_effects TEXT
);

CREATE INDEX idx_fingerprint ON errors(fingerprint);
CREATE INDEX idx_category ON errors(category);
CREATE INDEX idx_status ON errors(status);

Good for teams. Queryable, concurrent-safe, still local.
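
Because fingerprint is UNIQUE, ingestion collapses into a single upsert. A sketch (SQLite 3.24+ supports ON CONFLICT ... DO UPDATE; the frequency column would need to exist as in the schema above):

```sql
INSERT INTO errors (id, fingerprint, title, category, severity, first_seen, last_seen)
VALUES ('err-0042', '3f9a1c7b2d8e4a51', 'Missing await on async database call',
        'testing', 'high', datetime('now'), datetime('now'))
ON CONFLICT(fingerprint) DO UPDATE SET
  frequency = frequency + 1,
  last_seen = excluded.last_seen;
```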

Option C: API service

For organizations running multiple agents across multiple repos. A central registry that all agents query. This is the “build your own Sentry” path, but the API is designed for agents, not humans.

Agent Query Interface

The critical design choice: how do agents consume the registry? The answer is a tool or MCP server that returns token-efficient, relevant results.

// Agent tool: query_error_registry
interface QueryErrorRegistry {
  // Search by similarity to current error
  query?: string;

  // Filter by classification
  category?: ErrorCategory;
  tags?: string[];
  severity?: Severity;

  // Filter by status
  status?: "active" | "mitigated" | "prevented";

  // Limit results for context efficiency
  limit?: number;
}

// Example: Agent encounters a new error
const results = await queryErrorRegistry({
  query: "Cannot read property of null after database query",
  category: "testing",
  limit: 3,
});

// Returns:
// [
//   {
//     title: "Missing null checks after database queries",
//     frequency: 15,
//     status: "mitigated",
//     correctPattern: "if (!result) { return error('Not found'); }",
//     preventionRules: [
//       { type: "lint-rule", rule: "no-unchecked-db-result" }
//     ]
//   }
// ]

The agent now knows: this error has happened 15 times before, there is a known fix, and there is a lint rule that should catch it. It can apply the fix immediately and verify the lint rule is active.

Implementation: From ERRORS.md to Registry

Step 1: Auto-Ingest from Agent Sessions

Instead of manually documenting errors, capture them automatically:

// Hook into agent error events
async function onAgentError(error: AgentError): Promise<void> {
  const fingerprint = fingerprintError(error);
  const existing = await registry.getByFingerprint(fingerprint);

  if (existing) {
    // Known error: increment occurrence
    await registry.addOccurrence(existing.id, {
      timestamp: new Date(),
      file: error.file,
      context: error.taskDescription,
      agentSession: error.sessionId,
    });

    // Update last seen and frequency
    await registry.update(existing.id, {
      lastSeen: new Date(),
      frequency: existing.frequency + 1,
    });
  } else {
    // New error: create entry
    await registry.create({
      fingerprint,
      title: classifyErrorTitle(error),
      category: diagnoseRootCause(error),
      severity: assessSeverity(error),
      symptom: error.message,
      badPattern: extractCodePattern(error),
      firstSeen: new Date(),
      lastSeen: new Date(),
      frequency: 1,
      status: "active",
    });
  }
}

Step 2: Pre-Task Context Injection

Before starting any task, the agent queries the registry for relevant errors:

async function buildTaskContext(task: TaskDescription): Promise<string> {
  // Query registry for errors related to this task
  const relevantErrors = await registry.query({
    tags: extractTags(task),
    status: "active",
    limit: 5,
  });

  if (relevantErrors.length === 0) return "";

  // Format as concise context block
  return `
## Known Error Patterns (from error registry)

${relevantErrors.map(e => `
### ${e.title} (${e.frequency} occurrences)
- Symptom: ${e.symptom}
- Fix: ${e.correctPattern.description}
- Prevention: ${e.preventionRules.map(r => r.description).join(", ")}
`).join("\n")}

Avoid these patterns when implementing this task.
`;
}

This replaces the “include all of ERRORS.md” approach with targeted, relevant context. Token cost drops from thousands of tokens to a few hundred.

Step 3: Post-Resolution Learning

When an agent fixes an error, record the resolution:

async function onErrorResolved(
  errorId: string,
  resolution: ResolutionAttempt
): Promise<void> {
  await registry.addResolution(errorId, {
    description: resolution.description,
    appliedAt: new Date(),
    success: resolution.testsPass,
    sideEffects: resolution.newErrors,
  });

  // If this is the third successful resolution with the same pattern,
  // promote to prevention rule
  const entry = await registry.get(errorId);
  const successfulFixes = entry.resolutions.filter(r => r.success);

  if (successfulFixes.length >= 3 && !entry.preventionRules.length) {
    await suggestPreventionRule(entry);
  }
}

Step 4: Automatic Prevention Promotion

When an error reaches a frequency threshold, automatically generate prevention:

async function suggestPreventionRule(entry: ErrorEntry): Promise<void> {
  // Generate lint rule from bad pattern
  if (entry.badPattern.language === "typescript") {
    const lintRule = await generateLintRule(entry.badPattern, entry.correctPattern);

    await registry.addPreventionRule(entry.id, {
      type: "lint-rule",
      description: `Prevent: ${entry.title}`,
      implementation: lintRule,
      addedAt: new Date(),
      effectiveness: 0, // Track over time
    });
  }

  // Add to CLAUDE.md
  const claudeRule = formatForClaudeMd(entry);
  await registry.addPreventionRule(entry.id, {
    type: "claude-md",
    description: `CLAUDE.md rule: ${entry.title}`,
    implementation: claudeRule,
    addedAt: new Date(),
    effectiveness: 0,
  });

  // Update status
  await registry.update(entry.id, { status: "mitigated" });
}

The Compound Effect

The registry creates a flywheel:

Agent encounters error
    → Registry captures it (auto-ingest)
    → Agent queries registry next time (pre-task context)
    → Error is avoided (known pattern)
    → If error recurs, resolution is tracked (post-resolution learning)
    → At threshold, prevention rule is generated (automatic promotion)
    → Error class is eliminated (status: prevented)

Each loop makes every future agent session better. After weeks of operation:

Week 1:  Agent encounters 20 errors, registry has 20 entries
Week 2:  Agent encounters 15 errors, 5 were prevented by registry
Week 4:  Agent encounters 8 errors, 12 prevented, 3 auto-promoted to lint rules
Week 8:  Agent encounters 3 errors, most are genuinely novel
Week 12: Registry has 80+ entries, 60% have prevention rules, new error rate is minimal

This is the same compound curve as the ERRORS.md pattern, but automated. No human has to remember to document errors. No human has to include the right section in the prompt. The agent does it all.

MCP Server Implementation

The cleanest way to expose the registry to agents is as an MCP server:

// error-registry MCP server
const tools = [
  {
    name: "query_errors",
    description: "Search the error registry for known error patterns",
    parameters: {
      query: { type: "string", description: "Error description or symptom" },
      tags: { type: "array", items: { type: "string" } },
      limit: { type: "number", default: 5 },
    },
  },
  {
    name: "report_error",
    description: "Report a new error or occurrence to the registry",
    parameters: {
      symptom: { type: "string" },
      badCode: { type: "string" },
      file: { type: "string" },
      context: { type: "string" },
    },
  },
  {
    name: "report_resolution",
    description: "Record that an error was fixed",
    parameters: {
      errorId: { type: "string" },
      fixDescription: { type: "string" },
      fixCode: { type: "string" },
      testsPass: { type: "boolean" },
    },
  },
  {
    name: "get_prevention_rules",
    description: "Get active prevention rules for a set of tags",
    parameters: {
      tags: { type: "array", items: { type: "string" } },
    },
  },
];

Now any agent with MCP access can read from and write to the registry without custom integration code.
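
A sketch of the server-side dispatch behind those tools, against an assumed in-memory store (a real server would wire this into an MCP transport and a persistent registry; names are illustrative):

```typescript
type Entry = { id: string; title: string; tags: string[]; frequency: number };

const store: Entry[] = [];

function handleToolCall(name: string, args: Record<string, unknown>): unknown {
  switch (name) {
    case "query_errors": {
      // Filter by tags when given, and cap results for context efficiency.
      const tags = (args.tags as string[]) ?? [];
      return store
        .filter(e => tags.length === 0 || e.tags.some(t => tags.includes(t)))
        .slice(0, (args.limit as number) ?? 5);
    }
    case "report_error": {
      // A real implementation would fingerprint and deduplicate here.
      const entry: Entry = {
        id: `err-${store.length + 1}`,
        title: String(args.symptom),
        tags: (args.tags as string[]) ?? [],
        frequency: 1,
      };
      store.push(entry);
      return entry.id;
    }
    default:
      throw new Error(`Unknown tool: ${name}`);
  }
}
```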

Comparison: ERRORS.md vs Error Registry

Dimension            | ERRORS.md                           | Error Registry
---------------------|-------------------------------------|--------------------------------------------
Ingestion            | Manual, human writes each entry     | Automatic, agent reports errors
Deduplication        | Manual, human checks for duplicates | Fingerprinting, automatic
Query                | grep/search, full file in context   | Structured query, relevant results only
Resolution tracking  | Free text, no history               | Structured log, success/failure tracked
Prevention           | Manual, human creates rules         | Auto-promotion at frequency threshold
Context cost         | Full file (thousands of tokens)     | Relevant entries only (hundreds of tokens)
Team scaling         | One file, merge conflicts           | Database, concurrent-safe
Agent interaction    | Passive (read-only)                 | Active (read + write)

ERRORS.md is Level 1. The error registry is Level 3, the step beyond external tools. Both are better than Level 0.

When to Build This

Start with ERRORS.md. It costs nothing and captures 80% of the value.

Graduate to a registry when:

  • You have 20+ documented errors and searching ERRORS.md is slow
  • Multiple agents or team members are encountering the same errors
  • You want automatic ingestion (errors captured without human effort)
  • You want prevention rules generated from frequency data
  • You are running agents in CI/CD and need programmatic error tracking
  • Your ERRORS.md is consuming too many tokens when included in context

Do not build this when:

  • You are a solo developer with a small project
  • You have fewer than 10 documented error patterns
  • Your agent interactions are infrequent

Best Practices

1. Fingerprint at the Pattern Level, Not the Instance Level

Bad:  Hash the full error message (too specific, misses duplicates)
Good: Hash the structural pattern (catches all instances of the same class)

2. Keep Prevention Rules as Code

Prevention rules should be executable, not advisory:

// Bad: "Remember to check for null after database queries"
// Good:
{
  type: "lint-rule",
  implementation: "@typescript-eslint/no-floating-promises: error"
}

3. Track Resolution Effectiveness

Not every fix works. Track which resolutions succeed and which fail:

// After applying a resolution, verify it worked
const resolution = await registry.getResolution(resolutionId);
if (errorRecurred(resolution.errorId, resolution.appliedAt)) {
  await registry.updateResolution(resolutionId, { success: false });
}

4. Prune Stale Entries

Errors that have not occurred in 6+ months with active prevention rules can be archived:

async function pruneStaleEntries(): Promise<void> {
  const stale = await registry.query({
    status: "mitigated",
    lastSeenBefore: sixMonthsAgo(),
  });

  for (const entry of stale) {
    if (entry.preventionRules.length > 0) {
      await registry.update(entry.id, { status: "archived" });
    }
  }
}

5. Separate Registry per Project, Shared Patterns Across Projects

Each project has its own registry. But some error patterns (missing await, null checks, type mismatches) are universal. Extract these into a shared “base registry” that seeds new projects.

~/.error-registry/         # Shared base patterns
project-a/.errors/         # Project-specific errors
project-b/.errors/         # Project-specific errors
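
Seeding a new project from the base registry can be a simple merge keyed on fingerprint, with project-specific entries winning on collision (a sketch; the entry shape is trimmed to the fields the merge needs):

```typescript
type Seedable = { fingerprint: string; title: string };

function seedRegistry(base: Seedable[], project: Seedable[]): Seedable[] {
  const merged = new Map<string, Seedable>();
  for (const entry of base) merged.set(entry.fingerprint, entry);
  // Insert project entries second so they override base patterns.
  for (const entry of project) merged.set(entry.fingerprint, entry);
  return [...merged.values()];
}
```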

Common Pitfalls

Pitfall 1: Over-Engineering the Storage Layer

Start with a JSON file. Move to SQLite when querying gets slow. Move to an API when multiple services need access. Do not start with Postgres.

Pitfall 2: Capturing Too Much Context

Each error entry should be minimal: the bad pattern, the fix, and the prevention rule. Not the full stack trace, not the entire file contents, not the conversation history. Token efficiency matters.

Pitfall 3: Never Reviewing Prevention Effectiveness

A prevention rule that is 50% effective is worse than no rule (it creates false confidence). Track effectiveness and remove rules that do not work.

Pitfall 4: Manual-Only Ingestion

The whole point of the registry over ERRORS.md is automatic capture. If agents cannot write to the registry, you are just building a fancier ERRORS.md.
