Summary
Agents repeat errors they have no memory of. ERRORS.md files help, but they are flat, unstructured, and require manual curation. The next step is a proper error registry: a structured, queryable store of every error an agent has encountered, with fingerprinting, deduplication, resolution history, and prevention rules. Think Sentry, but built for agents, where you own every primitive. When agents can read and write to a shared error registry before and after every task, recurring failures drop to near zero.
The Problem
There are three levels of agent error memory today, and most teams are stuck at level one.
Level 0: No memory. The agent hits the same error every session. You fix it manually each time. This is the default state for every LLM interaction.
Level 1: Flat files. You maintain an ERRORS.md that documents common mistakes. This works, but it has structural problems:
- No fingerprinting. Two instances of “missing await” in different files are stored as separate entries or, worse, only one gets recorded.
- No queryability. You grep for keywords. The agent cannot ask “have I seen an error like this before?” and get a structured answer.
- No resolution graph. You know the current fix, but not which fixes were tried and failed, or which other errors a fix introduced.
- No automatic ingestion. Every entry requires a human to write it. Errors that happen at 2 AM go unrecorded.
- Context window cost. Including the full ERRORS.md in every prompt wastes tokens on irrelevant entries.
Level 2: External tools. You send errors to Sentry or Datadog. These tools are designed for human developers scrolling dashboards; their APIs are not designed for agent consumption. The error data exists, but agents cannot easily act on it.
The gap between Level 1 and what agents actually need is where the error registry sits.
The Solution
Build a structured error registry that agents can read from, write to, and query before every task. Own every primitive:
┌─────────────────────────────────────────────────────────┐
│ ERROR REGISTRY │
├─────────────────────────────────────────────────────────┤
│ Fingerprinting → Deduplicate errors by structure │
│ Classification → Categorize by root cause type │
│ Resolution Log → Track what fixes work (and don't) │
│ Prevention Rules→ Machine-readable prevention policies │
│ Agent API → Query interface for agent consumption │
│ Auto-Ingestion → Capture errors without human effort │
└─────────────────────────────────────────────────────────┘
The registry is not a dashboard. It is an agent-facing knowledge store where every error becomes a permanent lesson.
Why Own the Primitives
Sentry is excellent for human operators. But agents need something different:
| Capability | Sentry (Human-First) | Error Registry (Agent-First) |
|---|---|---|
| Error format | Stack traces, breadcrumbs | Structured JSON with code patterns |
| Resolution tracking | “Resolve” button | Resolution log with what was tried |
| Prevention | Alert thresholds | Machine-readable prevention rules |
| Query interface | Web dashboard | Programmatic API or local file query |
| Context delivery | Full error page | Minimal, relevant context snippet |
| Ingestion | SDK auto-capture | Agent tool calls + CI pipeline hooks |
| Deduplication | Stack trace hashing | AST-level structural fingerprinting |
When you own the primitives, you control:
- What gets stored. Not just stack traces, but the code pattern that caused the error, the bad output the agent generated, and the correct output.
- How it is queried. Agents get structured responses, not HTML pages. Token-efficient, relevant context.
- How prevention works. Prevention rules are code, not alerts. They feed directly into CLAUDE.md, hooks, or CI checks.
- How resolution evolves. A fix that worked once is a candidate. A fix that worked ten times is a rule. A fix that was tried and failed is marked as such.
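The resolution-evolution rule above can be sketched as a small promotion function. A minimal sketch, with illustrative thresholds (the `10` mirrors the prose, not any fixed API):

```typescript
type FixStatus = "failed" | "candidate" | "rule";

// Promotion logic following the rule above: a fix that was tried and
// failed is marked as such, one success makes a candidate, repeated
// success promotes to a rule.
function classifyFix(successes: number, failures: number): FixStatus {
  if (successes === 0 && failures > 0) return "failed";
  return successes >= 10 ? "rule" : "candidate";
}

console.log(classifyFix(1, 0));  // "candidate" — worked once
console.log(classifyFix(12, 1)); // "rule" — keeps working
```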
Architecture
Core Schema
Every error entry has a consistent structure:
interface ErrorEntry {
// Identity
id: string;
fingerprint: string; // Structural hash for deduplication
title: string; // Human-readable summary
// Classification
category: "context" | "model" | "rules" | "testing" | "quality-gate";
severity: "critical" | "high" | "medium" | "low";
tags: string[]; // e.g., ["async", "database", "zod"]
// Evidence
symptom: string; // What the developer or agent observes
badPattern: CodePattern; // The code that causes the error
correctPattern: CodePattern; // The code that fixes it
rootCause: string; // Why this happens
relatedFiles: string[]; // Where this has occurred
// Tracking
occurrences: Occurrence[]; // Every time this error appeared
firstSeen: Date;
lastSeen: Date;
frequency: number; // Total count
// Resolution
resolutions: Resolution[]; // Fixes tried, with success/failure
currentFix: Resolution | null; // The active fix
preventionRules: PreventionRule[];
// Status
status: "active" | "mitigated" | "prevented" | "archived";
}
interface CodePattern {
language: string;
code: string;
description: string;
}
interface Occurrence {
timestamp: Date;
file: string;
context: string; // What task was being performed
agentSession: string; // Which session produced this
}
interface Resolution {
id: string;
description: string;
appliedAt: Date;
success: boolean;
sideEffects: string[]; // Other errors this fix introduced
}
interface PreventionRule {
type: "lint-rule" | "hook" | "test" | "claude-md" | "ci-check";
description: string;
implementation: string; // The actual rule or config
addedAt: Date;
effectiveness: number; // 0-1, tracked over time
}
Fingerprinting: The Key Primitive
The most important primitive is fingerprinting. Two errors are the same error if their structural pattern matches, even if they occur in different files, at different times, or in different agent sessions.
function fingerprintError(error: RawError): string {
// Level 1: Exact match on error message template
const messageTemplate = error.message
.replace(/['"][^'"]*['"]/g, "'<STRING>'") // Normalize strings
.replace(/\d+/g, "<NUM>") // Normalize numbers
.replace(/\/[\w./]+/g, "<PATH>"); // Normalize paths
// Level 2: Code pattern hash
const patternHash = hashCodePattern(error.codeSnippet);
// Level 3: Error category + affected construct
const structural = `${error.errorType}:${error.affectedConstruct}`;
return hash(`${messageTemplate}|${patternHash}|${structural}`);
}
// Example: These two errors produce the same fingerprint
// Error A: "Cannot read property 'email' of null" in src/api/users.ts
// Error B: "Cannot read property 'name' of null" in src/api/posts.ts
// Both are: null-property-access on a database query result
Fingerprinting turns a stream of individual incidents into a registry of error classes. Each class accumulates evidence, resolutions, and prevention rules over time.
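The message-template level alone is enough to collapse the two example errors above into one class. A minimal runnable sketch of that first level, reusing the regexes from `fingerprintError` (`hashCodePattern` and the structural level are omitted):

```typescript
import { createHash } from "node:crypto";

// Level 1 of the fingerprint only: normalize the message into a
// template, then hash the template.
function fingerprintMessage(message: string): string {
  const template = message
    .replace(/['"][^'"]*['"]/g, "'<STRING>'") // normalize string literals
    .replace(/\d+/g, "<NUM>")                 // normalize numbers
    .replace(/\/[\w./]+/g, "<PATH>");         // normalize paths
  return createHash("sha256").update(template).digest("hex").slice(0, 12);
}

// Error A and Error B from the example collapse to one fingerprint:
const a = fingerprintMessage("Cannot read property 'email' of null");
const b = fingerprintMessage("Cannot read property 'name' of null");
console.log(a === b); // true — both normalize to "Cannot read property '<STRING>' of null"
```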
Storage Options
The registry can live at different levels of sophistication:
Option A: Structured JSON file (simplest)
project/
└── .errors/
├── registry.json # All error entries
├── index.json # Fingerprint → entry ID lookup
└── rules/
├── lint-rules.json # Generated lint rules
└── hooks.json # Generated hook configs
Good for single-developer projects. Version-controlled, human-readable, no infrastructure.
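A lookup sketch against that layout, assuming `index.json` maps fingerprint to entry id and `registry.json` is an object keyed by entry id (the directory is passed in so the function stays testable):

```typescript
import { readFileSync, existsSync } from "node:fs";
import { join } from "node:path";

// Resolve a fingerprint to its error entry via the two files above.
function lookupByFingerprint(errorsDir: string, fingerprint: string): unknown {
  const indexPath = join(errorsDir, "index.json");
  if (!existsSync(indexPath)) return null;
  const index: Record<string, string> = JSON.parse(readFileSync(indexPath, "utf8"));
  const id = index[fingerprint];
  if (!id) return null; // never seen this error class
  const registry: Record<string, unknown> = JSON.parse(
    readFileSync(join(errorsDir, "registry.json"), "utf8"),
  );
  return registry[id] ?? null;
}
```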
Option B: SQLite database
CREATE TABLE errors (
id TEXT PRIMARY KEY,
fingerprint TEXT UNIQUE,
title TEXT,
category TEXT,
severity TEXT,
bad_pattern TEXT,
correct_pattern TEXT,
root_cause TEXT,
frequency INTEGER DEFAULT 1,
first_seen TEXT,
last_seen TEXT,
status TEXT DEFAULT 'active'
);
CREATE TABLE occurrences (
id TEXT PRIMARY KEY,
error_id TEXT REFERENCES errors(id),
timestamp TEXT,
file TEXT,
context TEXT,
agent_session TEXT
);
CREATE TABLE resolutions (
id TEXT PRIMARY KEY,
error_id TEXT REFERENCES errors(id),
description TEXT,
applied_at TEXT,
success BOOLEAN,
side_effects TEXT
);
CREATE INDEX idx_fingerprint ON errors(fingerprint);
CREATE INDEX idx_category ON errors(category);
CREATE INDEX idx_status ON errors(status);
Good for teams. Queryable, concurrent-safe, still local.
Option C: API service
For organizations running multiple agents across multiple repos. A central registry that all agents query. This is the “build your own Sentry” path, but the API is designed for agents, not humans.
Agent Query Interface
The critical design choice: how do agents consume the registry? The answer is a tool or MCP server that returns token-efficient, relevant results.
// Agent tool: query_error_registry
interface QueryErrorRegistry {
// Search by similarity to current error
query?: string;
// Filter by classification
category?: ErrorCategory;
tags?: string[];
severity?: Severity;
// Filter by status
status?: "active" | "mitigated" | "prevented";
// Limit results for context efficiency
limit?: number;
}
// Example: Agent encounters a new error
const results = await queryErrorRegistry({
query: "Cannot read property of null after database query",
category: "testing",
limit: 3,
});
// Returns:
// [
// {
// title: "Missing null checks after database queries",
// frequency: 15,
// status: "mitigated",
// correctPattern: "if (!result) { return error('Not found'); }",
// preventionRules: [
// { type: "lint-rule", rule: "no-unchecked-db-result" }
// ]
// }
// ]
The agent now knows: this error has happened 15 times before, there is a known fix, and there is a lint rule that should catch it. It can apply the fix immediately and verify the lint rule is active.
Implementation: From ERRORS.md to Registry
Step 1: Auto-Ingest from Agent Sessions
Instead of manually documenting errors, capture them automatically:
// Hook into agent error events
async function onAgentError(error: AgentError): Promise<void> {
const fingerprint = fingerprintError(error);
const existing = await registry.getByFingerprint(fingerprint);
if (existing) {
// Known error: increment occurrence
await registry.addOccurrence(existing.id, {
timestamp: new Date(),
file: error.file,
context: error.taskDescription,
agentSession: error.sessionId,
});
// Update last seen and frequency
await registry.update(existing.id, {
lastSeen: new Date(),
frequency: existing.frequency + 1,
});
} else {
// New error: create entry
await registry.create({
fingerprint,
title: classifyErrorTitle(error),
category: diagnoseRootCause(error),
severity: assessSeverity(error),
symptom: error.message,
badPattern: extractCodePattern(error),
firstSeen: new Date(),
lastSeen: new Date(),
frequency: 1,
status: "active",
});
}
}
Step 2: Pre-Task Context Injection
Before starting any task, the agent queries the registry for relevant errors:
async function buildTaskContext(task: TaskDescription): Promise<string> {
// Query registry for errors related to this task
const relevantErrors = await registry.query({
tags: extractTags(task),
status: "active",
limit: 5,
});
if (relevantErrors.length === 0) return "";
// Format as concise context block
return `
## Known Error Patterns (from error registry)
${relevantErrors.map(e => `
### ${e.title} (${e.frequency} occurrences)
- Symptom: ${e.symptom}
- Fix: ${e.correctPattern.description}
- Prevention: ${e.preventionRules.map(r => r.description).join(", ")}
`).join("\n")}
Avoid these patterns when implementing this task.
`;
}
This replaces the “include all of ERRORS.md” approach with targeted, relevant context. Token cost drops from thousands of tokens to a few hundred.
Step 3: Post-Resolution Learning
When an agent fixes an error, record the resolution:
async function onErrorResolved(
errorId: string,
resolution: ResolutionAttempt
): Promise<void> {
await registry.addResolution(errorId, {
description: resolution.description,
appliedAt: new Date(),
success: resolution.testsPass,
sideEffects: resolution.newErrors,
});
// If this is the third successful resolution with the same pattern,
// promote to prevention rule
const entry = await registry.get(errorId);
const successfulFixes = entry.resolutions.filter(r => r.success);
if (successfulFixes.length >= 3 && !entry.preventionRules.length) {
await suggestPreventionRule(entry);
}
}
Step 4: Automatic Prevention Promotion
When an error reaches a frequency threshold, automatically generate prevention:
async function suggestPreventionRule(entry: ErrorEntry): Promise<void> {
// Generate lint rule from bad pattern
if (entry.badPattern.language === "typescript") {
const lintRule = await generateLintRule(entry.badPattern, entry.correctPattern);
await registry.addPreventionRule(entry.id, {
type: "lint-rule",
description: `Prevent: ${entry.title}`,
implementation: lintRule,
addedAt: new Date(),
effectiveness: 0, // Track over time
});
}
// Add to CLAUDE.md
const claudeRule = formatForClaudeMd(entry);
await registry.addPreventionRule(entry.id, {
type: "claude-md",
description: `CLAUDE.md rule: ${entry.title}`,
implementation: claudeRule,
addedAt: new Date(),
effectiveness: 0,
});
// Update status
await registry.update(entry.id, { status: "mitigated" });
}
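`formatForClaudeMd` is left undefined above. One possible shape, kept deliberately compact so injected rules stay token-cheap (the markdown layout here is an assumption, not a fixed format):

```typescript
interface PatternEntry {
  title: string;
  symptom: string;
  badPattern: { code: string };
  correctPattern: { code: string };
}

// Render a registry entry as a few-line CLAUDE.md rule.
function formatForClaudeMd(entry: PatternEntry): string {
  return [
    `### ${entry.title}`,
    `- Symptom: ${entry.symptom}`,
    `- Never: ${entry.badPattern.code}`,
    `- Always: ${entry.correctPattern.code}`,
  ].join("\n");
}
```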
The Compound Effect
The registry creates a flywheel:
Agent encounters error
→ Registry captures it (auto-ingest)
→ Agent queries registry next time (pre-task context)
→ Error is avoided (known pattern)
→ If error recurs, resolution is tracked (post-resolution learning)
→ At threshold, prevention rule is generated (automatic promotion)
→ Error class is eliminated (status: prevented)
Each loop makes every future agent session better. After weeks of operation:
Week 1: Agent encounters 20 errors, registry has 20 entries
Week 2: Agent encounters 15 errors, 5 were prevented by registry
Week 4: Agent encounters 8 errors, 12 prevented, 3 auto-promoted to lint rules
Week 8: Agent encounters 3 errors, most are genuinely novel
Week 12: Registry has 80+ entries, 60% have prevention rules, new error rate is minimal
This is the same compound curve as the ERRORS.md pattern, but automated. No human has to remember to document errors. No human has to include the right section in the prompt. The agent does it all.
MCP Server Implementation
The cleanest way to expose the registry to agents is as an MCP server:
// error-registry MCP server
const tools = [
  {
    name: "query_errors",
    description: "Search the error registry for known error patterns",
    inputSchema: {
      type: "object",
      properties: {
        query: { type: "string", description: "Error description or symptom" },
        tags: { type: "array", items: { type: "string" } },
        limit: { type: "number", default: 5 },
      },
    },
  },
  {
    name: "report_error",
    description: "Report a new error or occurrence to the registry",
    inputSchema: {
      type: "object",
      properties: {
        symptom: { type: "string" },
        badCode: { type: "string" },
        file: { type: "string" },
        context: { type: "string" },
      },
    },
  },
  {
    name: "report_resolution",
    description: "Record that an error was fixed",
    inputSchema: {
      type: "object",
      properties: {
        errorId: { type: "string" },
        fixDescription: { type: "string" },
        fixCode: { type: "string" },
        testsPass: { type: "boolean" },
      },
    },
  },
  {
    name: "get_prevention_rules",
    description: "Get active prevention rules for a set of tags",
    inputSchema: {
      type: "object",
      properties: {
        tags: { type: "array", items: { type: "string" } },
      },
    },
  },
];
Now any agent with MCP access can read from and write to the registry without custom integration code.
Comparison: ERRORS.md vs Error Registry
| Dimension | ERRORS.md | Error Registry |
|---|---|---|
| Ingestion | Manual, human writes each entry | Automatic, agent reports errors |
| Deduplication | Manual, human checks for duplicates | Fingerprinting, automatic |
| Query | grep/search, full file in context | Structured query, relevant results only |
| Resolution tracking | Free text, no history | Structured log, success/failure tracked |
| Prevention | Manual, human creates rules | Auto-promotion at frequency threshold |
| Context cost | Full file (thousands of tokens) | Relevant entries only (hundreds of tokens) |
| Team scaling | One file, merge conflicts | Database, concurrent-safe |
| Agent interaction | Passive (read-only) | Active (read + write) |
ERRORS.md is Level 1. The error registry is Level 3: a step past even the human-first external tools of Level 2. Both are better than Level 0.
When to Build This
Start with ERRORS.md. It costs nothing and captures 80% of the value.
Graduate to a registry when:
- You have 20+ documented errors and searching ERRORS.md is slow
- Multiple agents or team members are encountering the same errors
- You want automatic ingestion (errors captured without human effort)
- You want prevention rules generated from frequency data
- You are running agents in CI/CD and need programmatic error tracking
- Your ERRORS.md is consuming too many tokens when included in context
Do not build this when:
- You are a solo developer with a small project
- You have fewer than 10 documented error patterns
- Your agent interactions are infrequent
Best Practices
1. Fingerprint at the Pattern Level, Not the Instance Level
Bad: Hash the full error message (too specific, misses duplicates)
Good: Hash the structural pattern (catches all instances of the same class)
2. Keep Prevention Rules as Code
Prevention rules should be executable, not advisory:
// Bad: "Remember to await promises returned by database calls"
// Good:
{
type: "lint-rule",
implementation: "@typescript-eslint/no-floating-promises: error"
}
3. Track Resolution Effectiveness
Not every fix works. Track which resolutions succeed and which fail:
// After applying a resolution, verify it worked
const resolution = await registry.getResolution(resolutionId);
if (errorRecurred(resolution.errorId, resolution.appliedAt)) {
await registry.updateResolution(resolutionId, { success: false });
}
4. Prune Stale Entries
Errors that have not occurred in 6+ months with active prevention rules can be archived:
async function pruneStaleEntries(): Promise<void> {
const stale = await registry.query({
status: "mitigated",
lastSeenBefore: sixMonthsAgo(),
});
for (const entry of stale) {
if (entry.preventionRules.length > 0) {
await registry.update(entry.id, { status: "archived" });
}
}
}
5. Separate Registry per Project, Shared Patterns Across Projects
Each project has its own registry. But some error patterns (missing await, null checks, type mismatches) are universal. Extract these into a shared “base registry” that seeds new projects.
~/.error-registry/ # Shared base patterns
project-a/.errors/ # Project-specific errors
project-b/.errors/ # Project-specific errors
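A seeding sketch for that layout, assuming both files are id-keyed objects of entries carrying a `fingerprint` field (file names are illustrative):

```typescript
import { readFileSync, writeFileSync, existsSync } from "node:fs";

interface SeedEntry { fingerprint: string; }

// Copy base patterns into a project registry, skipping any fingerprint
// the project already knows. Returns the number of entries added.
function seedRegistry(basePath: string, projectPath: string): number {
  const base: Record<string, SeedEntry> = JSON.parse(readFileSync(basePath, "utf8"));
  const project: Record<string, SeedEntry> = existsSync(projectPath)
    ? JSON.parse(readFileSync(projectPath, "utf8"))
    : {};
  const known = new Set(Object.values(project).map((e) => e.fingerprint));
  let added = 0;
  for (const [id, entry] of Object.entries(base)) {
    if (!known.has(entry.fingerprint)) {
      project[id] = entry;
      added++;
    }
  }
  writeFileSync(projectPath, JSON.stringify(project, null, 2));
  return added;
}
```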
Common Pitfalls
Pitfall 1: Over-Engineering the Storage Layer
Start with a JSON file. Move to SQLite when querying gets slow. Move to an API when multiple services need access. Do not start with Postgres.
Pitfall 2: Capturing Too Much Context
Each error entry should be minimal. The bad pattern, the fix, and the prevention rule. Not the full stack trace, not the entire file contents, not the conversation history. Token efficiency matters.
Pitfall 3: Never Reviewing Prevention Effectiveness
A prevention rule that is 50% effective is worse than no rule (it creates false confidence). Track effectiveness and remove rules that do not work.
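The `effectiveness` field from the schema can be maintained as a simple ratio. A sketch, with an illustrative removal threshold:

```typescript
// Effectiveness as the fraction of relevant occasions on which the
// error class did not recur while the rule was active.
function ruleEffectiveness(prevented: number, recurred: number): number {
  const total = prevented + recurred;
  return total === 0 ? 0 : prevented / total;
}

// A rule at or below coin-flip effectiveness creates false confidence
// and should be removed, per the pitfall above.
const shouldRemoveRule = (effectiveness: number): boolean => effectiveness <= 0.5;
```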
Pitfall 4: Manual-Only Ingestion
The whole point of the registry over ERRORS.md is automatic capture. If agents cannot write to the registry, you are just building a fancier ERRORS.md.
Related
- Error Messages as Training – The ERRORS.md pattern this builds on
- Five-Point Error Diagnostic Framework – Classification system for the category field
- Agent Memory Patterns – Broader memory architecture this fits into
- Learning Loops – The encode-to-prevent philosophy
- Institutional Memory via Learning Files – Complementary pattern for successes
- MCP Server for Project Context – Implementation pattern for the MCP interface
- Custom ESLint Rules for Determinism – Auto-generated lint rules from registry
- Closed-Loop Telemetry Optimization – Registry as part of the feedback loop
- Prevention Protocol – Systematic prevention that the registry automates
References
- Sentry Error Tracking – The human-first error tracking tool this draws inspiration from
- Error Fingerprinting (Sentry Docs) – How Sentry groups errors, adapted here for agent consumption
- Model Context Protocol – The protocol for exposing the registry to agents

