Sentry Errors Should Spawn Agents on Your Own Machine

James Phoenix

A new production error is an event. Events should trigger work, not sit in a dashboard. So I wired Sentry to spawn a coding agent on my own hardware, point it at my exact stack, and open a draft PR with a fix.

The loop

The shape is simple. A new error reaches Sentry. Sentry fires a webhook at an endpoint I own. That endpoint checks one thing: have I seen this issue before? If it is genuinely new, it starts a job. The job spins up a coding agent (Codex or Claude Code) on my Mac Studio, hands it the error, the stack trace, and the issue link, and lets it reproduce, diagnose, and fix the bug against a local checkout of my real codebase. When the agent is done, it pushes a branch and opens a draft PR. I review it like any other PR.

That is the entire system. A webhook, a de-duplication check, an agent, a PR. No dashboard babysitting, no copy-pasting stack traces into a chat window, no context reconstruction. The error itself is the trigger.

The auto-fix loop: a new Sentry issue fires a webhook that de-dupes by issue ID, then a coding agent on my Mac Studio reproduces the bug against my exact stack and opens a draft PR
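The endpoint's first job is to decide whether a request is an authentic, genuinely new issue. Below is a minimal sketch under stated assumptions. It assumes Sentry's integration-platform webhooks, where the `Sentry-Hook-Signature` header carries an HMAC-SHA256 hex digest of the request body keyed with the integration's client secret, and the `Sentry-Hook-Resource` header names the resource. The sketch accepts only the `issue` resource with `action: "created"` and returns the issue ID. The function name `newIssueIdFrom` and the trimmed `IssueWebhook` type are hypothetical, not part of any SDK. Header names are assumed to arrive lowercased, as they do in Node.

```typescript
import { createHmac, timingSafeEqual } from "node:crypto"
import { Buffer } from "node:buffer"

// Hypothetical, trimmed shape: only the fields this sketch reads.
interface IssueWebhook {
  action: string
  data: { issue: { id: string | number } }
}

// Returns the issue ID for an authentic "issue created" webhook, else null.
function newIssueIdFrom(
  headers: Record<string, string | undefined>,
  rawBody: string,
  clientSecret: string,
): string | null {
  // Recompute the HMAC-SHA256 of the raw body and compare in constant time.
  const expected = Buffer.from(
    createHmac("sha256", clientSecret).update(rawBody, "utf8").digest("hex"),
    "utf8",
  )
  const received = Buffer.from(headers["sentry-hook-signature"] ?? "", "utf8")
  if (expected.length !== received.length || !timingSafeEqual(expected, received)) {
    return null
  }

  // Only issue webhooks, and only when the issue is first created.
  if (headers["sentry-hook-resource"] !== "issue") return null
  const payload = JSON.parse(rawBody) as IssueWebhook
  if (payload.action !== "created") return null
  return String(payload.data.issue.id)
}
```

Anything that returns `null` here never reaches the de-dupe table, so forged or irrelevant requests cost nothing.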

De-dupe by issue ID, not by event

The single most important design decision is the de-duplication key, and the right key is Sentry’s issue ID.

Sentry already does the hard part. It fingerprints raw events and groups them into issues. One logic bug throwing ten thousand times a day is one issue, not ten thousand events. If I keyed off events, I would spawn ten thousand agents and a very large bill. If I key off the issue ID, I spawn exactly one agent the first time an issue appears, and never again for that issue.

So the webhook handler does one atomic thing: insert a row keyed on the issue ID, and only start the agent if that insert was new. A duplicate webhook is a no-op. The whole guarantee is expressed as a unique constraint, and the issue is the unit of work because the issue is the unit of bug.

// One row per Sentry issue. The unique index on sentry_issue_id IS the guarantee.
const [run] = await db
  .insert(autoFixRuns)
  .values({ sentryIssueId: issue.id, status: "pending" })
  .onConflictDoNothing({ target: autoFixRuns.sentryIssueId })
  .returning()

if (!run) return ok()                  // duplicate webhook, nothing to do
await startAutoFixWorkflow(issue.id)   // genuinely new issue, spawn the agent

Everything downstream inherits that cleanliness. A flapping error that throws all night is still one row, one agent, one PR.
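The arithmetic behind the choice of key can be checked without a database. This is a hypothetical in-memory stand-in: a `Set` plays the role of the unique index. `claim` mirrors `INSERT ... ON CONFLICT DO NOTHING RETURNING` by returning true only for a key it has not seen. The `SentryEvent` shape is an illustrative assumption, not Sentry's event schema.

```typescript
// Hypothetical stand-in for the unique index on sentry_issue_id.
class IssueLedger {
  private readonly seen = new Set<string>()

  // True only the first time a key is claimed, like an insert that returned a row.
  claim(key: string): boolean {
    if (this.seen.has(key)) return false
    this.seen.add(key)
    return true
  }
}

interface SentryEvent {
  eventId: string
  issueId: string
}

// How many agents a stream of events would spawn under a given de-dupe key.
function agentsSpawned(events: SentryEvent[], keyOf: (e: SentryEvent) => string): number {
  const ledger = new IssueLedger()
  let spawned = 0
  for (const e of events) if (ledger.claim(keyOf(e))) spawned++
  return spawned
}

// One logic bug throwing ten thousand times: ten thousand events, one issue.
const flapping: SentryEvent[] = Array.from({ length: 10_000 }, (_, i) => ({
  eventId: `evt-${i}`,
  issueId: "ISSUE-1",
}))

console.log(agentsSpawned(flapping, (e) => e.eventId)) // 10000
console.log(agentsSpawned(flapping, (e) => e.issueId)) // 1
```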

Only the unknown reaches Sentry

I split errors into two kinds: the ones I expect and already handle (validation failures, known retries, deliberate guards), and the ones I did not see coming. Only the second kind reaches Sentry. That keeps Sentry a high-signal map of the domain I have not finished exploring rather than a noise feed, so every issue that arrives is genuinely worth spawning an agent for.

In Effect, this is not a convention I have to remember; it is the type system. Expected failures are tagged values in the typed error channel, and the compiler will not let a boundary forget to handle them. Whatever is left once the modelled errors are handled is a defect, an error I never anticipated, and that is the only thing the boundary fingerprints and reports:

import { Cause, Data, Effect } from "effect"
import * as Sentry from "@sentry/node"

// Expected failures are tagged values in the typed error channel. The compiler
// forces a handler for each, so a known fault can never leak to Sentry by accident.
class NotFound extends Data.TaggedError("NotFound")<{ message: string }> {}
class RateLimited extends Data.TaggedError("RateLimited")<{ retryAfter: number }> {}

// http(), retryAfter() and fingerprintOf() are small app-specific helpers (not shown).
const atTheBoundary = <A, R>(effect: Effect.Effect<A, NotFound | RateLimited, R>) =>
  effect.pipe(
    // The ones I expect: mapped to a 4xx response, never reported to Sentry.
    Effect.catchTag("NotFound", (e) => Effect.succeed(http(404, e.message))),
    Effect.catchTag("RateLimited", (e) => Effect.succeed(retryAfter(e))),
    // After the modelled errors are handled, the channel is `never`: the only
    // way left to fail is a defect I did not see coming. That is what Sentry gets,
    // fingerprinted so the same fault clusters into one issue (and one agent).
    Effect.tapDefect((cause) =>
      Effect.sync(() => {
        const defect = Cause.squash(cause)
        Sentry.captureException(defect, { fingerprint: fingerprintOf(defect) })
      })
    ),
    Effect.catchAllDefect(() => Effect.succeed(http(500, "Internal server error")))
  )
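The fingerprint decides what counts as "the same fault", and therefore how many agents get spawned. The `fingerprintOf` helper is not shown above. Here is one hypothetical, dependency-free way to write it. It takes the squashed defect (for example `Cause.squash(cause)` in Effect) rather than the `Cause` itself. It returns the error's class name plus its message with volatile tokens (UUIDs, then any digit runs) replaced by placeholders, so "user 42 not loaded" and "user 7 not loaded" group together. This normalisation scheme is an assumption chosen for illustration, not Sentry's own grouping algorithm.

```typescript
// Hypothetical helper: a Sentry fingerprint (an array of strings) for a defect.
// Volatile tokens are normalised so one fault with different IDs clusters.
function fingerprintOf(defect: unknown): string[] {
  const name = defect instanceof Error ? defect.name : typeof defect
  const message = defect instanceof Error ? defect.message : String(defect)
  const normalised = message
    // Replace UUIDs first, so their digits are not mangled by the next rule.
    .replace(/[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}/gi, "<uuid>")
    // Then collapse any remaining run of digits (IDs, counts, ports).
    .replace(/\d+/g, "<n>")
  return [name, normalised]
}
```

Normalising too aggressively merges distinct bugs into one issue, so one agent sees only one of them. Normalising too little splits one bug into many issues and many agents. The right balance depends on how your error messages are written.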

Run it locally, not in the cloud

Sentry ships its own agent, Seer. It is good. I still run my own, locally, and the reason is the entire thesis.

When the agent runs on my machine, it inherits my machine. I am already authenticated to everything. The cloud provider CLI can pull logs, metrics, and traces. The Vercel CLI can read deployment logs and environment config. The Sentry CLI can pull the full event detail. The agent can fetch the web. And critically, it runs against my exact stack: the same dependencies, the same database shape, the same services, the same code at the same commit.

A cloud agent has to reconstruct all of that. It works from the artifact you upload and the context you can hand it. A local agent reconstructs nothing, because it is already standing inside the environment that produced the error. It debugs from real telemetry with the real toolbelt, the same way I would, because it is using the tools I use.

This is the part people miss. Your development environment is the most powerful agent runtime you own. It is already wired to your observability, your deploys, and your secrets. Pointing an agent at it is not a downgrade from a hosted product. It is an upgrade, because nothing has been sandboxed away.
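Inheriting the machine mostly comes down to how the agent process is started: in the local checkout, with the same environment the shell already has. Below is a minimal sketch assuming Claude Code's non-interactive print mode (`claude -p <prompt>`). A Codex run would swap in its own non-interactive command. The `IssueContext` shape, the prompt wording, and the function names are hypothetical. In practice you would also decide which tools the agent may use without asking. That configuration is left out here.

```typescript
import { spawn, ChildProcess } from "node:child_process"

// Hypothetical shape: what the agent is handed about the issue.
interface IssueContext {
  id: string
  title: string
  permalink: string
  stackTrace: string
}

// Build the command for a local, non-interactive Claude Code run.
function agentInvocation(issue: IssueContext, repoDir: string) {
  const prompt = [
    `A new Sentry issue (${issue.id}) was reported: ${issue.title}`,
    `Issue link: ${issue.permalink}`,
    "Stack trace:",
    issue.stackTrace,
    "Reproduce it against this checkout, find the root cause, fix it, and add a test.",
  ].join("\n")
  return { command: "claude", args: ["-p", prompt], cwd: repoDir }
}

// The child inherits this machine: the same env (CLI credentials, PATH) and the
// same checkout, so it can call the cloud, Vercel and Sentry CLIs exactly as I would.
function runAgent(issue: IssueContext, repoDir: string): ChildProcess {
  const { command, args, cwd } = agentInvocation(issue, repoDir)
  return spawn(command, args, { cwd, env: process.env, stdio: "inherit" })
}
```

Keeping `agentInvocation` pure makes the prompt easy to inspect and test. `runAgent` is the only part that touches the machine.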

The bounds that keep it sane

An autonomous agent with live credentials and a webhook trigger is a great way to set money on fire if you are careless. Three bounds keep it safe.


Three bounds: concurrency capped at one, a hard daily cap on fix attempts, and draft PRs only so nothing merges itself

Concurrency is capped at one. Only one agent runs at a time, so there is no thundering herd when a deploy goes bad and twenty issues open at once. I let the worker enforce this rather than a lock file:

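import { Worker } from "@temporalio/worker"
import * as activities from "./activities" // includes the activity that runs the agent

// One agent at a time, per worker process. Concurrency is a worker property, not a lock I can forget.
const worker = await Worker.create({
  workflowsPath: require.resolve("./workflows"),
  activities,
  taskQueue: "auto-fix",
  maxConcurrentActivityTaskExecutions: 1,
})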

It is rate limited per day. There is a hard ceiling on how many fix attempts can run in twenty-four hours. A bad day cannot turn into a bad invoice:

// Hard ceiling per day. Surplus issues just wait; they do not pile up a bill.
const key = `auto-fix:${new Date().toISOString().slice(0, 10)}` // one counter per UTC day
const used = await redis.incr(key)
if (used === 1) await redis.expire(key, 60 * 60 * 24) // TTL only cleans up; the key itself rolls over at UTC midnight
if (used > MAX_FIXES_PER_DAY) return { status: "rate_limited" }
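The counter's behaviour, including the day roll-over, can be exercised without Redis or waiting a day. Below is a hypothetical in-memory version with the clock injected. It uses one counter per UTC calendar day, like the Redis key, and mirrors `INCR` followed by the ceiling check. `DailyCap` and `tryAcquire` are illustrative names.

```typescript
// Hypothetical in-memory stand-in for the Redis daily counter.
class DailyCap {
  private readonly counts = new Map<string, number>()
  private readonly maxPerDay: number

  constructor(maxPerDay: number) {
    this.maxPerDay = maxPerDay
  }

  // One counter per UTC calendar day, e.g. "auto-fix:2025-01-31".
  private keyFor(now: Date): string {
    return `auto-fix:${now.toISOString().slice(0, 10)}`
  }

  // Increment first, then compare against the ceiling (like INCR + check).
  tryAcquire(now: Date): "ok" | "rate_limited" {
    const key = this.keyFor(now)
    const used = (this.counts.get(key) ?? 0) + 1
    this.counts.set(key, used)
    return used > this.maxPerDay ? "rate_limited" : "ok"
  }
}

const cap = new DailyCap(2)
const day1 = new Date("2025-01-31T09:00:00Z")
const day2 = new Date("2025-02-01T00:00:01Z")
console.log(cap.tryAcquire(day1), cap.tryAcquire(day1), cap.tryAcquire(day1)) // ok ok rate_limited
console.log(cap.tryAcquire(day2)) // ok
```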

Every output is a draft PR. Nothing merges itself. The agent proposes, I dispose, and the blast radius of a wrong fix is a branch I close.
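The "draft only" rule is easiest to keep when the final step is a fixed list of commands rather than something the agent improvises. Here is a minimal sketch, assuming the job (not the agent) pushes the branch and opens the PR with standard `git` and GitHub CLI commands. `gh pr create --draft` opens a draft pull request, and nothing here merges. The function names, branch naming, and title format are hypothetical.

```typescript
import { execFileSync } from "node:child_process"

// The only commands the job runs after the agent finishes: push, then open a DRAFT PR.
function draftPrCommands(issueId: string, branch: string, summary: string): [string, string[]][] {
  return [
    ["git", ["push", "-u", "origin", branch]],
    [
      "gh",
      ["pr", "create", "--draft", "--head", branch, "--title", `fix: Sentry issue ${issueId}`, "--body", summary],
    ],
  ]
}

// Run them in the local checkout, with the same gh/git auth the machine already has.
function openDraftPr(issueId: string, branch: string, summary: string, repoDir: string): void {
  for (const [cmd, args] of draftPrCommands(issueId, branch, summary)) {
    execFileSync(cmd, args, { cwd: repoDir, stdio: "inherit" })
  }
}
```

Because the commands are plain data, a test can assert that `--draft` is always present and that no merge command ever appears.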

The de-dupe key does double duty here. Once-per-issue means a flapping error cannot re-trigger the agent in a loop. The unique constraint is both the correctness guarantee and a cost control.

Why this is exactly how I operate

I do not think of errors as things to be notified about. I think of them as events that should trigger work. The same way a git push triggers CI/CD, a new issue should trigger a fix attempt. The dashboard is a fallback, not the primary interface.

Once you accept that framing, the marginal cost of a fix attempt collapses toward zero. I am not spending attention on triage. I am spending it reviewing a PR that already exists, written by an agent that already reproduced the bug with full access to my telemetry. The expensive, attention-heavy part (reproduce, gather context, form a hypothesis) is done by the time I look.

That is the compound move. I am not buying an incident-response product. I am wiring my own authenticated environment to an event stream and letting it take the first pass. Every error makes the system slightly better, and the cost of the next fix attempt keeps dropping. The factory gets faster, the products get cheaper to maintain, and my attention stays on the decisions that actually need a human.

When this applies, and when it does not

This works when you own your stack and your environment is reproducible locally. It works when a draft PR is an acceptable output and a human reviews before merge. It works when your error volume, grouped by issue, is low enough that one-per-issue with a daily cap covers it.

It does not replace on-call for genuine incidents, where the answer is “roll back now”, not “open a PR in twenty minutes”. It does not help with errors that need a canary account or production data the agent should never touch. And it is only as good as your observability: an agent debugging from thin telemetry is just guessing faster.

But for the long tail of real, reproducible bugs that would otherwise sit in a dashboard for a week, turning the error into an event that spawns an agent on my own machine is, I am now convinced, the way.
