Sentry Errors Should Spawn Agents on Your Own Machine

James Phoenix

A new production error is an event. Events should trigger work, not sit in a dashboard. So I wired Sentry to spawn a coding agent on my own hardware, point it at my exact stack, and open a draft PR with a fix.


The loop

The shape is simple. A new error reaches Sentry. Sentry fires a webhook at an endpoint I own. That endpoint checks one thing: have I seen this issue before? If it is genuinely new, it starts a job. The job spins up a coding agent (Codex or Claude Code) on my Mac Studio, hands it the error, the stack trace, and the issue link, and lets it reproduce, diagnose, and fix the bug against a local checkout of my real codebase. When the agent is done, it pushes a branch and opens a draft PR. I review it like any other PR.

That is the entire system. A webhook, a de-duplication check, an agent, a PR. No dashboard babysitting, no copy-pasting stack traces into a chat window, no context reconstruction. The error itself is the trigger.
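
The handler's decision can be sketched as a pure function. This is a minimal sketch, not the author's handler: the payload fields (`action`, `data.issue.id`, `title`, `web_url`) are assumed from the general shape of a Sentry issue webhook and should be checked against your own integration, and `AgentJob`, `handleWebhook`, the `claim` callback, and the branch naming are hypothetical names introduced for illustration.

```typescript
// Assumed subset of a Sentry issue webhook payload (verify against your integration).
interface IssueWebhook {
  action: string; // e.g. "created" or "resolved"
  data: { issue: { id: string; title: string; web_url: string } };
}

// Hypothetical description of the work handed to the coding agent.
interface AgentJob {
  issueId: string;
  branch: string;
  prompt: string;
}

// Returns a job for a genuinely new issue, or null when there is nothing to do.
// `claim` is the de-duplication check: it returns true only the first time an id is seen.
function handleWebhook(
  payload: IssueWebhook,
  claim: (issueId: string) => boolean,
): AgentJob | null {
  if (payload.action !== "created") return null; // only new issues trigger work
  const { id, title, web_url } = payload.data.issue;
  if (!claim(id)) return null; // seen before: no-op
  return {
    issueId: id,
    branch: `auto-fix/sentry-${id}`, // hypothetical branch naming scheme
    prompt: [
      `A new production error was reported: ${title}`,
      `Issue link: ${web_url}`,
      "Reproduce it against the local checkout, diagnose the cause, and fix it.",
      "When done, push the branch and open a draft PR.",
    ].join("\n"),
  };
}
```

Keeping the decision separate from the side effects (the database write and the agent launch) makes the webhook logic easy to test without a running Sentry or a real agent.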

The auto-fix loop: a new Sentry issue fires a webhook that de-dupes by issue ID, then a coding agent on my Mac Studio reproduces the bug against my exact stack and opens a draft PR

De-dupe by issue ID, not by event

The single most important design decision is the de-duplication key, and the right key is Sentry’s issue ID.

Sentry already does the hard part. It fingerprints raw events and groups them into issues. One logic bug throwing ten thousand times a day is one issue, not ten thousand events. If I keyed off events, I would spawn ten thousand agents and a very large bill. If I key off the issue ID, I spawn exactly one agent the first time an issue appears, and never again for that issue.

So the webhook handler does one atomic thing: insert a row keyed on the issue ID, and only start the agent if that insert was new. A duplicate webhook is a no-op. The whole guarantee is expressed as a unique constraint, and the issue is the unit of work because the issue is the unit of bug.

// One row per Sentry issue. The unique index on sentry_issue_id IS the guarantee.
const [run] = await db
  .insert(autoFixRuns)
  .values({ sentryIssueId: issue.id, status: "pending" })
  .onConflictDoNothing({ target: autoFixRuns.sentryIssueId })
  .returning()

if (!run) return ok()                  // duplicate webhook, nothing to do
await startAutoFixWorkflow(issue.id)   // genuinely new issue, spawn the agent

Everything downstream inherits that cleanliness. A flapping error that throws all night is still one row, one agent, one PR.
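
To see the once-per-issue property in isolation, here is an in-memory model of the same insert-if-absent behaviour. It is a sketch under stated assumptions: the `Map` stands in for a table with a unique index, and `claimIssue` is a hypothetical name. In a real database the atomicity comes from the unique constraint; the `Map` version is only safe because this code runs on a single thread.

```typescript
interface Run {
  sentryIssueId: string;
  status: "pending";
}

// In-memory stand-in for a table with a unique index on sentryIssueId.
const runs = new Map<string, Run>();

// Mimics "insert, do nothing on conflict, return the row":
// yields the new row, or undefined when the issue already has one.
function claimIssue(sentryIssueId: string): Run | undefined {
  if (runs.has(sentryIssueId)) return undefined;
  const run: Run = { sentryIssueId, status: "pending" };
  runs.set(sentryIssueId, run);
  return run;
}

// Ten thousand deliveries for one flapping issue, plus one for a second issue.
let agentsStarted = 0;
const deliveries = [...Array.from({ length: 10000 }, () => "issue-A"), "issue-B"];
for (const issueId of deliveries) {
  if (claimIssue(issueId)) agentsStarted += 1;
}
console.log(agentsStarted); // 2
```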


Only the unknown reaches Sentry

I split errors into two kinds: the ones I expect and already handle (validation failures, known retries, deliberate guards), and the ones I did not see coming. Only the second kind reaches Sentry. That keeps Sentry a high-signal map of the part of the domain I have not finished exploring, rather than a noise feed, and it means every issue that arrives is worth spawning an agent for.

In Effect this is not a convention I have to remember; the type system enforces it. Expected failures are tagged values in the typed error channel, and the compiler will not let a boundary forget to handle them. Whatever is left after I have handled the modelled errors is a defect, an error I never anticipated, and that is the only thing the boundary fingerprints and reports:

import { Data, Effect } from "effect"

// Expected failures are tagged values in the typed error channel. The compiler
// forces a handler for each, so a known fault can never leak to Sentry by accident.
class NotFound extends Data.TaggedError("NotFound")<{ message: string }> {}
class RateLimited extends Data.TaggedError("RateLimited")<{ retryAfter: number }> {}

const atTheBoundary = <A, R>(effect: Effect.Effect<A, NotFound | RateLimited, R>) =>
  effect.pipe(
    // The ones I expect: mapped to a 4xx, logged as a warning, never reported.
    Effect.catchTag("NotFound", (e) => Effect.succeed(http(404, e.message))),
    Effect.catchTag("RateLimited", (e) => Effect.succeed(retryAfter(e))),
    // After the modelled errors are handled, the channel is `never`: the only
    // way left to fail is a defect I did not see coming. That is what Sentry gets,
    // fingerprinted so the same fault clusters into one issue (and one agent).
    Effect.tapDefect((cause) => Sentry.captureScoped(cause, fingerprintOf(cause))),
    Effect.catchAllDefect(() => Effect.succeed(http(500, "Internal server error")))
  )
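
The snippet above relies on a fingerprinting helper without showing it. As a rough illustration of what such a helper might do, here is a minimal sketch. `fingerprintOfDefect` is a hypothetical function, not the author's `fingerprintOf`, and it assumes the defect has already been unwrapped to a plain thrown value. The idea is to strip volatile details (ids, counts) from the message so that repeats of the same fault produce the same fingerprint, expressed as an array of strings.

```typescript
// Hypothetical helper: derive a stable fingerprint from an unexpected failure.
function fingerprintOfDefect(defect: unknown): string[] {
  // Thrown values are not always Error instances; group those by their type.
  if (!(defect instanceof Error)) return ["non-error-defect", typeof defect];

  // Replace volatile parts so one fault yields one fingerprint.
  const stableMessage = defect.message
    .replace(/[0-9a-f]{8}(-[0-9a-f]{4}){3}-[0-9a-f]{12}/gi, "<uuid>")
    .replace(/\d+/g, "<n>");

  return [defect.name, stableMessage];
}

// Two occurrences of the same bug with different ids share a fingerprint.
const a = fingerprintOfDefect(new Error("user 123 not loaded"));
const b = fingerprintOfDefect(new Error("user 456 not loaded"));
console.log(a.join("|") === b.join("|")); // true
```

How aggressively to normalise is a trade-off: too little and one bug splits into many issues (and many agents), too much and distinct bugs collapse into one.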

Run it locally, not in the cloud

Sentry ships its own agent, Seer. It is good. I still run my own, locally, and the reason is the entire thesis.

When the agent runs on my machine, it inherits my machine. I am already authenticated to everything. The cloud provider CLI can pull logs, metrics, and traces. The Vercel CLI can read deployment logs and environment config. The Sentry CLI can pull the full event detail. The agent can fetch the web. And critically, it runs against my exact stack: the same dependencies, the same database shape, the same services, the same code at the same commit.

A cloud agent has to reconstruct all of that. It works from the artifact you upload and the context you can hand it. A local agent reconstructs nothing, because it is already standing inside the environment that produced the error. It debugs from real telemetry with the real toolbelt, the same way I would, because it is using the tools I use.

This is the part people miss. Your development environment is the most powerful agent runtime you own. It is already wired to your observability, your deploys, and your secrets. Pointing an agent at it is not a downgrade from a hosted product. It is an upgrade, because nothing has been sandboxed away.


The bounds that keep it sane

An autonomous agent with live credentials and a webhook trigger is a great way to set money on fire if you are careless. Three bounds keep it safe.

Three bounds: concurrency capped at one, a hard daily cap on fix attempts, and draft PRs only so nothing merges itself

Concurrency is capped at one. Only one agent runs at a time, so there is no thundering herd when a deploy goes bad and twenty issues open at once. I let the worker enforce this rather than a lock file:

// One agent at a time. Concurrency is a worker property, not a lock I can forget.
const worker = await Worker.create({
  taskQueue: "auto-fix",
  maxConcurrentActivityTaskExecutions: 1,
})
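
If the job runner has no built-in concurrency setting, the same one-at-a-time property can be approximated in process with a promise chain. This is a sketch of the idea rather than a replacement for the worker setting above; `SerialQueue` and its counters are hypothetical names, and unlike a durable worker it loses its queue if the process restarts.

```typescript
// Runs queued tasks strictly one after another.
class SerialQueue {
  private tail: Promise<void> = Promise.resolve();
  running = 0; // tasks currently executing
  maxObserved = 0; // highest value `running` ever reached

  enqueue<T>(task: () => Promise<T>): Promise<T> {
    // Each task starts only after everything queued before it has settled.
    const result = this.tail.then(async () => {
      this.running += 1;
      this.maxObserved = Math.max(this.maxObserved, this.running);
      try {
        return await task();
      } finally {
        this.running -= 1;
      }
    });
    // Swallow failures on the chain itself so one failed run does not block the rest.
    this.tail = result.then(
      () => undefined,
      () => undefined,
    );
    return result;
  }
}
```

Callers still see each task's own result or error through the promise returned by `enqueue`; only the internal chain ignores failures.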

It is rate limited per day. There is a hard ceiling on how many fix attempts can run in twenty-four hours. A bad day cannot turn into a bad invoice:

// Hard ceiling per day. Surplus issues just wait; they do not pile up a bill.
const key = `auto-fix:${todayUtc()}`
const used = await redis.incr(key)
if (used === 1) await redis.expire(key, 60 * 60 * 24) // housekeeping only: the UTC date in the key is what resets the count
if (used > MAX_FIXES_PER_DAY) return { status: "rate_limited" }
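
The snippet above leaves `todayUtc` and `MAX_FIXES_PER_DAY` undefined. A minimal in-memory sketch of the same counter logic, useful for unit-testing the cap without Redis, might look like this. The cap value, the `Map` store, and `tryConsumeFixSlot` are assumptions made for illustration.

```typescript
const MAX_FIXES_PER_DAY = 3; // assumed value; pick whatever your budget allows

// UTC calendar date as "YYYY-MM-DD", so the counter key changes at midnight UTC.
function todayUtc(now: Date = new Date()): string {
  return now.toISOString().slice(0, 10);
}

// In-memory stand-in for the Redis counters, keyed by day.
const counters = new Map<string, number>();

// Mirrors "increment, then compare": returns true while the day's budget lasts.
function tryConsumeFixSlot(now: Date = new Date()): boolean {
  const key = `auto-fix:${todayUtc(now)}`;
  const used = (counters.get(key) ?? 0) + 1;
  counters.set(key, used);
  return used <= MAX_FIXES_PER_DAY;
}
```

Because the day is part of the key, a new UTC day starts from a fresh counter without any explicit reset.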

Every output is a draft PR. Nothing merges itself. The agent proposes, I dispose, and the blast radius of a wrong fix is a branch I close.
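
One way to make "draft only" hard to forget is to build the PR command in a single place. The sketch below assumes the GitHub CLI (`gh`) is what opens the PR, which the setup described here does not confirm; `draftPrArgs`, the branch naming, and the title and body wording are hypothetical.

```typescript
// Builds the argument list for `gh pr create`. The --draft flag is always present,
// so no code path can open a ready-for-review PR by accident.
function draftPrArgs(issueId: string, title: string, issueUrl: string): string[] {
  return [
    "pr",
    "create",
    "--draft",
    "--head",
    `auto-fix/sentry-${issueId}`, // hypothetical branch naming scheme
    "--title",
    `fix: ${title}`,
    "--body",
    `Automated fix attempt for ${issueUrl}. Needs human review before merge.`,
  ];
}
```

The resulting array can be passed to a process spawner as-is, which avoids shell quoting problems with error titles that contain quotes or backticks.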

The de-dupe key does double duty here. Once-per-issue means a flapping error cannot re-trigger the agent in a loop. The unique constraint is both the correctness guarantee and a cost control.


Why this is exactly how I operate

I do not think of errors as things to be notified about. I think of them as events that should trigger work. The same way a git push triggers CI/CD, a new issue should trigger a fix attempt. The dashboard is a fallback, not the primary interface.

Once you accept that framing, the marginal cost of a fix attempt collapses toward zero. I am not spending attention on triage. I am spending it reviewing a PR that already exists, written by an agent that already reproduced the bug with full access to my telemetry. The expensive, attention-heavy part (reproduce, gather context, form a hypothesis) is done by the time I look.

That is the compound move. I am not buying an incident-response product. I am wiring my own authenticated environment to an event stream and letting it take the first pass. Every error makes the system slightly better, and the cost of the next fix attempt keeps dropping. The factory gets faster, the products get cheaper to maintain, and my attention stays on the decisions that actually need a human.


When this applies, and when it does not

This works when you own your stack and your environment is reproducible locally. It works when a draft PR is an acceptable output and a human reviews before merge. It works when your error volume, grouped by issue, is low enough that one-per-issue with a daily cap covers it.

It does not replace on-call for genuine incidents, where the answer is “roll back now”, not “open a PR in twenty minutes”. It does not help with errors that need a canary account or production data the agent should never touch. And it is only as good as your observability: an agent debugging from thin telemetry is just guessing faster.

But for the long tail of real, reproducible bugs that would otherwise sit in a dashboard for a week, turning the error into an event that spawns an agent on my own machine is, I am now convinced, the way.

