Closed-Loop Agent Observability: IDs + Injected Prompts

James Phoenix

The typical way to improve observability for a local coding agent while it generates features is to add logging and hope the agent reads it after the fact. That is backwards. The better pattern is to inject correlation IDs and operator prompts at the exact moment state is created, so the agent can close the loop itself without a human in the middle.

I ran into this while debugging TikTok canary runs. The script had posted a video and was silently polling. When I queried the staging API manually, the IDs had been there all along: postId, scheduledPostId, workflowId. The agent running that script had no way to know those IDs existed until the whole cell finished. That is the failure mode: the agent is blind during the most interesting window.

The Two-Part Pattern

Most implementations get one of these right. The compound pattern needs both.

Part 1: Emit correlation IDs immediately, not at the end. As soon as the system creates a resource, print the ID. Do not wait for terminal state. The moment post_scheduled fires, the script should print:

created postId=017936e4-...
scheduled scheduledPostId=979a0647-... workflowId=social-publish-979a0647-...

These IDs are the handles. Everything downstream (Temporal, Cloud Logging, API polling) pivots on them.
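Because these lines use a plain key=value shape, anything downstream can recover the handles mechanically. Here is a minimal sketch, assuming IDs never contain whitespace; parseIdLine is a hypothetical helper for illustration, not part of the canary script:

```typescript
// Hypothetical helper: extract key=value correlation IDs from one stdout line.
// Assumes values contain no whitespace, as in the lines shown above.
const parseIdLine = (line: string): Record<string, string> => {
  const found: Record<string, string> = {}
  const pattern = /(\w+)=(\S+)/g
  let match: RegExpExecArray | null
  while ((match = pattern.exec(line)) !== null) {
    const [, key, value] = match
    if (key !== undefined && value !== undefined) {
      found[key] = value
    }
  }
  return found
}

const parsedIds = parseIdLine('scheduled scheduledPostId=abc-123 workflowId=social-publish-abc-123')
console.log(parsedIds.scheduledPostId, parsedIds.workflowId)
```

A coding agent can read the lines directly, so this mostly matters when a wrapper script wants to act on the same IDs.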

Part 2: Inject a contextual prompt alongside the IDs. This is the part most implementations miss. Alongside those IDs, the script should emit exact CLI commands the agent can run right now:

temporal workflow describe --workflow-id social-publish-979a0647-... --namespace staging --address 100.123.72.113:7233 --output json
gcloud logging read 'timestamp>="2026-06-26T10:07:00Z" AND labels."context.scheduledPostId"="979a0647-..."' --project octospark-staging --limit=100 --format=json

These are not documentation. They are a prompt dynamically injected into the agent’s stdout at the precise moment the relevant state exists. The agent does not have to know the Temporal address, the gcloud project, or the label schema. It gets told, right there, what tools it has and what commands to run.

Why This Changes the Loop

Without this pattern, a coding agent running a canary is in a passive role. It fires the mutation, waits N seconds, reads a result, and makes a binary pass/fail call. If something goes wrong mid-flight, it cannot investigate. It has no handles.

With this pattern, the agent becomes active. When a cell stalls, it can run the Temporal describe command, read the event history, see that the worker is not polling the task queue, and conclude: worker is down, not provider error. That is a qualitatively different diagnosis from “timed out.”

The key insight is that you are not just giving richer output. You are giving the agent its next move. The prompt injection is the bridge between “I created something” and “here is what to do if it misbehaves.”

[Figure: Closed-loop agent harness: run, emit IDs, poll, branch on pass/fail]

The Concrete Value: Infrastructure vs Provider Disambiguation

This came up immediately in practice. A TikTok canary cell stalled. The naive read was: TikTok OAuth is broken, or the content posting scope is missing, or the provider is rate-limiting.

Because the canary emitted the workflowId the moment post_scheduled fired, I could run the injected Temporal command and check task queue pollers. The result: zero pollers on tx-agent-kit-staging. The staging worker was crash-looping due to a TLS handshake failure against Temporal Cloud, completely unrelated to TikTok.

Without the IDs, I would have debugged the wrong layer for an unknown amount of time. With them, the diagnosis took one command. The failure was infrastructure, not provider. That distinction matters enormously when you are deciding whether to fix OAuth scopes, restart a worker, or rotate a certificate.
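The triage above can be written down as a small decision function. This is only a sketch under stated assumptions: the poller count is assumed to have been read already from the task queue describe output (the JSON shape is not shown here, so the parsing is left out), errorCategory is assumed to signal a provider-side error, and classifyStall and its types are hypothetical names.

```typescript
// Hypothetical triage helper. Inputs are assumed to have been collected already
// from the injected commands and the last poll observation.
type StallEvidence = {
  pollerCount: number // pollers seen on the task queue
  errorCategory?: string // from the last status_polled observation, if any
}

type StallDiagnosis = {
  layer: 'infrastructure' | 'provider' | 'unknown'
  reason: string
}

const classifyStall = (evidence: StallEvidence): StallDiagnosis => {
  // No pollers means nothing can pick up the workflow, whatever the provider does.
  if (evidence.pollerCount === 0) {
    return { layer: 'infrastructure', reason: 'no workers polling the task queue' }
  }
  // Assumption: an errorCategory on the observation points at the provider.
  if (evidence.errorCategory) {
    return { layer: 'provider', reason: `provider reported ${evidence.errorCategory}` }
  }
  return { layer: 'unknown', reason: 'workers are polling and no error was reported yet' }
}

console.log(classifyStall({ pollerCount: 0 }))
```

Checking pollers first is deliberate: it is the cheapest check, and it rules out an entire layer before anyone touches OAuth scopes.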

This is the compounding return on the pattern. Every canary run that emits IDs gives you faster triage on the next failure, regardless of which layer the failure lives in.

Code Shape

The implementation is small. Three pieces.

1. Lifecycle events fired from the seam, not the poller.

type CanaryLifecycleEvent =
  | { kind: 'post_created'; postId: string }
  | { kind: 'post_scheduled'; postId: string; scheduledPostId: string; workflowId?: string; socialAccountId: string }
  | { kind: 'status_polled'; observation: { status: string; platformPostId?: string; errorCategory?: string } }

// Pass into the seam at construction time
const seam = makeStagingCanarySeam({
  socialAccountId: accountId,
  onLifecycleEvent: (event) => lifecycleLogger?.(event),
})

2. The observabilityHints function: parameterized, exact, ready to copy-paste.

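// Assumed shape, inferred from the ids object the lifecycle logger builds; adjust to your own type.
type CanaryCorrelationIds = {
  postId?: string
  scheduledPostId?: string
  workflowId?: string
  socialAccountId?: string
}

// Staging values as they appear in the commands quoted earlier; in practice these would come from config.
const TEMPORAL_TASK_QUEUE = 'tx-agent-kit-staging'
const TEMPORAL_NAMESPACE = 'staging'
const TEMPORAL_ADDRESS = '100.123.72.113:7233'
const GCLOUD_PROJECT = 'octospark-staging'

const observabilityHints = (ids: CanaryCorrelationIds): string[] => {
  const hints: string[] = []

  // Always emitted: shows whether any worker is polling the task queue.
  hints.push(
    `temporal task-queue describe --task-queue ${TEMPORAL_TASK_QUEUE} ` +
    `--namespace ${TEMPORAL_NAMESPACE} --address ${TEMPORAL_ADDRESS} --output json`
  )

  // Only possible once a workflow ID exists.
  if (ids.workflowId) {
    hints.push(
      `temporal workflow describe --workflow-id ${ids.workflowId} ` +
      `--namespace ${TEMPORAL_NAMESPACE} --address ${TEMPORAL_ADDRESS} --output json`
    )
  }

  // Look back 30 minutes so the log query covers the whole run.
  const windowStart = new Date(Date.now() - 30 * 60_000).toISOString()
  const terms = [
    ids.postId && `labels."context.postId"="${ids.postId}"`,
    ids.scheduledPostId && `labels."context.scheduledPostId"="${ids.scheduledPostId}"`,
    ids.workflowId && `jsonPayload.workflowId="${ids.workflowId}"`,
  ].filter((term): term is string => Boolean(term))

  if (terms.length > 0) {
    const filter = `timestamp>="${windowStart}" AND (${terms.join(' OR ')})`
    hints.push(
      `gcloud logging read '${filter}' ` +
      `--project ${GCLOUD_PROJECT} --limit=100 --format=json`
    )
  }

  return hints
}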

3. The lifecycle logger: prints IDs and hints as soon as they exist, not at the end.

const makeLifecycleLogger = (label: string) => {
  return (event: CanaryLifecycleEvent): void => {
    if (event.kind === 'post_created') {
      console.log(`  created postId=${event.postId}`)
    }
    if (event.kind === 'post_scheduled') {
      const ids = { postId: event.postId, scheduledPostId: event.scheduledPostId, workflowId: event.workflowId, socialAccountId: event.socialAccountId }
      console.log(`  scheduled scheduledPostId=${event.scheduledPostId}${event.workflowId ? ` workflowId=${event.workflowId}` : ''}`)
      console.log('  ids:', JSON.stringify(ids))
      console.log('  observability:')
      for (const hint of observabilityHints(ids)) {
        console.log(`    ${hint}`)
      }
    }
    if (event.kind === 'status_polled') {
      const o = event.observation
      console.log(`  poll ${label}: status=${o.status}${o.platformPostId ? ` platformPostId=${o.platformPostId}` : ''}${o.errorCategory ? ` errorCategory=${o.errorCategory}` : ''}`)
    }
  }
}

The key constraint: observabilityHints must be called inside the lifecycle logger on post_scheduled, not after scheduleCanaryCell returns. By the time the function returns, the interesting window has already closed.
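To make the timing concrete, here is a rough sketch of what "fired from the seam" could look like. The real makeStagingCanarySeam is not shown in this article, so makeSketchSeam, SketchApi, and the fake client below are all hypothetical stand-ins.

```typescript
// Trimmed copy of the event type above, so the sketch is self-contained.
type SketchEvent =
  | { kind: 'post_created'; postId: string }
  | { kind: 'post_scheduled'; postId: string; scheduledPostId: string; workflowId?: string }

// Hypothetical API client; the real staging client is not shown here.
type SketchApi = {
  createPost: () => Promise<{ postId: string }>
  schedulePost: (postId: string) => Promise<{ scheduledPostId: string; workflowId?: string }>
  waitForTerminalStatus: (scheduledPostId: string) => Promise<string>
}

const makeSketchSeam = (api: SketchApi, onLifecycleEvent: (e: SketchEvent) => void) => ({
  runCell: async (): Promise<string> => {
    const { postId } = await api.createPost()
    onLifecycleEvent({ kind: 'post_created', postId }) // emitted before scheduling starts

    const scheduled = await api.schedulePost(postId)
    onLifecycleEvent({ kind: 'post_scheduled', postId, ...scheduled }) // emitted before polling starts

    // Polling can take minutes or stall; the IDs are already on stdout by now.
    return api.waitForTerminalStatus(scheduled.scheduledPostId)
  },
})

// Fake client that records when polling begins, to show the ordering.
const order: string[] = []
const fakeApi: SketchApi = {
  createPost: async () => ({ postId: 'p-1' }),
  schedulePost: async () => ({ scheduledPostId: 's-1', workflowId: 'w-1' }),
  waitForTerminalStatus: async () => {
    order.push('poll_started')
    return 'published'
  },
}

const seamRun = makeSketchSeam(fakeApi, (e) => order.push(e.kind)).runCell()
seamRun.then(() => console.log(order.join(' -> ')))
// prints "post_created -> post_scheduled -> poll_started"
```

Both events land before the first poll, which is exactly the window where a stalled cell would otherwise leave the agent with nothing to act on.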

The Same Pattern in Test Harnesses

I use a version of this in Vitest setup hooks. When a test suite sets up integration state (seeding a DB, starting a fake provider sidecar), the setup hook can print the connection string, the seed run ID, or the server port alongside the exact query to inspect state:

psql postgres://localhost:5432/test_db -c "SELECT * FROM scheduled_posts WHERE id = '979a0647-...'"

A failing test now has an immediate diagnostic path. The agent does not need to re-read the test setup file to figure out what database to query.

The pattern generalizes: any time a harness creates ephemeral state that might need inspection, emit the correlation ID and the tool command in the same stdout breath.
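A small helper keeps that inspection command consistent across suites. This is a sketch, assuming the table name and ID are trusted values generated by the harness itself; psqlInspectHint is a hypothetical name, and you would call it from whichever setup hook seeds the state.

```typescript
// Hypothetical helper: build the inspection command for a row the harness just seeded.
// Table and id are assumed to be trusted, harness-generated values.
const psqlInspectHint = (connectionString: string, table: string, id: string): string => {
  const escapedId = id.replace(/'/g, "''") // double single quotes for the SQL literal
  return `psql ${connectionString} -c "SELECT * FROM ${table} WHERE id = '${escapedId}'"`
}

console.log(psqlInspectHint('postgres://localhost:5432/test_db', 'scheduled_posts', 'seed-row-1'))
```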

Diagnosing Missing Infrastructure

A useful extension is to detect when the tool itself is unreachable and say so explicitly, rather than silently timing out. In the TikTok run, Temporal was unreachable from the local machine. The canary script should catch that connection failure and print:

WARN: temporal unreachable at 100.123.72.113:7233 (context deadline exceeded)
      Temporal CLI commands above are logged for when you have network access.
      Use gcloud logging as the fallback.

This is still a prompt injection. It redirects the agent to the fallback tool rather than leaving it with a dead command.
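One way to structure this is a probe wrapper that turns a connection failure into extra prompt lines. The sketch below assumes the caller supplies the actual reachability probe (for example, a short-timeout connection attempt); ToolTarget, unreachableWarning, and checkTool are hypothetical names.

```typescript
type ToolTarget = { tool: string; address: string; fallback: string }

// Format the warning as extra prompt lines; wording mirrors the example above.
const unreachableWarning = (target: ToolTarget, reason: string): string[] => [
  `WARN: ${target.tool} unreachable at ${target.address} (${reason})`,
  `      ${target.tool} CLI commands above are logged for when you have network access.`,
  `      Use ${target.fallback} as the fallback.`,
]

// Run a caller-supplied reachability probe; return warning lines only on failure.
const checkTool = async (target: ToolTarget, probe: () => Promise<void>): Promise<string[]> => {
  try {
    await probe()
    return []
  } catch (err) {
    const reason = err instanceof Error ? err.message : String(err)
    return unreachableWarning(target, reason)
  }
}

// Usage with a probe that always fails, standing in for a real connection attempt.
const temporalTarget: ToolTarget = {
  tool: 'temporal',
  address: '100.123.72.113:7233',
  fallback: 'gcloud logging',
}
checkTool(temporalTarget, async () => {
  throw new Error('context deadline exceeded')
}).then((lines) => lines.forEach((line) => console.log(line)))
```

Returning lines instead of printing inside checkTool lets the lifecycle logger decide where the warning sits relative to the hints.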

Broader Principle

Every place a harness creates state that an agent might need to debug, that harness should emit the tool commands for debugging that state. Not as documentation in a README. Not as a generic “you can use Temporal CLI.” As exact, parameterized, copy-paste-ready commands injected at the moment the relevant IDs exist.

This is what makes a closed-loop harness genuinely closed. The agent gets the handle and the next move in the same message. It does not have to reason about what commands exist or how to parameterize them. It can act.

The compound effect: over time you build a harness that is self-diagnosing. Each new provider, each new workflow type, each new failure mode adds one more observabilityHints branch. The agent that runs canaries in six months will have a richer set of diagnostic moves than the one running them today, without any change to the agent itself.
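If the branches start to pile up, one option is to turn each one into a small provider function and collect them from a list. This is a swapped-in structure for illustration, not how observabilityHints is written above; HintProvider, collectHints, and both example providers are hypothetical, and the commands reuse the ones quoted earlier in simplified form.

```typescript
type CorrelationIds = { workflowId?: string; scheduledPostId?: string }

// Each provider returns zero or more ready-to-run commands for the IDs it understands.
type HintProvider = (ids: CorrelationIds) => string[]

const temporalHints: HintProvider = (ids) =>
  ids.workflowId
    ? [`temporal workflow describe --workflow-id ${ids.workflowId} --namespace staging --output json`]
    : []

const databaseHints: HintProvider = (ids) =>
  ids.scheduledPostId
    ? [`psql postgres://localhost:5432/test_db -c "SELECT * FROM scheduled_posts WHERE id = '${ids.scheduledPostId}'"`]
    : []

// Supporting a new failure mode means appending one provider to this list.
const hintProviders: HintProvider[] = [temporalHints, databaseHints]

const collectHints = (ids: CorrelationIds, providers: HintProvider[] = hintProviders): string[] =>
  providers.reduce<string[]>((all, provider) => all.concat(provider(ids)), [])

for (const hint of collectHints({ workflowId: 'social-publish-example', scheduledPostId: 'example-id' })) {
  console.log(hint)
}
```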
