Human in the Loop for AI Agents

Principle

Autonomy should be granted by risk tier, not by enthusiasm.

Why it matters

Human review is not a failure of automation. It is how production systems keep judgement where judgement belongs. The design work is deciding which actions need approval, what evidence the reviewer sees, how the agent resumes, and how the system avoids turning every task into a meeting.

Build this

Risk tiers for actions: safe read, reversible write, external send, destructive change, financial or legal impact.
Approval payloads that show intent, diff, evidence, blast radius, and rollback plan.
Resume logic so the agent continues from the approved decision instead of starting over.
Escalation paths for uncertainty, repeated failure, policy conflict, and user disagreement.
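
A minimal sketch of how the tiers and the approval payload above might be modelled. The names here (RiskTier, ApprovalPayload, PendingAction, route, APPROVAL_THRESHOLD) are illustrative assumptions, not part of any particular framework, and the threshold and failure limit are placeholder policy choices.

```python
from dataclasses import dataclass
from enum import IntEnum


class RiskTier(IntEnum):
    """The five tiers listed above, ordered from least to most risky."""
    SAFE_READ = 0
    REVERSIBLE_WRITE = 1
    EXTERNAL_SEND = 2
    DESTRUCTIVE_CHANGE = 3
    FINANCIAL_OR_LEGAL = 4


# Assumption: anything at or above this tier waits for a human decision.
APPROVAL_THRESHOLD = RiskTier.EXTERNAL_SEND


@dataclass(frozen=True)
class ApprovalPayload:
    """What the reviewer sees before deciding."""
    intent: str
    diff: str
    evidence: tuple[str, ...]
    blast_radius: str
    rollback_plan: str


@dataclass
class PendingAction:
    name: str
    tier: RiskTier
    payload: ApprovalPayload
    failed_attempts: int = 0


def route(action: PendingAction, max_failures: int = 2) -> str:
    """Return 'escalate', 'needs_approval', or 'auto' for an action."""
    # Repeated failure is an escalation path regardless of tier.
    if action.failed_attempts >= max_failures:
        return "escalate"
    if action.tier >= APPROVAL_THRESHOLD:
        return "needs_approval"
    return "auto"
```

Under these assumptions a safe read runs without review, an external send waits for approval, and any action that has already failed twice is escalated instead of retried.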

Watch for

Review prompts that ask for approval without showing what will actually happen.
Humans approving too many low-risk steps and missing the important ones.
Agents continuing after a rejection without understanding the reason.
Approval decisions missing from the audit trail.

Proof it works

Dangerous actions cannot execute without the required approval record.
A rejected action changes the next agent step, not just the UI state.
Reviewers can understand the request without reading raw traces.
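
The first two checks above can be expressed as executable tests. The sketch below assumes a hypothetical Gate class and next_step function; it shows one possible shape for the guarantee, not a reference implementation.

```python
from dataclasses import dataclass
from typing import Callable


class ApprovalRequired(Exception):
    """Raised when a dangerous action has no approving record."""


@dataclass(frozen=True)
class Decision:
    action_id: str
    approved: bool
    reason: str = ""


class Gate:
    """Holds decisions and refuses dangerous actions that lack approval."""

    def __init__(self) -> None:
        self._decisions: dict[str, Decision] = {}

    def record(self, decision: Decision) -> None:
        self._decisions[decision.action_id] = decision

    def execute(self, action_id: str, dangerous: bool,
                run: Callable[[], str]) -> str:
        decision = self._decisions.get(action_id)
        # A missing record and a rejection are treated the same: no execution.
        if dangerous and (decision is None or not decision.approved):
            raise ApprovalRequired(action_id)
        return run()


def next_step(decision: Decision) -> str:
    """Turn a decision into the agent's next step, not just a UI state."""
    if decision.approved:
        return "execute"
    if decision.reason:
        return f"replan: {decision.reason}"
    # Without a reason the agent cannot adapt, so it asks instead of retrying.
    return "ask_reviewer_for_reason"
```

In this sketch the enforcement lives at the execution boundary rather than in the approval UI, so an action cannot slip through because a screen was skipped.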

Implementation checklist

Define risk tiers before building the approval UI.
Use diffs, previews, and linked evidence rather than generic confirmation text.
Capture who approved, what they approved, when, and against which artifact version.
Measure approval burden so guardrails do not become noise.
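
One way the last two checklist items might look in code. This is a sketch under stated assumptions: the record fields, the content-hash versioning in version_of, and the low_risk_share metric are illustrative choices, and the tier names are hypothetical string labels.

```python
import hashlib
from dataclasses import dataclass
from datetime import datetime, timezone


def version_of(artifact: str) -> str:
    """Content hash, so a record pins the exact artifact that was reviewed."""
    return hashlib.sha256(artifact.encode("utf-8")).hexdigest()[:12]


@dataclass(frozen=True)
class ApprovalRecord:
    approver: str          # who approved
    action: str            # what they approved
    tier: str
    approved: bool
    artifact_version: str  # against which artifact version
    decided_at: datetime   # when


def record_decision(approver: str, action: str, tier: str,
                    approved: bool, artifact: str) -> ApprovalRecord:
    return ApprovalRecord(approver, action, tier, approved,
                          version_of(artifact), datetime.now(timezone.utc))


def still_valid(record: ApprovalRecord, current_artifact: str) -> bool:
    """An approval only covers the version the reviewer actually saw."""
    return record.approved and record.artifact_version == version_of(current_artifact)


def low_risk_share(
    records: list[ApprovalRecord],
    low_risk_tiers: frozenset[str] = frozenset({"safe_read", "reversible_write"}),
) -> float:
    """Fraction of review requests spent on low-risk tiers (approval burden)."""
    if not records:
        return 0.0
    low = sum(1 for r in records if r.tier in low_risk_tiers)
    return low / len(records)
```

A rising low_risk_share is one possible signal that reviewers are being asked about steps that do not need them, which is when the important approvals start to get missed.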

Related dictionary terms

Human in the loop
Permission mode
Permission request

Put human review where it changes outcomes