Computer Use Kills the Config Tax, Not the Trust Tax

James Phoenix

My sister hates job applications because they make her re-submit information she already has. That is the same pain as API app review, and the same agent that lives in my codebase can dissolve both. This feels insane, and it is the new default shape of the work.

Author: James Phoenix | Date: June 2026

It Started With My Sister

My sister was applying for jobs last month and said something that stuck with me. The part she hated was not the rejection or the competition. It was that every application made her submit information she already had. Her CV exists. Her work history is written down. But each company’s portal makes her re-type it into a slightly different set of boxes, re-attach the same document, re-enter the same dates, answer the same questions in a new shape. The information is finished. The act of moving it by hand into one more bespoke form is the pain.

I sat with that, because I had been doing the exact same thing all week without naming it. I was onboarding OctoSpark to six posting platforms: X, TikTok, Instagram, Facebook, YouTube, and LinkedIn. Each one needs a registered app, an OAuth client, the right scopes, redirect URIs for staging and prod, and eventually a production review. And it hit me that app review and OAuth setup is the same shape as my sister’s job applications. My codebase already knows the redirect URIs. It already knows the scopes, and which OAuth client is which. I was being the human clipboard between a source of truth that already existed and a dumb form that would not read it.

That is the universal pain: re-entering information you already have. And it is exactly the pain Claude Code or Codex dissolves, because a coding agent can hold the canonical source and fill the form itself. So I ran the experiment. I pointed Claude Code at my real browser via the Playwright CLI and walked through Google first, one platform at a time, to see how much of “submit what you already have” a machine could carry.

Two Costs, And Only One Needs Me Watching

What I found is that provider onboarding is two different costs wearing one coat.

Two costs, one coat: the config tax is re-entering what I already have and is safe to run unwatched, the trust tax is proving who I am and pulls me in

The config tax is the re-entry tax. Create a project. Enable an API. Paste four scopes. Add two redirect URIs. Register a test user. Create an OAuth client and copy the ID and secret into a secrets manager. Then do all of it again for prod, and again for the next platform. None of it is hard. Every value already exists somewhere, in my repo or my head, and the work is purely transcription: moving known information into yet another form. This is my sister’s pain, and it is where the hours go.

The trust tax is human-gated on purpose. Business verification with legal documents. App review with a screencast of the exact scope in use. Google’s separate audits for sensitive scopes like youtube.upload. TikTok’s posting audit. LinkedIn’s partner products that small apps routinely get rejected from. This is not transcription. It is a reviewer making a yes or no call, and several steps cannot be automated even in principle.

The reason the workflow works is that those two costs map almost perfectly onto “safe to run unwatched” and “needs a human.” The config tax is deterministic transcription against known-good values, so an agent can grind it with no one watching. The trust tax is exactly the set of moments where a human must intervene anyway. The boundary I have to police is not arbitrary. The work tells me where it is.

Claude Code or Codex Is Inside My Setup, Not Driving a Browser From Outside

The reason I can leave it alone is that this is not a generic browser bot poking at a portal blind. It is Claude Code or Codex, running on my machine, in my repo, with my tools. It already holds the information the forms are begging me to re-type. I ran this round with Claude Code, but nothing about it is Claude-specific. Any coding agent that lives in the terminal with my files and secrets fits the same slot.

When the Google console asked for a redirect URI, the agent did not guess and did not ask me. It read packages/contracts/src/provider-scopes.ts and the worker env config, found that the code expects a separate YOUTUBE_OAUTH_CLIENT_ID distinct from the existing GOOGLE_OIDC_CLIENT_ID, and pulled https://api-staging.octospark.ai/v1/oauth/youtube/callback straight out of the host constants. It filled the form from my source of truth, not from my memory.

That is one agent holding four surfaces at once:

The codebase, which already contains what the portal is asking for (scopes, callbacks, which client is which).
The browser, where it does the transcription against my live, authenticated session.
The shell and 1Password CLI, where it checks which secrets already exist and writes the new client ID and secret back to where the code expects them.
The live portal state, which it reconciles against the code. The project already existed. The login client existed. The YouTube client did not. It discovered all of that from the page and only created the missing piece.

A human doing this is the integration glue between four tools, tabbing between a portal, a repo, a password manager, and a terminal, fat-fingering a scope string at 11pm. The agent is the glue. It reads the canonical value, drives the portal, captures the credential, files it correctly, and never asks me to re-enter what already exists. That is the part that feels insane. The same process owns the code, the secrets, the shell, and the browser, so the gap between “what the form wants” and “what I already have” simply closes.

This is the exact thing my sister needs. An agent that holds her CV as the source of truth and fills each portal from it, so she never re-types her own history again.

The human clipboard, dissolved: source of truth flows through the agent into the provider forms unwatched, and only the gates escalate to me

How I Actually Drive It

The whole thing runs on the Playwright CLI, and the setup is deliberately boring. I launch one real, headed Chrome with a persistent profile and log into the four accounts by hand, once. That session sticks, so every later command reuses my authenticated cookies. No login automation, no captchas, no 2FA fights. I did the human part; the agent does the robot part.

# 1. Launch a real Chrome on a persistent profile, then log in by hand once.
#    The named session keeps my cookies alive for every command that follows.
playwright-cli open --session octospark

# 2. From here it is a tight loop the agent runs on its own:
#    go to a page, read the accessibility tree, act on a referenced element.
playwright-cli -s=octospark goto "https://console.cloud.google.com/auth/clients?project=octospark-staging"
playwright-cli -s=octospark snapshot       # returns labelled refs: button "Enable" [ref=e193] ...
playwright-cli -s=octospark click e193     # act on the element by its ref, not by pixel coords
playwright-cli -s=octospark type e210 "https://api-staging.octospark.ai/v1/oauth/youtube/callback"

The important detail is snapshot. It does not hand the agent a screenshot to squint at. It returns a structured accessibility tree where every actionable element has a stable ref, so the agent reasons over labels like button "Enable" instead of guessing pixel coordinates. That is what makes the loop reliable enough to leave alone: read the tree, narrate the next click, act on a ref, snapshot again. When it reaches a consent grant or a billing page, it stops and hands the same session back to me. I take over in the exact window it was driving, do the sensitive step, and let it pick back up.

I Supervise the Exceptions, Not the Work

The config tax was never expensive in wall-clock time. It was expensive in continuous attention. Re-typing an OAuth setup takes twenty minutes, but it is twenty minutes of you, fully present, unable to do anything else. Six platforms times two environments and your afternoon is gone, not because the work is long but because it is pinned to your eyes.

Computer use converts that continuous attention into intermittent attention, and that is the actual unlock.

While the agent enabled the YouTube Data API, added scopes, registered a test user, and created the client, I was not watching it. I was writing this, answering messages, and lining up the next platform’s review materials. The agent works the queue and taps me on the shoulder only when it hits a locked door. I hold the keys. It does the walking.

Leanpub Book

Read The Meta-Engineer

A practical book on building autonomous AI systems with Claude Code, context engineering, verification loops, and production harnesses.

Continuously updated

Claude Code + agentic systems

View Book

Two things make walking away safe rather than reckless:

The work itself is the boundary. The transcription steps cannot do damage with my accounts because they are deterministic config against known values. The dangerous steps, consent grants and billing and 2FA, are exactly the ones it is built to stop at. So “leave it running” and “keep the keyboard for sensitive actions” are the same rule, not two competing ones.
It narrates before it acts. Claude snapshots the page, tells me what it is about to click, then clicks. Because of that, I can context-switch away and trust it to halt at a consent screen rather than barrel through a payment page. The narration is what makes the supervision event-driven instead of time-driven. I am the escalation path, not the operator.

The human-in-the-loop stops being a person sitting in the loop and becomes a person the loop calls. One pins your whole afternoon. The other costs you four thirty-second decisions spread across an hour while you do real work.

Where It Actually Pulled Me In

The handoffs were small and crisp every time.

I kept staging in Testing mode, so the sensitive YouTube scopes work for a named test user with zero verification. That is a real shortcut around the trust tax, not through it: the moment prod needs real users, the audit is waiting, and no amount of Playwright makes a reviewer approve faster. So the agent flagged it and moved on rather than pretending it could close that gap.

A friction exposed the seam: I tried to add james@justunderstandingdata.com as a test user and Google rejected it because that address is not a Google identity. The agent did not stall or hallucinate a fix. It surfaced the exact error, dropped the bad address, kept the valid one, and asked me a one-line question. That is the texture of good escalation. It does not hand me “it didn’t work.” It hands me a decision with the context already gathered.

And the line I held throughout: consent grants, 2FA, password re-entry, and anything touching billing are mine. I am not giving an autonomous loop unsupervised control of my primary identity across every logged-in tab. The arrangement only works because the agent does the transcription between those gates and stops cleanly at each one.

The Real Collapse Is Attention

The cost that collapses here is not clock time, it is attention, and attention is the scarcer resource. The config tax is a large, multiplicative, deeply boring matrix of information I already have: six platforms, two environments, a handful of scopes and clients each. That matrix is exactly what an agent with my codebase, my secrets, and my browser is good at, and exactly what I am bad at after the third identical client.

So the right mental model is not “automate provider onboarding.” It is:

Run the re-entry unwatched. Real session, repo as source of truth, secrets written back automatically. A pinned afternoon per platform becomes a background process I glance at.
Schedule the trust tax early. The verifications and audits are calendar-bound. Kick them off on day one and let them run while the transcription happens.
Spend my attention only at the gates. Account choices, ambiguous decisions, consent, billing. Four small decisions, not four hours.

Computer use did not make platform onboarding free. It made the boring 80%, the part that is just submitting what I already have, require none of my attention, and put a bright line around the 20% that was always going to need a human.

The Meta

This is why my sister’s complaint and my week converged. Almost every “go set up the third-party thing” task is the same two-part shape: a large surface that is pure re-entry of information you already have, safe to run unattended, and a smaller trust surface that is gated on purpose and pulls you in. Job applications, API onboarding, KYC forms, insurance, tax portals. The drudgery is always being the human clipboard between a canonical source and a form that refuses to read it.

The leverage is not a browser bot. It is an agent that lives inside your setup, the code or the CV, the secrets, the shell, and the browser at once, transcribes the known information with no one watching, and calls you by name at the exact moments a human is required. The default shape of this work used to be a person pinned to a screen re-typing things they already had. Now it is a person doing real work while a process carries the re-entry and escalates the irreducible decisions.

I started this to onboard six API providers. I finished it thinking I should build the same thing for my sister.

Computer Use Kills the Config Tax, Not the Trust Tax

It Started With My Sister

Two Costs, And Only One Needs Me Watching

Claude Code or Codex Is Inside My Setup, Not Driving a Browser From Outside

How I Actually Drive It

I Supervise the Exceptions, Not the Work

Read The Meta-Engineer

Where It Actually Pulled Me In

The Real Collapse Is Attention

The Meta

Related

Become a better AI engineer

More Insights

Seniority Was a Proxy for Typing Speed

Which Code Is Allowed to Be Understood by Nobody