Your Own Life Is a Queryable, Validated Corpus

James Phoenix

Your private data exhaust deserves the same treatment as production data: indexed, validated, version-controlled, and queried by an agent. Once you make that move, writing a song, paying a tax bill, and updating a CV all become the same engineering problem.

Author: James Phoenix | Date: June 2026

Treat Your Life Like Production Data

Most people treat their personal data as something that happens to them. Messages pile up, tax numbers live in a portal, a CV rots in a Downloads folder. I treat all of it the way I treat production data: as a corpus an agent can query, held to the same discipline I would apply to anything that ships. Index it, validate it, version it, and let the agent read it. Once you make that move, three unrelated chores turn into the same engineering problem.

The Messages Database Is Just SQLite

I wanted to write a song and I wanted it to sound like a real relationship, not a greeting card. The most authentic source of phrasing I have is years of my own messages, so I pointed the agent at them.

macOS keeps every iMessage in a plain SQLite file at ~/Library/Messages/chat.db. Once your terminal has been granted Full Disk Access, you can open it directly:

sqlite3 ~/Library/Messages/chat.db \
  "SELECT h.id, COUNT(m.ROWID) AS msg_count
   FROM handle h
   LEFT JOIN message m ON m.handle_id = h.ROWID
   GROUP BY h.id ORDER BY msg_count DESC LIMIT 10"

That query is doing identity resolution. I never had to tell the agent which contact to read. The handle with the highest message count, by a wide margin, is the person I talk to most, so ranking by msg_count is enough to pick the right thread automatically. That is the first lesson: in personal corpora, identity is a statistic, not a lookup.
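The "wide margin" can be made an explicit check, so the agent only auto-selects a thread when one handle clearly dominates. What follows is a minimal sketch, not the workflow described above: it runs against an in-memory database that mimics only the columns the query touches, and the names `pick_primary_handle` and `demo_connection`, the 2x ratio, and the sample handles are all assumptions made for illustration.

```python
import sqlite3

# Same ranking as the shell query, limited to the top two handles.
RANK_SQL = """
SELECT h.id, COUNT(m.ROWID) AS msg_count
FROM handle h
LEFT JOIN message m ON m.handle_id = h.ROWID
GROUP BY h.id
ORDER BY msg_count DESC
LIMIT 2
"""


def pick_primary_handle(conn, min_ratio=2.0):
    """Return the busiest handle id, or None if it does not clearly dominate.

    min_ratio is a hypothetical threshold: the top handle must have at
    least min_ratio times the messages of the runner-up.
    """
    rows = conn.execute(RANK_SQL).fetchall()
    if not rows or rows[0][1] == 0:
        return None
    if len(rows) == 1:
        return rows[0][0]
    (top_id, top_count), (_, runner_up) = rows
    if top_count >= min_ratio * max(runner_up, 1):
        return top_id
    return None


def demo_connection():
    """Build a tiny stand-in for chat.db (illustrative schema and data only)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE handle (ROWID INTEGER PRIMARY KEY, id TEXT);
        CREATE TABLE message (ROWID INTEGER PRIMARY KEY, handle_id INTEGER, text TEXT);
        INSERT INTO handle (ROWID, id) VALUES (1, '+15550001'), (2, 'friend@example.com');
        """
    )
    conn.executemany(
        "INSERT INTO message (handle_id, text) VALUES (?, ?)",
        [(1, "hi")] * 9 + [(2, "yo")] * 2,
    )
    return conn
```

Returning None when the lead is narrow forces a human choice instead of a silent guess, which matters because one person can appear under several handles (a phone number and an email address, for example).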

Then the trap. A naive SELECT text FROM message comes back mostly empty:

sqlite3 ~/Library/Messages/chat.db \
  "SELECT COUNT(*),
          COUNT(text),
          COUNT(CASE WHEN text IS NOT NULL AND length(text) > 0 THEN 1 END)
   FROM message"

The three counts do not match: total rows, rows with a non-null text, and rows with a non-empty text all diverge. On modern macOS the message body is missing from the text column for a large fraction of rows. It is serialized into attributedBody, an archived NSAttributedString stored as a binary blob, and the plain text field is left null. If you stop at SELECT text you silently lose most of your data and never know it. You have to decode the blob, which means a small Python pass over the rows rather than pure SQL. The date column is its own puzzle: recent macOS versions store nanoseconds since the 2001-01-01 UTC epoch (older versions stored seconds), so every timestamp needs m.date/1000000000 + strftime('%s','2001-01-01') before it means anything as Unix time.
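Here is a sketch of what that Python pass might look like. The timestamp conversion is plain arithmetic on the 2001 epoch. The blob decoder is a heuristic, not a parser: it assumes the text follows the `NSString` class name after a 5-byte preamble and a length prefix, a layout that is observed rather than documented. Treat that layout, and the function names, as assumptions, and spot-check decoded rows against the Messages app.

```python
from datetime import datetime, timedelta, timezone

# Apple's reference date: 2001-01-01 00:00:00 UTC.
APPLE_EPOCH = datetime(2001, 1, 1, tzinfo=timezone.utc)


def apple_ns_to_datetime(raw):
    """Convert a message.date value in nanoseconds since the Apple epoch."""
    return APPLE_EPOCH + timedelta(microseconds=raw // 1000)


def text_from_attributed_body(blob):
    """Heuristically extract the message string from an attributedBody blob.

    ASSUMPTION: the string follows the b"NSString" marker, a 5-byte
    preamble, and a length prefix. A full typedstream parser would be
    more robust than this shortcut.
    """
    if not blob or b"NSString" not in blob:
        return None
    payload = blob.split(b"NSString", 1)[1][5:]  # skip the assumed preamble
    if not payload:
        return None
    if payload[0] == 0x81:
        # Longer strings: 0x81, then a 2-byte little-endian length.
        length = int.from_bytes(payload[1:3], "little")
        raw = payload[3:3 + length]
    else:
        # Short strings: a single length byte.
        length = payload[0]
        raw = payload[1:1 + length]
    return raw.decode("utf-8", errors="replace")


def message_text(text, attributed_body):
    """Prefer the plain text column and fall back to the decoded blob."""
    if text:
        return text
    return text_from_attributed_body(attributed_body)
```

Feeding each row's (text, attributedBody) pair through `message_text` gives one consistent string column to hand to the prompt, and rows that still return None are the ones worth inspecting by hand.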

None of this is exotic. It is exactly the work you do when you reverse engineer any production table whose schema lies to you. The payoff was real source material: actual phrases and in-jokes pulled out of years of conversation, handed to the lyric prompt as ground truth instead of invention.

The point is not the song. The point is that an evening of “help me write something personal” became a database reverse-engineering task, and the agent was good at it precisely because it is a database reverse-engineering task.

Tax as a Validated JSON Artifact

My tax position is not a spreadsheet and it is not a number in a government portal. It is a scenario.json file that lives in my dotfiles repo under version control. Every time something changes (I make a payment on account, I sync my FreeAgent companies) the agent edits that one file, validates it against my own tax calculator API, and only then commits.

TAX_KEY=$(op read "op://understanding-data/UnderstandingData TAX_CALC_API_KEY/password")
curl -sX POST 'https://understandingdata.com/api/tax/calculate' \
  -H "Authorization: Bearer $TAX_KEY" \
  -H 'Content-Type: application/json' \
  --data @tax/scenario.json

The validation step is the part people skip. The scenario is not trusted because I edited it. It is trusted because the live calculator accepted it and returned numbers that move in the direction I expected. When I drop a fresh export onto the desktop and ask the agent to use it, the safest thing it can do is diff the resulting numeric output against the previous run. A wrong-file mistake (an old export, a stale dividend split) shows up as a tax bill that jumped by thousands for no reason, and the delta catches it before the commit lands. Secrets stay in 1Password and are read at the point of use, never pasted into the file.
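The delta check can be sketched as a comparison of two saved calculator responses. The response shape is not given here, so the sketch treats it as arbitrary nested JSON; the function names and the 1000 threshold are assumptions for illustration, not the calculator's real schema.

```python
def flatten_numbers(obj, prefix=""):
    """Flatten nested dicts and lists into {path: float} for numeric leaves."""
    out = {}
    if isinstance(obj, dict):
        for key, value in obj.items():
            path = f"{prefix}.{key}" if prefix else str(key)
            out.update(flatten_numbers(value, path))
    elif isinstance(obj, list):
        for i, value in enumerate(obj):
            out.update(flatten_numbers(value, f"{prefix}[{i}]"))
    elif isinstance(obj, (int, float)) and not isinstance(obj, bool):
        out[prefix] = float(obj)
    return out


def numeric_deltas(previous, current, threshold=1000.0):
    """Return {path: (old, new)} for numbers that look suspicious.

    A path is flagged when its value moved by more than `threshold`
    (a hypothetical default) or when it exists in only one of the runs.
    """
    old, new = flatten_numbers(previous), flatten_numbers(current)
    flagged = {}
    for path in sorted(old.keys() | new.keys()):
        before, after = old.get(path), new.get(path)
        if before is None or after is None or abs(after - before) > threshold:
            flagged[path] = (before, after)
    return flagged
```

An empty result means the new run moved within tolerance. Anything flagged is a prompt to explain the change in the commit message, or to discover that the wrong export was used.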

So the workflow is: edit one canonical artifact, validate it against a real API, diff the result, commit with a message that says what changed. That is how you treat a production config. It is also now how I treat my own tax.

The CV Is a Versioned Build With an Audit

The same shape covers my CV. It is not a single Word document I overwrite. It is versioned HTML variants in a small registry, each one a build, with an index.md that says which version targets what. And every variant goes through an adversarial honesty pass before it is allowed out.

That audit is the interesting bit. For every metric on the page the agent has to split the claim into what is true now and what was demonstrated under test, so the line survives an interview probe rather than collapsing under one follow-up question. “Scaled to 100K+” becomes a real current load plus a load-tested capacity, stated separately, because a claim you cannot defend in the room is a liability, not an asset. The CV is held to the same standard as code that has to pass review: every assertion has to be reproducible.
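One way to picture the audit is as a record that refuses to ship a headline unless each half of the claim carries its own evidence. This is a hypothetical sketch: the `MetricClaim` fields, the wording of the problems, and the rendering format are assumptions, not the registry's actual schema.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class MetricClaim:
    """A CV metric split into what is true now and what was shown under test."""
    headline: str                           # the line as originally written
    current_value: Optional[str] = None     # what production does today
    current_evidence: Optional[str] = None  # where that figure can be checked
    tested_value: Optional[str] = None      # what was demonstrated under test
    tested_evidence: Optional[str] = None   # the test run that backs it


def audit(claim: MetricClaim) -> List[str]:
    """Return the problems with a claim; an empty list means it can ship."""
    problems = []
    if not claim.current_value and not claim.tested_value:
        problems.append("no current or tested figure behind the headline")
    if claim.current_value and not claim.current_evidence:
        problems.append("current figure has no evidence")
    if claim.tested_value and not claim.tested_evidence:
        problems.append("tested figure has no evidence")
    return problems


def render(claim: MetricClaim) -> str:
    """State the two halves separately, as the audit requires."""
    parts = []
    if claim.current_value:
        parts.append(f"{claim.current_value} in production")
    if claim.tested_value:
        parts.append(f"load-tested to {claim.tested_value}")
    return ", ".join(parts)
```

Keeping the evidence next to the figure is the design choice that matters: the follow-up question in the interview is answered by the same record that produced the line.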

The General Move

Three chores, one method:

Messages: a SQLite store with a lying schema, queried after you understand its binary columns and its epoch.
Tax: a version-controlled JSON artifact, validated against a live API, with numeric-delta diffing as the regression check.
CV: versioned build variants in a registry, gated by an adversarial claim audit.

The unifying belief is that your personal data exhaust is not noise to be ignored or a chore to be suffered. It is a corpus. The instant you give it production-grade treatment, indexed and validated and version-controlled and readable by an agent, it stops being admin and starts being infrastructure. The agent is good at this work for the same reason it is good at touching your real systems: querying a backing store, decoding a format, validating against an oracle, and diffing the result are exactly what it does all day. Your life is just one more set of tables it has not been pointed at yet.
