Six convictions that shape how I build.

belief.01

belief.02

belief.03

belief.04

belief.05

belief.06

FINAL_DIRECTIVE.md

known_limitations.md

[ BOSS BELIEF / THE META-TRUTH ]

AI development is an optimisation problem under constraints. Not a programming problem.

Every belief here is that one sentence, applied six different ways.

[ SIDE QUEST / WHERE THE HARNESS LEAKS ]

limit.01

Scoring functions can be wrong

When a loop optimises against the wrong metric, the agent gets better at the wrong thing. I don't always catch it on the first run.
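
A small illustration of how a plausible score can reward the wrong thing. Everything here is hypothetical: `passRatio`, `guardedScore`, and the numbers are made up for the example. A raw pass ratio goes up when failing tests are deleted, so the guarded version refuses to score a suite that shrank below its baseline size.

```typescript
// Hypothetical scoring functions; names and numbers are illustrative only.
type SuiteResult = { passed: number; failed: number };

// Naive metric: fraction of tests that pass.
function passRatio(r: SuiteResult): number {
  const total = r.passed + r.failed;
  return total === 0 ? 0 : r.passed / total;
}

// Guarded metric: a suite smaller than the baseline scores zero,
// so deleting failing tests can no longer raise the score.
function guardedScore(r: SuiteResult, baselineTotal: number): number {
  return r.passed + r.failed < baselineTotal ? 0 : passRatio(r);
}

const honestFix: SuiteResult = { passed: 9, failed: 1 }; // fixed most failures
const gamedRun: SuiteResult = { passed: 5, failed: 0 }; // deleted the failing tests
```

Under `passRatio` the gamed run beats the honest one; under `guardedScore` with a baseline of 10 tests it does not. The guard is only one possible correction and can itself be wrong, which is the point of this limitation.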

limit.02

The harness can become its own rabbit hole

Building scaffolding is load-bearing. It also compounds complexity. Knowing when to stop building the factory and start building the product is a judgment call I keep recalibrating.

limit.03

Constraints catch bugs, not bad decisions

Types and lint rules prevent a class of errors. They do nothing for a flawed product bet, a wrong abstraction, or a misread of what the user actually needs. Taste still matters.

limit.04

The model keeps moving

Patterns that worked on a previous model iteration don't always transfer. Everything here is a snapshot of what I currently believe, not a fixed doctrine.

[ BELIEF 01 / STOCHASTICITY ]

Treat every model output as an untrusted sample

> The model does not know. It generates. Correctness has to be established outside the model.

without the constraint

- Trust a single pass

- Ship what the first prompt returns

- Confuse confidence with correctness

- Variance you can't see

with the constraint

+ Assume outputs are wrong by default

+ Wrap every generation in tests, invariants, and evaluators

+ Re-run, diff, and repair until the harness is green

+ Promote only outputs that survive verification

how this shows up in my work

· Named invariants like INV-BILLING-008 that every run must satisfy

· Property-based tests on generated code

· Pre-commit gates that block regressions before they reach main

· Agent outputs are treated as candidates, not conclusions
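
A minimal sketch of treating an output as a candidate rather than a conclusion. The ID `INV-BILLING-008` comes from the text above, but its actual rule is not stated there, so the check below is an assumed stand-in. `Invoice`, `Invariant`, `verify`, and `INV-BILLING-NONNEG` are hypothetical names.

```typescript
// Hypothetical shapes; only the ID INV-BILLING-008 appears in the source text.
type Invoice = { subtotalCents: number; taxCents: number; totalCents: number };

type Invariant<T> = { id: string; holds: (value: T) => boolean };

const invariants: Invariant<Invoice>[] = [
  {
    id: "INV-BILLING-008", // assumed rule: total equals subtotal plus tax
    holds: (inv) => inv.totalCents === inv.subtotalCents + inv.taxCents,
  },
  {
    id: "INV-BILLING-NONNEG", // hypothetical second invariant
    holds: (inv) => inv.subtotalCents >= 0 && inv.taxCents >= 0,
  },
];

type Verdict<T> =
  | { status: "promoted"; value: T }
  | { status: "rejected"; failed: string[] };

// A candidate is promoted only if every invariant holds;
// otherwise the failing IDs are returned so a repair step can target them.
function verify<T>(candidate: T, checks: Invariant<T>[]): Verdict<T> {
  const failed = checks.filter((c) => !c.holds(candidate)).map((c) => c.id);
  return failed.length === 0
    ? { status: "promoted", value: candidate }
    : { status: "rejected", failed };
}
```

Returning the failed IDs instead of a bare boolean keeps the rejection readable, which matters once the same check runs inside a repair loop.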

[ BELIEF 02 / CONSTRAINTS ]

Constraints create reliability

> The tighter the constraints, the smaller the valid solution space, the more repeatable the output.

without the constraint

- Free-form prompts

- Unbounded search space

- Works once, breaks twice

- Invalid states everywhere

with the constraint

+ Type systems, schemas, lint rules, invariants

+ Shrink the space the model can explore

+ Make invalid states unrepresentable

+ Consistency by construction

how this shows up in my work

· Type-driven development with TypeScript strict mode, Effect schemas, and Effect layers at every boundary

· Custom ESLint rules that encode project conventions before bad patterns spread

· Domain-driven design to keep capabilities loosely coupled behind explicit boundaries

· Use the right tool for each job: Effect for typed orchestration, Temporal for durable workflows, Python for AI/ML
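
One way "make invalid states unrepresentable" can look in plain TypeScript. This is a sketch with a hand-rolled boundary check, not Effect's schema API; the `Job` type and `parseJob` are hypothetical. The union makes a "done" job without a result impossible to construct, and the parser is the only way untyped model output gets in.

```typescript
// Hypothetical domain: a job is exactly one of three states.
type Job =
  | { state: "queued" }
  | { state: "running"; startedAt: number }
  | { state: "done"; startedAt: number; result: string };

type ParseResult<T> = { ok: true; value: T } | { ok: false; error: string };

// Boundary check for untrusted input (for example, JSON emitted by a model).
// A schema library would normally replace this hand-written version.
function parseJob(input: unknown): ParseResult<Job> {
  if (typeof input !== "object" || input === null) {
    return { ok: false, error: "expected an object" };
  }
  const raw = input as Record<string, unknown>;
  const startedAt = raw.startedAt;
  const result = raw.result;

  switch (raw.state) {
    case "queued":
      return { ok: true, value: { state: "queued" } };
    case "running":
      return typeof startedAt === "number"
        ? { ok: true, value: { state: "running", startedAt } }
        : { ok: false, error: "running job needs a numeric startedAt" };
    case "done":
      return typeof startedAt === "number" && typeof result === "string"
        ? { ok: true, value: { state: "done", startedAt, result } }
        : { ok: false, error: "done job needs startedAt and result" };
    default:
      return { ok: false, error: `unknown state: ${String(raw.state)}` };
  }
}
```

Code downstream of `parseJob` only ever sees a valid `Job`, so the compiler, rather than a reviewer, rejects any branch that reads `result` from a job that is still running.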

[ BELIEF 03 / FEEDBACK ]

Feedback loops beat one-shot prompting

> Quality comes from iteration, not from a better first prompt.

without the constraint

- Generate once, hope

- No way to score the output

- Debugging is vibes

- Loops that never stop

with the constraint

+ Generate, evaluate, correct, repeat

+ A closed-loop harness with a real scoring function

+ Every loop has a budget, a stopping condition, and a score

+ Every run produces signal you can read

how this shows up in my work

· The RALPH loop running specs end-to-end against named invariants

· Actor and critic pairs. One agent writes, one reviews

· Scheduled loops against invariants so bugs get fixed overnight

· Stop when invariants pass, evaluator score clears the bar, or retries hit the budget without new signal
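
A sketch of a loop with a budget, a stopping condition, and a score. It is synchronous to stay short, where a real agent loop would be async. `runLoop`, `patience`, and the option names are hypothetical, and it assumes one reading of the stop rule above: stop when invariants pass and the score clears the bar, when several attempts in a row bring no score improvement, or when the attempt budget runs out.

```typescript
type Evaluation = { score: number; invariantsPass: boolean };

type LoopOptions<T> = {
  generate: (feedback: Evaluation | null) => T; // previous evaluation feeds the next attempt
  evaluate: (candidate: T) => Evaluation;
  passScore: number; // the bar the evaluator score must clear
  maxAttempts: number; // hard budget
  patience: number; // attempts allowed without a better score
};

type LoopResult<T> = {
  candidate: T;
  score: number;
  attempts: number;
  stoppedBecause: "passed" | "no-new-signal" | "budget-exhausted";
};

function runLoop<T>(opts: LoopOptions<T>): LoopResult<T> {
  if (opts.maxAttempts < 1) throw new Error("maxAttempts must be at least 1");
  let best: { candidate: T; score: number } | null = null;
  let feedback: Evaluation | null = null;
  let stale = 0;

  for (let attempt = 1; attempt <= opts.maxAttempts; attempt++) {
    const candidate = opts.generate(feedback);
    const evaluation = opts.evaluate(candidate);
    feedback = evaluation;

    if (evaluation.invariantsPass && evaluation.score >= opts.passScore) {
      return { candidate, score: evaluation.score, attempts: attempt, stoppedBecause: "passed" };
    }
    if (best === null || evaluation.score > best.score) {
      best = { candidate, score: evaluation.score };
      stale = 0;
    } else {
      stale += 1;
      if (stale >= opts.patience) {
        return { ...best, attempts: attempt, stoppedBecause: "no-new-signal" };
      }
    }
  }
  if (best === null) throw new Error("unreachable: at least one attempt ran");
  return { ...best, attempts: opts.maxAttempts, stoppedBecause: "budget-exhausted" };
}
```

Every exit path reports why the loop stopped, so a run that ends on budget is distinguishable from one that ends green.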

[ BELIEF 04 / ENVIRONMENT ]

Environment design beats prompt engineering

> The system around the model determines the outcome. Not the prompt.

without the constraint

- Tuning wording forever

- Flaky state, flaky results

- Every run starts from zero

with the constraint

+ Build the harness

+ Isolate state

+ Define inputs and outputs

+ Control execution

how this shows up in my work

· Custom git worktree scripts that spin up a fresh API worker and web UI per branch with md5-hashed ports

· Per-worktree Postgres schemas on a single shared container. No ten-container local stack

· Simple bcrypt auth so the test harness has no external dependencies
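
A sketch of how per-branch ports and schema names could be derived with md5. The text does not say what is hashed or which port range is used, so hashing the branch name, the 20000-39999 window, and the names `portFor` and `schemaFor` are all assumptions. It uses Node's built-in `crypto` module.

```typescript
import { createHash } from "node:crypto";

// Assumed port window; the real range is not stated in the text.
const BASE_PORT = 20000;
const PORT_SPAN = 20000;

// Same branch and service always map to the same port, so restarts are stable.
function portFor(branch: string, service: "api" | "web"): number {
  const digest = createHash("md5").update(`${branch}:${service}`).digest();
  return BASE_PORT + (digest.readUInt32BE(0) % PORT_SPAN);
}

// Schema name for a worktree on the shared Postgres container.
// Slug plus hash suffix stays well under Postgres's 63-byte identifier limit.
function schemaFor(branch: string): string {
  const slug = branch
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, "_")
    .replace(/^_+|_+$/g, "")
    .slice(0, 40);
  const suffix = createHash("md5").update(branch).digest("hex").slice(0, 8);
  return `wt_${slug}_${suffix}`;
}
```

Hashing makes the assignment deterministic, not collision-free: two branches can land on the same port, so a real script would still need to check that the port is free before binding.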

[ BELIEF 05 / THROUGHPUT ]

Throughput scales through parallel, isolated systems

> You don't scale AI by making it smarter. You scale it by running more of it safely.

without the constraint

- One agent, one branch, one database

- Serial work, shared state, flaky tests

with the constraint

+ Git worktrees per task

+ Isolated schemas per worktree

+ Parallel agents that can't collide

how this shows up in my work

· Factory functions (createUser, createOrg, signInUser, createData) make every test self-contained

· Global teardown per run. Zero flake, infinite tests

· An ESLint rule requires every new Postgres table to ship with a test factory (a users table gets createUser()), so parallelism stays cheap
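
A sketch of what self-contained factories might look like. `createUser` and `createOrg` are named in the text, but their signatures are not, so the ones below are assumptions, and an in-memory `TestDb` stands in for the real Postgres schema. Each factory generates unique identifiers and creates its own parent rows, so no test depends on data another test made.

```typescript
import { randomUUID } from "node:crypto";

type Org = { id: string; name: string };
type User = { id: string; orgId: string; email: string };

// In-memory stand-in for a per-worktree database schema (assumption).
type TestDb = { orgs: Map<string, Org>; users: Map<string, User> };

const createTestDb = (): TestDb => ({ orgs: new Map(), users: new Map() });

function createOrg(db: TestDb, overrides: Partial<Org> = {}): Org {
  const id = randomUUID();
  const org: Org = { id, name: `org-${id.slice(0, 8)}`, ...overrides };
  db.orgs.set(org.id, org);
  return org;
}

// Creates its own org unless the caller supplies one,
// so a test can ask for a user without any prior setup.
function createUser(db: TestDb, overrides: Partial<User> = {}): User {
  const id = randomUUID();
  const orgId = overrides.orgId ?? createOrg(db).id;
  const user: User = {
    id,
    email: `user-${id.slice(0, 8)}@example.test`,
    ...overrides,
    orgId,
  };
  db.users.set(user.id, user);
  return user;
}
```

Because every value that must be unique is derived from a fresh UUID, two tests calling `createUser` in parallel cannot collide on an email or an ID.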

[ BELIEF 06 / OBSERVABILITY ]

What cannot be observed cannot be improved

> Feedback loops only work when every run leaves behind readable signal.

without the constraint

- Failures disappear into chat logs

- Passing and correctness get confused

- No memory of what changed between runs

with the constraint

+ Every run emits traces, scores, diffs, costs, and failure reasons

+ Regressions are classified, not just noticed

+ The system gets easier to debug every time it fails

how this shows up in my work

· Every agent run writes structured telemetry: inputs, outputs, scores, invariant failures, retries, and patch diffs

· Failure taxonomies separate flaky infra, bad specs, weak prompts, invalid assumptions, and real regressions

· Dashboards track invariant pass rate, repair rate, token cost, time to green, and repeated failure clusters
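
A sketch of a structured run record and the roll-up a dashboard could read from it. The failure classes mirror the taxonomy listed above; the field names, `RunRecord`, and `summarize` are hypothetical.

```typescript
// Failure taxonomy taken from the list above; identifiers are illustrative.
type FailureClass =
  | "flaky-infra"
  | "bad-spec"
  | "weak-prompt"
  | "invalid-assumption"
  | "real-regression";

type RunRecord = {
  runId: string;
  score: number;
  invariantFailures: string[]; // IDs of invariants that failed
  retries: number;
  tokenCost: number;
  failureClass: FailureClass | null; // null when the run went green
};

type Summary = {
  invariantPassRate: number;
  totalTokenCost: number;
  failuresByClass: Partial<Record<FailureClass, number>>;
};

// One JSON object per line keeps runs greppable and easy to diff.
const toLogLine = (run: RunRecord): string => JSON.stringify(run);

function summarize(runs: RunRecord[]): Summary {
  const failuresByClass: Partial<Record<FailureClass, number>> = {};
  let green = 0;
  let totalTokenCost = 0;
  for (const run of runs) {
    totalTokenCost += run.tokenCost;
    if (run.failureClass === null) {
      green += 1;
    } else {
      failuresByClass[run.failureClass] = (failuresByClass[run.failureClass] ?? 0) + 1;
    }
  }
  return {
    invariantPassRate: runs.length === 0 ? 0 : green / runs.length,
    totalTokenCost,
    failuresByClass,
  };
}
```

Classifying each failure at write time is what lets repeated failure clusters show up as counts instead of being rediscovered by reading logs.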
