Core idea
A high-leverage loop for agentic engineering is:
- Let agents use your custom CLI/tools in real tasks
- Observe where they fail, hesitate, or misuse interfaces
- Convert those learnings into improved skills/instructions/tool UX
- Feed those improvements back into the next agent runs
- Repeat quickly
This creates a practical form of in-context recursive self-improvement: the system gets better at using itself through tight feedback cycles.
Why this matters
- You improve the operating layer (skills, prompts, wrappers), not just one task outcome.
- Reliability compounds: fewer repeated tool mistakes over time.
- Agents become more autonomous because instructions and tool affordances become clearer and more deterministic.
Practical implementation pattern
- Capture run telemetry: failed commands, retries, ambiguity points.
- Maintain a small error/lesson log per tool.
- Update skill docs and tool wrappers after each significant failure class.
- Add explicit examples for common edge cases.
- Re-run with the new skill context and compare the failure rate against the previous run.
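The steps above could be sketched roughly as follows. This is a minimal illustration, not the author's implementation: the `ToolCall` record, the `fingerprint` normalization, and the tool name `deploy-cli` are all hypothetical, and a real setup would read calls from whatever telemetry your agent harness already emits.

```python
import hashlib
import re
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class ToolCall:
    """One agent tool invocation (hypothetical record shape)."""
    tool: str         # tool name, e.g. "deploy-cli" (made up for illustration)
    command: str      # the command line the agent issued
    ok: bool          # whether the call succeeded
    stderr: str = ""  # error output, if any


def fingerprint(call: ToolCall) -> str:
    """Group similar failures: lowercase stderr, mask digits, collapse whitespace."""
    normalized = re.sub(r"\d+", "N", call.stderr.lower())
    normalized = re.sub(r"\s+", " ", normalized).strip()
    digest = hashlib.sha256(f"{call.tool}|{normalized}".encode()).hexdigest()
    return digest[:12]


def build_lesson_log(calls):
    """Return {tool: {fingerprint: failure_count}} for failed calls only."""
    log = defaultdict(lambda: defaultdict(int))
    for call in calls:
        if not call.ok:
            log[call.tool][fingerprint(call)] += 1
    return {tool: dict(counts) for tool, counts in log.items()}


def failure_rate(calls):
    """Fraction of calls that failed; 0.0 for an empty run."""
    if not calls:
        return 0.0
    return sum(1 for c in calls if not c.ok) / len(calls)


def compare_runs(before, after):
    """Failure rates before and after a skill/wrapper update, plus the change."""
    rate_before = failure_rate(before)
    rate_after = failure_rate(after)
    return {"before": rate_before, "after": rate_after, "delta": rate_after - rate_before}
```

Fingerprints that recur in the per-tool log are the natural candidates for a new explicit example in the skill doc or a guard in the wrapper; `compare_runs` then gives a rough check on whether the update helped.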
Suggested metric stack
- Tool-call success rate
- Retries per task
- Time-to-completion per workflow
- Human intervention count
- Recurring failure fingerprint count
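One possible way to compute these metrics from per-task records is sketched below. The `TaskRun` fields and the `summarize` function are assumptions made for illustration; in particular, "recurring" is taken here to mean a failure fingerprint seen more than once across all runs, which is one reasonable reading rather than a definition from the source.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass, field


@dataclass
class TaskRun:
    """Summary of one agent task (hypothetical record shape)."""
    workflow: str             # workflow label, e.g. "deploy"
    tool_calls: int           # total tool calls made
    failed_calls: int         # tool calls that failed
    retries: int              # retried calls
    seconds: float            # wall-clock time to completion
    human_interventions: int  # times a human had to step in
    failure_fingerprints: list = field(default_factory=list)


def summarize(runs):
    """Aggregate the five metrics over a batch of task runs."""
    if not runs:
        raise ValueError("need at least one run")

    total_calls = sum(r.tool_calls for r in runs)
    failed = sum(r.failed_calls for r in runs)

    # Mean time-to-completion, grouped by workflow.
    times = defaultdict(list)
    for r in runs:
        times[r.workflow].append(r.seconds)

    # Assumption: a fingerprint is "recurring" if it appears more than once overall.
    seen = Counter(fp for r in runs for fp in r.failure_fingerprints)

    return {
        "tool_call_success_rate": (total_calls - failed) / total_calls if total_calls else 1.0,
        "retries_per_task": sum(r.retries for r in runs) / len(runs),
        "mean_seconds_by_workflow": {w: sum(t) / len(t) for w, t in times.items()},
        "human_interventions": sum(r.human_interventions for r in runs),
        "recurring_fingerprints": sum(1 for n in seen.values() if n > 1),
    }
```

Tracking the same summary across successive batches, each run with the updated skill context, shows whether the loop is actually reducing repeated mistakes.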
Opinionated takeaway
The strongest “alpha” is not any single prompt trick; it is a disciplined loop in which agent behavior continuously improves from real tool-use traces.
Source post: https://x.com/doodlestein/status/2035233207965122943
Author: Jeffrey Emanuel (@doodlestein)

