Give your agent a fake tool. Let it tell you what the real ones should be.
The Pattern
Function-Driven Development (FDD) is a product discovery technique for agent systems. Instead of guessing which tools an agent needs, you give it proxy tools that do nothing except log what the agent asked for. The agent’s reasoning engine becomes your requirements engine.
The core loop:
- Give the agent a stub tool with an open-ended description
- Run it against real scenarios
- Collect every call the agent makes to the stub
- Cluster and rank the calls by frequency and specificity
- Build the real tools in priority order
The agent specs its own tooling. You just read the logs.
Origin
This pattern was demonstrated by the Sonarly team (YC W26) while building an AI agent for production incident investigation. They gave their agent a tool called magic_fetch:
```python
def magic_fetch(description: str) -> str:
    """
    Use this when you need any data you don't currently
    have access to. Describe exactly what you need and why.
    """
    log_tool_call(description)
    return "Data retrieved successfully. Continue your reasoning."
```
Across 50 incident simulations, the agent called it 134 times with highly specific requests like “recent deploys for this service in the last 2 hours” and “feature flags recently toggled for this service.” Each call was effectively a product requirement written by the agent itself.
Beyond Data Fetches
The original insight focused on reads. The agent expressed what data it was missing. But the pattern generalises to every category of tool:
Reads (Data Fetches)
Stub: “Use this when you need data you don’t have.”
Reveals: which integrations to build, which APIs to wire up, which data sources the agent considers relevant to its task.
Writes (Data Mutations)
Stub: “Use this when you want to take an action or change something in the system.”
Reveals: what the agent would do if it had permission. Restart a service? Roll back a deploy? Toggle a feature flag? Page a specific team? This is how you discover the agent’s ideal action space without letting it touch anything real.
Delegation (Sub-Agent Spawning)
Stub: “Use this when you need another specialist to handle a subtask.”
Reveals: where the agent wants to decompose work. It will describe the specialist it needs and the task to delegate. This maps directly to your sub-agent architecture.
Communication (Notifications and Escalation)
Stub: “Use this when you need to inform or ask a human something.”
Reveals: the agent’s escalation logic. When does it want help? What does it consider ambiguous enough to flag? This tells you where to put human-in-the-loop checkpoints.
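All four stubs share the same shape: log the request, return a canned success message. A minimal sketch of a shared stub factory, assuming an in-memory log (the factory, the `CALL_LOG` list, and the canned replies are illustrative, not from the Sonarly implementation):

```python
CALL_LOG: list[tuple[str, str]] = []  # (category, description) pairs

def log_tool_call(category: str, description: str) -> None:
    """Record what the agent asked for; this log is the requirements doc."""
    CALL_LOG.append((category, description))

def make_stub(category: str, canned_reply: str):
    """Build a stub tool that does nothing except log the agent's request."""
    def stub(description: str) -> str:
        log_tool_call(category, description)
        return canned_reply
    return stub

magic_read = make_stub("READ", "Data retrieved successfully. Continue your reasoning.")
magic_act = make_stub("WRITE", "Action completed successfully. Continue your reasoning.")
magic_delegate = make_stub("DELEGATE", "Subtask handled successfully. Continue your reasoning.")
magic_notify = make_stub("NOTIFY", "Message delivered. Continue your reasoning.")
```

The category tag matters: it lets you split the log into reads, writes, delegations, and escalations before clustering.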
Why This Works
LLMs are trained on massive amounts of documentation about how systems work, how incidents get investigated, how deploys get rolled back. When you give them an open-ended tool, they draw on that latent knowledge to express intent that is often more specific and well-reasoned than what a product manager would spec.
The agent is a proxy for your best domain expert. It just needs a blank canvas to express what it knows.
Key properties that make it effective:
- Zero implementation cost. The stub is a few lines of code. You can run the experiment in an afternoon.
- Ground-truth prioritisation. Call frequency across many scenarios is a direct signal of tool importance. No guessing.
- Specificity for free. The agent doesn’t say “I need monitoring data.” It says “CPU metrics for the upstream auth-service over the last 30 minutes.” That level of specificity is your API contract.
- Safe exploration of mutations. You discover what the agent would write/delete/modify without it actually doing anything. This is critical for high-stakes domains.
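The specificity point is worth making concrete: a logged request translates almost mechanically into a tool contract. A hypothetical example of that translation (the names and types below are illustrative, not a real integration):

```python
from dataclasses import dataclass

# Logged stub call:
#   "CPU metrics for the upstream auth-service over the last 30 minutes"
# The agent's phrasing names the parameters for you:

@dataclass
class MetricsQuery:
    service: str         # "auth-service"
    metric: str          # "cpu"
    window_minutes: int  # 30

def fetch_metrics(query: MetricsQuery) -> list[float]:
    """A real version would call your monitoring backend; left unimplemented here."""
    raise NotImplementedError
```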
Running the Experiment
Step 1: Define your stub categories
At minimum, create two stubs:
```python
def magic_read(description: str) -> str:
    """Use when you need data you don't have access to.
    Describe exactly what you need and why."""
    log_tool_call("READ", description)
    return "Data retrieved successfully. Continue your reasoning."

def magic_act(description: str) -> str:
    """Use when you want to take an action or change
    something. Describe the action and expected outcome."""
    log_tool_call("WRITE", description)
    return "Action completed successfully. Continue your reasoning."
```
Step 2: Run against real scenarios
Use production incidents, support tickets, or realistic simulations. Volume matters: 30-50 scenarios give you a solid distribution.
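A minimal harness for this step might look like the following. Here `run_agent` is a placeholder for your agent framework's entry point (any framework that lets the model call Python tools will do); the stub is the same logging pattern as above:

```python
CALL_LOG: list[str] = []

def magic_read(description: str) -> str:
    """Stub tool: record the agent's request and pretend it succeeded."""
    CALL_LOG.append(description)
    return "Data retrieved successfully. Continue your reasoning."

def run_experiment(scenarios, run_agent) -> list[str]:
    """Drive the agent over every scenario, collecting all stub calls."""
    CALL_LOG.clear()
    for scenario in scenarios:
        # run_agent is assumed to loop the model until the scenario is done,
        # invoking any of the provided tools along the way
        run_agent(scenario, tools=[magic_read])
    return list(CALL_LOG)
```

The returned log, accumulated across all scenarios, is the raw input to the clustering step.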
Step 3: Cluster the logs
Group calls by semantic similarity. You will see natural clusters emerge:
```
READS:
  deploy-history (23 calls) -> Build Kubernetes/CD integration
  metrics        (19 calls) -> Build Datadog/Prometheus integration
  feature-flags  (12 calls) -> Build LaunchDarkly integration

WRITES:
  rollback    (15 calls) -> Build deploy rollback capability
  page-team   (11 calls) -> Build PagerDuty integration
  toggle-flag  (8 calls) -> Build feature flag toggle capability
```
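A crude way to produce a ranking like that is keyword bucketing over the logged descriptions; in practice you would likely cluster by embedding similarity instead. The buckets and keywords below are illustrative:

```python
from collections import Counter

# Hypothetical buckets; real clustering would be semantic, not keyword-based.
BUCKETS = {
    "deploy-history": ("deploy", "release", "rollout"),
    "metrics": ("metric", "cpu", "latency", "memory"),
    "feature-flags": ("flag", "toggle"),
}

def bucket_for(description: str) -> str:
    """Assign a logged call to the first bucket whose keyword it mentions."""
    text = description.lower()
    for name, keywords in BUCKETS.items():
        if any(kw in text for kw in keywords):
            return name
    return "unclustered"

def rank_clusters(log: list[str]) -> list[tuple[str, int]]:
    """Rank clusters by call frequency: the result is your roadmap order."""
    return Counter(bucket_for(d) for d in log).most_common()
```

Calls that land in "unclustered" are worth reading by hand; they are either noise or the first hint of a tool category you have not thought of yet.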
Step 4: Build in priority order
The clusters ranked by frequency are your integration roadmap. Build the top 3. Re-run the experiment. The agent will start using the real tools and the remaining stubs will shift to reveal the next tier of priorities.
FDD vs Traditional Product Discovery
| Dimension | Traditional | Function-Driven |
|---|---|---|
| Requirements source | PM interviews, user research | Agent behaviour under simulation |
| Specificity | “We need monitoring integration” | “CPU metrics for upstream auth-service, last 30 min” |
| Prioritisation | Gut feel, stakeholder politics | Call frequency across N scenarios |
| Mutation discovery | Requires careful security review upfront | Safely observed via stubs, no real side effects |
| Time to roadmap | Weeks | An afternoon |
When to Use This
- You are building an agent system and don’t know which tools to prioritise
- You have too many possible integrations and need to rank them
- You want to understand what actions an agent would take before granting it real permissions
- You are designing a human-in-the-loop boundary and need to know where the agent wants autonomy
Related
- Zero-Cost Knowledge Extraction – FDD is zero-cost extraction applied to agent behaviour
- Agent-Driven Development
- AI-Native Principles
- Context Engineering
- Own Your Control Plane