Swallow all test/build/lint output and replace it with a single ✓ if the stage passes. If exitCode != 0, dump the stashed output.
The Problem

When tests pass, dumping the full runner output wastes 2-3% of the context window on results that need fewer than 10 tokens to communicate.
Stay in the Smart Zone
Claude models perform optimally within approximately 75k tokens. Beyond this, you enter the “dumb zone” where:
- Agents miss obvious errors
- Instructions get ignored
- Human time spent managing the agent costs roughly 10x more than any token savings
The Fix: run_silent()

run_silent() {
    local description="$1"
    local command="$2"
    local tmp_file
    tmp_file=$(mktemp)
    # Stash all output; only replay it if the command fails
    if eval "$command" > "$tmp_file" 2>&1; then
        printf " ✓ %s\n" "$description"
        rm -f "$tmp_file"
        return 0
    else
        local exit_code=$?
        printf " ✗ %s\n" "$description"
        cat "$tmp_file"
        rm -f "$tmp_file"
        return $exit_code
    fi
}
# Usage
run_silent "Auth tests" "pytest tests/auth/"
run_silent "Utils tests" "pytest tests/utils/"
run_silent "API tests" "pytest tests/api/"
Output on success:
✓ Auth tests
✓ Utils tests
✓ API tests
Output on failure:
✓ Auth tests
✓ Utils tests
✗ API tests
FAIL src/api/users.test.ts
  ● should validate email format
    Expected: true
    Received: false

Progressive Refinement
Enable Fail-Fast
Process one failure at a time instead of dumping all errors:
# Python
pytest -x tests/
# JavaScript
jest --bail
# Go
go test -failfast ./...
# Vitest
vitest run --bail 1
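Fail-fast composes with run_silent: chain stages with && so the first compact failure stops the run. A sketch using the function from above (the test paths are illustrative):

# The first ✗ short-circuits the chain; later stages never run
run_silent "Unit tests" "pytest -x tests/" &&
run_silent "Integration tests" "pytest -x tests/integration/"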
Filter Irrelevant Output
Strip noise using standard Unix tools:
run_silent_filtered() {
    local description="$1"
    local command="$2"
    local filter="${3:-cat}"  # Default: no filter (cat passes everything through)
    local tmp_file
    tmp_file=$(mktemp)
    # pipefail: fail the stage when the command fails, even though the
    # filter at the end of the pipeline exits 0
    if (set -o pipefail; eval "$command" 2>&1 | eval "$filter" > "$tmp_file"); then
        printf " ✓ %s\n" "$description"
        rm -f "$tmp_file"
        return 0
    else
        local exit_code=$?
        printf " ✗ %s\n" "$description"
        cat "$tmp_file"
        rm -f "$tmp_file"
        return $exit_code
    fi
}
# Filter out timing info and stack frames from node_modules
run_silent_filtered "API tests" \
    "jest tests/api" \
    "grep -v 'node_modules' | grep -v 'Time:'"
Framework-Specific Parsing
Extract only relevant info from test output:
# Pytest: show only test counts on success
pytest_summary() {
    pytest "$@" 2>&1 | tail -1 | grep -E "passed|failed|error"
}

# Jest: extract failure details
jest_failures() {
    jest "$@" 2>&1 | grep -A 5 "FAIL\|●"
}

# Go: show only failures
go_test_failures() {
    go test "$@" 2>&1 | grep -E "FAIL|---"
}
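Note that each pipeline above returns the final grep's exit status, not the test runner's. When the caller needs the real exit code, capture the output first; a minimal sketch (pytest_summary_strict is a hypothetical name):

# Variant that keeps pytest's real exit code instead of grep's
pytest_summary_strict() {
    local out
    out=$(pytest "$@" 2>&1)   # $? after this assignment is pytest's status
    local code=$?
    echo "$out" | tail -1
    return $code
}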
Anti-Patterns: What Models Do Wrong
Output Swallowing

# Bad: Model pipes to /dev/null then describes
npm test > /dev/null 2>&1 && echo "Tests passed" || echo "Tests failed"
# Model then writes: "I ran the tests and they passed. All 47 test suites..."
# This uses MORE tokens than just showing the output!
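The fix is the pattern from above: let the script emit the deterministic ✓/✗ and stash the output, instead of having the model narrate it:

# Good: one line on success, full stashed output on failure
run_silent "Tests" "npm test"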
Piping to head/tail
# Bad: Model truncates to save tokens
npm test 2>&1 | head -n 50
# Problem: Agent may need to re-run 5-minute test suite
# when the relevant error was at line 51
Either way, this “conservative” behavior burns more tokens, more human time, and more cognitive energy than engineering the output up front.
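If output really must be bounded, persist the full log so nothing has to be re-run; a sketch (the /tmp/test.log path is arbitrary):

# Better: keep the full log on disk, surface only the tail
npm test 2>&1 | tee /tmp/test.log | tail -n 50
# On failure the agent can grep /tmp/test.log instead of re-running the suite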
Full Implementation
A complete test runner with backpressure:
#!/bin/bash
# scripts/test.sh - Context-efficient test runner
set -e  # a failing stage (non-zero return from run_silent) aborts the script

RED='\033[0;31m'
GREEN='\033[0;32m'
NC='\033[0m'

run_silent() {
    local description="$1"
    local command="$2"
    local tmp_file
    tmp_file=$(mktemp)
    if eval "$command" > "$tmp_file" 2>&1; then
        printf "${GREEN}✓${NC} %s\n" "$description"
        rm -f "$tmp_file"
        return 0
    else
        local exit_code=$?
        printf "${RED}✗${NC} %s\n" "$description"
        echo ""
        cat "$tmp_file"
        rm -f "$tmp_file"
        return $exit_code
    fi
}
echo "Running test suite..."
echo ""
run_silent "Type check" "tsc --noEmit"
run_silent "Lint" "eslint src/ --max-warnings 0"
run_silent "Unit tests" "jest --bail"
run_silent "Integration tests" "jest --config jest.integration.config.js --bail"
echo ""
echo "All checks passed!"
TypeScript Version
For programmatic use:
import { spawn } from "child_process";

interface RunResult {
  success: boolean;
  output: string;
  exitCode: number;
}

async function runSilent(
  description: string,
  command: string,
  args: string[]
): Promise<RunResult> {
  return new Promise((resolve) => {
    const chunks: Buffer[] = [];
    const proc = spawn(command, args, { shell: true });
    proc.stdout.on("data", (data) => chunks.push(data));
    proc.stderr.on("data", (data) => chunks.push(data));
    proc.on("close", (exitCode) => {
      const output = Buffer.concat(chunks).toString();
      if (exitCode === 0) {
        console.log(` ✓ ${description}`);
        resolve({ success: true, output: "", exitCode: 0 });
      } else {
        console.log(` ✗ ${description}`);
        console.log(output);
        resolve({ success: false, output, exitCode: exitCode ?? 1 });
      }
    });
  });
}
// Usage in agent tooling
async function runTests(): Promise<string> {
  const results: RunResult[] = [];
  // Note: unlike the bash script, this runs every stage even if an earlier one fails
  results.push(await runSilent("Type check", "npx", ["tsc", "--noEmit"]));
  results.push(await runSilent("Lint", "npx", ["eslint", "src/"]));
  results.push(await runSilent("Tests", "npx", ["jest", "--bail"]));

  const failed = results.filter((r) => !r.success);
  if (failed.length === 0) {
    return "All checks passed ✓";
  }
  return failed.map((r) => r.output).join("\n");
}
Key Principle
If you already know what matters, don’t make a model churn through thousands of junk tokens to decide.
Deterministic output beats non-deterministic parsing.
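Concretely, the difference looks like this:

# Deterministic: the script decides what reaches the context window
run_silent "Tests" "pytest -x tests/"

# Non-deterministic: the model reads everything and decides what mattered
pytest tests/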
Related
- 12 Factor Agents – Factor 9: Compact Errors into Context
- Writing a Good CLAUDE.md
- Infrastructure Principles

