## Summary
Token budgeting is the practice of allocating your finite context window to maximize information value per token spent. Like a financial budget, you have limited resources (tokens) and competing demands (system prompts, documentation, code, conversation history). This article provides practical strategies for prioritizing high-information-density content, calculating budget allocations, and measuring the effectiveness of your context spending.
## The Problem
Context windows are finite and expensive. A 200K token window sounds large, but fills quickly when you load:
- System prompts and instructions (2-10K tokens)
- CLAUDE.md files (1-5K tokens)
- Code files for context (10-50K tokens)
- Conversation history (grows unbounded)
- Tool outputs (variable, often large)
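Summing the upper ends of these ranges (and picking an illustrative figure for tool outputs, which the list leaves open-ended) shows how quickly a window fills before conversation history grows at all:

```typescript
// Illustrative worst-case token costs from the list above
const worstCase = {
  systemPrompt: 10_000,
  claudeMd: 5_000,
  codeFiles: 50_000,
  toolOutputs: 30_000, // highly variable; assumed here for illustration
};

const fixedLoad = Object.values(worstCase).reduce((sum, t) => sum + t, 0);
const window = 200_000;

// Nearly half the window is spoken for before the conversation starts
console.log(`${fixedLoad} tokens used, ${window - fixedLoad} remaining`);
```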
Without deliberate budgeting, you face:

- **Context exhaustion**: running out of tokens mid-task
- **Signal dilution**: important context buried in noise
- **Cost explosion**: paying for low-value tokens
- **Performance degradation**: “lost in the middle” effects
## The Solution
Treat your context window as a budget. Allocate tokens based on information density (bits of useful information per token) rather than convenience. High-density content gets priority; low-density content gets compressed, summarized, or excluded.
## Information Density: The Core Metric

Not all tokens carry equal information. From information theory:

`Information Density = Useful Information (bits) / Tokens Used`

### Density Rankings by Content Type
| Content Type | Tokens | Info Bits | Density | Priority |
|---|---|---|---|---|
| Type signatures | 20 | 80 | 4.0 | Highest |
| Test assertions | 50 | 175 | 3.5 | High |
| Working code examples | 100 | 350 | 3.5 | High |
| Explicit constraints | 30 | 90 | 3.0 | High |
| API schemas (Zod/OpenAPI) | 200 | 600 | 3.0 | High |
| Structured documentation | 500 | 1000 | 2.0 | Medium |
| Prose explanations | 300 | 450 | 1.5 | Low |
| Generic comments | 100 | 100 | 1.0 | Lowest |
| Boilerplate/imports | 50 | 25 | 0.5 | Exclude |
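The bit counts above are illustrative estimates rather than measurements, but the ranking logic is mechanical. A minimal helper that orders candidate content by this density metric (names here are hypothetical) might look like:

```typescript
interface Candidate {
  name: string;
  tokens: number;
  infoBits: number; // estimated useful information
}

const density = (c: Candidate): number => c.infoBits / c.tokens;

// Highest-density content first, so it is loaded before the budget runs out
function rankByDensity(items: Candidate[]): Candidate[] {
  return [...items].sort((a, b) => density(b) - density(a));
}

const ranked = rankByDensity([
  { name: 'prose', tokens: 300, infoBits: 450 },     // density 1.5
  { name: 'types', tokens: 20, infoBits: 80 },       // density 4.0
  { name: 'boilerplate', tokens: 50, infoBits: 25 }, // density 0.5
]);
// ranked[0].name === 'types'
```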
### Why Types Beat Comments

Types provide machine-verifiable constraints that eliminate invalid programs:

```typescript
// LOW DENSITY: 15 tokens, ~15 bits of information
// Process the user data and return a result with success status

// HIGH DENSITY: 15 tokens, ~60 bits of information
function processUser(user: User): Promise<Result<ProcessedUser>>
```

The type signature eliminates 90%+ of invalid implementations; the comment eliminates maybe 10%.
## Budget Allocation Framework

### Step 1: Define Your Budget Tiers

Allocate your context window into tiers by priority:

```typescript
interface ContextBudget {
  total: number; // 200_000 tokens
  tiers: {
    critical: number;      // 20% - must always load
    important: number;     // 30% - load when relevant
    supplementary: number; // 30% - load on demand
    dynamic: number;       // 20% - conversation + outputs
  };
}

const budget: ContextBudget = {
  total: 200_000,
  tiers: {
    critical: 40_000,      // system prompt, core CLAUDE.md
    important: 60_000,     // domain context, types, tests
    supplementary: 60_000, // code files, documentation
    dynamic: 40_000,       // conversation history, tool outputs
  },
};
```
### Step 2: Assign Content to Tiers

**Critical (20%)**: always loaded, highest information density

- System prompt with core instructions
- Root CLAUDE.md (lean, <50 lines)
- Active task description
- Critical constraints and anti-patterns

**Important (30%)**: loaded when relevant to the current task

- Domain-specific CLAUDE.md files
- Type definitions for the current domain
- Relevant test cases
- Working examples of the pattern needed

**Supplementary (30%)**: loaded on demand

- Full code files being modified
- Documentation for unfamiliar APIs
- Historical context (previous implementations)
- Error messages and stack traces

**Dynamic (20%)**: reserved for conversation and runtime

- Conversation history (with compaction)
- Tool outputs (file contents, command results)
- LLM reasoning and responses
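One way to encode these assignments is a plain lookup table with a safe default. This is a sketch; the content-category names are hypothetical:

```typescript
type Tier = 'critical' | 'important' | 'supplementary' | 'dynamic';

// Hypothetical mapping of content categories to the tiers described above
const tierOf: Record<string, Tier> = {
  systemPrompt: 'critical',
  rootClaudeMd: 'critical',
  taskDescription: 'critical',
  domainClaudeMd: 'important',
  typeDefinitions: 'important',
  relatedTests: 'important',
  fullCodeFiles: 'supplementary',
  apiDocs: 'supplementary',
  conversationHistory: 'dynamic',
  toolOutputs: 'dynamic',
};

function assignTier(contentType: string): Tier {
  // Unknown content defaults to supplementary so it never crowds out critical items
  return tierOf[contentType] ?? 'supplementary';
}
```

Defaulting unknown content downward is the conservative choice: misclassifying something as critical wastes the scarcest tier.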
### Step 3: Calculate Content Costs

Before loading content, estimate its token cost:

```typescript
interface ContentItem {
  name: string;
  tokens: number;
  informationBits: number;
  tier: 'critical' | 'important' | 'supplementary';
}

function calculateDensity(item: ContentItem): number {
  return item.informationBits / item.tokens;
}

function shouldInclude(
  item: ContentItem,
  budget: ContextBudget,
  used: number
): boolean {
  const tierBudget = budget.tiers[item.tier];
  const density = calculateDensity(item);

  // High-density content gets priority
  if (density >= 3.0) return used + item.tokens <= tierBudget;
  if (density >= 2.0) return used + item.tokens <= tierBudget * 0.8;
  if (density >= 1.0) return used + item.tokens <= tierBudget * 0.5;

  // Low density: include only if space remains
  return used + item.tokens <= tierBudget * 0.3;
}
```
## Practical Budgeting Strategies

### Strategy 1: Progressive Loading

Load context in waves, stopping when the budget is exhausted:

```typescript
async function loadContext(
  task: Task,
  budget: ContextBudget
): Promise<string> {
  const context: string[] = [];
  let used = 0;

  // Wave 1: critical (always load)
  const critical = [
    await loadSystemPrompt(), // ~2,000 tokens
    await loadRootClaudeMd(), // ~500 tokens
    task.description,         // ~200 tokens
  ];
  for (const item of critical) {
    context.push(item);
    used += countTokens(item);
  }

  // Wave 2: important (load if relevant).
  // Compare against the combined critical + important cap, since `used`
  // already includes the critical-tier tokens.
  const importantCap = budget.tiers.critical + budget.tiers.important;
  if (used < importantCap) {
    const domainContext = await loadDomainContext(task.domain);
    const types = await loadRelevantTypes(task.files);
    const tests = await loadRelatedTests(task.files);

    // Sort by density, load until the budget is exhausted
    const items = [domainContext, types, tests]
      .sort((a, b) => b.density - a.density);
    for (const item of items) {
      if (used + item.tokens <= importantCap) {
        context.push(item.content);
        used += item.tokens;
      }
    }
  }

  // Wave 3: supplementary (load on demand)
  // ... same pattern

  return context.join('\n\n');
}
```
### Strategy 2: Density-Based Compression

When the budget is tight, compress low-density content:

```typescript
interface CompressionStrategy {
  threshold: number; // density below which to compress
  ratio: number;     // target compression ratio
}

async function compressLowDensity(
  content: string,
  density: number,
  strategy: CompressionStrategy
): Promise<string> {
  if (density >= strategy.threshold) {
    return content; // high density: keep as-is
  }

  // Low density: summarize with a fast model
  const summary = await haiku.summarize(content, {
    targetLength: Math.floor(content.length * strategy.ratio),
    preserveCode: true,
    preserveConstraints: true,
  });
  return summary;
}

// Usage
const docContent = await readFile('docs/api-guide.md');
const density = estimateDensity(docContent); // 1.2 bits/token
const compressed = await compressLowDensity(docContent, density, {
  threshold: 2.0, // compress if density < 2.0
  ratio: 0.3,     // target 30% of original size
});
// Result: 5000 tokens → 1500 tokens with key info preserved
```
### Strategy 3: Hierarchical Loading

Load from specific to general, stopping when sufficient:

```typescript
async function loadHierarchicalContext(
  filePath: string,
  maxTokens: number
): Promise<string[]> {
  const contexts: string[] = [];
  let tokens = 0;

  // Start with the most specific context
  const hierarchy = getContextHierarchy(filePath);
  // e.g., ['src/routes/users/CLAUDE.md', 'src/routes/CLAUDE.md',
  //        'src/CLAUDE.md', 'CLAUDE.md']

  for (const contextFile of hierarchy) {
    const content = await readFile(contextFile);
    const fileTokens = countTokens(content);

    if (tokens + fileTokens > maxTokens) {
      // Budget exhausted: summarize the remaining files instead
      const remaining = hierarchy.slice(hierarchy.indexOf(contextFile));
      const summaries = await summarizeContextFiles(remaining);
      contexts.push(...summaries);
      break;
    }

    contexts.push(content);
    tokens += fileTokens;
  }

  return contexts;
}
```
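`getContextHierarchy` is used above without a definition. A plausible sketch (assumed behavior: walk from the file's directory up to the repository root, most specific CLAUDE.md first):

```typescript
// Sketch of a getContextHierarchy helper: collects candidate CLAUDE.md
// paths from the file's own directory up to the repository root.
function getContextHierarchy(filePath: string): string[] {
  const parts = filePath.split('/').slice(0, -1); // drop the file name
  const hierarchy: string[] = [];
  for (let depth = parts.length; depth >= 0; depth--) {
    const dir = parts.slice(0, depth).join('/');
    hierarchy.push(dir ? `${dir}/CLAUDE.md` : 'CLAUDE.md');
  }
  return hierarchy;
}

// getContextHierarchy('src/routes/users/index.ts')
// → ['src/routes/users/CLAUDE.md', 'src/routes/CLAUDE.md',
//    'src/CLAUDE.md', 'CLAUDE.md']
```

A real implementation would also skip paths that don't exist on disk before attempting to read them.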
### Strategy 4: Dynamic Reallocation

Adjust budgets based on task complexity:

```typescript
interface TaskComplexity {
  filesAffected: number;
  domainsInvolved: number;
  estimatedSteps: number;
}

function reallocateBudget(
  base: ContextBudget,
  complexity: TaskComplexity
): ContextBudget {
  // Copy tiers as well; a shallow spread of `base` alone would
  // mutate the shared tiers object.
  const adjusted = { ...base, tiers: { ...base.tiers } };

  // Complex tasks need more dynamic space for conversation
  if (complexity.estimatedSteps > 10) {
    adjusted.tiers.dynamic += 20_000;
    adjusted.tiers.supplementary -= 20_000;
  }

  // Multi-domain tasks need more important context
  if (complexity.domainsInvolved > 2) {
    adjusted.tiers.important += 15_000;
    adjusted.tiers.supplementary -= 15_000;
  }

  // Many files: more supplementary space
  if (complexity.filesAffected > 5) {
    adjusted.tiers.supplementary += 10_000;
    adjusted.tiers.critical -= 10_000;
  }

  return adjusted;
}
```
## Measuring Budget Effectiveness

### Metric 1: Utilization Efficiency

```typescript
interface UtilizationMetrics {
  totalBudget: number;
  tokensUsed: number;
  tokensWasted: number; // low-density content
  utilization: number;  // tokensUsed / totalBudget
  efficiency: number;   // (tokensUsed - tokensWasted) / tokensUsed
}

function measureUtilization(
  content: ContentItem[],
  budget: ContextBudget
): UtilizationMetrics {
  const tokensUsed = content.reduce((sum, c) => sum + c.tokens, 0);
  const tokensWasted = content
    .filter(c => calculateDensity(c) < 1.5)
    .reduce((sum, c) => sum + c.tokens, 0);

  return {
    totalBudget: budget.total,
    tokensUsed,
    tokensWasted,
    utilization: tokensUsed / budget.total,
    efficiency: (tokensUsed - tokensWasted) / tokensUsed,
  };
}

// Targets:
// utilization: 60-80% (leave room for dynamic content)
// efficiency: >80% (most tokens carry useful information)
```
### Metric 2: Context Relevance Score

```typescript
interface RelevanceMetrics {
  totalItems: number;
  relevantItems: number;
  relevanceScore: number; // relevantItems / totalItems
}

function measureRelevance(
  content: ContentItem[],
  task: Task
): RelevanceMetrics {
  const relevantItems = content.filter(item =>
    isRelevantToTask(item, task)
  );

  return {
    totalItems: content.length,
    relevantItems: relevantItems.length,
    relevanceScore: relevantItems.length / content.length,
  };
}

// Target: relevanceScore > 0.85 (85%+ of loaded content is relevant)
```
### Metric 3: First-Try Success Rate

```typescript
// Track whether the budget allocation led to correct output
interface SuccessMetrics {
  attempts: number;
  firstTrySuccess: number;
  avgIterations: number;
  budgetCorrelation: number; // does a higher budget correlate with success?
}

// Targets:
// firstTrySuccess rate > 70%
// avgIterations < 2
```
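The interface above only declares the fields. A minimal tracker that computes them from recorded task outcomes might look like this (`budgetCorrelation` is omitted, since it needs a larger sample to estimate):

```typescript
interface TaskOutcome {
  iterations: number; // 1 means the task succeeded on the first try
}

function summarizeOutcomes(outcomes: TaskOutcome[]) {
  const attempts = outcomes.length;
  const firstTrySuccess = outcomes.filter(o => o.iterations === 1).length;
  const avgIterations =
    outcomes.reduce((sum, o) => sum + o.iterations, 0) / attempts;
  return { attempts, firstTrySuccess, avgIterations };
}

const stats = summarizeOutcomes([
  { iterations: 1 },
  { iterations: 1 },
  { iterations: 3 },
  { iterations: 1 },
]);
// 3 of 4 tasks succeeded first try; average of 1.5 iterations per task
```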
## Common Budgeting Anti-Patterns

### Anti-Pattern 1: Flat Loading

```typescript
// ❌ BAD: load everything without prioritization
const context = [
  systemPrompt,        // 5K tokens
  entireClaudeMd,      // 10K tokens (much of it irrelevant)
  allTypeDefinitions,  // 20K tokens (90% unused)
  allTests,            // 30K tokens (95% irrelevant)
  conversationHistory, // 50K tokens (needs compaction)
].join('\n');
// Result: 115K tokens, <20% relevant
```

```typescript
// ✅ GOOD: prioritize by density and relevance
const context = await loadPrioritized({
  critical: [systemPrompt, leanClaudeMd],     // 3K tokens
  important: [relevantTypes, relevantTests],  // 5K tokens
  supplementary: [currentFile, relatedFiles], // 10K tokens
  dynamic: [compactedHistory],                // 8K tokens
});
// Result: 26K tokens, >80% relevant
```
### Anti-Pattern 2: No History Compaction

```typescript
// ❌ BAD: full conversation history
const history = messages.map(m => m.content).join('\n');
// After 50 turns: 100K+ tokens of repetitive content
```

```typescript
// ✅ GOOD: compact older history
const history = compactHistory(messages, {
  keepRecent: 10,       // full content for the last 10 messages
  summarizeOlder: true, // summarize messages 11-50
  dropIrrelevant: true, // remove tangential discussions
});
// Result: 15K tokens with all key information
```
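`compactHistory` is used here without a definition. A minimal sketch of the `keepRecent` behavior follows; real summarization is replaced by a placeholder, since it would normally call a cheaper model, and the `summarizeOlder`/`dropIrrelevant` options are not modeled:

```typescript
interface Message {
  role: 'user' | 'assistant';
  content: string;
}

interface CompactOptions {
  keepRecent: number; // number of messages kept verbatim
}

// Sketch: older messages collapse into a single placeholder line.
// A real implementation would summarize them with a fast model instead.
function compactHistory(messages: Message[], opts: CompactOptions): string {
  const cut = Math.max(0, messages.length - opts.keepRecent);
  const older = messages.slice(0, cut);
  const recent = messages.slice(cut);

  const parts: string[] = [];
  if (older.length > 0) {
    parts.push(`[Summary of ${older.length} earlier messages]`);
  }
  parts.push(...recent.map(m => `${m.role}: ${m.content}`));
  return parts.join('\n');
}
```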
### Anti-Pattern 3: Loading Full Files

```typescript
// ❌ BAD: load entire files
const context = await Promise.all(
  relevantFiles.map(f => readFile(f))
);
// 10 files × 500 lines avg = 50K tokens
```

```typescript
// ✅ GOOD: load only the relevant sections
const context = await Promise.all(
  relevantFiles.map(f => loadRelevantSections(f, task))
);
// 10 files × 50 relevant lines = 5K tokens
```
### Anti-Pattern 4: Ignoring the Dynamic Budget

```typescript
// ❌ BAD: static allocation that leaves no room for outputs
const budget = {
  system: 50_000,
  context: 100_000,
  conversation: 50_000,
  // No room for tool outputs!
};
// A tool returns 30K tokens → context truncation
```

```typescript
// ✅ GOOD: reserve a dynamic buffer
const budget = {
  system: 30_000,
  context: 70_000,
  conversation: 40_000,
  reserved: 60_000, // for tool outputs and responses
};
```
## Budget Templates by Use Case

### Template 1: Code Review

```typescript
const codeReviewBudget = {
  critical: {
    systemPrompt: 2_000,
    reviewGuidelines: 1_500,
    codeStandards: 1_000,
  },
  important: {
    diffContent: 10_000, // the code to review
    relevantTypes: 3_000,
    relatedTests: 2_000,
  },
  supplementary: {
    surroundingCode: 5_000, // context around the changes
    documentation: 2_000,
  },
  dynamic: {
    conversation: 10_000,
    response: 5_000,
  },
};
// Total: ~42K tokens
```
### Template 2: Feature Implementation

```typescript
const featureImplementationBudget = {
  critical: {
    systemPrompt: 3_000,
    taskDescription: 500,
    constraints: 1_000,
  },
  important: {
    domainClaudeMd: 2_000,
    typeDefinitions: 5_000,
    existingPatterns: 3_000, // how similar features work
    tests: 2_000,
  },
  supplementary: {
    existingCode: 15_000, // files to modify
    documentation: 3_000,
  },
  dynamic: {
    conversation: 15_000, // multi-turn feature work
    toolOutputs: 10_000,
    response: 10_000,
  },
};
// Total: ~70K tokens
```
### Template 3: Bug Investigation

```typescript
const bugInvestigationBudget = {
  critical: {
    systemPrompt: 2_000,
    bugReport: 500,
    errorMessage: 500,
  },
  important: {
    stackTrace: 1_000,
    relevantCode: 5_000, // code around the bug
    relatedTests: 2_000,
    recentChanges: 3_000, // git diff of recent changes
  },
  supplementary: {
    logsAndMetrics: 5_000,
    historicalBugs: 2_000, // similar past bugs
  },
  dynamic: {
    conversation: 20_000, // investigation is iterative
    toolOutputs: 15_000,  // lots of file reading
    response: 10_000,
  },
};
// Total: ~66K tokens
```
## Implementation: Budget Manager

```typescript
class ContextBudgetManager {
  private budget: ContextBudget;
  private allocated: Map<string, number> = new Map();

  constructor(budget: ContextBudget) {
    this.budget = budget;
  }

  allocate(
    tier: keyof ContextBudget['tiers'],
    item: string,
    tokens: number
  ): boolean {
    const tierBudget = this.budget.tiers[tier];
    const tierUsed = this.getTierUsage(tier);

    if (tierUsed + tokens > tierBudget) {
      return false; // over budget
    }

    this.allocated.set(`${tier}:${item}`, tokens);
    return true;
  }

  getTierUsage(tier: keyof ContextBudget['tiers']): number {
    let total = 0;
    for (const [key, tokens] of this.allocated) {
      if (key.startsWith(`${tier}:`)) {
        total += tokens;
      }
    }
    return total;
  }

  getTotalUsage(): number {
    let total = 0;
    for (const tokens of this.allocated.values()) {
      total += tokens;
    }
    return total;
  }

  getRemaining(): number {
    return this.budget.total - this.getTotalUsage();
  }

  canFit(tokens: number): boolean {
    return this.getRemaining() >= tokens;
  }

  getReport(): BudgetReport {
    return {
      total: this.budget.total,
      used: this.getTotalUsage(),
      remaining: this.getRemaining(),
      byTier: {
        critical: {
          budget: this.budget.tiers.critical,
          used: this.getTierUsage('critical'),
        },
        important: {
          budget: this.budget.tiers.important,
          used: this.getTierUsage('important'),
        },
        supplementary: {
          budget: this.budget.tiers.supplementary,
          used: this.getTierUsage('supplementary'),
        },
        dynamic: {
          budget: this.budget.tiers.dynamic,
          used: this.getTierUsage('dynamic'),
        },
      },
    };
  }
}

// Usage
const manager = new ContextBudgetManager(budget);
manager.allocate('critical', 'systemPrompt', 2_000);
manager.allocate('critical', 'claudeMd', 500);
manager.allocate('important', 'types', 3_000);
manager.allocate('important', 'tests', 2_000);

console.log(manager.getReport());
// {
//   total: 200_000,
//   used: 7_500,
//   remaining: 192_500,
//   byTier: {
//     critical: { budget: 40_000, used: 2_500 },
//     important: { budget: 60_000, used: 5_000 },
//     supplementary: { budget: 60_000, used: 0 },
//     dynamic: { budget: 40_000, used: 0 },
//   },
// }
```
## Best Practices

### 1. Measure Before Optimizing

Don't guess at token costs; measure actual usage:

```typescript
const tokenizer = await loadTokenizer('cl100k_base');
const tokens = tokenizer.encode(content).length;
```
### 2. Set Budget Alerts

Warn when approaching limits:

```typescript
if (manager.getRemaining() < 20_000) {
  console.warn('Context budget low. Consider compacting history.');
}
```
### 3. Cache Token Counts

Token counting is expensive, so cache the results:

```typescript
const tokenCache = new Map<string, number>();

function countTokens(content: string): number {
  const hash = hashContent(content);
  if (tokenCache.has(hash)) {
    return tokenCache.get(hash)!;
  }
  const count = tokenizer.encode(content).length;
  tokenCache.set(hash, count);
  return count;
}
```
### 4. Use Prompt Caching

Cached content still occupies the context window, but it is processed once and billed at a steep discount on subsequent requests, so stable content costs little to re-send:

```typescript
// Cached context (stable, byte-identical across requests)
const cachedContext = systemPrompt + claudeMd + baseTypes;

// Dynamic context (changes per request, billed in full)
const dynamicContext = taskDescription + relevantCode;
```
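A sketch of the split, assuming a provider that caches a stable prompt prefix (the exact API and field names vary by provider and are illustrative here):

```typescript
interface ContextParts {
  stable: string[];  // identical across requests → cache candidate
  dynamic: string[]; // changes per request → billed in full each time
}

// The stable prefix must come first and be byte-identical between
// requests for a prefix cache to hit.
function buildPrompt(parts: ContextParts): { prefix: string; suffix: string } {
  return {
    prefix: parts.stable.join('\n\n'),
    suffix: parts.dynamic.join('\n\n'),
  };
}

const { prefix, suffix } = buildPrompt({
  stable: ['<system prompt>', '<CLAUDE.md>', '<base types>'],
  dynamic: ['<task description>', '<relevant code>'],
});
```

A common mistake that defeats prefix caching is interleaving dynamic content (timestamps, request IDs) into the stable section.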
### 5. Review and Adjust

Periodically audit your budget allocation:

```typescript
// Log budget metrics for analysis (use the public report rather
// than reaching into the manager's private budget field)
const report = manager.getReport();
logger.info('Budget report', {
  utilization: report.used / report.total,
  byTier: report.byTier,
  taskSuccess: task.succeeded,
  iterations: task.iterations,
});
```
## Conclusion

Token budgeting transforms context management from a constraint into a competitive advantage. By allocating tokens based on information density rather than convenience, you:

- **Maximize signal**: high-density content gets priority
- **Reduce costs**: you don't pay for low-value tokens
- **Improve accuracy**: relevant context leads to better outputs
- **Enable scaling**: room for complex, multi-turn tasks

**Key takeaways:**

- **Types > comments**: roughly 4x the information density
- **Tier your budget**: critical (20%), important (30%), supplementary (30%), dynamic (20%)
- **Compress low-density content**: summarize prose, keep code
- **Reserve dynamic space**: tool outputs need room
- **Measure and adjust**: track utilization and efficiency

**The result**: context windows that work harder with fewer tokens, producing better outputs at lower cost.
## Related
- Information Theory for Coding Agents – Mathematical foundations of information density
- Progressive Disclosure Context – Load context in layers by priority
- Hierarchical Context Patterns – CLAUDE.md organization for efficient loading
- Context Rot Auto-Compacting – Automatic context compaction strategies
- Prompt Caching Strategy – Cache stable content to preserve budget
- Sliding Window History – Bounded state management for conversation history
- Model Switching Strategy – Use cheaper models for compression tasks
## References
- Anthropic Prompt Engineering Guide – Official guidance on context management
- Claude Shannon – A Mathematical Theory of Communication – Foundation of information theory
- tiktoken – Fast tokenizer for token counting

