## Summary
Token budgeting is the practice of allocating your finite context window to maximize information value per token spent. Like a financial budget, you have limited resources (tokens) and competing demands (system prompts, documentation, code, conversation history). This article provides practical strategies for prioritizing high-information-density content, calculating budget allocations, and measuring the effectiveness of your context spending.
## The Problem
Context windows are finite and expensive. A 200K token window sounds large, but fills quickly when you load:
- System prompts and instructions (2-10K tokens)
- CLAUDE.md files (1-5K tokens)
- Code files for context (10-50K tokens)
- Conversation history (grows unbounded)
- Tool outputs (variable, often large)
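Summing the upper ends of these ranges (and picking an illustrative figure for tool outputs, which the list leaves open-ended) shows how quickly a window fills before conversation history grows at all:

```typescript
// Illustrative worst-case token costs from the list above
const worstCase = {
  systemPrompt: 10_000,
  claudeMd: 5_000,
  codeFiles: 50_000,
  toolOutputs: 30_000, // highly variable; assumed here for illustration
};

const fixedLoad = Object.values(worstCase).reduce((sum, t) => sum + t, 0);
const window = 200_000;

// Nearly half the window is spoken for before the conversation starts
console.log(`${fixedLoad} tokens used, ${window - fixedLoad} remaining`);
```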
Without deliberate budgeting, you face:

- **Context exhaustion**: running out of tokens mid-task
- **Signal dilution**: important context buried in noise
- **Cost explosion**: paying for low-value tokens
- **Performance degradation**: “lost in the middle” effects
## The Solution
Treat your context window as a budget. Allocate tokens based on information density (bits of useful information per token) rather than convenience. High-density content gets priority; low-density content gets compressed, summarized, or excluded.
## Information Density: The Core Metric

Not all tokens carry equal information. From information theory:

`Information Density = Useful Information (bits) / Tokens Used`

### Density Rankings by Content Type
| Content Type | Tokens | Info Bits | Density | Priority |
|---|---|---|---|---|
| Type signatures | 20 | 80 | 4.0 | Highest |
| Test assertions | 50 | 175 | 3.5 | High |
| Working code examples | 100 | 350 | 3.5 | High |
| Explicit constraints | 30 | 90 | 3.0 | High |
| API schemas (Zod/OpenAPI) | 200 | 600 | 3.0 | High |
| Structured documentation | 500 | 1000 | 2.0 | Medium |
| Prose explanations | 300 | 450 | 1.5 | Low |
| Generic comments | 100 | 100 | 1.0 | Lowest |
| Boilerplate/imports | 50 | 25 | 0.5 | Exclude |
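The bit counts above are illustrative estimates rather than measurements, but the ranking logic is mechanical. A minimal helper that orders candidate content by this density metric (names here are hypothetical) might look like:

```typescript
interface Candidate {
  name: string;
  tokens: number;
  infoBits: number; // estimated useful information
}

const density = (c: Candidate): number => c.infoBits / c.tokens;

// Highest-density content first, so it is loaded before the budget runs out
function rankByDensity(items: Candidate[]): Candidate[] {
  return [...items].sort((a, b) => density(b) - density(a));
}

const ranked = rankByDensity([
  { name: 'prose', tokens: 300, infoBits: 450 },     // density 1.5
  { name: 'types', tokens: 20, infoBits: 80 },       // density 4.0
  { name: 'boilerplate', tokens: 50, infoBits: 25 }, // density 0.5
]);
// ranked[0].name === 'types'
```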
### Why Types Beat Comments

Types provide machine-verifiable constraints that eliminate invalid programs:

```typescript
// LOW DENSITY: 15 tokens, ~15 bits of information
// Process the user data and return a result with success status

// HIGH DENSITY: 15 tokens, ~60 bits of information
function processUser(user: User): Promise<Result<ProcessedUser>>
```

The type signature eliminates 90%+ of invalid implementations; the comment eliminates maybe 10%.
## Budget Allocation Framework

### Step 1: Define Your Budget Tiers

Allocate your context window into tiers by priority:

```typescript
interface ContextBudget {
  total: number; // 200_000 tokens
  tiers: {
    critical: number;      // 20% - must always load
    important: number;     // 30% - load when relevant
    supplementary: number; // 30% - load on demand
    dynamic: number;       // 20% - conversation + outputs
  };
}

const budget: ContextBudget = {
  total: 200_000,
  tiers: {
    critical: 40_000,      // system prompt, core CLAUDE.md
    important: 60_000,     // domain context, types, tests
    supplementary: 60_000, // code files, documentation
    dynamic: 40_000,       // conversation history, tool outputs
  },
};
```
### Step 2: Assign Content to Tiers

**Critical (20%)**: always loaded, highest information density

- System prompt with core instructions
- Root CLAUDE.md (lean, <50 lines)
- Active task description
- Critical constraints and anti-patterns

**Important (30%)**: loaded when relevant to the current task

- Domain-specific CLAUDE.md files
- Type definitions for the current domain
- Relevant test cases
- Working examples of the pattern needed

**Supplementary (30%)**: loaded on demand

- Full code files being modified
- Documentation for unfamiliar APIs
- Historical context (previous implementations)
- Error messages and stack traces

**Dynamic (20%)**: reserved for conversation and runtime

- Conversation history (with compaction)
- Tool outputs (file contents, command results)
- LLM reasoning and responses
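One way to encode these assignments is a plain lookup table with a safe default. This is a sketch; the content-category names are hypothetical:

```typescript
type Tier = 'critical' | 'important' | 'supplementary' | 'dynamic';

// Hypothetical mapping of content categories to the tiers described above
const tierOf: Record<string, Tier> = {
  systemPrompt: 'critical',
  rootClaudeMd: 'critical',
  taskDescription: 'critical',
  domainClaudeMd: 'important',
  typeDefinitions: 'important',
  relatedTests: 'important',
  fullCodeFiles: 'supplementary',
  apiDocs: 'supplementary',
  conversationHistory: 'dynamic',
  toolOutputs: 'dynamic',
};

function assignTier(contentType: string): Tier {
  // Unknown content defaults to supplementary so it never crowds out critical items
  return tierOf[contentType] ?? 'supplementary';
}
```

Defaulting unknown content downward is the conservative choice: misclassifying something as critical wastes the scarcest tier.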
### Step 3: Calculate Content Costs

Before loading content, estimate its token cost:

```typescript
interface ContentItem {
  name: string;
  tokens: number;
  informationBits: number;
  tier: 'critical' | 'important' | 'supplementary';
}

function calculateDensity(item: ContentItem): number {
  return item.informationBits / item.tokens;
}

function shouldInclude(
  item: ContentItem,
  budget: ContextBudget,
  used: number
): boolean {
  const tierBudget = budget.tiers[item.tier];
  const density = calculateDensity(item);

  // High-density content gets priority
  if (density >= 3.0) return used + item.tokens <= tierBudget;
  if (density >= 2.0) return used + item.tokens <= tierBudget * 0.8;
  if (density >= 1.0) return used + item.tokens <= tierBudget * 0.5;

  // Low density: include only if space remains
  return used + item.tokens <= tierBudget * 0.3;
}
```
## Practical Budgeting Strategies

### Strategy 1: Progressive Loading

Load context in waves, stopping when the budget is exhausted:

```typescript
async function loadContext(
  task: Task,
  budget: ContextBudget
): Promise<string> {
  const context: string[] = [];
  let used = 0;

  // Wave 1: critical (always load)
  const critical = [
    await loadSystemPrompt(), // ~2,000 tokens
    await loadRootClaudeMd(), // ~500 tokens
    task.description,         // ~200 tokens
  ];
  for (const item of critical) {
    context.push(item);
    used += countTokens(item);
  }

  // Wave 2: important (load if relevant).
  // Compare against the combined critical + important cap, since `used`
  // already includes the critical-tier tokens.
  const importantCap = budget.tiers.critical + budget.tiers.important;
  if (used < importantCap) {
    const domainContext = await loadDomainContext(task.domain);
    const types = await loadRelevantTypes(task.files);
    const tests = await loadRelatedTests(task.files);

    // Sort by density, load until the budget is exhausted
    const items = [domainContext, types, tests]
      .sort((a, b) => b.density - a.density);
    for (const item of items) {
      if (used + item.tokens <= importantCap) {
        context.push(item.content);
        used += item.tokens;
      }
    }
  }

  // Wave 3: supplementary (load on demand)
  // ... same pattern

  return context.join('\n\n');
}
```
### Strategy 2: Density-Based Compression

When the budget is tight, compress low-density content:

```typescript
interface CompressionStrategy {
  threshold: number; // density below which to compress
  ratio: number;     // target compression ratio
}

async function compressLowDensity(
  content: string,
  density: number,
  strategy: CompressionStrategy
): Promise<string> {
  if (density >= strategy.threshold) {
    return content; // high density: keep as-is
  }

  // Low density: summarize with a fast model
  const summary = await haiku.summarize(content, {
    targetLength: Math.floor(content.length * strategy.ratio),
    preserveCode: true,
    preserveConstraints: true,
  });
  return summary;
}

// Usage
const docContent = await readFile('docs/api-guide.md');
const density = estimateDensity(docContent); // 1.2 bits/token
const compressed = await compressLowDensity(docContent, density, {
  threshold: 2.0, // compress if density < 2.0
  ratio: 0.3,     // target 30% of original size
});
// Result: 5000 tokens → 1500 tokens with key info preserved
```
### Strategy 3: Hierarchical Loading

Load from specific to general, stopping when sufficient:

```typescript
async function loadHierarchicalContext(
  filePath: string,
  maxTokens: number
): Promise<string[]> {
  const contexts: string[] = [];
  let tokens = 0;

  // Start with the most specific context
  const hierarchy = getContextHierarchy(filePath);
  // e.g., ['src/routes/users/CLAUDE.md', 'src/routes/CLAUDE.md',
  //        'src/CLAUDE.md', 'CLAUDE.md']

  for (const contextFile of hierarchy) {
    const content = await readFile(contextFile);
    const fileTokens = countTokens(content);

    if (tokens + fileTokens > maxTokens) {
      // Budget exhausted: summarize the remaining files instead
      const remaining = hierarchy.slice(hierarchy.indexOf(contextFile));
      const summaries = await summarizeContextFiles(remaining);
      contexts.push(...summaries);
      break;
    }

    contexts.push(content);
    tokens += fileTokens;
  }

  return contexts;
}
```
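`getContextHierarchy` is used above without a definition. A plausible sketch (assumed behavior: walk from the file's directory up to the repository root, most specific CLAUDE.md first):

```typescript
// Sketch of a getContextHierarchy helper: collects candidate CLAUDE.md
// paths from the file's own directory up to the repository root.
function getContextHierarchy(filePath: string): string[] {
  const parts = filePath.split('/').slice(0, -1); // drop the file name
  const hierarchy: string[] = [];
  for (let depth = parts.length; depth >= 0; depth--) {
    const dir = parts.slice(0, depth).join('/');
    hierarchy.push(dir ? `${dir}/CLAUDE.md` : 'CLAUDE.md');
  }
  return hierarchy;
}

// getContextHierarchy('src/routes/users/index.ts')
// → ['src/routes/users/CLAUDE.md', 'src/routes/CLAUDE.md',
//    'src/CLAUDE.md', 'CLAUDE.md']
```

A real implementation would also skip paths that don't exist on disk before attempting to read them.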
### Strategy 4: Dynamic Reallocation

Adjust budgets based on task complexity:

```typescript
interface TaskComplexity {
  filesAffected: number;
  domainsInvolved: number;
  estimatedSteps: number;
}

function reallocateBudget(
  base: ContextBudget,
  complexity: TaskComplexity
): ContextBudget {
  // Copy tiers as well; a shallow spread of `base` alone would
  // mutate the shared tiers object.
  const adjusted = { ...base, tiers: { ...base.tiers } };

  // Complex tasks need more dynamic space for conversation
  if (complexity.estimatedSteps > 10) {
    adjusted.tiers.dynamic += 20_000;
    adjusted.tiers.supplementary -= 20_000;
  }

  // Multi-domain tasks need more important context
  if (complexity.domainsInvolved > 2) {
    adjusted.tiers.important += 15_000;
    adjusted.tiers.supplementary -= 15_000;
  }

  // Many files: more supplementary space
  if (complexity.filesAffected > 5) {
    adjusted.tiers.supplementary += 10_000;
    adjusted.tiers.critical -= 10_000;
  }

  return adjusted;
}
```
## Measuring Budget Effectiveness

### Metric 1: Utilization Efficiency

```typescript
interface UtilizationMetrics {
  totalBudget: number;
  tokensUsed: number;
  tokensWasted: number; // low-density content
  utilization: number;  // tokensUsed / totalBudget
  efficiency: number;   // (tokensUsed - tokensWasted) / tokensUsed
}

function measureUtilization(
  content: ContentItem[],
  budget: ContextBudget
): UtilizationMetrics {
  const tokensUsed = content.reduce((sum, c) => sum + c.tokens, 0);
  const tokensWasted = content
    .filter(c => calculateDensity(c) < 1.5)
    .reduce((sum, c) => sum + c.tokens, 0);

  return {
    totalBudget: budget.total,
    tokensUsed,
    tokensWasted,
    utilization: tokensUsed / budget.total,
    efficiency: (tokensUsed - tokensWasted) / tokensUsed,
  };
}

// Targets:
// utilization: 60-80% (leave room for dynamic content)
// efficiency: >80% (most tokens carry useful information)
```
### Metric 2: Context Relevance Score

```typescript
interface RelevanceMetrics {
  totalItems: number;
  relevantItems: number;
  relevanceScore: number; // relevantItems / totalItems
}

function measureRelevance(
  content: ContentItem[],
  task: Task
): RelevanceMetrics {
  const relevantItems = content.filter(item =>
    isRelevantToTask(item, task)
  );

  return {
    totalItems: content.length,
    relevantItems: relevantItems.length,
    relevanceScore: relevantItems.length / content.length,
  };
}

// Target: relevanceScore > 0.85 (85%+ of loaded content is relevant)
```
### Metric 3: First-Try Success Rate

```typescript
// Track whether the budget allocation led to correct output
interface SuccessMetrics {
  attempts: number;
  firstTrySuccess: number;
  avgIterations: number;
  budgetCorrelation: number; // does a higher budget correlate with success?
}

// Targets:
// firstTrySuccess rate > 70%
// avgIterations < 2
```
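The interface above only declares the fields. A minimal tracker that computes them from recorded task outcomes might look like this (`budgetCorrelation` is omitted, since it needs a larger sample to estimate):

```typescript
interface TaskOutcome {
  iterations: number; // 1 means the task succeeded on the first try
}

function summarizeOutcomes(outcomes: TaskOutcome[]) {
  const attempts = outcomes.length;
  const firstTrySuccess = outcomes.filter(o => o.iterations === 1).length;
  const avgIterations =
    outcomes.reduce((sum, o) => sum + o.iterations, 0) / attempts;
  return { attempts, firstTrySuccess, avgIterations };
}

const stats = summarizeOutcomes([
  { iterations: 1 },
  { iterations: 1 },
  { iterations: 3 },
  { iterations: 1 },
]);
// 3 of 4 tasks succeeded first try; average of 1.5 iterations per task
```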
## Common Budgeting Anti-Patterns

### Anti-Pattern 1: Flat Loading

```typescript
// ❌ BAD: load everything without prioritization
const context = [
  systemPrompt,        // 5K tokens
  entireClaudeMd,      // 10K tokens (much of it irrelevant)
  allTypeDefinitions,  // 20K tokens (90% unused)
  allTests,            // 30K tokens (95% irrelevant)
  conversationHistory, // 50K tokens (needs compaction)
].join('\n');
// Result: 115K tokens, <20% relevant
```

```typescript
// ✅ GOOD: prioritize by density and relevance
const context = await loadPrioritized({
  critical: [systemPrompt, leanClaudeMd],     // 3K tokens
  important: [relevantTypes, relevantTests],  // 5K tokens
  supplementary: [currentFile, relatedFiles], // 10K tokens
  dynamic: [compactedHistory],                // 8K tokens
});
// Result: 26K tokens, >80% relevant
```
### Anti-Pattern 2: No History Compaction

```typescript
// ❌ BAD: full conversation history
const history = messages.map(m => m.content).join('\n');
// After 50 turns: 100K+ tokens of repetitive content
```

```typescript
// ✅ GOOD: compact older history
const history = compactHistory(messages, {
  keepRecent: 10,       // full content for the last 10 messages
  summarizeOlder: true, // summarize messages 11-50
  dropIrrelevant: true, // remove tangential discussions
});
// Result: 15K tokens with all key information
```
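`compactHistory` is used here without a definition. A minimal sketch of the `keepRecent` behavior follows; real summarization is replaced by a placeholder, since it would normally call a cheaper model, and the `summarizeOlder`/`dropIrrelevant` options are not modeled:

```typescript
interface Message {
  role: 'user' | 'assistant';
  content: string;
}

interface CompactOptions {
  keepRecent: number; // number of messages kept verbatim
}

// Sketch: older messages collapse into a single placeholder line.
// A real implementation would summarize them with a fast model instead.
function compactHistory(messages: Message[], opts: CompactOptions): string {
  const cut = Math.max(0, messages.length - opts.keepRecent);
  const older = messages.slice(0, cut);
  const recent = messages.slice(cut);

  const parts: string[] = [];
  if (older.length > 0) {
    parts.push(`[Summary of ${older.length} earlier messages]`);
  }
  parts.push(...recent.map(m => `${m.role}: ${m.content}`));
  return parts.join('\n');
}
```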
### Anti-Pattern 3: Loading Full Files

```typescript
// ❌ BAD: load entire files
const context = await Promise.all(
  relevantFiles.map(f => readFile(f))
);
// 10 files × 500 lines avg = 50K tokens
```

```typescript
// ✅ GOOD: load only the relevant sections
const context = await Promise.all(
  relevantFiles.map(f => loadRelevantSections(f, task))
);
// 10 files × 50 relevant lines = 5K tokens
```
### Anti-Pattern 4: Ignoring the Dynamic Budget

```typescript
// ❌ BAD: static allocation that leaves no room for outputs
const budget = {
  system: 50_000,
  context: 100_000,
  conversation: 50_000,
  // No room for tool outputs!
};
// A tool returns 30K tokens → context truncation
```

```typescript
// ✅ GOOD: reserve a dynamic buffer
const budget = {
  system: 30_000,
  context: 70_000,
  conversation: 40_000,
  reserved: 60_000, // for tool outputs and responses
};
```
## Budget Templates by Use Case

### Template 1: Code Review

```typescript
const codeReviewBudget = {
  critical: {
    systemPrompt: 2_000,
    reviewGuidelines: 1_500,
    codeStandards: 1_000,
  },
  important: {
    diffContent: 10_000, // the code to review
    relevantTypes: 3_000,
    relatedTests: 2_000,
  },
  supplementary: {
    surroundingCode: 5_000, // context around the changes
    documentation: 2_000,
  },
  dynamic: {
    conversation: 10_000,
    response: 5_000,
  },
};
// Total: ~42K tokens
```
### Template 2: Feature Implementation

```typescript
const featureImplementationBudget = {
  critical: {
    systemPrompt: 3_000,
    taskDescription: 500,
    constraints: 1_000,
  },
  important: {
    domainClaudeMd: 2_000,
    typeDefinitions: 5_000,
    existingPatterns: 3_000, // how similar features work
    tests: 2_000,
  },
  supplementary: {
    existingCode: 15_000, // files to modify
    documentation: 3_000,
  },
  dynamic: {
    conversation: 15_000, // multi-turn feature work
    toolOutputs: 10_000,
    response: 10_000,
  },
};
// Total: ~70K tokens
```
### Template 3: Bug Investigation

```typescript
const bugInvestigationBudget = {
  critical: {
    systemPrompt: 2_000,
    bugReport: 500,
    errorMessage: 500,
  },
  important: {
    stackTrace: 1_000,
    relevantCode: 5_000, // code around the bug
    relatedTests: 2_000,
    recentChanges: 3_000, // git diff of recent changes
  },
  supplementary: {
    logsAndMetrics: 5_000,
    historicalBugs: 2_000, // similar past bugs
  },
  dynamic: {
    conversation: 20_000, // investigation is iterative
    toolOutputs: 15_000,  // lots of file reading
    response: 10_000,
  },
};
// Total: ~66K tokens
```
## Implementation: Budget Manager

```typescript
class ContextBudgetManager {
  private budget: ContextBudget;
  private allocated: Map<string, number> = new Map();

  constructor(budget: ContextBudget) {
    this.budget = budget;
  }

  allocate(
    tier: keyof ContextBudget['tiers'],
    item: string,
    tokens: number
  ): boolean {
    const tierBudget = this.budget.tiers[tier];
    const tierUsed = this.getTierUsage(tier);

    if (tierUsed + tokens > tierBudget) {
      return false; // over budget
    }

    this.allocated.set(`${tier}:${item}`, tokens);
    return true;
  }

  getTierUsage(tier: keyof ContextBudget['tiers']): number {
    let total = 0;
    for (const [key, tokens] of this.allocated) {
      if (key.startsWith(`${tier}:`)) {
        total += tokens;
      }
    }
    return total;
  }

  getTotalUsage(): number {
    let total = 0;
    for (const tokens of this.allocated.values()) {
      total += tokens;
    }
    return total;
  }

  getRemaining(): number {
    return this.budget.total - this.getTotalUsage();
  }

  canFit(tokens: number): boolean {
    return this.getRemaining() >= tokens;
  }

  getReport(): BudgetReport {
    return {
      total: this.budget.total,
      used: this.getTotalUsage(),
      remaining: this.getRemaining(),
      byTier: {
        critical: {
          budget: this.budget.tiers.critical,
          used: this.getTierUsage('critical'),
        },
        important: {
          budget: this.budget.tiers.important,
          used: this.getTierUsage('important'),
        },
        supplementary: {
          budget: this.budget.tiers.supplementary,
          used: this.getTierUsage('supplementary'),
        },
        dynamic: {
          budget: this.budget.tiers.dynamic,
          used: this.getTierUsage('dynamic'),
        },
      },
    };
  }
}

// Usage
const manager = new ContextBudgetManager(budget);
manager.allocate('critical', 'systemPrompt', 2_000);
manager.allocate('critical', 'claudeMd', 500);
manager.allocate('important', 'types', 3_000);
manager.allocate('important', 'tests', 2_000);

console.log(manager.getReport());
// {
//   total: 200_000,
//   used: 7_500,
//   remaining: 192_500,
//   byTier: {
//     critical: { budget: 40_000, used: 2_500 },
//     important: { budget: 60_000, used: 5_000 },
//     supplementary: { budget: 60_000, used: 0 },
//     dynamic: { budget: 40_000, used: 0 },
//   },
// }
```
## Best Practices

### 1. Measure Before Optimizing

Don't guess at token costs; measure actual usage:

```typescript
const tokenizer = await loadTokenizer('cl100k_base');
const tokens = tokenizer.encode(content).length;
```
### 2. Set Budget Alerts

Warn when approaching limits:

```typescript
if (manager.getRemaining() < 20_000) {
  console.warn('Context budget low. Consider compacting history.');
}
```
### 3. Cache Token Counts

Token counting is expensive, so cache the results:

```typescript
const tokenCache = new Map<string, number>();

function countTokens(content: string): number {
  const hash = hashContent(content);
  if (tokenCache.has(hash)) {
    return tokenCache.get(hash)!;
  }
  const count = tokenizer.encode(content).length;
  tokenCache.set(hash, count);
  return count;
}
```
### 4. Use Prompt Caching

Cached content still occupies the context window, but it is processed once and billed at a steep discount on subsequent requests, so stable content costs little to re-send:

```typescript
// Cached context (stable, byte-identical across requests)
const cachedContext = systemPrompt + claudeMd + baseTypes;

// Dynamic context (changes per request, billed in full)
const dynamicContext = taskDescription + relevantCode;
```
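A sketch of the split, assuming a provider that caches a stable prompt prefix (the exact API and field names vary by provider and are illustrative here):

```typescript
interface ContextParts {
  stable: string[];  // identical across requests → cache candidate
  dynamic: string[]; // changes per request → billed in full each time
}

// The stable prefix must come first and be byte-identical between
// requests for a prefix cache to hit.
function buildPrompt(parts: ContextParts): { prefix: string; suffix: string } {
  return {
    prefix: parts.stable.join('\n\n'),
    suffix: parts.dynamic.join('\n\n'),
  };
}

const { prefix, suffix } = buildPrompt({
  stable: ['<system prompt>', '<CLAUDE.md>', '<base types>'],
  dynamic: ['<task description>', '<relevant code>'],
});
```

A common mistake that defeats prefix caching is interleaving dynamic content (timestamps, request IDs) into the stable section.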
### 5. Review and Adjust

Periodically audit your budget allocation:

```typescript
// Log budget metrics for analysis (use the public report rather
// than reaching into the manager's private budget field)
const report = manager.getReport();
logger.info('Budget report', {
  utilization: report.used / report.total,
  byTier: report.byTier,
  taskSuccess: task.succeeded,
  iterations: task.iterations,
});
```
## Conclusion

Token budgeting transforms context management from a constraint into a competitive advantage. By allocating tokens based on information density rather than convenience, you:

- **Maximize signal**: high-density content gets priority
- **Reduce costs**: you don't pay for low-value tokens
- **Improve accuracy**: relevant context leads to better outputs
- **Enable scaling**: room for complex, multi-turn tasks

**Key takeaways:**

- **Types > comments**: roughly 4x the information density
- **Tier your budget**: critical (20%), important (30%), supplementary (30%), dynamic (20%)
- **Compress low-density content**: summarize prose, keep code
- **Reserve dynamic space**: tool outputs need room
- **Measure and adjust**: track utilization and efficiency

**The result**: context windows that work harder with fewer tokens, producing better outputs at lower cost.
## Related
- Information Theory for Coding Agents – Mathematical foundations of information density
- Progressive Disclosure Context – Load context in layers by priority
- Hierarchical Context Patterns – CLAUDE.md organization for efficient loading
- Context Rot Auto-Compacting – Automatic context compaction strategies
- Prompt Caching Strategy – Cache stable content to preserve budget
- Sliding Window History – Bounded state management for conversation history
- Model Switching Strategy – Use cheaper models for compression tasks
## References
- Anthropic Prompt Engineering Guide – Official guidance on context management
- Claude Shannon – A Mathematical Theory of Communication – Foundation of information theory
- tiktoken – Fast tokenizer for token counting

