Context Rot Prevention: Auto-Compacting for Long AI Sessions

James Phoenix

The Problem: Context Rot in Long Sessions

When you work with AI coding agents over extended sessions, context accumulates like sediment:

  • Messages 1-20: Fresh, relevant context about current work
  • Messages 21-50: A mix of current work, completed tasks, and debugging steps
  • Messages 51-100: Current work buried under a mountain of historical context
  • Messages 100+: The AI starts referencing deleted code, old decisions, and non-existent files

This is context rot: the gradual degradation of output quality as stale information drowns out current state.

Symptoms of Context Rot

You know you have context rot when the AI:

  1. References outdated code: “Using the Redis cache we set up earlier…” (you deleted it 50 messages ago)
  2. Suggests old architecture: “Following the microservices pattern…” (you switched to monolith)
  3. Confuses state: “The auth system uses JWT tokens” (it was migrated to sessions)
  4. Hallucinates files: “Let me update old-service.ts” (file never existed)
  5. Loses accuracy: Early messages are spot-on; later messages drift

Why This Happens

LLMs read the entire conversation history on every turn. As conversations grow:

Messages 1-20:   Signal-to-noise ratio = 90% (mostly relevant)
Messages 21-50:  Signal-to-noise ratio = 60% (some obsolete info)
Messages 51-100: Signal-to-noise ratio = 30% (lots of stale context)
Messages 100+:   Signal-to-noise ratio = 10% (buried in history)

The AI can’t distinguish between:

  • Current state: What the code actually looks like now
  • Historical state: What it looked like 50 messages ago
  • Intermediate steps: Debugging attempts that were later abandoned

Everything gets equal weight, causing confusion.

Real-World Impact

Example: 150-message authentication refactor session

Message 10:  "Implement JWT auth" - Done
Message 30:  "Add password reset" - Done
Message 60:  "Migrate to Supabase" - Done (deleted JWT code)
Message 90:  "Add rate limiting" - Done
Message 120: "Implement 2FA"

AI generates:
import { verifyJWT } from './jwt-utils'; // File deleted at message 60!

The AI references code deleted 60 messages ago because that stale context is still sitting in the conversation history.

The Solution: Auto-Compacting

Auto-compacting periodically summarizes completed work and removes obsolete information, keeping context focused on current state.

How Claude Code Handles This Automatically

Claude Code has built-in auto-compacting:

  1. Monitors context size: Tracks message count and token usage
  2. Triggers compacting: When context grows too large (~100K tokens)
  3. Summarizes history: Compresses completed work into concise summary
  4. Preserves key decisions: Keeps architectural choices, current state
  5. Removes noise: Deletes intermediate debugging steps, obsolete code references

Result: Context shrinks 70-90% while retaining all important information.

BEFORE compacting:
- 150 messages
- 100K tokens
- References to deleted code
- Confusion about current architecture

AFTER compacting:
- 10 messages (summary + recent work)
- 15K tokens
- Clear current state
- No outdated references
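
The internals of Claude Code's compactor aren't public, but the pattern it describes is easy to sketch. A minimal illustration in TypeScript, assuming hypothetical countTokens and summarize helpers (not Claude Code's actual implementation):

interface Message {
  role: "user" | "assistant";
  content: string;
}

// Hypothetical helpers: a tokenizer-backed counter and an LLM-backed summarizer.
declare function countTokens(messages: Message[]): number;
declare function summarize(messages: Message[]): Promise<string>;

const COMPACT_THRESHOLD = 100_000; // ~100K tokens, the trigger point described above
const KEEP_RECENT = 10;            // keep the most recent messages verbatim

async function maybeCompact(history: Message[]): Promise<Message[]> {
  if (countTokens(history) < COMPACT_THRESHOLD) return history;

  const older = history.slice(0, -KEEP_RECENT);  // completed work to compress
  const recent = history.slice(-KEEP_RECENT);    // current work, kept as-is

  // Compress history into one summary message that preserves architectural
  // decisions and current state while dropping debugging noise.
  const summary = await summarize(older);
  return [{ role: "assistant", content: `## Session summary\n\n${summary}` }, ...recent];
}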

Manual Compacting via Task List Recursion

You can manually trigger compacting using task lists:

Step 1: Track Work with Task Lists

Completed: Implement user authentication
Completed: Add email validation
Completed: Create password reset flow
Completed: Add rate limiting
In Progress: Write integration tests (current focus)
Pending: Add 2FA support
Pending: Implement OAuth

Step 2: Compact Completed Tasks

When you have 5-10 completed tasks, ask the AI:

"Summarize all completed work:
1. What features were implemented?
2. What architectural decisions were made?
3. What's the current state?
4. What's still pending?

Output: Compact summary for context"

Step 3: Replace Verbose History with Summary

OLD CONTEXT (verbose, 10K tokens):

[100+ messages about implementing auth:
 - "Let's try JWT" -> "Actually, let's use sessions" -> "Wait, use Supabase"
 - 50 debugging attempts
 - Multiple refactors
 - Code that was deleted]

NEW CONTEXT (compact, 500 tokens):

## Authentication System - Completed

**Implementation**:
- Supabase JWT-based auth
- Email validation (regex + DNS check)
- Password reset (email tokens, 30min expiry)
- Rate limiting (10 attempts/min per IP)
- All routes protected with middleware

**Tests**: 95% coverage, all passing

**Current Focus**: Writing integration tests

**Pending**: 2FA support, OAuth integration

Step 4: Continue with Fresh Context

Now the AI has:

  • Clear current state
  • Key architectural decisions
  • What’s done vs. pending
  • No outdated code references
  • No debugging noise

Context reduced by 95% while preserving all important information.
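
One way to make this repeatable is to generate the summary request directly from the task list, so completed items become the material to compact and the rest define the current focus. A small sketch, with Task and buildCompactingPrompt as illustrative names:

interface Task {
  title: string;
  status: "completed" | "in_progress" | "pending";
}

// Turn the task list into a compacting prompt: completed items are summarized,
// in-progress and pending items become the "current focus" and "pending" sections.
function buildCompactingPrompt(tasks: Task[]): string {
  const byStatus = (s: Task["status"]) =>
    tasks.filter((t) => t.status === s).map((t) => `- ${t.title}`);

  return [
    "Summarize all completed work:",
    ...byStatus("completed"),
    "",
    "Current focus:",
    ...byStatus("in_progress"),
    "",
    "Still pending:",
    ...byStatus("pending"),
    "",
    "Preserve architectural decisions and current state. Max 500 words.",
  ].join("\n");
}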

Implementation Strategies

Strategy 1: Spec-Driven Development with Compacting

Use specifications as compacting boundaries:

# Phase 1: Define Spec

Feature: User Profile Management
Requirements:
- CRUD operations for user profiles
- Image upload with S3 storage
- Privacy settings
- Activity history

# Phase 2: Break into Tasks

Completed: 1.1 Create user_profiles table
Completed: 1.2 Add profile CRUD endpoints
Completed: 1.3 Implement S3 image upload
In Progress: 1.4 Add privacy settings
Pending: 1.5 Build activity history
Pending: 1.6 Write integration tests

# Phase 3: Compact Completed Subtasks

"Tasks 1.1-1.3 completed. Compact into summary:

Completed: User Profiles (Phase 1):
   - DB: user_profiles table with RLS policies
   - API: Full CRUD at /api/v1/profiles
   - Storage: S3 integration for profile images
   - Tests: Unit tests passing

Current: Adding privacy settings (Task 1.4)"

# Phase 4: Continue with Compacted Context

Context is now ~80% smaller; the AI focuses on current work.

Strategy 2: Recursive Compacting (Multi-Level)

Apply compacting at multiple granularities:

Level 1: Task Completion
   Task completed -> Compact into 1-2 sentences

Level 2: Feature Completion
   All tasks for feature completed -> Compact into paragraph

Level 3: Sprint/Milestone Completion
   Multiple features completed -> Compact into DIGEST.md

Level 4: Major Version
   Entire version completed -> Archive, keep only summary

Example: Recursive Compacting

## Level 1: Individual Tasks (10 messages each)

Task 1.1: "Created user_profiles table with id, email, name, avatar_url, created_at"
Task 1.2: "Added CRUD endpoints at /api/v1/profiles with RLS policies"
Task 1.3: "Implemented S3 upload for avatars, max 5MB, JPG/PNG only"

## Level 2: Feature Summary (compacts 3 tasks, 30 messages -> 50 words)

"User Profiles MVP: Full CRUD with S3 avatar uploads. Database schema includes RLS. API endpoints follow RESTful conventions. Image uploads validated for size/format."

## Level 3: Sprint Summary (compacts 10 features, 300 messages -> 200 words)

"Sprint 3 Completed: User management system with profiles, auth, permissions. Supabase integration for DB/auth. S3 for file storage. All features tested, 90% coverage."

## Level 4: Version Archive (compacts entire version -> CHANGELOG.md)

"v1.0.0: Initial release with user management, content system, API layer."

Each level compresses by ~90%, creating exponential context savings.
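
A rough sketch of this bottom-up compaction, assuming a hypothetical summarize helper; each level only ever sees its children's summaries, never the raw detail beneath them:

// One node per task, feature, sprint, or version; children are the finer level.
interface WorkNode {
  level: "task" | "feature" | "sprint" | "version";
  title: string;
  summary?: string; // filled in once this node has been compacted
  children: WorkNode[];
}

declare function summarize(text: string): Promise<string>;

async function compactRecursively(node: WorkNode): Promise<string> {
  if (node.children.length === 0) {
    // Leaf tasks are already short; use the title (or an existing summary) as-is.
    node.summary = node.summary ?? node.title;
    return node.summary;
  }
  const childSummaries = await Promise.all(node.children.map(compactRecursively));
  node.summary = await summarize(
    `Compact these ${node.level}-level items into one short summary:\n` +
      childSummaries.map((s) => `- ${s}`).join("\n")
  );
  return node.summary;
}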

Strategy 3: Boundary-Based Compacting

Compact at natural boundaries:

- After completing 5-10 tasks
- After finishing a feature
- When switching contexts (e.g., different package)
- When AI starts referencing old/deleted code
- When conversation exceeds ~100 messages
- Before starting a new major feature
- After merging a PR (git commit boundary)
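
These boundaries can be encoded as a simple check that runs after each task or PR. A minimal sketch, with an assumed SessionState shape:

interface SessionState {
  completedTasksSinceCompact: number;
  messagesSinceCompact: number;
  featureJustFinished: boolean;
  switchedPackage: boolean;
  aiReferencedDeletedCode: boolean;
  prJustMerged: boolean;
}

// True when any natural boundary from the list above has been crossed.
function shouldCompact(s: SessionState): boolean {
  return (
    s.completedTasksSinceCompact >= 5 ||
    s.messagesSinceCompact >= 100 ||
    s.featureJustFinished ||
    s.switchedPackage ||
    s.aiReferencedDeletedCode ||
    s.prJustMerged
  );
}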

Compacting Prompt Patterns

Pattern 1: Simple Summary

"We've completed many tasks. Let's compact the context:

1. Review all completed tasks
2. Summarize what was accomplished
3. Note key architectural decisions
4. Document current state
5. List what's still pending
6. Output a compact summary (max 500 words)

After this, we'll continue with fresh, focused context."

Pattern 2: Structured Summary

"Create a compact summary with this structure:

## Completed Work
- Feature X: [implementation details]
- Feature Y: [implementation details]

## Architectural Decisions
- Decision 1: [rationale]
- Decision 2: [rationale]

## Current State
- What's working: [summary]
- What's tested: [coverage]
- What's deployed: [env]

## Pending Work
- Task A: [description]
- Task B: [description]

Keep it under 500 words."

Pattern 3: Migration Summary

"We migrated from X to Y. Summarize:

## Before
- Architecture: [old approach]
- Challenges: [what didn't work]

## After
- Architecture: [new approach]
- Benefits: [improvements]

## Migration Steps
1. [step]
2. [step]

## Current State
- All code migrated: Yes/No
- All tests passing: Yes/No
- Cleanup completed: Yes/No

Do NOT reference old code anymore."

Real-World Example: Full Compacting Cycle

Before Compacting (Context Rot Setting In)

Session Stats:
- Messages: 150+
- Token count: ~100K
- Time elapsed: 4 hours

Recent AI Message:
"Let me update the Redis cache we configured earlier..."

Problem: Redis was removed 60 messages ago!

Context State:
- AI references deleted cache layer
- Confuses old auth system with current
- Suggests patterns that were abandoned
- Hallucinates details of the old file structure

Trigger Compacting

User: "Compact all completed work into summary. Focus on current architecture only."

After Compacting (Fresh Context)

Session Stats:
- Messages: 5 (summary + new work)
- Token count: ~8K
- Context reduction: 92%

AI Understanding:
"Authentication v2.0 - Current State:

Completed:
- Supabase JWT auth (replaced Redis sessions)
- Email validation + password reset
- Rate limiting (10 attempts/min per IP)
- All routes protected via middleware
- Integration tests: 95% coverage

Current Focus:
- Implementing 2FA support

Pending:
- OAuth integration
- Audit logging"

Benefits:
- AI knows current architecture (Supabase, not Redis)
- No references to deleted code
- Clear focus on 2FA (current work)
- Understands what's done vs. pending

Impact

Before Compacting:
- 5 out of 10 suggestions referenced deleted code
- Had to correct AI 8 times about architecture
- Generation accuracy: ~60%

After Compacting:
- 0 references to deleted code
- No architecture corrections needed
- Generation accuracy: ~95%

Integration with Other Patterns

Combine with DIGEST.md Files

Store compacted summaries in package-level DIGEST.md:

# packages/api/DIGEST.md

## Recent Work (Last Compacted: 2025-11-02)

### Authentication System (v2.0)
- Implementation: Supabase JWT
- Features: Email/password, reset, rate limiting
- Status: Production-ready, 95% test coverage

### User Profiles API
- Endpoints: CRUD at /api/v1/profiles
- Storage: S3 for avatars
- Status: MVP complete, pending 2FA

AI can reference DIGEST.md for historical context without loading full message history.
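
A small helper for persisting those summaries might look like the sketch below; the path and section format are illustrative, not a fixed convention:

import { appendFileSync } from "node:fs";
import { join } from "node:path";

// Append a compacted summary to a package-level DIGEST.md so future sessions
// can load it instead of replaying the full message history.
function appendToDigest(packageDir: string, title: string, summary: string): void {
  const today = new Date().toISOString().slice(0, 10);
  const entry = `\n### ${title} (Compacted: ${today})\n${summary.trim()}\n`;
  appendFileSync(join(packageDir, "DIGEST.md"), entry, "utf8");
}

// Example (hypothetical paths and content):
// appendToDigest("packages/api", "Authentication System (v2.0)",
//   "- Implementation: Supabase JWT\n- Status: Production-ready, 95% test coverage");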

Combine with Todo Lists

Use todo lists as compacting structure:

// TodoWrite tool creates persistent task list

Completed: Phase 1: User Auth
   Completed: 1.1: JWT implementation
   Completed: 1.2: Password reset
   Completed: 1.3: Rate limiting

In Progress: Phase 2: User Profiles
   Completed: 2.1: Database schema
   Completed: 2.2: CRUD endpoints
   In Progress: 2.3: Privacy settings (CURRENT)
   Pending: 2.4: Activity history

// When Phase 1 complete:
Compact all Phase 1 tasks -> Summary in DIGEST.md
Remove Phase 1 detailed context
Focus on Phase 2

Combine with Hierarchical CLAUDE.md

Update domain CLAUDE.md files with compacted learnings:

# packages/auth/CLAUDE.md

## Architectural Decisions (Compacted from Sprint 3)

### Auth Provider: Supabase (2025-11-02)
**Decision**: Use Supabase instead of custom JWT
**Rationale**: Managed service, built-in RLS, lower maintenance
**Migration**: Completed, all Redis code removed
**Status**: Production, 0 incidents in 30 days

### Rate Limiting: IP-based (2025-11-02)
**Decision**: 10 attempts/min per IP
**Implementation**: Middleware at route level
**Status**: Active, catching ~50 brute force attempts/day

AI loads compacted context from CLAUDE.md instead of re-reading 100+ messages.

Combine with Git Boundaries

Compact at commit/PR boundaries:

# After merging PR
git log --oneline -10
# Shows recent commits

# Compact session:
"Summarize all work from PR #123:
- What was implemented
- What tests were added
- What's the current state

Then start fresh for next PR."

When to Compact

Automatic Triggers

Claude Code compacts automatically when:

  • Context exceeds ~100K tokens
  • Session becomes unwieldy
  • Performance degrades

Manual Triggers

You should manually compact when:

- After completing 5-10 tasks
- After finishing a feature
- When switching contexts (different package/domain)
- When AI references deleted/outdated code
- When conversation exceeds ~100 messages
- Before starting a new major feature
- After merging a PR
- At the end of the day (save state, start fresh tomorrow)
- After a long debugging session (remove failed attempts)

Warning Signs You Need to Compact

Red Flags:
- AI suggests using code you deleted
- AI confused about current architecture
- AI references old decisions you reversed
- Generation quality noticeably decreased
- You're correcting AI frequently
- Context feels "heavy" and slow

Best Practices

1. Compact Proactively, Not Reactively

Bad: Wait until AI is confused (context already rotted)
Good: Compact after completing features (prevent rot)

2. Use Task Lists as Compacting Structure

- Structured tasks: Easy to identify what's done
- Clear boundaries: Know when to compact
- Summary template: Tasks become bullet points

3. Preserve Key Decisions, Remove Noise

Keep:

  • Architectural decisions + rationale
  • Current state (what’s working)
  • Key implementation details
  • What’s pending

Remove:

  • Debugging attempts that failed
  • Code that was deleted
  • Intermediate refactoring steps
  • Off-topic discussions

4. Set Regular Compacting Intervals

Every 10 tasks:
  -> Compact into summary
  -> Update DIGEST.md
  -> Clear completed task history

Every feature:
  -> Compact entire feature
  -> Update domain CLAUDE.md
  -> Start fresh for next feature

Every sprint:
  -> Compact all features
  -> Archive to CHANGELOG.md
  -> Clean slate for next sprint
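
If you want the cadence to be explicit, the same intervals can live in a small config object; the field names here are illustrative:

// Illustrative cadence config mirroring the intervals above.
const compactingSchedule = {
  everyCompletedTasks: 10,   // compact into a summary, update DIGEST.md
  onFeatureComplete: true,   // compact the feature, update domain CLAUDE.md
  onSprintComplete: true,    // compact all features, archive to CHANGELOG.md
  maxMessagesBetween: 100,   // hard ceiling before forcing a compact
} as const;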

5. Use Compacting to Transfer Knowledge

Compacted summaries are perfect for:

  • Onboarding new team members
  • Documenting decisions for future reference
  • Creating DIGEST.md files
  • Updating CLAUDE.md with learnings

Measuring Success

Key Metrics

1. Context Size

Before: 100K tokens
After: 10K tokens
Reduction: 90%

2. AI Accuracy

Before compacting: ~60% of suggestions relevant
After compacting: ~95% of suggestions relevant
Improvement: +58%

3. References to Deleted Code

Before: 5-10 per 10 messages
After: 0 per 10 messages
Improvement: 100%

4. Correction Frequency

Before: Correcting AI 8 times per hour
After: Correcting AI 1 time per hour
Improvement: 87%

Tracking Dashboard

interface CompactingMetrics {
  sessionsCompacted: number;
  avgContextReduction: number; // percentage
  avgAccuracyImprovement: number; // percentage
  timeToContextRot: number; // messages before rot appears
}

const metrics: CompactingMetrics = {
  sessionsCompacted: 23,
  avgContextReduction: 85, // 85% smaller
  avgAccuracyImprovement: 45, // 45% more accurate
  timeToContextRot: 120, // rot appears ~120 messages
};

// Goal: Compact every 80-100 messages (before rot)

Common Pitfalls

Pitfall 1: Compacting Too Aggressively

Problem: Removing important context

Bad:
"Summarize everything in 50 words"
-> Loses architectural decisions

Good:
"Summarize completed work. Preserve:
- Key architectural decisions
- Current implementation state
- What's pending
Max 500 words."

Pitfall 2: Never Compacting

Problem: Context grows unbounded until unusable

300 messages later:
- AI completely confused
- Every suggestion references old code
- Session effectively dead

Solution: Set regular compacting schedule (every 80-100 messages)

Pitfall 3: Compacting Mid-Task

Problem: Losing track of current work

Bad:
Start feature -> Compact halfway through -> Lose context

Good:
Complete feature -> Compact -> Start next feature

Pitfall 4: Not Updating CLAUDE.md with Learnings

Problem: Compacted knowledge is lost

Bad:
Compact session -> Start new session -> Re-learn same things

Good:
Compact session -> Update CLAUDE.md -> Next session has context

Conclusion

Context rot is the invisible tax on long AI coding sessions. Auto-compacting eliminates this by:

  1. Removing stale context: Old code references, abandoned decisions
  2. Preserving key decisions: Architecture, current state, pending work
  3. Reducing context size: 80-90% smaller, faster, more focused
  4. Improving accuracy: AI stays aligned with current codebase

Key Takeaways:

  • Claude Code compacts automatically when context grows large
  • Manually compact using task list recursion (complete -> summarize -> continue)
  • Compact at natural boundaries (features, sprints, PRs)
  • Preserve decisions, remove noise
  • Update DIGEST.md and CLAUDE.md with compacted knowledge
  • Target: Compact every 80-100 messages before rot appears

The result: Long AI sessions that stay focused, accurate, and productive from message 1 to message 300+.

Related Concepts

Topics
Auto Compacting, Claude Code, Context Efficiency, Context Management, Context Rot, Long Sessions, Session Management, Summarization, Task Lists, Workflow Optimization
