Lost in the Middle: Preventing Context Window Attention Degradation

James Phoenix

Summary

Large language models exhibit a U-shaped attention pattern: they attend most strongly to information at the beginning and end of their context window, while information in the middle receives reduced attention. This “lost in the middle” effect causes relevant information to be missed even when present in context. Mitigation strategies include positioning critical content at extremes, using progressive disclosure to avoid middle-stuffing, chunking with summaries, and query-anchored context placement.

The Problem

You’ve carefully curated your context. The relevant code is included. The documentation is there. The examples are perfect. Yet the LLM ignores crucial information and generates incorrect output.

The culprit is often not what you included, but where you placed it.

The U-Shaped Attention Curve

Research from “Lost in the Middle: How Language Models Use Long Contexts” (Liu et al., 2023) documented a striking pattern:

```
Attention Strength
^
| ████                                ████
| ████                                ████
| ████                                ████
| ████ ▓▓▓▓                      ▓▓▓▓ ████
| ████ ▓▓▓▓ ░░░░░░░░░░░░░░░░░░░░ ▓▓▓▓ ████
| ████ ▓▓▓▓ ░░░░░░░░░░░░░░░░░░░░ ▓▓▓▓ ████
+-----------------------------------------> Position
  Start            Middle             End

████ = High attention (80-100%)
▓▓▓▓ = Medium attention (50-80%)
░░░░ = Low attention (20-50%)
```

Key findings:

  • Information at positions 1-10% and 90-100% is processed most accurately
  • Information at positions 40-60% (middle) shows significant accuracy degradation
  • Effect is more pronounced as context length increases
  • Models trained on long contexts still exhibit the pattern

Real-World Impact

Scenario: 20K token context with relevant code in middle

```
Context Structure:
├── System prompt (1K tokens) - Position: 0-5%
├── CLAUDE.md (2K tokens) - Position: 5-15%
├── Type definitions (3K tokens) - Position: 15-30%
├── Irrelevant utility files (5K tokens) - Position: 30-55%
├── THE RELEVANT CODE (2K tokens) - Position: 55-65% ← IN THE DANGER ZONE
├── More utilities (4K tokens) - Position: 65-85%
└── Test examples (3K tokens) - Position: 85-100%

Result: Model may miss the relevant code entirely!
```
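The position percentages above follow mechanically from the section sizes. A small sketch (the section names and token counts come from the scenario; the helper itself is hypothetical):

```typescript
interface Section {
  name: string;
  tokens: number;
}

// Compute where each ordered section starts and ends, as a
// fraction of total context length.
function positionRanges(
  sections: Section[]
): Array<{ name: string; start: number; end: number }> {
  const total = sections.reduce((sum, s) => sum + s.tokens, 0);
  let cursor = 0;
  return sections.map(s => {
    const start = cursor / total;
    cursor += s.tokens;
    return { name: s.name, start, end: cursor / total };
  });
}

// The 20K-token scenario above: the relevant code spans 55-65%.
const ranges = positionRanges([
  { name: 'system prompt', tokens: 1000 },
  { name: 'CLAUDE.md', tokens: 2000 },
  { name: 'type definitions', tokens: 3000 },
  { name: 'irrelevant utilities', tokens: 5000 },
  { name: 'relevant code', tokens: 2000 },
  { name: 'more utilities', tokens: 4000 },
  { name: 'test examples', tokens: 3000 }
]);
```

Running this audit on your own context templates shows immediately which sections land in the 40-60% danger zone.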

Symptoms of Lost-in-the-Middle

  1. Ignoring provided examples: You included exact patterns, but output doesn’t follow them
  2. Missing explicit constraints: Constraints were in context but violated
  3. Not using provided code: Relevant code included but model writes from scratch
  4. Selective blindness: Model acknowledges context exists but doesn’t use it
  5. Recency bias: Model heavily weights most recent context, ignores middle

Why This Happens

Attention Mechanism Limitations

Transformer attention has computational biases:

```
Attention(Q, K, V) = softmax(Q × K^T / √d_k) × V

Where:

  • Q (Query) = What we’re looking for
  • K (Key) = What’s available in context
  • V (Value) = The content itself
```

Positional encodings create implicit biases:

  • Beginning tokens: Strong signal from initial position embeddings
  • End tokens: Strong signal from recency in autoregressive attention
  • Middle tokens: Weaker relative position signals, competition from both ends

Long-Context Training Gaps

Even models trained on long contexts show the pattern because:

  1. Important information in training data is unevenly distributed by position
  2. Attention patterns are learned from typical document structures
  3. Most training documents front-load key information
  4. Autoregressive training creates recency effects

Mitigation Strategies

Strategy 1: Critical Content at Extremes

Place most important information at beginning and end:

```typescript
interface ContextStructure {
  beginning: ContextItem[]; // High attention zone (0-15%)
  middle: ContextItem[];    // Lower attention zone (15-85%)
  end: ContextItem[];       // High attention zone (85-100%)
}

function structureContext(items: ContextItem[]): ContextStructure {
  // Sort by importance, most important first
  const sorted = [...items].sort((a, b) => b.importance - a.importance);

  const total = sorted.length;
  const extremeCount = Math.ceil(total * 0.3); // 30% at extremes
  const half = Math.floor(extremeCount / 2);

  return {
    beginning: sorted.slice(0, half),
    middle: sorted.slice(half, total - half),
    end: sorted.slice(total - half)
  };
}
```

Application:

```markdown
BEGINNING: Critical constraints (high attention)

  • Return types must be Result<T, E>
  • Never throw exceptions
  • All functions must be pure

MIDDLE: Background context (lower attention OK)

  • Project history…
  • Architecture documentation…
  • Less critical utilities…

END: Task-specific requirements (high attention)

  • The specific function to implement
  • Required tests to pass
  • Current file context
```

Strategy 2: Query-Anchored Context

Place task-relevant context near the query (end of prompt):

```typescript
function buildQueryAnchoredContext(
  staticContext: string,
  relevantCode: string,
  query: string
): string {
  // Static context first (less critical positioning)
  // Relevant code right before query (high attention zone)
  return `
${staticContext}

Relevant Code (Read This Carefully)

${relevantCode}

Your Task

${query}
`;
}
```

Before (Lost in middle):

```
System prompt -> Types -> Utilities -> RELEVANT CODE -> More code -> Task
                                       ^^^^^^^^^^^^^
                                       (Position 50-60%, low attention)
```

After (Query-anchored):

```
System prompt -> Types -> Utilities -> More code -> RELEVANT CODE -> Task
                                                    ^^^^^^^^^^^^^
                                                    (Position 85-95%, high attention)
```

Strategy 3: Progressive Disclosure

Don’t stuff everything into context. Load on-demand:

```typescript
// BAD: Load all 50 files into the middle of the context
const context = allFiles.map(f => f.content).join('\n');

// GOOD: Load only relevant files, position strategically
async function buildProgressiveContext(task: Task): Promise<string> {
  const coreContext = await loadCoreContext();         // Always at beginning
  const relevantFiles = await findRelevantFiles(task); // Near end

  // Skip irrelevant files entirely - they would just fill the middle
  return `
${coreContext}

Files Relevant to This Task

${relevantFiles.map(f => f.content).join('\n')}

Task

${task.description}
`;
}
```

Strategy 4: Chunking with Summaries

Break long content into chunks with summary headers:

```typescript
function chunkWithSummaries(
  content: string,
  chunkSize: number = 500
): string {
  const chunks = splitIntoChunks(content, chunkSize);

  return chunks.map((chunk, i) => {
    const summary = generateSummary(chunk); // Brief summary
    return `
Chunk ${i + 1}: ${summary}

${chunk}
`;
  }).join('\n');
}
```

Why this helps:

  • Summaries at chunk boundaries act as attention anchors
  • Model can scan summaries even if chunk content is in low-attention zone
  • Reduces effective “middle” by creating mini-beginnings throughout
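The `chunkWithSummaries` sketch above assumes `splitIntoChunks` and `generateSummary` helpers. A minimal word-based `splitIntoChunks` might look like this (`generateSummary` would typically be another LLM call, so it is left out):

```typescript
// Minimal sketch: group whitespace-separated words into
// fixed-size chunks of roughly chunkSize words each.
function splitIntoChunks(content: string, chunkSize: number): string[] {
  const words = content.split(/\s+/).filter(w => w.length > 0);
  const chunks: string[] = [];
  for (let i = 0; i < words.length; i += chunkSize) {
    chunks.push(words.slice(i, i + chunkSize).join(' '));
  }
  return chunks;
}

const chunks = splitIntoChunks('alpha beta gamma delta epsilon', 2);
// chunks: ['alpha beta', 'gamma delta', 'epsilon']
```

A production version would more likely chunk by tokens or by structural boundaries (headings, functions) rather than raw word counts.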

Strategy 5: Repeated Key Information

Repeat critical constraints at multiple positions:

```typescript
const constraints = `
CRITICAL: Return Result<T, E>, never throw exceptions.
`;

function buildContextWithRepeats(
  staticContext: string,
  code: string,
  task: string
): string {
  return `
${constraints}

${staticContext}

${constraints}

Code

${code}

${constraints}

Task

${task}
`;
}
```

Strategy 6: Attention Anchors

Use formatting that naturally draws attention:

````markdown
⚠️ CRITICAL CONSTRAINT ⚠️

Never use throw statements. Return Result types only.

🎯 RELEVANT CODE 🎯

```typescript
// This is the function you need to modify
function processUser(user: User): Result<ProcessedUser, Error> {
  // …
}
```

📋 YOUR TASK 📋

Implement the missing validation logic.
````

Symbols and formatting create visual anchors that influence attention patterns.

Strategy 7: Hierarchical Context with Pointers

Use pointers from high-attention zones to middle content:

```markdown
System Prompt (Beginning – High Attention)

Key patterns are defined in Section 3 below. You MUST follow them.

Section 1: Background

…general info…

Section 2: Architecture

…architecture details…

Section 3: PATTERNS TO FOLLOW (Critical)

…the important patterns…

Section 4: Task (End – High Attention)

Implement following the patterns in Section 3.
```

The pointer from beginning creates a mental link that increases attention to Section 3.

Implementation: Context Position Optimizer

```typescript
interface ContextItem {
  content: string;
  tokens: number;
  importance: number; // 0-1, higher = more important
  type: 'constraint' | 'example' | 'code' | 'documentation' | 'task';
}

interface PositionedContext {
  items: Array<{ item: ContextItem; position: number }>;
  totalTokens: number;
  attentionCoverage: number; // Share of important content in high-attention zones
}

function optimizeContextPosition(
  items: ContextItem[],
  maxTokens: number
): PositionedContext {
  // Filter to fit budget, keeping the most important items
  const sorted = [...items].sort((a, b) => b.importance - a.importance);
  const selected: ContextItem[] = [];
  let totalTokens = 0;

  for (const item of sorted) {
    if (totalTokens + item.tokens <= maxTokens) {
      selected.push(item);
      totalTokens += item.tokens;
    }
  }

  // Position optimization
  const highAttentionItems = selected.filter(i => i.importance > 0.7);
  const mediumItems = selected.filter(i => i.importance <= 0.7 && i.importance > 0.4);
  const lowItems = selected.filter(i => i.importance <= 0.4);

  // Constraints always at beginning
  const constraints = highAttentionItems.filter(i => i.type === 'constraint');

  // Task always at end
  const tasks = highAttentionItems.filter(i => i.type === 'task');

  // Examples and code near end (high attention)
  const examples = highAttentionItems.filter(i =>
    i.type === 'example' || i.type === 'code'
  );

  // Remaining high-importance items (e.g. documentation) go before the examples
  const otherHigh = highAttentionItems.filter(i =>
    !constraints.includes(i) && !tasks.includes(i) && !examples.includes(i)
  );

  // Build positioned context
  const positioned: ContextItem[] = [
    ...constraints, // Beginning: constraints
    ...lowItems,    // Middle: low importance (OK to have reduced attention)
    ...mediumItems, // Middle-end transition
    ...otherHigh,   // Remaining high-importance content
    ...examples,    // Near end: critical code/examples
    ...tasks        // End: task description
  ];

  // Calculate positions and attention coverage
  let currentPosition = 0;
  const result: Array<{ item: ContextItem; position: number }> = [];

  for (const item of positioned) {
    const startPos = totalTokens > 0 ? currentPosition / totalTokens : 0;
    result.push({ item, position: startPos });
    currentPosition += item.tokens;
  }

  // How much important content falls in the high-attention zones?
  const highAttentionThreshold = 0.15; // 0-15% and 85-100%
  const importantInHighAttention = result.filter(({ item, position }) =>
    item.importance > 0.7 &&
    (position < highAttentionThreshold || position > 1 - highAttentionThreshold)
  );

  const totalImportant = selected.filter(i => i.importance > 0.7)
    .reduce((sum, i) => sum + i.tokens, 0);
  const importantInZones = importantInHighAttention
    .reduce((sum, { item }) => sum + item.tokens, 0);

  return {
    items: result,
    totalTokens,
    attentionCoverage: totalImportant > 0 ? importantInZones / totalImportant : 1
  };
}
```

Measuring Effectiveness

Metric 1: Position Score

```typescript
function calculatePositionScore(
  items: Array<{ item: ContextItem; position: number }>
): number {
  let score = 0;
  let maxScore = 0;

  for (const { item, position } of items) {
    const importance = item.importance;
    maxScore += importance;

    // High attention zones: 0-15% and 85-100%
    if (position < 0.15 || position > 0.85) {
      score += importance * 1.0; // Full credit
    } else if (position < 0.30 || position > 0.70) {
      score += importance * 0.7; // Partial credit
    } else {
      score += importance * 0.4; // Low credit (danger zone)
    }
  }

  return maxScore > 0 ? score / maxScore : 1;
}

// Target: positionScore > 0.8
```

Metric 2: Constraint Compliance Rate

```typescript
interface ComplianceMetrics {
  constraintsTotal: number;
  constraintsFollowed: number;
  complianceRate: number;
  positionCorrelation: number; // Do well-positioned constraints have higher compliance?
}

// Track whether constraints at different positions are followed
// High correlation = position matters for your use case
```
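A hedged sketch of how the count-based fields might be filled in from per-constraint observations (the `ConstraintResult` shape and aggregation logic are illustrative; `positionCorrelation` is omitted since it needs a correlation over many runs):

```typescript
// Hypothetical per-constraint observation: where the constraint sat
// in context (0-1) and whether the output respected it.
interface ConstraintResult {
  position: number;
  followed: boolean;
}

function computeCompliance(results: ConstraintResult[]) {
  const constraintsTotal = results.length;
  const constraintsFollowed = results.filter(r => r.followed).length;
  return {
    constraintsTotal,
    constraintsFollowed,
    complianceRate: constraintsTotal > 0 ? constraintsFollowed / constraintsTotal : 1
  };
}

const metrics = computeCompliance([
  { position: 0.05, followed: true },  // beginning: followed
  { position: 0.50, followed: false }, // middle: violated
  { position: 0.55, followed: false }, // middle: violated
  { position: 0.95, followed: true }   // end: followed
]);
// metrics.complianceRate === 0.5
```

If violations cluster around middle positions, as in this toy data, that is a strong signal your task is position-sensitive.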

Metric 3: A/B Testing Position

```typescript
async function abTestPositioning(
  context: ContextItem[],
  task: string,
  numTrials: number = 10
): Promise<{ original: number; optimized: number }> {
  let originalSuccess = 0;
  let optimizedSuccess = 0;

  for (let i = 0; i < numTrials; i++) {
    // Original: naive ordering
    const originalContext = context.map(c => c.content).join('\n');
    const originalOutput = await llm.generate(originalContext + '\n' + task);
    if (meetsRequirements(originalOutput)) originalSuccess++;

    // Optimized: position-aware ordering
    const optimized = optimizeContextPosition(context, 100000);
    const optimizedContext = optimized.items.map(p => p.item.content).join('\n');
    const optimizedOutput = await llm.generate(optimizedContext + '\n' + task);
    if (meetsRequirements(optimizedOutput)) optimizedSuccess++;
  }

  return {
    original: originalSuccess / numTrials,
    optimized: optimizedSuccess / numTrials
  };
}
```

Best Practices

1. Audit Your Context Structure

```typescript
function auditContextPositions(context: string, markers: string[]): void {
  const totalLength = context.length;

  for (const marker of markers) {
    const position = context.indexOf(marker);
    if (position === -1) {
      console.log(`${marker}: NOT FOUND`);
    } else {
      const fraction = position / totalLength;
      const pct = (fraction * 100).toFixed(1);
      const zone = fraction < 0.15 || fraction > 0.85 ? 'HIGH' : 'LOW';
      console.log(`${marker}: ${pct}% (${zone} attention zone)`);
    }
  }
}

// Usage:
auditContextPositions(context, [
  'CRITICAL CONSTRAINT:',
  'function processUser',
  'Your task is to'
]);
```

2. Use Position-Aware Templates

```typescript
const POSITION_OPTIMIZED_TEMPLATE = `

{{constraints}}

{{background}}
{{documentation}}
{{utilities}}

{{relevantCode}}

{{examples}}

{{task}}
`;
```

3. Avoid Middle-Heavy Context

```typescript
// BAD: All important content in the middle
const badContext = `
System prompt…
Type definitions…
${criticalConstraints}  // Position: 40%
${relevantCode}         // Position: 50%
${examples}             // Position: 60%
More utilities…
Task…
`;

// GOOD: Important content at extremes
const goodContext = `
${criticalConstraints}  // Position: 5%
System prompt…
Type definitions…
More utilities…
${relevantCode}         // Position: 85%
${examples}             // Position: 90%
${task}                 // Position: 95%
`;
```

4. Test Position Sensitivity

Before deploying, test if your specific task is position-sensitive:

```typescript
async function testPositionSensitivity(
  content: string,
  task: string
): Promise<boolean> {
  // Test the same content at different positions
  const results = await Promise.all([
    llm.generate(`${content}\n\n${task}`),                     // Content at start
    llm.generate(`Padding…\n${content}\n\nPadding…\n${task}`), // Content in middle
    llm.generate(`Padding…\n\n${task}\n\n${content}`)          // Content at end
  ]);

  // If outputs differ significantly, position matters
  const unique = new Set(results.map(normalizeOutput)).size;
  return unique > 1;
}
```

Common Pitfalls

Pitfall 1: Assuming Position Doesn’t Matter

```typescript
// BAD: Random/chronological ordering
const context = files.map(f => f.content).join('\n');

// GOOD: Intentional positioning
const context = buildPositionOptimizedContext(files, task);
```

Pitfall 2: Overloading the Middle

```typescript
// BAD: Stuffing everything because “it’s in context”
const context = `
${systemPrompt}
${allDocumentation} // 10K tokens in middle
${allUtilities}     // 15K tokens in middle
${task}
`;

// GOOD: Only include what’s needed, position strategically
const context = `
${systemPrompt}
${relevantDocumentation} // Only 2K tokens
${relevantCode}          // Near end
${task}
`;
```

Pitfall 3: Not Repeating Critical Constraints

```typescript
// BAD: Constraint mentioned once in the middle
// GOOD: Constraint at beginning AND near the task
```
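A minimal sketch of the repetition pattern (the helper name is illustrative): the same constraint string is injected at the start and again immediately before the task, so it lands in both high-attention zones.

```typescript
// Place the constraint at the beginning AND right before the task.
function withRepeatedConstraint(constraint: string, body: string, task: string): string {
  return [constraint, body, constraint, task].join('\n\n');
}

const prompt = withRepeatedConstraint(
  'CRITICAL: Return Result<T, E>, never throw.',
  '…long background context…',
  'Implement the validation logic.'
);
```

This is the same idea as Strategy 5 above, reduced to its smallest form.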

Pitfall 4: Ignoring Context Length Effects

The lost-in-the-middle effect is worse with longer contexts:

```typescript
// Short context (5K tokens): middle still gets ~60% attention
// Long context (100K tokens): middle may get <30% attention

// Adjust strategy based on context length
function getPositioningStrategy(contextTokens: number) {
  if (contextTokens < 10000) {
    return 'standard';   // Less aggressive positioning needed
  } else if (contextTokens < 50000) {
    return 'optimized';  // Use position optimization
  } else {
    return 'aggressive'; // Repeat constraints, minimal middle content
  }
}
```

Conclusion

The lost-in-the-middle effect is a real and measurable phenomenon. Information position in context directly impacts whether it gets used. By understanding and optimizing for attention patterns, you can significantly improve LLM output quality.

Key Takeaways:

  1. U-shaped attention: Beginning and end get most attention, middle gets less
  2. Position critical content: Constraints at start, task-relevant code near end
  3. Avoid middle-stuffing: Progressive disclosure beats context cramming
  4. Use anchors: Summaries, formatting, and repetition boost attention
  5. Measure position score: Track how much important content is in high-attention zones
  6. Test sensitivity: Not all tasks are equally position-sensitive

The difference between “the model ignored my context” and “the model followed my context perfectly” is often just position.
