# Position Bias in LLMs

U-shaped attention bias in transformer models.
## The Problem
LLMs exhibit a U-shaped attention bias: tokens at the beginning and end of the context receive more attention than tokens in the middle, regardless of their relevance to the task.
Consequences:
- Important information in the middle of the context is often ignored
- A 10-20% accuracy drop when the key context sits in the middle
- The "lost in the middle" problem (reproduced in the sketch below)
## Research

### "Found in the Middle" (Hsieh et al., ACL 2024 Findings)
Finding: LLMs exhibit a U-shaped attention bias across architectures and model sizes.

Results:
- The bias exists regardless of model size
- It persists after instruction tuning
- It affects both open-source and proprietary models
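The U shape can be observed directly by averaging attention weights by absolute key position, for example with Hugging Face `transformers`. The averaging scheme below is a simple illustration, not the paper's calibration method:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # small enough to run on CPU; any causal LM works
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, attn_implementation="eager")
model.eval()

inputs = tokenizer("some long passage " * 50, return_tensors="pt")
with torch.no_grad():
    out = model(**inputs, output_attentions=True)

# out.attentions is a tuple of per-layer tensors, each shaped
# (batch, heads, query_pos, key_pos). Averaging over layers, heads, and
# query positions gives the attention mass each key position receives.
attn = torch.stack(out.attentions).mean(dim=(0, 2))  # -> (batch, query, key)
received = attn.mean(dim=1).squeeze(0)               # -> (key,)
print(received)  # typically high at both ends, low in the middle: the U shape
```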
"On the Emergence of Position Bias" (Wu et al., ICML 2025)
Finding: Causal masking amplifies early-position bias, and the effect compounds across layers.

Why:
- Causal masking restricts each token's attention to previous tokens
- Early tokens therefore accumulate attention from every subsequent token
- Middle tokens get "squeezed": they receive less accumulated attention than early tokens and lack the recency advantage of late ones
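The accumulation argument is easy to check numerically. A minimal NumPy sketch, under the purely illustrative assumption that every query spreads its attention uniformly over the positions the causal mask allows:

```python
import numpy as np

n = 16
# Uniform causal attention: query j attends equally to keys 0..j.
A = np.tril(np.ones((n, n)))
A /= A.sum(axis=1, keepdims=True)   # each row sums to 1

received = A.sum(axis=0)  # total attention mass each key position receives
print(np.round(received, 2))
# received[0] = 1 + 1/2 + ... + 1/n ~ ln(n): harmonic accumulation,
# while received[n-1] = 1/n. Even with no learned preference at all,
# the causal mask pushes attention mass toward early positions.
```

The recency advantage at the very end of the context, which completes the U shape, is usually attributed to positional encodings and learned recency preferences rather than to the mask itself.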
## Quantitative Impact

| Scenario | Accuracy (vs. baseline) |
|---|---|
| Key info at the beginning | Baseline |
| Key info in the middle | -10% to -20% |
| Key info at the end | -2% to -5% |
| With position-aware loading | +25% to +30% |
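The last row refers to ordering context by relevance before prompting, so the strongest material sits where attention is highest. A minimal sketch of one such reordering scheme (the function name and the interleaving pattern are illustrative; this code is not the source of the table's figures):

```python
def position_aware_order(chunks: list[str], scores: list[float]) -> list[str]:
    """Put the most relevant chunks at the context boundaries, where
    attention is strongest, and the least relevant in the middle."""
    ranked = [c for _, c in sorted(zip(scores, chunks), key=lambda p: -p[0])]
    front, back = [], []
    for i, chunk in enumerate(ranked):
        (front if i % 2 == 0 else back).append(chunk)
    return front + back[::-1]  # best first, second best last, worst in the middle
```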
## Tachikoma's Solution
Strategy:
- Intent classification first → determine what context the task needs
- Priority-based loading → the most important rules go at the beginning
- Selective loading → only the relevant modules are loaded
- Position optimization → high-relevance content sits at the context boundaries
- Reflect → was the context sufficient? Should more have been loaded?
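A hypothetical sketch of this strategy in Python; every name here (`ContextModule`, `load_context`, `classify_intent`) is invented for illustration and does not describe Tachikoma's actual implementation:

```python
from dataclasses import dataclass

@dataclass
class ContextModule:
    name: str
    priority: int        # lower number = loaded earlier in the prompt
    intents: set[str]    # intents the module is relevant to; "*" = always

def load_context(task: str, modules: list[ContextModule], classify_intent) -> list[str]:
    """Intent classification first, then selective, priority-ordered loading."""
    intent = classify_intent(task)  # e.g. "coding", "git", "research"
    selected = [m for m in modules if "*" in m.intents or intent in m.intents]
    selected.sort(key=lambda m: m.priority)  # important rules at the beginning
    return [m.name for m in selected]
```

The reflect step would run after the task completes: if the loaded context turned out to be insufficient, the intent-to-module mapping is the natural thing to revisit.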
### Context Module Priority

- Priority 0: `00-core-contract` (always loaded first)
- Priority 10: `10-coding-standards` (coding tasks)
- Priority 12: `12-commenting-rules` (loaded alongside coding-standards)
- Priority 20: `20-git-workflow` (git tasks)
- Priority 30: `30-research-methods` (research tasks)
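Encoded with the illustrative types from the sketch above, this table becomes a small configuration list; the module names come from the table, but the encoding itself is hypothetical:

```python
MODULES = [
    ContextModule("00-core-contract", 0, {"*"}),
    ContextModule("10-coding-standards", 10, {"coding"}),
    ContextModule("12-commenting-rules", 12, {"coding"}),
    ContextModule("20-git-workflow", 20, {"git"}),
    ContextModule("30-research-methods", 30, {"research"}),
]

# A coding task loads, in order:
# ['00-core-contract', '10-coding-standards', '12-commenting-rules']
print(load_context("refactor the parser", MODULES, lambda task: "coding"))
```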