Context bloat is when a Claude Code skill loads more tokens into context than the task actually requires. At Agent Engineer Master (AEM), it is the most common source of inconsistent output in production skill libraries. The extra tokens don't make the skill smarter. They push the instructions that do matter further from the start of context, where recall is highest.

TL;DR: Context bloat comes from three sources in Claude Code skills: oversized SKILL.md files, documentation files in the skill folder, and reference file chains. The fix is progressive disclosure: keep SKILL.md lean, move reference material to on-demand files, and remove non-skill files from the skill directory. Bloat doesn't hurt only the bloated skill; it degrades every skill in the session.

Where does context bloat come from in a Claude Code skill?

Context bloat in a Claude Code skill comes from three sources: an oversized SKILL.md body, documentation files stored in the skill folder, and chained reference files that each trigger additional loads. Each source has a different fix, and all three can compound in the same skill simultaneously.

  1. Source 1: An oversized SKILL.md body. SKILL.md is loaded into context every time the skill runs. A 500-line skill body loaded for every invocation wastes the token budget that other skills and the conversation context need. The rule at Agent Engineer Master: SKILL.md body under 200 lines. Any reference material over 50 lines belongs in a separate reference file loaded on demand.
  2. Source 2: Documentation files in the skill folder. Claude scans all files in the skill folder at startup for metadata. A 300-line README.md and a 200-line CHANGELOG.md in the skill folder add 500 lines of dead weight to the startup token cost. Neither file contributes to skill execution.
  3. Source 3: Chained reference files. A reference file that links to another reference file, which links to another. Each chain link triggers a load. A 4-link chain can add 3,000 to 5,000 tokens of context before a single instruction executes. In our builds, we break reference chains at 2 levels: SKILL.md links to reference files, reference files contain the material directly.

How does context bloat hurt skill performance?

Context bloat degrades skill performance in two ways that compound each other: instruction recall drops as context grows longer, and every token a bloated skill consumes is unavailable to the other skills sharing the same session budget. A skill that is bad in isolation becomes worse in a library.

  1. Instruction recall degrades at context tails. Models lose track of instructions placed in the middle of long contexts at a rate that makes mid-context policy placement unreliable for production systems (Liu et al., Stanford NLP Group, "Lost in the Middle," ArXiv 2307.03172, 2023). Performance is highest when relevant information appears at the start or end of context, and degrades significantly when it appears in the middle, a pattern that persists even in models with 128K+ context windows. A separate study found performance drops of 17-20% from context length pressure alone, even when the model has perfect access to all relevant content (Shi et al., ArXiv 2510.05381, 2025). A skill with 800 lines of context before the actual instructions produces less reliable output than a skill with 80 lines.
  2. Skills compete for the same token budget. Every token loaded by one skill is unavailable to the others. A library of 10 skills where 3 have bloated SKILL.md files effectively reduces the context available to the other 7. The system prompt character budget for Claude Code skills is not unlimited: the SLASH_COMMAND_TOOL_CHAR_BUDGET environment variable controls it, scaling dynamically at 1% of the context window with a fallback of 8,000 characters, and each skill's combined description text is hard-capped at 1,536 characters regardless of budget. When skills exceed that budget, later skills in the load order get truncated (Anthropic, Claude Code Skills documentation, 2025).

The result is a skill library that gets less reliable as it grows, with the degradation distributed across every skill rather than concentrated in the bloated one.

How do I know if my skill has context bloat?

Run three diagnostic checks on any skill you suspect is bloated: count the SKILL.md body lines, list every file in the skill folder, and trace the depth of each reference file the skill loads. You can complete all three in under five minutes without running the skill at all.

  1. Check 1: SKILL.md line count. Open the file. If the body (everything after the frontmatter) exceeds 200 lines, it has context bloat. Lines 1 to 100 will be processed reliably. Lines past 200 are at risk of being ignored or inconsistently applied.
  2. Check 2: Skill folder contents. List all files in the skill directory. Any file that isn't SKILL.md, evals.json, or a named reference file is contributing to startup bloat. README.md, CHANGELOG.md, and planning documents don't belong there.
  3. Check 3: Reference file depth. Open each reference file the skill loads. If any reference file itself loads another file (contains instructions like "also read X.md" or "see also Y.md"), that's a reference chain. Flatten it. Reference chains are the most expensive form of context bloat per line: each chained load adds a full file load overhead before the skill's instructions reach the model.
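All three checks can be scripted. A minimal sketch in Python — the allowed-file list and the "also read"/"see also" link phrasings it matches are assumptions about your skill's conventions, not a Claude Code API:

```python
import re
from pathlib import Path

ALLOWED = {"SKILL.md", "evals.json"}  # assumption: your skill's permitted non-reference files
REF_PATTERN = re.compile(r"(?:also read|see also|read)\s+(\S+\.md)", re.IGNORECASE)

def audit_skill(skill_dir: str) -> list[str]:
    """Run the three bloat checks and return a list of findings."""
    root = Path(skill_dir)
    findings = []

    # Check 1: SKILL.md body line count (body = everything after the frontmatter).
    text = (root / "SKILL.md").read_text()
    parts = text.split("---", 2)
    body = parts[2] if len(parts) == 3 else text
    body_lines = len(body.strip().splitlines())
    if body_lines > 200:
        findings.append(f"SKILL.md body is {body_lines} lines (target: under 200)")

    # Check 2: files in the skill folder that aren't SKILL.md, evals.json,
    # or a named reference file.
    for f in root.iterdir():
        if f.is_file() and f.name not in ALLOWED and f.suffix != ".md":
            findings.append(f"non-skill file: {f.name}")
        if f.name in {"README.md", "CHANGELOG.md"}:
            findings.append(f"documentation file in skill folder: {f.name}")

    # Check 3: reference chains (a reference file that loads another file).
    for ref in root.glob("*.md"):
        if ref.name == "SKILL.md":
            continue
        for link in REF_PATTERN.findall(ref.read_text()):
            findings.append(f"reference chain: {ref.name} -> {link}")

    return findings
```

Run it against each skill directory; an empty list means the skill passes all three checks under these assumptions.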

A prompt caching pass reduces the cost of repeated context (cached input tokens run at US$0.30 per million vs. US$3.00 uncached on Claude Sonnet, a 10x saving), but caching does not reduce the instruction recall penalty from long contexts. Volume reduction is the only fix for that (Anthropic, Claude pricing documentation, 2025).
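The caching arithmetic from the prices above, with an illustrative workload (the token counts and call volume are made up for the example):

```python
UNCACHED_PER_MTOK = 3.00  # Claude Sonnet base input price, US$ per million tokens
CACHED_PER_MTOK = 0.30    # cached input read price, US$ per million tokens

def context_cost(tokens_per_call: int, calls: int, cached: bool) -> float:
    """Cost in US$ of re-sending the same context across many invocations."""
    rate = CACHED_PER_MTOK if cached else UNCACHED_PER_MTOK
    return tokens_per_call * calls / 1_000_000 * rate

# A 5,000-token skill context invoked 10,000 times:
uncached = context_cost(5_000, 10_000, cached=False)  # $150.00
cached = context_cost(5_000, 10_000, cached=True)     # $15.00: 10x cheaper, same recall penalty
```

The 10x saving applies to cost only; both versions put the same 5,000 tokens in front of the model's instructions.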

What is the fix for each source of context bloat?

Each source has a targeted fix: trim the SKILL.md body to under 200 lines by moving reference material into on-demand files, remove non-skill files from the skill directory entirely, and flatten reference chains to a single level. The three fixes are independent and can be applied in any order.

How do I fix an oversized SKILL.md?

Move any block of reference material over 50 lines into a dedicated reference file. In SKILL.md, replace it with one instruction line: "Before [step X], read reference-file.md for [topic]." The SKILL.md body should stay under 200 lines. Reference files load only when the relevant step executes, so they cost nothing at startup.
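The extraction itself is mechanical. A sketch, assuming SKILL.md sections are delimited by `## ` headings — the helper and its naming are hypothetical, not part of any Claude Code tooling:

```python
from pathlib import Path

def extract_section(skill_md: Path, heading: str, ref_name: str, step: str) -> None:
    """Move one oversized '## heading' section into a reference file,
    leaving a one-line load instruction behind."""
    lines = skill_md.read_text().splitlines()
    start = lines.index(f"## {heading}")
    # Section ends at the next '## ' heading, or at end of file.
    end = next((i for i in range(start + 1, len(lines))
                if lines[i].startswith("## ")), len(lines))
    section = lines[start + 1:end]

    # Write the extracted material as an on-demand reference file.
    (skill_md.parent / ref_name).write_text("\n".join(section) + "\n")

    # Replace the section body with the single load instruction.
    stub = f"Before {step}, read {ref_name} for {heading.lower()}."
    skill_md.write_text("\n".join(lines[:start + 1] + [stub] + lines[end:]) + "\n")
```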

The skill body should contain process steps, rules, and the output contract. Everything else is a candidate for a reference file. See What Are Reference Files in a Claude Code Skill for the correct reference file structure.

How do I fix documentation files in the skill folder?

Move them out of the skill directory entirely. README and CHANGELOG belong in the repository root or in a documentation folder outside the skill directory. If you need developer documentation for the skill, create a /docs/skills/[skill-name].md file at the repo level. Nothing belongs in the skill folder except SKILL.md, evals.json, and named reference files.
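A cleanup sketch, using a per-skill folder variant of the /docs/skills/ convention above — the file list and destination layout are assumptions:

```python
import shutil
from pathlib import Path

DOC_FILES = {"README.md", "CHANGELOG.md"}  # assumption: docs to evict from skill folders

def relocate_docs(skill_dir: str, repo_root: str) -> list[str]:
    """Move documentation files out of the skill folder into docs/skills/<name>/."""
    skill = Path(skill_dir)
    dest = Path(repo_root) / "docs" / "skills" / skill.name
    moved = []
    for f in skill.iterdir():
        if f.name in DOC_FILES:
            dest.mkdir(parents=True, exist_ok=True)
            shutil.move(str(f), str(dest / f.name))
            moved.append(f.name)
    return moved
```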

How do I fix chained reference files?

Copy the relevant content from the downstream file directly into the upstream file. Reference A gets its own copy of whatever it needed from Reference B, including anything B had been pulling from files further down the chain. The chain collapses to a single level: SKILL.md loads Reference A, Reference A has everything it needs, and no further loads are required. One level deep is the target.
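The inlining pass can be scripted. A sketch that recursively replaces chained-load instructions with the downstream file's content — the "Also read X.md" phrasing it matches is illustrative, and it assumes chains are acyclic:

```python
import re
from pathlib import Path

# Lines that instruct a further load, e.g. "Also read schema.md for the fields."
CHAIN = re.compile(r"^(?:Also read|See also|Read)\s+(\S+\.md).*$",
                   re.IGNORECASE | re.MULTILINE)

def flatten(ref_file: Path) -> str:
    """Return the reference file's text with every chained load inlined."""
    text = ref_file.read_text()
    def inline(match: re.Match) -> str:
        downstream = ref_file.parent / match.group(1)
        return flatten(downstream)  # recurse so A <- B <- C collapses fully
    return CHAIN.sub(inline, text)
```

Write the flattened output back over the upstream reference file, then delete the now-redundant downstream files if nothing else loads them.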

How much does context bloat actually matter?

It depends on library size. A single skill with a 250-line SKILL.md in a project with no other skills produces marginally worse output than a 100-line version. The difference is measurable but not dramatic. In a library of 15 or more skills, that same 250-line skill becomes a compounding liability for every other skill in the session.

Token budget is shared across the session, and every skill that takes more than its fair share degrades the others. Research on reasoning tasks found that LLM performance begins to degrade noticeably at around 3,000 tokens of context, making prompt discipline a production concern well below the model's stated context limit (Levy, Jacoby, and Goldberg, 2024). In our builds with 20+ skill libraries, trimming context bloat in 3 bloated skills routinely produces measurable accuracy improvements across the entire library, not just those 3 skills.

"Models lose track of instructions placed in the middle of long contexts at a rate that makes mid-context policy placement unreliable for production systems." — Nelson Liu et al., Stanford NLP Group, "Lost in the Middle" (2023, ArXiv 2307.03172)

For a small project with 2 to 5 skills, context bloat is a hygiene issue. For a production library with 15+ skills, it is a performance constraint.

What is progressive disclosure and how does it prevent context bloat?

Progressive disclosure is the architectural pattern that loads skill content in three layers, with each layer loaded only when needed. The metadata layer loads at startup for all skills. The body layer loads when the skill activates. Reference files load only when a specific step requires them. This keeps startup token cost near zero for unactivated skills.

  1. Metadata layer: The description field only. Loaded at startup for every skill. This is what Claude uses for skill discovery.
  2. Body layer: The full SKILL.md content. Loaded when the skill activates.
  3. Reference layer: Named reference files. Loaded by explicit instruction within the skill body, only when the step that needs them executes.
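The three layers amount to a loading schedule, which can be modeled explicitly. A conceptual sketch, not Claude Code's actual loader — the ~4-characters-per-token estimate is a rough heuristic:

```python
from dataclasses import dataclass, field

@dataclass
class Skill:
    name: str
    description: str   # metadata layer: in context every session
    body: str          # body layer: loaded on activation
    references: dict[str, str] = field(default_factory=dict)  # reference layer, keyed by step

def context_tokens(skill: Skill, activated: bool, steps_run: set[str],
                   tok=lambda s: len(s) // 4) -> int:
    """Approximate tokens this skill contributes, layer by layer."""
    total = tok(skill.description)        # metadata: always paid
    if activated:
        total += tok(skill.body)          # body: only when triggered
        for step, ref in skill.references.items():
            if step in steps_run:         # reference: only when its step executes
                total += tok(ref)
    return total
```

An unactivated skill costs only its description; a fully exercised one costs all three layers, which is why the body and reference targets matter.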

A skill built on progressive disclosure uses minimal startup token budget (just the description), loads the full body only when triggered, and loads reference files only when specific steps require them. When auto-compaction occurs, Claude Code re-attaches invoked skills within a shared budget of 25,000 tokens, preserving the first 5,000 tokens of each skill. Skills with bloated bodies compete for that budget too: a 600-line skill body at compaction leaves less room for other skills than a 150-line one (Anthropic, Claude Code documentation, 2025). See How Does Progressive Disclosure Save Tokens and Improve Performance for the mechanics and token-count breakdown.
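The compaction arithmetic above can be sketched as a simplified model — the exact re-attachment order and truncation behavior are Claude Code internals, so treat this as an illustration of why bloated bodies compete for the shared budget:

```python
def compaction_allocation(skill_body_tokens: list[int],
                          shared_budget: int = 25_000,
                          per_skill_cap: int = 5_000) -> list[int]:
    """Tokens each invoked skill keeps after auto-compaction: up to the first
    per_skill_cap tokens of each, in load order, until the shared budget runs out."""
    kept, remaining = [], shared_budget
    for body in skill_body_tokens:
        take = min(body, per_skill_cap, remaining)
        kept.append(take)
        remaining -= take
    return kept
```

With six bloated skills invoked, the last one in load order gets squeezed; with lean bodies, every skill survives compaction intact.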

Is there a target line count for each layer?

Yes. AEM holds specific size targets for every layer in a production skill: under 150 characters for the description, under 200 lines for the SKILL.md body, under 500 lines for each reference file, and a maximum chain depth of 1 level. Skills that exceed these targets are at higher risk of inconsistent behavior in multi-skill environments.

Layer | Location | Target
Description | SKILL.md frontmatter | 1 line, under 150 characters
SKILL.md body | SKILL.md file | Under 200 lines
Reference files | Separate .md files | Under 500 lines each
Reference depth | Chain length | 1 level maximum

Skills that exceed these targets are not automatically broken. They are at higher risk of inconsistent behavior, especially in multi-skill environments. Benchmarks across 18 models found effective reliable capacity at 60-70% of a model's stated context window maximum before accuracy drops noticeably (LLM Context Window Performance benchmarks, aiagentmemory.org, 2025). AEM's line-count targets are calibrated to stay well inside that margin even when multiple skills load simultaneously.
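The targets translate directly into a lint check. A sketch — measuring the inputs (frontmatter parsing, chain tracing) is left to the caller:

```python
TARGETS = {
    "description_chars": 150,  # "under 150" means 150 itself fails
    "body_lines": 200,
    "reference_lines": 500,
    "chain_depth": 1,          # "1 level maximum" means 1 itself passes
}

def check_targets(description: str, body_lines: int,
                  reference_lines: dict[str, int], chain_depth: int) -> list[str]:
    """Return one violation string per exceeded target."""
    violations = []
    if len(description) >= TARGETS["description_chars"]:
        violations.append(f"description: {len(description)} chars (target: under 150)")
    if body_lines >= TARGETS["body_lines"]:
        violations.append(f"body: {body_lines} lines (target: under 200)")
    for name, n in reference_lines.items():
        if n >= TARGETS["reference_lines"]:
            violations.append(f"{name}: {n} lines (target: under 500)")
    if chain_depth > TARGETS["chain_depth"]:
        violations.append(f"chain depth: {chain_depth} (target: 1 level maximum)")
    return violations
```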

Common questions about context bloat

Context bloat applies to all skill types in all invocation modes. Slash commands load the full skill body on invocation, automatic triggers do the same, and subagents preload full skill content at startup. The three sources and three fixes described above apply regardless of how a skill is invoked.

Does context bloat affect skills that I invoke with a slash command?

Yes. Even for slash commands, the skill body is loaded into context on invocation. A 600-line SKILL.md loaded by a slash command takes up the same token space as one loaded by automatic trigger. The difference is that slash-command invocations are intentional, so you'll notice failures faster.

If my skill is working fine, should I fix context bloat anyway?

If it's working fine in a project with 3 skills, it will probably work less fine in a project with 15 skills. Fix it before the library grows. Retroactive refactoring is more expensive than building clean from the start.

Can reference files themselves be too large?

Yes. A 600-line reference file loaded into context on demand is still 600 lines of context. Split large reference files by topic. A style guide reference file and an API schema reference file are two files, not one. Load only the one relevant to the current step.

What's the token cost of my skill descriptions at startup?

Each skill description adds approximately 100 tokens to the startup system prompt. A library of 30 skills adds approximately 3,000 tokens of description overhead before any conversation starts. The total description budget for all skills combined scales dynamically at 1% of the context window, with a fallback of 8,000 characters; each individual skill entry is hard-capped at 1,536 characters regardless of how large the total budget is (Anthropic, Claude Code Skills documentation, 2025). That's the metadata layer cost. See How Many Tokens Does Claude Use to Store My Skill Descriptions at Startup for the detailed breakdown.
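The overhead estimate works out as follows — the 100-tokens-per-description figure is the approximation from this answer, not a measured constant:

```python
def startup_overhead(n_skills: int, tokens_per_description: int = 100) -> int:
    """Approximate description tokens added to the startup system prompt."""
    return n_skills * tokens_per_description

# A 30-skill library pays its metadata cost before the conversation starts:
# startup_overhead(30) -> 3000
```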

Does context bloat cause the skill to fail entirely, or just degrade performance?

Degrade, not fail entirely. A bloated skill will activate correctly and follow the early steps of its instructions reliably. It is the later steps, the rules near the end of the body, and the nuanced constraints that get inconsistently applied. The failure is partial, which makes it harder to diagnose than a complete failure.

How is context bloat different from the "too many rules" anti-pattern?

Context bloat is about total token volume: too much content loaded, regardless of content type. Too many rules is a specific cause of context bloat within the rules section of the instruction body. Both produce the same symptom: late-in-context instructions followed less reliably. Fix context bloat first (get SKILL.md under 200 lines), then audit the rules section specifically.

Last updated: 2026-04-18