How Do Claude Code Skills Load Into Claude's Context Window?

Claude Code skills load in three distinct stages: descriptions load at session startup, the full skill body loads when you trigger a skill, and reference files load on demand when the skill instructions tell Claude to read them. Each stage has a token cost. Understanding the stages explains why a skill that exists on disk can still be invisible to Claude, and why a skill that loads correctly can still run slowly.

TL;DR: Three loading stages govern skill behavior. Stage 1: at session start, Claude reads every skill's description field into its context, costing roughly 100 tokens per skill. Stage 2: when you trigger a skill, the full SKILL.md body loads. Stage 3: reference files load only when a skill step instructs Claude to read them. This progressive disclosure model is why Agent Engineer Master keeps SKILL.md files lean and stores domain knowledge in reference files: the body loads every time, but the reference files load only when the task requires them.

What happens when Claude starts a session?

At startup, Claude reads the description field from every installed skill's frontmatter and loads these descriptions into its system context. Not the full skill body. Not the reference files. Only the one-line description.
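
To make the startup step concrete, here is a minimal sketch of pulling only the description out of a SKILL.md file. It assumes the frontmatter is a block delimited by two `---` lines with simple single-line `key: value` entries. The `read_description` helper and the `review-pr` example skill are hypothetical, and this is not Claude Code's actual loader.

```python
from typing import Optional


def read_description(skill_md_text: str) -> Optional[str]:
    """Return the `description` value from a SKILL.md frontmatter block, if present."""
    lines = skill_md_text.splitlines()
    # Assumption: frontmatter starts on the first line and is delimited by `---`.
    if not lines or lines[0].strip() != "---":
        return None
    for line in lines[1:]:
        if line.strip() == "---":
            break  # end of frontmatter; the body below is never inspected
        key, sep, value = line.partition(":")
        if sep and key.strip() == "description":
            return value.strip().strip("\"'")
    return None


# Hypothetical skill file used only for illustration.
EXAMPLE_SKILL = """---
name: review-pr
description: Use when the user asks for a code review of a pull request or diff.
---
Review steps
1. Read the diff.
"""

description = read_description(EXAMPLE_SKILL)
print(description)
```

The sketch stops at the closing `---`, which mirrors the point above: nothing from the body is read at this stage. Multi-line YAML values are deliberately not handled.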

Each description costs approximately 100 tokens. A library of 20 skills costs 2,000 tokens before any conversation starts. A library of 30 skills costs 3,000 tokens. The total system prompt budget sits around 15,000 characters (a character count, not a token count), so description overhead is non-trivial at scale (Agent Engineer Master internal measurement, 2026).
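
The arithmetic above can be sketched in a few lines. The 100-token figure is the approximate per-description cost quoted in this article, not a measured constant, and `description_overhead` is a hypothetical helper name.

```python
TOKENS_PER_DESCRIPTION = 100  # approximate per-skill figure quoted in the article


def description_overhead(skill_count: int, tokens_each: int = TOKENS_PER_DESCRIPTION) -> int:
    """Estimate the tokens spent on skill descriptions before the first message."""
    if skill_count < 0:
        raise ValueError("skill_count must be non-negative")
    return skill_count * tokens_each


for count in (20, 30):
    print(f"{count} skills: about {description_overhead(count)} tokens at startup")
```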

This startup cost is why the description field is the most performance-critical part of a skill. A vague description wastes its 100 tokens on text that does not help Claude trigger the skill correctly. A precise description earns every token by naming the exact triggering condition.

The practical consequence: a new skill you added to .claude/skills/ does not exist in the current session. Claude loaded descriptions at startup. Your new skill is not in that set. Start a new session. The skill will be there.

When does Claude read the full SKILL.md body?

When a skill is triggered. Claude reads the full SKILL.md body only after it decides the skill is the right tool for the current request.

At this point, every line in your SKILL.md enters the context window. This is the full loading cost of the skill, not just the description. For a 150-line skill, that is roughly 1,500-2,000 tokens depending on content density.

Language models tend to use information best when it appears at the beginning or end of the input context, and noticeably worse when the relevant information sits in the middle of a long context. (Paraphrased from Nelson F. Liu et al., "Lost in the Middle: How Language Models Use Long Contexts", 2023, arXiv:2307.03172)

This finding from Stanford NLP has a direct implication for skill design: instructions that appear in the middle of a long SKILL.md body are followed less reliably than instructions near the top or bottom. It is one reason Agent Engineer Master structures skills with the most critical constraints at the top of the body, not buried after background context.

The 500-line recommendation for SKILL.md exists precisely because of this effect. A 600-line skill loads 6,000-8,000 tokens of instructions. The instructions in the middle of that range are statistically the least reliable. Keeping the body lean and moving domain knowledge to reference files is not an aesthetic choice. It is a reliability choice.
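
The line-count figures in this section (150 lines at roughly 1,500-2,000 tokens, 600 lines at roughly 6,000-8,000) imply a ratio of about 10 to 13 tokens per line. A rough estimator under that assumption might look like the sketch below. The ratio is inferred from the article's own examples, real files vary with content density, and `estimate_body_tokens` is a hypothetical name.

```python
from typing import Tuple


def estimate_body_tokens(line_count: int) -> Tuple[int, int]:
    """Return a (low, high) token estimate for a SKILL.md body of `line_count` lines.

    Assumes 10 tokens per line at the low end and 40/3 (about 13.3) at the high end,
    which are the ratios implied by the 150-line example in the text.
    """
    if line_count < 0:
        raise ValueError("line_count must be non-negative")
    low = line_count * 10
    high = line_count * 40 // 3
    return low, high


for lines in (150, 500, 600):
    low, high = estimate_body_tokens(lines)
    print(f"{lines} lines: roughly {low}-{high} tokens")
```

A real measurement would run the file through a tokenizer. The point of the sketch is only that body cost scales linearly with length, so every line moved to a reference file stops being paid on each trigger.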

How do reference files work?

Reference files load on demand, only when a skill step explicitly instructs Claude to read them.

A skill step might say: "Load references/review-checklist.md before proceeding to step 3." Claude reads that file when the instruction runs, immediately before step 3. Not at session start. Not at skill trigger. At the moment the step runs.

The token cost of a reference file is its full content, loaded at the moment it is needed. A 200-line reference file adds 2,000-3,000 tokens to the context at that step. If the skill triggers on 10 requests but only 3 require the reference file, the reference file's context cost applies only to those 3 runs.

This is the core of what progressive disclosure achieves: skills are available at all times at low cost, the full instruction set loads only when a skill is triggered, and domain knowledge loads only when it is needed. The total context overhead for a typical 5-skill session is far lower than loading all 5 skill bodies at startup.

We have seen reference files misused in commissioned skill builds at Agent Engineer Master. The common mistake: embedding a 300-line domain knowledge document inside the SKILL.md body instead of as a reference file. The skill loads that content every single time, regardless of whether the current task needs it. That is 3,000 tokens of waste per invocation.
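
To put numbers on the difference, the sketch below compares the reference tokens loaded when the content is embedded in the body with the tokens loaded when it is read on demand. It reuses the figures from this section (10 runs, 3 of which need the reference, and about 3,000 tokens for a 300-line document). The function name is hypothetical and the token counts are the article's estimates.

```python
def reference_tokens_loaded(
    runs: int,
    runs_needing_reference: int,
    reference_tokens: int,
    embedded_in_body: bool,
) -> int:
    """Total reference-content tokens loaded across `runs` skill invocations."""
    if not 0 <= runs_needing_reference <= runs:
        raise ValueError("runs_needing_reference must be between 0 and runs")
    # Embedded content loads on every trigger; a reference file loads only when a step asks for it.
    loads = runs if embedded_in_body else runs_needing_reference
    return loads * reference_tokens


embedded = reference_tokens_loaded(10, 3, 3000, embedded_in_body=True)
on_demand = reference_tokens_loaded(10, 3, 3000, embedded_in_body=False)
print(f"embedded: {embedded} tokens, on demand: {on_demand} tokens, saved: {embedded - on_demand}")
```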

Why does loading order matter?

Because instructions that arrive early in the context window are followed more reliably than instructions that arrive late, and reference files that load at the wrong step can cause Claude to apply criteria it has not yet seen.

Two failure modes from incorrect loading order:

Early reference loading. A skill loads a 200-line reference file in step 1, before any step has established what the request actually requires. The reference content fills context before the task has been narrowed, which means the model must work through the request against a large block of pre-loaded criteria. This works but wastes tokens on criteria that may not apply.

Late reference loading. A skill applies criteria in step 3 but loads the reference file in step 5. Claude uses improvised criteria for steps 3 and 4, then loads the actual criteria afterward. The output is inconsistent because the reference arrived after the work was done.

The correct pattern: load a reference file in the step immediately before the step that uses it. Not before all steps. Not after. At the moment the content is needed.
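
The two failure modes and the correct pattern can be expressed as a small check over a skill's steps. This is an illustrative sketch only: the `Step` structure and `check_loading_order` function are hypothetical, the steps are described by hand and not parsed from a real SKILL.md, and loading a file in the same step that uses it is treated as acceptable here, which is an assumption beyond the text.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Step:
    """One numbered skill step: the reference files it loads and the ones it relies on."""
    loads: List[str] = field(default_factory=list)
    uses: List[str] = field(default_factory=list)


def check_loading_order(steps: List[Step]) -> Dict[str, str]:
    """Classify each reference file as 'ok', 'early', 'late', 'unused', or 'missing'."""
    load_at: Dict[str, int] = {}
    first_use: Dict[str, int] = {}
    for number, step in enumerate(steps, start=1):
        for ref in step.loads:
            load_at.setdefault(ref, number)  # remember the first load only
        for ref in step.uses:
            first_use.setdefault(ref, number)  # remember the first use only

    verdicts: Dict[str, str] = {}
    for ref, loaded in load_at.items():
        used = first_use.get(ref)
        if used is None:
            verdicts[ref] = "unused"
        elif used < loaded:
            verdicts[ref] = "late"   # criteria were applied before the file arrived
        elif used - loaded > 1:
            verdicts[ref] = "early"  # file sat in context for steps that did not need it
        else:
            verdicts[ref] = "ok"     # loaded in the same step or the step immediately before
    for ref in first_use:
        if ref not in load_at:
            verdicts[ref] = "missing"  # used but never loaded
    return verdicts


# Late loading: criteria applied in steps 3 and 4, file loaded in step 5.
late_skill = [Step(), Step(), Step(uses=["checklist.md"]), Step(uses=["checklist.md"]), Step(loads=["checklist.md"])]
# Correct pattern: loaded in step 2, first used in step 3.
good_skill = [Step(), Step(loads=["checklist.md"]), Step(uses=["checklist.md"])]

print(check_loading_order(late_skill))
print(check_loading_order(good_skill))
```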

For a deeper look at the three-layer model and how to optimize it, see "What Are the Three Layers of Progressive Disclosure?" and "How Does Progressive Disclosure Save Tokens and Improve Performance?"

Frequently Asked Questions

Does Claude reload skill descriptions every time I send a message? No. Descriptions load once at session startup and stay in the context window for the entire session. The skill body loads when the skill triggers. Reference files load when their step runs. None of these reload automatically between messages unless you start a new session.

Can I force Claude to reload a skill during a session? Not directly. Claude does not support in-session skill refreshes. If you modify a SKILL.md during a session, start a new session to see the changes. The current session will keep running whatever version it has already loaded.

What happens if my skill body is too long and runs into the context limit? Claude truncates the skill body or earlier context to fit within the window. Truncation behavior is not predictable: you cannot control which parts Claude prioritizes. The practical fix is to keep SKILL.md within the 500-line recommendation, shorter where you can, and move overflow content to reference files loaded on demand.

Does installing more skills slow down Claude Code? Startup time increases marginally with more installed skills because Claude reads more description text at session start. The more significant effect is on context quality, not speed. With 30+ descriptions loaded, they compete for the model's attention, and Claude becomes less reliable at matching the right skill to the right request.

Do user-level skills and project-level skills both load at startup? Yes. Both load at session start. Claude reads descriptions from ~/.claude/skills/ and from .claude/skills/ in the project directory. The combined description set is what Claude uses to match requests to skills. If you have 15 user-level skills and 20 project-level skills active, all 35 descriptions load at startup.
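
As a rough way to see how many descriptions a session would load, the sketch below counts SKILL.md files under both locations. It assumes each skill lives in its own subdirectory containing a SKILL.md file, and the helper names are hypothetical. It illustrates the counting only, not how Claude Code itself discovers skills.

```python
from pathlib import Path
from typing import Iterable, List


def default_roots() -> List[Path]:
    """User-level and project-level skill directories named in the answer above."""
    return [Path.home() / ".claude" / "skills", Path.cwd() / ".claude" / "skills"]


def find_skill_files(roots: Iterable[Path]) -> List[Path]:
    """Collect every `<root>/<skill-name>/SKILL.md` file, skipping roots that do not exist."""
    found: List[Path] = []
    for root in roots:
        if root.is_dir():
            found.extend(sorted(root.glob("*/SKILL.md")))
    return found


if __name__ == "__main__":
    skills = find_skill_files(default_roots())
    print(f"{len(skills)} skill descriptions would load at startup")
```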

Last updated: 2026-05-01