Each Claude Code skill costs approximately 100 tokens of system prompt overhead at startup. That cost is fixed per skill per session: it applies whether the skill activates or not. A library of 30 skills consumes 3,000 tokens before any task begins. At 50 skills, that's 5,000 tokens. This is not a theoretical concern. At AEM, startup token costs are the primary architectural constraint we enforce on every skill library we build.
TL;DR: Each Claude Code skill costs roughly 100 tokens at startup, consumed by the name and description fields. At 30 skills, that's 3,000 tokens consumed before any work starts. This budget constraint is the primary reason the 30-skill curation threshold exists and why description length is a design decision, not a style preference.
The mechanism: at session start, Claude Code reads every installed skill's name and description and loads them into the system prompt as the metadata layer. This is the first layer of progressive disclosure, the lightest-weight representation of each skill. The token cost is paid regardless of whether the skill activates during that session.
How Does the 100-Token Estimate Break Down?
The 100-token figure comes from three components loaded at session start: the skill name, the description field, and Claude Code's structural metadata wrapper.
Each skill's metadata footprint includes:
- The `name` field (typically 3-15 characters)
- The `description` field (up to 1,024 characters, but ideally 100-200 characters for token efficiency)
- Structural metadata added by Claude Code's skill loader
A description of 100-150 characters, combined with the name and structural overhead, totals approximately 80-120 tokens. The 100-token figure is the midpoint across typical production skill descriptions (Claude Code Skill Engineering research, 2026).
Long descriptions cost proportionally more. A 500-character description costs roughly 125 tokens. A 1,000-character description costs roughly 250 tokens. This is the token argument for keeping descriptions concise: a shorter description that routes accurately is strictly better than a longer one that routes equally accurately. The extra tokens buy nothing.
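The arithmetic above can be sketched with the 4-characters-per-token heuristic. This is a rough estimator, not Claude Code internals: the helper names are illustrative, and the flat 25-token structural overhead is the midpoint of the 20-30 range this article uses elsewhere.

```python
def estimate_description_tokens(description: str) -> int:
    """Rough token cost of the description alone, at ~4 characters per token."""
    return len(description) // 4

def estimate_skill_tokens(name: str, description: str, overhead: int = 25) -> int:
    """Rough startup cost of one skill: name + description tokens, plus structural overhead."""
    return (len(name) + len(description)) // 4 + overhead

print(estimate_description_tokens("x" * 500))    # 125
print(estimate_description_tokens("x" * 1000))   # 250
```

The per-skill total varies with name length and metadata formatting, which is why the article quotes a range rather than an exact figure.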
What Is the Total Skill Description Budget?
Claude Code enforces a system prompt character budget for all skill descriptions combined, capped at approximately 15,000 characters (Claude Code documentation). At 100 characters per description, that is room for 150 skills before the budget is exceeded. At 200 characters per description, the budget allows 75 skills.
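The budget arithmetic is a one-liner. The 15,000-character cap is the approximate figure cited above, so treat the results as order-of-magnitude guidance:

```python
CHAR_BUDGET = 15_000  # approximate combined cap on all skill descriptions

def max_skills(avg_description_chars: int) -> int:
    """How many skills fit before the combined description budget is exceeded."""
    return CHAR_BUDGET // avg_description_chars

print(max_skills(100))  # 150
print(max_skills(200))  # 75
print(max_skills(150))  # 100 -- at the recommended 150-character cap
```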
The budget cap forces a choice: limit the number of skills, limit description length, or both. The right answer is both. Cap descriptions at 150 characters. Cap the library at 30-50 active skills. A library that exceeds those limits usually contains duplication, deprecated skills that were never removed, or skills that should be components of a broader skill rather than standalone files.
"Models placed in the middle of long contexts lose track of instructions at a rate that makes mid-context policy placement unreliable for production systems." — Nelson Liu et al., Stanford NLP Group, "Lost in the Middle" (2023, ArXiv 2307.03172)
This applies directly to skill metadata. A bloated metadata layer is not just expensive. It pushes instructions further into context, reducing how reliably Claude follows them. Liu et al. measured a roughly 20 percentage point accuracy drop when relevant content moved from position 1 to position 10 in a 20-document context (ArXiv 2307.03172). The token cost is visible. The reliability degradation is harder to see but more damaging.
Why Does the 30-Skill Threshold Exist?
At roughly 30 skills, two problems converge: the metadata cost becomes significant at typical description lengths (approaching 3,000 tokens), and Claude's ability to distinguish between skills with overlapping trigger conditions starts to degrade. The 3,000-token cost is a hard budget hit; the routing degradation is subtler but compounds across every session.
With 10 skills, Claude's discovery mechanism accurately identifies the right skill for nearly any prompt. With 30 skills, especially if several have similar trigger language, false positives and missed activations appear. At 50 skills, description conflicts are common enough to require active curation (Claude Code Skill Engineering research, 2026).
The 30-skill threshold is not a technical ceiling. It is the empirical point at which the library requires active management rather than passive growth. At Claude Sonnet input pricing of $3.00 per million tokens (Anthropic, May 2026), a 3,000-token startup context costs $0.009 per session. Across 1,000 developers running 10 sessions per day, that is approximately $2,700 per month in dead-weight startup overhead before any user prompt is processed. Claude's paid-plan context window is 200,000 tokens (Anthropic, 2026). A 30-skill library consuming 3,000 tokens at startup uses 1.5% of that window before a single user message is sent.
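The cost figures above follow directly from the pricing arithmetic. All inputs here are the ones quoted in this article (Sonnet input pricing, 1,000 developers, 10 sessions/day, a 30-day month):

```python
PRICE_PER_MILLION_INPUT = 3.00   # USD, Claude Sonnet input pricing cited above
STARTUP_TOKENS = 3_000           # 30 skills x ~100 tokens each
CONTEXT_WINDOW = 200_000         # paid-plan context window

per_session = STARTUP_TOKENS / 1_000_000 * PRICE_PER_MILLION_INPUT
monthly = per_session * 1_000 * 10 * 30   # 1,000 devs x 10 sessions/day x 30 days
window_share = STARTUP_TOKENS / CONTEXT_WINDOW

print(f"${per_session:.3f}/session, ${monthly:,.0f}/month, {window_share:.1%} of context")
# $0.009/session, $2,700/month, 1.5% of context
```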
Above 30 skills, the correct response is not to stop building. It is to audit. Remove deprecated skills. Merge skills with overlapping purposes into one. Move rarely-used skills to user-level instead of project-level. A pruned 20-skill library outperforms an untouched 50-skill library on both token cost and activation accuracy.
For guidance on the right library size, see How Many Skills Can I Have Before Performance Degrades?.
How Does Token Cost Constrain Library Architecture?
Startup token cost shapes three concrete decisions: how long descriptions can be, whether to consolidate overlapping skills, and where to install skills on the filesystem. Each decision has a recurring cost that compounds across every developer session. Getting all three wrong on a 30-skill library adds hundreds of wasted tokens per session, every session, across every project.
Description length targets: Treat 150 characters as the target, not the limit. Every character over 150 is tokens paid every session, for every developer, on every project. A description that says exactly what it needs to say in 100 characters is worth more than one that explains it beautifully in 400.
Skill consolidation over proliferation: Two skills with related purposes each pay 100 tokens at startup. One consolidated skill that handles both purposes pays 100 tokens once. If consolidation does not reduce activation accuracy, it is preferable from a token economics standpoint. This is why the skill count ceiling matters: each new skill is a recurring cost, not a one-time investment.
User-level versus project-level placement: User-level skills load for every project a developer opens. A skill that is genuinely universal belongs at user level and costs tokens on every session regardless of project relevance. A skill specific to one project belongs at project level. Misplaced skills at user level add startup cost to sessions where they will never activate.
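A quick way to see what loads where is to list both locations. This is a sketch that assumes the conventional layout of one subdirectory per skill under `~/.claude/skills` (user-level) and `.claude/skills` (project-level):

```python
from pathlib import Path

def list_skills(root: Path) -> list[str]:
    """Skill directory names under a skills root, or empty if the root doesn't exist."""
    return sorted(p.name for p in root.iterdir() if p.is_dir()) if root.is_dir() else []

user_skills = list_skills(Path.home() / ".claude" / "skills")   # startup cost in every session
project_skills = list_skills(Path(".claude") / "skills")        # startup cost in this project only
print(f"user-level: {len(user_skills)}  project-level: {len(project_skills)}")
```

Any user-level skill that only ever activates in one project is a candidate for demotion to that project's directory.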
Chroma's Context Rot research (2025) confirmed that performance degrades consistently as input length grows, with even a single irrelevant addition reducing accuracy relative to a focused prompt baseline. The effect is structural, not task-specific. A bloated skill metadata layer is exactly that: irrelevant additions paid every session.
We treat the startup token budget as a hard design constraint. In client builds with 15+ skills, we audit the total description footprint before delivery: count every description's character length, estimate the token cost, and flag any descriptions that exceed 200 characters. A 30-skill library where half the descriptions run 300+ characters is a library built without this constraint in mind, and it will underperform.
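That delivery audit can be approximated in a few lines. This is a sketch, assuming each skill is a directory containing a SKILL.md whose YAML frontmatter carries the description on a single line:

```python
from pathlib import Path

def flag_long_descriptions(skills_root: Path, limit: int = 200) -> list[tuple[str, int]]:
    """Return (skill name, description length) for every description over `limit` chars."""
    flagged = []
    for skill_md in sorted(skills_root.glob("*/SKILL.md")):
        for line in skill_md.read_text().splitlines():
            if line.startswith("description:"):
                desc = line.split(":", 1)[1].strip().strip('"')
                if len(desc) > limit:
                    flagged.append((skill_md.parent.name, len(desc)))
                break
    return flagged
```

Run it against `.claude/skills/` before delivery; anything it flags gets rewritten before the library ships.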
For a deeper look at how the progressive disclosure architecture manages token costs across the full skill load cycle, see How Does Progressive Disclosure Save Tokens and Improve Performance?.
What Happens When the Description Budget Is Exceeded?
When the total character count of all skill descriptions exceeds the system prompt budget, Claude Code trims descriptions or excludes skills from the metadata layer entirely. The trimming is not predictable from the skill author's perspective. The result is skills that appear to vanish: they exist on disk but do not appear in /skills listings and do not activate.
The fix is to reduce total description character count, either by shortening individual descriptions or removing skills. The budget ceiling is a hard architectural constraint. Exceeding it produces silent failures.
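A pre-flight check catches the problem before skills start silently disappearing. The 15,000-character cap is the approximate figure cited earlier in this article:

```python
def check_description_budget(description_lengths: list[int], budget: int = 15_000) -> int:
    """Remaining character headroom; negative means skills may be trimmed or dropped."""
    return budget - sum(description_lengths)

headroom = check_description_budget([150] * 30)   # 30 skills at 150 characters each
print(headroom)  # 10500
```

A 30-skill library at the recommended 150-character cap leaves comfortable headroom; the budget only bites when both skill count and description length grow unchecked.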
How Do I Measure My Library's Token Footprint?
Count the characters in each skill's description field. Divide by 4 to estimate tokens. Sum across all installed skills. Add 20-30 tokens per skill for structural overhead. This gives a reliable estimate of your library's total startup token cost in under a minute.
```shell
# Count description lengths across your skill library
# (assumes description values are double-quoted in the frontmatter)
grep -r "^description:" .claude/skills/ | awk -F'"' '{print length($2), $1}'
```
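The same recipe in Python, completing the steps the one-liner starts: divide each description's character count by 4 and add a per-skill overhead (25 here, the midpoint of the 20-30 range above):

```python
def library_startup_tokens(description_lengths: list[int], overhead_per_skill: int = 25) -> int:
    """Estimate total startup tokens: chars/4 per description plus per-skill overhead."""
    return sum(n // 4 + overhead_per_skill for n in description_lengths)

# e.g. a 30-skill library with 150-character descriptions:
print(library_startup_tokens([150] * 30))  # 1860
```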
The 4-characters-per-token ratio is the empirical average for English text under byte-pair encoding, the tokenization method used by Claude and most modern LLMs (Anthropic documentation; Sennrich et al., 2016). A library where any single description exceeds 400 characters needs a rewrite. A library where total description tokens exceed 3,000 needs an audit.
FAQ
For libraries under 50 skills, description token costs are not a meaningful constraint on their own. The full index stays well under 5,000 tokens and consumes a small fraction of Claude Sonnet's context window. The questions below address specific measurement scenarios, edge cases, and the tradeoffs that appear at higher skill counts.
How do I check my library's total description token cost?
Sum the character lengths of all description fields across installed skills. Divide by 4 to estimate tokens. Claude Code's SLASH_COMMAND_TOOL_CHAR_BUDGET environment variable exposes the current character budget for debugging purposes.
Does the skill body (the instructions) also load at startup?
No. The skill body loads only when the skill activates. The startup cost is metadata-only: name and description. This is the core of the progressive disclosure architecture. The body's token cost is paid when the skill activates, not at session start.
Is 100 tokens per skill an exact number or an estimate?
An estimate based on observed behavior across typical skill configurations (Claude Code Skill Engineering research, 2026). The exact figure depends on description length, name length, and Claude Code's internal metadata formatting. A skill with a 50-character description costs closer to 70 tokens. A skill with a 300-character description costs closer to 130 tokens.
Do reference files add to the startup cost?
No. Reference files are loaded on demand when the skill body instructs Claude to read them. They have no startup cost. Their token cost is paid only when they are read, during skill execution.
At what skill count should I start actively managing token costs?
At 20 skills with average-length descriptions, you're consuming roughly 2,000 startup tokens. Manageable, but worth tracking. At 30 skills, active management becomes necessary. Above 40 skills, the cost eats into the session budget and description conflicts degrade activation accuracy.
Should I use the shortest possible descriptions to minimize token costs?
Not at the expense of routing accuracy. A 30-character description that fails to activate correctly costs more in misdirected sessions than a 150-character description that routes precisely. Minimize description length within the constraint that activation accuracy is maintained. That is the design target, not minimum character count as an end in itself.
Last updated: 2026-05-03