There is a skill count past which performance starts to slip. It is not a hard cliff. It is a gradual fog. Agent Engineer Master (AEM) identified the threshold at 30 skills through skill engineering research conducted across production Claude Code libraries.

TL;DR: Claude Code skill performance starts degrading noticeably around 30 installed skills. The mechanism is context budget, not a hard limit: each skill description costs roughly 100 tokens at startup, and as the total rises, Claude allocates less attention to your actual request. The fix is curation, not removal.


What is the performance degradation threshold for Claude Code skills?

Research on Claude Code skill libraries put the practical curation threshold at approximately 30 skills (2026-03-29 Claude Code skill engineering research synthesis). Below that number, most developers report no noticeable degradation. Above it, symptoms start appearing gradually: trigger misfires first, then partial instruction following, then cross-skill confusion as the startup context fills with competing descriptions.

The threshold is not a product limit. Claude Code does not refuse to load 31 skills. The degradation is a consequence of context window economics: every skill description loaded at startup consumes space that would otherwise be available for your actual work.

At 30 skills with average descriptions, you are consuming roughly 3,000 tokens just in skill metadata (AEM skill engineering research synthesis, 2026-03-29). That is real attention Claude is not spending on your code, your request, or the reference files your skills need to load.
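The arithmetic is simple enough to sketch. A minimal estimator, assuming the ~100 tokens per description figure above (a rough average, not a measured constant):

```python
# Back-of-envelope model of startup metadata overhead. The 100-token
# figure is the rough per-description average cited above, not an
# exact measurement.
TOKENS_PER_DESCRIPTION = 100

def metadata_overhead_tokens(skill_count: int) -> int:
    """Approximate tokens consumed by skill descriptions at session start."""
    return skill_count * TOKENS_PER_DESCRIPTION

for count in (15, 30, 50):
    print(f"{count} skills -> ~{metadata_overhead_tokens(count):,} tokens of metadata")
```

At 30 skills the model gives the ~3,000-token figure above; at 50, the ~5,000 tokens discussed later.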

How does skill count affect Claude's attention?

Claude Code loads skills in layers. At startup, the metadata layer reads every skill description, roughly 100 tokens per skill (Claude Code architecture docs, 2024). This is how Claude knows which skill to invoke for which task. The descriptions load into the system prompt as a block, before any conversational context.

The system prompt has a character budget controlled by the SLASH_COMMAND_TOOL_CHAR_BUDGET environment variable, with a default of approximately 15,000 characters (Claude Code architecture docs, 2024). At 100 tokens per description (roughly 400 characters, at a typical English density of about four characters per token), you can fit only about 37 average descriptions before hitting the hard limit. The soft degradation point arrives before that hard cap.
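As a sketch, assuming roughly four characters per token (a common rule of thumb for English text, not an exact constant) and ~100 tokens per description:

```python
# How many average-length descriptions fit in the default character
# budget before truncation? Both constants are approximations.
CHARS_PER_TOKEN = 4            # typical English tokenization density
TOKENS_PER_DESCRIPTION = 100   # rough per-skill average
DEFAULT_CHAR_BUDGET = 15_000   # documented default

chars_per_description = TOKENS_PER_DESCRIPTION * CHARS_PER_TOKEN
hard_cap = DEFAULT_CHAR_BUDGET // chars_per_description
print(f"~{chars_per_description} chars per description, "
      f"~{hard_cap} descriptions before truncation")
```

Shorter descriptions raise the cap; verbose ones lower it. Either way, the degradation zone sits close to the truncation point rather than far below it.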

The reason is attention distribution, not just space. The "Lost in the Middle" study found that models lose track of information placed in the middle of long contexts, with performance dropping more than 20 percentage points when relevant information moves from the edges of the context to the middle (Nelson Liu et al., Stanford NLP Group, 2023, arXiv 2307.03172). Skill descriptions face the same dynamic: as the block grows, descriptions buried in its middle receive proportionally less attention.

"Models lose track of instructions placed in the middle of long contexts at a rate that makes mid-context policy placement unreliable for production systems." — Nelson Liu et al., Stanford NLP Group, "Lost in the Middle" (2023, arXiv 2307.03172)

A separate study found that code generation performance drops 47.6% at 30K tokens even when all relevant evidence is perfectly retrievable (Du et al., "Context Length Alone Hurts LLM Performance Despite Perfect Retrieval," 2025, arXiv 2510.05381). Skill descriptions create the same pressure at a much smaller scale.

What are the early symptoms of too many skills?

The symptoms appear in a specific order as skill count rises past 30, and the earliest one is easy to dismiss as a one-off.

  • First symptom: inconsistent auto-triggering. Skills that reliably activated at 15 total start misfiring at 35. The description is unchanged, but Claude is distributing less attention to each one during classification.
  • Second symptom: partial instruction following. A skill with eight steps starts reliably executing only the first five. The later steps are not broken; they are being crowded out by context pressure from other loaded skills.
  • Third symptom: cross-skill confusion. Skills with overlapping domains start interfering with each other. A code review skill and a documentation skill both respond to "review this," but with 40 skills loaded, the classifier makes less reliable choices about which one fits the current request.
  • Fourth symptom: slower response to complex skills. Skills that load reference files start performing inconsistently because the reference file content compounds an already-crowded context.

Research on LLM multi-instance processing confirms that instance count has a stronger degradation effect than context length alone: models show slight degradation for 20–100 concurrent instructions, then collapse at higher counts (Clauw et al., "Understanding LLM Performance Degradation in Multi-Instance Processing," 2026, arXiv 2603.22608). Each loaded skill description is a concurrent instruction instance.

At 30 skills, you won't see all four symptoms. You'll see the first one occasionally. That occasional misfire is the signal to audit.

In our skill engineering work at Agent Engineer Master, we flag any project with more than 25 installed skills for a curation review. The question is not "which skills do we remove" but "which skills are still earning their place in the startup context."

How do you manage a library approaching the threshold?

Curation means asking whether each skill is actively used, not whether it is theoretically useful. A skill you invoked twice last quarter is not earning its 100-token startup cost. The four steps below reduce a bloated library to a tight one without removing capabilities you actually need.

  1. Audit for recency. If a skill has not been invoked in 30 days, remove it or archive it (AEM skill engineering research synthesis, 2026-03-29). An installed skill costs tokens regardless of whether anyone uses it.
  2. Consolidate overlapping skills. Two skills with similar trigger conditions and complementary outputs are often better as one skill with a broader scope. The startup cost drops by half; the capability stays the same.
  3. Move personal skills to user-level. If a skill is only used by one person, it should not sit in the project-level library, where it loads for the entire team. A team of five with 20 shared skills and 5 personal skills per developer should keep the project library at 20: each developer then loads 25 skills (20 shared plus their own 5) instead of everyone loading all 45.
  4. Split large skills with progressive disclosure. A 400-line SKILL.md with domain knowledge embedded inline is a better candidate for refactoring than removal. The core logic stays in SKILL.md; the reference material moves to external files loaded on demand. The description stays short; the startup token cost drops.
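Step 1 can be sketched as a small script. This assumes the standard .claude/skills/&lt;name&gt;/SKILL.md layout and uses file modification time as a rough proxy for recency, since that is what the filesystem exposes; an actual invocation log, if you keep one, would be more accurate.

```python
# Recency audit sketch: list skills whose SKILL.md has not been touched
# in `days` days. Modification time is only a proxy for actual use.
from datetime import datetime, timedelta
from pathlib import Path

def stale_skills(skills_dir: str = ".claude/skills", days: int = 30) -> list[str]:
    cutoff = datetime.now() - timedelta(days=days)
    stale = []
    for skill_md in Path(skills_dir).glob("*/SKILL.md"):
        modified = datetime.fromtimestamp(skill_md.stat().st_mtime)
        if modified < cutoff:
            stale.append(skill_md.parent.name)  # skill folder name
    return sorted(stale)

for name in stale_skills():
    print(f"archive candidate: {name}")
```

Treat the output as a review list, not a delete list: a skill touched 40 days ago may still earn its place if it fires reliably when needed.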

The 30-skill threshold is an average, not a universal rule. Opus handles higher skill counts more gracefully than Haiku. Teams with short, tight descriptions can sustain higher counts than teams with verbose ones. The right number for your library is the highest count at which all skills trigger reliably and execute completely.
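Description tightness is checkable. Below is a sketch of a linter that flags verbose descriptions, assuming each SKILL.md opens with YAML frontmatter containing a single-line description: field (the common skill convention); the 400-character threshold (~100 tokens) is a working target, not an official limit.

```python
# Flag skills whose frontmatter description exceeds a target length.
# Assumes a single-line `description:` entry in the YAML frontmatter;
# multi-line descriptions would need a real YAML parser.
from pathlib import Path

def long_descriptions(skills_dir: str = ".claude/skills",
                      max_chars: int = 400) -> dict[str, int]:
    flagged = {}
    for skill_md in Path(skills_dir).glob("*/SKILL.md"):
        for line in skill_md.read_text().splitlines():
            if line.startswith("description:"):
                length = len(line.removeprefix("description:").strip())
                if length > max_chars:
                    flagged[skill_md.parent.name] = length
                break  # only the first description line matters
    return flagged

for name, length in long_descriptions().items():
    print(f"{name}: description is {length} chars")
```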

For the mechanics behind why descriptions are the main driver of this cost, see How Do Skills Load Into Claude's Context Window?. For the full architecture of progressive disclosure and how it keeps startup costs low, see the pillar Progressive Disclosure: How Production Skills Manage Token Economics.

Does progressive disclosure solve the problem?

Partially. Progressive disclosure keeps the per-skill startup cost at approximately 100 tokens because only the description loads at session start. The full SKILL.md body loads only when the skill is invoked. Reference files load on demand within a skill run.

This architecture delays the context cost; it does not eliminate it. With 50 skills installed, you still load 50 descriptions at startup, roughly 5,000 tokens (AEM skill engineering research synthesis, 2026-03-29), regardless of how tightly those skills are written. Progressive disclosure helps you get more value per token spent on that description, but it does not compress the 50-description overhead to zero.

The practical implication: progressive disclosure makes each skill more efficient, but it does not give you unlimited scalability. The curation threshold still exists.


FAQ: Skill count and performance

Is there a hard limit on how many skills Claude Code can load? No hard cap in the standard configuration. The limit is the system prompt character budget (approximately 15,000 characters by default). Descriptions that push past that budget get truncated, which silently breaks those skills. The performance degradation zone at 30+ skills sets in before the hard truncation limit.

Does skill count affect Claude's speed? Session startup time increases marginally with each skill because Claude reads more descriptions. For most users this is imperceptible. The more relevant impact is quality, not speed: attention dilution causes missed triggers and partial execution, not slower responses.

Should I delete unused skills or just disable them? Delete them. An archived skill still takes up mental space when you're scanning the library. If you think you'll need a skill again, keep the SKILL.md file in a separate archive folder outside .claude/skills/. It is not loaded from there.

Do user-level skills count toward the threshold? Yes. All loaded skill descriptions count against the same system prompt budget, regardless of whether they are project-level or user-level. A developer with 15 user-level personal skills who joins a project with 20 project-level skills has 35 loaded skills total.

How do I know my current skill count? Run /skills in a Claude Code session. The output lists every loaded skill. Count the entries.

Can I increase the skill count budget beyond 30? You can raise the SLASH_COMMAND_TOOL_CHAR_BUDGET environment variable to allow more characters in the system prompt. This extends the hard truncation limit but does not fix the attention dilution problem. A larger budget with 60 skills still produces less reliable triggering than a tighter library with 25 well-curated skills.

Last updated: 2026-05-02