Several focused skills almost always outperform one complex skill. A Claude Code skill built to do one thing well delivers two properties that complex skills lose: trigger precision (the description matches the right request and nothing else) and composability (the skill chains cleanly with others in a workflow). A skill that tries to do five things produces a description so broad it either over-triggers or forces the user to remember which part of the skill they want.

TL;DR: Default to narrower skills with clear, single-domain purposes. A complex skill can always be split later. A library of 8 focused skills is easier to maintain, trigger, and compose than a library of 2 omnibus skills. The exception is a workflow so tightly sequential that every step depends on all previous outputs: split those only at natural handoff points.

Why do complex skills fail at scale?

The failure mode is not that complex skills break on day one. They degrade over time. A skill starts with a clear purpose. The developer adds an edge case. Then another. Then a branching instruction for a special format. Each addition broadens the description field and lengthens the SKILL.md. Six months later the file is 400 lines long, the description attempts to cover 8 different activation scenarios, and Claude either triggers the skill constantly or never triggers it when it should, because no single invocation scenario fits anymore.

At AEM, we've refactored more than a dozen complex skills that clients built before understanding how Claude's activation mechanism works. In every case, the problem was the same: the developer added complexity to the skill instead of creating a new skill for the new requirement. The result was a description so broad it matched everything, and instructions so long they exceeded the context budget for a single invocation (source: AEM commission analysis, 2026, skill refactoring data).

"Models placed in the middle of long contexts lose track of instructions at a rate that makes mid-context policy placement unreliable for production systems." — Nelson Liu et al., Stanford NLP Group, "Lost in the Middle" (2023, arXiv:2307.03172)

Long skill files are long contexts. The instructions in the middle get followed inconsistently. The fix is not to write better long instructions — it is to write shorter, focused skill files.

What is the AEM design rule for splitting a skill?

The split test: if you can name the distinct purpose of each part of the skill independently, and each part has value on its own, split them. If all parts serve the same unified purpose and are meaningless in isolation, keep them together.

Applied examples:

  • A skill with a "draft" section and a "review" section: split. Drafting and reviewing are independently useful. A developer might want to draft without reviewing or review without drafting. These are two skills.
  • A skill with a 5-step code review checklist: keep together. Each step is part of one review workflow. The steps are not independently useful; they form a single process.
  • A skill with instructions for three different output formats: split into three skills if the formats serve different user needs and should have different trigger conditions; keep one skill with a clear format selection step only when the domain expertise is identical across formats.

The naming test reinforces this: if you cannot give each part a clear, distinct name that tells the user exactly when to use it, the split is not clean enough to be worth doing. Instruction specificity compounds: research shows a weaker model with a tightly constrained prompt consistently outperforms a stronger model given vague instructions (AGENTIF benchmark, Tsinghua University, 2024). Splitting a skill forces that constraint to exist.
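As an illustrative sketch (the skill names, paths, and descriptions here are invented for this example, not taken from a real library), the draft/review split above might look like two SKILL.md frontmatter blocks:

```markdown
<!-- skills/draft-release-notes/SKILL.md -->
---
name: draft-release-notes
description: Drafts release notes from a list of merged changes. Use when the user asks to write or generate release notes.
---

<!-- skills/review-release-notes/SKILL.md -->
---
name: review-release-notes
description: Reviews existing release notes for completeness, tone, and changelog conventions. Use when the user asks to review or check release notes.
---
```

Each name tells the user exactly when to use the skill, and each skill is valuable on its own, which is precisely what the split test and the naming test ask for.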

What are the four advantages of focused skills?

Focused skills deliver four concrete advantages over complex skills: trigger precision, maintainability, composability, and context efficiency. Each compounds over the lifetime of a skill library. A library built on focused skills is cheaper to maintain, easier to extend, and more reliable in production than one built around omnibus skills that attempt multiple workflows.

  1. Trigger precision: Claude matches a request to a skill based on the description. A narrow description ("Performs a structured code security review checking OWASP Top 10, authentication patterns, and input validation") matches precisely. A broad description ("Helps with code quality and documentation and style and security") matches everything, activating when it should not and competing with other skills that should activate instead. A 650-trial study found directive, narrow descriptions achieve 94–100% activation rates, while passive broad descriptions drop to 37–87% (Ivan Seleznov, Medium, 2026).
  2. Maintainability: A skill that does one thing has a clear quality bar: does it do that one thing well? A skill that does five things has five quality bars, and improvements to one dimension create regressions in another. Focused skills are easier to test, easier to improve, and easier to deprecate when the workflow changes.
  3. Composability: A library of focused skills can be combined in conversation. Claude activates skill A for step 1, then skill B for step 2, then skill C for step 3. A complex single skill cannot be partially activated: you get all of it or none of it.
  4. Context efficiency: Claude Code uses progressive disclosure: skill bodies are only loaded when the skill activates. A focused skill loads exactly the instructions relevant to the current task. A complex skill loads all of its instructions even when only one section applies.
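The trigger-precision point from item 1 can be made concrete in frontmatter. A hedged sketch, with both skills hypothetical:

```markdown
---
# Broad: matches everything, competes with every other skill
name: code-helper
description: Helps with code quality and documentation and style and security.
---

---
# Narrow and directive: matches one class of request
name: security-review
description: Performs a structured code security review checking OWASP Top 10, authentication patterns, and input validation. Use when the user asks for a security review or audit of code.
---
```

The second description names the activity, the scope, and the trigger phrase; the first names nothing Claude can match against.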

"When you give a model an explicit output format with examples, consistency goes from around 60% to over 95% in our benchmarks." — Addy Osmani, Engineering Director, Google Chrome (2024)

The same logic applies at the skill level. A focused skill with a clear output contract produces consistent results. An omnibus skill with 8 output formats produces inconsistency because Claude is always resolving which format applies.

When should you keep complexity in one skill?

Two cases justify keeping complexity in a single skill. The first is a tightly sequential workflow where each step depends entirely on the previous step's output and none of the steps has independent value. The second is controlled variation, where the same domain expertise produces the same artifact in multiple formats with a clear selection step.

  1. Tightly sequential workflows: A workflow where each step depends on the complete output of the previous step, and the steps are never used independently, can be kept in one skill. A specific data transformation pipeline with 6 steps that always run in sequence, with each step receiving the previous step's output as input, is one workflow. Split it and you lose the hand-off of outputs between steps.
  2. Controlled variation: A skill that produces the same artifact in 3 formats (Slack message, email, internal document) can keep all three format options in one skill if the selection logic is clear. "Ask the user which format they need, then produce the appropriate version." This is one skill, not three, because the domain expertise is identical across formats.

Outside these cases, default to splitting. The long-term maintenance cost of a complex skill is almost always higher than the setup cost of two focused skills.
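The controlled-variation case can be sketched as a single SKILL.md with an explicit selection step (the skill name and wording are illustrative, not prescriptive):

```markdown
---
name: status-update-writer
description: Writes project status updates as a Slack message, an email, or an internal document. Use when the user asks for a status update in any of these formats.
---

## Workflow

1. Ask the user which format they need: Slack message, email, or internal document.
2. Gather the status inputs (accomplishments, blockers, next steps).
3. Produce the update in the selected format only.
```

The selection step is the first instruction, so the branching is resolved before any drafting begins and the rest of the skill stays linear.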

For the skill extraction process in detail, see If I Copy the Same Instructions Across Projects, Should That Become a Skill?.

How many skills is too many?

There is no hard ceiling, but there are practical ones. Claude's system prompt budget for skill descriptions constrains how many skills can be loaded simultaneously, but with well-written descriptions (under 1,024 characters each), a library of 25-30 skills is practical. Beyond that range, description overlap drives trigger conflicts before the token budget becomes the binding constraint (source: Claude Code documentation, Anthropic, 2025).

The constraint is technical as much as cognitive. Claude Code's available_skills section caps at approximately 15,500–16,000 characters total. With typical 263-character descriptions, roughly 42 skills fit; install 63 skills and 21 of them (33%) are silently hidden from Claude with no warning — they cannot be discovered or invoked (source: Alexey Pelykh, GitHub skills budget research, 2025). A library of 40 skills with overlapping descriptions creates both a technical visibility gap and a trigger conflict risk that outweighs the value of having every possible workflow covered. At AEM, our recommended ceiling for a personal skill library is 20 well-focused skills. For team libraries, 15-25 skills covering the most repeated workflows, with quarterly reviews to retire skills that are no longer used.

Focused skills don't eliminate trigger conflicts. A library of 20 poorly described narrow skills creates the same over-triggering problem as one broad skill: without distinct, directive descriptions, Claude cannot resolve which skill applies. The focused-skill pattern solves scope complexity; it does not substitute for description quality.

Beyond those ceilings, the library becomes a codebase in its own right: it needs its own documentation, testing, and governance. That overhead is appropriate for large engineering teams. For most individual developers and small teams, 20 focused, tested skills is the production bar.

For a deeper look at skill organization and naming, see When Should I Use a Skill Instead of Writing Instructions in CLAUDE.md?.

Frequently Asked Questions

For most Claude Code skill libraries, narrow scope and a single output contract are the right defaults. A focused skill with a description under 1,024 characters, instructions under 1,500 words, and one clearly named trigger condition will outperform a complex skill in trigger reliability, maintainability, and context efficiency across every production measure we track.

What is the right length for a focused skill? Between 500 and 1,500 words for the SKILL.md body, excluding reference files. Under 500 words and the skill is likely too vague to be reliable. Over 1,500 words and the skill is likely trying to cover too much. Reference files handle domain knowledge without bloating the main skill body.
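A 500-1,500-word SKILL.md typically breaks down along these lines. The section names and word counts below are a suggested shape, not a required template:

```markdown
---
name: example-skill
description: One or two directive sentences, under 1,024 characters.
---

## Purpose            <!-- ~50 words: the one thing this skill does -->
## Workflow           <!-- ~300-800 words: numbered steps, one path -->
## Output contract    <!-- ~100-300 words: the format, with one example -->
## Edge cases         <!-- ~100-300 words: a short list, no branching workflows -->

See references/style-guide.md for domain knowledge that would bloat this file.
```

The reference-file pointer at the end is what keeps the body under the 1,500-word line: domain knowledge moves out, workflow stays in.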

Can I have a single SKILL.md with sections for different users? You can, but it is usually a sign that you need separate skills. A skill with "if the user is a developer, do X; if they are a designer, do Y" has two user profiles with potentially different trigger conditions, different workflows, and different output contracts. These are two skills.

What happens if I do not split a skill that should be split? The behavior degrades gradually. Trigger reliability drops because the description is trying to cover too many scenarios. Sandboxed eval research found that without description optimization, baseline skill activation runs at approximately 50%; overly broad descriptions that try to cover multiple use cases land in that same degraded range (Scott Spence, sandboxed evals, 2026). Instruction following becomes inconsistent because the model is choosing between competing instructions in a long context. Users start invoking the skill explicitly instead of letting it trigger naturally, which signals the description is not working.

Is it OK to start with one complex skill and split later? Yes. Building one skill first and splitting when you understand the distinct use cases is a valid development pattern. The risk is that a complex skill accumulates dependencies — instructions that reference each other, output formats that feed each other — that make splitting harder later. Splitting earlier is always easier.

Do focused skills cost more in API tokens than one complex skill? No, usually less. Progressive disclosure means each skill body is loaded only when that skill activates; only the short descriptions of installed skills are always in context. A library of 10 focused skills therefore costs one skill body's worth of tokens per invocation, while a single complex skill loads its entire body every time any part of it is needed.

Should each skill have its own reference file set? Yes, if the skill requires domain knowledge that is specific to its workflow. A code security review skill needs security reference files. A documentation generation skill needs style guides and format references. Shared domain knowledge can live in a common reference file that multiple skills point to.

Last updated: 2026-05-06