---
title: "The Complete Guide to Building Claude Code Skills in 2026"
description: "Complete guide to Claude Code skills: SKILL.md structure, trigger-reliable descriptions, reliable process steps, and the four-checkpoint testing bar."
pubDate: "2026-04-13"
category: skills
tags: ["claude-code-skills", "skill-engineering", "getting-started", "production"]
cluster: 1
cluster_name: "Foundations & Getting Started"
difficulty: beginner
source_question: "What is a Claude Code skill? / How do I create my first Claude Code skill?"
source_ref: "1.Beginner.1"
word_count: 2780
status: draft
reviewed: false
schema_types: ["Article", "FAQPage"]
---

The Complete Guide to Building Claude Code Skills in 2026

Quick answer: A Claude Code skill is a SKILL.md file that gives Claude a named, triggerable capability with defined inputs, outputs, and process steps. It requires four components: a description that activates consistently, a structured process Claude follows in sequence, an output contract that constrains results, and a test suite that proves it works before the team touches it.


What is a Claude Code skill and what problem does it actually solve?

A Claude Code skill is a SKILL.md file installed in your .claude/skills/ directory that adds a named, reusable capability to Claude Code — it gives Claude a defined trigger, a sequential process to execute, and an output contract that constrains results, so every team member gets consistent behavior without re-entering instructions each session. When you invoke it — by name with /skill-name or through natural language matching the description — Claude loads the file, follows the defined process, and produces output in the specified format.

The problem it solves is systematic repetition. Every time you paste the same instructions into a new Claude session, you're doing manual work that a skill handles automatically. In a typical development week, the same project context gets re-entered into Claude 6 to 8 times (AEM production analysis, 2026). A skill brings that count to zero.

There are over 400,000 skills in the Claude Code community (Source: SkillKit community repository, github.com/rohitg00/awesome-claude-code-toolkit). A large portion of them are prompts with a file extension. They trigger sometimes, follow inconsistent paths, and produce different output on Tuesday than on Wednesday. A real skill has four non-negotiable components:

  • A trigger (the description that activates it)
  • A defined process (numbered steps Claude executes in sequence)
  • An output contract (what it produces and explicitly does not produce)
  • A test suite (proof it works before the team touches it)

Without all four, you have a fair-weather skill.

A skill that only works when you're the one typing is a well-formatted prompt, not a production skill.

For a detailed breakdown of what separates skills from prompts and agents in Claude Code, see What is a Claude Code skill?.


How does the SKILL.md file structure work?

A SKILL.md file has two required parts — YAML frontmatter and a body — where the frontmatter holds the name and description that Claude uses to discover and activate the skill, while the body holds the sequential process steps, output contract, rules, and optional self-improvement section that determine exactly what Claude does when the skill runs.

The minimum viable frontmatter:

---
name: skill-name
description: "One-line trigger description — under 1,024 characters"
---

The body contains process steps, an output contract, rules, and optionally a self-improvement section. The 500-line total limit exists because Claude Code loads all skill metadata at startup (Claude Code specification, 2026). A 900-line SKILL.md does not just cost more tokens; it competes with your other skills for Claude's working context and reduces selection accuracy across the library.

Recommended body structure, in order:

  1. What this skill does (1-2 sentences, not a paragraph)
  2. When to use it and when not to
  3. Process steps (numbered, sequential)
  4. Output contract (what it produces AND what it does NOT produce)
  5. Rules and constraints
  6. Self-improvement section (optional, captures reviewer feedback for future versions)

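As a sketch, a body following that order might look like the following. The skill name, steps, and contract items here are illustrative, not a real skill:

```markdown
## What this skill does
Generates a release-notes draft from merged PR titles.

## When to use / when not to
Use when the user asks for release notes. Do NOT use for changelogs or commit messages.

## Process
1. Use the Bash tool to list merged PR titles since the last tag.
2. Group the titles by Conventional Commits type.
3. If no merged PRs are found, output "No changes since last release" and stop.

## Output contract
Produces: a markdown list grouped by commit type.
Does NOT produce: version numbers, release dates, or upgrade instructions.

## Rules
Keep each entry under 15 words.
```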
Reference files belong in a references/ subdirectory alongside the SKILL.md. One structural rule applies: reference files cannot point to other reference files. The one-level-deep constraint exists because recursive reference loading produces unpredictable context window behavior. Deep chains defeat structured retrieval and make skill behavior non-reproducible.
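Concretely, a skill with reference files stays one level deep. The paths below are illustrative:

```text
.claude/skills/release-notes/
├── SKILL.md              # process knowledge: steps, contract, rules
└── references/
    ├── commit-types.md   # domain knowledge SKILL.md points to
    └── style-guide.md    # may NOT point to further reference files
```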

A well-built skill keeps SKILL.md under 150 lines and uses reference files for domain knowledge that would bloat the main file. AEM analysis of 120 production skills found that skills exceeding 200 lines had a 34% higher rate of step-skipping failures than skills under 150 lines (AEM internal analysis, 2026). The distinction matters: SKILL.md holds process knowledge (what steps to take), reference files hold domain knowledge (facts needed to execute those steps correctly).

See What goes in a SKILL.md file? for a field-by-field breakdown of every SKILL.md section and the reasoning behind each one.


How do I write a skill description that triggers reliably?

The description field is the single highest-leverage element in a Claude Code skill because it controls activation: a description that fails to trigger means the skill never runs, and a description that activates on the wrong input creates workflow noise that erodes trust in the entire library. Getting it right requires three specific elements — trigger phrase, anti-trigger, and core action.

AEM testing across 650 activation trials found that imperative descriptions achieve 100% activation rates. Passive descriptions sit at 77% (AEM internal testing, 2026). That 23% gap translates directly to user trust. A skill that fails to activate 1 in 5 times gets abandoned, regardless of how good the underlying instructions are.

An imperative description follows this pattern:

"Use this skill when [trigger condition]. Do NOT use this skill for [anti-trigger].
[Core action the skill takes]."

Compare these two:

Weak: "A skill for writing commit messages."

Production: "Use this skill when the user wants to create a git commit message. Do NOT use this skill for PR descriptions, changelogs, or release notes. Reads staged changes and outputs a Conventional Commits-format message with type, scope, and body."

The production version has three required elements: trigger phrase, anti-trigger, and core action. The weak version has none of them: it describes the skill's topic, not its activation conditions. Those are different things.

Technical constraint: the description must stay on a single line in the frontmatter. Multi-line descriptions break YAML parsing and Claude Code treats the skill as malformed. If a code formatter like Prettier reformats it to multiple lines, the skill fails silently on every subsequent session. Wrap in double quotes and configure your formatter to skip frontmatter blocks.

The character limit is 1,024 per description (Claude Code specification, 2026). At library scale, descriptions also compete for the 15,000-character system prompt budget allocated to skill discovery. Keep each description under 200 characters where the trigger is unambiguous; shorter descriptions leave room for more skills in the library.

For the full guide to description optimization and trigger testing, see What does the description field do in a Claude Code skill?.


How do I write process steps that Claude actually follows?

Process steps are the operational core of your skill — they define what Claude does, in what order, and with which tools — and three requirements produce steps that work consistently across users and model versions. Missing any one produces different output on different runs for different team members.

Sequence clarity: Numbered steps in execution order. If steps can run in parallel, state it explicitly: "Steps 3 and 4 can execute in parallel." Claude treats unordered lists as optional suggestions. Numbered lists get followed as sequences. The distinction is not subtle: it is the difference between a skill that completes in one pass and one that skips sections based on Claude's inference about what matters.

Tool specificity: Name the tool when one is required. "Read the file" is ambiguous. "Use the Read tool to load the file at the path the user provided" removes that ambiguity. Named tools produce stable output. Leaving tool selection to Claude's judgment produces variation across sessions, across models, and across users (AEM production data, 2026).

Failure handling: Tell Claude what to do when a step fails or the input is unexpected. "If the file doesn't exist, ask the user for the correct path before continuing" is a complete instruction. Leaving it unhandled means Claude improvises. In AEM production audits, 68% of skill inconsistency reports traced back to unhandled edge cases in process steps, not errors in core logic (AEM production audits, 2026). Improvised error handling is the leading source of skill inconsistency in production environments.

A production-quality step:

Step 3: Use the Grep tool to search for all TODO comments in the codebase.
Search pattern: "TODO|FIXME|HACK". Include all .ts and .js files.
If no results are found, output "No TODO items found" and stop.

That step leaves nothing to inference:

  • Names the tool (Grep)
  • Specifies the search pattern ("TODO|FIXME|HACK")
  • Scopes the file types (.ts and .js)
  • Handles the empty result case ("output 'No TODO items found' and stop")

Four lines. Zero ambiguity.

What to avoid in process steps:

  • Steps that assume context Claude does not have ("Review the project requirements": stored where?)
  • Multi-action steps bundled into one ("Analyze the code and suggest improvements and write the tests")
  • Reasoning principles disguised as steps ("Think carefully about edge cases" is not an executable instruction)
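A bundled step splits into single-action steps, each with its own completion criterion. The wording below is a sketch, not a prescribed pattern:

```text
Step 4: Use the Read tool to load the target file. If it does not exist,
ask the user for the correct path and stop.
Step 5: Analyze the loaded code and list up to 5 concrete improvements.
Step 6: For each improvement the user accepts, write a matching test
using the Write tool.
```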

For the full step-writing guide with patterns for parallel execution, decision branches, and user-confirmation gates, see How do I write step-by-step instructions for a Claude Code skill?.


How do I test a Claude Code skill before the team uses it?

A skill clears the production bar when it produces expected output for at least 5 distinct inputs — including edge cases — without instruction edits between runs; AEM validates this through four sequential checkpoints where each checkpoint catches the failure mode the previous one cannot see:

  1. Trigger test
  2. Process adherence
  3. Output contract verification
  4. Edge case stress testing

Checkpoint 1: Trigger test. Invoke the skill using natural language that matches the description. Confirm it activates. Then use inputs that should NOT trigger it (your anti-trigger cases). Confirm it stays silent. If a skill triggers on inputs it should ignore, the library becomes unreliable.

Checkpoint 2: Process adherence test. Walk through each numbered step during a live test run. Verify Claude executes them in order without skipping. If a step gets skipped, the instruction is too soft. Add "Always complete this step before moving to the next" or make the step's output a prerequisite for step N+1.

Checkpoint 3: Output contract test. Verify the output exactly matches the format defined in the output contract. Missing fields, extra unrequested sections, or format variations are contract violations. Fix the output contract or the process instructions until they align (AEM quality protocol, 2026).

Checkpoint 4: Edge case test. Run the skill on malformed input, missing files, empty search results, and boundary conditions. This is where fair-weather skills fail. A skill that passes checkpoints 1 through 3 but fails here is not production-ready.

The most common source of checkpoint 4 failure: process steps cover only the happy path. Adding explicit failure handling for the 2-3 most likely edge cases costs 4 to 6 lines in SKILL.md. Skipping it costs user trust every time that edge case appears in a real session.
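A lightweight way to run all four checkpoints is a test log kept next to the skill. The inputs and results below are illustrative, using the commit-message skill from earlier:

```text
| # | Input                             | Checkpoint | Expected            | Result |
|---|-----------------------------------|------------|---------------------|--------|
| 1 | "write a commit message for this" | 1 trigger  | skill activates     | pass   |
| 2 | "draft the PR description"        | 1 trigger  | skill stays silent  | pass   |
| 3 | staged diff, 3 files              | 2 + 3      | contract format     | pass   |
| 4 | empty staging area                | 4 edge     | asks user, no crash | pass   |
| 5 | binary file staged                | 4 edge     | skips with notice   | fail   |
```

Five distinct inputs, two of them edge cases, with no instruction edits between runs: that is the production bar made concrete.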


What makes a skill production-ready rather than a fair-weather skill?

A production-ready skill works reliably for every team member — not just the engineer who built it — because the description alone triggers it correctly on a first attempt, the process steps produce consistent output across Haiku, Sonnet, and Opus, and the output contract explicitly prevents Claude from appending unrequested content that would break downstream consumers. The gap between "works for me" and "works for everyone" is almost entirely documentation, not technical complexity: the skill file must be explicit enough that someone new can trigger it correctly, follow it to completion, and receive the expected output on the first attempt.

Three diagnostic signs you have a fair-weather skill:

  1. It works when you run it but fails when a colleague tries it
  2. It requires verbal coaching before someone can trigger it correctly
  3. Output varies enough between runs that manual review is always necessary

"When Claude ignores a skill, the failure can happen in two places. The first is activation: Claude does not invoke the skill at all and defaults to its own approach. The second is execution: Claude loads the skill but skips steps inside it." — Marc Bara, Project Management Consultant (March 2026, https://medium.com/@marc.bara.iniesta/claude-skills-have-two-reliability-problems-not-one-299401842ca8)

A production-ready skill is self-documenting. The output contract's "does NOT produce" list prevents Claude from appending unrequested content that breaks downstream consumers.

The fastest diagnostic: hand the skill to someone who has never seen it. Watch what happens without explaining anything. Every point of failure is a gap in the skill file, not a gap in the user's knowledge:

  • Wrong trigger (description did not activate correctly)
  • Skipped step (process instruction was too soft)
  • Unexpected output format (output contract was incomplete)

AEM onboarding data shows that new team members successfully trigger a production-grade skill on the first attempt 94% of the time, versus 41% for fair-weather skills (AEM onboarding data, 2026). Production skills close those gaps in the file. Fair-weather skills close them in Slack messages.


How do I scale from one skill to a working library?

A skill library scales when every skill has a clear, non-overlapping description — description overlap is the primary failure mode at library scale — so the first architectural decision is assigning each skill a distinct vocabulary of trigger verbs, because two descriptions sharing the same trigger phrase create ambiguity Claude resolves arbitrarily, and three or four misfires destroy user trust in the entire library. AEM library audits found that description overlap accounts for 61% of trigger misfires in libraries with more than 15 active skills (AEM production analysis, 2026).

Token economics constrain library size in practice. Each skill adds approximately 100 tokens of metadata overhead at startup (Claude Code specification, 2026). At 30 to 40 active skills, a library starts competing with itself for context. Discovery reliability drops as Claude's selection process evaluates more candidates per query.
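The arithmetic is worth running for your own library. A quick sketch, assuming the roughly 100-token-per-skill overhead and 15,000-character discovery budget cited above:

```python
SKILLS = 35                # active skills in the library
TOKENS_PER_SKILL = 100     # approximate metadata overhead per skill at startup
DESC_BUDGET_CHARS = 15_000 # system prompt budget for skill discovery
AVG_DESC_CHARS = 200       # target description length

# Fixed cost paid at the start of every session:
startup_overhead = SKILLS * TOKENS_PER_SKILL
# How many descriptions fit in the discovery budget at the target length:
descriptions_that_fit = DESC_BUDGET_CHARS // AVG_DESC_CHARS
```

At 35 skills the startup overhead is 3,500 tokens, and 200-character descriptions leave headroom well past the recommended 30-skill ceiling; it is trigger ambiguity, not the character budget, that bites first.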

Practical guidelines for a library that stays functional:

  • 10 to 30 active skills per project is the functional range
  • Each skill covers one well-defined, named task
  • Descriptions use distinct vocabulary, no two descriptions share the same key trigger verb
  • Archive skills unused in 90 days rather than leaving them active

This pattern works for single-domain libraries. For cross-domain orchestration across multiple product areas, route to specialized skill sets via subagent architecture rather than loading everything into a single session. At 50+ skills across multiple domains, you need routing logic, not a longer library.


What are the most common mistakes in first skills?

The first skill most engineers build fails checkpoint 4 — not because the instructions are wrong, but because they cover only the inputs the author had in mind — making deliberate edge case authoring before the skill ships the single most effective quality intervention: any input not specifically tested becomes a live failure in production, visible to every team member who was not the author. First skills consistently break on the same three inputs:

  • Empty results (no output from a search or read operation)
  • Malformed input (content that doesn't match the expected format)
  • Missing files (a path reference that doesn't exist at runtime)

The second most common mistake: a description that is too broad. "Use this skill when working with code" triggers on nearly everything. The skill becomes a noise generator that Claude activates for unrelated tasks, and the user disables it.

The third: no output contract. Without one, Claude decides what to include. The output format varies across runs. Downstream processes that depend on consistent structure break. The skill works but creates manual cleanup work every time it runs. In AEM's review of first-skill submissions, 73% of skills that lacked an output contract required at least one manual correction per 10 runs before the author added one (AEM internal analysis, 2026).

Three fixes, applied in order:

  1. Write the output contract before the process steps. Define what the skill produces and what it explicitly does NOT produce. This constrains step design from the start.
  2. Write 5 anti-trigger examples into the description. "Do NOT use this skill for X, Y, or Z." This forces you to think about the boundaries.
  3. Test checkpoint 4 with inputs you did not author. Use inputs from a colleague. The mismatches reveal what you assumed but never specified.

Frequently Asked Questions

The questions below cover the most common points of confusion when building and deploying Claude Code skills — from library sizing and subscription requirements to folder structure, SKILL.md length, and the fastest path to a working first skill.

How many Claude Code skills should I have in one project?

Between 10 and 30 active skills per project is the practical range: below 10, most repetitive tasks remain manual and you are not extracting full value from the system; above 30, description overlap creates trigger conflicts and startup token overhead degrades session quality. Archive skills unused in 90 days instead of keeping them in the active library.

Can I use Claude Code skills without a paid subscription?

Claude Code requires an Anthropic account but does include a free tier with usage limits, so you can get started without a paid subscription — the SKILL.md format is open and writable on any tier, meaning you can author, install, and test skill files locally before committing to a plan. Whether specific model capabilities work as expected depends on your account tier. Skills designed for Opus produce degraded output on Haiku for complex reasoning tasks.

What's the simplest Claude Code skill I can build in 5 minutes?

The simplest working Claude Code skill takes under 5 minutes to write and needs only three things — a name, a one-line description with a trigger phrase, and a single process instruction that tells Claude what to produce — which is enough to get a functional, consistently-triggering skill installed in your .claude/skills/ directory today. Create .claude/skills/quick-summary/SKILL.md with this content:

---
name: quick-summary
description: "Use this skill when asked to summarize a document or text."
---
Read the content the user provides. Output a 3-bullet summary. Each bullet is one complete sentence under 20 words.

That is a working skill. It triggers reliably, follows a defined process, and constrains output. It will not pass checkpoint 4 for edge cases, but it is a functional starting point you can harden over time.
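One way to create it from a shell, assuming you are at the project root:

```shell
# Create the named skill directory and write SKILL.md in one step.
mkdir -p .claude/skills/quick-summary
cat > .claude/skills/quick-summary/SKILL.md <<'EOF'
---
name: quick-summary
description: "Use this skill when asked to summarize a document or text."
---
Read the content the user provides. Output a 3-bullet summary. Each bullet is one complete sentence under 20 words.
EOF
```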

Do Claude Code skills work in VS Code or only in the terminal?

Claude Code skills work across all Claude Code surfaces — because the skill system operates at the Claude Code layer, not the interface layer, any Claude Code installation reads SKILL.md files from the .claude/skills/ directory regardless of which editor surface you are working in:

  • CLI
  • VS Code extension
  • JetBrains extension
  • Claude desktop app

What happens if I put SKILL.md in the wrong folder?

If you put SKILL.md in the wrong folder the skill does not load and fails silently — Claude Code scans only .claude/skills/[skill-name]/SKILL.md at the project level and ~/.claude/skills/[skill-name]/SKILL.md at the user level, which means any file placed at .claude/SKILL.md or .claude/skills/SKILL.md without a named subdirectory is invisible to the discovery system entirely.
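A quick filesystem check makes the difference visible, assuming the discovery paths described above. The demo directory is throwaway:

```shell
# Set up one correct and two incorrect placements.
mkdir -p demo/.claude/skills/my-skill
touch demo/.claude/skills/my-skill/SKILL.md   # correct: named subdirectory
touch demo/.claude/skills/SKILL.md            # wrong: no subdirectory
touch demo/.claude/SKILL.md                   # wrong: not under skills/
# The discovery pattern matches only the first file:
ls demo/.claude/skills/*/SKILL.md
```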

Should I use first person or second person in SKILL.md instructions?

Use second person for process steps directed at Claude — "Read the file," "Ask the user," "Output the result" — because first person is ambiguous inside Claude's context window: "I should read the file" does not specify who "I" refers to, and imperative second person removes that ambiguity entirely, producing steps Claude treats as direct instructions rather than descriptive statements.

My SKILL.md is over 500 lines. How do I refactor it?

A SKILL.md over 500 lines is almost always fixable by moving domain knowledge to reference files in references/, splitting it into two skills with separate descriptions if it covers more than one distinct task, or both — a file that long is doing too much, and 500+ lines of process instructions is usually two or three skills sharing one file.


Last updated: 2026-04-13