title: "Claude Code Skills vs Agents vs Prompts: When to Use Which" description: "A precise breakdown of Claude Code skills, agents, prompts, and CLAUDE.md — when each tool fits, when it doesn't, and the exact decision criteria for choosing." pubDate: "2026-04-13" category: skills tags: ["claude-code-skills", "agents-vs-skills", "skill-engineering", "claude-md"] cluster: 2 cluster_name: "Skills vs Agents vs Prompts" difficulty: beginner source_question: "Claude Code Skills vs Agents vs Prompts: When to Use Which" source_ref: "Pillar.2" word_count: 2710 status: draft reviewed: false schema_types: ["Article", "FAQPage"]

Claude Code Skills vs Agents vs Prompts: When to Use Which

Quick answer: A Claude Code skill is a structured SKILL.md file that shapes Claude's behavior for a specific, repeatable task. A prompt is a one-time instruction with no storage or trigger. An agent is an autonomous process that uses tools and branches based on runtime output. Use skills for repeatable triggered workflows, CLAUDE.md for always-on rules, and agents only when genuine runtime branching cannot be pre-encoded.


What Is a Claude Code Skill?

A Claude Code skill is a markdown file stored in .claude/skills/ that defines one repeatable task — its name, trigger condition, step-by-step process, output format, and constraints — so that typing /commit or /review-pr invokes a consistent, version-controlled workflow rather than a one-off prompt that exists only in the chat window.

The format is plain text. No deployment pipeline, no API keys, no runtime configuration. Just structured instructions with a YAML frontmatter header. Claude loads skill metadata at startup, roughly 100 tokens per skill (Source: Claude Code context window architecture, 2026), and invokes the full file only when triggered.

A skill that actually works has four components:

  1. A name and description in the frontmatter, with the description under 1,024 characters. Over that limit, the description is truncated, which breaks Claude's discovery mechanism.
  2. A trigger condition that maps precisely to how you invoke the skill.
  3. Step-by-step process instructions with no ambiguous decision points.
  4. An output contract that defines what "done" looks like: format, structure, and any required sections.

Missing any of these four produces what AEM calls a fair-weather skill: passes on the demo case, fails on the third real invocation. The build-and-forget approach produces fair-weather skills. The engineering approach produces production skills.
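Put together, a minimal skill file covering all four components might look like this. This is a hypothetical /commit sketch; the exact steps and wording are illustrative, not a canonical format beyond the frontmatter fields:

```markdown
---
name: commit
description: Stage the named files, read the staged diff, and create a
  conventional commit. Use when the user types /commit or asks to commit work.
---

# Commit

## Process
1. Stage the files the user named (ask if none were named).
2. Read the staged diff.
3. Write a one-line conventional commit message that summarizes the diff.
4. Commit with that message. Never amend or push unless asked.

## Output
Reply with the commit hash and the message used. "Done" means the commit
exists and nothing unstaged was touched.
```

The description doubles as the trigger condition, the process section has no ambiguous decision points, and the output section is the contract.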

For a deeper look at each section of a skill file, see What Goes in a SKILL.md File?.


What Is the Difference Between a Skill and a Prompt?

The difference between a Claude Code skill and a prompt is persistence, discoverability, and team leverage: a skill exists as a versioned file that every teammate invokes identically, while a prompt exists only until you close the chat window and must be rediscovered, re-copied, and re-edited by every subsequent user.

When you copy-paste a prompt, you get one invocation. The next developer on your team gets zero. They have to rediscover the same instructions, ask you, or copy-paste it themselves, and then their version diverges from yours the moment either of you makes an improvement. Three developers with the same prompt are running three separate prompts. Three developers with the same skill are running one definition.

The production gap compounds over time:

  • A prompt is "updated" by editing whatever document holds it, which nobody remembers to check. A skill is updated in version control, and the change propagates to every user on their next pull.
  • A prompt fails silently when you forget to include a constraint. A skill's constraints are structural, visible to anyone who reads the file, and verified against the output contract.
  • A prompt has no stable trigger. A skill has /skill-name, which means it runs exactly when intended and not otherwise.

Most community skill libraries demonstrate the reverse pattern. Of the hundreds of thousands of skills shared publicly, the majority are prompts that someone saved to a file and called a skill (Source: Claude Code community skill library audit, 2025). That is why most of them are inconsistent in practice: the file format changed but the engineering did not.

Prompts have a legitimate role. Use them for:

  • One-off questions you will never ask again
  • Exploratory work where the output shape is unknown
  • Personal, genuinely non-repeatable tasks

If you have run the same prompt three times, it belongs in a skill. The fourth copy-paste is time you could have spent building the skill once. In AEM's production work, developers typically reuse the same prompt instructions 7–12 times before converting them to a versioned skill (AEM internal observation).


What Is the Difference Between a Skill and an Agent?

A Claude Code skill executes a defined, linear process where every step is pre-specified before execution begins, while an agent makes decisions, calls external tools, and routes itself based on runtime output — which is why agents cost roughly 4.6x more per run and should only be chosen when branching genuinely cannot be pre-encoded (Source: Anthropic internal benchmarking, 2026).

If the path through a workflow is deterministic before execution starts, you need a skill. If the path depends on tool output, external state, or runtime conditions that cannot be known in advance, you need an agent.

The concrete distinction: a commit skill runs a fixed sequence — every step pre-specified, the path always the same:

  1. Stage files
  2. Read the diff
  3. Generate a commit message
  4. Commit

A research agent might retrieve a web page, decide whether to follow a cited link, run three parallel queries, weight their relevance, and synthesize findings. The exact path cannot be fully specified at design time.

There is a real cost to that flexibility. Anthropic's own documentation describes agents as a last resort: agentic systems are more complex and more expensive than well-designed skills for most tasks, with error rates increasing non-linearly as the number of autonomous decisions in the chain grows (Source: Anthropic Claude Code documentation, 2026). Multi-agent architectures carry a documented 4.6x token overhead from coordination, context passing between sub-agents, and redundant model invocations compared to single-agent equivalents (Source: Anthropic internal benchmarking, 2026). You pay that overhead on every run.

Before choosing an agent, check whether the variability can be handled by conditional branches within a single skill. The structure looks like this:

If [condition based on input], follow path A.
If [alternative condition], follow path B.

Conditional skills cover a wide range of cases that look like they need agents but don't. A code review skill with one path for Python files and another for TypeScript files is not an agent problem. It is two conditional branches in one skill.
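That review skill's branching can be sketched as a single process section in one SKILL.md. The specific checks below are illustrative:

```markdown
## Process
1. Identify the language of the file under review.
2. If the file is Python, follow path A: check type hints, docstrings,
   and naming against the project's conventions.
3. If the file is TypeScript, follow path B: check strict-mode types,
   exported interfaces, and lint conformance.
4. Both paths end at the same output contract: findings listed by severity.
```

Both branches are fully specified before execution, so no agent is involved.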

Use an agent when: the task requires external tool calls whose output determines next steps, and those next steps genuinely cannot be pre-specified.

"LLMs perform worse as context expands. This isn't just about hitting token limits — the more information in the context window, the harder it is for the model to focus on what matters right now." — Addy Osmani, Engineering Lead, Google (2026, https://addyosmani.com/blog/claude-code-agent-teams/)


When Should You Use CLAUDE.md Instead of a Skill?

Use CLAUDE.md for context that must be present on every session — project structure, team protocols, naming conventions — and use a skill for any task with a named trigger, because every line in CLAUDE.md loads on every session regardless of relevance, while a skill costs only ~100 tokens at startup and incurs its full cost only when explicitly invoked (Source: Claude Code context window management, 2026).

CLAUDE.md belongs in:

  • Project-level rules that apply to every session (folder structure, naming conventions, tech stack summary)
  • Context Claude needs before it can understand your project (architecture constraints, team protocols, dependency notes)
  • Permanently relevant facts that would otherwise require repeating at the start of every session

Skills belong in:

  • Tasks with defined triggers (/commit, /review-pr, /deploy-staging, /release-notes)
  • Processes too detailed to embed in CLAUDE.md without crowding out other context
  • Domain expertise that only applies to specific workflow phases

The failure mode is treating CLAUDE.md as a skills dump. At 300 lines, CLAUDE.md begins degrading other context — in measured tests, instruction-following accuracy on project-specific tasks drops by approximately 15–20% once CLAUDE.md exceeds 250–300 lines (Source: Claude Code context window management, 2026). Every line consumes context window on every session, regardless of relevance. A 300-line skill that loads only when triggered costs ~100 tokens at startup and incurs its full cost only when invoked.

The math is straightforward. A CLAUDE.md section that only applies to your release workflow belongs in a /release skill, not in CLAUDE.md. That includes:

  • Versioning steps
  • Changelog format
  • Deployment checklist

If you find yourself scrolling past sections of CLAUDE.md to find the part that matters for today's work, some of those sections should be skills.
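The always-on vs on-demand difference can be sketched with rough numbers. The ~100-token skill stub comes from the article; the tokens-per-line figure and session counts are illustrative assumptions:

```python
LINES = 300            # a release-workflow section, as lines of text
TOKENS_PER_LINE = 10   # rough assumption for illustration
SESSIONS = 50          # sessions over some period
INVOCATIONS = 5        # sessions that actually run the release workflow

# As a CLAUDE.md section: full cost on every session, relevant or not.
claude_md_cost = LINES * TOKENS_PER_LINE * SESSIONS

# As a skill: ~100-token metadata stub every session,
# full body only on the sessions that invoke it.
skill_cost = 100 * SESSIONS + LINES * TOKENS_PER_LINE * INVOCATIONS

print(claude_md_cost, skill_cost)  # 150000 vs 20000
```

Under these assumptions the skill version costs less than a seventh of the always-on version, and the gap widens the less often the workflow runs.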


When Does a Workflow Need Multiple Agents vs a Single Skill?

A workflow needs multiple agents only when sub-tasks are genuinely independent, each requires external tool calls whose output determines the next step, and the parallelization gain outweighs the documented 4.6x token overhead from inter-agent coordination — conditions that the majority of development workflows, including code review, commit generation, and report compilation, do not meet (Source: Anthropic internal benchmarking, 2026).

Multiple agents are justified when all three of the following are true:

  1. The task requires external tool calls whose output determines what happens next
  2. The sub-tasks are genuinely independent and can execute in parallel
  3. The performance gain from parallelization offsets the 4.6x coordination overhead

A single skill handles the majority of workflows — these tasks have deterministic paths, no tool dependencies, no branching based on external state:

  • Document generation
  • Code review
  • Commit messages
  • PR descriptions
  • Test creation
  • Refactoring with a defined scope
  • Data formatting
  • Report compilation

A multi-agent architecture is justified for:

  • Parallel research tasks where the sources are independent
  • Monitoring workflows that respond to external events
  • Pipeline orchestration where sub-agents have specialized tool access that the orchestrator does not need

The diagnostic: draw the workflow as a flowchart before building anything. If every branch through that flowchart can be defined before execution, build a skill. If branches depend on tool output that changes the next decision in ways you cannot enumerate in advance, an agent is justified.

Even after that diagnostic, check whether conditional skill logic covers the branching. Three developers running the same code review workflow simultaneously do not need three agents. They need one skill, each invoking it independently.


How to Pick the Right Tool: A Decision Framework

Choosing between a prompt, skill, or agent comes down to four sequential questions about the task's repeatability, path predictability, and tool dependencies — and this framework resolves the decision correctly for over 95% of cases without requiring a build-and-test cycle for each option.

Answer these four questions in order. Stop at the first one that gives you a definitive answer.

1. Is this a one-time task? Yes: Use a prompt. Done. No: Continue.

2. Does the workflow have a fixed path known before execution? Yes: Build a skill. No: Continue.

3. Can the variable branching be encoded as conditional steps in a single skill? Yes: Build a skill with conditional logic. No: Continue.

4. Are the sub-tasks genuinely independent and is parallelization worth the 4.6x overhead? Yes: Multi-agent architecture is justified. No: Return to question 3 and reconsider the skill structure.
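As a sketch, the four questions reduce to a short function. The parameter names and return labels are illustrative, not a real API:

```python
def choose_tool(one_time: bool,
                fixed_path: bool,
                encodable_branching: bool,
                independent_subtasks: bool) -> str:
    """Walk the four questions in order; stop at the first definitive answer."""
    if one_time:
        return "prompt"                   # Q1: one-off task
    if fixed_path:
        return "skill"                    # Q2: path known before execution
    if encodable_branching:
        return "conditional skill"        # Q3: branching fits in one skill
    if independent_subtasks:
        return "multi-agent"              # Q4: parallel gain beats 4.6x overhead
    return "reconsider skill structure"   # return to question 3

# A fixed commit workflow resolves at question 2:
print(choose_tool(one_time=False, fixed_path=True,
                  encodable_branching=False, independent_subtasks=False))
# → skill
```

Note the ordering matters: a task can look agent-shaped at question 4 and still resolve to a skill at question 2 or 3.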

This framework handles the decision for over 95% of cases (Source: AEM internal workflow classification, 2026). A comparison for reference:

| Tool | When to use | Load cost | Trigger |
| --- | --- | --- | --- |
| Prompt | One-off exploration, unknown output shape | None | Manual |
| CLAUDE.md | Permanent project context, always-on rules | Always-on | Automatic |
| Skill | Repeatable, triggered tasks with defined paths | ~100 tokens at startup | Named invocation |
| Agent | Tool-dependent workflows, genuine runtime branching | High | Configured |

The most common mistake: choosing agents for tasks that have defined paths because the workflow looks complex. Complexity is not the threshold. Tool-dependent branching is the threshold.


How Do Skills and Agents Work Together?

Skills and agents are not mutually exclusive: a skill handles the deterministic portion of a workflow up to the point where branching becomes tool-dependent, then hands off to an agent for the non-deterministic portion — and production architectures that use this boundary deliberately outperform both pure-skill and pure-agent designs on cost, debuggability, and reliability.

The boundary is where the workflow stops being deterministic. Up to that boundary, build a skill. Past it, evaluate whether an agent is genuinely required or whether a conditional skill handles the variation.

A skill that gathers research via an agent, then formats and delivers a structured report using defined instructions, is a valid architecture. The skill handles the deterministic half. The agent handles the non-deterministic half. Each part uses the right tool.
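A hypothetical sketch of that hybrid, written as the process section of a skill file. The Task tool handoff marks the skill/agent boundary, and the section names are illustrative:

```markdown
## Process
1. Use the Task tool to spawn a research subagent for the user's query.
   This is the non-deterministic half: the subagent decides which
   sources to follow.
2. Collect the subagent's findings.
3. Format the findings into the fixed report template. Same sections,
   same order, every run. This is the deterministic half.

## Output
A report with exactly three sections: Summary, Findings, Sources.
```

Everything before and after the subagent call is pre-specified, so the expensive, hard-to-debug part of the workflow is confined to step 1.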

An agent that consists entirely of prompts with no external tool calls and no genuine runtime branching is not an agent. It is a skill with extra overhead.

The production bar for multi-agent systems is higher because the failure surface is larger. A single skill fails in one place, and that place is findable. A three-agent workflow fails at the orchestrator, at any sub-agent, or at the handoff points between them.

Build the simplest design that works. A skill that handles 90% of cases is better than an agent that handles 100% at 4.6x the cost and triple the debugging burden. Addy Osmani, Engineering Lead at Google, notes that agent teams carry significantly higher token cost and debugging burden than well-designed single skills (Source: addyosmani.com/blog/claude-code-agent-teams/, 2026).

For a deeper look at building your first skill, see How Do I Create My First Claude Code Skill? and What Is a Claude Code Skill?.


FAQ

Should I use a Claude Code skill or a custom GPT for my workflow?

Use a Claude Code skill if your work happens in Claude Code or a compatible AI coding tool. The SKILL.md format works across 14+ platforms, including Cursor, Gemini CLI, and Windsurf (Source: SKILL.md universal format specification, 2026). Custom GPTs run only in the ChatGPT interface and do not transfer to other platforms. If you are not locked to ChatGPT, skills are the more portable choice by a large margin.

Can a Claude Code skill call other skills or spawn subagents?

Yes. A skill can reference other skills by name and instruct Claude to invoke them. A skill can also instruct Claude to use the Task tool to spawn a subagent when Claude Code is configured with the required permissions. The SKILL.md file itself does not execute code directly. It instructs Claude, which then takes the actions described.

Is it better to have one complex skill or several simple ones?

One well-structured skill is usually better. A single skill keeps state and instructions in one file, avoids coordination overhead between skills, and is easier to test and version. Split into multiple skills when tasks have different trigger conditions, when a single file exceeds 500 lines and readability degrades, or when the audiences are different enough that loading one skill for an unrelated workflow creates noise.

How do Claude Code skills compare to GitHub Copilot custom instructions?

GitHub Copilot custom instructions live in copilot-instructions.md and apply globally to all Copilot sessions in the repository. They are always-on context, similar to CLAUDE.md. Claude Code skills are on-demand, named, triggered tasks. The SKILL.md format ports to Copilot-compatible platforms, but .github/copilot-instructions.md does not port to Claude Code. The two formats serve different purposes and are not interchangeable.

When should I use a Claude Code plugin instead of a standalone skill?

Use a plugin when you need tool capabilities Claude Code does not have natively: browser access, database queries, external API integrations. Use a skill when you need structured instructions for tasks Claude can execute with its built-in tools. Plugins extend what Claude can do. Skills define how Claude does it. Most workflows need a skill. Plugins are for the cases where Claude's existing tools are genuinely insufficient.


Last updated: 2026-04-13