A subagent is a separate Claude instance spawned by a parent agent to handle a specific task in isolation. A skill is a structured procedure running within a single Claude session. The choice is not about which is more powerful. It is about whether the workflow actually benefits from coordination between two instances, and whether you are willing to pay the overhead that coordination costs.

At Agent Engineer Master, we see subagent architectures proposed in the first week of almost every complex commission. We build them in roughly one in five. The rest are solved with a well-designed skill and a clear output contract.

TL;DR: Use a subagent when tasks can run in parallel, require isolated context, or need independent verification from a fresh perspective. Use a skill when the workflow is sequential, a single context is sufficient, and you want to avoid the latency and token cost of agent coordination.

What is a subagent in Claude Code?

A subagent is a Claude instance launched by a parent agent using the Agent tool. The parent passes a prompt, a description, and optionally a set of tool permissions. The subagent runs independently, completes its task, and returns a result to the parent. The parent continues once the result is received.

Multiple subagents can run in parallel when the parent launches them in a single tool call batch. This is the primary case where subagents are worth their overhead: when independent tasks can execute simultaneously and you cannot sequence them without adding latency. Anthropic's engineering team measured that multi-agent systems use approximately 15x more tokens than single-turn chat interactions for equivalent tasks, compared to roughly 4x for single-agent agentic runs (source: Anthropic Engineering, "How we built our multi-agent research system," 2025).

A subagent has its own context window. It does not inherit the parent's conversation history unless you explicitly pass it. This isolation is both a benefit and a liability. The benefit: the subagent starts fresh on a focused task without noise from the parent's accumulated context. The liability: you must construct a self-contained prompt for every subagent call, which adds engineering effort and token consumption.
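To make the isolation tax concrete, here is a minimal sketch of the pattern over the public Messages API rather than Claude Code's internal Agent tool. The helper name, model ID, and prompt layout are illustrative assumptions, not a documented interface:

```python
# A sketch of the isolation tax, assuming the public Anthropic Python SDK.
from anthropic import Anthropic

client = Anthropic()  # reads ANTHROPIC_API_KEY from the environment

def spawn_subagent(task: str, context_excerpt: str) -> str:
    """Run one isolated task; the prompt must be fully self-contained."""
    response = client.messages.create(
        model="claude-sonnet-4-20250514",  # assumption: substitute any current model
        max_tokens=1024,
        messages=[{
            "role": "user",
            # Nothing the parent knows reaches this call unless it is
            # re-serialized here. This line is the entire handoff.
            "content": f"Context:\n{context_excerpt}\n\nTask: {task}",
        }],
    )
    return response.content[0].text
```

The content string is the whole interface between parent and subagent: every file path, constraint, and prior decision the task depends on has to be re-packaged into it, which is exactly where the engineering effort and token consumption described above come from.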

What does a Claude Code skill do instead of a subagent?

A Claude Code skill is a SKILL.md file that instructs a single Claude instance to follow a sequence of steps. The skill runs within the active session's context. No new instance is spawned. No coordination overhead is incurred. It can read files, call tools, and produce structured output entirely within a single context window.
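For orientation, a minimal SKILL.md sketch follows. The frontmatter fields match Anthropic's published skill format; the skill name, file path, and steps are invented for illustration:

```markdown
---
name: changelog-writer
description: Drafts a changelog entry from a merged diff. Use when the user asks for release notes.
---

# Changelog writer

1. Read the diff the user points to.
2. Load reference/style-guide.md for tone and format rules.
3. Draft the entry; if the diff touches a public API, add a migration note.
4. Return the entry as markdown under an "Unreleased" heading.
```

Every step runs in the active session's context, which is the point: no spawn, no handoff, no re-packaging.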

Skills handle the majority of complex workflows without any multi-agent architecture:

  • Sequential multi-step operations: A skill can read files, call tools, apply transformations, and produce structured output in sequence, all within one context window.
  • Reference file loading: A skill's process steps explicitly load domain knowledge at the right point in the workflow. No inter-agent handoff is required.
  • Conditional branching: A skill can include decision points that change step execution based on input conditions.

The constraint is parallelism. A single skill instance executes one step at a time. If your workflow contains five independent tasks that each take 30 seconds, running them sequentially in a skill takes 2.5 minutes. Running them in parallel subagents takes 30 seconds plus coordination overhead. That is the narrow case where subagents earn their cost.
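The arithmetic maps directly onto code. This sketch contrasts the two dispatch modes over the public API with asyncio; the task list and model ID are illustrative assumptions:

```python
import asyncio
from anthropic import AsyncAnthropic

client = AsyncAnthropic()

async def run_task(task: str) -> str:
    response = await client.messages.create(
        model="claude-sonnet-4-20250514",  # assumption: substitute any current model
        max_tokens=512,
        messages=[{"role": "user", "content": task}],
    )
    return response.content[0].text

async def main() -> None:
    tasks = [f"Summarize file {i}" for i in range(5)]  # five independent tasks

    # Sequential (skill-style): wall-clock time is the sum of five calls.
    # results = [await run_task(t) for t in tasks]

    # Parallel (subagent-style): wall-clock time is roughly the slowest
    # single call, plus the coordination overhead discussed above.
    results = await asyncio.gather(*(run_task(t) for t in tasks))
    print(f"{len(results)} results")

asyncio.run(main())
```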

See When Does a Workflow Need Multiple Agents vs a Single Skill? for the full decision framework.

What are the real costs of using subagents?

Subagents impose three compounding costs: a 4.6x token multiplier on every task, a separate API round-trip per spawn, and coordination engineering that grows with workflow complexity. These costs are real and they accumulate fast. Weigh each one against a concrete benefit before choosing a multi-agent architecture over a skill.

Token overhead: Multi-agent systems incur a 4.6x token multiplier compared to single-agent approaches for the same task (source: Anthropic research synthesis on agent architectures, 2024). Each subagent receives a full self-contained prompt, processes it in its own context, and returns a result that the parent must process. That accumulation is not a rounding error.

Latency: Each subagent spawn is a separate API call with its own network round-trip, context loading, and generation time. A workflow with three sequential subagent calls is at minimum three times slower in wall-clock time than the same workflow in a single skill. Research on multi-agent GenAI systems found that orchestration alone adds 50-200ms of coordination overhead per spawn, before any generation begins (source: Srivastava, "Understanding Latency in Multi-Agent GenAI Systems," Medium, 2024).

Coordination complexity: The parent agent must handle subagent results correctly: checking for errors, combining outputs, resolving conflicts between independent results. Every handoff is a failure point. In our builds at AEM, the most common failure mode in multi-agent commissions is not the subagent failing. It is the parent mishandling a valid subagent result (source: AEM internal, 2026).

Context isolation tax: Subagents need self-contained prompts. Every piece of context the parent has accumulated must be explicitly re-packaged for each subagent call. This is engineering overhead that grows with workflow complexity.

"Models placed in the middle of long contexts lose track of instructions at a rate that makes mid-context policy placement unreliable for production systems." - Nelson Liu et al., Stanford NLP Group, "Lost in the Middle" (2023, ArXiv 2307.03172)

The silver lining of subagent isolation: each subagent starts with a clean, focused context. For tasks where the parent's accumulated context would introduce noise, starting fresh is genuinely valuable. But that benefit only compensates for the overhead in specific situations. A 2025 Stanford study found that single-agent systems match or outperform multi-agent systems on multi-hop reasoning tasks when the token budget is held equal across both architectures (source: Tran and Kiela, "Single-Agent LLMs Outperform Multi-Agent Systems on Multi-Hop Reasoning Under Equal Thinking Token Budgets," Stanford University, arXiv 2604.02460, 2025).

When does a subagent make sense over a skill?

Four conditions justify a subagent over a skill: tasks that can run in parallel, tasks that must not inherit the parent's context, workflows that need scoped tool permissions, and independent verification passes. Underlying all four is the same question of cost: pay the 4.6x token overhead only when one of these conditions is clearly met.

  1. Parallel independent tasks. You have more than two genuinely independent tasks (no result depends on another), and the latency saving from parallelism exceeds the coordination overhead. This is the canonical case. Indexing 50 files simultaneously is faster with parallel subagents than sequentially in a skill. Running 3 steps that depend on each other is not.

  2. Isolation requirements. The task must not inherit any context from the parent session. An independent code reviewer must not know what the author was thinking. A verification pass must start from the output, not the generation process. Context isolation is a requirement, not just a preference.

  3. Different tool permissions. Subagents can be granted a restricted subset of tools. A parent with broad permissions can spawn a subagent with read-only access for a verification step. Skills cannot scope tool access mid-workflow.

  4. LLM-as-judge patterns. You need a second, independent opinion on a first-pass output. The judge subagent must not have seen the generation process. This requires a clean context that a skill cannot provide within the same session. Note that verification subagents carry a real token cost: in agentic software engineering tasks, the code review phase alone accounts for an average of 59.4% of all token consumption across a workflow (source: Weyssow et al., "Tokenomics: Quantifying Where Tokens Are Used in Agentic Software Engineering," arXiv 2601.14470, 2025).
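A minimal sketch of the judge pattern from point 4 follows. It also shows point 2 by construction: the judge call is a fresh context that receives only the artifact and the criteria, never the conversation that produced them, and is granted no tools. The prompt wording and model ID are illustrative assumptions:

```python
from anthropic import Anthropic

client = Anthropic()

def judge(artifact: str, criteria: str) -> str:
    """Independent verification pass: sees the output, not its history."""
    response = client.messages.create(
        model="claude-sonnet-4-20250514",  # assumption: substitute any current model
        max_tokens=1024,
        # No tools are passed and no parent history is included: the judge
        # can only evaluate what is placed in front of it.
        system="You are an independent reviewer. Judge only what is shown.",
        messages=[{
            "role": "user",
            "content": f"Criteria:\n{criteria}\n\nArtifact to review:\n{artifact}",
        }],
    )
    return response.content[0].text
```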

When is a skill clearly better than a subagent?

A skill is the right choice when the workflow is sequential, the task fits inside a single context window, and parallelism would not relieve the actual bottleneck. The majority of workflows that come through AEM commissions fit this profile. The conditions below are where a skill wins clearly, without paying the coordination tax:

  • The workflow is sequential and each step depends on the previous result.
  • The full workflow takes less than 90 seconds in a single session (the parallelism benefit rarely exceeds coordination overhead at this scale).
  • The task is run by a single user and the primary goal is consistency of output, not speed.
  • You are building for the first time and have not yet confirmed that parallelism is the actual bottleneck.

The 4.6x token multiplier is not a feature. It is the price of coordination. Pay it only when the coordination produces a proportionate benefit: faster output, true isolation, or independent verification that a single session cannot provide. At current Claude Sonnet API rates of $3.00 per million input tokens and $15.00 per million output tokens (source: Anthropic API pricing, 2025), that 4.6x multiplier translates into a cost increase you can calculate before committing to a multi-agent architecture.
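As a back-of-envelope example, assume a single-session run of 40,000 input and 8,000 output tokens. The token counts are invented for illustration; the rates and the multiplier are the figures quoted above:

```python
INPUT_RATE = 3.00 / 1_000_000    # USD per input token (Sonnet)
OUTPUT_RATE = 15.00 / 1_000_000  # USD per output token (Sonnet)

skill_cost = 40_000 * INPUT_RATE + 8_000 * OUTPUT_RATE  # single-session run
multi_cost = 4.6 * skill_cost                           # multi-agent equivalent

print(f"skill: ${skill_cost:.2f}  multi-agent: ${multi_cost:.2f}")
# skill: $0.24  multi-agent: $1.10
```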

The limit of skills at scale: For workflows involving hundreds of parallel file operations, real-time streaming from multiple sources, or dynamic agent networks where the number of subagents is not known at design time, a skill's sequential execution model is not adequate. These are genuine multi-agent use cases. They are also a small fraction of the workflows teams build in practice.

See Can I Use Skills and Agents Together? for the hybrid pattern that covers most real use cases.

What does a typical decision look like in practice?

Before recommending a subagent architecture, AEM maps the workflow into steps, identifies which are genuinely independent, estimates the latency saving from parallelism, and multiplies the token cost by 4.6x. If the latency saving does not clearly exceed the multiplied cost, a skill is the right choice. The four steps below are the test:

  1. Map the workflow into steps.
  2. Identify which steps are genuinely independent (not just could-run-in-parallel, but produce no result another step needs).
  3. Estimate the latency saving from parallelism.
  4. Multiply the token cost by 4.6x and check whether the latency saving justifies it.
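The test reduces to one comparison, written out below as a sketch. Every input is an estimate you supply; the function name and the dollar value placed on saved latency are illustrative assumptions:

```python
def subagents_worth_it(
    seq_seconds: float,        # estimated sequential wall-clock time
    par_seconds: float,        # estimated parallel wall-clock time
    skill_tokens: int,         # token estimate for the single-skill run
    usd_per_saved_second: float,              # what latency is worth to you
    usd_per_token: float = 3.00 / 1_000_000,  # input-rate shorthand
) -> bool:
    """Step 4 of the test: does the latency saving justify 4.6x the tokens?"""
    latency_value = (seq_seconds - par_seconds) * usd_per_saved_second
    extra_token_cost = (4.6 - 1.0) * skill_tokens * usd_per_token
    return latency_value > extra_token_cost
```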

In 80% of the workflows we audit, the bottleneck is not sequential execution. It is an underspecified output contract or a missing reference file. Fixing the specification produces more improvement than adding subagents, at zero additional token cost (source: AEM commission audits, 2026).

See What's the Difference Between a Claude Code Skill and an Agent? for a broader comparison of the two architectures.

Frequently Asked Questions

For most workflows, the right choice is a skill: sequential, low token overhead, and no coordination engineering required. Subagents are worth their cost in four specific cases: genuine parallelism, required context isolation, scoped tool permissions, and LLM-as-judge verification. The questions below cover edge cases and implementation details.

Can a skill spawn a subagent? Yes. A SKILL.md process step can instruct the parent Claude instance to use the Agent tool, which spawns a subagent. This creates a hybrid where the skill handles the structured sequence and delegates specific tasks to subagents when isolation or parallelism is needed.

Does using a subagent always cost more tokens? Yes, by design. The minimum overhead is the self-contained prompt for the subagent plus the result returned to the parent. For long-running independent tasks, this overhead is small relative to the task cost. For short tasks, the overhead can exceed the task cost itself.

How many subagents is too many? There is no fixed limit, but coordination complexity grows non-linearly with subagent count. Two or three subagents are manageable. Ten or more require careful orchestration to avoid result conflicts and error accumulation. If you are designing a workflow with ten or more subagents, evaluate whether a pipeline architecture with smaller, independent runs is better suited.

Can I run subagents with different models? Yes. The parent can spawn subagents with different model specifications. A common pattern is using Opus for the orchestrating parent and Haiku for high-volume parallelizable subagents where the task is simpler. This reduces cost without sacrificing orchestration quality.

What happens if a subagent fails? The parent receives an error result from the failed subagent. The parent's process steps must handle this case explicitly. If the parent has no error handling for subagent failures, the workflow stops or produces incorrect output silently. This is the most common multi-agent failure mode in production builds.
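A minimal sketch of that explicit handling, assuming parallel spawns issued with asyncio and a run_task coroutine like the one in the parallelism sketch earlier; return_exceptions=True stops one failure from aborting collection of the rest:

```python
import asyncio
from typing import Awaitable, Callable

async def run_all(
    tasks: list[str],
    run_task: Callable[[str], Awaitable[str]],  # e.g. the coroutine sketched earlier
) -> list[str]:
    results = await asyncio.gather(
        *(run_task(t) for t in tasks), return_exceptions=True
    )
    completed: list[str] = []
    for task, result in zip(tasks, results):
        if isinstance(result, BaseException):
            # Handle the failure explicitly: retry, degrade, or abort loudly.
            raise RuntimeError(f"subagent failed on {task!r}") from result
        completed.append(result)
    return completed
```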


Last updated: 2026-05-05