---
title: "When Does a Workflow Need Multiple Agents vs a Single Skill?"
description: "The three conditions that justify multi-agent architecture, the 4.6x token overhead you pay for it, and the decision framework for choosing between them."
pubDate: "2026-04-13"
category: skills
tags: ["claude-code-skills", "multi-agent", "skill-architecture", "token-economics"]
cluster: 2
cluster_name: "Skills vs Agents vs Prompts"
difficulty: intermediate
source_question: "When does a workflow need multiple agents vs a single skill?"
source_ref: "2.Intermediate.5"
word_count: 1560
status: draft
reviewed: false
schema_types: ["Article", "FAQPage"]
---

When Does a Workflow Need Multiple Agents vs a Single Skill?

Quick answer: A multi-agent system is justified when sub-tasks are genuinely independent, run in parallel, and require tool-dependent branching that cannot be pre-specified. A single skill is sufficient when the workflow path is deterministic before execution starts. The documented overhead for multi-agent coordination is 4.6x compared to single-agent equivalents; spend it deliberately, not by default.

At AEM, we track this threshold across our skill library: fewer than 5% of production workflows required multi-agent architecture once conditional skill logic was fully explored.


What Is the Core Distinction?

A single skill runs a fully specified process from start to finish: the path is the same every time, inputs change but the process does not. A multi-agent system routes work across specialized agents whose path is not fully specified before execution — intermediate results determine what happens next. The core test is whether the workflow path is deterministic before execution starts. This single question correctly classifies the majority of developer workflows.

A single skill is the right design for most cases: the inputs change, the process does not. A multi-agent system, by contrast, routes work across agents that run in parallel or in sequence, with each agent making decisions based on its own tool access and output.

The core distinction: is the workflow path deterministic before execution starts?

If yes, build a skill. Every branch can be enumerated in conditional logic within a single SKILL.md file, and the skill handles all input variations.

If no, and the non-determinism comes from external tool calls whose output cannot be predicted, evaluate whether a multi-agent architecture is justified.
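What "every branch enumerated in conditional logic" looks like in practice is a single SKILL.md whose steps name their branches up front. A minimal sketch, assuming the standard name/description frontmatter; the skill name and branch details are illustrative:

```markdown
---
name: review-pr
description: Review a pull request, adapting depth to what the diff contains.
---

1. Read the diff.
2. If the diff touches files under `migrations/`, run the schema checklist first.
3. If the test suite output contains failures, prioritize those in the findings.
4. Otherwise, focus on code quality and maintainability.
5. Write the review summary.
```

Every branch is labeled before execution. Inputs vary; the process does not.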


What Factors Justify Multiple Agents?

Three conditions must all be true before multi-agent architecture is justified: the sub-tasks are genuinely independent and can run in parallel; at least one agent must make a decision based on unpredictable tool output that cannot be pre-encoded as a conditional branch; and the speed gain from parallelization is proportional to the 4.6x token overhead it costs (Source: Anthropic internal benchmarking, 2026).

Three conditions need to be true simultaneously:

1. Genuine parallelism. The sub-tasks are independent: they do not share state, do not depend on each other's output, and can complete in any order. Running them in parallel produces the same result as running them sequentially, but faster.

A research workflow querying three independent databases fits this criterion. A code review workflow that checks security, performance, and style separately does not: each check produces output that should inform the final synthesis, but the synthesis is still deterministic.

2. Tool-dependent branching. At least one agent needs to make a decision based on what an external tool returns, and that decision cannot be pre-encoded as a conditional branch in a skill.

"If the web search returns a 404, try the alternative source" is conditional logic that belongs in a skill. "Evaluate the 15 results from parallel research queries and determine which 3 are most relevant to the commission brief" is a judgment call that depends on content Claude cannot see until runtime.

3. Proportional performance gain. The speed gain from parallelization offsets the overhead. Multi-agent architectures carry a documented 4.6x token overhead from orchestrator invocations, context passing between agents, and redundant model calls (Source: Anthropic internal benchmarking, 2026). Parallel execution can reduce wall-clock time by up to 60% for genuinely independent sub-tasks, but only when those tasks can execute without shared state (Source: Anthropic internal benchmarking, 2026). If running three research queries in parallel reduces wall-clock time from 45 seconds to 15 seconds, the parallelization is justified for time-sensitive workflows.

If the workflow is not time-sensitive and the sub-tasks take 5 seconds each, the 4.6x token overhead is not worth paying for a 10-second improvement.
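The tradeoff in conditions 1 and 3 can be sketched as back-of-envelope arithmetic. The 4.6x multiplier and the timings are the article's figures; the function itself is illustrative, not an API:

```python
# Back-of-envelope check of whether parallelization pays for the
# 4.6x token overhead of multi-agent coordination.

def multi_agent_tradeoff(skill_tokens, sequential_s, parallel_s,
                         overhead=4.6):
    """Return (extra_tokens, seconds_saved) for going multi-agent."""
    extra_tokens = round(skill_tokens * (overhead - 1))
    seconds_saved = sequential_s - parallel_s
    return extra_tokens, seconds_saved

# Three 15 s research queries: 45 s sequential vs 15 s parallel.
print(multi_agent_tradeoff(2_000, 45, 15))   # (7200, 30)
# Three 5 s sub-tasks: 10 s saved rarely justifies 7,200 extra tokens.
print(multi_agent_tradeoff(2_000, 15, 5))    # (7200, 10)
```

The same token cost buys a 30-second saving in the first case and a 10-second saving in the second; only the first is proportional for a time-sensitive workflow.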


What Factors Indicate a Single Skill Is Enough?

A single skill is sufficient when the workflow path can be fully specified before execution: every branch is enumerable, sub-tasks run sequentially rather than in parallel, and any tool calls are read-only or produce outputs that feed into a deterministic next step. Most developer workflows — code review, documentation, release notes — meet these criteria and do not need multi-agent architecture.

Multi-agent systems are the right answer to a narrow set of questions. They are also the default answer to an endless number of questions they have no business answering.

A single skill is sufficient when:

  • The path is fully specifiable. Draw the workflow as a flowchart. If every branch can be labeled before execution, the skill can encode it.
  • Sub-tasks are sequential, not parallel. Each step depends on the previous step's output. Parallelization is not possible, so multi-agent architecture adds overhead without benefit.
  • The workflow is deterministic given the inputs. These workflow types have defined inputs and defined output contracts — the process is the same every time:
    • Code review
    • Commit message generation
    • PR description writing
    • Documentation generation
    • Test generation
  • No external tool calls are required, or the tool calls are read-only and do not branch. Calling a linter and including its output in a report is not a branching decision. It is a tool call followed by formatting. A skill handles this correctly.
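The linter case from the last bullet, sketched as a hypothetical SKILL.md (the skill name and lint command are illustrative):

```markdown
---
name: lint-report
description: Run the linter and fold its findings into the review report.
---

1. Run the project's lint command and capture the output.
2. Summarize each warning in one line, grouped by file.
3. Append the summary to the review report under "Lint findings".
```

A tool call followed by formatting: no branching decision, so no agent required.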

The classification for common workflows:

| Workflow | Architecture |
| --- | --- |
| Commit message generation | Single skill |
| Code review with linter integration | Single skill with tool call |
| PR description writing | Single skill |
| Parallel market research from 5 sources | Multi-agent |
| Release notes compilation | Single skill |
| Monitoring pipeline with independent sensors | Multi-agent |
| Security + performance + style review | Single skill (sequential phases) |
| Adaptive content research with source evaluation | Multi-agent |

How Does Token Overhead Change the Analysis?

The 4.6x overhead for multi-agent coordination is not hypothetical: it comes from orchestrator invocations, context passing between agents, and redundant model reasoning. A task that costs 2,000 tokens as a skill costs approximately 9,200 tokens as a multi-agent system. At 50 runs per day, that gap is 360,000 extra tokens daily — for a single workflow. The overhead is only justified when parallelization produces a proportional, measurable time saving.

It comes from three sources:

  1. Orchestrator invocations. The orchestrator model reads sub-agent outputs, reasons about them, and directs next steps. This happens on every coordination step, not just at the start.

  2. Context passing between agents. Each sub-agent receives context from the orchestrator. That context is passed as tokens. For three parallel sub-agents, the context is transmitted three times, not once.

  3. Redundant model reasoning. Sub-agents perform reasoning steps the orchestrator also performs. The work is partially duplicated.

For a single task that takes 2,000 tokens as a skill, the equivalent multi-agent implementation costs roughly 9,200 tokens — a 360% increase. At 50 runs per day, that difference is 360,000 tokens per day, just for one workflow. Context passing between agents accounts for approximately 40% of the total overhead in three-agent configurations (Source: Anthropic internal benchmarking, 2026).
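The arithmetic, spelled out with the article's figures:

```python
# Token gap per run and per day for one workflow, using the
# article's figures: 2,000 tokens as a skill, 4.6x multiplier,
# 50 runs per day.
skill_tokens = 2_000
multi_agent_tokens = round(skill_tokens * 4.6)   # 9,200 per run
per_run_gap = multi_agent_tokens - skill_tokens  # 7,200 per run
daily_gap = per_run_gap * 50                     # 360,000 per day
print(multi_agent_tokens, per_run_gap, daily_gap)
```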

The overhead is justified when it buys a proportional gain in one of these areas:

  • Faster wall-clock time for time-sensitive workflows
  • Specialized tool access that a single agent cannot have
  • Parallelization that genuinely reduces total processing time

Sequential workflows, deterministic paths, and time-insensitive tasks do not clear that bar.


What Does This Decision Look Like in Practice?

For any workflow, apply a five-step decision process before committing to an architecture: write down every step, identify which steps depend on unpredictable tool output, test whether conditional logic covers the non-determinism, evaluate whether any steps are genuinely independent and parallelizable, then choose between a single skill, a skill with conditionals, or a multi-agent system. Most workflows resolve at step two or three.

Work through these steps for any workflow before committing to an architecture:

  1. Describe the workflow path. Write down every step. Can you write down every step before executing? If yes, build a skill. If no, continue.

  2. Identify the non-deterministic steps. Which steps depend on tool output you cannot predict? Are those steps genuinely unspecifiable, or are they conditional branches you have not yet written down?

  3. Test conditional logic. Write the conditional branches explicitly. "If the search returns fewer than 3 relevant results, expand the query to include adjacent topics." Does this cover the non-determinism? If yes, build a skill with conditionals. If the branching is unbounded, continue.

  4. Evaluate parallelism. Are any steps genuinely independent? Would parallel execution provide a performance benefit worth the 4.6x overhead?

  5. Decision. Single skill, skill with conditionals, or multi-agent system based on the answers.

Most workflows resolve at step 2 or 3: the conditional logic is writeable, and the skill handles all the variation. In practice, the majority of workflows initially classified as multi-agent candidates reduce to conditional skills once the branching logic is explicitly written out. Addy Osmani, Engineering Lead at Google, notes that agent teams carry significantly higher token cost and debugging burden than well-designed single skills (addyosmani.com/blog/claude-code-agent-teams/, 2026).
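The five steps above can be sketched as a decision function. The boolean arguments map to steps 1 through 4; the names are illustrative, not an API:

```python
# The article's five-step decision process as a sketch.

def choose_architecture(path_fully_specifiable,
                        branching_enumerable,
                        genuinely_parallel,
                        overhead_justified):
    if path_fully_specifiable:                     # step 1
        return "single skill"
    if branching_enumerable:                       # steps 2-3
        return "skill with conditionals"
    if genuinely_parallel and overhead_justified:  # step 4
        return "multi-agent system"
    # Unbounded branching but no parallel payoff: write the branches
    # out explicitly before reaching for agents.
    return "skill with conditionals"

print(choose_architecture(False, True, False, False))
# prints "skill with conditionals"
```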

For the complete framework across skills, agents, prompts, and CLAUDE.md, see Claude Code Skills vs Agents vs Prompts: When to Use Which.

For the specific distinction between skills and agents, see What's the Difference Between a Claude Code Skill and an Agent?.

"Agent teams are actual collaboration — teammates share findings, challenge each other's approaches, and coordinate independently. The tradeoff is token cost: each teammate is a separate Claude instance." — Addy Osmani, Engineering Lead, Google (2026, https://addyosmani.com/blog/claude-code-agent-teams/)


FAQ

When should I use a subagent instead of a skill?

Use a subagent when a sub-task is genuinely independent, runs in parallel with other sub-tasks, and requires tool access or adaptive branching that a skill cannot pre-specify; use a skill when the sub-task has a defined path, even a complex one with conditional branches. In practice, the majority of sub-tasks that feel agent-like resolve correctly as conditional skills once the branching logic is written out explicitly.

Most sub-tasks that feel like they need agents are sequences that skills handle correctly.

What's the economic tradeoff between a single skill and a multi-agent system?

The documented overhead for multi-agent coordination is 4.6x tokens compared to a single-agent equivalent, meaning a 1,000-token skill workflow costs approximately 4,600 tokens as a multi-agent system; at 50 daily runs, that gap is 180,000 extra tokens per day for one workflow, and it is only justified by proportional parallelization gains for time-sensitive tasks (Source: Anthropic internal benchmarking, 2026).

The documented overhead for multi-agent coordination is 4.6x compared to a single-agent equivalent (Source: Anthropic internal benchmarking, 2026). A workflow costing 1,000 tokens as a skill costs approximately 4,600 tokens as a multi-agent system. The tradeoff is justified when parallelization buys proportional time savings for time-sensitive workflows or when sub-tasks require specialized tool access. It is not justified when the workflow is sequential or time-insensitive.

When should I use MCP tools vs encoding logic in a skill?

Use MCP tools when the task requires capabilities Claude Code does not have natively: browser access, database queries, external API calls, file system operations beyond the project. Encode logic in the skill when Claude can execute the work with its built-in tools. MCP tools extend what Claude can access. Skill logic defines how Claude uses those capabilities. A skill that calls an MCP tool and processes the result is not a multi-agent system, it is a skill with tool access.

Can a skill handle complex conditional workflows without becoming an agent?

Yes. A skill can contain detailed conditional logic: "If the test suite output contains any failures, prioritize those in the review. If all tests pass, focus on code quality and maintainability." These branches are defined before execution and do not require external tool calls to determine next steps. Conditional skills are underused. Most workflows that people escalate to multi-agent systems resolve correctly as conditional skills.

At what workflow complexity should I consider multi-agent architecture?

Complexity is not the threshold. Tool-dependent branching and genuine parallelism are the threshold. A complex 20-step sequential workflow with conditional logic is still a skill. A simpler 3-step workflow that requires parallel independent web searches and synthesizes adaptive results based on content is a multi-agent candidate. Count the branching decisions that depend on unpredictable external output, not the total number of steps.


Last updated: 2026-04-13