title: "How Do I Write Step-by-Step Instructions for a Claude Code Skill?"
description: "How to write Claude Code skill steps that run reliably: number them, name the tools, handle failures, avoid bundled steps, and use approval gates."
pubDate: "2026-04-13"
category: skills
tags: ["claude-code-skills", "process-steps", "beginner", "skill-anatomy"]
cluster: 8
cluster_name: "Process Steps & Methodology Body"
difficulty: beginner
source_question: "How do I write step-by-step instructions for a Claude Code skill?"
source_ref: "8.Beginner.1"
word_count: 1510
status: draft
reviewed: false
schema_types: ["Article", "FAQPage"]

How Do I Write Step-by-Step Instructions for a Claude Code Skill?

Quick answer: Number every step. Name the tool Claude should use at each step. Handle the most common failure case within the step itself. Keep each step to one action. These four rules produce steps that Claude follows consistently across users, sessions, and model versions — without instructions that drift or break on the third run. These patterns come from AEM (Agent Engineering Master), a skill-as-a-service platform for Claude Code skill production.


What makes a process step work reliably vs fail silently?

A process step works reliably when it is numbered, names a specific tool, and handles its most common failure case explicitly. Steps that omit any of these three elements fail silently — Claude produces plausible-looking output that is adjacent to what you intended rather than exactly what the step specified. The fix is always in the step definition, not in Claude's behavior.

A process step fails silently when Claude executes something adjacent to what you meant without you noticing. The output looks plausible. The step ran. But the result is not what the step was designed to produce.

Three requirements for a step that works consistently:

1. It is numbered. Claude Code treats numbered lists as sequential instructions and unordered lists as optional suggestions. This is not a nuance — it is the difference between a step Claude follows as an obligation and one it treats as a guideline. Number every step. In practice, the majority of step-skipping failures in production skills trace back to unordered lists — Claude treating bullet points as optional rather than sequential.

2. It names the tool. "Search the codebase" is an instruction Claude interprets by choosing a tool. "Use the Grep tool to search the codebase" removes that choice. Named tools produce stable output. Tool-choice freedom produces variation across sessions, across models, and across team members (AEM production data, 2026).

3. It handles the common failure case. Every step has at least one predictable failure mode. A file that does not exist. A search that returns no results. An API call that times out. A step without failure handling forces Claude to improvise. Improvised error handling is the primary source of skill inconsistency in production environments.
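The difference shows up fastest side by side. Here is the same hypothetical instruction written both ways (the file name and field name are illustrative, not from a real skill):

A step list that will drift:

  • Check the config file
  • Update the timeout value

The same work as reliable steps:

Step 1: Use the Read tool to load config/settings.json.
  If the file does not exist, tell the user "config/settings.json not found" and stop.
Step 2: Use the Edit tool to change the "timeoutMs" value in config/settings.json to 30000.
  If the field is missing, ask the user before adding it.

The second version is longer, but every decision the first version leaves to Claude is made on the page.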

"Probably the most important thing to get great results out of Claude Code: give Claude a way to verify its work. If Claude has that feedback loop, it will 2-3x the quality of the final result." — Boris Cherny, Creator of Claude Code, Anthropic (January 2026, https://x.com/bcherny/status/2007179861115511237)

If a new team member followed your step list literally and produced something wrong, the step was the problem, not the team member. Write steps that survive literal interpretation.


How do I write a complete, production-quality step?

A production-quality step contains four components: an action verb that tells Claude what to do, a named tool that tells Claude how to do it, an input specification that removes ambiguity about scope, and a failure branch that handles the most common error case. All four are required. A step missing any component delegates a decision to Claude that the step should make itself.

A complete step has four components:

  • The action verb
  • The tool name
  • The input specification
  • A failure branch

Here is a step with all four components in place:

Step 3: Use the Grep tool to search for all TODO comments in the codebase.
  Search pattern: "TODO|FIXME|HACK"
  Include files: **/*.ts, **/*.js
  If no results are found, output "No TODO items found" and stop. Do not continue to Step 4.

That step:

  • Names the tool (Grep)
  • Specifies the search pattern precisely
  • Scopes the file types
  • Handles the empty result case with an explicit stop condition

What is NOT in that step: reasoning about whether to search, choice of alternative tools, open-ended interpretation of "TODO comments." Those decisions are made in the step. Claude executes them.

For comparison, a weak version of the same step:

Step 3: Look for TODO comments in the code and see what needs to be done.

That step relies on Claude to choose a tool, decide what "TODO comments" means, determine the scope, and handle the empty case. Four decisions the step should make for Claude. The result varies across every run.


How do I handle conditional logic in steps?

Conditional logic belongs inside the step as an explicit branch — not as a separate reasoning step and not as something implied by context. Write "If X, do Y. If not X, do Z." for binary conditions. For multi-branch conditionals, enumerate every case and make each branch exhaustive so Claude cannot reach a no-match state where none of the conditions apply.

Conditional logic belongs inside the step as an explicit branch. Not as a separate reasoning step. Not as something implied by context.

Pattern for simple conditionals:

Step 4: Check if the tests directory exists using the Glob tool (pattern: "tests/**").
  If the directory exists: continue to Step 5.
  If the directory does not exist: tell the user "No tests directory found. Create tests/ first." and stop.

Pattern for multi-branch conditionals:

Step 2: Read the package.json file using the Read tool.
  If the file contains a "scripts.test" field: proceed to Step 3 using that script.
  If the file contains a "scripts.spec" field: proceed to Step 3 using that script instead.
  If neither field exists: ask the user which test command to run before continuing.

Two rules for conditional logic in steps:

Keep conditions binary where possible. "If X, do Y. Otherwise, do Z." Three-way branches need careful wording — make each branch exhaustive so Claude does not reach a case where none of the conditions match.

Make the stop condition explicit. "Stop" means stop. Do not rely on Claude to infer that the process is complete. A step that ends with "output the error message" without saying "and stop" sometimes produces the error message and then continues anyway. Most cascading skill failures in production trace back to missing stop conditions on failure branches.
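Applied to a failure branch, the fix is one clause (the build example is illustrative):

Without an explicit stop:

  If the build fails: output the error message.

With an explicit stop:

  If the build fails: output the error message and stop. Do not continue to Step 6.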


How do I write steps that run in parallel?

Parallel steps in a Claude Code skill require an explicit parallel label — "Steps 3 and 4 can execute in parallel" — and a synchronization marker on the first dependent step. Without both labels, Claude treats all steps as sequential. This approach reduces token round-trips for skills that gather independent information before a synthesis step, and produces measurable speed gains in production use.

For steps that can execute concurrently, state it explicitly:

Steps 3 and 4 can execute in parallel:

Step 3: Use the Read tool to load CHANGELOG.md.
Step 4: Use the Bash tool to run `git log --oneline -20` and capture the output.

Step 5: (Waits for Steps 3 and 4 to complete) Compare the changelog entries against the git log output.

Label the dependency explicitly. "Waits for Steps 3 and 4 to complete" tells Claude that Step 5 is a sequencing boundary, not another parallel step. Without that label, Claude interprets the structure ambiguously.

Parallel steps reduce token round-trips for skills that perform independent information gathering before synthesis. A skill that reads 3 files and then combines them runs faster with Steps 1, 2, and 3 in parallel than with sequential execution (AEM performance testing, 2026).


How do I write a step that waits for user approval?

An approval gate step presents the pending action to the user, asks for explicit "yes/no" confirmation, and handles three response branches: confirm, cancel, and ambiguous. All three branches must be covered within the step. There is no path through a correctly written gate where Claude takes the irreversible action without an explicit confirmation from the user.

User-approval gates prevent skills from taking irreversible actions without confirmation. The pattern:

Step 4: Present the user with the list of files to be deleted. Ask: "Confirm deletion of these N files? (yes/no)."
  If the user responds "yes" or confirms: proceed to Step 5.
  If the user responds "no" or cancels: stop immediately. Output "Deletion cancelled. No files were changed."
  If the user responds with anything else: re-ask the confirmation question once before stopping.

The gate has three branches: confirm, cancel, and ambiguous response. All three are handled. There is no path through this step where Claude takes the irreversible action without an explicit "yes."

Use approval gates for steps that perform any of the following:

  • Delete files
  • Modify database records
  • Send messages or emails
  • Push to remote repositories
  • Call external APIs that charge per request

Any step where the cost of an unintended run exceeds the cost of one extra round-trip gets a gate.


What are the most common step-writing mistakes?

The four most frequent step failures are: bundled steps that pack multiple actions into one instruction, assumed context that references files or data Claude has not been given, missing tool names that leave tool choice to Claude's discretion, and absent stop conditions on failure branches that allow errors to cascade through subsequent steps. Each mistake has a direct fix at the step level.

The four most frequent step failures across AEM skill engineering work (AEM production data, 2026):

1. Bundled steps. "Analyze the code and fix the bugs and update the tests" is three steps. Write three steps. Bundled steps produce partial execution — Claude completes the first sub-action and moves on before finishing the bundle. In practice, bundled steps are among the most common sources of production skill failures.

2. Assumed context. "Review the requirements document" — where is it? A step that refers to context Claude does not have results in Claude searching for it, guessing at it, or asking the user at an unexpected moment. Name the file path or add a prior step that retrieves the context.

3. Missing tool names. Covered above, but worth repeating: steps that leave the tool choice open produce more run-to-run variation than any other single mistake. Name the tool.

4. No stop condition on failure. A step that fails without a stop condition allows Claude to continue to the next step with incorrect or empty input. The next step fails too. And the one after it. The user receives output that appears complete but is built on a cascade of empty or incorrect intermediate results. Add a stop condition to every failure branch.
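The first, second, and fourth mistakes share one fix: split the bundle, name the context, and stop on failure. The bundled example from mistake 1, rewritten (paths and commands are illustrative):

Step 1: Use the Grep tool to search for "FIXME" in src/**/*.ts.
  If no results are found, output "Nothing to fix" and stop.
Step 2: Use the Edit tool to fix each match from Step 1, one file at a time.
Step 3: Use the Bash tool to run `npm test`.
  If any test fails, report the failing test names and stop. Do not mark the skill complete.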

What step-by-step instructions do not solve: Step-by-step instructions work well for deterministic workflows with predictable inputs and known tool paths. They do not solve skills that require open-ended reasoning, subjective judgment, or dynamic goal-setting — those require output contracts and rules sections, not process steps alone. If your skill's logic cannot be expressed as a numbered sequence of bounded actions, step writing is not the right layer to fix.
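None of these checks require running the skill. A short script can flag the mechanical mistakes before the first run. This is a minimal sketch, assuming steps are written as "Step N:" lines and that tool references use the "X tool" phrasing shown in the examples above:

```python
import re

# Tools this sketch recognizes; extend the set for your own skills.
KNOWN_TOOLS = {"Read", "Write", "Edit", "Grep", "Glob", "Bash", "Task", "WebFetch"}

def lint_steps(skill_text: str) -> list[str]:
    """Flag steps with no named tool, and failure branches ("If ...")
    that never say stop, continue, or proceed."""
    warnings = []
    # Split into "Step N:" blocks; the lookahead keeps each header with its body.
    for block in re.split(r"(?m)^(?=Step \d+:)", skill_text):
        header = re.match(r"Step (\d+):", block)
        if not header:
            continue
        n = header.group(1)
        if not any(f"{tool} tool" in block for tool in KNOWN_TOOLS):
            warnings.append(f"Step {n}: no tool named")
        for line in block.splitlines():
            branch = line.strip()
            if branch.startswith("If ") and not re.search(r"\b(stop|continue|proceed)\b", branch, re.IGNORECASE):
                warnings.append(f"Step {n}: failure branch without an explicit stop or continue")
    return warnings

steps = """Step 1: Use the Read tool to load package.json.
  If the file does not exist: tell the user and stop.
Step 2: Search the codebase for TODO comments.
"""
print(lint_steps(steps))  # → ['Step 2: no tool named']
```

A linter like this catches missing tool names and missing stop conditions; it cannot catch bundled actions or assumed context, which still need a human read-through.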

For the complete skill building guide covering all four checkpoints and the full production bar, see The Complete Guide to Building Claude Code Skills in 2026.

For how the process steps fit within the full SKILL.md structure including output contracts and rules, see What goes in a SKILL.md file?.

For how to write a description that triggers the skill so these steps actually run, see What does the description field do in a Claude Code skill?.


Frequently Asked Questions

Claude keeps skipping step 3 of my skill. What am I doing wrong?

Step 3 is either not numbered (Claude treats it as optional), its precondition is not met (Step 2 produced output that makes Step 3 seem unnecessary), or the step is bundled with Step 2 (Claude executed part of it and counted that as done). Check all three. If Step 2 ends with a result that Claude interprets as the skill's goal, add a continuation instruction: "After completing Step 2, always continue to Step 3 regardless of the output."

Should my skill steps tell Claude which tools to use or let it decide?

Tell Claude which tools to use. Tool-choice freedom produces variation — the same step runs differently across Claude Haiku, Sonnet, and Opus, and across different sessions with the same model. Named tools produce stable, reproducible output. The only exception: steps where the tool choice depends on input that is not known at skill-authoring time. In that case, specify the decision rule ("If the input is a file path, use the Read tool. If it is a URL, use the WebFetch tool.").

How do I write a skill step that waits for user approval before continuing?

Use the approval gate pattern: present the action, ask for explicit confirmation with "yes/no" framing, and handle all three response cases (confirm, cancel, ambiguous) within the step. Add "If the user responds with anything other than yes or a clear confirmation: re-ask once, then stop." Without the re-ask limit, Claude loops indefinitely on ambiguous responses.

Can a skill step tell Claude to spawn a subagent?

Yes. A step can include an instruction to spawn a subagent: "Use the Task tool to spawn a background agent with the following prompt:..." The subagent runs with its own context. For single-domain tasks, a well-designed skill produces better results than a single-skill-plus-subagent architecture at 4.6x lower token cost (Anthropic multi-agent research, 2026). Reserve subagent steps for tasks that genuinely require parallel execution or specialized model routing.
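A minimal shape for such a step (the prompt content is illustrative):

Step 6: Use the Task tool to spawn a background agent with this prompt: "Review the files changed on this branch for missing tests. Return a list of file paths."
  If the subagent returns no findings: output "No gaps found" and stop.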

How do I make Claude run two steps in parallel inside my skill?

Label them as parallel steps and mark the first dependent step explicitly. "Steps 3 and 4 can execute in parallel" enables concurrent execution. "Step 5 (waits for Steps 3 and 4)" marks the synchronization point. Without the explicit labels, Claude interprets all steps as sequential by default.


Last updated: 2026-04-13