Building a Claude Code skill without an approved brief is the fastest way to build the wrong thing correctly. The output looks finished. The file exists. The skill runs. Then you show it to the person who asked for it and discover that the trigger scope is wrong, the output format is not what they wanted, and the "does NOT produce" list is empty because nobody discussed it. In 3 of 4 early AEM commissions where builders jumped straight into SKILL.md before aligning on the brief, we issued a full rewrite request.

TL;DR: Building without design approval turns assumptions into architecture. Once assumptions are encoded in SKILL.md, output contracts, and reference files, fixing them requires rewriting all three. A 30-minute alignment call prevents 4+ hours of rewriting. The single most common rework trigger in production skill work is an output contract discovered to be wrong only after build.

What does "design approval" mean in skill engineering?

Design approval is confirmation that the person requesting the skill agrees on four specific items before you write a single line of SKILL.md: the output format, the trigger scope, the explicit non-goals, and at least two example input-output pairs. Without sign-off on all four, every line of code you write is your best guess dressed up as a deliverable.

  1. The output contract — what format the skill produces, which fields it includes, how long the output is, and who consumes it
  2. The trigger scope — which user inputs activate the skill and which inputs should not
  3. Non-goals — what the skill will explicitly not produce
  4. Quality signal — at least 2 example inputs with the expected outputs

These four items define the contract between the skill and the person using it. Everything inside SKILL.md — process steps, reference file pointers, edge case rules — is the builder's decision. Design approval is only about the contract.
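
The four-item contract can be captured as plain data before any SKILL.md exists. Here is a minimal Python sketch, assuming illustrative field names throughout (nothing here is a Claude Code API):

```python
# Illustrative sketch: the four approval items as plain data, plus a
# completeness check. All field names and values are assumptions.
design_brief = {
    "output_contract": {
        "format": "markdown",
        "fields": ["summary", "risks", "next_steps"],
        "max_length_words": 300,
        "consumer": "requesting PM",
    },
    "trigger_scope": {
        "activates_on": ["summarize this incident report"],
        "does_not_activate_on": ["general chat", "code review requests"],
    },
    "non_goals": ["no follow-up questions", "no external sources"],
    "quality_signal": [
        {"input": "raw incident log", "expected": "3-section markdown summary"},
        {"input": "postmortem draft", "expected": "3-section markdown summary"},
    ],
}

def brief_is_approved(brief: dict) -> bool:
    """True only when all four items are filled in and the quality
    signal has at least two input/output examples."""
    required = ["output_contract", "trigger_scope", "non_goals", "quality_signal"]
    if any(not brief.get(key) for key in required):
        return False
    return len(brief["quality_signal"]) >= 2

print(brief_is_approved(design_brief))  # True for the filled-in brief above
```

The point of the sketch is not the data format; it is that every field must hold a concrete answer the requester has seen and confirmed before build starts.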

"The failure mode isn't that the model is bad at the task — it's that the task wasn't specified tightly enough. Almost every production failure traces back to an ambiguous instruction." — Simon Willison, creator of Datasette and llm CLI (2024)

This is what happens when a skill gets built before design approval: the spec is the builder's best guess, not the requester's actual requirement. Research by Addy Osmani, Engineering Director at Google Chrome, found that giving a model an explicit output format with examples raises consistency from roughly 60% to over 95%. The output contract is not a formality: it is the single variable with the highest leverage on output quality (Addy Osmani, Google Chrome, 2024).

What goes wrong when you skip the alignment step?

Three failure modes develop when building precedes alignment, and each one is invisible until the requester sees the finished skill: the output format is wrong, the trigger fires on prompts it should ignore, or the skill produces content nobody said to exclude. Spotting any of these after build means rewriting SKILL.md, evals, and assets together.

  • Output format mismatch — the builder produces Markdown; the requester needed JSON for a downstream pipeline. Or the builder writes a 500-word synthesis; the requester needed three bullets. Output format is the most common misalignment point, and also the one that propagates furthest: the output contract is referenced throughout SKILL.md, evals.json, and any output templates in the assets folder.
  • Trigger scope mismatch — without explicit trigger conditions, descriptions default to over-inclusive phrasing. "Use this skill when the user wants to analyze..." ends up activating on tangentially related prompts. Or the inverse: the scope is narrowed to the point where the skill never fires on the use cases that matter most.
  • Missing non-goals — Claude, without explicit exclusions, produces what seems helpful. If nobody specified the skill should not generate follow-up questions, it will. If nobody said it should not add context from external sources, it does. Non-goals are invisible until you see the output you did not want.

Each of these is a 5-minute conversation before build. After build, each requires updating SKILL.md, revising evals, and potentially restructuring reference files. Hooks and Farry (2001) measured that rework caused by requirements errors consumes 28% to 42% of total development cost across software projects. Skills are not exempt from that math.
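
Non-goals stay invisible precisely because nothing checks for them. A hedged sketch of turning them into checks, where the detector names and rules are assumptions, not part of any skill format:

```python
# Illustrative sketch: each non-goal becomes a detector run against
# the skill's output, so a violation is caught rather than noticed.
NON_GOALS = {
    "follow_up_questions": lambda text: text.rstrip().endswith("?"),
    "external_links": lambda text: "http://" in text or "https://" in text,
}

def violated_non_goals(output: str) -> list[str]:
    """Return the names of every non-goal the output violates."""
    return [name for name, detector in NON_GOALS.items() if detector(output)]

print(violated_non_goals("Summary of the incident. See https://example.com"))
# ['external_links']
```

Five minutes spent naming these rules before build is what makes them enforceable in evals afterward.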

How do wrong assumptions compound across the skill architecture?

A production Claude Code skill is not one file. Wrong assumptions encoded in the output contract propagate into every layer of the architecture: SKILL.md, reference files, asset templates, and evals.json all derive from the same spec. Fix one layer without fixing the others and the skill still fails on the use cases that matter.

  • SKILL.md (description, output contract, process steps)
  • Reference files (domain knowledge, approved examples, rules)
  • Assets (output templates, scripts)
  • evals.json (test cases asserting the expected behavior)

When the output contract is wrong, all four break. The evals test the wrong behavior. Asset templates produce the wrong format. Reference files encode the wrong domain context. Fixing one does not fix the others.
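
The propagation is easy to see when the layers are written down: evals and asset templates both derive from the same contract definition, so a wrong contract is wrong everywhere at once. A sketch under assumed names (this is not the evals.json schema, just the dependency structure):

```python
# Illustrative sketch: one contract definition feeds both the eval
# cases and the output template, so fixing one file without the
# others leaves the skill internally inconsistent.
OUTPUT_CONTRACT = {"format": "json", "fields": ["title", "summary", "risk_level"]}

def eval_case(input_text: str, expected: dict) -> dict:
    # An evals.json-style case asserts the contract's fields, no more, no less.
    assert set(expected) == set(OUTPUT_CONTRACT["fields"]), "eval drifted from contract"
    return {"input": input_text, "expected": expected}

def asset_template() -> str:
    # The output template in the assets folder derives from the same contract.
    return "\n".join(f"{{{field}}}" for field in OUTPUT_CONTRACT["fields"])

case = eval_case("incident log", {"title": "t", "summary": "s", "risk_level": "low"})
print(asset_template())
```

Change `OUTPUT_CONTRACT` after build and every derived artifact must be regenerated, which is the 6-hour rework described above.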

In our builds, rework hours fall into two distinct patterns. When alignment happens before build, rework averages under 90 minutes — mostly edge case refinements. When build precedes alignment, rework averages over 6 hours for a medium-complexity skill, because the whole architecture derives from a faulty spec. You are not editing. You are starting over with a corrected brief.

The same logic applies in traditional software. Requirements are cheapest to change when they are words in a conversation. They cost the most to change when they are deployed code. Skills are no different. The IBM Systems Sciences Institute found that fixing a requirements defect during implementation costs approximately 6.5x more than catching it at design time; after release, that multiplier rises to between 60x and 100x (IBM Systems Sciences Institute, 1981, cited in NIST research).

What should be confirmed before you write your first line of SKILL.md?

Confirm four items before writing a single line of SKILL.md: the output contract, the trigger scope, the non-goals list, and the quality signal — at least two example inputs with expected outputs. If any field is blank after the alignment conversation, the conversation is not finished. The checklist below makes that concrete.

Pre-build checklist:

Output contract (required)

  • [ ] What is the output format? (Markdown, JSON, plain text, a file, a command)
  • [ ] What fields or sections does the output include?
  • [ ] What is the target output length?
  • [ ] Who receives the output? (User, downstream pipeline, another skill)

Trigger scope (required)

  • [ ] What user phrase or intent activates the skill?
  • [ ] What does NOT activate it? (At least 2 non-triggering examples)

Non-goals (required)

  • [ ] What will the skill explicitly not produce?
  • [ ] What adjacent tasks go to a different skill?

Quality signal (required)

  • [ ] What does good output look like? (At least 2 example inputs with expected outputs)
  • [ ] What does bad output look like? (Name at least 1 failure mode)

If any field is blank, the alignment conversation is not done. This checklist does not require a formal document. A shared note with the requester's confirmation on each item is sufficient. The format does not matter. Confirmation does.
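
"Any blank field means the conversation is not done" can itself be mechanical. A sketch of the checklist as data, returning exactly what is still open; the keys mirror the checklist above and are illustrative names only:

```python
# Illustrative sketch: the pre-build checklist as a dict, with empty
# answers surfaced by name rather than discovered after build.
CHECKLIST = {
    "output_format": "JSON object",
    "output_fields": ["title", "summary", "risk_level"],
    "target_length": "under 200 words",
    "consumer": "downstream pipeline",
    "activating_intents": ["triage this incident"],
    "non_triggering_examples": [],  # still blank: conversation not done
    "non_goals": ["no follow-up questions"],
    "adjacent_tasks_elsewhere": ["root-cause analysis goes to a separate skill"],
    "good_output_examples": [{"input": "log", "expected": "triage summary"}],
    "named_failure_modes": ["hallucinated severity"],
}

open_items = [field for field, answer in CHECKLIST.items() if not answer]
print(open_items)  # ['non_triggering_examples']
```

A shared note structured like this gives the requester something concrete to confirm item by item.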

For a deeper look at what a production-grade output contract looks like inside SKILL.md, see What Is an Output Contract in a Claude Code Skill?.

What happens if you try to fix design issues during testing?

Testing validates that the skill does what the spec says. When the spec is wrong, testing does not catch the problem: it confirms the wrong behavior with high confidence. You pass every eval, ship the skill, and discover the brief was incorrect only when the requester sees the output. The test suite was measuring the wrong thing the whole time.

This is the inverse of evaluation-first development. In evaluation-first development, you write test cases before writing the skill — which forces you to specify the output contract precisely before building. When you build first and test later, the tests get written to match the build, not the actual need. For how this pattern works in practice, see What Is Evaluation-First Development?.

Design approval is not bureaucracy added to slow work down. It is protection against the specific rework that costs the most: the kind where you discover the brief was wrong after you have already built to it. NASA's software engineering research found that a requirements error costs 29 times more to fix after deployment than it costs to fix at the requirements stage (NASA, cited in NIST SP 500-234). The earlier you catch a wrong assumption, the cheaper the correction.


FAQ

Does a personal workflow tool still need design approval?

Yes, even solo builds. "Design approval" with yourself means writing the four checklist items before opening SKILL.md. Writing the output contract surfaces assumptions. If you cannot fill in all four fields, the spec is incomplete.

How detailed does the initial brief need to be?

Specific enough to answer the four checklist items. A brief that defines the output format, the trigger condition, at least one non-goal, and two example inputs with expected outputs is sufficient. Length is not the criterion. Completeness on those four dimensions is.

What if the requester does not know what format they need yet?

That is useful information — it means the alignment conversation is not done. Prototype two output formats and show them before building the skill. Prototypes are cheaper than rewrites.

Can I build a rough draft and get approval on the draft?

Only if you frame the draft explicitly as a spec-discovery tool, not a deliverable. A 10-line sketch that produces one example output is a prototype. A 200-line SKILL.md built out fully is a deliverable that will now need rewriting. Know which one you are building before you start.

Is this only a concern for commissioned skills?

No. Every skill has a requester, even personal tools. That requester is a future version of you who will use the skill. Writing the brief is a conversation with that future user. Without it, you discover two weeks in that the skill produces a format you no longer want — the same rework problem, smaller scale.

How does skipping design approval connect to context bloat?

Directly. When the output contract is wrong from the start, builders compensate by adding more context, more rules, more reference files. The result is a skill overbuilt for a task it was never properly specified to do. See What Is Context Bloat and How Does It Hurt Skill Performance? for what that looks like in practice.

Last updated: 2026-04-19