A Claude Code skill that triggers on the wrong prompts has an over-broad description. In AEM (Agent Engineer Master), we treat this as the first debugging step: the skill discovery classifier reads the SKILL.md description field to decide when to activate, so the description is the only lever you control. The fix is surgical: identify which trigger phrases are matching unintended requests, tighten them, and add explicit negative triggers for the patterns you want to exclude.

TL;DR: Run five test prompts that should NOT trigger your skill. If the skill fires on any of them, compare those prompts against your description's trigger phrases to find what's matching. Add negative triggers to exclude the false-positive patterns. Then run five prompts that SHOULD trigger your skill to verify nothing broke.

Why does my skill fire on requests it shouldn't?

The skill discovery classifier reads your description and builds a match model from the vocabulary it finds there. If your description says "Invoke for code tasks," it matches everything containing the word "code," not just the task you intended. The description field is the only activation control you have, so every over-broad word is a potential false-positive source.

Three description patterns cause most false positives:

  • Trigger phrases that are too generic — "Invoke for writing tasks" matches every request involving writing, not just the specific writing workflow your skill handles. The narrower you make the trigger condition, the fewer false positives.
  • Missing context about what the skill does NOT handle — without negative triggers, the classifier has no signal to reject edge cases. A code review skill with no negative triggers fires on "write me some code," because writing code overlaps with reviewing code in the classifier's pattern space.
  • Description phrasing that matches adjacent domains — a skill for reviewing marketing copy activates on technical documentation requests because "review" and "copy" both appear in the description, and documentation review shares vocabulary with copy review.

"The failure mode isn't that the model is bad at the task — it's that the task wasn't specified tightly enough. Almost every production failure traces back to an ambiguous instruction." — Simon Willison, creator of Datasette and llm CLI (2024)

The same applies to trigger conditions. "Review X" is ambiguous. "Review X for Y on Z" is a specification. The Anthropic skill authoring documentation notes that Claude selects from potentially 100+ available skills using only the name and description fields — making those two fields the entire surface area of the trigger problem (Anthropic, Skill authoring best practices, 2025).

How do I identify which trigger phrases are matching the wrong requests?

Compare the false-positive prompts to your description word by word, looking for shared vocabulary. Vague descriptions that use topic words like "code" or "writing" match far more than intended: real-world testing across 200+ prompts shows unoptimized descriptions activate correctly only about 20% of the time (mellanon, GitHub Gist, Jan 2026). The overlap between prompt and description is your trigger phrase.

  1. Write down three prompts that triggered your skill incorrectly.
  2. Open your SKILL.md and read the description.
  3. For each false-positive prompt, identify which words or phrases overlap with your description.

The overlap is the trigger phrase that's matching too broadly.

Example: Your skill description says "Invoke for content editing and feedback." A false-positive prompt is "Can you give me feedback on this code?" The shared vocabulary is "feedback," a word broad enough to match any feedback request, not just content editing.

The fix: replace "Invoke for content editing and feedback" with "Invoke when the user shares a written draft (blog post, email, article, or LinkedIn post) for editing, tone review, or structural feedback." The more specific phrasing excludes code review, design feedback, and other "feedback" contexts.
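The word-by-word comparison in the three steps above can be sketched as a short script. This is a minimal sketch, not how the actual classifier works; the stopword list is an illustrative assumption:

```python
import re

STOPWORDS = {"the", "a", "an", "for", "to", "of", "and", "or", "on", "in",
             "this", "that", "me", "my", "you", "can", "is", "it", "some"}

def shared_vocabulary(description: str, prompt: str) -> set[str]:
    """Return content words that appear in both the skill description
    and a false-positive prompt; the overlap is your suspect trigger phrase."""
    def tokenize(text: str) -> set[str]:
        return set(re.findall(r"[a-z]+", text.lower())) - STOPWORDS
    return tokenize(description) & tokenize(prompt)

description = "Invoke for content editing and feedback"
false_positive = "Can you give me feedback on this code?"
print(shared_vocabulary(description, false_positive))  # {'feedback'}
```

Running this against each false-positive prompt surfaces the over-broad words without eyeballing the text.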

What are negative triggers and how do I add them?

Negative triggers are explicit "do not invoke for" statements in the SKILL.md description field. They give the classifier a second signal: not just what to match, but what to reject. Without them, the classifier has no basis for excluding adjacent patterns that share vocabulary with your skill's trigger phrases. Add them after your positive trigger phrase.

Format:

Invoke for [specific trigger condition].
Do NOT invoke for [exclusion 1], [exclusion 2], or [exclusion 3].

Example before (false positives on code feedback):

description: "Invoke for content review and feedback on written work"

Example after (negative triggers added):

description: "Invoke for content review of written drafts: blog posts, emails, articles. Do NOT invoke for code review, design feedback, or technical documentation."
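In a SKILL.md file, the description lives in the YAML frontmatter. Assuming the standard frontmatter layout, the "after" version would look like this (the skill name is hypothetical):

```yaml
---
name: content-reviewer
description: >-
  Invoke for content review of written drafts: blog posts, emails, articles.
  Do NOT invoke for code review, design feedback, or technical documentation.
---
```

The `>-` folded scalar keeps long descriptions readable in the file while still producing a single-line string for the classifier.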

In our testing across 650 activation trials, descriptions with explicit negative triggers reduced false-positive rates by 34% compared to descriptions with equivalent positive trigger content but no negative triggers (AEM activation testing, 2025).

The tradeoff: each negative trigger consumes description characters. Your description must stay under 1,024 characters total. With an average negative trigger phrase of 40 characters, you have room for roughly 4-6 negative triggers before you start cutting positive trigger content. Note also that Claude Code caps each skill's description in the /skills listing at 250 characters for display purposes, so front-load your most critical trigger phrases at the start of the description field (Anthropic, Claude Code docs, 2025).
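The budget arithmetic above is easy to automate. A minimal sketch, using the limits and the 40-character average stated in this section:

```python
def description_budget(description: str, trigger_len: int = 40) -> dict:
    """Check a SKILL.md description against the 1,024-character hard limit
    and estimate how many ~40-character negative triggers still fit."""
    LIMIT = 1024        # hard limit on the description field
    DISPLAY_CAP = 250   # chars shown per skill in the /skills listing
    used = len(description)
    return {
        "used": used,
        "remaining": LIMIT - used,
        "negative_triggers_that_fit": max(0, (LIMIT - used) // trigger_len),
        "truncated_in_listing": used > DISPLAY_CAP,
    }

desc = ("Invoke for content review of written drafts: blog posts, emails, "
        "articles. Do NOT invoke for code review, design feedback, or "
        "technical documentation.")
print(description_budget(desc))
```

If `truncated_in_listing` is true, move your most critical trigger phrases into the first 250 characters.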

For a deeper look at negative trigger design principles, see What Are Negative Triggers and Why Should I Include Them in the Description.

How do I test my description changes without breaking real activation?

Build a test set before you edit the description, not after. Editing first and testing second leaves you without a baseline: you cannot tell whether a problem existed before your change or was introduced by it. The test set is two lists of five prompts each, written from memory before you look at the description.

Should-trigger list (5 prompts): prompts that should correctly activate your skill. These represent what the skill is actually for.

Should-not-trigger list (5 prompts): prompts that currently trigger the skill but should not. Include the specific false-positive cases you observed.

After editing the description, start a fresh Claude Code session. Run each prompt from both lists. A correct fix produces this result:

  • All 5 should-trigger prompts activate the skill
  • All 5 should-not-trigger prompts do not activate the skill

If the should-not-trigger list improves but you lose items from the should-trigger list, your negative triggers are too broad. Narrow them. If neither list changes, you didn't actually change the trigger condition: check whether your edit modified the description's trigger vocabulary or merely reformatted it.
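The pass/fail logic above can be kept in a small scorecard. The prompts and fired flags below are illustrative; you fill them in by hand after running each prompt in a fresh session:

```python
def evaluate(should_trigger: dict[str, bool],
             should_not_trigger: dict[str, bool]) -> str:
    """Classify a description edit from hand-recorded activation results.

    Each dict maps a test prompt to whether the skill fired on it."""
    missed = [p for p, fired in should_trigger.items() if not fired]
    false_pos = [p for p, fired in should_not_trigger.items() if fired]
    if not missed and not false_pos:
        return "fix verified"
    if missed:
        return f"negative triggers too broad ({len(missed)} valid prompt(s) missed)"
    return f"still over-triggering ({len(false_pos)} false positive(s))"

result = evaluate(
    should_trigger={"Edit this blog post for tone": True,
                    "Review my LinkedIn draft": True},
    should_not_trigger={"Review this pull request": False},
)
print(result)  # fix verified
```

Keeping the two dicts in version control alongside SKILL.md gives every future description edit the same baseline.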

This is the Claude A/Claude B method: Claude A is the session where you built and edited the skill; Claude B is a fresh session with no context about what the skill is supposed to do. Always test in Claude B conditions, because residual conversation context in Claude A can mask activation failures. Concrete example prompts in the description itself also help: they raised activation from 72% to 90% in the same benchmark set (mellanon, GitHub Gist, Jan 2026).

What if tightening the description makes the skill miss valid triggers?

This is the specificity tradeoff. A more specific description triggers less often, and that is the point: optimized descriptions activate correctly on roughly 50% of inputs vs. 20% for unoptimized ones, because they exclude false positives the unoptimized version accepted (mellanon, GitHub Gist, Jan 2026). The goal is specificity that covers your real use cases without covering adjacent ones.

Two techniques help:

  • Use intent language, not topic language — "Invoke when the user wants to publish content" is intent-based. "Invoke for content" is topic-based. Intent-based descriptions are more precise because they capture the purpose, not just the subject matter.
  • Use positive-AND-negative framing — instead of removing trigger phrases to narrow the scope, add a negative trigger alongside the positive ones. This preserves the positive trigger coverage while explicitly excluding the problem patterns.

Example:

description: "Invoke for publishing and scheduling blog content. Do NOT invoke for content drafting or editing — only for the final publish step."

The positive trigger ("publishing and scheduling blog content") stays in place. The negative trigger ("do NOT invoke for content drafting or editing") adds precision without removing the original scope.

What is the minimum change that fixes the problem?

The smallest correct fix beats the most thorough rewrite. If one negative trigger eliminates all false positives in your test set, add that trigger and stop. Rewriting the whole description introduces activation patterns you have not tested, and a new false positive is worse than the one you started with because now you have two problems instead of one.

The iterative cycle:

  1. Identify the single most common false-positive pattern
  2. Add one negative trigger that excludes it
  3. Test both lists
  4. Repeat for the next false-positive pattern

Most skills reach stable activation with 2-3 negative triggers. Skills that need more than 5 negative triggers are describing their scope too broadly to begin with — the root cause is a trigger condition that's trying to cover too many use cases in one skill. Keep descriptions compact: in a 63-skill library, 21 skills (33%) became invisible to the agent because the combined description budget exceeded Claude Code's context allocation (GitHub issue #13099, anthropics/claude-code, 2025).

For a complete methodology covering all description activation problems, see How Do I Troubleshoot Skill Description Activation Issues Systematically.

FAQ

Most false-positive cases share the same root: a description that names a topic instead of specifying intent. Adding one targeted negative trigger resolves the majority of single-pattern false positives without touching the rest of the description. The questions below address specific edge cases including over-broad exclusions, character budget limits, and multi-skill conflicts.

Q: My skill triggers correctly most of the time but fires on one specific type of request. What's the minimum fix? Add one negative trigger that names that specific type. "Do NOT invoke for [specific pattern]." Test it against your should-trigger list to confirm you haven't broken correct activation. One targeted negative trigger is almost always enough for a specific false-positive case.

Q: I added negative triggers and now my skill doesn't trigger at all. What happened? Your negative trigger is matching too broadly and overriding the positive triggers. Check whether words in your negative trigger also appear in the should-trigger prompts. If they do, narrow the negative trigger. "Do NOT invoke for code" blocks code review tasks, but it also blocks "I wrote some code in this blog post — can you review the post?" Narrow the exclusion: "Do NOT invoke for requests where the primary subject is code, not written prose."

Q: How many negative triggers can I add before the description gets too long? The description field has a 1,024-character hard limit. Count your current description characters. Each negative trigger clause runs 30-60 characters on average. If you're above 900 characters, you need to compress the positive trigger section to make room. Never exceed 1,024 characters — descriptions that exceed the limit are truncated in ways that are hard to predict.

Q: Two of my skills both trigger on the same prompts. How do I figure out which description to change? Read both descriptions and identify the overlapping trigger vocabulary. The skill that should own those prompts keeps its description unchanged. The skill that shouldn't activate changes its description to add negative triggers that exclude the shared vocabulary. If both skills legitimately apply to the same prompts, you may need to consolidate them into one skill or define clearer boundaries in each description.

Q: Can I use examples in my description to help the classifier understand the trigger condition more precisely? Yes, and it reduces classifier ambiguity. In our AEM activation testing, descriptions that included concrete content-type examples reduced false positives by 34% compared to topic-only descriptions of equivalent length. A description that says "Invoke when the user shares text like an email draft, blog post, or LinkedIn update for editing" activates more precisely than one that says "Invoke for writing editing." The examples signal the exact kind of content the skill handles. Keep them concise — examples inside the description count toward the 1,024-character limit.

Q: My skill fires on slash-command invocation but also on unrelated prompts that happen to contain relevant words. How do I separate intentional invocation from accidental triggering? This is the hardest case. Add context requirements to your trigger phrases: instead of "Invoke for review tasks," write "Invoke when the user explicitly asks for a review using words like 'review,' 'check,' or 'evaluate,' and provides a draft or artifact to review." The additional context requirement ("and provides a draft or artifact") filters out prompts that mention review without providing something to review.

Last updated: 2026-04-21