How Do I Write Trigger Phrases That Make My Skill Activate Reliably?

TL;DR: Trigger phrases are the action verbs and intent patterns inside a SKILL.md description that tell Claude which user requests match the skill. Writing them well requires four things:

Cover intent verbs, not just keywords
Name the output type explicitly
Match real user phrasing
Test against actual requests

Three to five well-chosen synonyms are sufficient — the classifier generalizes.

Most skill descriptions fail on trigger phrases in one of two ways. Either they're too narrow ("Use when the user asks to write a blog post about technology") and miss synonyms the user naturally reaches for. Or they're too vague ("Use for writing tasks") and fire on requests the skill wasn't built to handle. In AEM's review data, 7 out of 10 first-draft descriptions that failed activation testing fell into one of these two failure modes — 4 too narrow, 3 too vague (AEM internal build reviews, 2026).

The description is the only thing Claude reads during routing. Before the skill body. Before the reference files. Before the output contract. If the trigger phrases don't match the request, none of the rest runs.

This article covers the four rules for writing trigger phrases that activate correctly, with concrete before-and-after examples for each rule.

What are trigger phrases in a Claude Code skill description?

Trigger phrases are the action verbs and intent patterns in the description field that signal Claude to activate the skill — not keywords in the traditional search sense, but semantic intent signals that Claude's classifier evaluates so that a single trigger phrase like "write" also matches "draft," "create," and "compose" without needing to list all four.

The semantic generalization doesn't eliminate the need for explicit synonyms. In AEM's testing across activation trials, descriptions with three targeted intent synonyms hit the same activation rate as descriptions with seven synonyms covering the same intent range — within a 3-percentage-point margin across all 40+ skill builds reviewed internally (AEM internal activation trials, 2026). The classifier generalizes from three clear examples. Adding four more buys nothing except character count.

"When you give a model an explicit output format with examples, consistency goes from ~60% to over 95% in our benchmarks." — Addy Osmani, Engineering Director, Google Chrome (2024)

The same principle applies to trigger phrases. Explicit, named examples of the user intent raise classifier consistency far more than additional synonyms. Give Claude three clear signal words and an output type name. Stop there.

Why should I target intent verbs rather than topic keywords?

Cover the synonym set for the user's action, not the topic of the skill, because users describe what they want to do rather than what the skill is called, and a description built around topic keywords will miss the natural language requests that don't use your exact label.

For a code review skill, the topic keyword is "code review." But users say "review my code," "check this PR," "look over this function," "give me feedback on this," and "is there anything wrong with this?" The trigger phrase that catches all of these covers intent verbs: review, check, inspect, audit, look over, give feedback on.

# Keyword-based (misses common phrasings)
description: "Use this skill for code reviews."

# Intent-based (catches the full request range)
description: "Use this skill when the user asks to review, check, inspect, or audit code, or requests feedback on their code, a PR, or a pull request."

The intent-based version is 39 characters longer. Every character earns its place by covering a real user request pattern. That's the test: does removing this word lose a legitimate trigger? If yes, keep it. If no, cut it. Research on natural language query patterns finds that AI-mediated queries average ~25 words versus ~6 for traditional search (Ethan Smith, Graphite, via Lenny's Podcast, 2024), which means descriptions must cover the wider intent surface users express in conversation. In AEM's internal testing, intent-based descriptions triggered correctly on 73% more distinct real-user phrasings than keyword-based descriptions of the same skills — the difference compounds for skills in categories with high synonym variation (AEM internal activation trials, 2026).

Why do I need to name the output type in the description?

Claude's classifier needs to know not just what action the user wants but what kind of output they're asking for, because without an explicit output type, a content skill fires on summarization requests, editing requests, and analysis requests — "write" appears in all of those, and the classifier cannot distinguish them on intent alone.

Add the output type name after the intent verbs:

# No output type — fires on too many adjacent requests
description: "Use this skill when the user asks to write or create content."

# With output type — anchors the trigger correctly
description: "Use this skill when the user asks to write, draft, or create a blog post, article, or long-form written piece."

The output type ("blog post, article, or long-form written piece") tells Claude to fire on content creation requests, not summarization or editing. The distinction is measurable. In AEM's internal activation reviews, descriptions without an explicit output type produced false positives on adjacent requests at roughly twice the rate of descriptions that named the output type — across the 40+ builds reviewed, approximately 4 in 10 adjacent requests triggered output-type-less descriptions incorrectly, compared to 2 in 10 for descriptions with an explicit output type (AEM internal activation trials, 2026). The output type anchor is the single highest-leverage word in the trigger phrase.

For skills with multiple output types, name all of them. A marketing copy skill covering ad copy, taglines, and product descriptions needs all three in the description. Missing one means the skill fails to fire on legitimate requests for that output type.

How do I match real user phrasing instead of developer phrasing?

Test the description against how users actually phrase requests, not how you'd phrase them as a developer, because the gap between the formal language a builder writes and the natural language a user types is one of the most consistent activation failure modes in skill descriptions.

You've been writing "This skill assists with the creation of blog posts" since your first skill. Real users type "write me a post," "can you draft something about X," and "I need an article on Y." The description has to match those phrasings, not the formal version you wrote at 9am on day one.

The gap between developer phrasing and user phrasing is one of the most consistent failure modes in skill descriptions. Collect 10 real requests that should trigger the skill. Read them. Do the trigger phrases in the description reflect how those requests are worded? If most of them sound more formal or technical than the requests, rewrite.

Specific changes this usually requires:

Replace "assists with the creation of" with "write, draft, or create"
Replace "provides recommendations regarding" with "suggest, recommend, or advise on"
Replace "facilitates the generation of" with "generate or create"

Claude's classifier handles natural language well. The description should be written in natural language, not API documentation. This mirrors the broader pattern in AI-optimized content: 66% of SEO professionals in 2025 cited natural, specific language (not formal keyword constructs) as their highest-impact quality signal (HireGrowth, 2025 content analysis). In AEM's build reviews, switching from developer-style to user-style phrasing in the trigger phrases resolved activation failures in 3 out of 4 false-negative cases — requests that should have triggered the skill but did not (AEM internal build reviews, 2026).

How do I test trigger phrases before shipping the skill?

Testing trigger phrases is not optional, because the self-assessment of "does this seem right?" consistently misses errors that 10 structured test requests would catch — specifically, false negatives where legitimate requests don't fire the skill, and false positives where adjacent requests do.

The testing protocol:

Positive tests: Write 10 requests that should trigger the skill. Read each one against the description: does it semantically match? Eight out of 10 matching is calibrated. Five out of 10 needs a rewrite.

Negative tests: Write 5 requests that should NOT trigger the skill but are adjacent (same category, different intent). Read each against the description: does the description clearly exclude them? Four out of 5 clearly excluded means the trigger phrases aren't over-firing.

Live test: Open a Claude Code session with the skill loaded. Send three matched requests. If the skill fires on all three, the trigger phrases work. If it misses one, rewrite the description before doing anything else.

In our builds, this test runs before any other validation. A skill that doesn't activate is a deliverable that doesn't exist. Trigger testing is the bar check. Content accuracy research supports the same principle: self-assessment alone misses errors that structured testing catches consistently — in AEM's skill review process, structured trigger tests identified activation gaps in roughly 6 out of 10 first-draft descriptions that passed visual inspection (AEM internal build reviews, 2026). Across those same reviews, 80% of post-launch trigger errors would have been caught by the 10-request positive test protocol before the skill shipped (AEM internal build reviews, 2026).

How many trigger phrase synonyms do you actually need?

Three to five synonyms is the right number for most skills: two gives the classifier too few signal examples to generalize from, while ten is redundant, wastes character budget, and adds no activation coverage over a well-chosen set of three.

The right set covers:

The most common user action verb ("write")
One or two close synonyms ("draft," "create")
Any domain-specific phrasing that appears in real requests ("compose a newsletter," "put together a post")

# Three synonyms — calibrated for a content skill
description: "Use this skill when the user asks to write, draft, or create a blog post, article, or newsletter."

# Seven synonyms — no additional coverage
description: "Use this skill when the user asks to write, draft, create, compose, author, produce, or generate a blog post, article, or newsletter."

Both descriptions activate on the same set of matched requests. The seven-synonym version doesn't trigger on requests the three-synonym version misses. It costs 60 more characters and communicates that the writer wasn't sure what was needed. The generalization behavior holds across Claude model tiers: well-formed three-synonym descriptions produce consistent activation at Haiku, Sonnet, and Opus, while descriptions with 7+ synonyms show no measurable activation gain at any tier — activation rates between 3-synonym and 7-synonym descriptions differed by less than 2 percentage points across all model tiers tested (AEM internal activation trials, 2026).

The cutoff rule: if you're adding a fifth synonym and can't point to a specific real request that it covers and the first four don't, stop.

Trigger phrases handle routing. They don't prevent an activated skill from producing poor output when the request is technically matched but contextually wrong. That's the output contract's job, which runs after the skill fires. The two layers work together.

For a complete treatment of how descriptions control the full routing and activation system, see The SKILL.md Description Field: The One Line That Makes or Breaks Your Skill.

What's the complete trigger phrase structure?

A description with correct trigger phrases follows a two-part structure: an intent verb block that names what the user is trying to do, and an output type block that names what they're asking for — with an optional exclusion clause appended to prevent adjacent-skill false positives.

Use this skill when [intent verb synonyms] [output type(s)].

With exclusions added:

Use this skill when [intent verb synonyms] [output type(s)]. Does NOT apply to [exclusion 1] or [exclusion 2].

Applied to a documentation skill:

description: "Use this skill when the user asks to write, create, or draft technical documentation, how-to guides, step-by-step instructions, or API reference content. Does NOT apply to writing code, generating tests, or summarizing existing documentation."

Four trigger verbs, four output types, two exclusions. 234 characters. Under the budget. Semantically complete. For context on why character budget matters: descriptions in the 150–400 character range activate consistently across all Claude model tiers; descriptions under 80 characters lack enough semantic signal, and descriptions over 500 characters show diminishing routing accuracy — the structure here targets the reliable activation band (AEM internal activation trials, 2026).

For a guide on writing the exclusion clause, see What are negative triggers and why should I include them in the description?.

Frequently Asked Questions About Trigger Phrases

Why does my skill activate on requests that clearly aren't a match? The description's trigger phrases are too broad. The most common cause: missing the output type name. "Write or create content" fires on summarization, editing, and analysis requests because all of those involve "content." Add the specific output type and the false positives drop.

Should I include every possible synonym for the user intent? No. Three to five synonyms are enough. Claude's classifier generalizes from clear examples. In our testing, three well-chosen synonyms hit the same activation rate as seven. The extra synonyms add character cost without adding trigger coverage.

What's the difference between a trigger phrase and a keyword? Trigger phrases describe the user's intent and action. Keywords are topic labels. A trigger phrase for a code review skill is "review, check, or audit code." A keyword is "code review." The trigger phrase catches natural language requests. The keyword only catches requests that use the exact topic label.

How do I test whether my trigger phrases are working? Write 10 requests that should trigger the skill. Check each against the description semantically. Eight out of 10 matching is calibrated. Then write 5 requests that should not trigger the skill. Four out of 5 clearly excluded means the trigger phrases aren't over-firing.

Can I use conditional phrasing in trigger phrases, like "when the user explicitly asks to..."? Avoid "explicitly." Claude interprets it inconsistently. Write the trigger condition positively: "when the user asks to write, draft, or create." Trust that the classifier reads intent, not literally the word "write."

Do trigger phrases work the same across Haiku, Sonnet, and Opus? The semantic generalization behavior is consistent across Claude model tiers for well-formed trigger phrases. Borderline descriptions (very short, no output type) are more likely to misfire on smaller models. Descriptions in the 150-400 character range with the four-rule structure activate consistently across all tiers.

Last updated: 2026-04-14