Two ways to run a Claude Code skill. One is explicit. One is intelligent. They are not interchangeable.
TL;DR: Typing /skill-name invokes the skill directly, bypassing classification. Auto-triggering depends on description quality: imperative descriptions reach 100% activation, passive descriptions average 77% across 650 activation trials (AEM skill activation testing, 2026-04-14). Use manual invocation for explicit, complex, or irreversible actions. Use auto-triggering for repetitive workflows where natural language is faster than a slash command.
How does manual invocation work?
Manual invocation is a direct command. When you type /code-review, Claude executes the code-review skill without classification. It does not matter what is in the conversation context before it. It does not matter how the skill's description is written. The slash command bypasses the discovery layer entirely.
This makes manual invocation completely predictable. The skill runs every time, for any input that follows the slash command.
The tradeoff is friction. You have to remember the exact skill name. You have to type it explicitly every time. For workflows you run often, that friction adds up. For workflows you run occasionally, or where you want absolute certainty the right skill runs, manual invocation is the right call.
Manual invocation also works for skills without descriptions. A SKILL.md without a frontmatter description can still be invoked by its slash command. It cannot auto-trigger.
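As a minimal sketch of a manual-only skill (the skill name and instructions here are hypothetical, not from any official example), a SKILL.md can omit the description field entirely:

```markdown
---
name: release-notes
---

Draft release notes from the commits since the last tag. Group changes
by type (features, fixes, breaking changes) and keep each entry to one line.
```

Typing /release-notes runs this skill every time. Because there is no description, it never appears in the classifier's index and never auto-triggers.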
How does auto-triggering work?
Auto-triggering is Claude's classification mechanism. At session start, Claude loads all skill descriptions into a metadata index. When you send a message, Claude compares your request against that index and decides whether any skill should activate, based on semantic intent matching between your words and the trigger conditions named in each description.
The classification runs on natural language: Claude is reading the semantic intent of your message and matching it against the semantic intent in your descriptions. Community benchmarks set the default activation baseline at roughly 50% before description optimization, essentially a coin flip (Ivan Seleznov, medium.com, 2025).
"The single biggest predictor of whether an agent works reliably is whether the instructions are written as a closed spec, not an open suggestion." -- Boris Cherny, TypeScript compiler team, Anthropic (2024)
A description written as a closed spec, naming specific trigger conditions in imperative language, produces reliable auto-triggering. A description written as an open suggestion, describing what the skill "can help with," produces unreliable triggering.
This is not a quirk. It is a structural property of how the classifier works. The classifier needs a clear signal. Vague descriptions give it a vague signal.
What determines auto-trigger reliability?
Description style is the dominant factor. In our AEM testing across 650 activation trials, imperative descriptions achieved 100% activation rates while passive descriptions averaged 77%, a gap that compounds across every team member and every session. That 23-point difference is the cost of a vague description.
A skill that triggers 77% of the time is not a skill with a 77% reliability score. It is a skill that fails randomly. And a randomly failing skill is worse than not having one, because you assume it is running when it isn't.
Imperative description (triggers reliably):
Use this skill when asked to review code for quality, bugs, or style issues. Trigger on: "review this code," "check my code," "code review." Do not trigger on: "explain this code," "refactor this," "add tests."
Passive description (triggers inconsistently):
A helpful skill for reviewing code and providing feedback on quality and best practices.
The imperative version names exact trigger phrases and explicit negative triggers. The passive version gives the classifier no specific signal to match against. Independent replication of the 650-trial experiment found directive descriptions showed 20.6x higher odds of activation versus passive descriptions, significant at p less than 0.0001 (Ivan Seleznov, medium.com, 2025).
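In a SKILL.md, the description lives in the YAML frontmatter. A sketch of what the imperative version looks like in place (the skill name and body instructions are hypothetical):

```markdown
---
name: code-review
description: Use this skill when asked to review code for quality, bugs, or style issues. Trigger on "review this code," "check my code," "code review." Do not trigger on "explain this code," "refactor this," "add tests."
---

Review the provided code for correctness, readability, and common bug
patterns. Report findings as a prioritized list with file and line references.
```

The description field is the only part the classifier sees at session start; the body below the frontmatter loads only after the skill activates.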
Two other factors affect auto-trigger reliability: total skill count and description conflicts. With 30+ skills loaded, classification becomes less precise because the classifier is distributing attention across more descriptions. With two skills that have overlapping trigger phrases, Claude will pick one based on whichever description is more specific, and the result can be unpredictable. There is also a workaround that sidesteps description quality entirely: in one experiment, adding a hook-based invocation layer raised activation from 20% (10/50 runs) with a simple instruction hook to 84% (42/50 runs) with a forced eval hook, independent of how the descriptions were written (Scott Spence, scottspence.com, November 2025).
For a full breakdown of description writing for reliable auto-triggering, see How Do I Write Trigger Phrases That Make My Skill Activate Reliably?.
When should you use manual invocation?
Use manual invocation when failure is not recoverable. Any skill that publishes content, sends a message, commits code, or depletes a resource needs an explicit slash command, not a classifier guess. At high-stakes moments, the 100% reliability of manual invocation is not a preference. It is a requirement.
- Irreversible actions: Any skill that sends a message, publishes content, commits code, or depletes a resource should require explicit invocation. Auto-triggering an irreversible action on a misread prompt is a production incident.
- Low-frequency workflows: If you run a skill once a week, the effort of writing and testing a description precise enough to trigger reliably is not worth it. Just type the slash command.
- High-ambiguity context: A message like "review this" could trigger a code review skill, a PR description skill, or a documentation review skill. If the right skill depends on context that is hard to encode in a description, manual invocation removes the ambiguity.
- During development: When testing a new skill, use slash command invocation to isolate the skill behavior from the classifier. If the skill works correctly via slash command but fails via auto-trigger, the problem is the description. If it fails both ways, the problem is the instructions.
The stakes are real. The 2025 DORA State of AI-Assisted Software Development report found that incidents per PR increased 242.7% as AI coding assistants accelerated delivery without a matching improvement in incident response. Adding autonomous invocation for irreversible skills without explicit controls compounds that risk directly.
When should you use auto-triggering?
Use auto-triggering for high-frequency, predictable workflows where natural language is faster than a slash command. The threshold is roughly 10 or more invocations per day: below that, the description precision required to reach reliable activation does not justify the engineering investment in writing and testing trigger conditions.
If you ask Claude to write a commit message 20 times per day, auto-triggering your commit-message skill saves you 20 explicit slash commands. At that frequency, the description precision required to reach 100% reliability is worth investing in. The Stack Overflow 2025 Developer Survey found that 51% of professional developers use AI tools daily, which means the average developer already has the repetition density to justify auto-triggering for their most common tasks (Stack Overflow, Developer Survey 2025).
Auto-triggering also reduces onboarding friction for team skills. New team members do not need to learn the slash command names. They describe the task naturally and the right skill fires. This matters at scale: LangChain's 2024 State of AI Agents report found that quality is cited as the top barrier to production by one-third of survey respondents, with 51% of organizations already running agents in production (LangChain, State of AI Agents, 2024).
Both approaches can coexist. A skill with a well-written description can be invoked by slash command or auto-triggered. The slash command overrides classification and runs the skill directly. The description enables auto-triggering when natural language is more convenient.
For the full workflow of building skills that auto-trigger reliably, see What's the Typical Workflow for Developing a Skill from Scratch? and Why Your Claude Code Skill Isn't Triggering (and How to Fix It).
FAQ: Manual invocation vs auto-triggering
Slash commands work without a description and give you 100% reliability. Auto-triggering requires a description with imperative trigger conditions and gives you classifier-dependent reliability, ranging from 77% with passive descriptions to 100% with directive ones. The questions below address specific edge cases and the boundary conditions between the two modes.
Can a skill be invoked by slash command without a description? Yes. A SKILL.md without a description frontmatter field can still be invoked with its folder name as the slash command. It cannot auto-trigger. If you only need manual invocation, the description is optional.
What if I type the wrong slash command? Claude Code will respond that the skill was not found. No partial execution, no wrong skill activated. Manual invocation is safe in that sense.
Can two skills have the same slash command name? No. The slash command is derived from the skill folder name, which must be unique within the loaded skill set. If two skills have the same folder name at different install levels, the project-level version takes precedence.
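As a sketch of that precedence (hypothetical skill name; the paths assume the standard Claude Code skill locations):

```
.claude/skills/code-review/SKILL.md      # project-level: wins for this repo
~/.claude/skills/code-review/SKILL.md    # user-level: shadowed by the above
```

Both folders would produce the slash command /code-review, and within a session where both are loaded, the project-level version runs.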
Does auto-triggering still work if I type a slash command before my message? A slash command before your message bypasses classification entirely and runs that specific skill. Auto-triggering is only active when you send a plain natural language message without a leading slash command.
How do I test whether auto-triggering is working? In a fresh Claude Code session (no prior context), send the exact prompts from your trigger condition list as plain messages without slash commands. If the skill activates, auto-triggering is working. If it does not, the description needs revision.
What should I do if my skill auto-triggers when it shouldn't? Add negative trigger phrases to your description. Explicit negative triggers ("do not trigger on: explain, refactor, add tests") reduce false positives significantly. For a deeper look at this, see What Are Negative Triggers and Why Should I Include Them in the Description?.
Last updated: 2026-05-02