The answer is not a line count. It is a ratio. When rules start outnumbering process steps by more than 3:1, Claude begins attending to rules selectively rather than exhaustively. The skill does not break dramatically — it starts drifting in quiet ways: following most rules most of the time, skipping the low-salience ones based on context pressure.

TL;DR: Adding rules to a Claude Code skill improves reliability up to roughly 8-12 rules per skill body (AEM, 2025). Beyond that, rule density causes selective adherence: Claude follows the rules it sees as most contextually salient and deprioritizes the rest. The fix is not removing rules arbitrarily. It is structuring constraints into the process steps themselves rather than listing them separately.

Why does rule count eventually hurt performance?

Claude processes skill instructions in sequence, but it does not weight them equally. Instructions near the beginning of a context block and instructions tied directly to a specific action receive more adherence than instructions placed in a "rules" or "never do this" section at the end of a SKILL.md body (Nelson F. Liu et al., "Lost in the Middle: How Language Models Use Long Contexts," 2023, arXiv:2307.03172).

A rule section is a late-context block by design. You write process steps first, then append rules and constraints at the end. That placement means every rule competes for attention against the semantic weight of everything above it. With 5 rules, that competition is manageable. With 20 rules, the low-salience ones drop out of effective working context.

This is not a flaw in Claude. It is an artifact of how transformer attention works. Instructions with clear referents — "in step 3, before writing the file, check for em dashes" — receive reliable attention because the referent is specific. Rules phrased as general prohibitions — "never use passive voice" — are abstract and context-dependent, which makes their salience variable.

The practical result: long rule lists create fair-weather skills. The skill follows the rules when the task is simple and unambiguous. It starts dropping rules when the task is complex or the rules begin to conflict with what the process steps are leading toward.

What are the signs that your skill has too many rules?

Three patterns appear when rule density exceeds the effective threshold: selective adherence, where Claude follows high-salience rules and quietly drops the rest; rule conflict without resolution, where two rules compete and the lower-salience one loses every time; and rule creep, where each individual addition was correct but the cumulative count passes the point of reliable enforcement.

  • Selective adherence. The skill follows some rules consistently and ignores others. You notice the ignored rules tend to be the ones added later, or the ones phrased most abstractly. You add a note in the rule to "always remember" this constraint, which makes no difference.

  • Rule conflict without resolution. Two rules exist in the same section that are technically compatible but practically hard to satisfy simultaneously. Claude resolves the conflict by satisfying the rule with higher contextual salience and treating the other as background guidance. The resolution is not random; it is consistent, but it is also not what you intended.

  • Rule creep. The skill started with 4 rules, now has 19. Each rule was added to fix a specific failure observed in production. The failures stopped after each addition. You cannot tell which rules are still active, which are redundant, and which now conflict with each other.

"The single biggest predictor of whether an agent works reliably is whether the instructions are written as a closed spec, not an open suggestion." — Boris Cherny, TypeScript compiler team, Anthropic (2024)

The irony of rule creep is that each individual rule addition was correct. The problem is cumulative. In our work on skills that have gone through 6 or more iteration cycles, roughly 40% of the rules added after iteration 3 are redundant with constraints already encoded elsewhere in the skill body (AEM internal review, 2025-2026).

Where is the practical threshold?

In production skills, performance starts degrading at 12-15 rules in a single SKILL.md body (AEM, 2025). Below 8 rules, the relationship is linear: each rule added produces a measurable reliability improvement on the target failure mode. Above 12 rules, each additional rule has diminishing returns and starts introducing indirect interference with adjacent rules.

The threshold is not just about count — it is about placement and specificity:

  • Rules embedded within process steps ("Before saving the file in step 4, verify there are no em dashes in the output") hold at any density level
  • Rules placed in a dedicated section at the end of the skill body follow the 12-rule threshold
  • Abstract rules ("maintain quality throughout") have effective weight near zero regardless of placement
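
As a sketch, the three placement styles look like this in a SKILL.md body (the step wording and section names here are illustrative, not from any real skill):

```markdown
## Process

3. Draft the summary. Before writing the file, check for em dashes
   and remove any you find.        <!-- embedded: holds at any density -->

## Rules

- Never use passive voice.        <!-- late-context: counts toward the ~12-rule budget -->
- Maintain quality throughout.    <!-- abstract: effective weight near zero -->
```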

The 500-line SKILL.md limit exists for related reasons. Claude Code's internal skill loading starts showing attention dilution above approximately 400-450 lines (Claude Code documentation, 2025). Rule sections that push a skill past that length do not just add constraints — they dilute adherence to everything else in the skill body.

How do you reduce rules without losing coverage?

Reducing rules without losing coverage requires consolidation, not deletion. The four-step process below groups redundant rules by the failure they prevent, embeds surviving rules inside the process steps that trigger them, moves domain constraints to reference files, and deletes anything that duplicates a structural constraint. The result is a tighter skill body with higher per-rule adherence.

  1. Group rules by the failure they prevent. Most rule sections, when audited, contain 3-5 rules that all prevent the same category of failure. Group them. A skill with 18 rules about "output quality" often reduces to 4 rules covering distinct failure categories.

  2. Embed rules inside process steps. Every rule that applies to a specific step should be written into that step, not into a separate rules section. "In step 3, before outputting the summary, check that no paragraph exceeds 3 sentences" is followed more reliably than a global rule saying "paragraphs must be under 3 sentences."

  3. Move domain constraints to reference files. Rules that encode domain knowledge ("the taxonomy of error types," "the approved list of output formats") belong in reference files loaded on demand, not in the SKILL.md body. This keeps the body focused on process, not domain.

  4. Delete redundant rules without hesitation. Any rule that duplicates a constraint already enforced by a process step is noise. Delete it. The skill will not perform worse: the constraint is already there.
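
The grouping and deletion passes can be mechanized once you have hand-tagged each rule. A minimal sketch, assuming each rule carries the failure category it prevents and a flag for whether a process step already enforces it (`Rule`, `consolidate`, and the tags are all hypothetical names, not part of any Claude Code API):

```python
from dataclasses import dataclass

@dataclass
class Rule:
    text: str
    failure: str           # failure category this rule prevents (hand-tagged)
    in_process_step: bool  # is a process step already enforcing this constraint?

def consolidate(rules: list[Rule]) -> list[Rule]:
    """Keep one rule per failure category; drop rules a step already enforces."""
    kept: dict[str, Rule] = {}
    for rule in rules:
        if rule.in_process_step:
            continue                         # structural duplicate: delete it
        kept.setdefault(rule.failure, rule)  # first rule per category survives
    return list(kept.values())
```

Running this over an 18-rule section will not write the consolidated wording for you, but it makes the redundancy visible before you edit.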

Related: What Is Context Bloat and How Does It Hurt Skill Performance.

What should you do instead of adding more rules?

When you hit a production failure and the instinct is to add a rule, pause and ask the specific question: "Where in the process did this failure occur?" If the failure happened in step 3, the constraint belongs in step 3, not in a general rules section.

The alternative to rules, in most cases, is tighter process steps. A step that says "write the first draft" allows failures that a step saying "write the first draft in 3 sections: overview (2 sentences), details (5 bullet points), and next action (1 sentence)" does not allow.

Structural constraints built into process steps outperform rules sections in reliability by approximately 2:1 in our production testing (AEM, 2025). The reason is specificity: a process step with structural constraints gives Claude a template, not a prohibition. Templates are easier to follow than prohibitions.
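
The template-versus-prohibition difference reads like this in practice (the wording is illustrative):

```markdown
<!-- Prohibition in a rules section: variable adherence -->
- Never write overlong, unstructured drafts.

<!-- Template in a process step: reliable adherence -->
2. Write the first draft in 3 sections:
   - Overview: 2 sentences
   - Details: 5 bullet points
   - Next action: 1 sentence
```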

For the broader architecture question, see Why Can't I Just Put Everything in One Big SKILL.md File and How Does Progressive Disclosure Save Tokens and Improve Performance.

Frequently asked questions

The rules-to-coverage tradeoff raises practical questions about threshold enforcement, testing, and refactoring. The short answer: keep rule count under 12 in any single SKILL.md body, test each rule in isolation with 5 fresh sessions, and refactor the architecture before adding any rule to a skill already at the attention dilution threshold.

Should I ever add rules to a skill that already has 15?

Only if you can remove two existing rules first. The target is a maximum of 12 rules in any single SKILL.md body. If you cannot consolidate or delete existing rules to make room, the new rule should go into the relevant process step as a structural constraint, not into the general rules section.

How do I tell if a rule is being followed or ignored?

Design a test case that specifically requires the rule to activate. Run it 5 times in fresh sessions. If the rule is followed fewer than 4 of 5 times on a straightforward test case, the rule is not receiving reliable attention. Move it into the relevant process step.
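
That check can be scripted. A sketch of the 4-of-5 threshold, where `run_fresh_session` is a placeholder you would implement yourself (for example, by shelling out to the Claude Code CLI in print mode) and `rule_holds` is your predicate over the transcript; both names are hypothetical:

```python
from typing import Callable

def rule_adherence(run_fresh_session: Callable[[], str],
                   rule_holds: Callable[[str], bool],
                   runs: int = 5, threshold: int = 4) -> bool:
    """Run the same test case in fresh sessions; pass only if the rule
    activates in at least `threshold` of `runs` transcripts."""
    passes = sum(rule_holds(run_fresh_session()) for _ in range(runs))
    return passes >= threshold
```

A rule that fails this harness on a straightforward test case is a candidate for embedding into the relevant process step.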

Can conflicting rules cancel each other out entirely?

Yes. When two rules directly conflict — "always output 3 options" and "output only the single best recommendation" — Claude resolves the conflict based on which rule is more salient in context. The resolution is consistent within a session but varies across sessions. Neither rule is reliably followed. Delete one.

Is a rule that I wrote for an edge case still worth keeping?

If the edge case appears less than 1 in 20 invocations and the consequence of missing it is low, the rule is probably noise. Edge-case rules belong in edge-cases.md as a reference file, not in the SKILL.md body. The body is for rules that apply to a substantial fraction of invocations.
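
The heuristic above reduces to a small decision function (the 1-in-20 threshold is from this section; the function itself is illustrative, not part of any tooling):

```python
def rule_placement(hit_rate: float, high_consequence: bool) -> str:
    """Decide where a rule belongs, per the 1-in-20 edge-case heuristic."""
    if hit_rate >= 1 / 20 or high_consequence:
        return "SKILL.md body"
    return "edge-cases.md reference file"
```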

My skill is at 400 lines and I keep finding more things to fix. What's the right move?

Refactor the architecture. A 400-line SKILL.md is close to the attention dilution threshold. The correct move is extracting domain knowledge into reference files, reducing the rule section to the core constraints, and letting the process steps carry the structural guidance. A well-refactored 200-line skill outperforms a 400-line one on almost every quality metric.

Last updated: 2026-04-20