If Claude follows some of your SKILL.md rules and ignores others, the problem is structural, not random. Claude isn't skipping rules arbitrarily. It's applying the same attention model to your skill file that it applies to everything: content at the beginning and end of the context gets more attention than content in the middle. Your rules that matter are sitting in the middle.
TL;DR: Rule non-adherence in Claude Code skills has three causes: positional burial (rules too deep in the file), density overload (too many rules for the attention budget), and vagueness (rules Claude can't operationalize). The fix: convert rules to numbered steps, front-load your most important constraints, and prune flat rule lists down to a small set of precise, enforceable constraints.
Why does Claude ignore rules in long SKILL.md files?
The cause is a documented attention bias called the "Lost in the Middle" effect. Models attend strongly to the beginning and end of their input and poorly to the middle. Rules buried past line 150 of a SKILL.md file sit in that low-attention zone, which is why adherence drops without any obvious failure signal.
Liu et al. measured a 20-percentage-point accuracy drop in multi-document question answering when the relevant document moved from position 1 to position 10 in a 20-document context: roughly 75% accuracy at position 1 versus 55% at position 10 ("Lost in the Middle," ArXiv 2307.03172, published in TACL 2024). The same positional penalty applies to instructions in SKILL.md.
A list of 25 rules is not a skill. It's a terms-of-service agreement. Claude reads both the same way.
In our bar check process at AEM, we test rule adherence in fresh sessions. In Claude Code skills with more than 15 rules in a flat list, we see 40–60% adherence on the rules below the 150-line mark. When the same content is restructured into numbered steps, adherence exceeds 85% on equivalent tests.
The reason: steps are procedural. Claude is designed to follow procedures sequentially. Rules are policy. Policy requires Claude to hold an abstract constraint in working memory while executing a completely separate procedure, and to apply that constraint at the right moment without a structural cue to do so. Calibration approaches that explicitly address positional attention bias improve long-context retrieval by up to 15 percentage points over baseline (Junqing He et al., "Found in the Middle," ACL Findings 2024, ArXiv 2406.16008).
What are the three specific failure modes?
The three causes of rule non-adherence in SKILL.md files are positional burial (rules placed too deep in the file), density overload (more rules than the attention budget can hold), and vagueness (rules Claude cannot operationalize into a pass/fail test). All three can exist simultaneously in the same file and compound each other.
Positional burial: Rules placed past line 150 of a SKILL.md file are in the attention danger zone. This doesn't mean they're always ignored. It means their adherence is unreliable. The skill might follow them 7 times out of 10 in isolated testing, and 4 times out of 10 in production where the context window has more competing content. Positional burial is a consistency problem, not an always-fails problem. That makes it harder to diagnose.
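Positional burial can be caught mechanically before it ships. The sketch below is a minimal lint pass, assuming rules are written as markdown bullets and using the 150-line threshold from above; a real SKILL.md layout will need a smarter pattern than "line starts with a dash."

```python
# Minimal linter sketch: flag rule-like lines sitting past the
# low-attention threshold. Assumes rules are markdown bullet lines.

DANGER_LINE = 150

def buried_rules(skill_md: str) -> list[tuple[int, str]]:
    """Return (line_number, text) for bullet rules past DANGER_LINE."""
    flagged = []
    for n, line in enumerate(skill_md.splitlines(), start=1):
        if n > DANGER_LINE and line.lstrip().startswith("- "):
            flagged.append((n, line.strip()))
    return flagged

# Example: 160 filler lines, then two rules in the danger zone
doc = "\n".join(["filler"] * 160 + ["- Never invent citations",
                                    "- Keep output under 200 words"])
flagged = buried_rules(doc)
```

Running this in CI against every SKILL.md turns "adherence feels flaky" into a concrete list of line numbers to move.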
Density overload: A flat list of 20+ rules creates an implicit priority problem. Claude has no structural cue about which rules matter more than others. When two rules could apply to the same situation, and there's no hierarchy, Claude interpolates. That interpolation is not your intended behavior. InFoBench (ACL Findings 2024) found that even GPT-4 fails to fulfill over 10% of individual requirements in complex multi-constraint scenarios when evaluated per-constraint rather than at aggregate output level, and that failure rate climbs as constraint count rises. AGENTIF (NeurIPS 2025, ArXiv 2505.16944) puts a harder number on the ceiling: across 50 real-world agentic tasks averaging 11.9 constraints per instruction, the best-performing model achieved a 27.2% Instruction Success Rate, meaning it satisfied all constraints in fewer than 3 out of 10 instructions.
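Density overload is just as measurable. A minimal sketch, assuming a flat bullet list under a `## Rules` heading (the heading name is an assumption about your layout); the 10/15 thresholds mirror the warning levels used in this article:

```python
# Sketch: count bullet rules under a "## Rules" heading and flag
# density overload against the article's 10/15 thresholds.

def count_flat_rules(skill_md: str, heading: str = "## Rules") -> int:
    in_rules = False
    count = 0
    for line in skill_md.splitlines():
        if line.strip() == heading:
            in_rules = True
        elif line.startswith("## "):   # the next section ends the rules list
            in_rules = False
        elif in_rules and line.lstrip().startswith("- "):
            count += 1
    return count

rules_md = ("## Rules\n"
            + "\n".join(f"- rule {i}" for i in range(18))
            + "\n## Steps\n- step 1")
n = count_flat_rules(rules_md)
verdict = "structural problem" if n > 15 else "warning" if n > 10 else "ok"
```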
Vagueness: Some rules fail because Claude cannot operationalize them. "Always maintain a professional tone" is not enforceable: Claude's model of "professional" may not match yours. "Never use exclamation marks or em dashes in the output" is enforceable. The first is a preference. The second is a testable constraint.
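The difference between the two rules is that only the second can be expressed as a function. A sketch of that rule as a literal pass/fail test:

```python
# The enforceable rule above as code. There is no equivalent function
# for "always maintain a professional tone" — that's the point.

def violates_punctuation_rule(output: str) -> bool:
    """True if the output uses exclamation marks or em dashes."""
    return "!" in output or "\u2014" in output  # \u2014 is an em dash

assert violates_punctuation_rule("Ship it!") is True
assert violates_punctuation_rule("Ship it today.") is False
```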
"Models placed in the middle of long contexts lose track of instructions at a rate that makes mid-context policy placement unreliable for production systems." — Nelson Liu et al., Stanford NLP Group, "Lost in the Middle" (2023, ArXiv 2307.03172)
How do you fix a rules-heavy SKILL.md?
Four changes fix the majority of rule non-adherence: embed constraints inside the steps that trigger them, move non-negotiables to the first 50 lines, collapse redundant rules to one precise statement, and cut any rule you can't write a test for. In our AEM bar checks, these four changes together lift adherence from 40–60% to above 85%.
Convert rules to steps: The adherence gain from embedding constraints in steps is larger than any other single change. In our testing, moving from flat rule lists to step-embedded constraints produces the jump from 40–60% adherence to above 85%. Addy Osmani's benchmarks at Google Chrome document the same pattern: "When you give a model an explicit output format with examples, consistency goes from ~60% to over 95%" (Addy Osmani, Engineering Director, Google Chrome, 2024). Instead of a "Rules" section with a bullet list, embed the constraint at the point in the process where it applies:
```markdown
# Before: rules list

## Rules
- Never include statistics without a source citation
- Always bold the key term in each paragraph
- Output must be 150-200 words
```

```markdown
# After: constraint embedded in step

## Step 3 — Draft the section
Write 150-200 words. Bold the key term in each paragraph.
Cite every statistic with the source name and year in parentheses.
```
The constraint is now part of the step that triggers it. Claude encounters it at the right moment. It doesn't need to hold the constraint in memory across unrelated steps.
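A side benefit of step-embedded constraints is that they double as an output contract you can check mechanically. The sketch below verifies a draft against the Step 3 example's own constraints; `check_step3` is a hypothetical helper, and the bold-term and citation checks are crude regex heuristics, not semantic checks.

```python
import re

# Sketch: pass/fail check for the Step 3 contract (length, bolded
# key term per paragraph, cited statistics).

def check_step3(draft: str) -> dict[str, bool]:
    words = len(draft.split())
    paragraphs = [p for p in draft.split("\n\n") if p.strip()]
    return {
        "length": 150 <= words <= 200,
        "bold_term_each_paragraph": all("**" in p for p in paragraphs),
        # Any paragraph containing a percentage must cite "(Source YYYY)"
        "stats_cited": all(re.search(r"\(\w[\w .]*\d{4}\)", p)
                           for p in paragraphs if re.search(r"\d+%", p)),
    }

draft = ("**Attention** drops in the middle. Accuracy fell 20% (Liu 2023).\n\n"
         "**Steps** beat rules.")
result = check_step3(draft)
```

In this example the formatting and citation checks pass but the length check fails, which is exactly the kind of unambiguous signal a vague rule can never give you.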
Front-load the non-negotiables: Constraints that apply to every output (format, length, tone, prohibited content) belong in the first 50 lines of SKILL.md. Not because Claude reads sequentially and stops, but because early-context instructions have the highest adherence probability. If a rule is load-bearing, put it where the attention is highest.
Consolidate redundant rules: An audit of the rules section typically reveals 3-5 rules that say the same thing in different words. "Don't use jargon" and "Write for a non-technical audience" and "Avoid technical terms without definitions" are three rules expressing one constraint. Collapse them: "Write for a reader with no technical background. Define any term that a non-developer would not recognize."
Prune unenforced rules: Rules that Claude can't verify from the output are not enforceable. "Sound confident" fails. "Do not use hedging words (perhaps, might, seems, could be) in declarative statements" passes. If you can't write a test case that distinguishes adherence from violation, the rule is too vague to enforce.
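The hedging rule above passes the test-case criterion because it reduces to string matching. A sketch, approximating "declarative statement" as "any sentence that is not a question":

```python
import re

# Sketch: flag hedge words in declarative sentences. Sentence splitting
# and the "not a question" heuristic are simplifications.

HEDGES = re.compile(r"\b(perhaps|might|seems|could be)\b", re.IGNORECASE)

def hedged_sentences(text: str) -> list[str]:
    sentences = re.split(r"(?<=[.!?])\s+", text)
    return [s for s in sentences if not s.endswith("?") and HEDGES.search(s)]

violations = hedged_sentences("The fix might work. Could it fail? It works.")
```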
What's the correct structure for behavioral constraints?
Production Claude Code skills divide behavioral constraints into three named categories: step-level, output-level, and always-on. Chroma's context rot research (2025) tested 18 frontier models and found every one degrades as input length grows, which makes the 3-5 cap on always-on constraints a hard rule, not a suggestion.
Step-level constraints: Rules that apply during a specific step belong inside that step's description. This is where most constraints should live.
Output-level constraints: Rules that describe the required characteristics of the final output belong in the output contract section. "The output must be a numbered list under 300 words with no markdown beyond the list formatting" is output contract language, not a rule.
Always-on constraints: Rules that apply throughout the entire execution belong in the first section of SKILL.md, clearly labeled. Limit this to 3-5 constraints maximum. If you have 12 "always-on" rules, you have a density problem.
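If you maintain constraints as structured data rather than prose, the 3-5 cap can be enforced at authoring time. A minimal sketch; the `Constraint` dataclass and category names are illustrative, not part of any Claude Code API.

```python
from dataclasses import dataclass

# Sketch: the three constraint categories as data, with the always-on
# cap checked before the SKILL.md is generated or committed.

@dataclass
class Constraint:
    text: str
    category: str  # "step", "output", or "always_on"

def validate(constraints: list[Constraint]) -> list[str]:
    problems = []
    always_on = [c for c in constraints if c.category == "always_on"]
    if len(always_on) > 5:
        problems.append(f"{len(always_on)} always-on constraints (cap is 5)")
    for c in constraints:
        if c.category not in {"step", "output", "always_on"}:
            problems.append(f"unknown category: {c.category}")
    return problems

cs = [Constraint(f"rule {i}", "always_on") for i in range(6)]
issues = validate(cs)
```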
For how to write step-by-step instructions that Claude actually follows, see How Do I Write Step-by-Step Instructions for a Claude Code Skill?. For why a bloated SKILL.md causes this problem and how to reduce it, see What Is Context Bloat and How Does It Hurt Skill Performance?. For the full list of structural anti-patterns that affect skill reliability, see What Are the Most Common Mistakes When Building Claude Code Skills?.
Frequently asked questions
The most common SKILL.md rule adherence issues come down to three variables: rule count (above 15 flat rules, adherence becomes unreliable), rule position (past line 150, attention drops), and rule format (vague preferences fail where testable constraints pass). The questions below address each in detail.
How many rules is too many in a SKILL.md file?
More than 10 in a flat rules list is a warning sign. More than 15 is a structural problem. The number itself isn't the constraint — the constraint is whether Claude can hold all rules in effective working memory during a single invocation. At 15+ flat rules, it can't.
Does the order of rules in a SKILL.md file matter?
Yes. Rules in the first third of the file have higher adherence than rules in the middle or final third. Within the first third, the very first 10–20 lines have the highest attention. If you have a rule that's non-negotiable, put it first.
My skill has a "Rules" section at the bottom. Is that the problem?
Almost certainly yes. The bottom of a long file is slightly better than the middle for attention, but still far worse than the top. Move the load-bearing rules to step-level constraints and put anything truly always-on near the beginning.
Why does my skill follow rules in testing but not in production?
Two likely causes: testing with short contexts that don't dilute attention, or testing in a session where the skill was just discussed (residual context provides reinforcement). In production, the session starts fresh with a longer, denser context. Test rule adherence in fresh sessions with realistic context loads.
Can I use headers to organize rules without converting them to steps?
Headers help with navigation but don't improve adherence. The attention problem is about position in the context, not visual organization. A rule under a header at line 250 still has the adherence characteristics of line 250. Convert to steps, not to headed sections.
Last updated: 2026-04-18