When refs/domain-context.md instructs Claude to read refs/edge-cases.md, which instructs it to read refs/legacy-rules.md, the skill has a reference chain. Claude cannot reliably follow that chain. The loading fails silently, partially, or unpredictably. Skill output degrades in ways that are hard to diagnose without fresh-session regression tests.
At Agent Engineer Master (AEM), we see reference chaining in the majority of skills that arrive for repair after 6+ months in production. It's the structural equivalent of a software circular dependency: technically possible, practically broken.
TL;DR: Reference chaining breaks when one reference file points to another. Claude's progressive disclosure model loads files on demand from SKILL.md instructions, not from instructions inside reference files. The fix is the one-level-deep rule: SKILL.md controls loading, reference files do not. Every reference file must work as a standalone, independently useful document.
What is reference file chaining?
Reference chaining is when a reference file contains an instruction to read another reference file. SKILL.md issues read instructions; reference files carry content. When a reference file issues a read instruction, Claude encounters a loading directive inside a file loaded for a separate purpose. The chain then executes fully, partially, or not at all, and which outcome you get is unpredictable. A typical example:
```text
SKILL.md               → "Read refs/domain-context.md before evaluating input"
refs/domain-context.md → "For edge case rules, see refs/edge-cases.md"
refs/edge-cases.md     → "For legacy behavior, see refs/legacy-rules.md"
```
Three hops to get to usable content. The Claude Code skill now depends on Claude executing a reading sequence it was never designed to manage.
The one-level-deep rule is the architectural principle that prevents this. Reference files exist one level below SKILL.md. SKILL.md points to reference files. Reference files do not point to other reference files. The chain stops at depth one.
Why does chaining break Claude's loading behavior?
Progressive disclosure works because SKILL.md is the control plane: one instruction, one file loaded, one context injection. Reference files were never designed to issue their own loading directives. Claude has no mechanism for tracking which layer a nested read came from. The result is ignored instructions or a broken primary flow.
When refs/context.md says "for more detail, see refs/edge-cases.md," Claude encounters that instruction inside a file it already loaded for a different purpose. The model is now in a different execution layer than the one that issued the original instruction. In our builds, this breaks in 3 of 4 commissions where it appears: Claude either ignores the nested instruction entirely, or reads the second file and loses its position in the primary skill flow.
"The failure mode isn't that the model is bad at the task — it's that the task wasn't specified tightly enough. Almost every production failure traces back to an ambiguous instruction." — Simon Willison, creator of Datasette and llm CLI (2024)
A chain creates exactly this ambiguity. The primary task competes with the secondary read instruction. One of them loses.
The additional cost is token pressure. If the chain executes fully, all three reference files land in context simultaneously. A skill with three 200-line reference files has loaded 600+ lines of content before accounting for the skill body, the user prompt, and the conversation history. Research from Stanford's NLP Group found that models lose track of instructions placed in the middle of long contexts at a rate that makes mid-context policy placement unreliable for production systems (Liu et al., "Lost in the Middle," 2023, arXiv:2307.03172). A 600-line reference dump creates that failure condition by design.
The problem compounds with instruction count. Research published at ICLR 2025 found that Claude 3.5 Sonnet's success rate on following all instructions simultaneously drops from approximately 90% on a single instruction to 44% when managing multiple concurrent instructions (Jaroslawicz et al., "Curse of Instructions," OpenReview, 2024). A reference chain forces Claude to manage competing loading instructions as parallel demands, exactly the condition where instruction-following rates collapse.
What are the actual symptoms?
Reference chains produce three failure modes, none of which fire an error. The skill still activates, outputs still look plausible, and basic testing often passes. The damage is structural: Claude loads the wrong files, loads too many, or applies context without correct weighting. Diagnosing these failures requires fresh-session regression tests, not eyeballing outputs.
Token overhead makes this worse. A CLAUDE.md that grew to 1,207 lines over nine months consumed 42,200 tokens per conversation, roughly a fifth of a standard 200K context window before any actual work began (Cem Karaca, Medium, 2026). A reference chain loading three extra files on top of an already-large skill body accelerates that burn.
- Partial loading: Claude reads the first reference file and stops. Outputs look almost correct because most of the relevant context loaded, but fail on edge cases that only appear in the files further down the chain. The skill passes basic testing, ships, and fails on the unusual cases nobody tested for.
- Context overload: Claude follows the full chain and loads everything simultaneously. Attention distributes across all three files without priority weighting. The skill produces outputs that blend all three reference files but don't apply them correctly; the most important rules get the same weight as legacy footnotes.
- Silent degradation: The most dangerous pattern. The skill worked correctly for months. Someone added a reference chain during a maintenance update. No error fires. The skill still triggers. The output gets subtly worse: more generic, less precise, increasingly inconsistent. You won't catch this without running fresh-session tests after every structural change to the skill folder.
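Fresh-session tests catch the output degradation; a static lint can catch the structural cause earlier. The sketch below assumes each skill keeps its reference files in a refs/ subfolder and that loading directives use imperative verbs like "read" or "load" — both assumptions, not an official Claude Code rule. A bare "see also" mention is deliberately not flagged, since it carries no loading instruction:

```python
import re
from pathlib import Path

# Heuristic: only imperative verbs count as loading directives.
# An informational mention ("See also: edge-cases.md") is not flagged.
READ_DIRECTIVE = re.compile(
    r"\b(?:read|load|open)\b[^\n]*?([\w./-]+\.md)", re.IGNORECASE
)

def find_chains(skill_dir):
    """Return (source_ref, target) pairs where a reference file issues a read."""
    chains = []
    refs = Path(skill_dir) / "refs"
    if not refs.is_dir():
        return chains
    for ref in sorted(refs.glob("*.md")):
        for match in READ_DIRECTIVE.finditer(ref.read_text(encoding="utf-8")):
            target = match.group(1)
            if Path(target).name != ref.name:  # ignore self-references
                chains.append((ref.name, target))
    return chains
```

Run it over every skill folder after each maintenance update; a non-empty result means a reference file has crossed the one-level boundary.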
How do you fix a reference chain?
Three approaches fix a reference chain, in order of preference. The first two reduce total file count; the third preserves separation without creating a dependency. All three share the same outcome: SKILL.md controls every read instruction, and reference files carry content only.
- Merge the files: If refs/edge-cases.md only exists because refs/context.md got too long, the correct fix is consolidation. One well-organized reference file under 300 lines is better than two files with a dependency between them.
- Flatten into the skill body: If the chained content is small (under 150 lines total across both files), move it back into SKILL.md. The progressive disclosure model exists to offload content Claude doesn't need on every run. Content Claude needs on most runs belongs in the body, not behind an extra file load.
- Load both from SKILL.md: If both files genuinely need to remain separate, cite them both in SKILL.md steps directly: "Step 3: Read refs/domain-context.md and refs/edge-cases.md before evaluating input." Two parallel loads from the skill body. Zero chain. Both reference files become standalone, and neither instructs Claude to read the other.
The third option preserves separation of concerns without the dependency. SKILL.md controls the loading sequence. Reference files carry content, not loading instructions. The architecture this restores is the same one that delivers a 98% token reduction when skills are installed but not active: only the name and description load at startup, roughly 100 tokens per skill (CodeWithSeb, 2026). Reference chains defeat that model by forcing additional file loads from inside already-loaded content.
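As a sketch, the third option looks like this inside SKILL.md (step wording is hypothetical; the file names follow the earlier example):

```markdown
## Steps

1. Classify the incoming request.
2. Read refs/domain-context.md and refs/edge-cases.md before evaluating input.
3. Evaluate the input against the domain rules and the edge-case rules.
4. Draft the output.
```

Both reads issue from the skill body in a single step, so neither reference file needs to know the other exists.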
What's the right architecture for shared reference content?
When two skills need the same reference content, the correct architecture depends on ownership and content size. Shared file paths break the one-level-deep rule if they route through another reference file. Each option below keeps SKILL.md as the sole issuer of read instructions, regardless of where the shared content lives.
- Option A: Duplicate the file. Each skill folder gets its own copy. More maintenance overhead, zero cross-skill dependencies. For skills owned by different teams, this is the right default.
- Option B: Share via the skills directory. A file at .claude/skills/shared/ can be referenced by any skill's SKILL.md directly: "Read .claude/skills/shared/brand-voice.md before drafting." No intermediary chain.
- Option C: Inline in the description. For small constants (a threshold, a rule, a specific format), embedding the content in the description field eliminates the file load entirely. This only works for content that fits within the 1,024-character description budget.
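A sketch of the resulting layouts, using brand-voice.md as a hypothetical shared file:

```text
# Option A: each skill keeps its own copy
.claude/skills/drafting/refs/brand-voice.md
.claude/skills/review/refs/brand-voice.md

# Option B: one shared copy, read directly by each SKILL.md
.claude/skills/shared/brand-voice.md
```

In both layouts, every read instruction still originates in a SKILL.md, never in a reference file.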
Never create a "master-reference.md" that aggregates everything and gets cited by other reference files. That is a reference chain, packaged as a single document. The production sweet spot for a Claude Code skill is approximately 4,400 tokens for the core SKILL.md body, with six reference files providing up to 26,000 tokens of context loaded on demand (Anthropic skill engineering documentation, 2024). A master-reference pattern collapses that separation and forces the entire 26,000 tokens into every run.
For how reference files fit into the loading model, see What Are Reference Files in a Claude Code Skill? and What Are the Three Layers of Progressive Disclosure?. For what belongs in the skill body versus reference files, see Why Shouldn't I Embed Domain Knowledge Directly in SKILL.md?.
Frequently asked questions
How deep can reference files safely go?
The safe boundary is one level deep: SKILL.md loads reference files, reference files carry content. Any reference file that instructs Claude to load another file has crossed that boundary. Skills with five or more reference files are at elevated risk, particularly after maintenance updates that split growing context across new files without updating SKILL.md.
Can a reference file mention another file without creating a chain?
Yes. A reference file can name another file as documentation: "See also: edge-cases.md for unusual inputs." That is informational, not a loading instruction. A chain begins when the reference file tells Claude to read another file. Loading instructions belong in SKILL.md.
How many reference files is too many for one skill?
No hard ceiling, but each file is a potential load operation. In our builds, skills with more than 5 reference files start showing context overload symptoms, especially when multiple files load in the same run. At 6+, audit which files Claude reads on a typical invocation and prune the ones that load on fewer than 20% of runs. The MindStudio analysis of Claude Code skill architecture found that keeping process in SKILL.md and context in reference files is the key structural separation that prevents overload failures in production (MindStudio, 2024).
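A first pass at that audit can be automated. The sketch below assumes the .claude/skills/&lt;name&gt;/refs/ layout; the default threshold of five mirrors the observation above and should be tuned against your own regression data:

```python
from pathlib import Path

def flag_oversized_skills(skills_root, threshold=5):
    """Map skill name -> reference file count for skills over the threshold."""
    flagged = {}
    for skill in sorted(Path(skills_root).iterdir()):
        refs = skill / "refs"
        if refs.is_dir():
            count = len(list(refs.glob("*.md")))
            if count > threshold:
                flagged[skill.name] = count
    return flagged
```

This counts files on disk; determining which files Claude actually reads on a typical invocation still requires inspecting session transcripts or running fresh-session tests.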
Does chaining cause trigger failures or output failures?
Output failures, not trigger failures. Reference chains don't affect whether the skill activates. They degrade what happens after it activates.
What if I need Claude to read reference files in a specific sequence?
Specify the sequence in SKILL.md, not inside a reference file: "Step 4: Read refs/base-rules.md, then refs/client-rules.md in that order." Sequential loading from the skill body is predictable. Sequential loading triggered from inside a reference file is not.
Is this the same problem as context bloat?
Related but different. Context bloat is loading content Claude doesn't need. Reference chaining is a structural problem with how content gets loaded. A compact, relevant reference file can still cause chain failures if it instructs Claude to read additional files.
My skill has been working fine for 6 months. Should I check for chains?
Yes. Reference chains often form during iterative maintenance as knowledge gets split across files. The skill works because the most common inputs only need the first reference file. The chain fails on less common inputs, and the failure is subtle enough to miss without targeted testing.
Last updated: 2026-04-18