Reference files are documents a Claude Code skill loads on demand during execution. They live outside SKILL.md and carry content that does not belong in the process instructions:
- output templates
- domain knowledge
- approved examples
- API documentation
- field definitions
AEM (Agent Engineer Master) uses reference files in every production-grade Claude Code skill to keep SKILL.md short and focused on process logic.
They exist because SKILL.md has a practical ceiling. Put everything in one file and performance degrades: GPT-4 showed 15.4% accuracy degradation when context extended from 4,000 to 128,000 tokens (Context Discipline and Performance Correlation, arXiv:2601.11564, 2025). Pull the right content into separate files and the skill body stays clean, the files stay maintainable, and Claude loads only what it needs for each run.
TL;DR: Reference files are the knowledge layer of a Claude Code skill. They sit in a references/ subfolder, load on demand when a process step explicitly reads them, and must be self-contained (no chains to other files). SKILL.md stays short. Reference files carry the detail.
Where Do Reference Files Go in a Skill?
Reference files live in a references/ subfolder inside the skill folder, alongside SKILL.md in the skill's root directory. That placement separates knowledge from process logic and keeps the file tree predictable across every skill in the library, whether you have two reference files or twelve. The full structure:
```
.claude/skills/
└── your-skill-name/
    ├── SKILL.md
    ├── references/
    │   ├── output-template.md
    │   ├── field-definitions.md
    │   └── approved-examples.md
    ├── assets/
    ├── scripts/
    └── templates/
```
The references/ folder is a Claude Code convention, not a technical requirement. You can name the folder differently, but references/ signals the content type clearly to anyone reading the skill structure.
Reference files can be any plain text format: markdown (.md), JSON (.json), YAML (.yaml), or plain text (.txt). Binary formats — images, PDFs, spreadsheets — are not useful as reference files because Claude cannot read them natively. If you need structured data, convert it to JSON or YAML first.
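As a minimal sketch of that conversion step, the following turns a spreadsheet export (CSV) into JSON suitable for a reference file. The column names and values here are made-up examples, not fields any particular skill requires:

```python
import csv
import json
from io import StringIO

# Hypothetical spreadsheet export; any CSV with a header row works the same way.
csv_text = """field,type,max_length
company_name,string,120
target_market,string,80
"""

# Parse each row into a dict keyed by the header columns.
rows = list(csv.DictReader(StringIO(csv_text)))

# Serialize as pretty-printed JSON, ready to save as a file such as
# references/field-definitions.json, which Claude can read natively.
reference_json = json.dumps(rows, indent=2)
print(reference_json)
```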
What Goes in a Reference File vs SKILL.md?
Reference files carry knowledge that changes independently of the skill's process logic — content you can update without touching the process steps or triggering a full retest — while SKILL.md carries the process steps, rules, and trigger conditions that define what the skill does and in what order. The practical decision test is one question: if this content changes, does the process need to change? If yes, SKILL.md; if no, a reference file.
| Content Type | Where It Belongs |
|---|---|
| Process steps (what to do) | SKILL.md |
| Rules (what not to do) | SKILL.md |
| Overview and trigger conditions | SKILL.md |
| Output format template | Reference file |
| Field definitions and examples | Reference file |
| API documentation | Reference file |
| Approved output examples | Reference file (assets/) |
| Domain taxonomy or glossary | Reference file |
A competitive analysis skill's SKILL.md tells Claude: "Step 3: Read references/output-template.md and produce the output in that format." The reference file itself contains the JSON structure with field names, types, and example values. When the format changes — new field added, length limit adjusted — you update the reference file. The process step does not change.
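For illustration, the body of that references/output-template.md might carry a JSON structure like this. Every field name and constraint below is an invented example, not the format of any real skill:

```json
{
  "company": "string, the company's legal name, e.g. \"Acme Corp\"",
  "market_position": "string, one of: leader, challenger, niche",
  "key_competitors": ["string, up to 5 competitor names"],
  "summary": "string, max 300 characters"
}
```

Adding a new field means editing this one file; the process step that points at it stays untouched.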
Without reference files, every format change is a SKILL.md edit. SKILL.md edits affect the whole skill and require retesting the full process. Reference file edits affect only the content they carry.
The benefit of explicit output templates is measurable. Giving a model an explicit output format with examples pushes output consistency from around 60% to over 95% (Addy Osmani, Engineering Director, Google Chrome, 2024). That gap is exactly what a well-structured reference file closes.
"Models placed in the middle of long contexts lose track of instructions at a rate that makes mid-context policy placement unreliable for production systems." — Nelson Liu et al., Stanford NLP Group, "Lost in the Middle" (2023, ArXiv 2307.03172)
A 900-line SKILL.md is a reference file that lost its way home. All that content loads into the middle of Claude's context window, where instruction fidelity is lowest. Reference files keep that content out of the middle and load it only when the relevant step runs.
The performance gap is measurable. In multi-document question answering tasks with 20 documents, accuracy dropped by more than 30% when the relevant document was placed in positions 5–15 rather than at position 1 or 20 (Liu et al., Transactions of the Association for Computational Linguistics, 2024).
How Does Claude Load a Reference File?
Reference files do not load automatically: each one loads only when a process step explicitly names it by path. This keeps Claude's context window lean, because only what the current step requires gets loaded, and unrelated knowledge from other reference files never occupies tokens that belong to the active task. A step that does not mention a reference file will not read it, even if that file is directly relevant.
The process step instruction pattern:
```markdown
## Process
1. Read the user's input and identify the company name and target market.
2. Read `references/output-template.md` to load the exact output format.
3. Analyze the company's competitive position based on the loaded template.
4. Produce output in the format specified in `references/output-template.md`.
```
This is deliberate. If all reference files loaded at startup, a skill with 5 reference files averaging 200 lines each would add 1,000 lines to Claude's context on every run — including runs where only 1 reference file is needed. On-demand loading means Claude reads what it needs, when it needs it.
The cost of unnecessary context loading is not hypothetical. Chroma's 2025 research tested 18 frontier LLMs and found that every model exhibits continuous performance degradation at every input length increment tested, with the decline starting well before the context window approaches its limit (Chroma Research, "Context Rot," 2025). Across models, the gap between advertised and effective context window can reach 99%, with most top-performing models showing measurable accuracy degradation by 1,000 tokens (Paulsen, "Context Is What You Need," arXiv:2509.21361, 2025).
Two consequences of this design:
- Name the file explicitly: A step that says "load the appropriate template" without naming a file may or may not work, depending on whether Claude can infer what "appropriate template" means. Spell out the exact path instead.
- Condition the load in the step: If you want Claude to use a reference file for certain inputs and not others, build that condition into the process step: "If the user requests a comparison of more than one company, read `references/multi-company-template.md` and use that format instead."
What Is the One-Level-Deep Rule?
The one-level-deep rule is a Claude Code architecture constraint: reference files cannot chain to other reference files. If output-template.md includes "See also field-definitions.md for field types," Claude will not follow that pointer. It reads output-template.md and stops. Any content the file points to is silently ignored at runtime, with no error or warning.
This means reference files must be self-contained. Every piece of information a reference file needs to be useful must be in that file — not in a file it points to.
Practical design approach: if you find yourself wanting to link between reference files, the linked content belongs inside the referring file. Duplicate it if necessary. Two files with some overlapping content are more useful than a chain that Claude cannot follow.
In our builds, the one-level-deep rule catches a specific design pattern we call the taxonomy trap: a skill has a taxonomy reference file that points to 8 category definition files, one per category. Claude reads the taxonomy file, sees the pointers, and stops — it does not read the definitions. The fix is to consolidate the category definitions into the taxonomy file, or to have the process step load each definition file individually when needed.
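A chain like the taxonomy trap can be caught before runtime with a small lint pass. This is a sketch under one assumption: that cross-file pointers look like `references/<name>.<ext>` paths in the text (the function name and regex are illustrative, not part of Claude Code):

```python
import re
from pathlib import Path

# Matches pointers such as references/field-definitions.md inside file bodies.
POINTER = re.compile(r"references/[\w\-.]+\.(?:md|json|yaml|txt)")

def find_chained_pointers(references_dir):
    """Return (filename, pointer) pairs for every cross-file link found,
    i.e. chains Claude will not follow under the one-level-deep rule."""
    hits = []
    for path in sorted(Path(references_dir).glob("*")):
        if path.is_file():
            for match in POINTER.findall(path.read_text(encoding="utf-8")):
                hits.append((path.name, match))
    return hits
```

Run it over a skill's references/ folder; any hit is a file whose pointed-to content should be inlined or loaded by a dedicated process step instead.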
Even setting aside the chaining constraint, keeping reference files self-contained protects against the underlying context mechanics. Research across five open- and closed-source models found that performance degradation occurs well within models' claimed context lengths, independent of retrieval failures — meaning a long chained read, even if Claude could follow it, would still produce less reliable output than a single focused file (Hsieh et al., "Context Length Alone Hurts LLM Performance Despite Perfect Retrieval," arXiv:2510.05381, 2025).
What Are Reference Files Not Good For?
Reference files are designed for static or slowly changing knowledge: content that does not change between runs and has no runtime dependencies on user inputs or external systems. Three categories of content fall outside that scope and will create maintenance problems if you use a reference file for them.
- Dynamic data: Reference files are read at the time the step runs. They do not fetch live data. If you need current pricing, today's news, or real-time API responses, use an MCP tool — not a reference file.
- Computed values: Reference files are documents, not scripts. They cannot produce output based on inputs. If you need Claude to compute something from input data, that logic goes in a process step or a deterministic script in the `assets/` folder.
- Session state: Reference files are not meant to carry state between runs. If the skill writes to a reference file during a run (which is technically possible), that change is visible on the next run, but the pattern creates a maintenance problem. Use a database or a log file for persistent state, not a reference file.
For how reference files fit into the full skill engineering process, see From Prompt to Production: The Five-Phase Skill Engineering Process.
Frequently Asked Questions
How large can a reference file be?
There is no hard technical limit. The practical limit is context window economics: a reference file that is 500+ lines contributes substantially to context usage when it loads. A 50-line file is essentially free. Design reference files to be as short as the domain knowledge allows. If a reference file is growing past 200 lines, it is worth asking whether it contains more than one type of knowledge — if so, split it. The length concern is not abstract: the NoLiMa benchmark found that at 32K tokens, 11 out of 12 frontier models dropped below 50% of their short-context performance on recall tasks (NoLiMa, 2024).
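A quick way to audit length in an existing skill, assuming reference files sit in a references/ folder (the function name and the 200-line soft limit are this article's convention, not anything Claude Code enforces):

```python
from pathlib import Path

def long_reference_files(references_dir, limit=200):
    """Return (filename, line_count) for reference files past the soft
    limit, as candidates for splitting into more focused files."""
    flagged = []
    for path in sorted(Path(references_dir).glob("*")):
        if path.is_file():
            lines = len(path.read_text(encoding="utf-8").splitlines())
            if lines > limit:
                flagged.append((path.name, lines))
    return flagged
```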
Can I share a reference file between multiple skills?
Technically yes, by using the same relative path from multiple SKILL.md files. In practice, shared reference files create a fragile coupling: changing the file to serve one skill may break another. Unless the content is genuinely identical across both skills (same output template, same taxonomy), keep reference files per-skill.
What's the difference between a reference file and an asset?
Reference files carry knowledge Claude reads during execution: templates, definitions, documentation. Assets are resources Claude uses to produce output: output templates with placeholders, approved examples to match, scripts to run. The distinction is fuzzy at the edges — a template file is both a reference (Claude reads it) and an asset (Claude uses it to format output). The references/ and assets/ folders are organizational, not functional.
Should I put my API documentation in a reference file?
Yes, if the skill needs it and it changes on a different schedule than the process steps. A reference file called references/api-docs.md that documents field names, endpoint responses, and error codes is exactly the right use case. Update it when the API changes; leave SKILL.md untouched.
My skill reads a reference file but seems to ignore the content. What is wrong?
Three common causes: (1) the process step names the wrong file path — check that the path in the step matches the actual file location; (2) the reference file is too long and the relevant content is past the 60% depth of Claude's context window, where attention degrades; (3) the reference file points to another file via a link or instruction, and Claude stopped at the first file (the one-level-deep rule applies). Anthropic's context engineering guidance identifies the underlying mechanism: "good context engineering means finding the smallest possible set of high-signal tokens that maximize the likelihood of some desired outcome" (Anthropic Engineering, "Effective Context Engineering for AI Agents," September 2025). A reference file stuffed with loosely relevant content violates that principle at the file level. The fastest diagnosis: add an explicit instruction in the step: "Read references/output-template.md. Confirm you have read it by summarizing the first field definition before proceeding."
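Cause (1) can also be checked mechanically. A sketch, assuming paths in SKILL.md follow the `references/<name>.<ext>` pattern used throughout this article (the helper name is illustrative):

```python
import re
from pathlib import Path

def missing_reference_paths(skill_dir):
    """Return reference paths named in SKILL.md that do not exist on disk,
    the most common cause of a step silently reading nothing."""
    skill_md = Path(skill_dir, "SKILL.md").read_text(encoding="utf-8")
    named = set(re.findall(r"references/[\w\-.]+\.(?:md|json|yaml|txt)", skill_md))
    return sorted(p for p in named if not Path(skill_dir, p).is_file())
```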
Last updated: 2026-04-17