A skill that duplicates Claude's base knowledge costs approximately 100 tokens per session (Anthropic, Claude Code Docs 2025) and produces no measurable quality improvement. A skill that encodes institutional knowledge (your team's specific decisions, constraints, and conventions) produces output Claude cannot produce without it. The difference is not skill length or complexity. It is whether the encoded information exists anywhere outside your organization. AEM builds Claude Code skills at production scale, and duplicating base knowledge is the most common mistake we see in skill libraries.

TL;DR: Institutional knowledge skills encode information Claude cannot derive from training: your naming conventions, your PR format requirements, your client's tone-of-voice rules. Duplicate-knowledge skills encode what Claude already knows: Python best practices, REST API design, writing clarity principles. Delete the skill and rerun the same prompts: unchanged output means the skill is dead weight.

What Counts as Institutional Knowledge in a Skill?

Institutional knowledge is information that exists only inside your project, team, or organization. It cannot be derived from Claude's training data, the codebase alone, or general best practices. A new hire would write clean code but still miss your specific constraints, conventions, and quality bar. Those decisions exist nowhere outside your team.

Examples of genuine institutional knowledge worth encoding:

  • Team conventions not in the codebase: "All async functions return typed results. No bare except: clauses. Dataclasses over raw dicts for structured internal data." — your code review rejections reveal what the team enforces but has never written down
  • Client-specific rules: "This client's brand voice uses active verbs, avoids the word 'ensure,' and never uses exclamation points. All copy ends with a clear action, not a summary." — no general writing skill produces this
  • Project-specific constraints: "This project targets Python 3.9 for AWS Lambda compatibility. Do not use match statements, walrus operators, or 3.10+ features even if they would be cleaner."
  • Process decisions: "All PRs require a migration plan section if they touch the database schema. The migration plan must include rollback steps and estimated downtime."
  • Quality bar definitions: "A 'done' feature in this project includes unit tests, an integration test, and a documentation update in /docs/. Draft PRs do not count as done."

The shared characteristic: a new developer joining the team would make mistakes on all of these without being told. They would write clean Python code. They would not know about the walrus operator restriction.
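As a concrete sketch, here is what such a skill might look like, assuming Claude Code's SKILL.md layout with name and description frontmatter; the skill name and file contents are hypothetical, assembled from the examples above:

```markdown
---
name: python-conventions
description: Team Python conventions for this repo. Use when writing or reviewing Python code.
---

# Python conventions (enforced in review, not by the linter)

- Target Python 3.9 for AWS Lambda compatibility. No match statements,
  no walrus operators, no 3.10+ features even where they would be cleaner.
- All async functions return typed results.
- No bare `except:` clauses.
- Use dataclasses over raw dicts for structured internal data.
```

Nothing in this file restates general Python style. Every line is a team decision that Claude cannot infer from training or from the codebase alone.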

What Counts as Duplicate Knowledge?

Claude's training covers most of software engineering's documented best practices, design patterns, and writing principles. A skill that instructs Claude to follow these adds no information, only context cost: each skill occupies system prompt budget every session, and Anthropic's docs flag CLAUDE.md bloat as a cause of instruction-following degradation (Anthropic, Claude Code Best Practices 2025).

Examples of duplicate-knowledge skills that underperform relative to their context cost:

  • "Write clean, readable Python code" — Claude already aims for this
  • "Review PRs for code quality issues" — Claude performs this without specific instruction
  • "Explain technical concepts clearly" — Claude's default behavior without guidance
  • "Write functions that follow single-responsibility principle" — this is training data, not institutional knowledge
  • "Document all public functions with docstrings" — Claude does this when asked to write code with documentation

The problem is not that these skills produce wrong output. The problem is they produce the same output as asking Claude without the skill, while consuming 100 tokens of system prompt budget per session and one discovery classification slot.

At 20 skills, two or three duplicate-knowledge skills in the library are a minor drag. At 35 skills, they are the reason you're hitting the curation threshold (AEM internal, 2026). Every context slot spent on a skill Claude doesn't need is a slot that could hold a skill Claude does need. One practitioner audit of Claude Code session overhead found individual skill files ranging from 800 to 2,500 tokens each, overhead that repeats on every prompt, not just once (AC Digest, 2025).

Skills are cheap to build and expensive to maintain. A duplicate-knowledge skill incurs the full maintenance cost while returning no quality gain.

How Do You Diagnose Whether a Skill Encodes Real Value?

Apply the deletion test first, then the new hire test. The deletion test checks whether Claude's output quality actually drops without the skill. The new hire test identifies whether the skill encodes something a skilled developer would not know from training alone. Both tests together distinguish genuine institutional knowledge from documented common sense.

  1. Deletion test — Remove the skill and run the same prompts. If output quality is unchanged, the skill is encoding what Claude already knows. If Claude misses your specific requirements, the skill is earning its context cost. A scripted sketch of this test follows the list.
  2. New hire test — If a skilled developer joined your team and used Claude without this skill, where would their work fall short? "Nowhere specific" means the skill encodes what any competent developer already knows. "They would miss the walrus operator restriction" or "they would not include the migration plan section" means the skill encodes something real.
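The deletion test is scriptable. A minimal sketch, assuming project skills live under .claude/skills/ and that the claude CLI accepts -p to run a single prompt non-interactively; the skill directory and prompt are hypothetical:

```python
#!/usr/bin/env python3
"""Deletion test: run one prompt with and without a skill installed,
save both outputs for side-by-side review. A sketch, not a benchmark."""
import shutil
import subprocess
from pathlib import Path

SKILL_DIR = Path(".claude/skills/python-conventions")  # hypothetical skill
PROMPT = "Add retry logic to the S3 upload helper."    # hypothetical prompt


def run_claude(prompt: str) -> str:
    # Assumes `claude -p` runs Claude Code in non-interactive (print) mode.
    result = subprocess.run(
        ["claude", "-p", prompt], capture_output=True, text=True, check=True
    )
    return result.stdout


with_skill = run_claude(PROMPT)          # baseline: skill installed

backup = SKILL_DIR.with_suffix(".bak")   # temporarily remove the skill...
shutil.move(SKILL_DIR, backup)
try:
    without_skill = run_claude(PROMPT)   # ...rerun the identical prompt...
finally:
    shutil.move(backup, SKILL_DIR)       # ...and always restore it

Path("with_skill.md").write_text(with_skill)
Path("without_skill.md").write_text(without_skill)
print("Compare the two files: indistinguishable output means dead weight.")
```

The comparison stays manual by design: the question is whether the outputs differ in ways your reviewers care about, not whether the bytes differ.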

The new hire test has a measurable baseline: 72% of engineering leaders report that new hires take more than one month to submit their first three meaningful pull requests, with gathering project-specific context cited as the primary bottleneck (Cortex, State of Developer Productivity, 2024). That gap is exactly what institutional knowledge skills close.

"The failure mode isn't that the model is bad at the task — it's that the task wasn't specified tightly enough. Almost every production failure traces back to an ambiguous instruction." — Simon Willison, creator of Datasette and llm CLI (2024)

Tightly specifying something Claude already knows produces no improvement in output quality. Tightly specifying your institutional rules produces exactly the output gap that makes the skill worth building.

What Makes the Institutional Knowledge Advantage Compound Over Time?

Institutional knowledge changes. A skill that encodes your team's conventions from Q1 will need updating by Q3 as decisions evolve. A well-maintained institutional knowledge skill compounds in value because it captures the current state of decisions that are expensive to re-explain. Research on context window behavior shows why compactness matters: models use information placed in the middle of long contexts far less reliably than information at the beginning or end, which makes mid-context policy placement unreliable for production systems (Liu et al., "Lost in the Middle: How Language Models Use Long Contexts", arXiv 2307.03172, 2023). Precise, compact institutional knowledge skills fight that decay directly.

We track active lifetime as a quality signal for skill commissions. Institutional knowledge skills in production environments have an average active lifetime of 18 months before major revision (AEM internal, 2026). Duplicate-knowledge skills in the same environments are archived or deleted within 6 months, once the team notices they provide no measurable quality difference.

A skill built on institutional knowledge also improves under the self-improvement architecture. Every time the skill's output is reviewed and corrected, the correction is a new institutional decision: "we now prefer X over Y for this type of task." That feedback has nowhere to go in a duplicate-knowledge skill, because the knowledge it encodes is already stable and correct.

For building skills that last, see From Prompt to Production: The Five-Phase Skill Engineering Process. For instruction specificity decisions, see How Specific Should My Skill Instructions Be?

What Institutional Knowledge Is Worth Encoding First?

Start with constraints that cause rework, then conventions your tooling cannot enforce, then anything that takes more than 30 seconds to explain in onboarding, then the definition of "done" itself. These four tiers cover the institutional knowledge with the highest cost-of-absence: the rules that produce PR rejections, client revisions, and repeated onboarding questions when they are missing from a skill.

  1. Constraints that cause rework — If a missed rule regularly causes PR rejections or client revisions, that rule becomes a skill. The cost of not having the skill is measured in reviewer time and revision cycles.
  2. Conventions not enforced by tooling — Linters catch some things. Code style guides catch others. What falls through the gaps and requires human intervention is the highest-value category for skills.
  3. Context that takes more than 30 seconds to explain — If onboarding a new team member requires explaining something that takes a paragraph, that paragraph belongs in a skill.
  4. Quality bar definitions — The most commonly missing institutional knowledge is the definition of "done." What makes output acceptable versus not acceptable for your specific project.

The cost of missing constraints shows in review cycles. LinearB's analysis of 8.1 million pull requests found that half sit idle for more than 50% of their lifespan (LinearB, 2024). Skills that encode the constraints reviewers enforce most are the skills that shorten those cycles.

This pattern applies to single-team libraries. For multi-team or enterprise-scale libraries, the institutional knowledge tier gains an additional layer: organizational conventions that span teams but don't appear in any single project's codebase.

Frequently Asked Questions

The deletion test answers most edge cases: if removing the skill produces equivalent output, the skill encodes knowledge Claude already has. The questions below cover mixing strategies, scale effects, and the minimum institutional knowledge content a skill needs to justify its context cost.

Can a skill mix institutional knowledge and general best practices?

Yes, but the general best practices portion adds no value. A skill can say "follow our walrus operator restriction AND write clean Python" — the second clause costs tokens and adds nothing Claude doesn't already do. Strip general best practices from institutional knowledge skills to reduce description length and improve focus.
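A hypothetical before-and-after of that stripping step, assuming the skill's description field in its frontmatter:

```yaml
# Before: the second clause restates Claude's defaults and buys nothing
description: >-
  Enforce our Python 3.9 Lambda target (no match statements, no walrus
  operators) AND write clean, readable, well-documented Python.

# After: only the institutional rule remains
description: >-
  Enforce our Python 3.9 Lambda target: no match statements, no walrus
  operators, no 3.10+ features.
```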

What happens if I build a skill that Claude already knows well?

The output quality will be similar to Claude without the skill. The skill will consume 100 tokens of system prompt budget per session, occupy one discovery classification slot, and require maintenance effort. Eventually the team notices it provides no measurable improvement and archives it. The cost is low; the benefit is zero.

Can I turn a general best-practices document into a skill?

Not productively. "Follow the Google Python style guide" gives Claude a reference that largely overlaps with its training. Encoding specific deviations from the style guide, or additions your team has made to it, encodes institutional knowledge. Encoding the guide itself does not.

How do I know if my current skill library has too many duplicate-knowledge skills?

Test each skill with the deletion test: remove it, run the same prompts, compare output quality. Skills that produce indistinguishable output with and without the skill are duplicate-knowledge candidates for archiving.

Is there a heuristic for how much institutional knowledge a skill needs to justify its context cost?

One concrete rule that no general-purpose model would apply unprompted is sufficient to justify building the skill. One unconditional constraint your team enforces, one client-specific rule, one project-specific quality bar definition. If a skill contains zero rules that meet that bar, it is a duplicate-knowledge skill.
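As a hypothetical illustration of a single-rule skill that clears that bar, using the migration plan requirement from earlier:

```markdown
---
name: pr-migration-plan
description: Require a migration plan section on schema-touching PRs. Use when drafting or reviewing PR descriptions.
---

Any PR that touches the database schema must include a "Migration plan"
section. The plan must list rollback steps and estimated downtime.
```

One rule, grounded in a decision that exists nowhere in Claude's training data, is enough to pass the deletion test.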

Do duplicate-knowledge skills cause problems, or just fail to add value?

They cause a real problem at scale: they consume system prompt budget and contribute to the degradation threshold. In a library of 35 skills where 10 are duplicate-knowledge, removing those 10 restores 1,000 tokens of discovery budget and drops the active skill count to 25, below the performance degradation threshold. See At What Skill Count Does Claude's Performance Actually Degrade?

Last updated: 2026-05-04