Claude Code Skills: Building Reusable Knowledge Packages for AI Agents
A project with 8 skills loads 500 tokens at startup. Loading everything would cost 70,000 tokens. That 140x difference is progressive disclosure, and it is the reason agent teams can carry deep domain knowledge without drowning in context.

Figure 1 - The Progressive Disclosure Advantage: Without skills, every agent loads all domain documentation into context at startup, consuming 70,000 tokens before any work begins. With skills, only a lightweight index loads at startup (500 tokens), and full skill content loads only when relevant to the current task. This 140x efficiency difference is what makes it practical for agent teams to carry deep domain knowledge.
You have a 6-agent team building a collaborative document editor. The CRDT specialist needs Yjs conflict resolution patterns. The Kubernetes agent needs your deployment conventions. The risk analyst needs VaR computation methodology. The frontend dev needs your component library patterns. Together, that is 10,000 lines of domain documentation.
You could dump all of it into CLAUDE.md. Every agent would load every line at startup, consuming 70,000 tokens of context before a single line of code gets written. The CRDT specialist pays the token cost for Kubernetes documentation it will never use. The frontend dev carries risk analysis methodology it has no use for.
Or you could use skills.
Claude Code Skills solve this with a pattern called progressive disclosure [1]. Each skill is a folder containing a SKILL.md file and optional reference materials. At startup, Claude reads only the YAML frontmatter: a name and description consuming perhaps 60 tokens per skill. The full content loads only when the skill matches the current task. Deeper reference files load only when the agent needs that specific detail.
The result: a project with 8 skills consumes 500 tokens at startup instead of 70,000. At any given moment, an agent typically has one skill body and one or two reference files loaded, perhaps 2,000 tokens instead of the full 70,000. The knowledge is comprehensive. The context cost is minimal.
This article covers how skills work, how to design them effectively, and the patterns that make the difference between a skill that gathers dust and one that transforms agent productivity.
What Are Skills?
Skills are structured knowledge packages that live in your project’s .claude/skills/ directory [1]. Each skill is a folder containing a SKILL.md file and optionally a reference/ directory with additional documentation and utility scripts.

Figure 2 - The Skills Directory Structure: Each skill is a self-contained folder with a SKILL.md file (the main knowledge document) and optional reference files for deep dives and utility scripts. The 3-tier structure maps directly to progressive disclosure: frontmatter is always available, the body loads when relevant, and reference files load only when specifically needed.
The key design principle is that skills are discovered by metadata but loaded by need. At session startup, Claude reads only the YAML frontmatter of each SKILL.md, just the name and description fields. When the agent encounters a task where the skill is relevant (based on the name and description matching the task context), it loads the SKILL.md body. If it needs deeper detail, it loads specific reference files. This three-tier loading strategy keeps context lean while making comprehensive knowledge available [1].
How Progressive Disclosure Works
Let’s trace what happens when an agent encounters a task that requires CRDT knowledge.
Tier 1, Always loaded (startup): The agent’s context includes a skills index showing available skills:
```
Available skills:
- crdt-implementation: "Conflict-free replicated data type implementation
  patterns for real-time collaborative editing. Use when implementing
  document sync, conflict resolution, or operational transform logic."
- kubernetes-deployment: "Kubernetes deployment patterns and conventions..."
- portfolio-risk-analysis: "Portfolio risk assessment methodology..."
```
This costs perhaps 200 tokens total for all skill descriptions. The agent sees what is available without loading any of the actual content.
Tier 2, Loaded on relevance: When the agent’s current task involves implementing real-time sync, it recognizes the crdt-implementation skill is relevant and reads the SKILL.md body. This might be 200-400 lines covering the overall approach, key decisions, and pointers to deeper references.
Tier 3, Loaded on specific need: When the agent needs to implement a specific conflict resolution strategy, it reads reference/conflict_resolution.md. When it needs Yjs-specific API patterns, it reads reference/yjs_patterns.md. Each reference file is loaded individually, only when needed.

Figure 3 - Three-Tier Progressive Disclosure in Action: Knowledge loads in 3 stages, each triggered by increasing specificity of need. At startup, only descriptions load (200 tokens). When a task matches, the skill body loads (1,500 tokens). When specific detail is needed, individual reference files load (500 tokens each). At any moment, the agent carries exactly the knowledge it needs, and no more.
The practical impact is significant. Consider a project with 8 skills, each containing 300 lines in the SKILL.md body and 1,000 lines across reference files. Loading everything at startup would cost approximately 70,000 tokens of context. With progressive disclosure, the startup cost is approximately 500 tokens (just descriptions), and at any given time, typically only one skill’s body plus one or two reference files are loaded, perhaps 2,000 tokens instead of 70,000.
Progressive disclosure turns comprehensive documentation into efficient context. You can bundle 10,000 lines of domain knowledge into skills without paying any context cost until the knowledge is actually needed. The startup cost is proportional to the number of skills (their descriptions), not the total volume of knowledge they contain.
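The arithmetic is worth making explicit. A quick sketch using the article's illustrative per-skill figures (estimates, not measurements; the article rounds 480 up to 500):

```python
# Back-of-envelope token accounting for progressive disclosure.
# All figures are the article's illustrative estimates, not measurements.
n_skills = 8
desc_tokens = 60          # frontmatter per skill, always loaded
body_tokens = 1_500       # one SKILL.md body, loaded on relevance
ref_tokens = 500          # one reference file, loaded on specific need
eager_load = 70_000       # cost of loading all 8 skills up front

startup = n_skills * desc_tokens                  # the index alone
working = startup + body_tokens + 2 * ref_tokens  # one body + two refs

print(f"startup:       ~{startup} tokens")            # ~480
print(f"working set:   ~{working} tokens")            # ~2,980 vs 70,000 eager
print(f"startup ratio: ~{eager_load // startup}x")    # ~145x; the article rounds to 140x
```

The key property is visible in the first line of the computation: startup cost scales with `n_skills`, never with the volume of knowledge behind the descriptions.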
Executable Scripts
Skills can include scripts in their reference/scripts/ directory that Claude runs without loading the source code into context. Only the script’s output consumes tokens. This is powerful for computational tasks.
Consider a risk calculation script that is 200 lines of Python: pandas imports, numpy computations, statistical functions, data loading, error handling. If the agent loaded this source into context, it would consume approximately 2,000 tokens. Instead, the agent runs the script and receives 3 lines of output:
```
VaR (95%): -0.0234
CVaR (95%): -0.0389
Portfolio Beta: 1.15
```
Three lines. Perhaps 20 tokens. The computation happened; the context stayed clean.

Figure 4 - Executable Scripts: 100x Token Efficiency: When computation lives in bundled scripts, the agent runs the script and receives only the output, 20 tokens instead of 2,000. The script source never enters context. This is the most token-efficient pattern for any skill involving data processing, metric computation, or complex validation.
```python
# This script is 200 lines, but the agent never loads it into context.
# It runs the script and only sees the output.
import pandas as pd
import numpy as np
from scipy import stats

def compute_var(returns, confidence=0.95):
    """Compute Value at Risk using historical simulation."""
    return np.percentile(returns, (1 - confidence) * 100)

def compute_cvar(returns, confidence=0.95):
    """Compute Conditional Value at Risk."""
    var = compute_var(returns, confidence)
    return returns[returns <= var].mean()

# ... 180 more lines of computation ...

if __name__ == "__main__":
    data = pd.read_csv("/data/portfolio_returns.csv")
    print(f"VaR (95%): {compute_var(data['returns']):.4f}")
    print(f"CVaR (95%): {compute_cvar(data['returns']):.4f}")
    print(f"Portfolio Beta: {compute_beta(data):.4f}")
```
The agent runs python reference/scripts/compute_risk_metrics.py and gets 3 lines of output in its context, not 200 lines of script source.
Anatomy of a SKILL.md File
A well-structured SKILL.md has 3 parts: YAML frontmatter (required), a body section (the main content), and references to bundled files.
The Frontmatter
The frontmatter is the skill’s index card: what Claude reads at startup to decide whether this skill exists and when to use it.
```yaml
---
name: crdt-implementation
description: >
  Conflict-free replicated data type (CRDT) implementation patterns
  for real-time collaborative editing. Use when implementing document
  sync, conflict resolution, or operational transform logic. Covers
  Yjs library patterns, document model design, and WebSocket sync.
---
```
Name constraints: Maximum 64 characters, lowercase with hyphens only. The name should be descriptive enough that Claude can match it to relevant tasks. crdt-implementation is clear; utils is not [1].
Description constraints: Maximum 1,024 characters. No XML tags, no reserved words. The description is the discovery mechanism; it is how Claude decides whether to load the skill. Write it as if you are telling a colleague when they should consult this reference. Include specific triggers: “Use when implementing…”, “Use when debugging…”, “Use when configuring…” [1].
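These constraints are mechanical enough to check before committing a skill. A hypothetical lint helper (the function name and error messages are mine; the limits are the documented ones):

```python
import re

# Lowercase words separated by single hyphens, e.g. "crdt-implementation".
NAME_RE = re.compile(r"^[a-z0-9]+(-[a-z0-9]+)*$")

def lint_frontmatter(name: str, description: str) -> list[str]:
    """Check a skill's name and description against the documented
    constraints. Hypothetical helper; returns a list of problems."""
    problems = []
    if len(name) > 64:
        problems.append("name exceeds 64 characters")
    if not NAME_RE.match(name):
        problems.append("name must be lowercase with hyphens only")
    if len(description) > 1024:
        problems.append("description exceeds 1,024 characters")
    if re.search(r"<[^>]+>", description):
        problems.append("description must not contain XML tags")
    return problems

print(lint_frontmatter("crdt-implementation",
                       "Use when implementing document sync."))  # []
print(lint_frontmatter("Utils!", "Helpful patterns."))
```

Running a check like this in CI keeps malformed frontmatter from silently breaking skill discovery.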

Figure 5 - Anatomy of a SKILL.md File: Three distinct sections, each with a specific purpose and loading trigger. The frontmatter is the discovery mechanism (always loaded, approximately 60 tokens). The body provides working-level guidance (loaded on relevance, under 500 lines). Reference pointers connect to deep-dive files (loaded individually on specific need). Together, they create a knowledge package that is comprehensive yet context-efficient.
The Body
The SKILL.md body provides the working-level guidance that an agent needs to apply the skill effectively. Keep it under 500 lines [1]. This is not a comprehensive reference manual; it is the information an experienced developer would want before starting implementation.
```markdown
# CRDT Implementation Guide

## Chosen Approach: Yjs

This project uses Yjs as the CRDT library for real-time collaboration.

## Document Model

The document is represented as a Y.Doc with the following structure:

- Y.XmlFragment for rich text content
- Y.Map for document metadata (title, author, last modified)
- Y.Array for version history entries

## Key Decisions

1. **Merge strategy**: Last-writer-wins for metadata, CRDT merge for content
2. **Persistence**: Y.Doc state is serialized to PostgreSQL on every change
3. **Transport**: WebSocket with binary encoding (more efficient than JSON)

## Common Patterns

### Creating a shared document

See reference/yjs_patterns.md for the full Yjs API patterns.
```
Notice the pointer to the reference file for deeper detail. The body gives enough context to start working; the reference provides the exhaustive detail for specific sub-problems.
Encode project-specific decisions, not general knowledge. Claude already knows what WebSockets are and how CRDTs work. Your skill should capture how this project uses them: the specific library, the connection pattern, the message format, the merge strategy. Project-specific decisions are what make skills valuable; general knowledge is what makes them bloated.
Bundled Reference Files
Reference files in the reference/ directory contain deep-dive documentation on specific topics. These are loaded individually when the agent needs that specific knowledge.
Keep references one level deep. A reference file should not point to another reference file that points to another. One level of progressive disclosure (SKILL.md to reference file) is the practical limit before agents get lost in a documentation tree [1].
Three Types of Skills
Skills serve 3 distinct purposes, and designing for the right type determines how effective the skill will be.
Domain Knowledge Skills
The most common type. These encode specialized knowledge about a specific technology, methodology, or domain.
```yaml
---
name: portfolio-risk-analysis
description: >
  Portfolio risk assessment methodology including VaR, CVaR, beta
  computation, and hedging strategies. Use when computing risk metrics,
  assessing portfolio exposure, or recommending hedging actions.
---
```
Domain knowledge skills work best when they capture project-specific decisions rather than general knowledge. Claude already knows what VaR is; what it does not know is that this project uses historical simulation rather than parametric VaR, that the confidence level is 95%, and that hedging recommendations must include cost estimates in basis points.
Workflow Pattern Skills
These encode how to perform a multi-step process, covering what to do and in what order.
```yaml
---
name: safe-deployment
description: >
  Production deployment workflow with pre-flight checks, staged rollout,
  and automated rollback. Use when deploying any service to production
  or staging environments.
---
```
The body of a workflow skill reads like a playbook: pre-flight checklist, deployment stages with specific thresholds (error rate below 0.1%, P95 latency below 500ms), rollback criteria, and pointers to the detailed rollback playbook in the reference directory.
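Thresholds stated that precisely are easy to turn into code. A minimal sketch of the gate a bundled script might apply between rollout stages, using the two thresholds quoted above (the function name and signature are hypothetical):

```python
def gate_passes(error_rate: float, p95_latency_ms: float) -> bool:
    """Staged-rollout gate using the playbook's thresholds:
    error rate below 0.1% and P95 latency below 500ms."""
    return error_rate < 0.001 and p95_latency_ms < 500.0

# Promote to the next stage only when both thresholds hold.
print(gate_passes(0.0005, 320.0))  # within both thresholds
print(gate_passes(0.0020, 320.0))  # error rate too high: roll back
```

Encoding the criteria this way means the rollback decision is deterministic rather than left to the agent's judgment.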
Workflow skills are particularly powerful when combined with embedded hooks. A deployment skill can include hooks that validate each step of the checklist, creating a self-enforcing workflow.
Utility Script Skills
These primarily provide executable tools that agents can run. The SKILL.md explains when and how to use the scripts; the scripts themselves do the heavy lifting.
```yaml
---
name: data-quality-validation
description: >
  Data quality validation utilities. Use when ingesting data from
  external sources, after ETL transformations, or before loading data
  into production databases. Includes schema validation, completeness
  checks, and anomaly detection scripts.
---
```
Figure 6 - Three Types of Skills: Domain Knowledge skills encode project-specific decisions and technical context. Workflow Pattern skills encode step-by-step processes with validation criteria. Utility Script skills provide executable computational tools. All 3 types follow the same progressive disclosure pattern; the difference is what they contain, not how they load.
The agent reads the SKILL.md to understand which script to run for its current task, executes the script, and uses the output, never loading hundreds of lines of validation logic into its context.
Skills with Embedded Hooks
The most advanced skill pattern combines knowledge, workflow instructions, and embedded hooks into a single distributable package. The skill tells the agent what to do, how to do it, and automatically validates that it was done correctly.
```yaml
---
name: api-endpoint-development
description: >
  API endpoint development patterns with automatic validation. Use when
  creating new REST API endpoints, modifying existing routes, or adding
  API middleware.
hooks:
  PostToolUse:
    - matcher: "Write|Edit"
      hooks:
        - type: command
          command: "$CLAUDE_PROJECT_DIR/.claude/skills/api-endpoint-development/scripts/validate_endpoint.sh"
  Stop:
    - matcher: "*"
      hooks:
        - type: command
          command: "$CLAUDE_PROJECT_DIR/.claude/skills/api-endpoint-development/scripts/check_api_tests.sh"
---
```
When an agent uses this skill, the embedded hooks activate automatically. Every file write triggers endpoint validation. Session completion requires passing API tests. The skill is self-contained: knowledge, workflow, and quality enforcement in one package.
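To make the mechanism concrete, here is a sketch of what such a validator could do, written in Python rather than shell. Assumptions are flagged: the payload fields (tool_input with file_path and content) follow Claude Code's Write tool, exit code 2 is the hooks convention for blocking with feedback, and the /api/ path convention and docstring rule are invented for illustration.

```python
import json
import sys

def check_endpoint(payload: dict) -> tuple[int, str]:
    """Return (exit_code, message) for a PostToolUse payload.
    Exit code 2 blocks the action and feeds the message back to the
    agent; 0 lets it through. The validation rule here is illustrative:
    API route modules must document their handlers."""
    tool_input = payload.get("tool_input", {})
    file_path = tool_input.get("file_path", "")
    if "/api/" in file_path and file_path.endswith(".py"):  # assumed layout
        content = tool_input.get("content", "")
        if "def " in content and '"""' not in content:
            return 2, "API endpoint files must document their handlers"
    return 0, ""

if __name__ == "__main__":
    # Hooks receive the event payload as JSON on stdin.
    code, message = check_endpoint(json.load(sys.stdin))
    if message:
        print(message, file=sys.stderr)
    sys.exit(code)
```

The agent never sees this source; it only sees the blocking message when validation fails, which is exactly the feedback it needs to self-correct.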

Figure 7 - Self-Validating Skills: Knowledge + Workflow + Enforcement: The most powerful skill pattern bundles all 3 elements. Knowledge tells the agent what to do. Workflow tells it how to do it. Embedded hooks validate that it was done correctly. Install the skill, and you get the knowledge and the enforcement with no additional configuration required.
This pattern makes skills distributable as plugins. A team can create a secure-coding skill with embedded security scanning hooks, publish it, and any project that installs it gets both the security best practices documentation and the automated enforcement, with no additional configuration required [2].
Skills with embedded hooks create self-validating workflows: knowledge, process, and quality enforcement bundled into a single distributable package. Install the skill, and you get the expertise and the guardrails. This is what makes skills more than documentation; it makes them executable standards.
Architectural Example
To illustrate how skills fit into a real architecture, consider a financial research agent team with specialized analysts. Each agent needs different domain knowledge, but the skills system ensures context stays efficient.
```
.claude/skills/
├── market-regime-detection/
│   ├── SKILL.md                     # What regimes exist, how to detect them
│   └── reference/
│       ├── regime_indicators.md     # VIX thresholds, correlation benchmarks
│       ├── historical_regimes.md    # Past regime changes for calibration
│       └── scripts/
│           └── compute_regime_signals.py
├── portfolio-risk-analysis/
│   ├── SKILL.md                     # Risk assessment methodology
│   └── reference/
│       ├── var_methodology.md       # VaR/CVaR computation approaches
│       ├── hedging_strategies.md    # Common hedging instruments and costs
│       └── scripts/
│           └── compute_risk_metrics.py
├── earnings-analysis/
│   ├── SKILL.md                     # How to analyze earnings reports
│   └── reference/
│       ├── earnings_template.md     # Standard analysis format
│       ├── key_metrics.md           # Revenue, EPS, guidance metrics
│       └── scripts/
│           └── parse_earnings_data.py
└── swarm-orchestration/
    ├── SKILL.md                     # How to reconfigure the swarm
    └── reference/
        ├── team_compositions.md     # Optimal team for each regime
        ├── handoff_protocol.md      # How agents hand off work
        └── reconfiguration_playbook.md
```
Figure 8 - Skills in a Financial Research Swarm: Each agent loads only the skills relevant to its role. The risk monitor uses portfolio-risk-analysis. The earnings analyst uses earnings-analysis. The regime detector uses market-regime-detection. The team lead uses swarm-orchestration. Total context cost across all 4 agents: approximately 8,300 tokens instead of 280,000 if everything were loaded into every agent.
At startup, all agents see the descriptions of all 4 skills (approximately 300 tokens). The risk monitor agent loads portfolio-risk-analysis when computing VaR. The earnings analyst loads earnings-analysis when processing quarterly reports. The team lead loads swarm-orchestration when regime changes require team reconfiguration. No agent ever loads skills it does not need.
The regime detection skill’s bundled script (compute_regime_signals.py) is particularly effective here. The script contains complex statistical computation (VIX analysis, correlation matrix calculation, sector dispersion measurement) that would consume significant context if loaded as source code. Instead, the agent runs the script and receives a compact JSON output of regime signals, consuming perhaps 20 tokens instead of 2,000.
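A hedged sketch of what such a bundled script might look like. The thresholds and signal names are invented for illustration (a real script would calibrate them against reference/historical_regimes.md); only the final compact JSON line ever reaches the agent's context:

```python
# Hypothetical sketch of compute_regime_signals.py: the agent runs this
# and sees only the one-line JSON output, never the source.
import json

def classify_regime(vix: float, avg_correlation: float) -> str:
    """Toy regime classification; thresholds are illustrative only."""
    if vix > 30 or avg_correlation > 0.8:
        return "risk-off"
    if vix < 15 and avg_correlation < 0.4:
        return "risk-on"
    return "transitional"

if __name__ == "__main__":
    # In the real script these inputs would be computed from market data
    # (VIX analysis, correlation matrices, sector dispersion).
    vix, avg_corr = 32.5, 0.83
    print(json.dumps({
        "regime": classify_regime(vix, avg_corr),
        "vix": vix,
        "avg_correlation": avg_corr,
    }))
```

Whatever statistical machinery sits behind the classification, the interface to the agent stays the same: one line of structured output.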
Common Anti-Patterns
Every team building skills hits the same failure modes.
The kitchen-sink skill. A single skill that tries to cover everything: coding standards, deployment, testing, security, and performance. This defeats the purpose of progressive disclosure because the entire body loads whenever any sub-topic is relevant. Split it into focused skills.
The description-less skill. A skill with a vague description like “project utilities” or “helpful patterns.” Claude cannot match this to tasks reliably, so the skill rarely gets loaded when it is actually needed. Descriptions should be specific and action-oriented.
The copy-paste skill. A skill that duplicates content from CLAUDE.md or from another skill. This creates maintenance burden and risks inconsistency. Each piece of knowledge should live in exactly one place.
The script-in-body skill. Including long utility scripts directly in the SKILL.md body instead of as bundled scripts. This wastes context tokens because the full script loads whenever the skill is referenced, even if the agent only needs to run it. Put scripts in reference/scripts/ where the agent can execute them without loading the source.
The infinite-depth skill. Reference files that point to other reference files that point to more reference files. Agents get lost in deep documentation trees. Keep it to one level: SKILL.md to reference file.

Figure 9 - Five Anti-Patterns That Kill Skill Effectiveness: Each anti-pattern undermines the progressive disclosure model that makes skills efficient. Kitchen-sink skills defeat focused loading. Vague descriptions prevent discovery. Copy-paste creates maintenance debt. Scripts in the body waste context. Deep reference chains confuse agents. Each has a concrete fix.
The skill description is the discovery mechanism. If Claude cannot match your description to a task, the skill never loads when it is needed. Write descriptions as if you are telling a colleague exactly when to consult this reference. Include specific task verbs: “Use when implementing…”, “Use when debugging…”, “Use when deploying…”
Best Practices
Write descriptions as discovery triggers. The description is how Claude decides whether to load a skill. Include specific task verbs: “Use when implementing…”, “Use when debugging…”, “Use when deploying…” If the description is vague (“general utilities”), Claude will not reliably match it to relevant tasks [1].
Keep the SKILL.md body under 500 lines. The body should provide working-level guidance, not encyclopedic coverage. Anything deeper belongs in reference files. A 500-line body is enough for the key decisions, common patterns, a few code examples, and pointers to deeper references [1].
Use one level of progressive disclosure. SKILL.md to reference files is the practical limit. Do not create reference files that point to other reference files. If your knowledge structure is that deep, reorganize it into multiple skills or flatten the hierarchy.
Use executable scripts for computation. Any skill involving data processing, metric computation, or complex validation should include scripts. The agent runs the script and receives output; the script source never enters context [1].
Match skills to agent roles. In a team, design skills so that each agent typically needs only 1 or 2 skills for its role. If a single agent needs 5 skills loaded simultaneously, either the agent’s role is too broad or the skills are too granular.
Use consistent terminology. If your project calls something a “document” in CLAUDE.md, call it a “document” in your skills too, not a “file” or “artifact” or “page.” Inconsistent terminology confuses agents and reduces the reliability of skill discovery [1].

Figure 10 - Skill Design Best Practices Checklist: Eight principles that separate effective skills from documentation that gathers dust. Each practice directly supports the progressive disclosure model, ensuring skills are discoverable, efficiently loaded, and contain the right level of detail for each tier.
Avoid time-sensitive information. Skills should contain patterns and knowledge that remain stable. Do not include version numbers that change frequently, links that might break, or information about current market conditions. Skills are reference material, not news feeds [1].
Test skills with different models. A skill that works well with Opus might be too ambiguous for Haiku. Test your skills across the models your team uses to ensure the instructions are clear enough for the least capable model that will use them [1].
Conclusion
Skills represent a careful solution to the fundamental tension in agent systems between comprehensive knowledge and efficient context usage. By encoding domain expertise, workflow patterns, and utility scripts into progressively disclosed packages, skills give agents access to deep knowledge without the context cost of loading everything upfront.
The most effective skills share common characteristics: specific discovery-oriented descriptions, concise bodies under 500 lines that capture project-specific decisions, reference files for deep dives into specific topics, and executable scripts that keep computation out of context. When combined with embedded hooks, skills become self-validating workflows that bundle knowledge, process, and quality enforcement into distributable packages.

Figure 11 - Skills in the Agent Ecosystem: Skills sit between the CLAUDE.md project constitution and the individual agents, providing on-demand knowledge through progressive disclosure. The constitution sets standards. Agents work within scope. Skills deliver domain expertise exactly when needed. Embedded hooks enforce quality. Together, they create an architecture where comprehensive knowledge and efficient context coexist.
As agent teams grow in size and tackle more complex domains, skills become the knowledge infrastructure that makes specialization practical. Without skills, every agent would need to carry every piece of domain knowledge in its context. With skills, each agent loads exactly the knowledge it needs, exactly when it needs it.
The most valuable skill you can write is the one that captures the decisions your team has already made: the specific choices, conventions, and patterns that no external documentation covers. That is the knowledge that turns a general-purpose AI into a domain specialist.
The Series
This is Part 3 of a 6-part series on Claude Code:
- Orchestrating AI Agent Teams — The control layer architecture that makes autonomous coding reliable
- Building Effective Claude Code Agents — Agent definitions, tool restrictions, and least privilege
- Claude Code Skills (this article) — Progressive disclosure and reusable knowledge packages
- Claude Code Hooks — PreToolUse, PostToolUse, and deterministic enforcement
- Claude Code Agent Teams — Multi-agent coordination and file ownership
- Claude Code Security — Defense-in-depth with agents, skills, hooks, commands, and teams
References
[1] Anthropic, “Skill authoring best practices,” Claude Platform Documentation, 2025. https://platform.claude.com/docs/en/agents-and-tools/agent-skills/best-practices
[2] Anthropic, “Create plugins,” Claude Code Documentation, 2025. https://code.claude.com/docs/en/plugins
[3] Anthropic, “Automate workflows with hooks,” Claude Code Documentation, 2025. https://code.claude.com/docs/en/hooks-guide
[4] E. Schluntz and B. Zhang, “Building effective agents,” Anthropic Engineering Blog, Dec 2024. https://www.anthropic.com/engineering/building-effective-agents
[5] J. Young et al., “Effective harnesses for long-running agents,” Anthropic Engineering Blog, Nov 2025. https://www.anthropic.com/engineering/effective-harnesses-for-long-running-agents
[6] N. Carlini, “Building a C compiler with a team of parallel Claudes,” Anthropic Engineering Blog, Feb 2025. https://www.anthropic.com/engineering/building-c-compiler
[7] Disler, “Agentic Finance Review,” GitHub Repository, 2025. https://github.com/disler/agentic-finance-review
[8] Disler, “Claude Code Hooks Mastery,” GitHub Repository, 2025. https://github.com/disler/claude-code-hooks-mastery
[9] Anthropic, “Orchestrate teams of Claude Code sessions,” Claude Code Documentation, 2025. https://code.claude.com/docs/en/agent-teams
[10] Anthropic, “Extend Claude Code,” Claude Code Documentation, 2025. https://code.claude.com/docs/en/features-overview
[11] A. Osmani, “Claude Code Swarms,” AddyOsmani.com, Feb 2026. https://addyosmani.com/blog/claude-code-agent-teams/
[12] Disler, “Claude Code Hooks Multi-Agent Observability,” GitHub Repository, 2025. https://github.com/disler/claude-code-hooks-multi-agent-observability