# skill-maker
An Agent Skill that creates other agent skills.
## What it does
Skill-maker guides an AI coding agent through the full skill-creation lifecycle: intent capture, drafting a SKILL.md, running an eval loop with isolated subagents, refining based on grading signals, and optimizing the trigger description.
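The artifact being drafted and refined is a SKILL.md: a markdown file whose frontmatter carries the trigger description the agent matches against. A minimal sketch, with field names following Anthropic's Agent Skills convention; the skill content here is illustrative, not generated output:

```markdown
---
name: changelog-generator
description: Generates a changelog entry from recent commits. Use when the user asks to update or draft a CHANGELOG.
---

# changelog-generator

## Instructions
1. Read the commits since the last release tag.
2. Group changes by type (added, fixed, changed).

## Common mistakes
- Listing every commit verbatim instead of summarizing user-facing changes.
```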
## The 5 Phases
1. **Intent capture**: pin down what the skill should do and when it should trigger.
2. **Drafting**: write the initial SKILL.md.
3. **Eval loop**: run test cases against isolated subagents and grade the results.
4. **Refinement**: revise the skill based on grading signals.
5. **Description optimization**: tune the trigger description.
## The Eval Loop
The core of skill-maker. Each iteration:
- spawns isolated subagents per test case
- grades assertions with bundled Bun TypeScript scripts
- aggregates results into a benchmark
- iterates until the pass rate plateaus (delta < 2% for 3 consecutive runs)
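The plateau rule above can be sketched as a small check over the run history. The function and variable names here are illustrative, not skill-maker's actual API:

```typescript
// Pass rates are fractions in [0, 1]; epsilon = 0.02 mirrors the "< 2%" rule.
function hasPlateaued(history: number[], window = 3, epsilon = 0.02): boolean {
  // Need `window` consecutive deltas, i.e. window + 1 data points.
  if (history.length < window + 1) return false;
  const recent = history.slice(-(window + 1));
  // Plateau: every consecutive delta over the last `window` runs is below epsilon.
  for (let i = 1; i < recent.length; i++) {
    if (Math.abs(recent[i] - recent[i - 1]) >= epsilon) return false;
  }
  return true;
}

console.log(hasPlateaued([0.3, 0.6, 0.8, 0.81, 0.82, 0.82])); // deltas 0.01, 0.01, 0.00 → true
console.log(hasPlateaued([0.3, 0.6, 0.8, 0.9]));              // still improving → false
```

Checking absolute deltas (rather than requiring improvement) also stops the loop when scores oscillate without gaining ground.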
## Benchmark
Evaluated across 9 skills and 189 assertions, pairing with-skill and without-skill subagents on the same test cases. Baseline is the without-skill pass rate; since every skill converges to 100% after the eval loop, Delta = 100% - Baseline.
| Skill | Baseline | Delta |
|---|---|---|
| database-migration | 4.2% | +95.8% |
| pdf-toolkit | 4.2% | +95.8% |
| error-handling | 8.3% | +91.7% |
| api-doc-generator | 16.7% | +83.3% |
| pr-description | 20.8% | +79.2% |
| changelog-generator | 20.8% | +79.2% |
| monitoring-setup | 26.1% | +73.9% |
| code-reviewer | 41.7% | +58.3% |
| git-conventional-commits | 72.3% | +27.7% |
All skills reach 100% pass rate after the eval loop. See examples/README.md for convergence charts, timing data, and per-skill breakdowns.
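The Delta column is just the difference of the paired pass rates. A minimal sketch of that arithmetic; the `RunResult` shape is hypothetical, not the output format of the bundled scripts:

```typescript
// Hypothetical result shape for one subagent run over a skill's assertion set.
interface RunResult {
  passed: number; // assertions passed
  total: number;  // assertions attempted
}

const passRate = (r: RunResult): number => r.passed / r.total;

// Delta between the with-skill and without-skill runs of the same assertions.
function delta(withSkill: RunResult, baseline: RunResult): number {
  return passRate(withSkill) - passRate(baseline);
}

// A converged skill passes everything, so delta = 1 - baseline rate.
console.log(delta({ passed: 24, total: 24 }, { passed: 1, total: 24 })); // ≈ 0.958
```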
## vs Anthropic's official skill-creator
Head-to-head benchmark on the 3 highest-delta domains: same prompts, same assertions, different skill-creation approach. Scores are assertions passed out of 24.
| Domain | Ours | Official |
|---|---|---|
| database-migration | 24/24 | 21/24 |
| error-handling | 24/24 | 22/24 |
| pdf-toolkit | 24/24 | 24/24 |
The edge comes from the "Common mistakes" sections and reasoning-based instructions in our generated skills. See the full comparison report for per-assertion breakdowns and failure analysis.