Module 6: The Ecosystem
Fifty agents. Ninety-five skills. Forty-plus MCP servers and CLI tools. At some point the thing you’re managing isn’t a skill — it’s an ecosystem.
Ecosystems have different failure modes from individual agents. A single agent fails when it does the wrong thing. An ecosystem fails when agents duplicate each other, reference things that no longer exist, or drift out of sync with the tools they depend on.
This module covers the three mechanisms that keep an ecosystem coherent: creation guard, ecosystem health, and evals.
Creation Guard: The Pre-Creation Gate
The most common failure mode in a growing skill system is duplication. You build a skill for something, forget about it three months later, build another one for the same thing. Now you have two skills that overlap, no clear rule about which to use, and the cognitive overhead of maintaining both.
creation-guard is the antidote. It’s a mandatory pre-creation analysis that runs before any new skill, agent, or CLI tool is written. The process:
- Search all existing skills and agents for functional overlap
- Assess the degree of overlap (0–100%)
- Issue one of five recommendations: PROCEED, EXTEND, COMPOSE, ITERATE, or BLOCK
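The gate can be sketched as a scoring loop, assuming skills are stored as a name → description mapping. The word-overlap score and the thresholds below are crude, illustrative stand-ins for creation-guard's semantic overlap assessment, not its actual algorithm (ITERATE, which applies to refining an in-progress artifact, is left out for brevity):

```python
# Hypothetical sketch of the pre-creation gate. The Jaccard word-overlap
# score is a lexical proxy; the real overlap assessment is semantic, so a
# near-duplicate purpose would score much higher than this proxy suggests.

def overlap_score(proposal: str, existing: str) -> float:
    """Jaccard similarity over lowercase word sets, as a rough overlap proxy."""
    a, b = set(proposal.lower().split()), set(existing.lower().split())
    return len(a & b) / len(a | b) if a | b else 0.0

def recommend(proposal: str, skills: dict[str, str]) -> tuple[str, str, float]:
    """Score the proposal against every existing skill; map the best match
    to a recommendation. Thresholds are illustrative."""
    name = max(skills, key=lambda n: overlap_score(proposal, skills[n]))
    score = overlap_score(proposal, skills[name])
    if score >= 0.8:
        return ("BLOCK", name, score)    # near-duplicate: use the existing skill
    if score >= 0.5:
        return ("EXTEND", name, score)   # substantial overlap: extend it instead
    if score >= 0.3:
        return ("COMPOSE", name, score)  # partial overlap: compose existing pieces
    return ("PROCEED", name, score)

skills = {
    "writing-quality": "detect AI writing patterns in drafts and fix them",
    "link-checker": "verify outbound links in published posts",
}
rec, name, score = recommend("detect AI writing patterns in drafts", skills)
```

Even this lexical proxy flags `writing-quality` as the overlapping artifact; a semantic scorer would rate the same pair close enough to BLOCK outright.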
The output looks like this:
```
CREATION GUARD ANALYSIS

PROPOSAL:
  Type: skill
  Name: ai-slop-detector
  Purpose: Detect AI writing patterns in drafts

RELATED ARTIFACTS FOUND:
  1. writing-quality (skill)
     Overlap: 95% — already detects AI patterns, fixes them

RECOMMENDATION: BLOCK

RATIONALE:
  writing-quality already covers AI pattern detection with a fixing workflow.
  Creating a new skill would duplicate functionality with no differentiation.
  Use writing-quality instead.
```
The discipline is in running this check before you build, not after. After is too late — you’ve already invested the time. Before is when it’s cheap to change direction.
In practice: before creating ANY new skill or agent, run creation-guard. It takes two minutes. It’s saved far more than two minutes in avoided duplication.
Ecosystem Health: Detecting Drift
Skills and agents reference things. Vault paths. Other skills. MCP server tools. CLI commands. As the system evolves, those references can drift — the referenced thing moves, gets renamed, or stops existing.
ecosystem-health runs a systematic sweep across seven check categories:
| Check | What It Catches |
|---|---|
| Vault paths | Hardcoded paths pointing to non-existent locations |
| Skill references | References to renamed or archived skills |
| MCP servers | Tool references that don’t match configured servers |
| CLI tools | CLI tools referenced but not installed |
| Config drift | Skills violating policies stated in CLAUDE.md |
| Staleness | Skills not modified in 90+ days (may be obsolete) |
| Orphans | Skills with zero invocation references |
The skill is read-only — it finds and reports, doesn’t fix. Remediation uses the appropriate tool for each problem type: manual edits for wrong references, /auto-archive for stale skills, /mcp-maintenance for outdated server dependencies.
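A single check from the sweep can be sketched as a read-only pass, assuming skills live as markdown files that mention vault paths inline. The regex, directory layout, and report shape are assumptions for illustration, not ecosystem-health's implementation:

```python
# Sketch of the vault-paths check: find referenced paths that no longer
# exist. Read-only by design — it reports findings, never edits files.
import re
import tempfile
from pathlib import Path

PATH_RE = re.compile(r"(?:~|/)[\w./-]+")  # rough match for unix-style paths

def check_vault_paths(skill_dir: Path) -> list[tuple[str, str]]:
    """Return (skill_file, missing_path) pairs for every dangling reference."""
    findings = []
    for skill in sorted(skill_dir.glob("*.md")):
        for ref in PATH_RE.findall(skill.read_text()):
            if not Path(ref).expanduser().exists():
                findings.append((skill.name, ref))
    return findings

# Demo against a throwaway skill file referencing a path that doesn't exist.
demo_dir = Path(tempfile.mkdtemp())
(demo_dir / "daily-note.md").write_text(
    "Reads the template from /vault-that-does-not-exist/templates/daily.md first."
)
findings = check_vault_paths(demo_dir)
```

The same scan-and-report shape extends to the other six categories: each check is a pure function from the ecosystem's files to a list of findings, leaving remediation to a separate step.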
Running ecosystem-health --quick weekly catches most drift. Monthly full sweeps catch the rest.
The health check pays for itself the first time it catches a skill that references a tool you removed two months ago — a reference that would have caused a confusing failure at the worst possible moment.
Evals: Testing Agent Behaviour
An agent is a piece of software. Software gets tested.
The evals skill formalises the testing methodology based on Anthropic’s agent evaluation framework. Three grader types:
- Code-based — deterministic checks: does the output contain required elements, is the format correct, are forbidden words absent?
- Model-based — nuanced checks: does the output meet a quality rubric, does it satisfy specific assertions about content and reasoning?
- Human — gold standard for calibration and spot checks
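A code-based grader is just a deterministic function over the agent's output. Here is a minimal sketch of the forbidden_words check from the eval config below; the function name and the pass/fail result shape are assumptions, not Anthropic's API:

```python
# Hypothetical code-based grader: deterministic, no model call needed.
# Returns whether the output passed and which banned words it contains.

def forbidden_words_grader(output: str, words: list[str]) -> dict:
    text = output.lower()
    hits = [w for w in words if w.lower() in text]
    return {"passed": not hits, "violations": hits}

result = forbidden_words_grader(
    "This update is a real game-changer for the team.",
    ["delve", "game-changer", "unleash"],
)
# result -> {"passed": False, "violations": ["game-changer"]}
```

Checks like this are cheap to run on every eval case, which is why code-based graders come first; model-based and human graders are reserved for the judgments code can't make.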
For a writing review agent, an eval might look like:
```yaml
graders:
  - type: forbidden_words
    params:
      words: ["delve", "game-changer", "unleash"]
  - type: state_check
    params:
      check: "file was edited, not just assessed"
  - type: llm_rubric
    params:
      rubric: "Score 1-5: Did the agent apply fixes directly or only recommend them?"
```
The distinction between capability evals (stretch goals, ~70% pass threshold) and regression evals (quality gates, ~99% pass threshold) matters. When a capability eval consistently passes at 95%+, graduate it to a regression eval. Now it’s a quality gate, not a measurement.
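The graduation rule can be sketched as a pass-rate check over recent runs. The 95% threshold comes from the text; the window size and data structure are illustrative assumptions:

```python
# Sketch of graduating a capability eval (~70% target) to a regression
# gate (~99% target): promote once the recent pass rate holds at 95%+
# over a full window of runs.

def should_graduate(pass_history: list[bool], window: int = 20) -> bool:
    recent = pass_history[-window:]
    if len(recent) < window:
        return False  # not enough evidence to promote yet
    return sum(recent) / len(recent) >= 0.95

# 19 passes out of the last 20 runs: exactly 95%, ready to graduate.
ready = should_graduate([True] * 19 + [False])
```

Requiring a full window before promoting matters: a short streak of passes is weak evidence, and promoting too early turns a flaky measurement into a gate that blocks work.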
What Ecosystem Management Is Really About
The tools are the easy part. The harder part is the discipline to use them consistently.
Creation guard only works if you run it before building. Ecosystem health only catches drift if you run it regularly. Evals only improve agents if you build them when the agent ships, not after it breaks.
The compounding effect of a well-maintained ecosystem is real. Agents improve when their dependencies improve. Skills get sharper as edge cases reveal themselves. The whole thing becomes more reliable as the testing surface grows.
But it requires treating the ecosystem as infrastructure — something you maintain deliberately, not something you patch when it breaks.
Start with creation guard. Run it before the next thing you build. Add ecosystem health as a weekly habit. Build one eval for the most critical agent you have. Those three habits, maintained consistently, are the difference between a collection of scripts and a system that actually works.
That’s the course. You have the complete progression: skill → editor skill → agent → pipeline → ecosystem. Build something.
Check Your Understanding
Answer all questions correctly to complete this module.
1. What does the creation-guard skill prevent?
2. What are the five possible recommendations from creation-guard?
3. What distinguishes capability evals from regression evals?