Keep Runbooks Executable
Require every runbook to carry real commands, a verification step, a rollback, and an owner.
The worst time to discover a runbook says "then verify it worked" with no command is 3am, mid-incident.
Runbooks rot silently: steps drift into prose, verification becomes a promise, and the owner left last quarter. Style linters can't see any of that — the document is perfectly well-formed Markdown. A contract beside the runbook checks the parts an on-call engineer actually needs: commands in the steps, a provable verification, a rollback path, and accountable metadata.
The contract
<!-- mdv: front owner required -->
<!-- mdv: front last-validated required date=iso -->
<!-- mdv: document noLLMResidue noTruncation noFenceWrapper -->
<!-- mdv: block required minWords=10 noPlaceholder -->
## When to use this runbook
The symptom that triggers this runbook — the alert, the dashboard reading, the user report.
<!-- mdv: endblock -->
<!-- mdv: block required contains=code noPlaceholder -->
## Steps
Each step needs the actual command, not "restart the service".
<!-- mdv: endblock -->
<!-- mdv: block required contains=code minWords=10 -->
## Verification
The command that proves the fix worked, and what its output should say.
<!-- mdv: endblock -->
<!-- mdv: block required minWords=5 -->
## Rollback
What to do if the steps above make things worse.
<!-- mdv: endblock -->contains=code is the load-bearing rule: a Steps section written as narrative ("restart the service and wait") fails until it carries an actual fenced command. The two front keys make ownership and freshness machine-readable — date=iso keeps last-validated a real date instead of "recently".
What it catches
The runbook that reads fine and helps nobody:
## When to use this runbook
When Redis has issues.
## Steps
Restart the Redis service and wait for it to come back. If that does not work, restart it
again or escalate to the platform team.
## Verification
TBD
## Rollback
N/Aredis-outage.md:1 error [missing-frontmatter-key] The frontmatter key "owner" is required, but is missing or empty.
redis-outage.md:2 error [missing-frontmatter-key] The frontmatter key "last-validated" is required, but is missing or empty.
redis-outage.md:12 error [missing-block] The "Steps" section should contain at least one code, but none was found.
redis-outage.md:18 error [missing-block] The "Verification" section should contain at least one code, but none was found.
redis-outage.md:6 error [too-few-words] The "When to use this runbook" section is too short (4 words).
redis-outage.md:18 error [too-few-words] The "Verification" section is too short (1 word).No owner, no validation date, prose where commands belong, and a TBD verification — exit 1 until someone writes the runbook they'd actually want to follow mid-incident.
Run it in CI
Name the contract as a sibling (redis-outage.mdv.md next to redis-outage.md) and the zero-config scan pairs them automatically:
npx mediva checkOr validate explicitly against a shared contract:
npx mediva check runbooks/redis-outage.md --schema runbooks/template.mdv.mdVariations
- Agents draft runbooks from incident transcripts now — add
document noLLMResidue noTruncation noFenceWrapper(as above) so assistant chatter, cut-off output, and fence-wrapped bodies never merge. - Game-days can't check what isn't there: mediva guarantees the shape a game-day needs (commands present, verification stated); whether the commands still work is the game-day's job.
- For a directory of runbooks sharing one contract, validate in a loop today; native series discovery is tracked in #428.