MeDiVa
DocsRecipes

Keep Runbooks Executable

Require every runbook to carry real commands, a verification step, a rollback, and an owner.

The worst time to discover a runbook says "then verify it worked" with no command is 3am, mid-incident.

Runbooks rot silently: steps drift into prose, verification becomes a promise, and the owner left last quarter. Style linters can't see any of that — the document is perfectly well-formed Markdown. A contract beside the runbook checks the parts an on-call engineer actually needs: commands in the steps, a provable verification, a rollback path, and accountable metadata.

The contract

runbooks/redis-outage.mdv.md
<!-- mdv: front owner required -->
<!-- mdv: front last-validated required date=iso -->

<!-- mdv: document noLLMResidue noTruncation noFenceWrapper -->

<!-- mdv: block required minWords=10 noPlaceholder -->
## When to use this runbook

The symptom that triggers this runbook — the alert, the dashboard reading, the user report.
<!-- mdv: endblock -->

<!-- mdv: block required contains=code noPlaceholder -->
## Steps

Each step needs the actual command, not "restart the service".
<!-- mdv: endblock -->

<!-- mdv: block required contains=code minWords=10 -->
## Verification

The command that proves the fix worked, and what its output should say.
<!-- mdv: endblock -->

<!-- mdv: block required minWords=5 -->
## Rollback

What to do if the steps above make things worse.
<!-- mdv: endblock -->

contains=code is the load-bearing rule: a Steps section written as narrative ("restart the service and wait") fails until it carries an actual fenced command. The two front keys make ownership and freshness machine-readable — date=iso keeps last-validated a real date instead of "recently".

What it catches

The runbook that reads fine and helps nobody:

## When to use this runbook

When Redis has issues.

## Steps

Restart the Redis service and wait for it to come back. If that does not work, restart it
again or escalate to the platform team.

## Verification

TBD

## Rollback

N/A
redis-outage.md:1   error  [missing-frontmatter-key] The frontmatter key "owner" is required, but is missing or empty.
redis-outage.md:2   error  [missing-frontmatter-key] The frontmatter key "last-validated" is required, but is missing or empty.
redis-outage.md:12  error  [missing-block] The "Steps" section should contain at least one code, but none was found.
redis-outage.md:18  error  [missing-block] The "Verification" section should contain at least one code, but none was found.
redis-outage.md:6   error  [too-few-words] The "When to use this runbook" section is too short (4 words).
redis-outage.md:18  error  [too-few-words] The "Verification" section is too short (1 word).

No owner, no validation date, prose where commands belong, and a TBD verification — exit 1 until someone writes the runbook they'd actually want to follow mid-incident.

Run it in CI

Name the contract as a sibling (redis-outage.mdv.md next to redis-outage.md) and the zero-config scan pairs them automatically:

npx mediva check

Or validate explicitly against a shared contract:

npx mediva check runbooks/redis-outage.md --schema runbooks/template.mdv.md

Variations

  • Agents draft runbooks from incident transcripts now — add document noLLMResidue noTruncation noFenceWrapper (as above) so assistant chatter, cut-off output, and fence-wrapped bodies never merge.
  • Game-days can't check what isn't there: mediva guarantees the shape a game-day needs (commands present, verification stated); whether the commands still work is the game-day's job.
  • For a directory of runbooks sharing one contract, validate in a loop today; native series discovery is tracked in #428.

On this page