Runbook: Webhook signature failure spike

Symptoms

  • Grafana: webhook_received_total{event_type="*", dedup_skipped="false"} AND HTTP 401 rate from /webhooks/github > 10/min sustained
  • Sentry: cluster of "Invalid signature" responses from webhook handler
  • GitHub App "Recent deliveries" page shows a spike of 401 responses

Severity & escalation

  • PAGE 24/7 — potential security incident (attempted webhook forgery OR our secret leak)
  • Ack window: 15 min
  • Escalate immediately if the pattern indicates an external attack (distributed IPs, scripted requests)
  • Engineering lead + security review

Immediate actions (< 5 min)

  1. Tail webhook traffic:
    cd apps/api && npx wrangler tail --config wrangler.toml --format=pretty | grep webhooks
  2. Check source distribution: Sentry → group by client IP / user agent. GitHub webhooks come from known IPs (the hooks list at https://api.github.com/meta)
  3. Check GitHub App settings: https://github.com/settings/apps/arno-dev-vadimpianov/advanced → recent deliveries
    • If all recent deliveries are 401 (including legitimate ones) → our secret is broken / mismatched
    • If only some are 401 → external scanner/attacker
  4. Check the secret: wrangler secret list --config wrangler.toml | grep WEBHOOK shows the secret name (not the value), so it only confirms the secret is set. Verify that it matches the GitHub App webhook secret in the dashboard
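
For step 2, the source check can be scripted. The sketch below tests client IPs against a list of CIDR ranges. The ranges shown are an illustrative sample and an assumption; the authoritative list is the hooks array returned by https://api.github.com/meta and it changes over time. The function names are hypothetical.

```python
import ipaddress

# Illustrative sample only (assumption). Fetch the live "hooks" array from
# https://api.github.com/meta before relying on this during an incident.
SAMPLE_HOOK_CIDRS = ["192.30.252.0/22", "185.199.108.0/22", "140.82.112.0/20"]


def is_github_hook_ip(ip: str, hook_cidrs: list[str]) -> bool:
    """Return True if ip falls inside any of the given CIDR ranges."""
    addr = ipaddress.ip_address(ip)
    for cidr in hook_cidrs:
        net = ipaddress.ip_network(cidr)
        # Only compare within the same IP family (the list can mix IPv4 and IPv6).
        if addr.version == net.version and addr in net:
            return True
    return False


def split_by_origin(ips: list[str], hook_cidrs: list[str]):
    """Partition client IPs into (from_github, unknown) lists."""
    github, unknown = [], []
    for ip in ips:
        (github if is_github_hook_ip(ip, hook_cidrs) else unknown).append(ip)
    return github, unknown
```

If the 401s come only from the unknown list, that points towards an external scanner; 401s from GitHub ranges point towards a secret mismatch.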

Diagnosis (5-20 min)

Branch A: Our secret mismatch (legitimate webhooks get 401)

  • Cause: webhook secret rotated in GitHub App settings but not updated in Workers (or vice versa)
  • Risk: all incoming webhooks are rejected → drift status not updated → component_md_raw stale
  • Recovery: re-sync secret (see below)
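
To see why a mismatch rejects every legitimate delivery, here is a minimal sketch of GitHub's signature scheme: the X-Hub-Signature-256 header carries "sha256=" followed by the hex HMAC-SHA256 of the raw request body. This is not the handler's actual code, which the runbook does not show; the function names and secret values are made up for illustration.

```python
import hashlib
import hmac


def sign(secret: str, body: bytes) -> str:
    """Build the value GitHub sends in the X-Hub-Signature-256 header."""
    digest = hmac.new(secret.encode(), body, hashlib.sha256).hexdigest()
    return "sha256=" + digest


def is_valid(secret: str, body: bytes, header: str) -> bool:
    """Compare expected and received signatures in constant time."""
    expected = sign(secret, body).encode()
    return hmac.compare_digest(expected, header.encode())


body = b'{"zen":"test"}'
header_from_github = sign("rotated-secret", body)   # GitHub already uses the new secret

ok_after_sync = is_valid("rotated-secret", body, header_from_github)
ok_with_stale = is_valid("stale-secret", body, header_from_github)
```

With the stale secret the same payload fails verification, which is the state in which every delivery gets a 401.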

Branch B: External attacker (random IPs, our legitimate webhooks still OK)

  • Cause: scanner probing endpoints, OR scripted attempt to forge webhook
  • Risk: low (HMAC verification is working as designed; the attacker's payloads will not be accepted)
  • Mitigation:
    • Cloudflare WAF rule: block requests to /webhooks/github without User-Agent: GitHub-Hookshot
    • Rate limit per IP in the CF dashboard
    • Log + monitor; don't escalate if there is no successful forgery

Branch C: Our secret leaked

  • Indicator: 200 responses to forged webhooks (we accept them as valid) → check pushed_by_us KV for unexpected SHAs
  • Verify: review git history for accidental secret commits, scan logs for the secret value
  • Recovery: rotate immediately, audit Worker writes triggered between leak time and rotation
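
The three branches can be summarized as a rough triage heuristic. The sketch below assumes each recent delivery has already been labelled with whether its source IP is in GitHub's hook ranges; the Delivery record and its fields are hypothetical, and the result is a starting point for investigation, not a verdict.

```python
from dataclasses import dataclass


@dataclass
class Delivery:
    from_github: bool  # hypothetical: source IP matched GitHub's hook ranges
    status: int        # HTTP status our handler returned


def classify(deliveries: list[Delivery]) -> str:
    """Map recent deliveries to a runbook branch (heuristic only)."""
    github = [d for d in deliveries if d.from_github]
    foreign = [d for d in deliveries if not d.from_github]
    # Branch C: a request that did not come from GitHub was accepted.
    if any(d.status == 200 for d in foreign):
        return "C"
    # Branch A: every legitimate delivery is being rejected.
    if github and all(d.status == 401 for d in github):
        return "A"
    # Branch B: only non-GitHub traffic is being rejected.
    if any(d.status == 401 for d in foreign):
        return "B"
    return "none"
```

Branch C is checked first because an accepted forgery is the most severe case even when other signals are present.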

Recovery

Issue                      | Action
---------------------------|-------
Secret mismatch (our side) | Generate new: openssl rand -hex 32 → update GitHub App settings → wrangler secret put GITHUB_APP_WEBHOOK_SECRET --config wrangler.toml (both must be the same)
External attack            | Add CF WAF rule blocking non-GitHub User-Agent on /webhooks/github. Log for forensics.
Secret leaked              | Rotate immediately. Audit all writes that happened between leak time and rotation. Notify affected projects if data integrity is affected.
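
For the leaked-secret audit, the set of writes to review is a timestamp window. A minimal sketch, assuming write records can be exported with a SHA and a timezone-aware timestamp; the record shape and the sample values are hypothetical.

```python
from datetime import datetime, timezone


def writes_to_audit(writes, leak_time, rotated_at):
    """Return write records whose timestamp lies in [leak_time, rotated_at]."""
    return [w for w in writes if leak_time <= w["at"] <= rotated_at]


# Hypothetical records; real ones would come from Worker logs or KV metadata.
writes = [
    {"sha": "a1", "at": datetime(2024, 1, 1, 9, 0, tzinfo=timezone.utc)},
    {"sha": "b2", "at": datetime(2024, 1, 1, 12, 30, tzinfo=timezone.utc)},
    {"sha": "c3", "at": datetime(2024, 1, 1, 15, 0, tzinfo=timezone.utc)},
]
leak = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
rotated = datetime(2024, 1, 1, 14, 0, tzinfo=timezone.utc)

suspect = writes_to_audit(writes, leak, rotated)
```

If the leak time is uncertain, use the earliest plausible time so the window errs on the wide side.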

Verification

  • webhook_received_total{status_code="200"} recovers to baseline
  • 401 rate < 0.5/min sustained
  • GitHub App "Recent deliveries" — recent ones are 200
  • Test webhook: trigger a ping from App settings → should return 200
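
The 401-rate check can be expressed as a small calculation over response timestamps. This sketch interprets "sustained" as the mean per-minute rate over the whole observation window, which is an assumption; the actual Grafana alert may use a different window or aggregation. Function names are hypothetical.

```python
from collections import Counter

PAGE_THRESHOLD = 10.0      # 401s per minute that triggers the page (from Symptoms)
RECOVERED_THRESHOLD = 0.5  # 401s per minute considered recovered (from Verification)


def per_minute_counts(timestamps: list[float], start: float, minutes: int) -> list[int]:
    """Count events in each one-minute bucket of [start, start + minutes * 60)."""
    end = start + minutes * 60
    buckets = Counter(int((t - start) // 60) for t in timestamps if start <= t < end)
    return [buckets.get(i, 0) for i in range(minutes)]


def is_recovered(counts: list[int]) -> bool:
    """True if the mean 401 rate over the window is below the recovery threshold."""
    return bool(counts) and sum(counts) / len(counts) < RECOVERED_THRESHOLD


# Four 401s (epoch-style seconds) spread over a ten-minute window.
counts = per_minute_counts([0, 30, 61, 500], start=0, minutes=10)
recovered = is_recovered(counts)
```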

Aftermath

  • Post-mortem trigger: any signature failure due to our config OR ANY leak suspicion
  • Document: cause, blast radius, rotation timeline
  • Add to quarterly secret rotation schedule if a leak occurred

Known false positives

  • GitHub App settings edit: briefly disables the webhook → recent deliveries show 401 temporarily during rotation
  • Webhook redelivery from old App config: if the webhook URL was recently changed, old deliveries fail. Do not PAGE.