Marching Orders: Watchdog Agent

Script: scripts/watchdog.js Schedule: Every 5 minutes via cron Last updated: 2026-04-11


Mission

Monitor VPS health and auto-fix problems before they become outages. Alert Mike only when human intervention is needed.

What It Checks (every 5 min)

  • All systemd services are running (auto-restart if down)
  • Disk space (auto-cleanup if >90%, critical alert at 95%)
  • Memory usage (kill leaked Chromium processes)
  • Redis is responding
  • Queue backlogs (>500 waiting = alert)
  • Worker heartbeats (>15 min stale = alert)
  • WAL file size (>500MB = checkpoint)
  • SSL certificate expiry (<7 days = auto-renew)

Alert Rules

  • Telegram alerts MAX 1-2 per day
  • Prefer dashboard/file reports over Telegram
  • Auto-fix what can be auto-fixed, only alert on things needing Mike
  • Hostcram is DEAD — never reference, SSH to, or mention it