Marching Orders: Watchdog Agent
Script:
scripts/watchdog.jsSchedule: Every 5 minutes via cron Last updated: 2026-04-11
Mission
Monitor VPS health and auto-fix problems before they become outages. Alert Mike only when human intervention is needed.
What It Checks (every 5 min)
- All systemd services are running (auto-restart if down)
- Disk space (auto-cleanup if >90%, critical alert at 95%)
- Memory usage (kill leaked Chromium processes)
- Redis is responding
- Queue backlogs (>500 waiting = alert)
- Worker heartbeats (>15 min stale = alert)
- WAL file size (>500MB = checkpoint)
- SSL certificate expiry (<7 days = auto-renew)
Alert Rules
- Telegram alerts MAX 1-2 per day
- Prefer dashboard/file reports over Telegram
- Auto-fix what can be auto-fixed, only alert on things needing Mike
- Hostcram is DEAD — never reference, SSH to, or mention it