ops: nightly DB backup + self-hosted uptime monitoring

Backup (production data-loss protection; was none):
- meezi-backup sidecar in docker-compose.yml runs pg_dump nightly at 02:00 Tehran, gzip, 14-day rotation, atomic .partial→final, into ./backups (persists across deploys; rsync off-box per RESTORE.md).
- Wired into the deploy job (up -d --no-deps backup); takes one dump on boot.
- scripts/backup/pg-backup-loop.sh + RESTORE.md (restore + off-box guidance).

Monitoring:
- docker-compose.monitoring.yml: Uptime Kuma stack (own volume), stood up once, independent of app deploys.
- Caddyfile status.{$DOMAIN} route; docs/monitoring.md lists the exact monitors (incl. /q guest-menu 200 check) + TLS-expiry alerts (catches the ~90-day cert breakage early) + alert-channel setup.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-15 18:45:07 +03:30
parent d407f0b3e9
commit 32a7cf5b25
7 changed files with 231 additions and 0 deletions
@@ -0,0 +1,47 @@
+# Meezi uptime monitoring (Uptime Kuma)
+
+Self-hosted uptime + TLS-expiry monitoring with alerting. Runs as a separate
+compose stack so it stays up independently of app deploys.
+
+## Stand it up (one time, on the prod host)
+```bash
+cd /path/to/meezi
+docker compose -f docker-compose.monitoring.yml up -d
+```
+Then either:
+- add a DNS A record `status.meezi.ir → server IP` and reload Caddy
+  (`docker exec meezi-caddy caddy reload` or restart the caddy stack) — the
+  `status.{$DOMAIN}` block is already in the Caddyfile, **or**
+- reach it directly at `http://SERVER:3201` for the initial setup.
+
+First visit creates the admin account — set a strong password.
+
+## Monitors to add (in the Uptime Kuma UI)
+Add one **HTTP(s)** monitor per public surface, interval 60s, accept 2xx/3xx:
+
+| Name | URL | Notes |
+|------|-----|-------|
+| Website | https://meezi.ir/fa | marketing |
+| Dashboard | https://app.meezi.ir/fa/login | merchant panel |
+| API health | https://api.meezi.ir/api/public/security-config | returns JSON 200 |
+| Koja | https://koja.meezi.ir/fa | public discovery |
+| Admin | https://admin.meezi.ir | internal panel |
+| Guest menu | https://app.meezi.ir/q/healthcheck | should be 200 (not 500) |
+
+For each HTTPS monitor enable **"Certificate Expiry Notification"** — this
+catches the recurring ~90-day Let's Encrypt cert-chain breakages early
+(see the mirror-cert runbook). Set the threshold to 14 days.
+
+## Alerts
+Settings → Notifications → add a channel (Telegram bot or email/SMTP), then
+attach it to every monitor. Telegram is simplest: create a bot via @BotFather,
+get the bot token and the chat id, paste both into Uptime Kuma.
+
+## What this does NOT replace
+- **Backups** — see `scripts/backup/RESTORE.md`.
+- **Crash auto-recovery** — Docker `restart: unless-stopped` already restarts
+  crashed containers; Uptime Kuma tells you when one is flapping or down.
+
+## Status page (optional)
+Uptime Kuma can publish a public status page (Settings → Status Pages) at
+`status.meezi.ir/status/meezi` if you want customers to see uptime.
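
The monitor table in this doc can be spot-checked from a shell before the monitors are entered in the Uptime Kuma UI. The following is a minimal sketch, not part of this commit: it assumes `curl` is installed, and the names `MONITOR_URLS`, `is_accepted_status`, `check_all` and the 10-second timeout are illustrative choices. It applies the same "accept 2xx/3xx" rule the doc asks for.

```bash
# Sketch only: spot-check the monitor URLs listed in docs/monitoring.md.
# Function names and the 10-second timeout are assumptions for illustration.

# URLs copied from the monitor table, one per line.
MONITOR_URLS="
https://meezi.ir/fa
https://app.meezi.ir/fa/login
https://api.meezi.ir/api/public/security-config
https://koja.meezi.ir/fa
https://admin.meezi.ir
https://app.meezi.ir/q/healthcheck
"

# Succeeds only for a three-digit 2xx or 3xx status code.
is_accepted_status() {
  case "$1" in
    2[0-9][0-9]|3[0-9][0-9]) return 0 ;;
    *) return 1 ;;
  esac
}

# Prints one OK/FAIL line per URL; returns non-zero if any URL fails.
check_all() {
  failed=0
  for url in $MONITOR_URLS; do
    # -o /dev/null discards the body; -w prints only the status code.
    # curl reports 000 when the connection itself fails.
    code=$(curl -s -o /dev/null -w '%{http_code}' --max-time 10 "$url") || true
    if is_accepted_status "$code"; then
      echo "OK   $code $url"
    else
      echo "FAIL $code $url"
      failed=1
    fi
  done
  return "$failed"
}
```

Source the file and run `check_all`. A `FAIL` line for the guest-menu URL would correspond to the 500 case the table warns about. This is a one-off check and does not replace the 60s monitors or the certificate-expiry notifications.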