Incidents
Post-incident reports for service disruptions.
2026-02-27 — Game State Loss (Redis Restart)
Duration: ~06:37 UTC until manual detection
Severity: High
Impact: All in-progress games lost
What happened
Redis restarted during an unattended-upgrades cycle at 06:37 UTC. Since game state lived only in Redis with no durable backup, every active game was wiped. The cleanup cron then marked all affected tables as expired.
Decks, accounts, and match history were unaffected (stored in Postgres).
Root cause
- Redis had no persistence (no RDB/AOF) — a restart means total data loss
- No backup layer existed for game state
- needrestart was configured to auto-restart services, including Redis
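
The first root cause can be checked mechanically. Below is a minimal Python sketch, assuming the two strings passed in are the values Redis reports for CONFIG GET save and CONFIG GET appendonly (an empty save value means no RDB save points are configured). The function name has_persistence is hypothetical and not part of the original tooling.

```python
def has_persistence(save: str, appendonly: str) -> bool:
    """Return True if Redis is configured to write data to disk.

    save       -- value reported for CONFIG GET save, e.g. "60 1" or ""
    appendonly -- value reported for CONFIG GET appendonly, "yes" or "no"
    """
    # RDB snapshots are enabled when at least one save point is configured.
    rdb_enabled = save.strip() != ""
    # AOF is enabled when appendonly is set to "yes".
    aof_enabled = appendonly.strip().lower() == "yes"
    return rdb_enabled or aof_enabled


# Configuration at the time of the incident: no RDB and no AOF.
incident_config_ok = has_persistence("", "no")

# Configuration after remediation: one RDB save point.
remediated_config_ok = has_persistence("60 1", "no")
```

Under these assumptions, a check like this could run as a deploy-time or monitoring guard so that a Redis instance holding live game state without any persistence is flagged before the next restart.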
Remediation
- Disabled unattended-upgrades to prevent uncontrolled service restarts
- Enabled Redis RDB persistence (save 60 1): worst-case data loss reduced from "entire game" to "last 60 seconds of actions"
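
To illustrate what save 60 1 means, here is a rough Python sketch of the save-point rule as the Redis documentation describes it: a snapshot is triggered once at least the given number of seconds has passed and at least the given number of writes has occurred since the last save. The helper names parse_save_rules and snapshot_due are hypothetical, and the sketch models only the rule, not Redis itself.

```python
from typing import List, Tuple


def parse_save_rules(save: str) -> List[Tuple[int, int]]:
    """Parse a save value such as "60 1" into (seconds, changes) pairs."""
    parts = save.split()
    if len(parts) % 2 != 0:
        raise ValueError("save value must contain seconds/changes pairs")
    return [(int(parts[i]), int(parts[i + 1])) for i in range(0, len(parts), 2)]


def snapshot_due(rules: List[Tuple[int, int]],
                 seconds_since_last_save: int,
                 changes_since_last_save: int) -> bool:
    """Return True if any save point is satisfied.

    Redis evaluates this periodically, so the exact boundary is approximate.
    """
    return any(
        seconds_since_last_save >= seconds and changes_since_last_save >= changes
        for seconds, changes in rules
    )


rules = parse_save_rules("60 1")
# One write, 61 seconds after the last snapshot: a new snapshot is due.
due_after_write = snapshot_due(rules, 61, 1)
# Writes only 30 seconds after the last snapshot: not yet due,
# so these writes are the ones at risk if Redis dies right now.
due_too_early = snapshot_due(rules, 30, 5)
```

This is why the worst case shrinks to roughly the last 60 seconds of actions: any write that has been pending for longer than that should already have been captured by a snapshot.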