nick-doc/08 - Operations/Backup & Recovery.md
2026-05-23 20:35:34 +03:30

---
title: Backup & Recovery
tags: [operations]
---
# Backup & Recovery
How to keep the marketplace recoverable from data loss. Covers MongoDB, Redis, the `uploads/` directory, and environment secrets, plus the disaster-recovery runbook.

---
## 1. RTO / RPO targets
| Asset | RPO (data loss tolerated) | RTO (downtime tolerated) | Backup cadence |
|-------|---------------------------|--------------------------|----------------|
| MongoDB | 1 hour | 1 hour | Hourly `mongodump` + nightly offsite |
| `uploads/` directory | 24 hours | 2 hours | Nightly sync to offsite (`aws s3 sync` or `rclone`) |
| Redis | 1 hour (regeneratable) | 0 minutes (app survives empty cache) | Nightly RDB snapshot |
| Production `.env` | n/a (manual) | 5 minutes | Stored in 1Password / Bitwarden vault |
| Container images | n/a (CI rebuilds) | 15 minutes | Tagged in registry by version |

Adjust these targets when product SLAs change.

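
A target in the RPO column is only met while a backup newer than that target exists. A minimal sketch of a freshness check, assuming GNU `find` and that dumps land in a single directory as in section 2.1; `check_rpo` and its arguments are hypothetical names, not part of the existing scripts:

```bash
#!/usr/bin/env bash
# check_rpo DIR GLOB MAX_AGE_MINUTES
# Exit 0 if DIR holds at least one file matching GLOB that was modified
# less than MAX_AGE_MINUTES ago, non-zero otherwise.
check_rpo() {
  local dir=$1 glob=$2 max_age_min=$3 fresh
  fresh=$(find "$dir" -maxdepth 1 -type f -name "$glob" \
            -mmin "-$max_age_min" -print -quit)
  [ -n "$fresh" ]
}

# Example (hypothetical alert): the MongoDB RPO is 1 hour.
# check_rpo /var/backups/mongo 'marketplace-*.gz' 60 || echo "RPO breached" >&2
```

If this runs from cron, allow some slack over the nominal RPO (for example 75 minutes for an hourly dump) so a dump that is still being written does not trigger a false alarm.
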
---
## 2. MongoDB
### 2.1 Dump
```bash
#!/usr/bin/env bash
# scripts/backup-mongo.sh — run hourly via cron
set -euo pipefail
STAMP=$(date -u +%FT%H%M%SZ)
DEST=/var/backups/mongo
mkdir -p "$DEST"
docker exec nickapp-mongodb \
mongodump --db=marketplace --archive --gzip \
> "$DEST/marketplace-$STAMP.gz"
# Prune dumps older than 14 days (every hourly dump inside that window is kept)
find "$DEST" -name 'marketplace-*.gz' -mtime +14 -delete
```
Cron entry:
```
0 * * * * /usr/local/bin/backup-mongo.sh >> /var/log/backup-mongo.log 2>&1
```
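The `find` line in the script prunes purely by age. If you would rather keep a tiered set (for example the newest 24 dumps plus one dump per day for 14 days), one possible sketch follows. It assumes the `marketplace-<date>T<time>Z.gz` naming produced by the script, so that names sort chronologically; `prune_tiered` is a hypothetical helper, not an existing script.

```bash
#!/usr/bin/env bash
# prune_tiered DIR [KEEP_HOURLY] [KEEP_DAILY]
# Keeps the newest KEEP_HOURLY dumps, then the newest remaining dump of each
# of the next KEEP_DAILY distinct days, and deletes everything else.
prune_tiered() {
  local dir=$1 keep_hourly=${2:-24} keep_daily=${3:-14}
  # Timestamped names sort chronologically, so a reverse sort is newest first.
  find "$dir" -maxdepth 1 -type f -name 'marketplace-*.gz' | sort -r |
  {
    local n=0 days=0 last_day="" f base day
    while IFS= read -r f; do
      n=$((n + 1))
      [ "$n" -le "$keep_hourly" ] && continue      # hourly tier
      base=${f##*/}                                 # marketplace-2026-05-20T030000Z.gz
      day=${base#marketplace-}; day=${day%%T*}      # 2026-05-20
      if [ "$day" != "$last_day" ]; then
        last_day=$day
        days=$((days + 1))
        [ "$days" -le "$keep_daily" ] && continue   # daily tier
      fi
      rm -f -- "$f"
    done
  }
}
```

Try it on a copy of the backup directory first; unlike the age-based `find`, it decides from file names, so a mis-named dump is never pruned.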
### 2.2 Offsite
Push the most recent dump to S3 (or Backblaze B2, or `rclone` to any provider) nightly:
```bash
DEST=/var/backups/mongo
# Pick the newest dump by modification time and upload only that file
LATEST=$(ls -1t "$DEST"/marketplace-*.gz | head -n 1)
aws s3 cp "$LATEST" "s3://marketplace-backups/mongo/" \
  --storage-class STANDARD_IA
```
Set a 90-day lifecycle policy on the bucket to age out old copies.
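
One way to express the 90-day policy, assuming the AWS CLI and the `mongo/` prefix used above; the rule ID and the file path are placeholders, and the function names are hypothetical:

```bash
#!/usr/bin/env bash
# Write an S3 lifecycle document that expires objects under mongo/ after 90 days.
write_lifecycle_policy() {
  cat > "$1" <<'JSON'
{
  "Rules": [
    {
      "ID": "expire-mongo-dumps",
      "Status": "Enabled",
      "Filter": { "Prefix": "mongo/" },
      "Expiration": { "Days": 90 }
    }
  ]
}
JSON
}

# Apply it to the bucket (needs credentials allowed to change the bucket lifecycle).
apply_lifecycle_policy() {
  aws s3api put-bucket-lifecycle-configuration \
    --bucket marketplace-backups \
    --lifecycle-configuration "file://$1"
}

# write_lifecycle_policy /tmp/lifecycle.json && apply_lifecycle_policy /tmp/lifecycle.json
```

`put-bucket-lifecycle-configuration` replaces the bucket's whole lifecycle configuration, so rules for other prefixes belong in the same file.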
### 2.3 Restore
> [!warning] Restoring is **destructive** to the current data. Always practise on a staging clone before doing it for real.
```bash
# Restore against an empty database (fresh container)
docker exec -i nickapp-mongodb \
  mongorestore --archive --gzip --drop \
  < /var/backups/mongo/marketplace-2026-05-20T030000Z.gz
# Verify (the database name is passed as mongosh's positional argument)
docker exec nickapp-mongodb mongosh marketplace --quiet \
  --eval "db.users.countDocuments()"
```
For partial restore (single collection):
```bash
docker exec -i nickapp-mongodb \
  mongorestore --archive --gzip --drop \
  --nsInclude='marketplace.payments' \
  < /var/backups/mongo/marketplace-2026-05-20T030000Z.gz
```
### 2.4 Validate backups
A monthly drill — restore the latest dump into a throwaway container and run smoke queries:
```bash
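# Overriding the image command means no mongod is running, so start one first
docker run --rm -v "$(pwd)/marketplace-latest.gz:/dump.gz:ro" mongo:8.2 \
  sh -c "mongod --fork --logpath /tmp/mongod.log --bind_ip 127.0.0.1 \
    && mongorestore --archive=/dump.gz --gzip \
    && mongosh --quiet --eval 'db.getMongo().getDBNames()'"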
```
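Before spending time on a full restore, a cheap first gate is to confirm that the file is non-empty and decompresses cleanly. This sketch assumes the dump is a single gzip stream, as written by `mongodump --archive --gzip`; `verify_dump` is a hypothetical helper, and passing it does not prove that the archive restores.

```bash
#!/usr/bin/env bash
# verify_dump FILE
# Fails if FILE is missing, empty, or not a readable gzip stream.
verify_dump() {
  local file=$1
  [ -s "$file" ] || { echo "empty or missing: $file" >&2; return 1; }
  gzip -t "$file" 2>/dev/null || { echo "corrupt gzip: $file" >&2; return 1; }
}

# Example: verify_dump ./marketplace-latest.gz && echo "ok to attempt restore"
```
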
If validation fails, treat it as a sev-2 incident (see [[Incident Response]]).

---
## 3. Redis
Redis data is regeneratable: losing it logs users out and empties the caches, but no business data is lost. It is still cheap to back up.
### 3.1 Snapshot
```bash
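# Trigger a save, wait until it has finished, then copy the snapshot out
mkdir -p /var/backups/redis
docker exec nickapp-redis redis-cli -a "$REDIS_PASSWORD" BGSAVE
until docker exec nickapp-redis redis-cli -a "$REDIS_PASSWORD" INFO persistence \
    | grep -q 'rdb_bgsave_in_progress:0'; do sleep 1; done
docker cp nickapp-redis:/data/dump.rdb "/var/backups/redis/redis-$(date -u +%FT%H%M%SZ).rdb"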
```
Daily cron is sufficient.
### 3.2 Restore
```bash
# Stop redis, drop the RDB into the volume, start
docker compose -f docker-compose.production.yml stop redis
docker cp /var/backups/redis/redis-2026-05-20T030000Z.rdb nickapp-redis:/data/dump.rdb
docker compose -f docker-compose.production.yml start redis
```
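If AOF is enabled, Redis loads the append-only file at startup instead of `dump.rdb`, so restore it as well: `appendonly.aof`, or the `appendonlydir/` directory on Redis 7 and later. See [[Database Operations#persistence]].
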
---
## 4. `uploads/` directory
Stored on the host at `/opt/backend/uploads/` and bind-mounted into both backend and nginx containers. This is where every user upload lives — losing it means broken images, missing dispute evidence, and unhappy users.
### 4.1 Nightly sync
```bash
aws s3 sync /opt/backend/uploads/ \
  s3://marketplace-backups/uploads/ --delete
# Or rclone to any provider
rclone sync /opt/backend/uploads/ backblaze:marketplace-uploads --transfers 8
```
Cron:
```
30 3 * * * /usr/local/bin/backup-uploads.sh >> /var/log/backup-uploads.log 2>&1
```
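The cron entry calls `backup-uploads.sh`, which is not shown here. A possible core for it is sketched below. The point of the guard is that a mirror with deletions, run against a missing or empty source (for example an unmounted disk), would wipe the offsite copy. `sync_uploads` and the `SYNC_CMD` override are hypothetical names, and the default command assumes the AWS CLI.

```bash
#!/usr/bin/env bash
# sync_uploads SRC DST
# Mirrors SRC to DST with deletions, but refuses to run when SRC is missing
# or empty, because --delete would then remove every offsite object.
# SYNC_CMD can be overridden for dry runs (assumption: defaults to the AWS CLI).
sync_uploads() {
  local src=$1 dst=$2
  if [ ! -d "$src" ] || [ -z "$(ls -A "$src" 2>/dev/null)" ]; then
    echo "refusing to sync: $src is missing or empty" >&2
    return 1
  fi
  # Left unquoted on purpose so "aws s3 sync" splits into separate words.
  ${SYNC_CMD:-aws s3 sync} "$src" "$dst" --delete
}

# Example: sync_uploads /opt/backend/uploads/ s3://marketplace-backups/uploads/
```

Setting `SYNC_CMD=echo` prints the command instead of running it, which is handy when testing the cron wiring.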
### 4.2 Restore
```bash
aws s3 sync s3://marketplace-backups/uploads/ /opt/backend/uploads/
# Fix ownership for the marketplace container (uid 1001)
chown -R 1001:1001 /opt/backend/uploads
```
Restart the backend container so any in-flight uploads find the right directory layout.

---
## 5. Secrets & configuration
### 5.1 `.env` files
The production `.env` lives at `/opt/backend/.env`. It is **not** version-controlled and **not** in any standard backup. Source of truth: the team password manager (1Password / Bitwarden vault).
After any change:
1. Update the host file.
2. Update the vault entry with the new value, a one-line "why", and the date.
3. `docker compose -f docker-compose.production.yml up -d` to apply.
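
After pasting a `.env` back from the vault, a quick check that the expected keys are present can catch a truncated paste before the stack starts. A sketch, assuming plain `KEY=value` lines; `missing_env_keys` is a hypothetical helper, and apart from `REDIS_PASSWORD` the key names in the usage line are placeholders:

```bash
#!/usr/bin/env bash
# missing_env_keys FILE KEY...
# Prints each KEY that has no "KEY=" line in FILE.
# Exit status is 0 only when no key is missing.
missing_env_keys() {
  local file=$1 key missing=0
  shift
  for key in "$@"; do
    if ! grep -q "^${key}=" "$file"; then
      echo "$key"
      missing=1
    fi
  done
  return "$missing"
}

# Example (MONGO_URI and JWT_SECRET are placeholder names):
# missing_env_keys /opt/backend/.env REDIS_PASSWORD MONGO_URI JWT_SECRET
```

The check only looks at key names, so no secret value is printed to the terminal or to logs.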
### 5.2 SSL certs
If you run a host-level Caddy / Nginx with Let's Encrypt, certs auto-renew. Back up `/var/lib/caddy/.local/share/caddy/` (Caddy) or `/etc/letsencrypt/` (Certbot) — useful if you migrate hosts.
### 5.3 Container registry credentials
`/root/.docker/config.json` on the production host holds the `git.manko.yoga` login Watchtower uses. Recreate after a rebuild:
```bash
docker login git.manko.yoga -u manawenuz
```
---
## 6. Disaster recovery runbook
> Scenario: production host is unrecoverable (disk failure, cloud provider lost the VM, etc.).
### Phase 1 — Provision
1. Spin up a new VM matching the previous spec (≥ 4 vCPU, 8 GB RAM, 100 GB SSD).
2. Install Docker Engine + compose plugin.
3. Point DNS at the new host, or stand up a temporary subdomain (`recovery.amn.gg`).
### Phase 2 — Code
```bash
cd /opt
git clone ssh://git@git.manko.yoga:222/nick/backend.git
git clone ssh://git@git.manko.yoga:222/nick/frontend.git
cd backend && git checkout main
```
### Phase 3 — Config
```bash
# Restore .env from the vault
nano /opt/backend/.env
# Restore nginx config
mkdir -p nginx/logs
# copy nginx.conf from the vault / repo / your laptop
```
### Phase 4 — Data
```bash
# Mongo
mkdir -p /var/backups/mongo
aws s3 cp s3://marketplace-backups/mongo/marketplace-LATEST.gz /var/backups/mongo/
# Uploads
mkdir -p /opt/backend/uploads
aws s3 sync s3://marketplace-backups/uploads/ /opt/backend/uploads/
chown -R 1001:1001 /opt/backend/uploads
# Redis (optional — empty is fine)
mkdir -p /var/backups/redis
aws s3 cp s3://marketplace-backups/redis/redis-LATEST.rdb /var/backups/redis/
```
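If the bucket holds only timestamped names rather than a `-LATEST` alias, the newest object can be resolved from the listing. A sketch, assuming the usual four-column `aws s3 ls` output and object names without spaces; `latest_key` is a hypothetical helper:

```bash
#!/usr/bin/env bash
# latest_key
# Reads `aws s3 ls` listing lines on stdin and prints the object name that
# sorts last, which is the newest one when names carry a UTC timestamp.
latest_key() {
  awk 'NF >= 4 { print $4 }' | sort | tail -n 1
}

# Example:
# KEY=$(aws s3 ls s3://marketplace-backups/mongo/ | latest_key)
# aws s3 cp "s3://marketplace-backups/mongo/$KEY" /var/backups/mongo/
```

Lines for sub-prefixes (`PRE name/`) have fewer than four fields and are skipped.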
### Phase 5 — Start stack
```bash
cd /opt/backend
docker login git.manko.yoga -u manawenuz
docker compose -f docker-compose.production.yml up -d
# wait ~60s
docker compose -f docker-compose.production.yml ps
```
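Instead of a fixed wait, the stack can be polled until it answers. A small retry helper is sketched below; `retry` is a hypothetical function, and the usage line assumes the health endpoint used in Phase 7.

```bash
#!/usr/bin/env bash
# retry ATTEMPTS DELAY_SECONDS CMD...
# Runs CMD until it succeeds or ATTEMPTS runs have failed.
retry() {
  local attempts=$1 delay=$2 i=1
  shift 2
  until "$@"; do
    [ "$i" -ge "$attempts" ] && return 1
    i=$((i + 1))
    sleep "$delay"
  done
}

# Example: poll for roughly a minute before giving up
# retry 12 5 curl -fsS -o /dev/null http://localhost:8083/api/health
```

A non-zero exit here is a signal to read the container logs before moving on to Phase 6.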
### Phase 6 — Restore data into running containers
```bash
# Mongo
docker exec -i nickapp-mongodb \
mongorestore --archive --gzip --drop \
< /var/backups/mongo/marketplace-LATEST.gz
# Redis
docker compose -f docker-compose.production.yml stop redis
docker cp /var/backups/redis/redis-LATEST.rdb nickapp-redis:/data/dump.rdb
docker compose -f docker-compose.production.yml start redis
```
### Phase 7 — Verify
```bash
curl -fsS http://localhost:8083/api/health | jq
docker exec nickapp-mongodb mongosh marketplace --quiet --eval "db.users.countDocuments()"
# Assumes the backend container is named nickapp-backend, like the other nickapp-* containers
docker logs --tail=200 nickapp-backend 2>&1 | grep -E "✅|❌"
```
### Phase 8 — Restart Watchtower & cut over DNS
```bash
docker run -d --name watchtower --restart unless-stopped \
-v /var/run/docker.sock:/var/run/docker.sock \
-v /root/.docker/config.json:/config.json \
-e WATCHTOWER_POLL_INTERVAL=300 \
-e WATCHTOWER_LABEL_ENABLE=true \
containrrr/watchtower
# Update DNS for amn.gg / dev.amn.gg to the new host's IP
```
### Phase 9 — Post-mortem
Write a post-mortem (template in [[Incident Response#postmortem-template]]) and update this runbook with anything that surprised you.

---
## 7. Quick-reference commands
```bash
# Mongo dump
docker exec nickapp-mongodb mongodump --db=marketplace --archive --gzip > backup.gz
# Mongo restore
docker exec -i nickapp-mongodb mongorestore --archive --gzip --drop < backup.gz
# Redis snapshot
docker exec nickapp-redis redis-cli -a "$REDIS_PASSWORD" BGSAVE
docker cp nickapp-redis:/data/dump.rdb redis.rdb
# Uploads to S3
rclone sync /opt/backend/uploads/ s3:marketplace-backups/uploads/
# Restore .env
# Pull from vault, paste into /opt/backend/.env, docker compose up -d
```
---
## 8. Testing the plan
> [!tip] Backups are not real until they've been restored. Drill quarterly:
>
> 1. Spin up a throwaway VM.
> 2. Walk Phases 2-7 of the DR runbook with the most recent backups.
> 3. Time it. If RTO is busted, fix the gap before the next drill.
> 4. Capture lessons in this file.
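
To make "time it" concrete, each phase of a drill can be wrapped in a timer that compares the elapsed time against the RTO budget from section 1. A sketch; `run_timed` is a hypothetical helper and `restore-mongo.sh` in the usage line is a placeholder for whatever script wraps the phase:

```bash
#!/usr/bin/env bash
# run_timed BUDGET_SECONDS CMD...
# Runs CMD, prints the elapsed wall-clock seconds, and fails if CMD fails
# or if the elapsed time exceeds BUDGET_SECONDS.
run_timed() {
  local budget=$1 start elapsed
  shift
  start=$(date +%s)
  "$@" || return 1
  elapsed=$(( $(date +%s) - start ))
  echo "elapsed=${elapsed}s budget=${budget}s"
  [ "$elapsed" -le "$budget" ]
}

# Example: the MongoDB RTO is 1 hour
# run_timed 3600 ./restore-mongo.sh
```

Keeping the printed timings from each drill makes it easy to see whether recovery is getting slower as the data grows.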