---
title: Backup & Recovery
tags: [operations]
---

# Backup & Recovery

How to keep the marketplace recoverable from data loss. Covers MongoDB, Redis, the `uploads/` directory, and environment secrets, plus the disaster-recovery runbook.

---

## 1. RTO / RPO targets

| Asset | RPO (data loss tolerated) | RTO (downtime tolerated) | Backup cadence |
|-------|---------------------------|--------------------------|----------------|
| MongoDB | 1 hour | 1 hour | Hourly `mongodump` + nightly offsite |
| `uploads/` directory | 24 hours | 2 hours | Nightly sync to offsite |
| Redis | 24 hours (regeneratable) | 0 minutes (app survives empty cache) | Nightly RDB snapshot |
| Production `.env` | n/a (manual) | 5 minutes | Stored in 1Password / Bitwarden vault |
| Container images | n/a (CI rebuilds) | 15 minutes | Tagged in registry by version |

Adjust these targets when product SLAs change.
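
The RPO column only holds if a recent backup actually exists on disk. A minimal sketch of a freshness check follows; the function name `backup_is_fresh` and the thresholds in the usage comments are assumptions for illustration, not part of the existing scripts.

```bash
#!/usr/bin/env bash
# Sketch: succeed only if DIR holds a file matching PATTERN that was
# modified less than MAX_AGE_MINUTES ago.
# Usage: backup_is_fresh DIR PATTERN MAX_AGE_MINUTES
backup_is_fresh() {
  local dir=$1 pattern=$2 max_age_min=$3
  local fresh
  # -mmin -N matches files modified less than N minutes ago
  fresh=$(find "$dir" -maxdepth 1 -type f -name "$pattern" \
            -mmin "-$max_age_min" -print 2>/dev/null | head -n 1)
  [ -n "$fresh" ]
}

# Example thresholds (assumed; they leave slack for the job's own runtime):
#   backup_is_fresh /var/backups/mongo 'marketplace-*.gz' 90   || echo "Mongo dump is stale"
#   backup_is_fresh /var/backups/redis 'redis-*.rdb'      1500 || echo "Redis snapshot is stale"
```

Running this from cron or a monitoring agent turns a silently failing backup job into an alert before the RPO is breached.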

---

## 2. MongoDB

### 2.1 Dump

```bash
#!/usr/bin/env bash
# scripts/backup-mongo.sh: run hourly via cron
set -euo pipefail

STAMP=$(date -u +%FT%H%M%SZ)
DEST=/var/backups/mongo
OUT="$DEST/marketplace-$STAMP.gz"
mkdir -p "$DEST"

# Dump to a temporary name so a failed run never leaves a truncated .gz behind
trap 'rm -f "$OUT.partial"' EXIT
docker exec nickapp-mongodb \
  mongodump --db=marketplace --archive --gzip \
  > "$OUT.partial"
mv "$OUT.partial" "$OUT"

# Prune dumps older than 14 days (every hourly dump inside that window is kept)
find "$DEST" -name 'marketplace-*.gz' -mtime +14 -delete
```
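
The `find ... -mtime +14 -delete` line keeps every hourly dump for two weeks. If disk space matters, a tiered policy (all dumps from the last 24 hours, then one per UTC day up to 14 days) could look like the sketch below. The function name `prune_tiered` is hypothetical, and it assumes file names follow the `marketplace-<date>T<time>Z.gz` pattern produced by the dump script.

```bash
#!/usr/bin/env bash
# Sketch of tiered retention for DIR:
#   - dumps younger than 24 h: keep all
#   - older dumps: keep only the newest dump of each UTC day
#   - anything past the 14-day cutoff: delete
prune_tiered() {
  local dir=$1
  find "$dir" -maxdepth 1 -type f -name 'marketplace-*.gz' -mtime +14 -delete

  local f day last_day=""
  # Reverse lexical sort puts the newest timestamp first, so the first file
  # seen for a given day is the one to keep.
  while IFS= read -r f; do
    day=${f##*/marketplace-}   # e.g. 2026-05-20T030000Z.gz
    day=${day%%T*}             # e.g. 2026-05-20
    if [ "$day" = "$last_day" ]; then
      rm -f -- "$f"
    else
      last_day=$day
    fi
  done < <(find "$dir" -maxdepth 1 -type f -name 'marketplace-*.gz' \
             -mmin +1440 -print | sort -r)
}

# Example:
#   prune_tiered /var/backups/mongo
```

Try it against a copy of the backup directory first; the function deletes files without prompting.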

Cron entry (assuming the script is installed to `/usr/local/bin/`):

```
0 * * * * /usr/local/bin/backup-mongo.sh >> /var/log/backup-mongo.log 2>&1
```

### 2.2 Offsite

Push the most recent dump to S3 (or Backblaze B2, or any provider `rclone` supports) nightly:

```bash
DEST=/var/backups/mongo
# Newest dump by modification time
LATEST=$(ls -1t "$DEST"/marketplace-*.gz | head -n 1)

aws s3 cp "$LATEST" "s3://marketplace-backups/mongo/" \
  --storage-class STANDARD_IA
```

Set a 90-day lifecycle rule on the `mongo/` prefix to age out old copies. Scope it to that prefix: the same bucket also holds `uploads/`, which must not expire.
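
A 90-day expiry can be expressed as an S3 lifecycle configuration. The sketch below writes one to a file; the helper name `write_lifecycle_policy` and the rule ID are made up for illustration, and the rule is limited to the `mongo/` prefix on the assumption that the uploads in the same bucket should be kept indefinitely.

```bash
#!/usr/bin/env bash
# Sketch: write a lifecycle configuration that expires objects under mongo/
# 90 days after creation. The rule ID is an arbitrary label.
write_lifecycle_policy() {
  local out=$1
  cat > "$out" <<'JSON'
{
  "Rules": [
    {
      "ID": "expire-old-mongo-dumps",
      "Status": "Enabled",
      "Filter": { "Prefix": "mongo/" },
      "Expiration": { "Days": 90 }
    }
  ]
}
JSON
}

# Apply it (needs credentials allowed to edit the bucket's lifecycle):
#   write_lifecycle_policy /tmp/lifecycle.json
#   aws s3api put-bucket-lifecycle-configuration \
#     --bucket marketplace-backups \
#     --lifecycle-configuration file:///tmp/lifecycle.json
```

Note that `put-bucket-lifecycle-configuration` replaces the bucket's whole lifecycle configuration, so include any existing rules in the same file.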

### 2.3 Restore

> [!warning] Restoring is **destructive** to the current data. Always practise on a staging clone before doing it for real.

```bash
# Restore against an empty database (fresh container)
docker exec -i nickapp-mongodb \
  mongorestore --archive --gzip --drop \
  < /var/backups/mongo/marketplace-2026-05-20T030000Z.gz

# Verify
docker exec nickapp-mongodb mongosh marketplace --quiet \
  --eval "db.users.countDocuments()"
```

For a partial restore (single collection):

```bash
docker exec -i nickapp-mongodb \
  mongorestore --archive --gzip --drop \
  --nsInclude='marketplace.payments' \
  < /var/backups/mongo/marketplace-2026-05-20T030000Z.gz
```
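
Because `--drop` removes existing collections before loading, it is worth checking the archive before either restore above. A minimal sketch, assuming the dump is an ordinary gzip stream as written by `mongodump --archive --gzip`; the function name `verify_dump` is hypothetical.

```bash
#!/usr/bin/env bash
# Sketch: refuse to proceed unless FILE exists, is non-empty, and passes
# gzip's integrity test. This catches truncated or corrupt files; it does
# not prove the BSON inside is restorable.
verify_dump() {
  local file=$1
  [ -s "$file" ] || { echo "missing or empty: $file" >&2; return 1; }
  gzip -t "$file" 2>/dev/null || { echo "corrupt gzip stream: $file" >&2; return 1; }
}

# Example:
#   verify_dump /var/backups/mongo/marketplace-2026-05-20T030000Z.gz \
#     && docker exec -i nickapp-mongodb mongorestore --archive --gzip --drop \
#          < /var/backups/mongo/marketplace-2026-05-20T030000Z.gz
```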

### 2.4 Validate backups

Run a monthly drill: restore the latest dump into a throwaway container and run smoke queries.

```bash
# Overriding the image's command means mongod is not started, so start one by hand first
docker run --rm -v "$(pwd)/marketplace-latest.gz:/dump.gz:ro" mongo:8.2 \
  sh -c "mongod --fork --logpath /tmp/mongod.log --bind_ip 127.0.0.1 \
    && mongorestore --archive=/dump.gz --gzip \
    && mongosh --quiet --eval 'db.getMongo().getDBNames()'"
```

If validation fails, treat it as a sev-2 incident (see [[Incident Response]]).
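
---

## 3. Redis

Redis data is regeneratable: losing it means logged-out users and cold caches, but no business data is lost. It is still cheap to back up.

### 3.1 Snapshot

```bash
# Trigger a background save, wait until it completes, then copy the RDB out
mkdir -p /var/backups/redis
rcli() { docker exec nickapp-redis redis-cli -a "$REDIS_PASSWORD" --no-auth-warning "$@"; }
before=$(rcli LASTSAVE)
rcli BGSAVE
# LASTSAVE only advances once the snapshot has been written successfully
until [ "$(rcli LASTSAVE)" != "$before" ]; do sleep 1; done
docker cp nickapp-redis:/data/dump.rdb "/var/backups/redis/redis-$(date -u +%FT%H%M%SZ).rdb"
```

A daily cron job is sufficient. The DR runbook (Phase 4) expects a copy under `s3://marketplace-backups/redis/`, so push the snapshot offsite as well if you want it available there.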

### 3.2 Restore

```bash
# Stop redis, drop the RDB into the volume, start
docker compose -f docker-compose.production.yml stop redis
docker cp /var/backups/redis/redis-2026-05-20T030000Z.rdb nickapp-redis:/data/dump.rdb
docker compose -f docker-compose.production.yml start redis
```

If you've enabled AOF, also copy `appendonly.aof` (on Redis 7 and later, the `appendonlydir/` directory); with AOF on, Redis loads it in preference to the RDB file. See [[Database Operations#persistence]].

---

## 4. `uploads/` directory

Stored on the host at `/opt/backend/uploads/` and bind-mounted into both the backend and nginx containers. This is where every user upload lives: losing it means broken images, missing dispute evidence, and unhappy users.

### 4.1 Nightly sync

```bash
# Mirror the directory to S3 (--delete also removes objects that were deleted locally)
aws s3 sync /opt/backend/uploads/ \
  s3://marketplace-backups/uploads/ --delete

# Or rclone to any provider
rclone sync /opt/backend/uploads/ backblaze:marketplace-uploads --transfers 8
```

Cron:

```
30 3 * * * /usr/local/bin/backup-uploads.sh >> /var/log/backup-uploads.log 2>&1
```

### 4.2 Restore

```bash
aws s3 sync s3://marketplace-backups/uploads/ /opt/backend/uploads/
# fix ownership for the marketplace container (uid 1001)
chown -R 1001:1001 /opt/backend/uploads
```

Restart the backend container afterwards so any in-flight uploads find the right directory layout.
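
A quick way to confirm the ownership fix took effect is to list anything under the directory that is not owned by the expected uid. This is a sketch; the function name `list_wrong_owner` is hypothetical.

```bash
#!/usr/bin/env bash
# Sketch: print every path under DIR whose owner is not UID.
# An empty result means ownership is consistent.
list_wrong_owner() {
  local dir=$1 uid=$2
  find "$dir" ! -uid "$uid" -print
}

# Example:
#   [ -z "$(list_wrong_owner /opt/backend/uploads 1001)" ] \
#     || echo "some uploads are not owned by uid 1001"
```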

---

## 5. Secrets & configuration

### 5.1 `.env` files

The production `.env` lives at `/opt/backend/.env`. It is **not** version-controlled and **not** in any standard backup. Source of truth: the team password manager (1Password / Bitwarden vault).

After any change:

1. Update the host file.
2. Update the vault entry with the new value, a one-line "why", and the date.
3. Run `docker compose -f docker-compose.production.yml up -d` to apply.
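
Before applying a changed or freshly restored `.env`, a small guard can confirm that the keys the stack needs are present and non-empty. This is a sketch: `check_env_keys` is a hypothetical helper, and apart from `REDIS_PASSWORD` (used elsewhere in this document) the key names in the usage comment are placeholders for whatever the backend actually reads.

```bash
#!/usr/bin/env bash
# Sketch: verify that FILE defines each KEY with a non-empty value.
# Usage: check_env_keys FILE KEY [KEY...]
check_env_keys() {
  local file=$1; shift
  local key missing=0
  for key in "$@"; do
    # A key counts as set when some line reads KEY=<at least one character>
    if ! grep -Eq "^${key}=.+" "$file"; then
      echo "missing or empty: $key" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Example (MONGO_URI and JWT_SECRET are placeholder names):
#   check_env_keys /opt/backend/.env MONGO_URI REDIS_PASSWORD JWT_SECRET \
#     && docker compose -f docker-compose.production.yml up -d
```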

### 5.2 SSL certs

If you run a host-level Caddy / Nginx with Let's Encrypt, certs auto-renew. Back up `/var/lib/caddy/.local/share/caddy/` (Caddy) or `/etc/letsencrypt/` (Certbot); this is useful if you migrate hosts.

### 5.3 Container registry credentials

`/root/.docker/config.json` on the production host holds the `git.manko.yoga` login Watchtower uses. Recreate after a rebuild:

```bash
docker login git.manko.yoga -u manawenuz
```

---

## 6. Disaster recovery runbook

> Scenario: production host is unrecoverable (disk failure, cloud provider lost the VM, etc.).

### Phase 1 — Provision

1. Spin up a new VM matching the previous spec (≥ 4 vCPU, 8 GB RAM, 100 GB SSD).
2. Install Docker Engine + compose plugin.
3. Point DNS at the new host, or stand up a temporary subdomain (`recovery.amn.gg`).

### Phase 2 — Code

```bash
cd /opt
git clone ssh://git@git.manko.yoga:222/nick/backend.git
git clone ssh://git@git.manko.yoga:222/nick/frontend.git
cd backend && git checkout main
```

### Phase 3 — Config

```bash
# Restore .env from the vault
nano /opt/backend/.env

# Restore nginx config (paths are relative to the backend checkout)
cd /opt/backend
mkdir -p nginx/logs
# copy nginx.conf from the vault / repo / your laptop
```

### Phase 4 — Data

```bash
# LATEST is a placeholder: substitute the newest object name under each prefix

# Mongo
mkdir -p /var/backups/mongo
aws s3 cp s3://marketplace-backups/mongo/marketplace-LATEST.gz /var/backups/mongo/

# Uploads
mkdir -p /opt/backend/uploads
aws s3 sync s3://marketplace-backups/uploads/ /opt/backend/uploads/
chown -R 1001:1001 /opt/backend/uploads

# Redis (optional: empty is fine)
mkdir -p /var/backups/redis
aws s3 cp s3://marketplace-backups/redis/redis-LATEST.rdb /var/backups/redis/
```
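
The `LATEST` names in the block above stand for the newest object under each prefix. Since the dump names embed a UTC timestamp, lexical order matches chronological order, so the newest name can be picked out of an `aws s3 ls` listing. A sketch, with `latest_dump_name` as a hypothetical helper that reads the listing on stdin:

```bash
#!/usr/bin/env bash
# Sketch: read `aws s3 ls` output on stdin and print the newest
# marketplace-*.gz object name. The name is the 4th column of each line;
# non-matching lines (such as "PRE" prefix entries) are ignored.
latest_dump_name() {
  awk '{print $4}' | grep -E '^marketplace-.*\.gz$' | sort | tail -n 1
}

# Example:
#   name=$(aws s3 ls s3://marketplace-backups/mongo/ | latest_dump_name)
#   aws s3 cp "s3://marketplace-backups/mongo/$name" /var/backups/mongo/
```

Later phases then need the same file name in place of `marketplace-LATEST.gz`.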

### Phase 5 — Start stack

```bash
cd /opt/backend
docker login git.manko.yoga -u manawenuz
docker compose -f docker-compose.production.yml up -d
# wait ~60s
docker compose -f docker-compose.production.yml ps
```
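
Instead of a fixed wait, the health endpoint used in Phase 7 can be polled until it answers. A sketch, assuming `curl` is installed on the host; `wait_for_health` and its default retry counts are illustrative choices, not existing tooling.

```bash
#!/usr/bin/env bash
# Sketch: poll URL until it returns a 2xx response, or give up.
# Usage: wait_for_health URL [TRIES] [DELAY_SECONDS]
wait_for_health() {
  local url=$1 tries=${2:-30} delay=${3:-2} i
  for ((i = 1; i <= tries; i++)); do
    # -f makes curl fail on HTTP errors; --max-time bounds each attempt
    if curl -fsS --max-time 5 -o /dev/null "$url"; then
      return 0
    fi
    if [ "$i" -lt "$tries" ]; then
      sleep "$delay"
    fi
  done
  echo "not healthy after $tries attempts: $url" >&2
  return 1
}

# Example:
#   wait_for_health http://localhost:8083/api/health 30 2 \
#     && docker compose -f docker-compose.production.yml ps
```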

### Phase 6 — Restore data into running containers

```bash
# Mongo
docker exec -i nickapp-mongodb \
  mongorestore --archive --gzip --drop \
  < /var/backups/mongo/marketplace-LATEST.gz

# Redis
docker compose -f docker-compose.production.yml stop redis
docker cp /var/backups/redis/redis-LATEST.rdb nickapp-redis:/data/dump.rdb
docker compose -f docker-compose.production.yml start redis
```

### Phase 7 — Verify

```bash
curl -fsS http://localhost:8083/api/health | jq
docker exec nickapp-mongodb mongosh marketplace --quiet --eval "db.users.countDocuments()"
# docker logs takes the container name (compose logs would expect the service name)
docker logs --tail=200 nickapp-backend 2>&1 | grep -E "✅|❌"
```

### Phase 8 — Restart Watchtower & cut over DNS

```bash
docker run -d --name watchtower --restart unless-stopped \
  -v /var/run/docker.sock:/var/run/docker.sock \
  -v /root/.docker/config.json:/config.json \
  -e WATCHTOWER_POLL_INTERVAL=300 \
  -e WATCHTOWER_LABEL_ENABLE=true \
  containrrr/watchtower

# Update DNS for amn.gg / dev.amn.gg to the new host's IP
```

### Phase 9 — Post-mortem

Write a post-mortem (template in [[Incident Response#postmortem-template]]) and update this runbook with anything that surprised you.

---

## 7. Quick-reference commands

```bash
# Mongo dump
docker exec nickapp-mongodb mongodump --db=marketplace --archive --gzip > backup.gz
# Mongo restore
docker exec -i nickapp-mongodb mongorestore --archive --gzip --drop < backup.gz

# Redis snapshot (let BGSAVE finish before copying; see 3.1)
docker exec nickapp-redis redis-cli -a "$REDIS_PASSWORD" BGSAVE
docker cp nickapp-redis:/data/dump.rdb redis.rdb

# Uploads to S3 (assumes an rclone remote named "s3")
rclone sync /opt/backend/uploads/ s3:marketplace-backups/uploads/

# Restore .env
# Pull from vault, paste into /opt/backend/.env, docker compose up -d
```

---

## 8. Testing the plan

> [!tip] Backups are not real until they've been restored. Drill quarterly:
>
> 1. Spin up a throwaway VM.
> 2. Walk Phases 2–7 of the DR runbook with the most recent backups.
> 3. Time it. If the drill exceeds the RTO, fix the gap before the next drill.
> 4. Capture lessons in this file.
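
To make the timing step repeatable, each phase of the drill can be wrapped in a small timer and the total compared against the RTO. This is a sketch: `timed`, `within_rto`, and the per-phase script names in the usage comments are hypothetical, and the 3600-second budget is taken from the MongoDB row of the targets table.

```bash
#!/usr/bin/env bash
# Sketch: time drill phases and compare the total against an RTO budget.
RTO_BUDGET_SECONDS=3600   # assumed: the 1-hour MongoDB RTO from section 1

# Run a command, print how long it took and its exit code, pass the code on.
timed() {
  local label=$1; shift
  local start=$SECONDS rc=0
  "$@" || rc=$?
  printf '%-24s %5ds (exit %d)\n' "$label" "$((SECONDS - start))" "$rc"
  return "$rc"
}

# Succeed when ELAPSED seconds fit inside the budget.
within_rto() {
  local elapsed=$1
  [ "$elapsed" -le "$RTO_BUDGET_SECONDS" ]
}

# Example (phase scripts are placeholders for the runbook steps):
#   drill_start=$SECONDS
#   timed "Phase 4: data"    ./phase4-data.sh
#   timed "Phase 6: restore" ./phase6-restore.sh
#   within_rto "$((SECONDS - drill_start))" || echo "Drill exceeded the RTO"
```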