---
title: Backup & Recovery
tags: [operations]
---

# Backup & Recovery

How to keep the marketplace recoverable from data loss. Covers MongoDB, Redis, the `uploads/` directory, and environment secrets, plus the disaster-recovery runbook.

---

## 1. RTO / RPO targets

| Asset | RPO (data loss tolerated) | RTO (downtime tolerated) | Backup cadence |
|-------|---------------------------|--------------------------|----------------|
| MongoDB | 1 hour | 1 hour | Hourly `mongodump` + nightly offsite |
| `uploads/` directory | 24 hours | 2 hours | Nightly sync to offsite (`aws s3 sync` / `rclone`) |
| Redis | 24 hours (data is regeneratable) | 0 minutes (app survives an empty cache) | Nightly RDB snapshot |
| Production `.env` | n/a (manual) | 5 minutes | Stored in 1Password / Bitwarden vault |
| Container images | n/a (CI rebuilds) | 15 minutes | Tagged in registry by version |

Adjust these targets when product SLAs change.

---

## 2. MongoDB

### 2.1 Dump

```bash
#!/usr/bin/env bash
# scripts/backup-mongo.sh: run hourly via cron
set -euo pipefail

STAMP=$(date -u +%FT%H%M%SZ)   # e.g. 2026-05-20T030000Z
DEST=/var/backups/mongo
mkdir -p "$DEST"

docker exec nickapp-mongodb \
  mongodump --db=marketplace --archive --gzip \
  > "$DEST/marketplace-$STAMP.gz"

# Retention: keep every hourly dump for 24 h, then only the 00:00 UTC dump, for 14 days
find "$DEST" -name 'marketplace-*.gz' ! -name 'marketplace-*T00*' -mmin +1440 -delete
find "$DEST" -name 'marketplace-*.gz' -mtime +14 -delete
```

Cron entry:

```
0 * * * * /usr/local/bin/backup-mongo.sh >> /var/log/backup-mongo.log 2>&1
```

### 2.2 Offsite

Push the most recent dump to S3 (or Backblaze B2, or `rclone` to any provider) nightly:

```bash
# `aws s3 cp` copies a single file (or a whole directory with --recursive),
# so select the newest dump by name; ISO-8601 stamps sort chronologically
DEST=/var/backups/mongo
LATEST=$(ls -1 "$DEST"/marketplace-*.gz | tail -n 1)
aws s3 cp "$LATEST" "s3://marketplace-backups/mongo/" \
  --storage-class STANDARD_IA
```

Set a 90-day lifecycle policy on the bucket to age out old copies.

### 2.3 Restore

> [!warning] Restoring is **destructive** to the current data. Always practise on a staging clone before doing it for real.

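
Every restore starts by choosing an archive. Because the dump names from 2.1 embed a zero-padded UTC stamp (`marketplace-YYYY-MM-DDTHHMMSSZ.gz`), a byte-wise sort orders them chronologically, so the newest dump can be found without relying on file modification times. A minimal sketch; `latest_dump` is a hypothetical helper, not an existing script in the repo:

```bash
#!/usr/bin/env bash
# Sketch: print the newest non-empty MongoDB dump in DIR.
# Names look like marketplace-2026-05-20T030000Z.gz; zero-padded UTC stamps
# sort byte-wise in chronological order, so `sort | tail -n 1` is enough.
latest_dump() {
  local dir=$1 newest
  # -size +0 skips zero-byte files; LC_ALL=C forces a plain byte-wise sort
  newest=$(find "$dir" -maxdepth 1 -type f -name 'marketplace-*.gz' -size +0 \
             | LC_ALL=C sort | tail -n 1)
  if [ -z "$newest" ]; then
    echo "no non-empty marketplace-*.gz in $dir" >&2
    return 1
  fi
  printf '%s\n' "$newest"
}
```

For example, `ARCHIVE=$(latest_dump /var/backups/mongo)`, then feed `"$ARCHIVE"` to `mongorestore` in place of a hard-coded file name. Skipping zero-byte files guards against a dump that failed before writing any data, since the shell creates the redirect target even when `docker exec` fails.
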
```bash
# Restore, ideally into an empty database (fresh container);
# --drop replaces every collection present in the archive
docker exec -i nickapp-mongodb \
  mongorestore --archive --gzip --drop \
  < /var/backups/mongo/marketplace-2026-05-20T030000Z.gz

# Verify
docker exec nickapp-mongodb mongosh marketplace --quiet \
  --eval "db.users.countDocuments()"
```

For a partial restore (single collection):

```bash
docker exec -i nickapp-mongodb \
  mongorestore --archive --gzip --drop \
  --nsInclude='marketplace.payments' \
  < /var/backups/mongo/marketplace-2026-05-20T030000Z.gz
```

### 2.4 Validate backups

A monthly drill: copy the newest dump to `./marketplace-latest.gz`, restore it into a throwaway container, and run smoke queries. The container has to run `mongod` itself, so start it detached and `exec` into it:

```bash
docker run -d --rm --name mongo-drill -v "$(pwd)/marketplace-latest.gz:/dump.gz:ro" mongo:8.2
sleep 10   # give mongod time to start accepting connections
docker exec mongo-drill sh -c "mongorestore --archive=/dump.gz --gzip && mongosh --quiet --eval 'db.getMongo().getDBNames()'"
docker stop mongo-drill   # --rm deletes the container once it stops
```

If validation fails, treat it as a sev-2 incident (see [[Incident Response]]).

---

## 3. Redis

Redis data is regeneratable: losing it means logged-out users and cold caches, but no business data is lost. It is still cheap to back up.

### 3.1 Snapshot

```bash
# Trigger a background save, wait until it finishes, then copy the RDB out
mkdir -p /var/backups/redis
docker exec nickapp-redis redis-cli -a "$REDIS_PASSWORD" --no-auth-warning BGSAVE
until docker exec nickapp-redis redis-cli -a "$REDIS_PASSWORD" --no-auth-warning INFO persistence \
  | grep -q 'rdb_bgsave_in_progress:0'; do sleep 1; done
docker cp nickapp-redis:/data/dump.rdb "/var/backups/redis/redis-$(date -u +%FT%H%M%SZ).rdb"
```

Daily cron is sufficient.

### 3.2 Restore

```bash
# Stop redis, drop the RDB into the volume, start it again
docker compose -f docker-compose.production.yml stop redis
docker cp /var/backups/redis/redis-2026-05-20T030000Z.rdb nickapp-redis:/data/dump.rdb
docker compose -f docker-compose.production.yml start redis
```

If you've enabled AOF, also copy `appendonly.aof`. See [[Database Operations#persistence]].

---

## 4. `uploads/` directory

Stored on the host at `/opt/backend/uploads/` and bind-mounted into both the backend and nginx containers. Every user upload lives here; losing it means broken images, missing dispute evidence, and unhappy users.

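
Because `uploads/` holds plain files, a checksum manifest is a cheap way to confirm that an offsite copy or a restored tree matches the source. A minimal sketch, assuming GNU coreutils `sha256sum` on the host; `uploads_manifest` is a hypothetical helper name:

```bash
#!/usr/bin/env bash
# Sketch: print one "sha256  ./relative/path" line per file under DIR, sorted.
# Two trees hold the same file paths with the same contents exactly when their
# manifests are identical (empty directories, owners and modes are ignored).
uploads_manifest() {
  local dir=$1
  if [ ! -d "$dir" ]; then
    echo "not a directory: $dir" >&2
    return 1
  fi
  # cd first so paths are relative and manifests from different roots compare;
  # LC_ALL=C makes the sort order identical on every host.
  ( cd "$dir" && find . -type f -exec sha256sum {} + ) | LC_ALL=C sort
}
```

For example, `diff <(uploads_manifest /opt/backend/uploads) <(uploads_manifest /srv/restore-check/uploads)` prints nothing and exits 0 when the copies match; `/srv/restore-check/uploads` is a placeholder for wherever you restored a test copy. Hashing reads every byte, so run it off-peak on large trees.
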
### 4.1 Nightly sync

```bash
# Mirror to S3 (--delete removes objects whose local file is gone)
aws s3 sync /opt/backend/uploads/ s3://marketplace-backups/uploads/ --delete

# Or rclone to any provider
rclone sync /opt/backend/uploads/ backblaze:marketplace-uploads --transfers 8
```

Cron:

```
30 3 * * * /usr/local/bin/backup-uploads.sh >> /var/log/backup-uploads.log 2>&1
```

### 4.2 Restore

```bash
aws s3 sync s3://marketplace-backups/uploads/ /opt/backend/uploads/
# Fix ownership for the marketplace container (uid 1001)
chown -R 1001:1001 /opt/backend/uploads
```

Restart the backend container afterwards so it picks up the restored directory tree.

---

## 5. Secrets & configuration

### 5.1 `.env` files

The production `.env` lives at `/opt/backend/.env`. It is **not** version-controlled and **not** in any standard backup. Source of truth: the team password manager (1Password / Bitwarden vault).

After any change:

1. Update the host file.
2. Update the vault entry with the new value, a one-line "why", and the date.
3. Run `docker compose -f docker-compose.production.yml up -d` to apply.

### 5.2 SSL certs

If you run a host-level Caddy, or Nginx with Certbot, the Let's Encrypt certs auto-renew. Back up `/var/lib/caddy/.local/share/caddy/` (Caddy) or `/etc/letsencrypt/` (Certbot); this is useful if you migrate hosts.

### 5.3 Container registry credentials

`/root/.docker/config.json` on the production host holds the `git.manko.yoga` login that Watchtower uses. Recreate it after a rebuild:

```bash
docker login git.manko.yoga -u manawenuz
```

---

## 6. Disaster recovery runbook

> Scenario: the production host is unrecoverable (disk failure, cloud provider lost the VM, etc.).

### Phase 1 — Provision

1. Spin up a new VM matching the previous spec (≥ 4 vCPU, 8 GB RAM, 100 GB SSD).
2. Install Docker Engine and the compose plugin.
3. Either plan to repoint the existing DNS records at the new VM (done in Phase 8) or stand up a temporary subdomain (`recovery.amn.gg`).

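
Before moving on, it is worth confirming that the new VM has every CLI the later phases call (`git`, `docker`, `aws`, `jq`, `curl`). A minimal sketch; `require_cmds` is a hypothetical helper that reports all missing tools in one pass instead of failing halfway through the runbook:

```bash
#!/usr/bin/env bash
# Sketch: check that each named command is on PATH.
# Prints "missing: NAME" to stderr for each absent one; non-zero if any are missing.
require_cmds() {
  local missing=0 cmd
  for cmd in "$@"; do
    if ! command -v "$cmd" >/dev/null 2>&1; then
      echo "missing: $cmd" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

For example: `require_cmds git docker aws jq curl && docker compose version`. The compose plugin is a `docker` subcommand rather than a separate binary, so it is checked by running `docker compose version` instead of `command -v`.
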
### Phase 2 — Code

```bash
cd /opt
git clone ssh://git@git.manko.yoga:222/nick/backend.git
git clone ssh://git@git.manko.yoga:222/nick/frontend.git
cd backend && git checkout main
```

### Phase 3 — Config

```bash
cd /opt/backend
# Restore .env from the vault
nano /opt/backend/.env
# Restore nginx config
mkdir -p nginx/logs
# copy nginx.conf from the vault / repo / your laptop
```

### Phase 4 — Data

```bash
# LATEST is a placeholder: substitute the newest stamp listed by
# `aws s3 ls s3://marketplace-backups/mongo/` here and in Phase 6

# Mongo
mkdir -p /var/backups/mongo
aws s3 cp s3://marketplace-backups/mongo/marketplace-LATEST.gz /var/backups/mongo/

# Uploads
mkdir -p /opt/backend/uploads
aws s3 sync s3://marketplace-backups/uploads/ /opt/backend/uploads/
chown -R 1001:1001 /opt/backend/uploads

# Redis (optional; empty is fine)
mkdir -p /var/backups/redis
aws s3 cp s3://marketplace-backups/redis/redis-LATEST.rdb /var/backups/redis/
```

### Phase 5 — Start stack

```bash
cd /opt/backend
docker login git.manko.yoga -u manawenuz
docker compose -f docker-compose.production.yml up -d
# wait ~60s
docker compose -f docker-compose.production.yml ps
```

### Phase 6 — Restore data into running containers

```bash
cd /opt/backend

# Mongo
docker exec -i nickapp-mongodb \
  mongorestore --archive --gzip --drop \
  < /var/backups/mongo/marketplace-LATEST.gz

# Redis
docker compose -f docker-compose.production.yml stop redis
docker cp /var/backups/redis/redis-LATEST.rdb nickapp-redis:/data/dump.rdb
docker compose -f docker-compose.production.yml start redis
```

### Phase 7 — Verify

```bash
curl -fsS http://localhost:8083/api/health | jq
docker exec nickapp-mongodb mongosh marketplace --quiet --eval "db.users.countDocuments()"
# `docker logs` takes the container name; merge stderr so grep sees everything
docker logs --tail=200 nickapp-backend 2>&1 | grep -E "✅|❌"
```

### Phase 8 — Restart Watchtower & cut over DNS

```bash
docker run -d --name watchtower --restart unless-stopped \
  -v /var/run/docker.sock:/var/run/docker.sock \
  -v /root/.docker/config.json:/config.json \
  -e WATCHTOWER_POLL_INTERVAL=300 \
  -e WATCHTOWER_LABEL_ENABLE=true \
  containrrr/watchtower

# Update DNS for amn.gg / dev.amn.gg to the new host's IP
```

### Phase 9 — Post-mortem

Write a post-mortem (template in [[Incident Response#postmortem-template]]) and update this runbook with anything that surprised you.

---

## 7. Quick-reference commands

```bash
# Mongo dump
docker exec nickapp-mongodb mongodump --db=marketplace --archive --gzip > backup.gz

# Mongo restore
docker exec -i nickapp-mongodb mongorestore --archive --gzip --drop < backup.gz

# Redis snapshot (wait for BGSAVE to finish before copying; see 3.1)
docker exec nickapp-redis redis-cli -a "$REDIS_PASSWORD" BGSAVE
docker cp nickapp-redis:/data/dump.rdb redis.rdb

# Uploads to S3
rclone sync /opt/backend/uploads/ s3:marketplace-backups/uploads/

# Restore .env
# Pull from vault, paste into /opt/backend/.env, then:
# docker compose -f docker-compose.production.yml up -d
```

---

## 8. Testing the plan

> [!tip] Backups are not real until they've been restored. Drill quarterly:
>
> 1. Spin up a throwaway VM.
> 2. Walk Phases 2–7 of the DR runbook with the most recent backups.
> 3. Time it. If RTO is busted, fix the gap before the next drill.
> 4. Capture lessons in this file.

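
Step 3 of the drill asks you to time the run. A minimal sketch for doing that from the shell: `timed` and `rto_check` are hypothetical helpers, not part of the repo, and the log file name and the one-hour default budget (the MongoDB RTO from section 1) are assumptions to adjust:

```bash
#!/usr/bin/env bash
# Sketch: time each DR phase during a drill and compare the total with an RTO
# budget. DRILL_LOG's default path and the 3600 s default budget are assumptions.
DRILL_LOG=${DRILL_LOG:-drill-times.tsv}

# timed NAME CMD...: run CMD, append "NAME<TAB>elapsed-seconds" to DRILL_LOG,
# and return CMD's exit status (the phase is logged even if it fails).
timed() {
  local name=$1 start status
  shift
  start=$(date +%s)
  if "$@"; then status=0; else status=$?; fi
  printf '%s\t%s\n' "$name" "$(( $(date +%s) - start ))" >> "$DRILL_LOG"
  return "$status"
}

# rto_check [BUDGET_SECONDS]: print the summed time and fail if it is over budget.
rto_check() {
  local budget=${1:-3600} total
  total=$(awk -F'\t' '{ s += $2 } END { print s + 0 }' "$DRILL_LOG")
  echo "drill total: ${total}s (budget: ${budget}s)"
  [ "$total" -le "$budget" ]
}
```

For example, run `timed phase4-uploads aws s3 sync s3://marketplace-backups/uploads/ /opt/backend/uploads/` for each step, then `rto_check` at the end. A phase made of several commands needs wrapping in a shell function or `bash -c '…'` so that `timed` receives a single command.
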