nick-doc/08 - Operations/Backup & Recovery.md
2026-05-23 20:35:34 +03:30


title: Backup & Recovery
tags: operations

Backup & Recovery

How to keep the marketplace recoverable from data loss. Covers MongoDB, Redis, the uploads/ directory, and environment secrets, plus the disaster-recovery runbook.


1. RTO / RPO targets

| Asset | RPO (data loss tolerated) | RTO (downtime tolerated) | Backup cadence |
|---|---|---|---|
| MongoDB | 1 hour | 1 hour | Hourly mongodump + nightly offsite |
| uploads/ directory | 24 hours | 2 hours | Nightly sync to offsite |
| Redis | 24 hours (regeneratable) | 0 minutes (app survives an empty cache) | Nightly RDB snapshot |
| Production .env | n/a (manual) | 5 minutes | Stored in 1Password / Bitwarden vault |
| Container images | n/a (CI rebuilds) | 15 minutes | Tagged in registry by version |

Adjust these targets when product SLAs change.


2. MongoDB

2.1 Dump

#!/usr/bin/env bash
# scripts/backup-mongo.sh — run hourly via cron
set -euo pipefail

STAMP=$(date -u +%FT%H%MZ)   # e.g. 2026-05-20T0300Z, as in the restore examples below
DEST=/var/backups/mongo
mkdir -p "$DEST"

docker exec nickapp-mongodb \
  mongodump --db=marketplace --archive --gzip \
  > "$DEST/marketplace-$STAMP.gz"

# Delete dumps older than 14 days (every hourly dump inside that window is kept)
find "$DEST" -name 'marketplace-*.gz' -mtime +14 -delete

Cron entry:

0 * * * * /usr/local/bin/backup-mongo.sh >> /var/log/backup-mongo.log 2>&1
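The find above keeps every hourly dump for 14 days. If you want a tiered scheme instead (every dump from the last 24 hours, plus the newest dump of each UTC day for 14 days), here is a minimal sketch. It assumes GNU coreutils (find -printf, date -d), bash 4+ for associative arrays, and judges age by file mtime. prune_dumps is a hypothetical helper, not an existing script.

```shell
#!/usr/bin/env bash
# Sketch: tiered retention for marketplace-*.gz dumps (hypothetical helper).
# Assumes GNU find/date and bash 4+ (associative arrays); age = file mtime.
set -euo pipefail

# prune_dumps DIR [HOURS] [DAYS]
#   keep every dump younger than HOURS hours (default 24),
#   keep the newest dump of each UTC day younger than DAYS days (default 14),
#   delete everything else.
prune_dumps() {
  local dir="$1" hours="${2:-24}" days="${3:-14}"
  local now cutoff_hourly cutoff_daily mtime path day
  now=$(date -u +%s)
  cutoff_hourly=$(( now - hours * 3600 ))
  cutoff_daily=$(( now - days * 86400 ))
  local -A kept_day=()

  # Newest first, so the first dump seen for a given day is that day's newest.
  while read -r mtime path; do
    mtime=${mtime%.*}                      # drop fractional seconds
    if (( mtime >= cutoff_hourly )); then
      continue                             # inside the hourly window: keep
    fi
    day=$(date -u -d "@$mtime" +%F)
    if (( mtime >= cutoff_daily )) && [[ -z "${kept_day[$day]:-}" ]]; then
      kept_day[$day]=1                     # this day's daily keeper
      continue
    fi
    rm -f -- "$path"
  done < <(find "$dir" -maxdepth 1 -name 'marketplace-*.gz' -printf '%T@ %p\n' | sort -rn)
}

# Possible use in backup-mongo.sh, in place of the find ... -delete line:
#   prune_dumps "$DEST" 24 14
```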

2.2 Offsite

Push the most recent dump to S3 (or Backblaze B2, or rclone to any provider) nightly:

DEST=/var/backups/mongo
LATEST=$(ls -1t "$DEST"/marketplace-*.gz | head -n 1)   # newest dump by mtime
aws s3 cp "$LATEST" "s3://marketplace-backups/mongo/" \
  --storage-class STANDARD_IA

Set a 90-day lifecycle policy on the bucket to age out old copies.
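Here is one possible shape for that policy, sketched as a shell function that prints the lifecycle JSON. Scoping the rule to the mongo/ prefix is a deliberate assumption. A bucket-wide rule would also expire long-lived objects under uploads/, and a sync only re-uploads those on its next run. The rule ID and the function name lifecycle_json are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: print an S3 lifecycle configuration that expires objects under one
# prefix after N days. Rule ID and function name are hypothetical.
set -euo pipefail

# lifecycle_json PREFIX DAYS
lifecycle_json() {
  local prefix="$1" days="$2"
  cat <<EOF
{
  "Rules": [
    {
      "ID": "expire-${prefix%/}",
      "Filter": { "Prefix": "${prefix}" },
      "Status": "Enabled",
      "Expiration": { "Days": ${days} }
    }
  ]
}
EOF
}

# Apply it (needs s3:PutLifecycleConfiguration; this call replaces any
# lifecycle rules already on the bucket, so merge existing rules first):
#   aws s3api put-bucket-lifecycle-configuration \
#     --bucket marketplace-backups \
#     --lifecycle-configuration "$(lifecycle_json mongo/ 90)"
```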

2.3 Restore

[!warning] Restoring is destructive to the current data. Always practise on a staging clone before doing it for real.

# Restore (--drop drops each collection in the archive before restoring it)
docker exec -i nickapp-mongodb \
  mongorestore --archive --gzip --drop \
  < /var/backups/mongo/marketplace-2026-05-20T0300Z.gz

# Verify
docker exec nickapp-mongodb mongosh marketplace --quiet \
  --eval "db.users.countDocuments()"

For partial restore (single collection):

docker exec -i nickapp-mongodb \
  mongorestore --archive --gzip --drop \
  --nsInclude='marketplace.payments' \
  < /var/backups/mongo/marketplace-2026-05-20T0300Z.gz

2.4 Validate backups

A monthly drill — restore the latest dump into a throwaway container and run smoke queries:

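# sh -c replaces the image's default command, so start mongod before restoring
docker run --rm -v "$(pwd)/marketplace-latest.gz:/dump.gz:ro" mongo:8.2 \
  sh -c "mongod --fork --logpath /tmp/mongod.log && mongorestore --archive=/dump.gz --gzip && mongosh marketplace --quiet --eval 'db.users.countDocuments()'"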

If validation fails, treat as a sev-2 incident (see Incident Response).
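The restore drill is the real test. Between drills, a cheap check is to record a SHA-256 next to each dump when it is written, then verify it before any restore or offsite copy. This only proves the file is unchanged since it was written, not that mongorestore can read it. Here is a minimal sketch, assuming GNU coreutils sha256sum. The .sha256 sidecar naming and both function names are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: SHA-256 sidecar files for backup dumps (hypothetical helpers).
set -euo pipefail

# write_checksum FILE: create FILE.sha256 next to FILE. The file name inside is
# relative, so the pair still verifies after being moved or downloaded together.
write_checksum() {
  local dir base
  dir=$(dirname -- "$1"); base=$(basename -- "$1")
  ( cd "$dir" && sha256sum -- "$base" > "$base.sha256" )
}

# verify_checksum FILE: exit status 0 only if FILE still matches FILE.sha256
verify_checksum() {
  local dir base
  dir=$(dirname -- "$1"); base=$(basename -- "$1")
  ( cd "$dir" && sha256sum --quiet -c -- "$base.sha256" )
}

# Possible use: call write_checksum "$DEST/marketplace-$STAMP.gz" at the end of
# backup-mongo.sh, upload the .sha256 alongside the dump, and run
# verify_checksum on the downloaded copy before mongorestore.
```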


3. Redis

Redis data is regeneratable: losing it means logged-out users and cold caches, but no business data. It is still cheap to back up.

3.1 Snapshot

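# Trigger a background save, wait for it to finish, then copy the RDB out
mkdir -p /var/backups/redis
rcli() { docker exec nickapp-redis redis-cli -a "$REDIS_PASSWORD" "$@"; }
BEFORE=$(rcli LASTSAVE)
rcli BGSAVE
until [ "$(rcli LASTSAVE)" != "$BEFORE" ]; do sleep 1; done   # LASTSAVE advances when BGSAVE completes
docker cp nickapp-redis:/data/dump.rdb /var/backups/redis/redis-$(date -u +%FT%H%MZ).rdb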

Daily cron is sufficient.

3.2 Restore

# Stop redis, drop the RDB into the volume, start
docker compose -f docker-compose.production.yml stop redis
docker cp /var/backups/redis/redis-2026-05-20T0300Z.rdb nickapp-redis:/data/dump.rdb
docker compose -f docker-compose.production.yml start redis

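If AOF is enabled, Redis loads the AOF rather than dump.rdb at startup, so restore the AOF files too (appendonly.aof, or the appendonlydir/ directory on Redis 7 and later). See Database Operations#persistence.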


4. uploads/ directory

Stored on the host at /opt/backend/uploads/ and bind-mounted into both backend and nginx containers. This is where every user upload lives — losing it means broken images, missing dispute evidence, and unhappy users.

4.1 Nightly sync

# --delete mirrors removals too; enable bucket versioning to keep deleted files recoverable
aws s3 sync /opt/backend/uploads/ s3://marketplace-backups/uploads/ --delete

# Or rclone to any provider
rclone sync /opt/backend/uploads/ backblaze:marketplace-uploads --transfers 8

Cron:

30 3 * * * /usr/local/bin/backup-uploads.sh >> /var/log/backup-uploads.log 2>&1
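The contents of backup-uploads.sh are not shown here. A sync can run longer than a day, for example a large first upload or a slow link. If it does, the next night's run would start on top of it. One way to guard against that is the flock(1) pattern from util-linux. Here is a minimal sketch; run_locked and the lock path are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: skip a run if the previous one still holds the lock (util-linux flock).
set -euo pipefail

# run_locked LOCKFILE CMD...: run CMD while holding an exclusive lock on
# LOCKFILE; return 1 immediately if another process already holds it.
run_locked() {
  local lock="$1"; shift
  (
    flock -n 9 || { echo "lock $lock busy, skipping run" >&2; exit 1; }
    "$@"
  ) 9>"$lock"
}

# Possible body of backup-uploads.sh (lock path is hypothetical):
#   run_locked /var/lock/backup-uploads.lock \
#     aws s3 sync /opt/backend/uploads/ s3://marketplace-backups/uploads/ --delete
```

The lock is tied to the subshell's file descriptor 9, so it is released automatically when the command finishes or the script dies.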

4.2 Restore

aws s3 sync s3://marketplace-backups/uploads/ /opt/backend/uploads/
# fix ownership for the marketplace container (uid 1001)
chown -R 1001:1001 /opt/backend/uploads

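Then restart the backend container (docker restart nickapp-backend) and the nginx container so both serve the restored files. If the uploads directory was deleted and recreated rather than refilled in place, their existing bind mounts still point at the old, deleted directory until restarted.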


5. Secrets & configuration

5.1 .env files

The production .env lives at /opt/backend/.env. It is not version-controlled and not in any standard backup. Source of truth: the team password manager (1Password / Bitwarden vault).

After any change:

  1. Update the host file.
  2. Update the vault entry with the new value, a one-line "why", and the date.
  3. docker compose -f docker-compose.production.yml up -d to apply.

5.2 SSL certs

If you run a host-level Caddy / Nginx with Let's Encrypt, certs auto-renew. Back up /var/lib/caddy/.local/share/caddy/ (Caddy) or /etc/letsencrypt/ (Certbot) — useful if you migrate hosts.

5.3 Container registry credentials

/root/.docker/config.json on the production host holds the git.manko.yoga login Watchtower uses. Recreate after a rebuild:

docker login git.manko.yoga -u manawenuz

6. Disaster recovery runbook

Scenario: production host is unrecoverable (disk failure, cloud provider lost the VM, etc.).

Phase 1 — Provision

  1. Spin up a new VM matching the previous spec (≥ 4 vCPU, 8 GB RAM, 100 GB SSD).
  2. Install Docker Engine + compose plugin.
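  3. Either plan to repoint DNS at the new host in Phase 8, or stand up a temporary subdomain (recovery.amn.gg) to test against first.
  4. Install the AWS CLI (and rclone, if you use it) and configure credentials for the marketplace-backups bucket.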

Phase 2 — Code

cd /opt
git clone ssh://git@git.manko.yoga:222/nick/backend.git
git clone ssh://git@git.manko.yoga:222/nick/frontend.git
cd backend && git checkout main

Phase 3 — Config

# Restore .env from the vault
nano /opt/backend/.env

# Restore nginx config (paths relative to /opt/backend)
cd /opt/backend
mkdir -p nginx/logs
# copy nginx.conf from the vault / repo / your laptop
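Before starting the stack, it can help to confirm that the restored .env defines every key the application expects. The sketch below assumes a template file listing the expected keys, for example a .env.example if the repo keeps one. That file is an assumption; this runbook does not reference one. missing_env_keys prints the keys defined in the template but absent from the restored file. The helper names are hypothetical, and only simple KEY=value lines (optionally prefixed with export) are handled.

```shell
#!/usr/bin/env bash
# Sketch: list keys present in a template dotenv file but missing from another.
set -euo pipefail

# env_keys FILE: print the variable names assigned in FILE, sorted and unique.
# Handles KEY=value and "export KEY=value"; comments and blank lines never match.
env_keys() {
  { grep -E '^[[:space:]]*(export[[:space:]]+)?[A-Za-z_][A-Za-z0-9_]*=' "$1" || true; } \
    | sed -E 's/^[[:space:]]*(export[[:space:]]+)?([A-Za-z_][A-Za-z0-9_]*)=.*/\2/' \
    | LC_ALL=C sort -u
}

# missing_env_keys TEMPLATE ENVFILE: keys in TEMPLATE that ENVFILE lacks
missing_env_keys() {
  local f
  for f in "$1" "$2"; do
    [[ -r "$f" ]] || { echo "cannot read $f" >&2; return 2; }
  done
  LC_ALL=C comm -23 <(env_keys "$1") <(env_keys "$2")
}

# Possible use (template path is an assumption):
#   missing_env_keys /opt/backend/.env.example /opt/backend/.env
```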

Phase 4 — Data

# Mongo
mkdir -p /var/backups/mongo
# LATEST is a placeholder: use the newest key listed by aws s3 ls s3://marketplace-backups/mongo/
aws s3 cp s3://marketplace-backups/mongo/marketplace-LATEST.gz /var/backups/mongo/

# Uploads
mkdir -p /opt/backend/uploads
aws s3 sync s3://marketplace-backups/uploads/ /opt/backend/uploads/
chown -R 1001:1001 /opt/backend/uploads

# Redis (optional: an empty cache is fine, and this only works if RDB snapshots are also shipped to S3)
mkdir -p /var/backups/redis
aws s3 cp s3://marketplace-backups/redis/redis-LATEST.rdb /var/backups/redis/
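Nothing in the backup scripts above writes an object literally named marketplace-LATEST.gz or redis-LATEST.rdb, so treat those names as placeholders for the newest dump. The dump names embed a UTC timestamp that sorts lexicographically. That means the newest key can be picked from aws s3 ls output with a small filter. Here is a sketch; latest_key is a hypothetical helper name, and it assumes object names without spaces.

```shell
#!/usr/bin/env bash
# Sketch: pick the newest backup object from an `aws s3 ls` listing on stdin.
set -euo pipefail

# latest_key PREFIX: print the lexicographically greatest object name that
# starts with PREFIX. Listing lines look like "DATE TIME SIZE NAME"; "PRE dir/"
# lines have only two fields and are skipped. Assumes names contain no spaces.
latest_key() {
  awk -v p="$1" 'NF == 4 && index($4, p) == 1 { print $4 }' | LC_ALL=C sort | tail -n 1
}

# Possible use:
#   key=$(aws s3 ls s3://marketplace-backups/mongo/ | latest_key marketplace-)
#   [ -n "$key" ] || { echo "no dump found" >&2; exit 1; }
#   aws s3 cp "s3://marketplace-backups/mongo/$key" /var/backups/mongo/
```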

Phase 5 — Start stack

cd /opt/backend
docker login git.manko.yoga -u manawenuz
docker compose -f docker-compose.production.yml up -d
# wait ~60s
docker compose -f docker-compose.production.yml ps
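Instead of a fixed wait, you can poll until the stack is actually ready. Here is a minimal sketch of a generic retry helper (retry_until is a hypothetical name). Pointing it at the health endpoint used in Phase 7 is one possible use.

```shell
#!/usr/bin/env bash
# Sketch: re-run a command until it succeeds or a deadline passes.
set -euo pipefail

# retry_until TIMEOUT_SECONDS CMD...: run CMD once per second until it exits 0
# (return 0) or TIMEOUT_SECONDS have elapsed (return 1). Uses bash's SECONDS.
retry_until() {
  local timeout="$1"; shift
  local deadline=$(( SECONDS + timeout ))
  until "$@"; do
    if (( SECONDS >= deadline )); then
      echo "timed out after ${timeout}s: $*" >&2
      return 1
    fi
    sleep 1
  done
}

# Possible use after "up -d":
#   retry_until 120 curl -fsS http://localhost:8083/api/health >/dev/null
```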

Phase 6 — Restore data into running containers

# Mongo
docker exec -i nickapp-mongodb \
  mongorestore --archive --gzip --drop \
  < /var/backups/mongo/marketplace-LATEST.gz

# Redis
docker compose -f docker-compose.production.yml stop redis
docker cp /var/backups/redis/redis-LATEST.rdb nickapp-redis:/data/dump.rdb
docker compose -f docker-compose.production.yml start redis

Phase 7 — Verify

curl -fsS http://localhost:8083/api/health | jq
docker exec nickapp-mongodb mongosh marketplace --quiet --eval "db.users.countDocuments()"
docker logs --tail=200 nickapp-backend 2>&1 | grep -E "✅|❌"

Phase 8 — Restart Watchtower & cut over DNS

docker run -d --name watchtower --restart unless-stopped \
  -v /var/run/docker.sock:/var/run/docker.sock \
  -v /root/.docker/config.json:/config.json \
  -e WATCHTOWER_POLL_INTERVAL=300 \
  -e WATCHTOWER_LABEL_ENABLE=true \
  containrrr/watchtower

# Update DNS for amn.gg / dev.amn.gg to the new host's IP

Phase 9 — Post-mortem

Write a post-mortem (template in Incident Response#postmortem-template) and update this runbook with anything that surprised you.


7. Quick-reference commands

# Mongo dump
docker exec nickapp-mongodb mongodump --db=marketplace --archive --gzip > backup.gz
# Mongo restore
docker exec -i nickapp-mongodb mongorestore --archive --gzip --drop < backup.gz

# Redis snapshot
docker exec nickapp-redis redis-cli -a "$REDIS_PASSWORD" BGSAVE
# BGSAVE returns immediately; wait until LASTSAVE changes before copying
docker cp nickapp-redis:/data/dump.rdb redis.rdb

# Uploads to S3
rclone sync /opt/backend/uploads/ s3:marketplace-backups/uploads/

# Restore .env
# Pull from vault, paste into /opt/backend/.env, docker compose up -d

8. Testing the plan

[!tip] Backups are not real until they've been restored. Drill quarterly:

  1. Spin up a throwaway VM.
  2. Walk Phases 2 through 7 of the DR runbook with the most recent backups.
  3. Time it. If the drill exceeds the RTO targets in section 1, close the gap before the next drill.
  4. Capture lessons in this file.
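For the timing step, wrapping each phase's commands in a small timer makes per-phase durations easy to compare against the RTO column in section 1. Here is a sketch using bash's SECONDS counter, which has whole-second resolution; timed is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Sketch: time each drill phase; durations go to stderr, CMD's exit status is kept.
set -euo pipefail

# timed NAME CMD...: run CMD, then print "NAME <seconds>s" to stderr
timed() {
  local name="$1"; shift
  local start=$SECONDS status=0
  "$@" || status=$?
  echo "$name $(( SECONDS - start ))s" >&2
  return "$status"
}

# Possible use during a drill:
#   timed phase6-mongo-restore docker exec -i nickapp-mongodb \
#     mongorestore --archive --gzip --drop < /var/backups/mongo/marketplace-LATEST.gz
```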