moojttaba 0da235ae27 Initial commit: nick docs

2026-05-23 20:35:34 +03:30

Monitoring

This page covers what is instrumented today and what to watch. The stack is intentionally lean: health endpoints, Docker healthchecks, Sentry, and access logs. Larger metric pipelines (Prometheus, Grafana, OpenSearch) are a future addition.

1. Health endpoint

Path: GET /health (backend, port 5001).

Defined in backend/src/app.ts:

app.get("/health", (req, res) => {
  res.json({
    success: true,
    message: "Marketplace Backend API is running",
    timestamp: new Date().toISOString(),
    environment: config.nodeEnv,
    version: packageJson.version,
  });
});

Returns 200 with a JSON envelope as soon as Express is up. Does not currently probe MongoDB or Redis — they are checked via separate Docker healthchecks. If you want deep health, extend the endpoint to ping both data stores and return 503 on failure.
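A deep health check along those lines could look like the sketch below. It is not the current implementation: `checkDeepHealth`, the `Probe` type, and the probe names are hypothetical, and the data-store pings are injected as plain functions so the logic does not depend on a particular MongoDB or Redis client.

```typescript
// Hypothetical helper: none of these names come from backend/src/app.ts.
type Probe = () => Promise<unknown>;
type ProbeState = "up" | "down";

interface DeepHealth {
  status: 200 | 503;
  body: { success: boolean; checks: Record<string, ProbeState> };
}

// Runs every probe in parallel. A rejection or a timeout marks that store "down".
export async function checkDeepHealth(
  probes: Record<string, Probe>,
  timeoutMs = 2000,
): Promise<DeepHealth> {
  const names = Object.keys(probes);
  const results = await Promise.all(
    names.map(async (name): Promise<ProbeState> => {
      let timer: ReturnType<typeof setTimeout> | undefined;
      const timeout = new Promise<never>((_, reject) => {
        timer = setTimeout(() => reject(new Error(`${name} timed out`)), timeoutMs);
      });
      try {
        await Promise.race([probes[name](), timeout]);
        return "up";
      } catch {
        return "down";
      } finally {
        clearTimeout(timer);
      }
    }),
  );

  const checks: Record<string, ProbeState> = {};
  names.forEach((name, i) => {
    checks[name] = results[i];
  });
  const success = results.every((state) => state === "up");
  return { status: success ? 200 : 503, body: { success, checks } };
}
```

Inside the route handler the result would be sent with `res.status(status).json(body)`. Which ping calls to pass in depends on the clients the backend actually uses; that part is left open here.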

Public URL behind Nginx: https://amn.gg/api/health.

2. Docker healthchecks

Each long-lived container has a HEALTHCHECK baked in or declared in compose.

| Container | Probe | Interval | Failure threshold |
| --- | --- | --- | --- |
| `nickapp-backend` | `node healthcheck.js` (HTTP GET `/health`) | 30s | 3 retries |
| `nickapp-frontend` | `curl -f http://localhost:8083/` | 30s | 3 retries |
| `mongodb` | `mongosh --eval "db.adminCommand('ping')"` | 30s | 3 retries |
| `redis` | `redis-cli -a $REDIS_PASSWORD ping` | 30s | 3 retries |

healthcheck.js (backend) is a tiny Node script that does a local HTTP GET to /health and exits 0 / 1.
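The actual healthcheck.js is not reproduced here. A minimal sketch of that kind of probe, assuming the backend listens on port 5001 as stated above (the names `probe` and `main` are illustrative), might be:

```typescript
import * as http from "node:http";

// Resolves true only when the endpoint answers 200 within the timeout.
export function probe(url: string, timeoutMs = 2000): Promise<boolean> {
  return new Promise((resolve) => {
    const req = http.get(url, { timeout: timeoutMs }, (res) => {
      res.resume(); // drain the body so the socket is released
      resolve(res.statusCode === 200);
    });
    // A socket timeout only emits an event; destroy the request to fail it.
    req.on("timeout", () => req.destroy(new Error("timeout")));
    req.on("error", () => resolve(false));
  });
}

// Entry point for a HEALTHCHECK command: exit code 0 when healthy, 1 otherwise.
export async function main(url = "http://127.0.0.1:5001/health"): Promise<void> {
  process.exitCode = (await probe(url)) ? 0 : 1;
}
```

A script built this way would call `main()` as its last line. Setting `process.exitCode` rather than calling `process.exit()` lets pending I/O finish before the process ends.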

Inspect health:

docker ps --format "table {{.Names}}\t{{.Status}}"

# Detailed
docker inspect --format='{{json .State.Health}}' nickapp-backend | jq

If a container is unhealthy, Watchtower will not roll it (it expects the new container to pass healthcheck). Investigate with docker logs <container>.
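When reading the `docker inspect` output by eye gets tedious, the JSON can be condensed to one line. The helper below is a sketch (`summarizeHealth` is a hypothetical name); it relies only on the `Status`, `FailingStreak`, and `Log` fields that Docker prints under `.State.Health`.

```typescript
// One entry per healthcheck run, as printed by `docker inspect`.
interface HealthLogEntry {
  Start: string;
  End: string;
  ExitCode: number;
  Output: string;
}

interface DockerHealth {
  Status: string; // "starting", "healthy", or "unhealthy"
  FailingStreak: number;
  Log: HealthLogEntry[] | null;
}

// Takes the raw JSON from `docker inspect --format='{{json .State.Health}}'`.
export function summarizeHealth(json: string): string {
  const health = JSON.parse(json) as DockerHealth | null;
  if (!health) return "no healthcheck configured"; // Docker prints `null` here
  const log = health.Log ?? [];
  const last = log[log.length - 1];
  const detail = last ? ` (last exit ${last.ExitCode}: ${last.Output.trim()})` : "";
  return `${health.Status}, failing streak ${health.FailingStreak}${detail}`;
}
```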

3. Sentry — error tracking

Frontend

@sentry/nextjs ^10.22.0 is wired in via three config files at the repo root:

sentry.client.config.ts — browser SDK (with Session Replay enabled at 10% session / 100% error rate).
sentry.server.config.ts — server-rendered components (no Replay).
sentry.edge.config.ts — edge runtime (not currently used heavily).

Common settings:

Sentry.init({
  dsn: process.env.NEXT_PUBLIC_SENTRY_DSN,
  tracesSampleRate: process.env.NODE_ENV === 'production' ? 0.1 : 1.0,
  environment: process.env.NODE_ENV || 'development',
  enabled: process.env.NODE_ENV === 'production',
  ignoreErrors: ['ResizeObserver loop limit exceeded', 'ChunkLoadError', ...],
});

Errors from localhost are filtered out — only prod errors land in the dashboard.

Backend

@sentry/node ^10.22.0 + @sentry/profiling-node ^10.22.0 are initialised first in src/app.ts (before any other import) via src/config/sentry.ts. DSN comes from SENTRY_DSN env var (see Environment Variables#sentry).

What's captured:

Uncaught exceptions in route handlers
Promise rejections inside asyncHandler-wrapped routes
Manually-captured errors via Sentry.captureException(err)
Performance traces (10% sample rate in prod)
Profiling samples via @sentry/profiling-node
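The project's own asyncHandler is not shown in this page. The sketch below is the common form of such a wrapper, written with minimal structural types so it compiles without Express installed; treat it as an illustration of why wrapped routes surface their rejections, not as the backend's code.

```typescript
// Minimal stand-ins for the Express handler types (assumption: the real
// helper uses Express's Request, Response, and NextFunction instead).
type Next = (err?: unknown) => void;
type Handler<Req, Res> = (req: Req, res: Res, next: Next) => unknown;

// Forwards both synchronous throws and rejected promises to next(err),
// so they reach the error middleware instead of becoming unhandled rejections.
export function asyncHandler<Req, Res>(fn: Handler<Req, Res>): Handler<Req, Res> {
  return (req, res, next) => {
    Promise.resolve()
      .then(() => fn(req, res, next))
      .catch(next);
  };
}
```

A route that is not wrapped this way and rejects would bypass the Express error chain, which is why the list above singles out asyncHandler-wrapped routes.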

Source maps

The frontend uploads source maps to Sentry at build time when SENTRY_AUTH_TOKEN, SENTRY_ORG, and SENTRY_PROJECT are set in the CI env. Without them the build still succeeds, but stack traces in Sentry will show minified frames.

Alerts

Configure in the Sentry dashboard (Issues → Alerts) — common alerts:

Any new issue in production → Slack
Error frequency > 50/minute → page on-call
Performance regression on /api/payments/* traces → email

4. Logs

Backend application logs

Routed through src/utils/logger.ts — currently a thin console.log wrapper with emoji prefixes. Output goes to stdout, captured by Docker:

# Live tail
docker compose -f docker-compose.production.yml logs -f --tail=200 nickapp-backend

# Search for a request
docker logs nickapp-backend 2>&1 | grep "POST /api/payments"

# Only the last hour
docker logs --since 1h nickapp-backend

Notable log lines to look for:

| Prefix | Meaning |
| --- | --- |
| `✅ Connected to MongoDB` | DB connection established |
| `🚀 Server running on port 5001` | App fully started |
| `🔌 User connected: <id>` | Socket.IO connection |
| `📥` | Inbound HTTP request log |
| `💳 SHKeeper` | SHKeeper webhook / API call |
| `🔐 Webhook verification` | Webhook signature check result |
| `❌ Error` | Manual error log (also captured by Sentry) |
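Since these prefixes are stable strings, a saved log dump can be tallied by prefix to get a quick picture of what the backend was doing. A small sketch, with the prefixes copied from the table above and `tallyPrefixes` as a hypothetical helper name:

```typescript
// Prefixes as listed in the table; dynamic parts (port, user id) are left off.
const PREFIXES = [
  "✅ Connected to MongoDB",
  "🚀 Server running on port",
  "🔌 User connected",
  "📥",
  "💳 SHKeeper",
  "🔐 Webhook verification",
  "❌ Error",
] as const;

// Counts lines per known prefix. `includes` is used instead of `startsWith`
// so lines carrying a `docker logs --timestamps` prefix still match.
export function tallyPrefixes(logText: string): Record<string, number> {
  const counts: Record<string, number> = {};
  for (const line of logText.split(/\r?\n/)) {
    const hit = PREFIXES.find((prefix) => line.includes(prefix));
    if (hit) counts[hit] = (counts[hit] ?? 0) + 1;
  }
  return counts;
}
```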

Nginx access + error logs

Bind-mounted to ./nginx/logs/ on the host:

tail -f /opt/backend/nginx/logs/access.log
tail -f /opt/backend/nginx/logs/error.log

Rotate these via host logrotate to avoid disk fill.

Frontend logs

Next.js logs go to the container stdout:

docker logs -f nickapp-frontend

Browser-side logs that need attention go through Sentry (above) — src/utils/logger.ts in the frontend forwards via Sentry breadcrumbs.

5. Key metrics to watch

Today these are read manually from logs / Sentry. As Prometheus is added, encode them as alerting rules.

Application

| Metric | Where to check | Healthy | Alert |
| --- | --- | --- | --- |
| 5xx rate | Sentry, Nginx access.log | < 0.5 % | > 2 % over 5 min |
| `/health` p95 latency | repeated `curl -s -o /dev/null -w '%{time_total}'` samples | < 100 ms | > 1 s |
| Login success rate | Sentry custom event | > 95 % | < 90 % |
| Socket disconnect storm | `🔌 User disconnected` log frequency | < 1/s sustained | > 10/s sustained |
| OpenAI 429s | Backend log `OpenAI ... 429` | 0 | any |
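Until Prometheus is in place, the 5xx rate can be computed from a slice of access.log. The sketch below assumes Nginx's default `combined` log format, where the status code is the first field after the quoted request line; if this deployment uses a custom `log_format`, the regex needs adjusting. `fiveXxRate` is an illustrative name.

```typescript
// Matches the quoted request line followed by the three-digit status code,
// e.g. `"GET /api/health HTTP/1.1" 200 `.
const STATUS_AFTER_REQUEST = /"[^"]*" (\d{3}) /;

// Returns the share of 5xx responses in percent; lines that do not parse are skipped.
export function fiveXxRate(accessLog: string): number {
  let total = 0;
  let errors = 0;
  for (const line of accessLog.split(/\r?\n/)) {
    const match = STATUS_AFTER_REQUEST.exec(line);
    if (!match) continue;
    total += 1;
    if (match[1].startsWith("5")) errors += 1;
  }
  return total === 0 ? 0 : (errors / total) * 100;
}
```

The function does no time windowing of its own, so feed it roughly five minutes of log lines to compare against the "> 2 % over 5 min" threshold.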

Payments

| Metric | Where | Healthy | Alert |
| --- | --- | --- | --- |
| Payment success rate | `db.payments.aggregate([{$group:{_id:"$status",n:{$sum:1}}}])` | > 95 % completed of 24h-old payments | < 90 % |
| Webhook signature failures | log `Webhook verification failed` | 0 | > 0 |
| SHKeeper API errors (5xx) | log + Sentry | 0 | > 5/min sustained |
| Payouts stuck in `pending` > 30 min | `db.payments.find({type:'payout',status:'pending',createdAt:{$lt:new Date(Date.now() - 30*60*1000)}})` | empty | non-empty |
| Missing `transactionHash` after `completed` | the same query that drives `fix-transaction-hashes.js` | empty | non-empty |
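The two query-based rows can be turned into numbers with a few lines of code. This is a sketch under the assumption that the `status`, `type`, and `createdAt` fields behave as the queries in the table imply; `successRate` and `stuckPayoutFilter` are hypothetical helpers.

```typescript
// One row per status, matching the output of the $group stage in the table.
interface StatusCount {
  _id: string;
  n: number;
}

// Percentage of payments in "completed" state. An empty window is treated
// as healthy (100) so a quiet period does not trip the alert.
export function successRate(rows: StatusCount[]): number {
  const total = rows.reduce((sum, row) => sum + row.n, 0);
  const completed = rows.find((row) => row._id === "completed")?.n ?? 0;
  return total === 0 ? 100 : (completed / total) * 100;
}

// Query filter for payouts still pending after `minutes`.
export function stuckPayoutFilter(now: Date, minutes = 30) {
  return {
    type: "payout",
    status: "pending",
    createdAt: { $lt: new Date(now.getTime() - minutes * 60_000) },
  };
}
```

To match the "24h-old payments" wording, the aggregation would also need a `$match` on `createdAt` before the `$group`; the pipeline shown in the table counts all payments.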

MongoDB

db.serverStatus().connections           // active connections; alert if >1000
db.serverStatus().opcounters            // ops/sec
db.serverStatus().wiredTiger.cache      // raw cache counters; derive the hit ratio from them, aim > 95 %
db.currentOp({ secs_running: { $gte: 5 } })  // long-running queries
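`wiredTiger.cache` returns raw counters, so the hit ratio has to be derived. One common approximation uses the `pages read into cache` and `pages requested from the cache` counters; the helper name `cacheHitRatio` is illustrative.

```typescript
// The two counters used, named exactly as they appear in
// db.serverStatus().wiredTiger.cache.
interface WiredTigerCache {
  "pages read into cache": number;
  "pages requested from the cache": number;
}

// Share of page requests served without a disk read, in percent.
export function cacheHitRatio(cache: WiredTigerCache): number {
  const requested = cache["pages requested from the cache"];
  const readFromDisk = cache["pages read into cache"];
  if (requested === 0) return 100; // nothing requested yet
  return ((requested - readFromDisk) / requested) * 100;
}
```

Both counters are cumulative since the mongod process started, so for a current reading take two samples and compute the ratio on the differences.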

Redis

docker exec nickapp-redis redis-cli -a "$REDIS_PASSWORD" INFO stats
# Watch: instantaneous_ops_per_sec, keyspace_hits/misses, rejected_connections, evicted_keys

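Alert thresholds: rejected_connections > 0, evicted_keys rising while you don't expect cache pressure, p99 latency > 5 ms. Latency is not reported by INFO stats; on Redis 7 and later, per-command percentiles are available from INFO latencystats.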
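The INFO output is plain `key:value` text, so the counter-based thresholds can be checked mechanically. A sketch follows (`parseInfo` and `redisReport` are hypothetical names); it compares two samples because "rising" evictions only make sense as a difference.

```typescript
// Parses `redis-cli INFO` output: "key:value" lines, "#" lines are section headers.
export function parseInfo(raw: string): Record<string, string> {
  const out: Record<string, string> = {};
  for (const line of raw.split(/\r?\n/)) {
    if (line === "" || line.startsWith("#")) continue;
    const sep = line.indexOf(":");
    if (sep > 0) out[line.slice(0, sep)] = line.slice(sep + 1);
  }
  return out;
}

// Evaluates the counter-based thresholds. `previousRaw` is an earlier sample
// and is only needed for the "evicted_keys rising" check.
export function redisReport(raw: string, previousRaw?: string) {
  const now = parseInfo(raw);
  const before = previousRaw === undefined ? undefined : parseInfo(previousRaw);
  const num = (info: Record<string, string>, key: string) => Number(info[key] ?? 0);

  const alerts: string[] = [];
  if (num(now, "rejected_connections") > 0) alerts.push("rejected_connections > 0");
  if (before && num(now, "evicted_keys") > num(before, "evicted_keys")) {
    alerts.push("evicted_keys rising");
  }

  const hits = num(now, "keyspace_hits");
  const lookups = hits + num(now, "keyspace_misses");
  const hitRatio = lookups === 0 ? null : (hits / lookups) * 100;
  return { hitRatio, alerts };
}
```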

Host

| Metric | Tool | Healthy | Alert |
| --- | --- | --- | --- |
| Disk usage on `/var/lib/docker` | `df -h` | < 80 % | > 90 % |
| `/opt/backend/uploads` size | `du -sh` | watch trend | bursty growth (>5 GB/day) |
| Memory pressure | `free -h`, `docker stats` | < 80 % | swap actively used |
| Open file descriptors | entry count of `/proc/<pid>/fd` against `cat /proc/<pid>/limits` | well under hard limit | nearing limit |

6. Smoke tests after a deploy

Drop these in a runbook for the on-call:

# 1. API health
curl -fsS https://amn.gg/api/health | jq '.success,.version,.environment'

# 2. Login
curl -fsS -X POST https://amn.gg/api/auth/login \
  -H "Content-Type: application/json" \
  -d '{"email":"admin@marketplace.com","password":"<prod-admin-pwd>"}' \
  | jq '.success,.data.user.email'

# 3. Frontend HTML loads
curl -fsS https://amn.gg/ -I | head -1   # expect 200

# 4. Socket.IO handshake (must be a GET; Engine.IO rejects a HEAD request sent with -I)
curl -fsS -o /dev/null -w '%{http_code}\n' "https://amn.gg/socket.io/?EIO=4&transport=polling"   # expect 200

# 5. Containers healthy
docker ps --filter "name=nickapp-" --format "table {{.Names}}\t{{.Status}}"

Any non-OK → see Incident Response.
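The first smoke test can be made stricter than a visual check of the `jq` output. The sketch below validates the JSON envelope returned by /health; the field names come from the handler in section 1, while `checkHealthEnvelope` and the assumption that `environment` reads `production` on the live site are illustrative.

```typescript
// Fields returned by the /health handler.
interface HealthEnvelope {
  success: boolean;
  message: string;
  timestamp: string;
  environment: string;
  version: string;
}

// Returns a list of problems; an empty list means the envelope looks right.
export function checkHealthEnvelope(body: unknown, expectedVersion?: string): string[] {
  if (typeof body !== "object" || body === null) return ["body is not a JSON object"];
  const env = body as Partial<HealthEnvelope>;
  const problems: string[] = [];
  if (env.success !== true) problems.push("success is not true");
  // Assumption: the production deployment reports environment "production".
  if (env.environment !== "production") problems.push(`environment is ${String(env.environment)}`);
  if (expectedVersion !== undefined && env.version !== expectedVersion) {
    problems.push(`version is ${String(env.version)}, expected ${expectedVersion}`);
  }
  if (typeof env.timestamp !== "string" || Number.isNaN(Date.parse(env.timestamp))) {
    problems.push("timestamp is not a valid date");
  }
  return problems;
}
```

Passing the version that was just deployed as `expectedVersion` catches the case where the old container is still serving traffic.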

7. Future work

Prometheus + Grafana with Node exporter + Mongo exporter + Redis exporter — for proper time-series.
OpenTelemetry spans from backend → Sentry / Jaeger.
Healthcheck endpoint that probes Mongo + Redis and returns 503 when degraded.
PagerDuty / OpsGenie wiring from Sentry alerts.
Synthetic checks (Pingdom / UptimeRobot) hitting /health from multiple regions.

For now, Sentry + Docker healthchecks + manual log checks cover the basics. See Incident Response for what to do when something fires.
