From e8a1bba471edb9d7dd8ba11372a7db5a5243dd79 Mon Sep 17 00:00:00 2001
From: Siavash Sameni
Date: Sun, 31 May 2026 16:28:09 +0400
Subject: [PATCH] =?UTF-8?q?docs:=20sync=20from=20backend=208e03360=20?=
 =?UTF-8?q?=E2=80=94=20auth=20health=20hotfix?=
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

---
 .../Gatus Monitoring - Proposed Config.md |  6 +++---
 08 - Operations/Monitoring.md             | 10 ++++++++--
 09 - Audits/Activity Log.md               | 12 ++++++++++++
 3 files changed, 23 insertions(+), 5 deletions(-)

diff --git a/08 - Operations/Gatus Monitoring - Proposed Config.md b/08 - Operations/Gatus Monitoring - Proposed Config.md
index cbde7f5..039a25e 100644
--- a/08 - Operations/Gatus Monitoring - Proposed Config.md
+++ b/08 - Operations/Gatus Monitoring - Proposed Config.md
@@ -67,10 +67,10 @@ The `GET /api/health` endpoint was shipped in backend 2.6.49. It is public, rate
 **Shape of the endpoint:**
 
 ```ts
-// GET /api/health (public, rate-limited but not auth-gated)
+// GET /api/health (public, skipped by the global rate limiter)
 {
   "status": "ok" | "degraded" | "down",
-  "version": "2.6.48",
+  "version": "2.6.84",
   "uptimeSec": 12345,
   "checks": {
     "db": { "ok": true, "latencyMs": 4 },
@@ -82,7 +82,7 @@ The `GET /api/health` endpoint was shipped in backend 2.6.49. It is public, rate
 }
 ```
 
-Each `checks.*.ok` must reflect the actual current state, not a cached one. If any check fails, `status` flips to `degraded`. If `db.ok === false`, `status` flips to `down`.
+Each `checks.*.ok` reflects the current backend state, except `rnApi`, which is cached for 60 seconds as of backend `2.6.84` to avoid monitoring-induced upstream rate limits. `rnApi.status === 429` is treated as reachable because Request Network answered; 5xx/timeouts still degrade the report. If any non-DB check fails, `status` flips to `degraded`. If `db.ok === false`, `status` flips to `down`.
 
 **Why this shape rather than per-check endpoints:**
 - One probe, all invariants — cheaper for Gatus and clearer in the dashboard.
diff --git a/08 - Operations/Monitoring.md b/08 - Operations/Monitoring.md
index 5391ac2..070b344 100644
--- a/08 - Operations/Monitoring.md
+++ b/08 - Operations/Monitoring.md
@@ -14,7 +14,7 @@ What's instrumented today and what to watch. Today's stack is intentionally lean
 Two paths are registered (both are public, rate-limited, not auth-gated):
 
 - `GET /health` — simple ping used by Docker healthchecks. Returns `200 { success, message, timestamp, environment, version }`. Does **not** probe MongoDB or Redis.
-- `GET /api/health` — deep health check added in commit `44579d6` (backend v2.6.49). Calls `runHealthChecks` from `backend/src/services/health/healthCheckService.ts`. Probes MongoDB and Redis, collects memory/uptime stats, and returns a structured report. Returns `503` when `report.status === 'down'`.
+- `GET /api/health` — deep health check added in commit `44579d6` (backend v2.6.49). Calls `runHealthChecks` from `backend/src/services/health/healthCheckService.ts`. Probes MongoDB, Redis, Request Network registry data, and Request Network API reachability. Returns `503` only when `report.status === 'down'`. As of backend `2.6.84`, the RN API subcheck is cached for 60 seconds and treats non-5xx HTTP responses, including `429`, as upstream reachable so Gatus/perf probes do not turn a live backend into `degraded`.
 
 `GET /api/health` response shape (from `healthCheckService`):
 ```json
@@ -22,7 +22,13 @@ Two paths are registered (both are public, rate-limited, not auth-gated):
   "status": "ok",
   "version": "2.6.xx",
   "timestamp": "...",
-  "checks": { "mongodb": "ok", "redis": "ok", "uptime": 3600, "memoryMB": 120 }
+  "checks": {
+    "db": { "ok": true, "latencyMs": 4 },
+    "redis": { "ok": true, "latencyMs": 1 },
+    "rnChainRegistry": { "ok": true, "latencyMs": 0, "chainCount": 7 },
+    "rnTokenRegistry": { "ok": true, "latencyMs": 0, "tokenCount": 12 },
+    "rnApi": { "ok": true, "latencyMs": 134, "status": 401 }
+  }
 }
 ```
 
diff --git a/09 - Audits/Activity Log.md b/09 - Audits/Activity Log.md
index b8798f8..0bd290e 100644
--- a/09 - Audits/Activity Log.md
+++ b/09 - Audits/Activity Log.md
@@ -11,6 +11,18 @@ entries on top. Maintained by agents per the rule in `../AGENTS.md`.
 
 ---
 
+### 2026-05-31 — backend@8e03360, frontend@228eed2 — keep auth and health checks resilient under load
+
+**Commits:** backend `8e03360`, frontend `228eed2` (backend `2.6.84`, frontend `2.7.24`)
+**Touched:**
+- Backend: `src/app.ts`, `src/services/health/healthCheckService.ts`, `package.json`, `package-lock.json`
+- Frontend: `package.json`, `package-lock.json` version bump only.
+**Why:** A dev performance run consumed the global 100/15m limiter and blocked `/api/auth/login`; repeated `/api/health` calls also drove the external Request Network reachability probe into `429`, making Gatus report `status: degraded` even though Mongo/Redis/app were healthy. Auth routes now bypass the global limiter and rely on the auth-specific limiter, and the RN health subcheck is cached and treats non-5xx HTTP responses as upstream reachable.
+**Verification:** Backend `npm test -- --runTestsByPath __tests__/health-check.test.ts`; backend `npm run typecheck`; backend `git diff --check`; frontend `npx tsc --noEmit --ignoreDeprecations 6.0`; frontend `git diff --check`. Dev login was manually verified after resetting the backend limiter state.
+**Linked docs updated:** [[Monitoring]], [[Gatus Monitoring - Proposed Config]]
+
+---
+
 ### 2026-05-31 — backend@cbc32dc, frontend@08e8da9 — seller-owned template delivery and payment rails
 
 **Commits:** backend `cbc32dc`, frontend `08e8da9` (backend `2.6.83`, frontend `2.7.23`)
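
The health rules this patch documents (a 60-second cache on the `rnApi` subcheck, non-5xx responses counted as reachable, `down` only on a DB failure, `503` only on `down`) can be sketched roughly as below. This is a minimal illustration, not the code in `healthCheckService.ts`: the names `cachedProbe`, `isUpstreamReachable`, `aggregateStatus`, and `httpCodeFor` are hypothetical, the `200` returned for `ok` and `degraded` is an assumption (the docs only state when `503` is returned), and the sketch caches every probe result, which the docs do not specify.

```ts
// Hypothetical sketch of the documented health rules; all names are illustrative.
type Status = "ok" | "degraded" | "down";

interface CheckResult {
  ok: boolean;
  latencyMs: number;
  status?: number; // HTTP status of an upstream probe, when there is one
}

const RN_API_CACHE_TTL_MS = 60 * 1000; // 60-second cache described for backend 2.6.84

// Any HTTP answer below 500 (401 and 429 included) means the upstream responded.
function isUpstreamReachable(httpStatus: number): boolean {
  return httpStatus < 500;
}

// Wraps a probe so calls within the TTL reuse the last result instead of
// hitting the upstream again. `now` is injectable to make the cache testable.
function cachedProbe(
  probe: () => Promise<CheckResult>,
  ttlMs: number,
  now: () => number = Date.now,
): () => Promise<CheckResult> {
  let cached: { at: number; result: CheckResult } | undefined;
  return async () => {
    if (cached !== undefined && now() - cached.at < ttlMs) {
      return cached.result;
    }
    const result = await probe();
    cached = { at: now(), result };
    return result;
  };
}

// A failing db check means "down"; any other failing check means "degraded".
function aggregateStatus(checks: Record<string, CheckResult>): Status {
  const db = checks["db"];
  if (db !== undefined && !db.ok) return "down";
  return Object.values(checks).every((c) => c.ok) ? "ok" : "degraded";
}

// Only "down" maps to 503; the 200 for the other states is an assumption.
function httpCodeFor(status: Status): number {
  return status === "down" ? 503 : 200;
}
```

Under these assumptions, a probe that receives `429` yields `ok: true`, so a rate-limited upstream alone leaves the aggregate at `ok`, while a 5xx answer or timeout on the same subcheck would still yield `degraded`.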