Siavash Sameni 6a451040d9 Complete task 4 backend security architecture docs

2026-05-24 11:31:40 +04:00

Session and Authentication Architecture Decision

Architecture Decision Record. This document resolves deferred decisions D-1, D-4, D-5, and D-6 from Security Ownership and Launch Decision Criteria and addresses threats T04, T10, T13, T15, and T22 from Threat Model - Amanat Escrow Platform.

All decisions in this document are binding for the Amanat platform. Changes require sign-off from the accountable role per the RACI in Security Ownership and Launch Decision Criteria Section 1.

Section 1: Decision Summary

Decision	Chosen Option	Rejected Alternatives	Status
Access token storage	In-memory only (JavaScript variable), not persisted to any browser storage	localStorage (current), httpOnly cookie	Decided
Refresh token storage	httpOnly, Secure, SameSite=Strict cookie	localStorage (current), response body	Decided
Access token lifetime	15 minutes	7 days (current), 60 minutes, 30 minutes	Decided
Refresh token lifetime	7 days with rotation on every use	30 days (current), 14 days, 1 day	Decided
CSRF strategy	SameSite=Strict on refresh cookie; no CSRF token needed because access token is never in a cookie	Double-submit cookie, synchronizer token pattern	Decided
Passkey/WebAuthn	Option C: feature-flagged off in production; stubs remain for development	Remove entirely, implement production WebAuthn pre-launch	Decided
OAuth requirements	Authorization code with PKCE; same session model as email/password; implicit account linking by email	Implicit flow, separate session model for OAuth	Decided
Device/session revocation	user.refreshTokens[] with device metadata, revocation endpoints, max 5 sessions per user	No revocation (current), Redis-only session store	Decided
Admin step-up auth	Required for high-risk actions; 5-minute elevated session via re-authentication (password)	No step-up (current), hardware 2FA mandatory, passkey-only	Decided
Two-person approval	Required for payouts above $1,000 USD equivalent; second admin must confirm	Single-admin approval for all amounts, multi-sig wallet only	Decided

Section 2: Access Token Storage and Lifetime

Decision

We will store the access token in JavaScript memory only (a module-level variable or React context), not in localStorage, sessionStorage, IndexedDB, or any cookie. The access token is never written to any persistent browser storage.

Rejected alternatives

Alternative	Rejection reason
localStorage (current state)	Fully accessible to any XSS payload. Threat T13: any script injection in any dependency, user-generated content field, or uploaded file exfiltrates the token. Threat T04: 7-day lifetime means a stolen token grants access for up to a week without triggering refresh rotation.
httpOnly cookie	Eliminates XSS theft but introduces CSRF risk on every API call, because the browser attaches cookies automatically. Requires CSRF token infrastructure for all state-changing endpoints. Adds CORS complexity because credentialed cross-origin requests need an explicit Access-Control-Allow-Credentials policy with a non-wildcard origin. The access token must be sent on every API call, so the CSRF surface is every endpoint.

Rationale

In-memory storage prevents XSS-based theft from persisting beyond the page session. An attacker who achieves XSS can read the token while the page is open, but the in-memory copy disappears on tab close, navigation away from the app, or page refresh. A copy the attacker has already exfiltrated stays usable only until its 15-minute expiry, so the maximum exploitation window after any single XSS event ends is 15 minutes, not 7 days.

The access token is only needed for the duration of a browser session. The Axios interceptor holds the token in a closure variable. On page refresh, the refresh-token flow re-establishes the access token transparently.

Token lifetime

15 minutes from issuance.

Rationale: Threat T04 identifies the 7-day access token lifetime as a critical risk. A stolen access token is usable for its entire lifetime without triggering refresh-token rotation. Reducing to 15 minutes limits the damage window to 15 minutes maximum. This is short enough that an attacker who exfiltrates the token via a transient XSS event (e.g., a reflected XSS that closes the page) loses access almost immediately, while long enough to avoid excessive refresh calls on a normal browsing session.

Token format

JWT with HS256 signing (same algorithm as current). Claims:

Claim	Value	Purpose
`sub`	User `_id` (ObjectId string)	Subject identification
`role`	`buyer`, `seller`, `admin`, `support`	Authorization enforcement
`iat`	Unix timestamp	Issued-at
`exp`	`iat + 900` (15 minutes)	Expiry enforcement
`jti`	UUID v4	Token-specific identifier for audit trail and future revocation
`iss`	`marketplace-backend`	Issuer verification
`aud`	`marketplace-users`	Audience verification

No change to the signing algorithm or secret management. The jti claim is new and required for audit logging and future deny-list capabilities.
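A minimal sketch of how the claim set above could be assembled before signing. The helper name buildAccessTokenClaims and its parameters are hypothetical; the caller is assumed to supply a UUID v4 for jti and the current time, and signing with HS256 is left to the existing JWT library.

```typescript
// Roles listed in the claims table above.
type Role = "buyer" | "seller" | "admin" | "support";

interface AccessTokenClaims {
  sub: string;
  role: Role;
  iat: number;
  exp: number;
  jti: string;
  iss: "marketplace-backend";
  aud: "marketplace-users";
}

const ACCESS_TOKEN_TTL_SECONDS = 900; // 15 minutes

// Hypothetical helper: pure function, so the clock and jti are injected.
function buildAccessTokenClaims(
  userId: string,
  role: Role,
  jti: string,
  nowMs: number,
): AccessTokenClaims {
  const iat = Math.floor(nowMs / 1000); // JWT timestamps are in seconds
  return {
    sub: userId,
    role,
    iat,
    exp: iat + ACCESS_TOKEN_TTL_SECONDS,
    jti,
    iss: "marketplace-backend",
    aud: "marketplace-users",
  };
}
```

Keeping the function pure makes the 15-minute expiry easy to unit test without mocking the clock.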

Renewal strategy

Silent refresh with rotation. The Axios response interceptor detects 401 TOKEN_INVALID or 403 responses. It attempts a refresh using the httpOnly cookie (sent automatically by the browser). On success, the backend returns a new access token in the response body and sets a new refresh cookie. The interceptor updates the in-memory access token and retries the original request.

Concurrent requests during refresh: implement a refresh mutex (single inflight refresh promise). If multiple requests fail with 401 simultaneously, the first triggers the refresh; subsequent requests await the same promise and retry with the new token.
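The refresh mutex can be sketched as a single shared in-flight promise. This is an illustration rather than the platform's interceptor code: createSingleFlightRefresh and the injected doRefresh function are hypothetical names, and the real doRefresh would call POST /api/auth/refresh-token.

```typescript
type RefreshFn = () => Promise<string>;

// Wraps a refresh call so that concurrent callers share one request.
function createSingleFlightRefresh(doRefresh: RefreshFn): RefreshFn {
  let inflight: Promise<string> | null = null;

  return () => {
    if (inflight === null) {
      inflight = (async () => {
        try {
          return await doRefresh();
        } finally {
          // Clear the slot so a later 401 can start a fresh refresh.
          inflight = null;
        }
      })();
    }
    return inflight;
  };
}
```

Every request that fails with 401 awaits the function returned here and then retries with the new token; only the first caller actually reaches the backend.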

Absolute expiry: The refresh token has a maximum lifetime of 7 days (see Section 3). After 7 days, the user must re-authenticate. There is no sliding window that extends beyond 7 days.

Token theft detection and response

Refresh-token reuse detection (already exists, remains in place): If a previously-used refresh token is presented, the backend invalidates all sessions for that user and forces re-authentication. The user receives an email notification: "Your session was terminated due to suspicious activity. Please sign in again."
Access token theft: Because the token is in-memory only, persistent theft requires a sustained XSS presence. The 15-minute window limits exposure. No server-side detection is possible for access-token theft alone (the token is valid until expiry). The primary defense is the short lifetime.
Audit logging: Every token issuance and refresh is logged with jti, userId, ip, userAgent, and timestamp. Abnormal patterns (e.g., refresh from a new IP geolocation far from the previous one) trigger an alert for investigation.

Section 3: Refresh Token Storage, Rotation, and Revocation

Storage location

httpOnly, Secure, SameSite=Strict cookie named __Host-refresh-token.

Cookie attributes:

Attribute	Value	Rationale
`httpOnly`	`true`	Not accessible to JavaScript; immune to XSS theft
`secure`	`true`	Transmitted only over HTTPS
`sameSite`	`Strict`	Not sent on cross-site requests; CSRF protection
`path`	`/api/auth/refresh-token`	Cookie sent only on the refresh endpoint, not on every API call
`domain`	(omitted; host-only)	Scoped to the exact origin
`maxAge`	`604800000` (7 days in ms)	Matches refresh token lifetime

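The __Host- prefix requires the Secure attribute, no Domain attribute, and Path=/, which prevents a subdomain from setting or overwriting the cookie. Browsers reject a __Host- cookie whose Path is anything other than /, so the name cannot be combined with the restricted path in the table above as written: keeping Path=/api/auth/refresh-token requires the __Secure- prefix instead (which requires only Secure), and keeping __Host- requires Path=/.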
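As a sketch, the attribute table translates into a Set-Cookie header as follows. The helper serializeRefreshCookie is hypothetical and takes the cookie name as a parameter. Note the unit difference: the table's maxAge value is in milliseconds (the convention of Express's res.cookie option), while the Max-Age attribute on the wire is in seconds.

```typescript
const REFRESH_MAX_AGE_MS = 604800000; // 7 days, as in the table above

// Hypothetical helper that renders the Set-Cookie header value.
// Omitting the Domain attribute makes the cookie host-only.
function serializeRefreshCookie(name: string, token: string): string {
  const maxAgeSeconds = Math.floor(REFRESH_MAX_AGE_MS / 1000);
  return [
    `${name}=${encodeURIComponent(token)}`,
    `Max-Age=${maxAgeSeconds}`,
    "Path=/api/auth/refresh-token",
    "HttpOnly",
    "Secure",
    "SameSite=Strict",
  ].join("; ");
}
```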

Refresh token lifetime

7 days with rotation on every use.

Rationale: The current 30-day lifetime is excessive for a financial platform. Seven days provides a reasonable balance between user convenience (users are not forced to re-authenticate daily) and security (a compromised refresh token is only exploitable for 7 days maximum, and rotation reduces the window further). The deferred decision D-1 in Security Ownership and Launch Decision Criteria accepts that the migration from localStorage to httpOnly cookies must happen within 30 days post-launch. This ADR specifies the target state.

Rotation strategy

On every refresh:

Backend receives the refresh token from the cookie.
Verifies the JWT signature and expiry.
Looks up the user and checks the token hash is present in user.refreshTokens[].
Removes the old token from the array.
Issues a new access token (in response body) and a new refresh token (in Set-Cookie header).
Pushes the new refresh token hash to user.refreshTokens[].

Token reuse detection

If a previously-consumed refresh token is presented (i.e., the token was already rotated away and is no longer in user.refreshTokens[]):

Invalidate ALL sessions for that user: set user.refreshTokens = [].
Revoke all Redis session records for that user.
Send an email to the user: "A potential security issue was detected. All your sessions have been terminated. Please sign in again and change your password if you did not initiate this activity."
Log the event with both token hashes, the IP addresses of both the legitimate and reuse attempts, and timestamps.

This is the existing behavior documented in Authentication Flow and Security Architecture section 2.4. It remains unchanged.
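The rotation and reuse-detection rules above can be sketched against an in-memory user record. This is an illustration under stated assumptions: the caller has already verified the JWT signature and expiry, the names rotateRefreshToken, UserRecord, and sha256Hex are hypothetical, and the email, Redis, and logging side effects are omitted.

```typescript
import { createHash } from "node:crypto";

// Only the fields needed for the sketch; other metadata is omitted.
interface StoredSession {
  tokenHash: string;
  lastUsedAt: Date;
}

interface UserRecord {
  refreshTokens: StoredSession[];
}

type RotationResult = { ok: true } | { ok: false; reason: "reuse-detected" };

function sha256Hex(value: string): string {
  return createHash("sha256").update(value).digest("hex");
}

// Assumes presentedToken passed signature and expiry checks upstream.
function rotateRefreshToken(
  user: UserRecord,
  presentedToken: string,
  newToken: string,
  now: Date,
): RotationResult {
  const presentedHash = sha256Hex(presentedToken);
  const index = user.refreshTokens.findIndex((s) => s.tokenHash === presentedHash);

  if (index === -1) {
    // A validly signed token that is no longer on the allow-list was
    // already rotated away: revoke every session for this user.
    user.refreshTokens = [];
    return { ok: false, reason: "reuse-detected" };
  }

  // Remove the consumed token and store only the hash of its successor.
  user.refreshTokens.splice(index, 1);
  user.refreshTokens.push({ tokenHash: sha256Hex(newToken), lastUsedAt: now });
  return { ok: true };
}
```

In a real deployment the remove-and-push pair would need to be a single atomic database update so that two concurrent refreshes cannot both succeed with the same token.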

MongoDB storage schema

The user.refreshTokens[] array currently stores raw token strings. We will migrate to a subdocument array with metadata:

refreshTokens: [{
  tokenHash: String,       // SHA-256 hash of the refresh token (not the raw token)
  deviceInfo: String,      // User-Agent string (truncated to 200 chars)
  ipAddress: String,       // IP at time of token issuance
  createdAt: Date,         // When the token was issued
  lastUsedAt: Date,        // Updated on each refresh
}]

The raw refresh token is never stored in MongoDB. Only its SHA-256 hash is stored. Verification compares sha256(receivedToken) === stored.tokenHash.

Migration: existing plain-string entries in user.refreshTokens[] are invalidated on the first refresh after deployment. Users with existing sessions are force-re-authenticated once.

Revocation endpoints

POST /api/auth/revoke-session

Requires Bearer JWT. Body: { sessionTokenHash: string } (the hash of the session to revoke, obtained from the session listing endpoint). The backend removes the matching entry from user.refreshTokens[] and deletes any Redis session keyed by tokens associated with that refresh token.

POST /api/auth/revoke-all-sessions

Requires Bearer JWT. Removes all entries from user.refreshTokens[] except the one used by the current request (identified by the refresh cookie). Deletes all Redis sessions for that user. The user remains logged in on the current device.

Both endpoints are audited: action, actor, target session hash, timestamp, IP, user-agent.
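Reduced to their effect on user.refreshTokens[], the two endpoints are simple filters. The function names below are hypothetical, and authentication, Redis cleanup, and audit logging are left out of the sketch.

```typescript
interface SessionEntry {
  tokenHash: string;
}

// POST /api/auth/revoke-session: drop the entry matching the given hash.
function revokeSession<T extends SessionEntry>(sessions: T[], sessionTokenHash: string): T[] {
  return sessions.filter((s) => s.tokenHash !== sessionTokenHash);
}

// POST /api/auth/revoke-all-sessions: keep only the caller's own session.
function revokeAllOtherSessions<T extends SessionEntry>(sessions: T[], currentTokenHash: string): T[] {
  return sessions.filter((s) => s.tokenHash === currentTokenHash);
}
```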

Section 4: CSRF Strategy

Current state

JWT is sent in the Authorization: Bearer header. Browsers never attach this header automatically; only the application's own JavaScript can set it, so a forged request from another site arrives without it. CSRF is currently mitigated by design (Threat T15: "Mitigated").

Decision: no CSRF token needed for access tokens

Because the access token is stored in JavaScript memory and sent via the Authorization header, CSRF is not a concern for the majority of API calls. The browser will not include the Authorization header on a forged cross-origin request.

CSRF protection for the refresh endpoint

The refresh token is stored in an httpOnly cookie. However, the cookie attributes provide CSRF protection:

SameSite=Strict: The cookie is not sent on any cross-site request, including top-level navigations from external sites. This eliminates CSRF on the refresh endpoint.
Path=/api/auth/refresh-token: The cookie is only sent on requests to the refresh endpoint, not on any other API call.

Combined, these attributes mean an attacker cannot trigger a refresh from a cross-origin page. The SameSite=Strict policy is appropriate here because the refresh endpoint is never called from an external context (no OAuth callback, no payment provider redirect targets the refresh endpoint).

If the architecture migrates access tokens to cookies later

If a future decision moves the access token to a cookie (which we explicitly reject in this ADR), CSRF tokens become mandatory. The recommended approach would be:

Double-submit cookie pattern: set a CSRF token in a non-httpOnly cookie; the frontend reads it and includes it in a custom header (X-CSRF-Token). The backend verifies the header matches the cookie.
Apply to all state-changing endpoints (POST, PUT, PATCH, DELETE).

This fallback is documented but not implemented. No action is needed unless the access token storage decision is revisited.

Web3 wallet interactions

Web3 wallet connections (wagmi, WalletConnect) open popup windows or browser extensions for signing. These interactions do not involve the platform's cookies or tokens. The signed transaction or message is returned to the platform's JavaScript context and sent to the backend via the normal Axios interceptor with the in-memory access token. CSRF is not a concern for Web3 interactions.

Section 5: Passkey/WebAuthn Decision

Decision: Option C -- feature-flagged off in production

We will keep the current stubbed passkey implementation in the codebase, gate it behind a feature flag (ENABLE_PASSKEYS), and set this flag to false in production environments.

Rejected alternatives

Alternative	Rejection reason
Option A: Remove passkeys entirely	The frontend UI and backend routes already exist. Removing them means rebuilding the registration and sign-in UI later. The stubbed code does not pose a security risk when disabled via feature flag.
Option B: Implement production WebAuthn pre-launch	Per Platform Logical Audit - 2026-05-24 Finding 2, the current implementation has three critical flaws (stubbed attestation, in-memory challenges, missing refresh-token persistence). Fixing all three, testing across platforms, and auditing the result would take 2-3 weeks of focused engineering. This is not justifiable before launch when password + OAuth authentication is sufficient.

Rationale

The launch gate in Security Ownership and Launch Decision Criteria Section 2.1.7 requires: "Passkey/WebAuthn disabled in production until real cryptographic implementation is complete." A feature flag is the cleanest way to comply: the code exists but cannot be reached in production. The deferred decision D-4 sets a deadline of 90 days post-launch for real WebAuthn.

Feature flag implementation

Backend env var: ENABLE_PASSKEYS=false in production, true in development.
The passkey routes (/api/auth/passkey/*) return 404 Not Found when the flag is false. The route registration itself is conditional.
Frontend: the Passkey UI components are hidden when NEXT_PUBLIC_ENABLE_PASSKEYS is not "true".
Both flags are false by default; must be explicitly enabled.
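A minimal sketch of the default-off behaviour and conditional route registration. The function names are hypothetical, and the route list is represented as plain strings rather than a real router so that the example stays self-contained.

```typescript
type Env = Record<string, string | undefined>;

// Only the exact string "true" enables the feature; anything else is off.
function passkeysEnabled(env: Env): boolean {
  return env["ENABLE_PASSKEYS"] === "true";
}

// Stand-in for router setup: passkey routes are never registered when the
// flag is off, so requests to them fall through to the 404 handler.
function registeredAuthRoutes(env: Env): string[] {
  const routes = ["/api/auth/refresh-token", "/api/auth/sessions"];
  if (passkeysEnabled(env)) {
    routes.push("/api/auth/passkey/*");
  }
  return routes;
}
```

Comparing against the literal "true" means a missing variable, an empty string, or a value such as "1" all leave passkeys disabled.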

Target WebAuthn implementation (for D-4 resolution)

When the team implements production WebAuthn within 90 days post-launch, the following specifications apply:

Library: @simplewebauthn/server (server-side) and @simplewebauthn/browser (client-side).

Relying Party configuration:

Parameter	Value
RP ID	Production eTLD+1 domain (e.g., `amn.gg`), NOT `localhost`
RP Name	`Amanat`
Origins	`https://amn.gg`, `https://www.amn.gg`
Timeout	60 seconds

Challenge storage: Redis-backed with 5-minute TTL. Key: webauthn:challenge:{challengeHash}. Value: { userId, type: 'registration' | 'authentication', createdAt }. The in-process Map is removed.

Attestation type: none. Rationale: Amanat does not need to verify the authenticator manufacturer or model. Direct or indirect attestation adds complexity (managing attestation certificates, privacy concerns) without security benefit for this platform. We rely on the authenticator's signature, not its attestation.

Credential storage (passkeys[] subdocument):

Field	Type	Description
`id`	String	Base64url-encoded credential ID
`publicKey`	Buffer	COSE public key (actual bytes, not a stub string)
`counter`	Number	Monotonic signature counter; incremented on each authentication
`deviceType`	String	`platform` or `cross-platform`
`deviceName`	String	User-provided label or auto-generated from user-agent
`transports`	String[]	Authenticator transports (e.g., `['internal', 'hybrid']`)
`registeredAt`	Date	Timestamp

Authentication flow integration: Passkey login issues the same JWT pair as password login. The refresh token is persisted in user.refreshTokens[] using the same schema as all other authentication methods. This closes the gap identified in Passkey (WebAuthn) Flow where passkey-issued tokens were not added to the allow-list.

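Counter enforcement: On each authentication, if either the received or the stored counter is nonzero, the received counter must be strictly greater than the stored counter. If it is less than or equal, the authentication is rejected, the event is logged as a potential cloned authenticator, and the user is notified. If both counters are zero, the authenticator does not implement a signature counter (as is the case for some synced passkeys) and the check is skipped, as the WebAuthn specification allows.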
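The counter rule can be written as a small pure function. The function and result names are hypothetical; the both-zero branch follows the WebAuthn specification's treatment of authenticators that do not implement a signature counter.

```typescript
type CounterCheck = "accept" | "reject-possible-clone";

function checkSignatureCounter(stored: number, received: number): CounterCheck {
  // Both zero: the authenticator does not implement a counter.
  if (stored === 0 && received === 0) {
    return "accept";
  }
  // Otherwise the counter must move strictly forward.
  return received > stored ? "accept" : "reject-possible-clone";
}
```

On "accept" the caller would persist the received value as the new stored counter.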

Cross-device authentication: Allowed. The transports field in registration options includes hybrid to support cross-device flows (e.g., phone authenticator on desktop login).

Migration from stubbed to real implementation:

Deploy feature flag change: ENABLE_PASSKEYS=true in staging only.
Run migration: delete all entries in user.passkeys[] (they contain the stub 'simulated-public-key' and are not valid credentials). Notify users that passkeys must be re-registered.
Deploy @simplewebauthn/server integration with Redis challenge store.
QA: test registration and authentication on Chrome (Touch ID / YubiKey), Firefox, Safari, Android, iOS.
Enable in production after QA sign-off.

Section 6: OAuth Requirements

Google OAuth

The current Google OAuth implementation documented in Google OAuth Flow is largely compatible with the new session model. The following adjustments apply:

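Token exchange flow: The current implementation is described as using Google Identity Services (GIS) with initTokenClient and requestAccessToken, with the frontend receiving an ID token (Google-signed JWT) that is sent to the backend for verification. These two descriptions do not match: in GIS, initTokenClient and requestAccessToken belong to the token model, which returns an OAuth access token through an implicit-style grant, while ID tokens are issued by the Sign in with Google API (google.accounts.id). If the frontend actually receives a Google-signed ID token and the backend verifies it, the flow is sound and no change is needed. If the frontend uses the token model, the flow is not equivalent to authorization code with PKCE and conflicts with the rejection of implicit flow in Section 1; it should move to ID-token sign-in or to initCodeClient with a server-side code exchange.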

If additional OAuth providers are added in the future (GitHub, Apple), we will use the authorization code flow with PKCE. The frontend obtains an authorization code via redirect and sends it to the backend. The backend exchanges the code for tokens server-side. This prevents the client from ever seeing the provider's access token.

Session integration: After Google token verification succeeds, the backend issues the same JWT access token (15-minute, in-memory) and refresh token (7-day, httpOnly cookie) as the email/password flow. There is no separate session type for OAuth users. This is the current behavior and it is correct.

Account linking: Account linking is implicit by email match (current behavior). If googleUser.email matches an existing user, the existing account is used. Risks and mitigations:

Risk	Mitigation
Attacker creates a Google account with a victim's email before the victim signs up	Google accounts are pre-verified; the attacker must control the email address at Google. This is a standard OAuth risk.
Victim signs up with email/password; attacker later creates Google account with same email and gains access	The backend checks for existing users on Google sign-in and does NOT create a new account. The attacker would need the victim's Google credentials.
User changes Google account email to match a different user	Google tokens are verified per-request; the backend trusts the `email` from the verified ID token. If Google allows email changes (they do not for gmail.com), this could be a vector. Mitigation: consider storing `googleId` (the `sub` claim) as a separate field in the future for multi-provider identity.

For launch, the current email-based linking is acceptable. Post-launch, we should store providers[].providerId (e.g., google:123456789) for robust multi-provider identity.

Token storage for OAuth sessions: Same as email/password. Access token in memory, refresh token in httpOnly cookie.

Logout behavior for OAuth sessions: Logout invalidates the refresh token in user.refreshTokens[], clears the cookie, and deletes Redis session records. The Google session on Google's side is not terminated (we do not call Google's revoke endpoint). This is standard practice. The user must sign out of Google separately if desired.

Section 7: Admin Step-Up Authentication

Decision

High-risk admin actions require re-authentication. Upon successful re-authentication, the admin receives a short-lived elevated session. Payouts above $1,000 USD equivalent also require two-person approval.

Definition of high-risk admin actions

Action	Step-Up Required	Two-Person Approval	Rationale
Payout/release escrow <= $1,000 USD	Yes (password)	No	Financial action; compromised session could release funds
Payout/release escrow > $1,000 USD	Yes (password)	Yes	High-value financial action; dual control per Threat T18
Manual wallet signing (any amount)	Yes (password)	Yes (if > $1,000)	Direct access to escrow wallet
Refund escrow > $500 USD	Yes (password)	No	Irreversible financial action
User suspension or deletion	Yes (password)	No	Account impact; potential for abuse
Role change (any)	Yes (password)	No	Privilege escalation vector
Dispute override (admin resolves against recommendation)	Yes (password)	No	Financial side-effect; high dispute value
API key rotation (`JWT_SECRET`, webhook secrets)	Yes (password)	No	Invalidates all sessions or compromises integrity
Disable rate limiting or security features	Yes (password)	No	Reduces platform security posture
Export user data (bulk)	Yes (password)	No	Privacy-sensitive bulk operation
View escrow wallet private key (if applicable)	Yes (password)	Yes	Critical asset exposure
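The amount-dependent rows of the table can be expressed as a small policy function. This sketch covers only the three monetary actions; the names are hypothetical, and treating refunds of $500 or less as not requiring step-up is an assumption, since the table lists only the over-$500 case.

```typescript
type MonetaryAction = "payout-release" | "manual-wallet-signing" | "refund";

interface Requirement {
  stepUp: boolean;
  twoPerson: boolean;
}

const TWO_PERSON_THRESHOLD_USD = 1000;
const REFUND_STEP_UP_THRESHOLD_USD = 500;

function requirementFor(action: MonetaryAction, amountUsd: number): Requirement {
  switch (action) {
    case "payout-release":
    case "manual-wallet-signing":
      // Always step-up; dual control only above $1,000.
      return { stepUp: true, twoPerson: amountUsd > TWO_PERSON_THRESHOLD_USD };
    case "refund":
      // Assumption: refunds at or below $500 need no step-up.
      return { stepUp: amountUsd > REFUND_STEP_UP_THRESHOLD_USD, twoPerson: false };
  }
}
```

Both thresholds are strict comparisons, so a payout of exactly $1,000 needs step-up but not a second admin.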

Step-up mechanism

Re-authentication with password. The admin must enter their password to obtain an elevated session. No additional 2FA at launch (passkeys are disabled; TOTP is not yet implemented). Post-launch, when WebAuthn is production-ready (D-4), the step-up will also accept passkey authentication as a second factor.

Elevated session:

Attribute	Value
Duration	5 minutes
Storage	Server-side only (Redis key `stepup:{userId}` with TTL 300s)
Scope	Grants elevated permissions for the specific action categories listed above
Renewal	Re-authentication required after 5 minutes; no automatic renewal
Verification	Middleware `requireStepUp()` checks Redis key existence before allowing the action

Step-up flow

Admin attempts a high-risk action (e.g., POST /api/admin/payouts/release).
Middleware requireStepUp() checks for an active elevated session in Redis.
If no elevated session exists, the backend returns 403 STEP_UP_REQUIRED with { challengeId: uuid }.
Frontend displays a password prompt (modal dialog).
Frontend sends POST /api/auth/step-up with { password, challengeId }.
Backend verifies the password against user.password using bcrypt.
On success, backend creates Redis key stepup:{userId} with TTL 300s and returns { elevated: true, expiresAt: timestamp }.
Frontend retries the original high-risk action.
The action proceeds.
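The elevated-session check in the flow above can be sketched with an in-memory map standing in for Redis. StepUpStore and checkStepUp are hypothetical names, the clock is injected for testability, and the real middleware would rely on Redis TTL expiry instead of comparing timestamps.

```typescript
const STEP_UP_TTL_MS = 300 * 1000; // 5 minutes, matching the Redis TTL of 300s

// In-memory stand-in for Redis keys of the form stepup:{userId}.
// Values are expiry timestamps in milliseconds.
class StepUpStore {
  private readonly expiries = new Map<string, number>();

  // Called after the password has been verified successfully.
  grant(userId: string, nowMs: number): number {
    const expiresAt = nowMs + STEP_UP_TTL_MS;
    this.expiries.set(`stepup:${userId}`, expiresAt);
    return expiresAt;
  }

  isElevated(userId: string, nowMs: number): boolean {
    const expiresAt = this.expiries.get(`stepup:${userId}`);
    return expiresAt !== undefined && nowMs < expiresAt;
  }
}

type StepUpDecision = { status: 200 } | { status: 403; code: "STEP_UP_REQUIRED" };

// Simplified version of the requireStepUp() decision for one request.
function checkStepUp(store: StepUpStore, userId: string, nowMs: number): StepUpDecision {
  return store.isElevated(userId, nowMs)
    ? { status: 200 }
    : { status: 403, code: "STEP_UP_REQUIRED" };
}
```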

Traceability to Authorization Matrix

This matrix maps to:

AUTH-R025 (POST /api/auth/step-up) for the step-up API entry point.
AUTH-R026 (GET /api/auth/sessions), AUTH-R027 (POST /api/auth/revoke-session), AUTH-R028 (POST /api/auth/revoke-all-sessions) for session controls.
APV-R001, APV-R002, APV-R003 for approval queue + confirm/reject workflow.

Status: these rows are marked Not implemented in the matrix while this ADR remains in planning/rollout state.

Two-person approval flow

For actions requiring two-person approval:

Admin A completes the step-up flow above.
Admin A initiates the action (e.g., POST /api/admin/payouts/release).
The action is created in a PendingApproval state (stored in MongoDB).
The system notifies all other admin users via Socket.IO and email.
Admin B navigates to the pending approval, completes their own step-up flow, and confirms (POST /api/admin/approvals/{id}/confirm).
The action executes.
If Admin B rejects (POST /api/admin/approvals/{id}/reject), the action is cancelled.

Fallback when second admin is unavailable:

If no second admin has acted on a pending approval within 4 hours, the CTO (or designated fallback) receives an email and Slack notification. The CTO can approve directly. If no CTO action within 24 hours, the approval expires and must be re-initiated.

This fallback addresses the realistic scenario where Amanat has a small team with few admins. As the team grows, the 4-hour and 24-hour windows should be tightened.
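The approval lifecycle, including the escalation and expiry windows, can be sketched as a small state machine. All names are hypothetical. Two points are assumptions because the text does not settle them: the 24-hour expiry is measured from the 4-hour escalation point, and the initiating admin may neither confirm nor reject their own request.

```typescript
type ApprovalStatus = "pending" | "confirmed" | "rejected" | "expired";

interface PendingApproval {
  initiatorId: string;
  createdAtMs: number;
  status: ApprovalStatus;
}

const HOUR_MS = 60 * 60 * 1000;
const ESCALATE_AFTER_MS = 4 * HOUR_MS;
// Assumption: the 24-hour window starts at escalation.
const EXPIRE_AFTER_MS = ESCALATE_AFTER_MS + 24 * HOUR_MS;

// True once the CTO (or designated fallback) should be notified.
function needsEscalation(approval: PendingApproval, nowMs: number): boolean {
  return approval.status === "pending" && nowMs - approval.createdAtMs >= ESCALATE_AFTER_MS;
}

function decide(
  approval: PendingApproval,
  adminId: string,
  decision: "confirm" | "reject",
  nowMs: number,
): PendingApproval {
  if (approval.status !== "pending") {
    return approval; // already resolved; nothing to change
  }
  if (nowMs - approval.createdAtMs >= EXPIRE_AFTER_MS) {
    return { ...approval, status: "expired" };
  }
  if (adminId === approval.initiatorId) {
    // Assumption: dual control means a different admin must act.
    throw new Error("second admin must differ from the initiator");
  }
  return { ...approval, status: decision === "confirm" ? "confirmed" : "rejected" };
}
```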

Audit logging for step-up events

All step-up and two-person approval events are logged to an append-only audit collection:

Field	Value
`action`	`step-up.attempt`, `step-up.success`, `step-up.failed`, `approval.created`, `approval.confirmed`, `approval.rejected`, `approval.expired`
`actorId`	ObjectId of the admin performing the action
`targetAction`	The high-risk action being performed (e.g., `payout.release`)
`targetEntity`	ObjectId or identifier of the entity (e.g., Payment ID)
`ip`	Request IP
`userAgent`	Request user-agent
`timestamp`	ISO 8601
`metadata`	JSON object with action-specific details (e.g., payout amount)

This collection is not writable by the application after insert (no updates, no deletes). Access is restricted to admin read-only and system write-only.

Section 8: Session Management and Device Tracking

Session tracking

Sessions are tracked via user.refreshTokens[] subdocuments (see Section 3 schema). Each entry represents one authenticated device.

Device fingerprinting

We will use lightweight, non-invasive device identification:

Signal	Source	Storage	Notes
User-Agent	`req.headers['user-agent']`	`refreshTokens[].deviceInfo`	Truncated to 200 characters
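IP address	`req.ip` (behind Cloudflare: client address from `req.headers['x-forwarded-for']`, which may hold a comma-separated list; Cloudflare also sets `CF-Connecting-IP`)	`refreshTokens[].ipAddress`	Used for geolocation approximation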
Platform hint	Derived from user-agent parsing	Display only	Not stored separately

We will NOT use browser fingerprinting (Canvas, WebGL, font enumeration), device IDs, or any tracking technique that requires user consent under privacy regulations. The user-agent and IP are already sent with every HTTP request.

Session listing

GET /api/auth/sessions (requires Bearer JWT)

Returns the list of active sessions for the current user:

{
  "sessions": [
    {
      "id": "sha256-hash-of-token",
      "device": "Chrome on macOS",
      "lastActive": "2026-05-24T14:30:00Z",
      "ip": "203.0.113.42",
      "location": "Tehran, Iran (approximate)",
      "isCurrent": true
    }
  ]
}

device is a parsed, human-readable string derived from the user-agent (e.g., "Chrome 125 on macOS", "Safari on iPhone").
location is derived from IP geolocation (city-level, approximate). We will use a local GeoIP database (MaxMind GeoLite2 or equivalent) to avoid sending user IPs to third-party services.
isCurrent identifies the session making the request (matched by the refresh cookie).

Session revocation

POST /api/auth/revoke-session (see Section 3).

Users can revoke any non-current session. Revoking the current session is equivalent to logout.

POST /api/auth/revoke-all-sessions (see Section 3).

Revokes all sessions except the current one. Useful if the user suspects compromise.

Maximum sessions per user

5 sessions. When a user attempts to create a 6th session (login from a new device), the oldest session (by createdAt) is automatically revoked. The user is notified via email: "A new sign-in was detected on [device] from [location]. If this was not you, please change your password immediately."

Rationale: 5 sessions accommodates typical usage (desktop, laptop, phone, tablet, one more) while preventing unbounded session accumulation.
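A sketch of the session cap with oldest-first eviction. The function name addSessionWithCap is hypothetical, and the notification email is out of scope; the evicted entries are returned so the caller can clean up related Redis records.

```typescript
interface Session {
  tokenHash: string;
  createdAt: Date;
}

const MAX_SESSIONS = 5;

// Adds the incoming session and evicts the oldest entries (by createdAt)
// until at most MAX_SESSIONS remain. The input array is not mutated.
function addSessionWithCap(
  sessions: Session[],
  incoming: Session,
): { sessions: Session[]; evicted: Session[] } {
  const ordered = [...sessions, incoming].sort(
    (a, b) => a.createdAt.getTime() - b.createdAt.getTime(),
  );
  const overflow = Math.max(0, ordered.length - MAX_SESSIONS);
  return {
    sessions: ordered.slice(overflow),
    evicted: ordered.slice(0, overflow),
  };
}
```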

Password change behavior

When a user changes their password:

All existing sessions are revoked (user.refreshTokens = []).
A new session is created for the current device.
All Redis session records for the user are deleted.
Email notification: "Your password was changed. If you did not make this change, contact support immediately."

This is the current behavior documented in Authentication Flow and it is correct.

Account lock/suspension behavior

When an admin suspends or deletes a user account:

user.status is set to suspended or deleted.
user.refreshTokens is set to [].
All Redis session records for the user are deleted.
Any subsequent request carrying a still-valid access token for that user returns 403 ACCOUNT_SUSPENDED or 403 ACCOUNT_DELETED (the authMiddleware already checks user.status).

Section 9: Migration Plan

Current state

Component	Current	Target	Change Level
Access token storage	localStorage	In-memory variable	Frontend only
Access token lifetime	7 days	15 minutes	Backend config
Refresh token storage	localStorage	httpOnly cookie (backend set)	Full stack
Refresh token lifetime	30 days	7 days	Backend config
Refresh token schema	`String[]`	Subdocument array with metadata	Backend + DB migration
CSRF protection	Not needed (header-based)	Not needed (header-based + SameSite cookie)	None
Passkey status	Stubbed, accessible	Feature-flagged off in production	Backend + Frontend
Session revocation	Not implemented	Endpoints + device listing	Backend + Frontend
Admin step-up	Not implemented	Password re-auth + elevated session	Backend + Frontend
Two-person approval	Not implemented	Pending approval workflow	Backend + Frontend

Migration steps (in order)

Step 1: Backend -- reduce access token lifetime to 15 minutes

Change JWT_EXPIRES_IN default from 7d to 15m.
Deploy. Existing 7-day tokens remain valid until they expire naturally (no forced invalidation).
Risk: users with long-lived sessions will notice more frequent refreshes. This is expected and acceptable.

Step 2: Backend -- refresh token lifetime to 7 days

Change REFRESH_TOKEN_EXPIRES_IN default from 30d to 7d.
Deploy. Existing 30-day refresh tokens remain valid until they expire or are rotated.

Step 3: Backend -- add refresh token metadata to refreshTokens[]

Deploy new schema: user.refreshTokens becomes a subdocument array with tokenHash, deviceInfo, ipAddress, createdAt, lastUsedAt.
Migration script: convert existing String[] entries to { tokenHash: sha256(entry), deviceInfo: 'Unknown (pre-migration)', ipAddress: 'unknown', createdAt: Date.now(), lastUsedAt: Date.now() }.
Login and refresh endpoints updated to write new schema.
Deploy. Old-format entries continue to work during migration.

Step 4: Backend -- set refresh token as httpOnly cookie

On login, refresh, and OAuth sign-in: set Set-Cookie header with the refresh token in an httpOnly cookie. Also return the refresh token in the response body for backward compatibility.
Add POST /api/auth/refresh-token-cookie endpoint that accepts the refresh token from the body and sets it as a cookie (migration helper for existing sessions).
Deploy. Frontend still works with body-based refresh tokens.

Step 5: Frontend -- move access token to in-memory storage

Replace localStorage.getItem('accessToken') and localStorage.setItem('accessToken', ...) with an in-memory store (module-level variable or React context).
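On app load: call the refresh endpoint. The refresh cookie is httpOnly, so the frontend cannot check for it directly; a successful response yields a new access token, and a failed response means there is no valid session and the user is redirected to login.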
Remove localStorage writes for both tokens. On logout, clear the in-memory token and the cookie (by calling the logout endpoint which sets an expired cookie).
Deploy frontend.
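A minimal sketch of the in-memory store and the app-load bootstrap. Because an httpOnly cookie is not readable from JavaScript, the sketch simply attempts the refresh call and treats failure as "no session". The names createTokenStore and bootstrapSession are hypothetical, and the injected refresh function stands in for the call to the refresh endpoint.

```typescript
interface TokenStore {
  get(): string | null;
  set(token: string): void;
  clear(): void;
}

// The token lives in a closure variable and is never written to
// localStorage, sessionStorage, IndexedDB, or a cookie.
function createTokenStore(): TokenStore {
  let token: string | null = null;
  return {
    get: () => token,
    set: (next) => {
      token = next;
    },
    clear: () => {
      token = null;
    },
  };
}

type BootstrapResult = "authenticated" | "login-required";

// Runs once on app load. The refresh function is expected to resolve
// with a new access token or reject when no valid session exists.
async function bootstrapSession(
  store: TokenStore,
  refresh: () => Promise<string>,
): Promise<BootstrapResult> {
  try {
    store.set(await refresh());
    return "authenticated";
  } catch {
    store.clear();
    return "login-required";
  }
}
```

On "login-required" the caller redirects to the login page; on logout the same store's clear() is called after the logout endpoint has expired the cookie.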

Step 6: Frontend -- send refresh via cookie instead of body

Modify the Axios interceptor to NOT send refreshToken in the body of POST /api/auth/refresh-token. The refresh token is sent automatically via the cookie.
Backend: accept refresh token from either cookie or body (backward compatible). Deprecate body-based refresh with a log warning.
Deploy both.

Step 7: Backend -- add session management endpoints

GET /api/auth/sessions -- list active sessions.
POST /api/auth/revoke-session -- revoke a specific session.
POST /api/auth/revoke-all-sessions -- revoke all other sessions.
Deploy. No frontend change yet (endpoints are available but unused).

Step 8: Frontend -- add session management UI

Account settings page: "Active Sessions" section listing devices, locations, and last active times.
"Revoke" button per session. "Revoke all other sessions" button.
Deploy frontend.

Step 9: Backend -- feature flag for passkeys

Add ENABLE_PASSKEYS env var (default false).
Gate all /api/auth/passkey/* routes behind the flag.
Return 404 when disabled.
Deploy.
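
A route gate for Step 9 could be written as below. This is a sketch, not the project's middleware: it assumes Express-style `(req, res, next)` handlers, declares only the two response methods it needs, and treats any value other than the string `"true"` as disabled so that the default stays off. The function names and the response body are hypothetical.

```typescript
// Only the literal string "true" enables passkeys; unset or anything else means disabled.
function isPasskeysEnabled(env: Record<string, string | undefined>): boolean {
  return env["ENABLE_PASSKEYS"] === "true";
}

// Minimal subset of an Express-style response used by the gate.
interface GateResponse {
  status(code: number): GateResponse;
  json(body: unknown): void;
}

// Middleware factory: mount in front of all /api/auth/passkey/* routes.
function passkeyGate(env: Record<string, string | undefined>) {
  return (_req: unknown, res: GateResponse, next: () => void): void => {
    if (!isPasskeysEnabled(env)) {
      // 404 rather than 403, so the disabled routes look like they do not exist.
      res.status(404).json({ error: "Not found" });
      return;
    }
    next();
  };
}
```

Passing `process.env` as the `env` argument at startup would wire the gate to the real environment.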

Step 10: Frontend -- feature flag for passkey UI

Add NEXT_PUBLIC_ENABLE_PASSKEYS env var (default false).
Hide passkey UI components when disabled.
Deploy frontend.

Step 11: Backend -- admin step-up authentication

Add POST /api/auth/step-up endpoint.
Add requireStepUp() middleware.
Apply middleware to high-risk admin routes.
Add Redis-based elevated session store.
Deploy.
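
The interaction between the elevated session store and `requireStepUp()` can be sketched as follows. An in-memory `Map` stands in for Redis here (in Redis, a key with a 5-minute expiry would play the same role). The class name, the `sessionId` request field, and the `{ code: "STEP_UP_REQUIRED" }` response body are assumptions for illustration.

```typescript
// Elevated sessions last 5 minutes, per the step-up decision.
const ELEVATION_TTL_MS = 5 * 60 * 1000;

// In-memory stand-in for the Redis-based elevated session store.
class ElevatedSessionStore {
  private readonly expiresAt = new Map<string, number>();
  private readonly now: () => number;

  constructor(now: () => number = () => Date.now()) {
    this.now = now;
  }

  // Called after a successful POST /api/auth/step-up.
  elevate(sessionId: string): void {
    this.expiresAt.set(sessionId, this.now() + ELEVATION_TTL_MS);
  }

  isElevated(sessionId: string): boolean {
    const expiry = this.expiresAt.get(sessionId);
    if (expiry === undefined) return false;
    if (this.now() >= expiry) {
      this.expiresAt.delete(sessionId);
      return false;
    }
    return true;
  }
}

// Minimal subset of an Express-style response used by the middleware.
interface StepUpResponse {
  status(code: number): StepUpResponse;
  json(body: unknown): void;
}

// Middleware factory: rejects with 403 STEP_UP_REQUIRED unless the session is elevated.
function requireStepUp(store: ElevatedSessionStore) {
  return (req: { sessionId: string }, res: StepUpResponse, next: () => void): void => {
    if (!store.isElevated(req.sessionId)) {
      res.status(403).json({ code: "STEP_UP_REQUIRED" });
      return;
    }
    next();
  };
}
```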

Step 12: Frontend -- admin step-up UI

Password prompt modal for step-up challenges.
Intercept 403 STEP_UP_REQUIRED responses and show modal.
Retry original request after successful step-up.
Deploy frontend.
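
The intercept-and-retry behavior can be sketched independently of the HTTP client. In this sketch, `send` and `promptForStepUp` are hypothetical injected functions (the request to retry and the password modal, respectively), and the `code` field in the response body is an assumed shape for the `STEP_UP_REQUIRED` signal.

```typescript
interface ApiResult {
  status: number;
  body: { code?: string } | null;
}

// True only for the specific 403 that asks for step-up, not for ordinary authorization failures.
function isStepUpRequired(result: ApiResult): boolean {
  return result.status === 403 && result.body?.code === "STEP_UP_REQUIRED";
}

// Send a request; if the server demands step-up, prompt and retry exactly once.
async function withStepUp(
  send: () => Promise<ApiResult>,
  promptForStepUp: () => Promise<boolean>,
): Promise<ApiResult> {
  const first = await send();
  if (!isStepUpRequired(first)) return first;

  // Resolves to true when the user re-authenticated successfully in the modal.
  const confirmed = await promptForStepUp();
  if (!confirmed) return first;

  // A single retry avoids a prompt loop if the server keeps returning 403.
  return send();
}
```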

Step 13: Backend -- two-person approval

Add PendingApproval collection.
Add approval workflow endpoints.
Apply to payout/release actions above $1,000.
Add notification logic for other admins.
Deploy.
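
The core rule of the approval workflow, that the approving admin must differ from the requesting admin, can be sketched as below. The `PendingApproval` field names are hypothetical (the ADR names the collection but this section does not list its fields), and "above $1,000" is read as strictly greater than 1000.

```typescript
// Payout/release actions above this amount need a second admin.
const APPROVAL_THRESHOLD_USD = 1000;

// Hypothetical shape of a PendingApproval document.
interface PendingApproval {
  requestedBy: string;
  amountUsd: number;
  status: "pending" | "approved" | "rejected";
  approvedBy?: string;
}

function requiresSecondApproval(amountUsd: number): boolean {
  return amountUsd > APPROVAL_THRESHOLD_USD;
}

// Approve a pending request; the approver must not be the requester.
function approve(pending: PendingApproval, adminId: string): PendingApproval {
  if (pending.status !== "pending") {
    throw new Error("Approval is no longer pending");
  }
  if (adminId === pending.requestedBy) {
    throw new Error("Requester cannot approve their own request");
  }
  return { ...pending, status: "approved", approvedBy: adminId };
}
```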

Step 14: Frontend -- two-person approval UI

Pending approvals list in admin dashboard.
Confirm/reject actions with step-up.
Deploy frontend.

Step 15: Backend -- remove body-based refresh token acceptance

After all frontends are migrated (Step 6 + reasonable buffer of 2 weeks), stop accepting refresh tokens from the request body.
Accept refresh tokens only from the cookie.
Deploy.

Feature flags

Flag	Default	Environments	Purpose
`ENABLE_PASSKEYS`	`false`	All	Controls passkey route registration
`NEXT_PUBLIC_ENABLE_PASSKEYS`	`false`	All	Controls passkey UI visibility
`COOKIE_REFRESH_MIGRATION`	`false`	All	Enables cookie-based refresh token issuance
`REQUIRE_STEP_UP`	`false`	Staging, Production	Enables step-up auth for admin actions

Rollback plan

If any migration step causes issues:

Steps 1-2 (token lifetimes): Revert JWT_EXPIRES_IN and REFRESH_TOKEN_EXPIRES_IN to previous values. Redeploy. No data migration to undo.
Steps 3-4 (refresh token schema + cookies): Backend continues to accept body-based refresh tokens. Frontend can revert to storing tokens in localStorage. The httpOnly cookie is additive; removing it does not break existing sessions.
Step 5 (in-memory access token): Frontend can revert to localStorage. The backend does not care where the access token comes from.
Steps 7-8 (session management): These are additive endpoints and UI. Rolling back means removing the UI and endpoints. No data is affected.
Steps 9-10 (passkey feature flag): Set flags to true to restore passkey access (though passkeys remain stubbed and insecure). Rolling back is simply changing env vars.
Steps 11-14 (step-up and two-person approval): Remove requireStepUp() middleware. Admin actions proceed without step-up. This is a security regression but not a functional outage.

Timeline estimate

Phase	Steps	Duration	Dependencies
Token hardening	1-2	1 day	None
Cookie migration	3-6	3-5 days	Frontend + backend coordination
Session management	7-8	2-3 days	Cookie migration complete
Passkey feature flag	9-10	1 day	None
Admin step-up	11-12	3-4 days	None
Two-person approval	13-14	3-5 days	Admin step-up complete
Cleanup (step 15)	15	1 day (after 2-week buffer)	All frontends migrated
Total		14-20 days (sum of the phase durations above; excludes the 2-week buffer before step 15)

Section 10: Threat Mitigation Traceability

Decision	Threats Addressed	Risk Reduction
Access token in memory (not localStorage)	T13 (XSS token theft)	XSS cannot persistently steal the token; it is lost on page unload
Access token lifetime reduced to 15 min	T04 (stolen token reuse)	A stolen access token is valid for at most 15 minutes instead of 7 days (a 672x reduction in the exposure window)
Refresh token in httpOnly cookie	T04, T13	XSS cannot read the refresh token; it is not accessible to JavaScript
Refresh token lifetime reduced to 7 days	T04	Maximum exploitation window from a compromised refresh token is 7 days instead of 30 days
Refresh token rotation with reuse detection	T04	Reuse of a rotated token triggers full session invalidation; attacker and legitimate user are forced to re-authenticate
SameSite=Strict on refresh cookie	T15 (CSRF)	Cookie is not sent on cross-site requests, so cross-site request forgery against the refresh endpoint is blocked
Refresh cookie scoped to `/api/auth/refresh-token` path	T15	Cookie sent only on the refresh endpoint; not on any state-changing endpoint
Passkey feature flag disabled in production	T10 (passkey bypass)	Stubbed passkey implementation is unreachable in production; cannot be exploited
Session revocation endpoints	T04	Users can terminate compromised sessions immediately; admins can revoke sessions for suspended users
Max 5 sessions per user	T04	Limits blast radius of session accumulation; oldest sessions auto-revoked
Admin step-up authentication	T09 (admin privilege escalation), T18 (insider fund manipulation)	Compromised admin session cannot perform high-risk actions without re-authenticating; elevated session lasts only 5 minutes
Two-person approval for large payouts	T05 (double payout), T18	No single admin can release high-value escrow; second admin must independently verify and approve
Audit logging for step-up and approval events	T09, T18	All elevated-access events are recorded in tamper-evident audit trail
Password change revokes all sessions	T04	If user detects compromise, password change immediately terminates all attacker sessions
Account suspension revokes all sessions	T09	Compromised admin accounts are immediately locked out when suspended
Device/session listing	T04	Users can detect unfamiliar sessions and revoke them; early detection of compromise
Email notification on new device login	T04	User is alerted to unauthorized access within minutes
Verification code removal from production logs	T22 (verification code leakage)	Codes are no longer written to production logs; only non-production environments may log them for debugging
Access token `jti` claim	T04	Each token has a unique identifier; enables future deny-listing of individual tokens
OAuth token storage same as email/password	T04, T13	OAuth sessions receive the same protections (in-memory access, httpOnly refresh)

Coverage analysis

Threat	Mitigated by this ADR?	Residual risk
T04 (stolen token reuse)	Yes -- 15-min access token, httpOnly refresh cookie, rotation, session revocation	Physical access to an unlocked device with an active session; keylogger capturing password during step-up
T10 (passkey bypass)	Yes -- feature flag disabled in production	None (passkeys are unreachable)
T13 (XSS token theft)	Yes -- in-memory access token, httpOnly refresh cookie	Transient XSS can read in-memory token for up to 15 minutes; XSS cannot access refresh token
T15 (CSRF)	Yes -- access token in Authorization header (unchanged), SameSite=Strict on refresh cookie	None
T22 (verification code leakage)	Partially -- ADR documents the requirement; implementation is a separate task	Codes still logged until code change is deployed

Threats NOT addressed by this ADR (addressed elsewhere)

Threat	Document
T01 (fake payment proof)	Funds Ledger and Escrow State Machine Specification, Payment Provider Adapter Spec
T02 (webhook replay)	Webhook Security Spec
T03 (arbitrary socket room join)	Realtime Authorization Spec
T05 (double payout)	Funds Ledger and Escrow State Machine Specification
T06 (dispute bypass)	Funds Ledger and Escrow State Machine Specification
T07 (email abuse)	Rate limiting implementation
T08 (AI cost abuse)	Rate limiting + auth implementation
T09 (admin privilege escalation)	Authorization Matrix - REST and Socket.IO + step-up auth (this ADR)
T11 (unauthenticated payment endpoints)	Auth middleware implementation
T12 (rate limit bypass)	Rate limiting implementation
T14 (supply-chain)	Secure Build and Supply-Chain Policy
T16 (deep-link tampering)	Telegram initData verification
T17 (provider outage)	Backend Funds Migration and Operational Runbooks
T18 (insider manipulation)	Multi-sig wallet + funds ledger + two-person approval (this ADR)
T19 (price manipulation)	Offer status enforcement
T20 (delivery brute force)	Rate limiting + code entropy
T21 (data exfiltration)	Auth middleware implementation
T23 (state machine inconsistency)	Canonical state machine specification

Cross-references

Threat Model - Amanat Escrow Platform -- T04, T10, T13, T15, T22
Security Ownership and Launch Decision Criteria -- D-1 (cookie migration), D-4 (real WebAuthn), D-5 (session revocation), D-6 (admin step-up)
Security Architecture -- current authentication implementation
Authentication Flow -- current token lifecycle
Passkey (WebAuthn) Flow -- current passkey implementation (stubbed)
Google OAuth Flow -- current OAuth implementation
Platform Logical Audit - 2026-05-24 -- Findings 2, 8, 10, 12
Backend Stack Security and Refactor Assessment - 2026-05-24 -- Phase 0 hardening requirements

This document was created on 2026-05-24 as part of Taskmaster task 4 (authentication and session architecture) for the Amanat escrow platform. It must be reviewed by the Backend Lead, Frontend Lead, and CTO before implementation begins. Changes to any decision in this document require sign-off per the RACI in Security Ownership and Launch Decision Criteria Section 1.
