title: Session and Authentication Architecture Decision
tags: audit, security, adr, authentication, session, passkey, webauthn, admin, step-up
created: 2026-05-24
status: decided
reviewers: backend, security, frontend, cto

Session and Authentication Architecture Decision

Architecture Decision Record. This document resolves deferred decisions D-1, D-4, D-5, and D-6 from Security Ownership and Launch Decision Criteria and addresses threats T04, T10, T13, T15, and T22 from Threat Model - Amanat Escrow Platform.

All decisions in this document are binding for the Amanat platform. Changes require sign-off from the accountable role per the RACI in Security Ownership and Launch Decision Criteria Section 1.


Section 1: Decision Summary

| Decision | Chosen Option | Rejected Alternatives | Status |
|---|---|---|---|
| Access token storage | In-memory only (JavaScript variable), not persisted to any browser storage | localStorage (current), httpOnly cookie | Decided |
| Refresh token storage | httpOnly, Secure, SameSite=Strict cookie | localStorage (current), response body | Decided |
| Access token lifetime | 15 minutes | 7 days (current), 60 minutes, 30 minutes | Decided |
| Refresh token lifetime | 7 days with rotation on every use | 30 days (current), 14 days, 1 day | Decided |
| CSRF strategy | SameSite=Strict on refresh cookie; no CSRF token needed because access token is never in a cookie | Double-submit cookie, synchronizer token pattern | Decided |
| Passkey/WebAuthn | Option C: feature-flagged off in production; stubs remain for development | Remove entirely, implement production WebAuthn pre-launch | Decided |
| OAuth requirements | Authorization code with PKCE; same session model as email/password; implicit account linking by email | Implicit flow, separate session model for OAuth | Decided |
| Device/session revocation | user.refreshTokens[] with device metadata, revocation endpoints, max 5 sessions per user | No revocation (current), Redis-only session store | Decided |
| Admin step-up auth | Required for high-risk actions; 5-minute elevated session via re-authentication (password) | No step-up (current), hardware 2FA mandatory, passkey-only | Decided |
| Two-person approval | Required for payouts above $1,000 USD equivalent; second admin must confirm | Single-admin approval for all amounts, multi-sig wallet only | Decided |

Section 2: Access Token Storage and Lifetime

Decision

We will store the access token in JavaScript memory only (a module-level variable or React context), not in localStorage, sessionStorage, IndexedDB, or any cookie. The access token is never written to any persistent browser storage.

Rejected alternatives

| Alternative | Rejection reason |
|---|---|
| localStorage (current state) | Fully accessible to any XSS payload. Threat T13: any script injection in any dependency, user-generated content field, or uploaded file exfiltrates the token. Threat T04: 7-day lifetime means a stolen token grants access for up to a week without triggering refresh rotation. |
| httpOnly cookie | Eliminates XSS theft but introduces CSRF risk on every API call. Requires CSRF token infrastructure for all state-changing endpoints. Adds CORS complexity because the browser attaches cookies automatically to cross-site requests unless SameSite restricts them. The access token must be sent on every API call, so the CSRF surface is every endpoint. |

Rationale

In-memory storage leaves no persistent copy of the token for an XSS payload to steal later. An attacker who achieves XSS can read the token during the lifetime of the page session, but the token evaporates on tab close, navigation away, or page refresh. Combined with a 15-minute expiry, a token captured by any single XSS event is usable for at most 15 minutes, not 7 days.

The access token is only needed for the duration of a browser session. The Axios interceptor holds the token in a closure variable. On page refresh, the refresh-token flow re-establishes the access token transparently.

Token lifetime

15 minutes from issuance.

Rationale: Threat T04 identifies the 7-day access token lifetime as a critical risk. A stolen access token is usable for its entire lifetime without triggering refresh-token rotation. Reducing to 15 minutes limits the damage window to 15 minutes maximum. This is short enough that an attacker who exfiltrates the token via a transient XSS event (e.g., a reflected XSS that closes the page) loses access almost immediately, while long enough to avoid excessive refresh calls on a normal browsing session.

Token format

JWT with HS256 signing (same algorithm as current). Claims:

| Claim | Value | Purpose |
|---|---|---|
| sub | User _id (ObjectId string) | Subject identification |
| role | buyer, seller, admin, support | Authorization enforcement |
| iat | Unix timestamp | Issued-at |
| exp | iat + 900 (15 minutes) | Expiry enforcement |
| jti | UUID v4 | Token-specific identifier for audit trail and future revocation |
| iss | marketplace-backend | Issuer verification |
| aud | marketplace-users | Audience verification |

No change to the signing algorithm or secret management. The jti claim is new and required for audit logging and future deny-list capabilities.
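As an illustration of the claim set above, the following sketch builds the claims and signs them with HS256 using only Node's built-in crypto module. The helper names (buildAccessTokenClaims, signHs256, toBase64Url) are hypothetical, and the hand-rolled signing is for illustration only; the existing backend presumably uses its current JWT library.

```typescript
import { createHmac, randomUUID } from "node:crypto";

type UserRole = "buyer" | "seller" | "admin" | "support";

// Claim names and fixed values follow the claims table above.
interface AccessTokenClaims {
  sub: string;
  role: UserRole;
  iat: number;
  exp: number;
  jti: string;
  iss: string;
  aud: string;
}

const ACCESS_TOKEN_TTL_SECONDS = 900; // 15 minutes

// Hypothetical helper: builds the claim set for a user at a given time.
function buildAccessTokenClaims(
  userId: string,
  role: UserRole,
  nowMs: number = Date.now(),
): AccessTokenClaims {
  const iat = Math.floor(nowMs / 1000); // JWT timestamps are in seconds
  return {
    sub: userId,
    role,
    iat,
    exp: iat + ACCESS_TOKEN_TTL_SECONDS,
    jti: randomUUID(), // UUID v4
    iss: "marketplace-backend",
    aud: "marketplace-users",
  };
}

function toBase64Url(text: string): string {
  return Buffer.from(text, "utf8").toString("base64url");
}

// Illustrative HS256 signing: base64url(header).base64url(payload).signature
function signHs256(claims: AccessTokenClaims, secret: string): string {
  const header = toBase64Url(JSON.stringify({ alg: "HS256", typ: "JWT" }));
  const payload = toBase64Url(JSON.stringify(claims));
  const signature = createHmac("sha256", secret)
    .update(`${header}.${payload}`)
    .digest("base64url");
  return `${header}.${payload}.${signature}`;
}
```

Passing the clock in as nowMs keeps the expiry arithmetic testable without mocking Date.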

Renewal strategy

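Silent refresh with rotation. The Axios response interceptor detects 401 TOKEN_INVALID responses or 403 responses that indicate an invalid token. Other 403 responses defined in this ADR (STEP_UP_REQUIRED, ACCOUNT_SUSPENDED, ACCOUNT_DELETED) must not trigger a refresh, because a new token cannot change their outcome. On a qualifying response, the interceptor attempts a refresh using the httpOnly cookie (sent automatically by the browser). On success, the backend returns a new access token in the response body and sets a new refresh cookie. The interceptor updates the in-memory access token and retries the original request.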

Concurrent requests during refresh: implement a refresh mutex (single inflight refresh promise). If multiple requests fail with 401 simultaneously, the first triggers the refresh; subsequent requests await the same promise and retry with the new token.
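The refresh mutex described above can be sketched as a single shared promise. This is a minimal sketch, not the platform's interceptor: doRefresh is a hypothetical placeholder for the real call to the refresh endpoint, and the Axios wiring is omitted.

```typescript
// In-memory access token; never written to browser storage.
let accessToken: string | null = null;

// The single in-flight refresh, shared by every caller while it is pending.
let inflightRefresh: Promise<string> | null = null;

// `doRefresh` is a hypothetical stand-in for the call to the refresh endpoint.
function refreshAccessToken(doRefresh: () => Promise<string>): Promise<string> {
  if (inflightRefresh === null) {
    inflightRefresh = doRefresh()
      .then((token) => {
        accessToken = token; // update the in-memory token once
        return token;
      })
      .finally(() => {
        inflightRefresh = null; // allow the next refresh cycle
      });
  }
  // Later callers receive the same promise instead of starting a new refresh.
  return inflightRefresh;
}
```

Each request that fails with 401 would await refreshAccessToken(...) and then retry with the returned token. If the refresh fails, every waiting request sees the same rejection.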

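Absolute expiry: The refresh token has a maximum lifetime of 7 days (see Section 3). After 7 days, the user must re-authenticate. There is no sliding window that extends beyond 7 days, so a rotated refresh token must inherit the expiry of the session's original token rather than receive a fresh 7-day lifetime.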

Token theft detection and response

  1. Refresh-token reuse detection (already exists, remains in place): If a previously-used refresh token is presented, the backend invalidates all sessions for that user and forces re-authentication. The user receives an email notification: "Your session was terminated due to suspicious activity. Please sign in again."

  2. Access token theft: Because the token is in-memory only, persistent theft requires a sustained XSS presence. The 15-minute window limits exposure. No server-side detection is possible for access-token theft alone (the token is valid until expiry). The primary defense is the short lifetime.

  3. Audit logging: Every token issuance and refresh is logged with jti, userId, ip, userAgent, and timestamp. Abnormal patterns (e.g., refresh from a new IP geolocation far from the previous one) trigger an alert for investigation.


Section 3: Refresh Token Storage, Rotation, and Revocation

Storage location

httpOnly, Secure, SameSite=Strict cookie named __Host-refresh-token.

Cookie attributes:

| Attribute | Value | Rationale |
|---|---|---|
| httpOnly | true | Not accessible to JavaScript; immune to XSS theft |
| secure | true | Transmitted only over HTTPS |
| sameSite | Strict | Not sent on cross-site requests; CSRF protection |
| path | /api/auth/refresh-token | Cookie sent only on the refresh endpoint, not on every API call |
| domain | (omitted; host-only) | Scoped to the exact origin |
| maxAge | 604800000 (7 days in ms) | Matches refresh token lifetime |

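The __Host- prefix ensures the cookie cannot be set by a subdomain and requires Secure. Browsers also require Path=/ and no Domain attribute for __Host- cookies, so the restricted path above is incompatible with this prefix and the cookie would be rejected as specified. Either the path must be / or the cookie must use the __Secure- prefix, which permits a narrower path but does not enforce host-only scope.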
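Browsers enforce cookie name prefixes at the moment a cookie is set: a __Host- cookie must be Secure, must have no Domain attribute, and must use Path=/, while a __Secure- cookie only needs Secure. The sketch below encodes those rules so a cookie configuration can be checked before deployment. The function and type names are hypothetical.

```typescript
interface CookieAttrs {
  secure: boolean;
  path: string;
  domain?: string;
}

// Returns the reasons a browser would reject the cookie because of its name prefix.
function cookiePrefixViolations(cookieName: string, attrs: CookieAttrs): string[] {
  const problems: string[] = [];
  if (cookieName.startsWith("__Host-")) {
    if (!attrs.secure) problems.push("__Host- requires Secure");
    if (attrs.domain !== undefined) problems.push("__Host- forbids the Domain attribute");
    if (attrs.path !== "/") problems.push("__Host- requires Path=/");
  } else if (cookieName.startsWith("__Secure-")) {
    if (!attrs.secure) problems.push("__Secure- requires Secure");
  }
  return problems;
}
```

Called with the name and attributes from the table above, cookiePrefixViolations("__Host-refresh-token", { secure: true, path: "/api/auth/refresh-token" }) returns ["__Host- requires Path=/"].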

Refresh token lifetime

7 days with rotation on every use.

Rationale: The current 30-day lifetime is excessive for a financial platform. Seven days provides a reasonable balance between user convenience (users are not forced to re-authenticate daily) and security (a compromised refresh token is only exploitable for 7 days maximum, and rotation reduces the window further). The deferred decision D-1 in Security Ownership and Launch Decision Criteria accepts that the migration from localStorage to httpOnly cookies must happen within 30 days post-launch. This ADR specifies the target state.

Rotation strategy

On every refresh:

  1. Backend receives the refresh token from the cookie.
  2. Verifies the JWT signature and expiry.
  3. Looks up the user and checks the token hash is present in user.refreshTokens[].
  4. Removes the old token from the array.
  5. Issues a new access token (in response body) and a new refresh token (in Set-Cookie header).
  6. Pushes the new refresh token hash to user.refreshTokens[].

Token reuse detection

If a previously-consumed refresh token is presented (i.e., the token was already rotated away and is no longer in user.refreshTokens[]):

  1. Invalidate ALL sessions for that user: set user.refreshTokens = [].
  2. Revoke all Redis session records for that user.
  3. Send an email to the user: "A potential security issue was detected. All your sessions have been terminated. Please sign in again and change your password if you did not initiate this activity."
  4. Log the event with both token hashes, the IP addresses of both the legitimate and reuse attempts, and timestamps.

This is the existing behavior documented in Authentication Flow and Security Architecture section 2.4. It remains unchanged.
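The rotation and reuse-detection steps above can be sketched against an in-memory user record. This is a simplified sketch under stated assumptions: JWT signature and expiry checks are taken as already done, hex encoding of the SHA-256 hash is an assumption, device metadata is omitted, and the Redis, email, and logging steps are left out. All names are hypothetical.

```typescript
import { createHash } from "node:crypto";

interface RefreshTokenEntry {
  tokenHash: string;
  createdAt: Date;
  lastUsedAt: Date;
}

interface AccountRecord {
  refreshTokens: RefreshTokenEntry[];
}

// Assumption: hashes are stored hex-encoded.
const sha256Hex = (value: string): string =>
  createHash("sha256").update(value).digest("hex");

type RotationOutcome = "rotated" | "reuse-detected";

// Assumes the presented token's signature and expiry were already verified.
function rotateRefreshToken(
  account: AccountRecord,
  presentedToken: string,
  newToken: string,
  now: Date = new Date(),
): RotationOutcome {
  const presentedHash = sha256Hex(presentedToken);
  const index = account.refreshTokens.findIndex((e) => e.tokenHash === presentedHash);

  if (index === -1) {
    // A valid token that is no longer on the allow-list: treat as reuse
    // and invalidate every session for this user.
    account.refreshTokens = [];
    return "reuse-detected";
  }

  // Remove the consumed token, then store only the hash of its replacement.
  account.refreshTokens.splice(index, 1);
  account.refreshTokens.push({
    tokenHash: sha256Hex(newToken),
    createdAt: now,
    lastUsedAt: now,
  });
  return "rotated";
}
```

In a real deployment the lookup, removal, and insert would need to be atomic at the database level so that two concurrent refreshes with the same token cannot both succeed.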

MongoDB storage schema

The user.refreshTokens[] array currently stores raw token strings. We will migrate to a subdocument array with metadata:

refreshTokens: [{
  tokenHash: String,       // SHA-256 hash of the refresh token (not the raw token)
  deviceInfo: String,      // User-Agent string (truncated to 200 chars)
  ipAddress: String,       // IP at time of token issuance
  createdAt: Date,         // When the token was issued
  lastUsedAt: Date,        // Updated on each refresh
}]

The raw refresh token is never stored in MongoDB. Only its SHA-256 hash is stored. Verification compares sha256(receivedToken) === stored.tokenHash.

Migration: existing plain-string entries in user.refreshTokens[] are invalidated on the first refresh after deployment. Users with existing sessions are force-re-authenticated once.

Revocation endpoints

POST /api/auth/revoke-session

Requires Bearer JWT. Body: { sessionTokenHash: string } (the hash of the session to revoke, obtained from the session listing endpoint). The backend removes the matching entry from user.refreshTokens[] and deletes any Redis session keyed by tokens associated with that refresh token.

POST /api/auth/revoke-all-sessions

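Requires Bearer JWT. Removes all entries from user.refreshTokens[] except the one used by the current request (identified by the refresh cookie). Deletes all Redis sessions for that user. The user remains logged in on the current device. Because the refresh cookie is scoped to Path=/api/auth/refresh-token, the browser does not send it to this endpoint; identifying the current session here requires either a wider cookie path or a session identifier carried in the access token.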

Both endpoints are audited: action, actor, target session hash, timestamp, IP, user-agent.


Section 4: CSRF Strategy

Current state

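JWT is sent in the Authorization: Bearer header. Browsers never attach a Bearer Authorization header automatically; only the application's own JavaScript can add it, so a forged cross-site request arrives without credentials. CSRF is currently mitigated by design (Threat T15: "Mitigated").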

Decision: no CSRF token needed for access tokens

Because the access token is stored in JavaScript memory and sent via the Authorization header, CSRF is not a concern for the majority of API calls. A forged cross-site request cannot include the header, because the attacker's page has no access to the token.

CSRF protection for the refresh endpoint

The refresh token is stored in an httpOnly cookie. However, the cookie attributes provide CSRF protection:

  • SameSite=Strict: The cookie is not sent on any cross-site request, including top-level navigations from external sites. This eliminates CSRF on the refresh endpoint from other sites. SameSite is evaluated per site rather than per origin, so requests from other subdomains of the same registrable domain still carry the cookie.
  • Path=/api/auth/refresh-token: The cookie is only sent on requests to the refresh endpoint, not on any other API call.

Combined, these attributes mean an attacker cannot trigger a refresh from a cross-site page. The SameSite=Strict policy is appropriate here because the refresh endpoint is never called from an external context (no OAuth callback, no payment provider redirect targets the refresh endpoint).

If the architecture migrates access tokens to cookies later

If a future decision moves the access token to a cookie (which we explicitly reject in this ADR), CSRF tokens become mandatory. The recommended approach would be:

  • Double-submit cookie pattern: set a CSRF token in a non-httpOnly cookie; the frontend reads it and includes it in a custom header (X-CSRF-Token). The backend verifies the header matches the cookie.
  • Apply to all state-changing endpoints (POST, PUT, PATCH, DELETE).

This fallback is documented but not implemented. No action is needed unless the access token storage decision is revisited.

Web3 wallet interactions

Web3 wallet connections (wagmi, WalletConnect) open popup windows or browser extensions for signing. These interactions do not involve the platform's cookies or tokens. The signed transaction or message is returned to the platform's JavaScript context and sent to the backend via the normal Axios interceptor with the in-memory access token. CSRF is not a concern for Web3 interactions.


Section 5: Passkey/WebAuthn Decision

Decision: Option C -- feature-flagged off in production

We will keep the current stubbed passkey implementation in the codebase, gate it behind a feature flag (ENABLE_PASSKEYS), and set this flag to false in production environments.

Rejected alternatives

| Alternative | Rejection reason |
|---|---|
| Option A: Remove passkeys entirely | The frontend UI and backend routes already exist. Removing them means rebuilding the registration and sign-in UI later. The stubbed code does not pose a security risk when disabled via feature flag. |
| Option B: Implement production WebAuthn pre-launch | Per Platform Logical Audit - 2026-05-24 Finding 2, the current implementation has three critical flaws (stubbed attestation, in-memory challenges, missing refresh-token persistence). Fixing all three, testing across platforms, and auditing the result would take 2-3 weeks of focused engineering. This is not justifiable before launch when password + OAuth authentication is sufficient. |

Rationale

The launch gate in Security Ownership and Launch Decision Criteria Section 2.1.7 requires: "Passkey/WebAuthn disabled in production until real cryptographic implementation is complete." A feature flag is the cleanest way to comply: the code exists but cannot be reached in production. The deferred decision D-4 sets a deadline of 90 days post-launch for real WebAuthn.

Feature flag implementation

  • Backend env var: ENABLE_PASSKEYS=false in production, true in development.
  • The passkey routes (/api/auth/passkey/*) return 404 Not Found when the flag is false. The route registration itself is conditional.
  • Frontend: the Passkey UI components are hidden when NEXT_PUBLIC_ENABLE_PASSKEYS is not "true".
  • Both flags are false by default; must be explicitly enabled.
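The default-off behavior and conditional route registration described above can be sketched as follows. The route table shape and function names are hypothetical, and the real backend would register handlers on its HTTP framework rather than return a list.

```typescript
// A flag is enabled only when the variable is exactly "true";
// unset, empty, "1", or "TRUE" all count as disabled.
function isFlagEnabled(value: string | undefined): boolean {
  return value === "true";
}

interface RouteDef {
  method: string;
  path: string;
}

// Hypothetical route table: passkey routes are registered only when the flag
// is on, so requests to them fall through to the default 404 otherwise.
function registerAuthRoutes(env: Record<string, string | undefined>): RouteDef[] {
  const routes: RouteDef[] = [
    { method: "POST", path: "/api/auth/refresh-token" },
  ];
  if (isFlagEnabled(env.ENABLE_PASSKEYS)) {
    routes.push({ method: "ALL", path: "/api/auth/passkey/*" });
  }
  return routes;
}
```

Treating anything other than the exact string "true" as disabled means a typo in the environment cannot expose the stubbed routes in production.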

Target WebAuthn implementation (for D-4 resolution)

When the team implements production WebAuthn within 90 days post-launch, the following specifications apply:

Library: @simplewebauthn/server (server-side) and @simplewebauthn/browser (client-side).

Relying Party configuration:

| Parameter | Value |
|---|---|
| RP ID | Production eTLD+1 domain (e.g., amn.gg), NOT localhost |
| RP Name | Amanat |
| Origins | https://amn.gg, https://www.amn.gg |
| Timeout | 60 seconds |

Challenge storage: Redis-backed with 5-minute TTL. Key: webauthn:challenge:{challengeHash}. Value: { userId, type: 'registration' | 'authentication', createdAt }. The in-process Map is removed.

Attestation type: none. Rationale: Amanat does not need to verify the authenticator manufacturer or model. Direct or indirect attestation adds complexity (managing attestation certificates, privacy concerns) without security benefit for this platform. We rely on the authenticator's signature, not its attestation.

Credential storage (passkeys[] subdocument):

| Field | Type | Description |
|---|---|---|
| id | String | Base64url-encoded credential ID |
| publicKey | Buffer | COSE public key (actual bytes, not a stub string) |
| counter | Number | Monotonic signature counter; incremented on each authentication |
| deviceType | String | platform or cross-platform |
| deviceName | String | User-provided label or auto-generated from user-agent |
| transports | String[] | Authenticator transports (e.g., ['internal', 'hybrid']) |
| registeredAt | Date | Timestamp |

Authentication flow integration: Passkey login issues the same JWT pair as password login. The refresh token is persisted in user.refreshTokens[] using the same schema as all other authentication methods. This closes the gap identified in Passkey (WebAuthn) Flow where passkey-issued tokens were not added to the allow-list.

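Counter enforcement: On each authentication, the received counter must be strictly greater than the stored counter, unless both are 0. If the received counter is less than or equal to a nonzero stored counter, the authentication is rejected, the event is logged as a potential cloned authenticator, and the user is notified. Authenticators that do not implement a counter always report 0 (this is common for synced passkeys); when both the stored and received counters are 0, the check is skipped, as the WebAuthn specification allows.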
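The counter rule can be sketched as a small pure function. The zero-counter branch follows the WebAuthn specification's handling of authenticators that do not implement a signature counter; the function and result names are hypothetical.

```typescript
type CounterCheck = "ok" | "skipped-no-counter" | "possible-clone";

// Compares the counter stored with the credential against the one received
// in the authenticator data for this authentication.
function checkSignatureCounter(storedCounter: number, receivedCounter: number): CounterCheck {
  // Authenticators without a counter always report 0; nothing to compare.
  if (storedCounter === 0 && receivedCounter === 0) {
    return "skipped-no-counter";
  }
  // Otherwise the counter must strictly increase.
  return receivedCounter > storedCounter ? "ok" : "possible-clone";
}
```

On "ok" the stored counter would be updated to the received value; on "possible-clone" the caller rejects the authentication, logs the event, and notifies the user.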

Cross-device authentication: Allowed. The transports field in registration options includes hybrid to support cross-device flows (e.g., phone authenticator on desktop login).

Migration from stubbed to real implementation:

  1. Deploy feature flag change: ENABLE_PASSKEYS=true in staging only.
  2. Run migration: delete all entries in user.passkeys[] (they contain the stub 'simulated-public-key' and are not valid credentials). Notify users that passkeys must be re-registered.
  3. Deploy @simplewebauthn/server integration with Redis challenge store.
  4. QA: test registration and authentication on Chrome (Touch ID / YubiKey), Firefox, Safari, Android, iOS.
  5. Enable in production after QA sign-off.

Section 6: OAuth Requirements

Google OAuth

The current Google OAuth implementation documented in Google OAuth Flow is largely compatible with the new session model. The following adjustments apply:

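Token exchange flow: The current implementation uses Google Identity Services (GIS) with initTokenClient and requestAccessToken, and the frontend sends the resulting Google credential to the backend for verification. One point needs confirmation before this flow is treated as compliant. In GIS, initTokenClient and requestAccessToken belong to the token model, which returns an OAuth access token through an implicit-style flow; a Google-signed ID token (JWT) is issued by the Sign in with Google API (google.accounts.id), and the authorization code model uses initCodeClient. If the backend verifies a Google ID token, no change is needed. If it receives an access token, the flow is not equivalent to authorization code with PKCE and should move to the code model.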

If additional OAuth providers are added in the future (GitHub, Apple), we will use the authorization code flow with PKCE. The frontend obtains an authorization code via redirect and sends it to the backend. The backend exchanges the code for tokens server-side. This prevents the client from ever seeing the provider's access token.

Session integration: After Google token verification succeeds, the backend issues the same JWT access token (15-minute, in-memory) and refresh token (7-day, httpOnly cookie) as the email/password flow. There is no separate session type for OAuth users. This is the current behavior and it is correct.

Account linking: Account linking is implicit by email match (current behavior). If googleUser.email matches an existing user, the existing account is used. Risks and mitigations:

| Risk | Mitigation |
|---|---|
| Attacker creates a Google account with a victim's email before the victim signs up | Google accounts are pre-verified; the attacker must control the email address at Google. This is a standard OAuth risk. |
| Victim signs up with email/password; attacker later creates Google account with same email and gains access | The backend checks for existing users on Google sign-in and does NOT create a new account. The attacker would need the victim's Google credentials. |
| User changes Google account email to match a different user | Google tokens are verified per-request; the backend trusts the email from the verified ID token. If Google allows email changes (they do not for gmail.com), this could be a vector. Mitigation: consider storing googleId (the sub claim) as a separate field in the future for multi-provider identity. |

For launch, the current email-based linking is acceptable. Post-launch, we should store providers[].providerId (e.g., google:123456789) for robust multi-provider identity.

Token storage for OAuth sessions: Same as email/password. Access token in memory, refresh token in httpOnly cookie.

Logout behavior for OAuth sessions: Logout invalidates the refresh token in user.refreshTokens[], clears the cookie, and deletes Redis session records. The Google session on Google's side is not terminated (we do not call Google's revoke endpoint). This is standard practice. The user must sign out of Google separately if desired.


Section 7: Admin Step-Up Authentication

Decision

High-risk admin actions require re-authentication. Upon successful re-authentication, the admin receives a short-lived elevated session. Payouts above $1,000 USD equivalent also require two-person approval.

Definition of high-risk admin actions

| Action | Step-Up Required | Two-Person Approval | Rationale |
|---|---|---|---|
| Payout/release escrow <= $1,000 USD | Yes (password) | No | Financial action; compromised session could release funds |
| Payout/release escrow > $1,000 USD | Yes (password) | Yes | High-value financial action; dual control per Threat T18 |
| Manual wallet signing (any amount) | Yes (password) | Yes (if > $1,000) | Direct access to escrow wallet |
| Refund escrow > $500 USD | Yes (password) | No | Irreversible financial action |
| User suspension or deletion | Yes (password) | No | Account impact; potential for abuse |
| Role change (any) | Yes (password) | No | Privilege escalation vector |
| Dispute override (admin resolves against recommendation) | Yes (password) | No | Financial side-effect; high dispute value |
| API key rotation (JWT_SECRET, webhook secrets) | Yes (password) | No | Invalidates all sessions or compromises integrity |
| Disable rate limiting or security features | Yes (password) | No | Reduces platform security posture |
| Export user data (bulk) | Yes (password) | No | Privacy-sensitive bulk operation |
| View escrow wallet private key (if applicable) | Yes (password) | Yes | Critical asset exposure |
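A subset of the rows above can be expressed as a lookup that returns the controls an action needs. This is a sketch only: the action identifiers other than payout.release are hypothetical, and treating refunds of $500 or less as exempt from step-up is an inference from the table listing only refunds above $500.

```typescript
// Hypothetical identifiers for a subset of the high-risk actions.
type HighRiskAction =
  | "payout.release"
  | "wallet.sign"
  | "escrow.refund"
  | "user.role-change"
  | "wallet.view-private-key";

interface ControlRequirement {
  stepUp: boolean;
  twoPerson: boolean;
}

const TWO_PERSON_THRESHOLD_USD = 1000;
const REFUND_STEP_UP_THRESHOLD_USD = 500;

function requirementFor(action: HighRiskAction, amountUsd: number = 0): ControlRequirement {
  switch (action) {
    case "payout.release":
    case "wallet.sign":
      // Step-up always; dual control only above the threshold.
      return { stepUp: true, twoPerson: amountUsd > TWO_PERSON_THRESHOLD_USD };
    case "escrow.refund":
      // Assumption: refunds at or below $500 need no step-up.
      return { stepUp: amountUsd > REFUND_STEP_UP_THRESHOLD_USD, twoPerson: false };
    case "user.role-change":
      return { stepUp: true, twoPerson: false };
    case "wallet.view-private-key":
      return { stepUp: true, twoPerson: true };
  }
}
```

Note the strict comparison: a payout of exactly $1,000 falls in the "<= $1,000" row and does not require a second admin.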

Step-up mechanism

Re-authentication with password. The admin must enter their password to obtain an elevated session. No additional 2FA at launch (passkeys are disabled; TOTP is not yet implemented). Post-launch, when WebAuthn is production-ready (D-4), the step-up will also accept passkey authentication as a second factor.

Elevated session:

| Attribute | Value |
|---|---|
| Duration | 5 minutes |
| Storage | Server-side only (Redis key stepup:{userId} with TTL 300s) |
| Scope | Grants elevated permissions for the specific action categories listed above |
| Renewal | Re-authentication required after 5 minutes; no automatic renewal |
| Verification | Middleware requireStepUp() checks Redis key existence before allowing the action |

Step-up flow

  1. Admin attempts a high-risk action (e.g., POST /api/admin/payouts/release).
  2. Middleware requireStepUp() checks for an active elevated session in Redis.
  3. If no elevated session exists, the backend returns 403 STEP_UP_REQUIRED with { challengeId: uuid }.
  4. Frontend displays a password prompt (modal dialog).
  5. Frontend sends POST /api/auth/step-up with { password, challengeId }.
  6. Backend verifies the password against user.password using bcrypt.
  7. On success, backend creates Redis key stepup:{userId} with TTL 300s and returns { elevated: true, expiresAt: timestamp }.
  8. Frontend retries the original high-risk action.
  9. The action proceeds.
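The gate in steps 2, 3, and 7 above can be sketched with an in-memory map standing in for Redis. The class and type names are hypothetical, and the time is passed in explicitly so expiry can be exercised without waiting. Password verification and the challengeId are omitted.

```typescript
// In-memory stand-in for Redis keys with a TTL (assumption: production uses
// Redis SET with expiry on the key stepup:{userId}).
class ElevatedSessionStore {
  private expiries = new Map<string, number>();

  // Called after a successful password check; returns the expiry timestamp.
  grant(userId: string, nowMs: number, ttlSeconds: number = 300): number {
    const expiresAt = nowMs + ttlSeconds * 1000;
    this.expiries.set(`stepup:${userId}`, expiresAt);
    return expiresAt;
  }

  isElevated(userId: string, nowMs: number): boolean {
    const expiresAt = this.expiries.get(`stepup:${userId}`);
    return expiresAt !== undefined && nowMs < expiresAt;
  }
}

type StepUpDecision =
  | { status: 200 }
  | { status: 403; code: "STEP_UP_REQUIRED" };

// Framework-agnostic core of the requireStepUp() middleware.
function requireStepUp(store: ElevatedSessionStore, userId: string, nowMs: number): StepUpDecision {
  return store.isElevated(userId, nowMs)
    ? { status: 200 }
    : { status: 403, code: "STEP_UP_REQUIRED" };
}
```

Because the key is scoped to the user rather than to a device, a step-up on one device elevates every session of that admin for the five-minute window.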

Traceability to Authorization Matrix

The controls in this section map to the following Authorization Matrix rows:

  • AUTH-R025 (POST /api/auth/step-up) for the step-up API entry point.
  • AUTH-R026 (GET /api/auth/sessions), AUTH-R027 (POST /api/auth/revoke-session), AUTH-R028 (POST /api/auth/revoke-all-sessions) for session controls.
  • APV-R001, APV-R002, APV-R003 for approval queue + confirm/reject workflow.

Status: these rows are marked Not implemented in the matrix while this ADR remains in planning/rollout state.

Two-person approval flow

For actions requiring two-person approval:

  1. Admin A completes the step-up flow above.
  2. Admin A initiates the action (e.g., POST /api/admin/payouts/release).
  3. The action is created in a PendingApproval state (stored in MongoDB).
  4. The system notifies all other admin users via Socket.IO and email.
  5. Admin B navigates to the pending approval, completes their own step-up flow, and confirms (POST /api/admin/approvals/{id}/confirm).
  6. The action executes.
  7. If Admin B rejects (POST /api/admin/approvals/{id}/reject), the action is cancelled.

Fallback when second admin is unavailable:

If no second admin has acted on a pending approval within 4 hours, the CTO (or designated fallback) receives an email and Slack notification. The CTO can approve directly. If no CTO action within 24 hours, the approval expires and must be re-initiated.

This fallback addresses the realistic scenario where Amanat has a small team with few admins. As the team grows, the 4-hour and 24-hour windows should be tightened.
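The approval lifecycle and its time windows can be sketched as pure functions over a pending-approval record. The names are hypothetical, and two points are assumptions rather than decisions of this ADR: the 24-hour expiry is measured from creation of the approval, and the initiating admin is never allowed to confirm their own action.

```typescript
type ApprovalState = "pending" | "confirmed" | "rejected" | "expired";

interface PendingApproval {
  initiatorId: string;
  createdAtMs: number;
  state: ApprovalState;
}

const ESCALATE_AFTER_MS = 4 * 60 * 60 * 1000; // notify the CTO or fallback
const EXPIRE_AFTER_MS = 24 * 60 * 60 * 1000; // assumption: measured from creation

// True when the fallback notification should be sent.
function needsEscalation(approval: PendingApproval, nowMs: number): boolean {
  return approval.state === "pending" && nowMs - approval.createdAtMs >= ESCALATE_AFTER_MS;
}

// Returns the approval in its new state; the caller executes the action
// only when the result is "confirmed".
function confirmApproval(approval: PendingApproval, approverId: string, nowMs: number): PendingApproval {
  if (approval.state !== "pending") {
    throw new Error("approval is not pending");
  }
  if (nowMs - approval.createdAtMs >= EXPIRE_AFTER_MS) {
    return { ...approval, state: "expired" };
  }
  if (approverId === approval.initiatorId) {
    throw new Error("initiator cannot confirm their own action");
  }
  return { ...approval, state: "confirmed" };
}
```

The approver's own step-up check would run before confirmApproval is called.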

Audit logging for step-up events

All step-up and two-person approval events are logged to an append-only audit collection:

| Field | Value |
|---|---|
| action | step-up.attempt, step-up.success, step-up.failed, approval.created, approval.confirmed, approval.rejected, approval.expired |
| actorId | ObjectId of the admin performing the action |
| targetAction | The high-risk action being performed (e.g., payout.release) |
| targetEntity | ObjectId or identifier of the entity (e.g., Payment ID) |
| ip | Request IP |
| userAgent | Request user-agent |
| timestamp | ISO 8601 |
| metadata | JSON object with action-specific details (e.g., payout amount) |

This collection is not writable by the application after insert (no updates, no deletes). Access is restricted to admin read-only and system write-only.


Section 8: Session Management and Device Tracking

Session tracking

Sessions are tracked via user.refreshTokens[] subdocuments (see Section 3 schema). Each entry represents one authenticated device.

Device fingerprinting

We will use lightweight, non-invasive device identification:

| Signal | Source | Storage | Notes |
|---|---|---|---|
| User-Agent | req.headers['user-agent'] | refreshTokens[].deviceInfo | Truncated to 200 characters |
| IP address | req.ip (behind Cloudflare: req.headers['x-forwarded-for']) | refreshTokens[].ipAddress | Used for geolocation approximation |
| Platform hint | Derived from user-agent parsing | Display only | Not stored separately |

We will NOT use browser fingerprinting (Canvas, WebGL, font enumeration), device IDs, or any tracking technique that requires user consent under privacy regulations. The user-agent and IP are already sent with every HTTP request.

Session listing

GET /api/auth/sessions (requires Bearer JWT)

Returns the list of active sessions for the current user:

{
  "sessions": [
    {
      "id": "sha256-hash-of-token",
      "device": "Chrome on macOS",
      "lastActive": "2026-05-24T14:30:00Z",
      "ip": "203.0.113.42",
      "location": "Tehran, Iran (approximate)",
      "isCurrent": true
    }
  ]
}

  • device is a parsed, human-readable string derived from the user-agent (e.g., "Chrome 125 on macOS", "Safari on iPhone").
  • location is derived from IP geolocation (city-level, approximate). We will use a local GeoIP database (MaxMind GeoLite2 or equivalent) to avoid sending user IPs to third-party services.
  • isCurrent identifies the session making the request (matched by the refresh cookie).

Session revocation

POST /api/auth/revoke-session (see Section 3).

Users can revoke any non-current session. Revoking the current session is equivalent to logout.

POST /api/auth/revoke-all-sessions (see Section 3).

Revokes all sessions except the current one. Useful if the user suspects compromise.

Maximum sessions per user

5 sessions. When a user attempts to create a 6th session (login from a new device), the oldest session (by createdAt) is automatically revoked. The user is notified via email: "A new sign-in was detected on [device] from [location]. If this was not you, please change your password immediately."

Rationale: 5 sessions accommodate typical usage (desktop, laptop, phone, tablet, one more) while preventing unbounded session accumulation.
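The eviction rule can be sketched as a function that adds a session and returns whichever entries must be revoked. The names are hypothetical, and the email notification and Redis cleanup are left to the caller.

```typescript
interface DeviceSession {
  tokenHash: string;
  createdAt: Date;
}

const MAX_SESSIONS = 5;

// Adds the incoming session and evicts the oldest entries (by createdAt)
// until at most MAX_SESSIONS remain.
function addSession(
  sessions: DeviceSession[],
  incoming: DeviceSession,
): { kept: DeviceSession[]; evicted: DeviceSession[] } {
  const ordered = [...sessions, incoming].sort(
    (a, b) => a.createdAt.getTime() - b.createdAt.getTime(),
  );
  const overflow = Math.max(0, ordered.length - MAX_SESSIONS);
  return {
    evicted: ordered.slice(0, overflow),
    kept: ordered.slice(overflow),
  };
}
```

Since rotation replaces a session's entry on every refresh, createdAt in this sketch reflects the latest token for that device, so "oldest" effectively means the device that has gone longest without refreshing.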

Password change behavior

When a user changes their password:

  1. All existing sessions are revoked (user.refreshTokens = []).
  2. A new session is created for the current device.
  3. All Redis session records for the user are deleted.
  4. Email notification: "Your password was changed. If you did not make this change, contact support immediately."

This is the current behavior documented in Authentication Flow and it is correct.

Account lock/suspension behavior

When an admin suspends or deletes a user account:

  1. user.status is set to suspended or deleted.
  2. user.refreshTokens is set to [].
  3. All Redis session records for the user are deleted.
  4. Any subsequent request carrying a token for that user returns 403 ACCOUNT_SUSPENDED or 403 ACCOUNT_DELETED (the authMiddleware already checks user.status).

Section 9: Migration Plan

Current state

| Component | Current | Target | Change Level |
|---|---|---|---|
| Access token storage | localStorage | In-memory variable | Frontend only |
| Access token lifetime | 7 days | 15 minutes | Backend config |
| Refresh token storage | localStorage | httpOnly cookie (backend set) | Full stack |
| Refresh token lifetime | 30 days | 7 days | Backend config |
| Refresh token schema | String[] | Subdocument array with metadata | Backend + DB migration |
| CSRF protection | Not needed (header-based) | Not needed (header-based + SameSite cookie) | None |
| Passkey status | Stubbed, accessible | Feature-flagged off in production | Backend + Frontend |
| Session revocation | Not implemented | Endpoints + device listing | Backend + Frontend |
| Admin step-up | Not implemented | Password re-auth + elevated session | Backend + Frontend |
| Two-person approval | Not implemented | Pending approval workflow | Backend + Frontend |

Migration steps (in order)

Step 1: Backend -- reduce access token lifetime to 15 minutes

  • Change JWT_EXPIRES_IN default from 7d to 15m.
  • Deploy. Existing 7-day tokens remain valid until they expire naturally (no force-invalidations).
  • Risk: users with long-lived sessions will notice more frequent refreshes. This is expected and acceptable.

Step 2: Backend -- refresh token lifetime to 7 days

  • Change REFRESH_TOKEN_EXPIRES_IN default from 30d to 7d.
  • Deploy. Existing 30-day refresh tokens remain valid until they expire or are rotated.

Step 3: Backend -- add refresh token metadata to refreshTokens[]

  • Deploy new schema: user.refreshTokens becomes a subdocument array with tokenHash, deviceInfo, ipAddress, createdAt, lastUsedAt.
  • Migration script: convert existing String[] entries to { tokenHash: sha256(entry), deviceInfo: 'Unknown (pre-migration)', ipAddress: 'unknown', createdAt: Date.now(), lastUsedAt: Date.now() }.
  • Login and refresh endpoints updated to write new schema.
  • Deploy. Old-format entries continue to work during migration.
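The conversion in the Step 3 migration script can be sketched as an idempotent mapping over one user's array. Hex encoding of the hash is an assumption, and the function name is hypothetical; running it against MongoDB is left out.

```typescript
import { createHash } from "node:crypto";

interface RefreshTokenSubdoc {
  tokenHash: string;
  deviceInfo: string;
  ipAddress: string;
  createdAt: Date;
  lastUsedAt: Date;
}

// During migration an array may hold raw strings, subdocuments, or both.
type LegacyOrMigrated = string | RefreshTokenSubdoc;

// Converts legacy raw-token strings into hashed subdocuments and leaves
// already-migrated entries untouched, so the script can be re-run safely.
function migrateRefreshTokens(
  entries: LegacyOrMigrated[],
  now: Date = new Date(),
): RefreshTokenSubdoc[] {
  return entries.map((entry) =>
    typeof entry === "string"
      ? {
          tokenHash: createHash("sha256").update(entry).digest("hex"),
          deviceInfo: "Unknown (pre-migration)",
          ipAddress: "unknown",
          createdAt: now,
          lastUsedAt: now,
        }
      : entry,
  );
}
```

After this conversion a legacy token still verifies, because the refresh endpoint hashes the presented token and compares it with tokenHash.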

Step 4: Backend -- set refresh token as httpOnly cookie

  • On login, refresh, and OAuth sign-in: set Set-Cookie header with the refresh token in an httpOnly cookie. Also return the refresh token in the response body for backward compatibility.
  • Add POST /api/auth/refresh-token-cookie endpoint that accepts the refresh token from the body and sets it as a cookie (migration helper for existing sessions).
  • Deploy. Frontend still works with body-based refresh tokens.

Step 5: Frontend -- move access token to in-memory storage

  • Replace localStorage.getItem('accessToken') and localStorage.setItem('accessToken', ...) with an in-memory store (module-level variable or React context).
  • On app load: call the refresh endpoint. The refresh cookie is httpOnly, so JavaScript cannot check for it directly. A successful response yields a new access token; a 401 response means there is no valid session, so redirect to login.
  • Remove localStorage writes for both tokens. On logout, clear the in-memory token and the cookie (by calling the logout endpoint which sets an expired cookie).
  • Deploy frontend.

Step 6: Frontend -- send refresh via cookie instead of body

  • Modify the Axios interceptor to NOT send refreshToken in the body of POST /api/auth/refresh-token. The refresh token is sent automatically via the cookie.
  • Backend: accept refresh token from either cookie or body (backward compatible). Deprecate body-based refresh with a log warning.
  • Deploy both.
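
The backward-compatible lookup on the backend could look like the sketch below. The request shape, the cookie name `refreshToken`, and the function `resolveRefreshToken` are assumptions for illustration; the cookie is preferred, and a body-based token is accepted with a deprecation warning.

```typescript
// Minimal request shape; a real handler would receive the framework's request object.
interface RefreshRequest {
  cookies?: Record<string, string | undefined>;
  body?: { refreshToken?: string };
}

interface ResolvedToken {
  token: string;
  source: "cookie" | "body";
}

// Hypothetical helper: prefers the cookie, falls back to the body with a warning.
function resolveRefreshToken(
  req: RefreshRequest,
  warn: (message: string) => void = console.warn,
): ResolvedToken | null {
  const fromCookie = req.cookies?.refreshToken;
  if (fromCookie) {
    return { token: fromCookie, source: "cookie" };
  }
  const fromBody = req.body?.refreshToken;
  if (fromBody) {
    warn("Deprecated: refresh token received in request body");
    return { token: fromBody, source: "body" };
  }
  return null;
}
```

Step 15 then amounts to deleting the body branch, and the volume of deprecation warnings in the logs indicates whether any client still depends on it.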

Step 7: Backend -- add session management endpoints

  • GET /api/auth/sessions -- list active sessions.
  • POST /api/auth/revoke-session -- revoke a specific session.
  • POST /api/auth/revoke-all-sessions -- revoke all other sessions.
  • Deploy. No frontend change yet (endpoints are available but unused).
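
The session operations behind these endpoints can be sketched over a plain array, together with the cap of 5 sessions per user decided earlier in this ADR. The function names are hypothetical, the session shape is reduced to two fields, and evicting by `lastUsedAt` (rather than `createdAt`) is an assumption about what "oldest" means.

```typescript
// Reduced session shape; the real subdocument also carries deviceInfo, ipAddress, createdAt.
interface Session {
  tokenHash: string;
  lastUsedAt: number;
}

const MAX_SESSIONS = 5;

// Adds a session and drops the least recently used ones beyond the cap.
function addSession(sessions: Session[], next: Session): Session[] {
  return [...sessions, next]
    .sort((a, b) => b.lastUsedAt - a.lastUsedAt)
    .slice(0, MAX_SESSIONS);
}

// Backs POST /api/auth/revoke-session.
function revokeSession(sessions: Session[], tokenHash: string): Session[] {
  return sessions.filter((s) => s.tokenHash !== tokenHash);
}

// Backs POST /api/auth/revoke-all-sessions: keeps only the caller's session.
function revokeAllOtherSessions(
  sessions: Session[],
  currentTokenHash: string,
): Session[] {
  return sessions.filter((s) => s.tokenHash === currentTokenHash);
}
```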

Step 8: Frontend -- add session management UI

  • Account settings page: "Active Sessions" section listing devices, locations, and last active times.
  • "Revoke" button per session. "Revoke all other sessions" button.
  • Deploy frontend.

Step 9: Backend -- feature flag for passkeys

  • Add ENABLE_PASSKEYS env var (default false).
  • Gate all /api/auth/passkey/* routes behind the flag.
  • Return 404 when disabled.
  • Deploy.
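
A gate of this kind might be written as middleware mounted in front of the passkey routes. The sketch below defines its own minimal response type instead of importing a framework, and `passkeyGate` and `isPasskeysEnabled` are hypothetical names; only the `ENABLE_PASSKEYS` variable, its default, and the 404 response come from this step.

```typescript
// Minimal response shape, compatible with Express-style chaining.
interface GateResponse {
  status(code: number): GateResponse;
  json(body: unknown): void;
}

// Anything other than the exact string "true" counts as disabled (default false).
function isPasskeysEnabled(env: Record<string, string | undefined>): boolean {
  return env.ENABLE_PASSKEYS === "true";
}

// Hypothetical middleware factory for all /api/auth/passkey/* routes.
function passkeyGate(env: Record<string, string | undefined>) {
  return (_req: unknown, res: GateResponse, next: () => void): void => {
    if (isPasskeysEnabled(env)) {
      next();
      return;
    }
    // 404 rather than 403, so a disabled route looks the same as a missing one.
    res.status(404).json({ error: "Not found" });
  };
}
```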

Step 10: Frontend -- feature flag for passkey UI

  • Add NEXT_PUBLIC_ENABLE_PASSKEYS env var (default false).
  • Hide passkey UI components when disabled.
  • Deploy frontend.

Step 11: Backend -- admin step-up authentication

  • Add POST /api/auth/step-up endpoint.
  • Add requireStepUp() middleware.
  • Apply middleware to high-risk admin routes.
  • Add Redis-based elevated session store.
  • Deploy.
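
The interaction between the elevated session store and `requireStepUp()` can be sketched as follows. An in-memory `Map` stands in for Redis here, and the `sessionId` field and the response body `{ code: "STEP_UP_REQUIRED" }` are assumptions; the 5-minute elevation window and the 403 status come from this ADR.

```typescript
const ELEVATION_TTL_MS = 5 * 60 * 1000; // elevated session lasts 5 minutes

// In-memory stand-in for the Redis-based store (Redis would use a key with a TTL).
class ElevatedSessionStore {
  private expiries = new Map<string, number>();

  // Called by POST /api/auth/step-up after the password is verified.
  elevate(sessionId: string, now: number = Date.now()): void {
    this.expiries.set(sessionId, now + ELEVATION_TTL_MS);
  }

  isElevated(sessionId: string, now: number = Date.now()): boolean {
    const expiresAt = this.expiries.get(sessionId);
    if (expiresAt === undefined) return false;
    if (now >= expiresAt) {
      this.expiries.delete(sessionId);
      return false;
    }
    return true;
  }
}

interface StepUpRequest {
  sessionId: string; // hypothetical field set by the auth middleware
}

interface StepUpResponse {
  status(code: number): StepUpResponse;
  json(body: unknown): void;
}

// Middleware factory: lets the request through only while the session is elevated.
function requireStepUp(store: ElevatedSessionStore, now: () => number = Date.now) {
  return (req: StepUpRequest, res: StepUpResponse, next: () => void): void => {
    if (store.isElevated(req.sessionId, now())) {
      next();
      return;
    }
    res.status(403).json({ code: "STEP_UP_REQUIRED" });
  };
}
```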

Step 12: Frontend -- admin step-up UI

  • Password prompt modal for step-up challenges.
  • Intercept 403 STEP_UP_REQUIRED responses and show modal.
  • Retry original request after successful step-up.
  • Deploy frontend.
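
The intercept-prompt-retry flow above can be sketched independently of Axios. In this sketch `send` replays the original request and `promptStepUp` shows the password modal and resolves to `true` once the step-up call succeeds; both are hypothetical, as is the `code` field on the response. The request is retried at most once to avoid a prompt loop.

```typescript
// Reduced response shape for illustration.
interface ApiResponse {
  status: number;
  code?: string;
  data?: unknown;
}

// Hypothetical wrapper around a single admin request.
async function withStepUp(
  send: () => Promise<ApiResponse>,
  promptStepUp: () => Promise<boolean>,
): Promise<ApiResponse> {
  const first = await send();
  const needsStepUp = first.status === 403 && first.code === "STEP_UP_REQUIRED";
  if (!needsStepUp) return first;

  // User cancelled or entered a wrong password: surface the original 403.
  const elevated = await promptStepUp();
  if (!elevated) return first;

  // Retry exactly once; a second 403 is returned to the caller as-is.
  return send();
}
```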

Step 13: Backend -- two-person approval

  • Add PendingApproval collection.
  • Add approval workflow endpoints.
  • Apply to payout/release actions above $1,000.
  • Add notification logic for other admins.
  • Deploy.
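
The core rules of the approval workflow can be sketched as two small functions: one that decides whether a payout needs a second admin, and one that records a decision while refusing self-approval. The `PendingApproval` fields, the function names, and the use of integer cents are assumptions; the $1,000 threshold and the requirement for a second admin come from this ADR.

```typescript
// $1,000 expressed in cents to avoid floating-point comparisons (an assumption).
const APPROVAL_THRESHOLD_CENTS = 100_000;

// Hypothetical shape of a PendingApproval document.
interface PendingApproval {
  id: string;
  requestedBy: string;
  amountCents: number;
  status: "pending" | "approved" | "rejected";
  decidedBy?: string;
}

// "Above $1,000": exactly $1,000 does not require a second admin.
function requiresSecondApproval(amountCents: number): boolean {
  return amountCents > APPROVAL_THRESHOLD_CENTS;
}

// Records a decision; the deciding admin must differ from the requester.
function decideApproval(
  approval: PendingApproval,
  adminId: string,
  decision: "approved" | "rejected",
): PendingApproval {
  if (approval.status !== "pending") {
    throw new Error("Approval has already been decided");
  }
  if (adminId === approval.requestedBy) {
    throw new Error("Requester cannot decide their own request");
  }
  return { ...approval, status: decision, decidedBy: adminId };
}
```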

Step 14: Frontend -- two-person approval UI

  • Pending approvals list in admin dashboard.
  • Confirm/reject actions with step-up.
  • Deploy frontend.

Step 15: Backend -- remove body-based refresh token acceptance

  • After all frontends are migrated (Step 6 + reasonable buffer of 2 weeks), stop accepting refresh tokens from the request body.
  • Accept refresh tokens only from the cookie.
  • Deploy.

Feature flags

| Flag | Default | Environments | Purpose |
|---|---|---|---|
| ENABLE_PASSKEYS | false | All | Controls passkey route registration |
| NEXT_PUBLIC_ENABLE_PASSKEYS | false | All | Controls passkey UI visibility |
| COOKIE_REFRESH_MIGRATION | false | All | Enables cookie-based refresh token issuance |
| REQUIRE_STEP_UP | false | Staging, Production | Enables step-up auth for admin actions |

Rollback plan

If any migration step causes issues:

  1. Steps 1-2 (token lifetimes): Revert JWT_EXPIRES_IN and REFRESH_TOKEN_EXPIRES_IN to previous values. Redeploy. No data migration to undo.
  2. Steps 3-4 (refresh token schema + cookies): Backend continues to accept body-based refresh tokens. Frontend can revert to storing tokens in localStorage. The httpOnly cookie is additive; removing it does not break existing sessions.
  3. Step 5 (in-memory access token): Frontend can revert to localStorage. The backend does not care where the access token comes from.
  4. Steps 7-8 (session management): These are additive endpoints and UI. Rolling back means removing the UI and endpoints. No data is affected.
  5. Steps 9-10 (passkey feature flag): Set flags to true to restore passkey access (though passkeys remain stubbed and insecure). Rolling back is simply changing env vars.
  6. Steps 11-14 (step-up and two-person approval): Remove requireStepUp() middleware. Admin actions proceed without step-up. This is a security regression but not a functional outage.

Timeline estimate

| Phase | Steps | Duration | Dependencies |
|---|---|---|---|
| Token hardening | 1-2 | 1 day | None |
| Cookie migration | 3-6 | 3-5 days | Frontend + backend coordination |
| Session management | 7-8 | 2-3 days | Cookie migration complete |
| Passkey feature flag | 9-10 | 1 day | None |
| Admin step-up | 11-12 | 3-4 days | None |
| Two-person approval | 13-14 | 3-5 days | Admin step-up complete |
| Cleanup (step 15) | 15 | 1 day (after 2-week buffer) | All frontends migrated |
| Total | | 14-20 days (excluding the 2-week buffer) | |

Section 10: Threat Mitigation Traceability

| Decision | Threats Addressed | Risk Reduction |
|---|---|---|
| Access token in memory (not localStorage) | T13 (XSS token theft) | XSS cannot persistently steal the token; it is lost on page unload |
| Access token lifetime reduced to 15 min | T04 (stolen token reuse) | A stolen token is valid for 15 min instead of 7 days (672x reduction in exposure window) |
| Refresh token in httpOnly cookie | T04, T13 | XSS cannot read the refresh token; it is not accessible to JavaScript |
| Refresh token lifetime reduced to 7 days | T04 | Maximum exploitation window from a compromised refresh token is 7 days instead of 30 days |
| Refresh token rotation with reuse detection | T04 | Reuse of a rotated token triggers full session invalidation; attacker and legitimate user are forced to re-authenticate |
| SameSite=Strict on refresh cookie | T15 (CSRF) | Cookie is not sent on cross-site requests; CSRF on the refresh endpoint is eliminated |
| Refresh cookie scoped to /api/auth/refresh-token path | T15 | Cookie is sent only to the refresh endpoint, never to other state-changing endpoints |
| Passkey feature flag disabled in production | T10 (passkey bypass) | Stubbed passkey implementation is unreachable in production; cannot be exploited |
| Session revocation endpoints | T04 | Users can terminate compromised sessions immediately; admins can revoke sessions for suspended users |
| Max 5 sessions per user | T04 | Limits blast radius of session accumulation; oldest sessions auto-revoked |
| Admin step-up authentication | T09 (admin privilege escalation), T18 (insider fund manipulation) | Compromised admin session cannot perform high-risk actions without re-authenticating; elevated session lasts only 5 minutes |
| Two-person approval for large payouts | T05 (double payout), T18 | No single admin can release high-value escrow; second admin must independently verify and approve |
| Audit logging for step-up and approval events | T09, T18 | All elevated-access events are recorded in tamper-evident audit trail |
| Password change revokes all sessions | T04 | If user detects compromise, password change immediately terminates all attacker sessions |
| Account suspension revokes all sessions | T09 | Compromised admin accounts are immediately locked out when suspended |
| Device/session listing | T04 | Users can detect unfamiliar sessions and revoke them; early detection of compromise |
| Email notification on new device login | T04 | User is alerted to unauthorized access within minutes |
| Verification code removal from production logs | T22 (verification code leakage) | Codes are no longer logged in production; only non-production environments may log them for debugging |
| Access token jti claim | T04 | Each token has a unique identifier; enables future deny-listing of individual tokens |
| OAuth token storage same as email/password | T04, T13 | OAuth sessions receive the same protections (in-memory access, httpOnly refresh) |

Coverage analysis

| Threat | Mitigated by this ADR? | Residual risk |
|---|---|---|
| T04 (stolen token reuse) | Yes -- 15-min access token, httpOnly refresh cookie, rotation, session revocation | Physical access to an unlocked device with an active session; keylogger capturing password during step-up |
| T10 (passkey bypass) | Yes -- feature flag disabled in production | None (passkeys are unreachable) |
| T13 (XSS token theft) | Yes -- in-memory access token, httpOnly refresh cookie | Transient XSS can read in-memory token for up to 15 minutes; XSS cannot access refresh token |
| T15 (CSRF) | Yes -- access token in Authorization header (unchanged), SameSite=Strict on refresh cookie | None |
| T22 (verification code leakage) | Partially -- ADR documents the requirement; implementation is a separate task | Codes still logged until code change is deployed |

Threats NOT addressed by this ADR (addressed elsewhere)

| Threat | Document |
|---|---|
| T01 (fake payment proof) | Funds Ledger and Escrow State Machine Specification, Payment Provider Adapter Spec |
| T02 (webhook replay) | Webhook Security Spec |
| T03 (arbitrary socket room join) | Realtime Authorization Spec |
| T05 (double payout) | Funds Ledger and Escrow State Machine Specification |
| T06 (dispute bypass) | Funds Ledger and Escrow State Machine Specification |
| T07 (email abuse) | Rate limiting implementation |
| T08 (AI cost abuse) | Rate limiting + auth implementation |
| T09 (admin privilege escalation) | Authorization Matrix - REST and Socket.IO + step-up auth (this ADR) |
| T11 (unauthenticated payment endpoints) | Auth middleware implementation |
| T12 (rate limit bypass) | Rate limiting implementation |
| T14 (supply-chain) | Secure Build and Supply-Chain Policy |
| T16 (deep-link tampering) | Telegram initData verification |
| T17 (provider outage) | Backend Funds Migration and Operational Runbooks |
| T18 (insider manipulation) | Multi-sig wallet + funds ledger + two-person approval (this ADR) |
| T19 (price manipulation) | Offer status enforcement |
| T20 (delivery brute force) | Rate limiting + code entropy |
| T21 (data exfiltration) | Auth middleware implementation |
| T23 (state machine inconsistency) | Canonical state machine specification |

Cross-references


This document was created on 2026-05-24 as part of Taskmaster task 4 (authentication and session architecture) for the Amanat escrow platform. It must be reviewed by the Backend Lead, Frontend Lead, and CTO before implementation begins. Changes to any decision in this document require sign-off per the RACI in Security Ownership and Launch Decision Criteria Section 1.