---
title: Session and Authentication Architecture Decision
tags: [audit, security, adr, authentication, session, passkey, webauthn, admin, step-up]
created: 2026-05-24
status: decided
reviewers: [backend, security, frontend, cto]
---

# Session and Authentication Architecture Decision

**Architecture Decision Record.** This document resolves deferred decisions D-1, D-4, D-5, and D-6 from [[Security Ownership and Launch Decision Criteria]] and addresses threats T04, T10, T13, T15, and T22 from [[Threat Model - Amanat Escrow Platform]].

All decisions in this document are binding for the Amanat platform. Changes require sign-off from the accountable role per the RACI in [[Security Ownership and Launch Decision Criteria]] Section 1.

---

## Section 1: Decision Summary

| Decision | Chosen Option | Rejected Alternatives | Status |
|---|---|---|---|
| Access token storage | In-memory only (JavaScript variable), not persisted to any browser storage | localStorage (current), httpOnly cookie | Decided |
| Refresh token storage | httpOnly, Secure, SameSite=Strict cookie | localStorage (current), response body | Decided |
| Access token lifetime | 15 minutes | 7 days (current), 60 minutes, 30 minutes | Decided |
| Refresh token lifetime | 7 days with rotation on every use | 30 days (current), 14 days, 1 day | Decided |
| CSRF strategy | SameSite=Strict on refresh cookie; no CSRF token needed because access token is never in a cookie | Double-submit cookie, synchronizer token pattern | Decided |
| Passkey/WebAuthn | Option C: feature-flagged off in production; stubs remain for development | Remove entirely, implement production WebAuthn pre-launch | Decided |
| OAuth requirements | Authorization code with PKCE; same session model as email/password; implicit account linking by email | Implicit flow, separate session model for OAuth | Decided |
| Device/session revocation | user.refreshTokens[] with device metadata, revocation endpoints, max 5 sessions per user | No revocation (current), Redis-only session store | Decided |
| Admin step-up auth | Required for high-risk actions; 5-minute elevated session via re-authentication (password) | No step-up (current), hardware 2FA mandatory, passkey-only | Decided |
| Two-person approval | Required for payouts above $1,000 USD equivalent; second admin must confirm | Single-admin approval for all amounts, multi-sig wallet only | Decided |

---

## Section 2: Access Token Storage and Lifetime

### Decision

**We will store the access token in JavaScript memory only (a module-level variable or React context), not in localStorage, sessionStorage, IndexedDB, or any cookie.** The access token is never written to any persistent browser storage.

### Rejected alternatives

| Alternative | Rejection reason |
|---|---|
| **localStorage** (current state) | Fully accessible to any XSS payload. Threat T13: any script injection in any dependency, user-generated content field, or uploaded file exfiltrates the token. Threat T04: 7-day lifetime means a stolen token grants access for up to a week without triggering refresh rotation. |
| **httpOnly cookie** | Eliminates XSS theft but introduces CSRF risk on every API call, because the browser attaches cookies automatically (including on cross-site requests whenever the `SameSite` policy permits it). Requires CSRF token infrastructure for all state-changing endpoints and adds CORS complexity for credentialed requests. The access token must be sent on every API call, so the CSRF surface is every endpoint. |

### Rationale

In-memory storage is immune to XSS-based persistent theft. An attacker who achieves XSS can read the token during the lifetime of the page session, but the token evaporates on tab close, navigation away, or page refresh. Combined with a 15-minute expiry, the maximum exploitation window from any single XSS event is 15 minutes, not 7 days.

The access token is only needed for the duration of a browser session. The Axios interceptor holds the token in a closure variable. On page refresh, the refresh-token flow re-establishes the access token transparently.
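A minimal sketch of such a closure-based holder follows. The names `createAccessTokenStore` and `AccessTokenStore` are hypothetical and not taken from the codebase; the point is only that the token never leaves a JavaScript variable.

```typescript
// Hypothetical in-memory holder for the access token.
// Nothing here touches localStorage, sessionStorage, IndexedDB, or cookies.
interface AccessTokenStore {
  get(): string | null;
  set(token: string): void;
  clear(): void;
}

function createAccessTokenStore(): AccessTokenStore {
  // The token lives only in this closure; it is lost on page refresh or tab close.
  let accessToken: string | null = null;
  return {
    get: () => accessToken,
    set: (token) => { accessToken = token; },
    clear: () => { accessToken = null; },
  };
}

// Usage: a request interceptor would read the token on every outgoing call.
const tokenStore = createAccessTokenStore();
tokenStore.set("example.jwt.value");
const authHeader = tokenStore.get() ? `Bearer ${tokenStore.get()}` : undefined;
```

On logout the interceptor would call `clear()`, after which no `Authorization` header is produced until the next successful refresh.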

### Token lifetime

**15 minutes** from issuance.

Rationale: Threat T04 identifies the 7-day access token lifetime as a critical risk. A stolen access token is usable for its entire lifetime without triggering refresh-token rotation. Reducing to 15 minutes limits the damage window to 15 minutes maximum. This is short enough that an attacker who exfiltrates the token via a transient XSS event (e.g., a reflected XSS that closes the page) loses access almost immediately, while long enough to avoid excessive refresh calls on a normal browsing session.

### Token format

JWT with HS256 signing (same algorithm as current). Claims:

| Claim | Value | Purpose |
|---|---|---|
| `sub` | User `_id` (ObjectId string) | Subject identification |
| `role` | `buyer`, `seller`, `admin`, `support` | Authorization enforcement |
| `iat` | Unix timestamp | Issued-at |
| `exp` | `iat + 900` (15 minutes) | Expiry enforcement |
| `jti` | UUID v4 | Token-specific identifier for audit trail and future revocation |
| `iss` | `marketplace-backend` | Issuer verification |
| `aud` | `marketplace-users` | Audience verification |

No change to the signing algorithm or secret management. The `jti` claim is new and required for audit logging and future deny-list capabilities.
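The claim set above can be sketched as a typed payload builder. This is an illustration only: `buildAccessTokenClaims` is a hypothetical helper, signing with HS256 is left to whichever JWT library the backend already uses, and the `jti` is passed in so the caller decides how the UUID v4 is generated.

```typescript
type Role = "buyer" | "seller" | "admin" | "support";

interface AccessTokenClaims {
  sub: string;
  role: Role;
  iat: number;
  exp: number;
  jti: string;
  iss: "marketplace-backend";
  aud: "marketplace-users";
}

const ACCESS_TOKEN_TTL_SECONDS = 900; // 15 minutes

// Builds the payload only; it does not sign or encode the token.
function buildAccessTokenClaims(
  userId: string,
  role: Role,
  jti: string,
  nowMs: number,
): AccessTokenClaims {
  const iat = Math.floor(nowMs / 1000); // JWT timestamps are in seconds
  return {
    sub: userId,
    role,
    iat,
    exp: iat + ACCESS_TOKEN_TTL_SECONDS,
    jti,
    iss: "marketplace-backend",
    aud: "marketplace-users",
  };
}

// Example values are made up for illustration.
const exampleClaims = buildAccessTokenClaims(
  "665f1c2e9b1d4a0012345678",
  "buyer",
  "3f2b8c1e-7d4a-4c9b-9e2f-1a2b3c4d5e6f",
  1_780_000_000_000,
);
```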

### Renewal strategy

**Silent refresh with rotation.** The Axios response interceptor detects `401 TOKEN_INVALID` responses, and `403` responses only when they signal an invalid or expired token; `403 STEP_UP_REQUIRED` (Section 7) and the account-status `403` codes (Section 8) must not trigger a refresh. It attempts a refresh using the httpOnly cookie (sent automatically by the browser). On success, the backend returns a new access token in the response body and sets a new refresh cookie. The interceptor updates the in-memory access token and retries the original request.

Concurrent requests during refresh: implement a refresh mutex (single inflight refresh promise). If multiple requests fail with 401 simultaneously, the first triggers the refresh; subsequent requests await the same promise and retry with the new token.
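The refresh mutex can be sketched as a single-flight wrapper around whatever function performs the refresh call. `createSingleFlightRefresh` is a hypothetical name, and the wrapped function here is a stub standing in for the real request to the refresh endpoint.

```typescript
// Hypothetical single-flight wrapper: concurrent callers share one refresh.
function createSingleFlightRefresh(
  doRefresh: () => Promise<string>,
): () => Promise<string> {
  let inflight: Promise<string> | null = null;
  return () => {
    if (inflight === null) {
      inflight = doRefresh().finally(() => {
        // Clear the slot so the next token expiry triggers a fresh refresh.
        inflight = null;
      });
    }
    return inflight;
  };
}

// Usage: two requests fail with 401 at the same time; doRefresh runs once.
let refreshCalls = 0;
const refreshAccessToken = createSingleFlightRefresh(async () => {
  refreshCalls += 1;
  return "new-access-token"; // stub for the token returned by the backend
});
const first = refreshAccessToken();
const second = refreshAccessToken();
```

Both callers receive the same promise, so each retries its original request with the same new token once it resolves. If the refresh fails, every waiter sees the same rejection and can redirect to login.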

**Absolute expiry:** The refresh token has a maximum lifetime of 7 days (see Section 3). After 7 days, the user must re-authenticate. There is no sliding window that extends beyond 7 days.

### Token theft detection and response

1. **Refresh-token reuse detection** (already exists, remains in place): If a previously-used refresh token is presented, the backend invalidates all sessions for that user and forces re-authentication. The user receives an email notification: "Your session was terminated due to suspicious activity. Please sign in again."

2. **Access token theft**: Because the token is in-memory only, persistent theft requires a sustained XSS presence. The 15-minute window limits exposure. No server-side detection is possible for access-token theft alone (the token is valid until expiry). The primary defense is the short lifetime.

3. **Audit logging**: Every token issuance and refresh is logged with `jti`, `userId`, `ip`, `userAgent`, and timestamp. Abnormal patterns (e.g., refresh from a new IP geolocation far from the previous one) trigger an alert for investigation.

---

## Section 3: Refresh Token Storage, Rotation, and Revocation

### Storage location

**httpOnly, Secure, SameSite=Strict cookie** named `__Secure-refresh-token`.

Cookie attributes:

| Attribute | Value | Rationale |
|---|---|---|
| `httpOnly` | `true` | Not accessible to JavaScript; immune to XSS theft |
| `secure` | `true` | Transmitted only over HTTPS |
| `sameSite` | `Strict` | Not sent on cross-site requests; CSRF protection |
| `path` | `/api/auth/refresh-token` | Cookie sent only on the refresh endpoint, not on every API call |
| `domain` | (omitted; host-only) | Scoped to the exact host |
| `maxAge` | `604800000` (7 days in ms; `Max-Age=604800` seconds in the header) | Matches refresh token lifetime |

The `__Secure-` prefix makes browsers reject the cookie unless it is set with `Secure` over HTTPS. The stricter `__Host-` prefix cannot be used with this design: browsers only accept `__Host-` cookies that have `Path=/` and no `Domain`, which conflicts with the path scoping above. Omitting `Domain` still keeps the cookie host-only.
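As an illustration of how these attributes end up on the wire, the sketch below serializes a `Set-Cookie` header value by hand. `serializeRefreshCookie` is a hypothetical helper. The cookie name in the sketch uses the `__Secure-` prefix as an assumption, because browsers reject `__Host-` cookies whose `Path` is anything other than `/`.

```typescript
// Hypothetical serializer for the refresh cookie's Set-Cookie header value.
// Assumption: `__Secure-` prefix, since `__Host-` requires Path=/ and would
// rule out scoping the cookie to the refresh endpoint.
const REFRESH_COOKIE_NAME = "__Secure-refresh-token";
const REFRESH_COOKIE_PATH = "/api/auth/refresh-token";
const SEVEN_DAYS_SECONDS = 7 * 24 * 60 * 60; // 604800

function serializeRefreshCookie(token: string): string {
  return [
    `${REFRESH_COOKIE_NAME}=${encodeURIComponent(token)}`,
    `Max-Age=${SEVEN_DAYS_SECONDS}`, // header value is in seconds, not ms
    `Path=${REFRESH_COOKIE_PATH}`,
    "HttpOnly",
    "Secure",
    "SameSite=Strict",
    // No Domain attribute: the cookie stays host-only.
  ].join("; ");
}

const setCookieHeader = serializeRefreshCookie("abc.def.ghi");
```

In practice the framework's cookie helper would produce this header; the sketch only makes the attribute set explicit.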

### Refresh token lifetime

**7 days** with rotation on every use.

Rationale: The current 30-day lifetime is excessive for a financial platform. Seven days provides a reasonable balance between user convenience (users are not forced to re-authenticate daily) and security (a compromised refresh token is only exploitable for 7 days maximum, and rotation reduces the window further). The deferred decision D-1 in [[Security Ownership and Launch Decision Criteria]] accepts that the migration from localStorage to httpOnly cookies must happen within 30 days post-launch. This ADR specifies the target state.

### Rotation strategy

On every refresh:

1. Backend receives the refresh token from the cookie.
2. Verifies the JWT signature and expiry.
3. Looks up the user and checks the token hash is present in `user.refreshTokens[]`.
4. Removes the old token from the array.
5. Issues a new access token (in response body) and a new refresh token (in `Set-Cookie` header).
6. Pushes the new refresh token hash to `user.refreshTokens[]`.

### Token reuse detection

If a previously-consumed refresh token is presented (i.e., the token was already rotated away and is no longer in `user.refreshTokens[]`):

1. Invalidate ALL sessions for that user: set `user.refreshTokens = []`.
2. Revoke all Redis session records for that user.
3. Send an email to the user: "A potential security issue was detected. All your sessions have been terminated. Please sign in again and change your password if you did not initiate this activity."
4. Log the event with both token hashes, the IP addresses of both the legitimate and reuse attempts, and timestamps.

This is the existing behavior documented in [[Authentication Flow]] and [[Security Architecture]] section 2.4. It remains unchanged.
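The rotation and reuse-detection rules can be sketched together as one function over the allow-list. This is a simplified model, not the existing implementation: it assumes the JWT signature and expiry were already verified, that tokens are compared by hash, and it leaves out the Redis revocation, email, and logging steps. All names are hypothetical.

```typescript
interface StoredRefreshToken { tokenHash: string; }
interface UserSessions { refreshTokens: StoredRefreshToken[]; }

type RotationResult =
  | { outcome: "rotated" }
  | { outcome: "reuse-detected" };

// Assumes the caller has already verified signature and expiry, and hashes
// both the presented and the newly issued token before calling.
function rotateRefreshToken(
  user: UserSessions,
  presentedHash: string,
  newHash: string,
): RotationResult {
  const index = user.refreshTokens.findIndex((t) => t.tokenHash === presentedHash);
  if (index === -1) {
    // A validly signed token that is no longer on the allow-list was already
    // rotated away: treat it as theft and end every session for this user.
    user.refreshTokens = [];
    return { outcome: "reuse-detected" };
  }
  user.refreshTokens.splice(index, 1);             // remove the old token
  user.refreshTokens.push({ tokenHash: newHash }); // store the new one
  return { outcome: "rotated" };
}

// Usage: the second presentation of "h1" is a replay.
const demoUser: UserSessions = {
  refreshTokens: [{ tokenHash: "h1" }, { tokenHash: "other-device" }],
};
const firstUse = rotateRefreshToken(demoUser, "h1", "h2");
const replay = rotateRefreshToken(demoUser, "h1", "h3");
```

After the replay, the session on the other device is gone as well, which is the intended blast radius: the backend cannot tell which party holds the legitimate token.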

### MongoDB storage schema

The `user.refreshTokens[]` array currently stores raw token strings. We will migrate to a subdocument array with metadata:

```typescript
refreshTokens: [{
  tokenHash: String,       // SHA-256 hash of the refresh token (not the raw token)
  deviceInfo: String,      // User-Agent string (truncated to 200 chars)
  ipAddress: String,       // IP at time of token issuance
  createdAt: Date,         // When the token was issued
  lastUsedAt: Date,        // Updated on each refresh
}]
```

The raw refresh token is never stored in MongoDB. Only its SHA-256 hash is stored. Verification compares `sha256(receivedToken) === stored.tokenHash`.
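A short sketch of the hash-and-compare step, using Node's built-in `crypto` module. The helper names are hypothetical, and the constant-time comparison is an extra precaution assumed here rather than something this ADR mandates.

```typescript
import { createHash, timingSafeEqual } from "node:crypto";

// Hex-encoded SHA-256 of the raw refresh token; only this value is stored.
function hashRefreshToken(rawToken: string): string {
  return createHash("sha256").update(rawToken, "utf8").digest("hex");
}

// Compares the hash of the received token with the stored hash.
function matchesStoredHash(rawToken: string, storedTokenHash: string): boolean {
  const received = Buffer.from(hashRefreshToken(rawToken), "hex");
  const stored = Buffer.from(storedTokenHash, "hex");
  // timingSafeEqual throws on unequal lengths, so check the length first.
  return received.length === stored.length && timingSafeEqual(received, stored);
}

const exampleStoredHash = hashRefreshToken("raw-refresh-token");
const accepted = matchesStoredHash("raw-refresh-token", exampleStoredHash);
const rejected = matchesStoredHash("tampered-token", exampleStoredHash);
```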

Migration: existing plain-string entries in `user.refreshTokens[]` are invalidated on the first refresh after deployment. Users with existing sessions are force-re-authenticated once.

### Revocation endpoints

**`POST /api/auth/revoke-session`**

Requires Bearer JWT. Body: `{ sessionTokenHash: string }` (the hash of the session to revoke, obtained from the session listing endpoint). The backend removes the matching entry from `user.refreshTokens[]` and deletes any Redis session keyed by tokens associated with that refresh token.

**`POST /api/auth/revoke-all-sessions`**

Requires Bearer JWT. Removes all entries from `user.refreshTokens[]` except the one used by the current request (identified by the refresh cookie). Deletes all Redis sessions for that user. The user remains logged in on the current device.

Both endpoints are audited: action, actor, target session hash, timestamp, IP, user-agent.
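The effect of the two endpoints on `user.refreshTokens[]` can be sketched as two pure functions. The names are hypothetical, and the Redis cleanup and audit logging are omitted.

```typescript
interface SessionEntry {
  tokenHash: string;
  deviceInfo: string;
}

// revoke-session: drop the entry whose hash matches the request body.
function revokeSession(sessions: SessionEntry[], sessionTokenHash: string): SessionEntry[] {
  return sessions.filter((s) => s.tokenHash !== sessionTokenHash);
}

// revoke-all-sessions: keep only the entry behind the current request.
function revokeAllOtherSessions(sessions: SessionEntry[], currentTokenHash: string): SessionEntry[] {
  return sessions.filter((s) => s.tokenHash === currentTokenHash);
}

const activeSessions: SessionEntry[] = [
  { tokenHash: "aaa", deviceInfo: "Chrome on macOS" },
  { tokenHash: "bbb", deviceInfo: "Safari on iPhone" },
  { tokenHash: "ccc", deviceInfo: "Firefox on Linux" },
];
const afterSingleRevoke = revokeSession(activeSessions, "bbb");
const afterRevokeAll = revokeAllOtherSessions(activeSessions, "aaa");
```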

---

## Section 4: CSRF Strategy

### Current state

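JWT is sent in the `Authorization: Bearer` header. Browsers never attach an `Authorization: Bearer` header automatically; only script that already holds the token can set it, so a forged cross-site request arrives without it. CSRF is currently mitigated by design (Threat T15: "Mitigated").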

### Decision: no CSRF token needed for access tokens

Because the access token is stored in JavaScript memory and sent via the `Authorization` header, CSRF is not a concern for the majority of API calls. The browser will not include the `Authorization` header on a forged cross-origin request.

### CSRF protection for the refresh endpoint

The refresh token is stored in an httpOnly cookie. However, the cookie attributes provide CSRF protection:

- **`SameSite=Strict`**: The cookie is not sent on any cross-site request, including top-level navigations from external sites. This eliminates CSRF on the refresh endpoint from other sites.
- **`Path=/api/auth/refresh-token`**: The cookie is only sent on requests to the refresh endpoint, not on any other API call.

Combined, these attributes mean an attacker cannot trigger a refresh from a page on another site. `SameSite` is evaluated per site (registrable domain), not per origin, so this protection assumes every subdomain of the production domain is trusted. The `SameSite=Strict` policy is appropriate here because the refresh endpoint is never called from an external context (no OAuth callback, no payment provider redirect targets the refresh endpoint).

### If the architecture migrates access tokens to cookies later

If a future decision moves the access token to a cookie (which we explicitly reject in this ADR), CSRF tokens become mandatory. The recommended approach would be:

- Double-submit cookie pattern: set a CSRF token in a non-httpOnly cookie; the frontend reads it and includes it in a custom header (`X-CSRF-Token`). The backend verifies the header matches the cookie.
- Apply to all state-changing endpoints (POST, PUT, PATCH, DELETE).

This fallback is documented but not implemented. No action is needed unless the access token storage decision is revisited.

### Web3 wallet interactions

Web3 wallet connections (wagmi, WalletConnect) open popup windows or browser extensions for signing. These interactions do not involve the platform's cookies or tokens. The signed transaction or message is returned to the platform's JavaScript context and sent to the backend via the normal Axios interceptor with the in-memory access token. CSRF is not a concern for Web3 interactions.

---

## Section 5: Passkey/WebAuthn Decision

### Decision: Option C -- feature-flagged off in production

We will keep the current stubbed passkey implementation in the codebase, gate it behind a feature flag (`ENABLE_PASSKEYS`), and set this flag to `false` in production environments.

### Rejected alternatives

| Alternative | Rejection reason |
|---|---|
| **Option A: Remove passkeys entirely** | The frontend UI and backend routes already exist. Removing them means rebuilding the registration and sign-in UI later. The stubbed code does not pose a security risk when disabled via feature flag. |
| **Option B: Implement production WebAuthn pre-launch** | Per [[Platform Logical Audit - 2026-05-24]] Finding 2, the current implementation has three critical flaws (stubbed attestation, in-memory challenges, missing refresh-token persistence). Fixing all three, testing across platforms, and auditing the result would take 2-3 weeks of focused engineering. This is not justifiable before launch when password + OAuth authentication is sufficient. |

### Rationale

The launch gate in [[Security Ownership and Launch Decision Criteria]] Section 2.1.7 requires: "Passkey/WebAuthn disabled in production until real cryptographic implementation is complete." A feature flag is the cleanest way to comply: the code exists but cannot be reached in production. The deferred decision D-4 sets a deadline of 90 days post-launch for real WebAuthn.

### Feature flag implementation

- Backend env var: `ENABLE_PASSKEYS=false` in production, `true` in development.
- The passkey routes (`/api/auth/passkey/*`) return `404 Not Found` when the flag is `false`. The route registration itself is conditional.
- Frontend: the Passkey UI components are hidden when `NEXT_PUBLIC_ENABLE_PASSKEYS` is not `"true"`.
- Both flags are `false` by default; must be explicitly enabled.
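A minimal sketch of the default-off flag and the conditional route registration. The helper names and the `/api/auth/login` path are illustrative assumptions; the environment is passed in as a plain record so the logic can be exercised without a running server.

```typescript
// Hypothetical helper: the flag is on only for the exact string "true",
// so an unset or misspelled value leaves passkeys disabled.
function isPasskeysEnabled(env: Record<string, string | undefined>): boolean {
  return env["ENABLE_PASSKEYS"] === "true";
}

// Conditional registration: when the flag is off the passkey routes are never
// mounted, so requests fall through to the router's normal 404.
function registeredAuthRoutes(env: Record<string, string | undefined>): string[] {
  const routes = ["/api/auth/login", "/api/auth/refresh-token"];
  if (isPasskeysEnabled(env)) {
    routes.push("/api/auth/passkey/*");
  }
  return routes;
}

const productionRoutes = registeredAuthRoutes({ ENABLE_PASSKEYS: "false" });
const unsetRoutes = registeredAuthRoutes({});
const developmentRoutes = registeredAuthRoutes({ ENABLE_PASSKEYS: "true" });
```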

### Target WebAuthn implementation (for D-4 resolution)

When the team implements production WebAuthn within 90 days post-launch, the following specifications apply:

**Library:** `@simplewebauthn/server` (server-side) and `@simplewebauthn/browser` (client-side).

**Relying Party configuration:**

| Parameter | Value |
|---|---|
| RP ID | Production eTLD+1 domain (e.g., `amn.gg`), NOT `localhost` |
| RP Name | `Amanat` |
| Origins | `https://amn.gg`, `https://www.amn.gg` |
| Timeout | 60 seconds |

**Challenge storage:** Redis-backed with 5-minute TTL. Key: `webauthn:challenge:{challengeHash}`. Value: `{ userId, type: 'registration' | 'authentication', createdAt }`. The in-process `Map` is removed.
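The intended semantics (keyed by challenge hash, 5-minute TTL, single use) can be sketched as follows. The `Map` here only stands in for Redis so the sketch runs without a server; it is not a proposal to keep in-process storage. In production the two methods would map to a Redis write with expiry and an atomic read-and-delete. Class and method names are hypothetical.

```typescript
type ChallengeType = "registration" | "authentication";

interface ChallengeRecord {
  userId: string;
  type: ChallengeType;
  createdAt: number; // epoch ms
}

const CHALLENGE_TTL_MS = 5 * 60 * 1000;

// In-memory stand-in for the Redis-backed store (illustration only).
class ChallengeStore {
  private readonly entries = new Map<string, ChallengeRecord>();

  put(challengeHash: string, userId: string, type: ChallengeType, nowMs: number): void {
    this.entries.set(`webauthn:challenge:${challengeHash}`, { userId, type, createdAt: nowMs });
  }

  // Returns the record at most once, and only within the TTL.
  consume(challengeHash: string, nowMs: number): ChallengeRecord | null {
    const key = `webauthn:challenge:${challengeHash}`;
    const record = this.entries.get(key);
    this.entries.delete(key);
    if (!record || nowMs - record.createdAt > CHALLENGE_TTL_MS) return null;
    return record;
  }
}

const challenges = new ChallengeStore();
challenges.put("c1", "user-1", "registration", 0);
const firstRead = challenges.consume("c1", 60_000);       // within TTL
const secondRead = challenges.consume("c1", 60_000);      // already consumed
challenges.put("c2", "user-1", "authentication", 0);
const expiredRead = challenges.consume("c2", 6 * 60_000); // past 5 minutes
```

Deleting on first read is what prevents a captured challenge from being replayed within the TTL.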

**Attestation type:** `none`. Rationale: Amanat does not need to verify the authenticator manufacturer or model. Direct or indirect attestation adds complexity (managing attestation certificates, privacy concerns) without security benefit for this platform. We rely on the authenticator's signature, not its attestation.

**Credential storage (passkeys[] subdocument):**

| Field | Type | Description |
|---|---|---|
| `id` | String | Base64url-encoded credential ID |
| `publicKey` | Buffer | COSE public key (actual bytes, not a stub string) |
| `counter` | Number | Monotonic signature counter; incremented on each authentication |
| `deviceType` | String | `platform` or `cross-platform` |
| `deviceName` | String | User-provided label or auto-generated from user-agent |
| `transports` | String[] | Authenticator transports (e.g., `['internal', 'hybrid']`) |
| `registeredAt` | Date | Timestamp |

**Authentication flow integration:** Passkey login issues the same JWT pair as password login. The refresh token is persisted in `user.refreshTokens[]` using the same schema as all other authentication methods. This closes the gap identified in [[Passkey (WebAuthn) Flow]] where passkey-issued tokens were not added to the allow-list.

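**Counter enforcement:** On each authentication, if either the stored or the received counter is nonzero, the received counter must be strictly greater than the stored counter. If it is less than or equal, the authentication is rejected, the event is logged as a potential cloned authenticator, and the user is notified. Authenticators that do not implement a counter (including many synced passkeys) always report 0; when both values are 0 the check is skipped, as the WebAuthn specification allows.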
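A sketch of the counter check as a pure function. It makes one assumption explicit: authenticators that do not implement a signature counter always report 0, so the 0/0 case is accepted, following the allowance in the WebAuthn specification. `checkSignatureCounter` is a hypothetical name.

```typescript
type CounterCheck = "accept" | "reject-possible-clone";

// Assumption: a stored and received counter of 0 means the authenticator does
// not implement counters; otherwise the counter must strictly increase.
function checkSignatureCounter(storedCounter: number, receivedCounter: number): CounterCheck {
  if (storedCounter === 0 && receivedCounter === 0) return "accept";
  return receivedCounter > storedCounter ? "accept" : "reject-possible-clone";
}

const increasing = checkSignatureCounter(5, 6);
const repeated = checkSignatureCounter(5, 5);
const regressed = checkSignatureCounter(5, 3);
const counterless = checkSignatureCounter(0, 0);
```

On `accept` with a nonzero value, the caller would persist the received counter as the new stored counter.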

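**Cross-device authentication:** Allowed. Registration does not restrict `authenticatorAttachment`, and the `transports` reported at registration (including `hybrid`) are stored and passed back in `allowCredentials` at authentication, which supports cross-device flows (e.g., phone authenticator on desktop login).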

**Migration from stubbed to real implementation:**

1. Deploy feature flag change: `ENABLE_PASSKEYS=true` in staging only.
2. Run migration: delete all entries in `user.passkeys[]` (they contain the stub `'simulated-public-key'` and are not valid credentials). Notify users that passkeys must be re-registered.
3. Deploy `@simplewebauthn/server` integration with Redis challenge store.
4. QA: test registration and authentication on Chrome (Touch ID / YubiKey), Firefox, Safari, Android, iOS.
5. Enable in production after QA sign-off.

---

## Section 6: OAuth Requirements

### Google OAuth

The current Google OAuth implementation documented in [[Google OAuth Flow]] is largely compatible with the new session model. The following adjustments apply:

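**Token exchange flow:** The current implementation is documented as using Google Identity Services (GIS) with `initTokenClient` and `requestAccessToken`, after which the frontend sends a Google-signed ID token (JWT) to the backend for verification. These two statements do not fit together and must be checked against the code: `initTokenClient`/`requestAccessToken` is the GIS token model, an implicit-style flow that returns an OAuth access token, whereas ID tokens are returned by the Sign in with Google API (`google.accounts.id.initialize`). If the backend verifies a Google-signed ID token, no change is needed. If it receives an access token from the token model, that is the implicit flow rejected in Section 1, and the integration should move to the GIS authorization code model (`initCodeClient`) with a server-side code exchange.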

If additional OAuth providers are added in the future (GitHub, Apple), we will use the authorization code flow with PKCE. The frontend obtains an authorization code via redirect and sends it to the backend. The backend exchanges the code for tokens server-side. This prevents the client from ever seeing the provider's access token.

**Session integration:** After Google token verification succeeds, the backend issues the same JWT access token (15-minute, in-memory) and refresh token (7-day, httpOnly cookie) as the email/password flow. There is no separate session type for OAuth users. This is the current behavior and it is correct.

**Account linking:** Account linking is implicit by email match (current behavior). If `googleUser.email` matches an existing user, the existing account is used. Risks and mitigations:

| Risk | Mitigation |
|---|---|
| Attacker creates a Google account with a victim's email before the victim signs up | Google accounts are pre-verified; the attacker must control the email address at Google. This is a standard OAuth risk. |
| Victim signs up with email/password; attacker later creates Google account with same email and gains access | The backend checks for existing users on Google sign-in and does NOT create a new account. The attacker would need the victim's Google credentials. |
| User changes Google account email to match a different user | Google tokens are verified per-request; the backend trusts the `email` from the verified ID token. If Google allows email changes (they do not for gmail.com), this could be a vector. Mitigation: consider storing `googleId` (the `sub` claim) as a separate field in the future for multi-provider identity. |

For launch, the current email-based linking is acceptable. Post-launch, we should store `providers[].providerId` (e.g., `google:123456789`) for robust multi-provider identity.

**Token storage for OAuth sessions:** Same as email/password. Access token in memory, refresh token in httpOnly cookie.

**Logout behavior for OAuth sessions:** Logout invalidates the refresh token in `user.refreshTokens[]`, clears the cookie, and deletes Redis session records. The Google session on Google's side is not terminated (we do not call Google's revoke endpoint). This is standard practice. The user must sign out of Google separately if desired.

---

## Section 7: Admin Step-Up Authentication

### Decision

High-risk admin actions require re-authentication. Upon successful re-authentication, the admin receives a short-lived elevated session. Payouts above $1,000 USD equivalent also require two-person approval.

### Definition of high-risk admin actions

| Action | Step-Up Required | Two-Person Approval | Rationale |
|---|---|---|---|
| Payout/release escrow <= $1,000 USD | Yes (password) | No | Financial action; compromised session could release funds |
| Payout/release escrow > $1,000 USD | Yes (password) | Yes | High-value financial action; dual control per Threat T18 |
| Manual wallet signing (any amount) | Yes (password) | Yes (if > $1,000) | Direct access to escrow wallet |
| Refund escrow > $500 USD | Yes (password) | No | Irreversible financial action |
| User suspension or deletion | Yes (password) | No | Account impact; potential for abuse |
| Role change (any) | Yes (password) | No | Privilege escalation vector |
| Dispute override (admin resolves against recommendation) | Yes (password) | No | Financial side-effect; high dispute value |
| API key rotation (`JWT_SECRET`, webhook secrets) | Yes (password) | No | Invalidates all sessions or compromises integrity |
| Disable rate limiting or security features | Yes (password) | No | Reduces platform security posture |
| Export user data (bulk) | Yes (password) | No | Privacy-sensitive bulk operation |
| View escrow wallet private key (if applicable) | Yes (password) | Yes | Critical asset exposure |

### Step-up mechanism

**Re-authentication with password.** The admin must enter their password to obtain an elevated session. No additional 2FA at launch (passkeys are disabled; TOTP is not yet implemented). Post-launch, when WebAuthn is production-ready (D-4), the step-up will also accept passkey authentication as a second factor.

**Elevated session:**

| Attribute | Value |
|---|---|
| Duration | 5 minutes |
| Storage | Server-side only (Redis key `stepup:{userId}` with TTL 300s) |
| Scope | Grants elevated permissions for the specific action categories listed above |
| Renewal | Re-authentication required after 5 minutes; no automatic renewal |
| Verification | Middleware `requireStepUp()` checks Redis key existence before allowing the action |

### Step-up flow

1. Admin attempts a high-risk action (e.g., `POST /api/admin/payouts/release`).
2. Middleware `requireStepUp()` checks for an active elevated session in Redis.
3. If no elevated session exists, the backend returns `403 STEP_UP_REQUIRED` with `{ challengeId: uuid }`.
4. Frontend displays a password prompt (modal dialog).
5. Frontend sends `POST /api/auth/step-up` with `{ password, challengeId }`.
6. Backend verifies the password against `user.password` using bcrypt.
7. On success, backend creates Redis key `stepup:{userId}` with TTL 300s and returns `{ elevated: true, expiresAt: timestamp }`.
8. Frontend retries the original high-risk action.
9. The action proceeds.
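The core of this flow, without the framework plumbing, can be sketched as below. The `Map` stands in for the Redis keys `stepup:{userId}` so the example is self-contained; password verification is assumed to have happened before `grant` is called. Names other than `requireStepUp` and `STEP_UP_REQUIRED` are hypothetical.

```typescript
const STEP_UP_TTL_MS = 300 * 1000;

// In-memory stand-in for the Redis keys `stepup:{userId}` (TTL 300s).
class StepUpSessions {
  private readonly expiresAt = new Map<string, number>();

  // Called after the password check in POST /api/auth/step-up succeeds.
  grant(userId: string, nowMs: number): number {
    const expiry = nowMs + STEP_UP_TTL_MS;
    this.expiresAt.set(`stepup:${userId}`, expiry);
    return expiry;
  }

  isElevated(userId: string, nowMs: number): boolean {
    const expiry = this.expiresAt.get(`stepup:${userId}`);
    return expiry !== undefined && nowMs < expiry;
  }
}

type StepUpDecision =
  | { httpStatus: 200 }
  | { httpStatus: 403; code: "STEP_UP_REQUIRED"; challengeId: string };

// Decision logic of a requireStepUp()-style middleware.
function requireStepUp(
  elevated: StepUpSessions,
  userId: string,
  nowMs: number,
  newChallengeId: string,
): StepUpDecision {
  return elevated.isElevated(userId, nowMs)
    ? { httpStatus: 200 }
    : { httpStatus: 403, code: "STEP_UP_REQUIRED", challengeId: newChallengeId };
}

const elevatedAdmins = new StepUpSessions();
const beforeStepUp = requireStepUp(elevatedAdmins, "admin-1", 0, "challenge-1");
elevatedAdmins.grant("admin-1", 0);
const afterStepUp = requireStepUp(elevatedAdmins, "admin-1", 60_000, "challenge-2");
const afterExpiry = requireStepUp(elevatedAdmins, "admin-1", 300_000, "challenge-3");
```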

### Two-person approval flow

For actions requiring two-person approval:

1. Admin A completes the step-up flow above.
2. Admin A initiates the action (e.g., `POST /api/admin/payouts/release`).
3. The action is created in a `PendingApproval` state (stored in MongoDB).
4. The system notifies all other admin users via Socket.IO and email.
5. Admin B navigates to the pending approval, completes their own step-up flow, and confirms (`POST /api/admin/approvals/{id}/confirm`).
6. The action executes.
7. If Admin B rejects (`POST /api/admin/approvals/{id}/reject`), the action is cancelled.

**Fallback when second admin is unavailable:**

If no second admin has acted on a pending approval within 4 hours, the CTO (or designated fallback) receives an email and Slack notification. The CTO can approve directly. If no CTO action within 24 hours, the approval expires and must be re-initiated.

This fallback addresses the realistic scenario where Amanat has a small team with few admins. As the team grows, the 4-hour and 24-hour windows should be tightened.
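The approval lifecycle can be sketched as a small state machine. Two points are assumptions of the sketch rather than decisions in this document: the 24-hour expiry is counted from creation of the approval, and the initiating admin can never confirm their own request. Function names are hypothetical.

```typescript
type ApprovalState = "pending" | "confirmed" | "rejected" | "expired";

interface PendingApproval {
  id: string;
  initiatorId: string;
  createdAt: number; // epoch ms
  state: ApprovalState;
}

const ESCALATE_AFTER_MS = 4 * 60 * 60 * 1000; // notify CTO or fallback
const EXPIRE_AFTER_MS = 24 * 60 * 60 * 1000;  // assumed: measured from creation

function confirmApproval(approval: PendingApproval, approverId: string, nowMs: number): ApprovalState {
  if (approval.state !== "pending") return approval.state;
  if (nowMs - approval.createdAt >= EXPIRE_AFTER_MS) {
    approval.state = "expired";
  } else if (approverId !== approval.initiatorId) {
    // Dual control: a different admin must confirm.
    approval.state = "confirmed";
  }
  return approval.state;
}

function needsEscalation(approval: PendingApproval, nowMs: number): boolean {
  return approval.state === "pending" && nowMs - approval.createdAt >= ESCALATE_AFTER_MS;
}

const payoutApproval: PendingApproval = {
  id: "a1", initiatorId: "admin-a", createdAt: 0, state: "pending",
};
const selfConfirm = confirmApproval(payoutApproval, "admin-a", 1_000); // stays pending
const escalate = needsEscalation(payoutApproval, ESCALATE_AFTER_MS);
const peerConfirm = confirmApproval(payoutApproval, "admin-b", ESCALATE_AFTER_MS + 1);
```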

### Audit logging for step-up events

All step-up and two-person approval events are logged to an append-only audit collection:

| Field | Value |
|---|---|
| `action` | `step-up.attempt`, `step-up.success`, `step-up.failed`, `approval.created`, `approval.confirmed`, `approval.rejected`, `approval.expired` |
| `actorId` | ObjectId of the admin performing the action |
| `targetAction` | The high-risk action being performed (e.g., `payout.release`) |
| `targetEntity` | ObjectId or identifier of the entity (e.g., Payment ID) |
| `ip` | Request IP |
| `userAgent` | Request user-agent |
| `timestamp` | ISO 8601 |
| `metadata` | JSON object with action-specific details (e.g., payout amount) |

This collection is not writable by the application after insert (no updates, no deletes). Access is restricted to admin read-only and system write-only.

---

## Section 8: Session Management and Device Tracking

### Session tracking

Sessions are tracked via `user.refreshTokens[]` subdocuments (see Section 3 schema). Each entry represents one authenticated device.

### Device fingerprinting

We will use lightweight, non-invasive device identification:

| Signal | Source | Storage | Notes |
|---|---|---|---|
| User-Agent | `req.headers['user-agent']` | `refreshTokens[].deviceInfo` | Truncated to 200 characters |
| IP address | `req.ip` (behind Cloudflare: `req.headers['x-forwarded-for']`) | `refreshTokens[].ipAddress` | Used for geolocation approximation |
| Platform hint | Derived from user-agent parsing | Display only | Not stored separately |

We will NOT use browser fingerprinting (Canvas, WebGL, font enumeration), device IDs, or any tracking technique that requires user consent under privacy regulations. The user-agent and IP are already sent with every HTTP request.

### Session listing

**`GET /api/auth/sessions`** (requires Bearer JWT)

Returns the list of active sessions for the current user:

```json
{
  "sessions": [
    {
      "id": "sha256-hash-of-token",
      "device": "Chrome on macOS",
      "lastActive": "2026-05-24T14:30:00Z",
      "ip": "203.0.113.42",
      "location": "Tehran, Iran (approximate)",
      "isCurrent": true
    }
  ]
}
```

- `device` is a parsed, human-readable string derived from the user-agent (e.g., "Chrome 125 on macOS", "Safari on iPhone").
- `location` is derived from IP geolocation (city-level, approximate). We will use a local GeoIP database (MaxMind GeoLite2 or equivalent) to avoid sending user IPs to third-party services.
- `isCurrent` identifies the session making the request (matched by the refresh cookie).

### Session revocation

**`POST /api/auth/revoke-session`** (see Section 3).

Users can revoke any non-current session. Revoking the current session is equivalent to logout.

**`POST /api/auth/revoke-all-sessions`** (see Section 3).

Revokes all sessions except the current one. Useful if the user suspects compromise.

### Maximum sessions per user

**5 sessions.** When a user attempts to create a 6th session (login from a new device), the oldest session (by `createdAt`) is automatically revoked. The user is notified via email: "A new sign-in was detected on [device] from [location]. If this was not you, please change your password immediately."

Rationale: 5 sessions accommodates typical usage (desktop, laptop, phone, tablet, one more) while preventing unbounded session accumulation.
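The eviction rule can be sketched as a pure function that returns both the sessions kept and the ones to revoke. `addSessionWithCap` is a hypothetical name, and the email notification is left out.

```typescript
interface DeviceSession { tokenHash: string; createdAt: number; }

const MAX_SESSIONS_PER_USER = 5;

// Adds a session and reports which existing sessions must be revoked
// (oldest first by createdAt) to stay within the cap.
function addSessionWithCap(
  sessions: DeviceSession[],
  incoming: DeviceSession,
): { kept: DeviceSession[]; evicted: DeviceSession[] } {
  const ordered = [...sessions, incoming].sort((a, b) => a.createdAt - b.createdAt);
  const overflow = Math.max(0, ordered.length - MAX_SESSIONS_PER_USER);
  return { kept: ordered.slice(overflow), evicted: ordered.slice(0, overflow) };
}

// Usage: a sixth login evicts the oldest of five existing sessions.
const fiveSessions: DeviceSession[] = [1, 2, 3, 4, 5].map((n) => ({
  tokenHash: `s${n}`,
  createdAt: n,
}));
const capped = addSessionWithCap(fiveSessions, { tokenHash: "s6", createdAt: 6 });
```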

### Password change behavior

When a user changes their password:

1. All existing sessions are revoked (`user.refreshTokens = []`).
2. A new session is created for the current device.
3. All Redis session records for the user are deleted.
4. Email notification: "Your password was changed. If you did not make this change, contact support immediately."

This is the current behavior documented in [[Authentication Flow]] and it is correct.

### Account lock/suspension behavior

When an admin suspends or deletes a user account:

1. `user.status` is set to `suspended` or `deleted`.
2. `user.refreshTokens` is set to `[]`.
3. All Redis session records for the user are deleted.
4. Any subsequent request that carries a still-valid access token for that user returns `403 ACCOUNT_SUSPENDED` or `403 ACCOUNT_DELETED` (the `authMiddleware` already checks `user.status`).

---

## Section 9: Migration Plan

### Current state

| Component | Current | Target | Change Level |
|---|---|---|---|
| Access token storage | localStorage | In-memory variable | Frontend only |
| Access token lifetime | 7 days | 15 minutes | Backend config |
| Refresh token storage | localStorage | httpOnly cookie (backend set) | Full stack |
| Refresh token lifetime | 30 days | 7 days | Backend config |
| Refresh token schema | `String[]` | Subdocument array with metadata | Backend + DB migration |
| CSRF protection | Not needed (header-based) | Not needed (header-based + SameSite cookie) | None |
| Passkey status | Stubbed, accessible | Feature-flagged off in production | Backend + Frontend |
| Session revocation | Not implemented | Endpoints + device listing | Backend + Frontend |
| Admin step-up | Not implemented | Password re-auth + elevated session | Backend + Frontend |
| Two-person approval | Not implemented | Pending approval workflow | Backend + Frontend |

### Migration steps (in order)

**Step 1: Backend -- reduce access token lifetime to 15 minutes**

- Change `JWT_EXPIRES_IN` default from `7d` to `15m`.
- Deploy. Existing 7-day tokens remain valid until they expire naturally (no forced invalidation).
- Risk: users with long-lived sessions will notice more frequent refreshes. This is expected and acceptable.

**Step 2: Backend -- reduce refresh token lifetime to 7 days**

- Change `REFRESH_TOKEN_EXPIRES_IN` default from `30d` to `7d`.
- Deploy. Existing 30-day refresh tokens remain valid until they expire or are rotated.

**Step 3: Backend -- add refresh token metadata to refreshTokens[]**

- Deploy new schema: `user.refreshTokens` becomes a subdocument array with `tokenHash`, `deviceInfo`, `ipAddress`, `createdAt`, `lastUsedAt`.
- Migration script: convert existing `String[]` entries to `{ tokenHash: sha256(entry), deviceInfo: 'Unknown (pre-migration)', ipAddress: 'unknown', createdAt: Date.now(), lastUsedAt: Date.now() }`.
- Login and refresh endpoints updated to write new schema.
- Deploy. Old-format entries continue to work during migration.

**Step 4: Backend -- set refresh token as httpOnly cookie**

- On login, refresh, and OAuth sign-in: set `Set-Cookie` header with the refresh token in an httpOnly cookie. Also return the refresh token in the response body for backward compatibility.
- Add `POST /api/auth/refresh-token-cookie` endpoint that accepts the refresh token from the body and sets it as a cookie (migration helper for existing sessions).
- Deploy. Frontend still works with body-based refresh tokens.
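
As an illustration of Step 4, the `Set-Cookie` value could be assembled as below. The cookie name `refreshToken` and both function names are assumptions; the attributes and path follow the decisions recorded elsewhere in this ADR.

```typescript
// Builds the Set-Cookie header value for the refresh token.
// Assumption: the cookie is named "refreshToken".
function buildRefreshCookie(token: string, maxAgeSeconds: number): string {
  return [
    `refreshToken=${encodeURIComponent(token)}`,
    "HttpOnly", // not readable from JavaScript
    "Secure", // sent over HTTPS only
    "SameSite=Strict", // not attached to cross-site requests
    "Path=/api/auth/refresh-token", // scoped to the refresh endpoint
    `Max-Age=${maxAgeSeconds}`,
  ].join("; ");
}

// Logout can clear the cookie by re-issuing it with Max-Age=0.
// The Path must match the original cookie for the browser to replace it.
function buildClearedRefreshCookie(): string {
  return buildRefreshCookie("", 0);
}
```

For a 7-day refresh token, `maxAgeSeconds` would be `604800`.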

**Step 5: Frontend -- move access token to in-memory storage**

- Replace `localStorage.getItem('accessToken')` and `localStorage.setItem('accessToken', ...)` with an in-memory store (module-level variable or React context).
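- On app load: call the refresh endpoint. The refresh cookie is httpOnly, so JavaScript cannot check for it directly; the browser attaches it automatically if it exists. If the call returns a new access token, keep it in memory. If the call fails (no cookie or an invalid token), redirect to login.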
- Remove `localStorage` writes for both tokens. On logout, clear the in-memory token and the cookie (by calling the logout endpoint which sets an expired cookie).
- Deploy frontend.
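
A minimal sketch of the in-memory store and the app-load bootstrap from Step 5. The refresh call is injected so the example does not depend on a specific HTTP client; `bootstrapSession` and the `RefreshCall` shape are hypothetical.

```typescript
// Module-level variable: never written to localStorage or sessionStorage,
// so it is lost on page unload by design.
let accessToken: string | null = null;

function getAccessToken(): string | null {
  return accessToken;
}

function setAccessToken(token: string | null): void {
  accessToken = token;
}

// Hypothetical result shape for the call to the refresh endpoint.
type RefreshCall = () => Promise<{ ok: boolean; accessToken?: string }>;

// Runs once on app load. The httpOnly cookie cannot be inspected from
// script, so the outcome of the refresh call is the only signal available.
async function bootstrapSession(
  refresh: RefreshCall,
): Promise<"authenticated" | "login-required"> {
  try {
    const result = await refresh();
    if (result.ok && result.accessToken) {
      setAccessToken(result.accessToken);
      return "authenticated";
    }
  } catch {
    // In this sketch a network failure is treated like a missing session.
  }
  setAccessToken(null);
  return "login-required";
}
```

A caller would redirect to the login page when the result is `"login-required"`.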

**Step 6: Frontend -- send refresh via cookie instead of body**

- Modify the Axios interceptor to NOT send `refreshToken` in the body of `POST /api/auth/refresh-token`. The refresh token is sent automatically via the cookie.
- Backend: accept refresh token from either cookie or body (backward compatible). Deprecate body-based refresh with a log warning.
- Deploy both.
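
The backward-compatible lookup in Step 6 could be sketched as follows. The request shape, the cookie name `refreshToken`, and the `acceptBody` switch are assumptions; setting `acceptBody` to `false` corresponds to the cleanup in Step 15.

```typescript
// Minimal request shape for illustration; a real framework request has more.
interface RefreshRequest {
  cookies?: Record<string, string | undefined>;
  body?: { refreshToken?: string };
}

interface ExtractOptions {
  acceptBody: boolean; // false once body-based refresh is removed
  warn: (message: string) => void; // logger hook
}

type Extracted = { token: string; source: "cookie" | "body" } | null;

// Prefers the cookie. Falls back to the body only while it is still
// accepted, and logs a deprecation warning when it does.
function extractRefreshToken(req: RefreshRequest, options: ExtractOptions): Extracted {
  const fromCookie = req.cookies?.refreshToken;
  if (fromCookie) return { token: fromCookie, source: "cookie" };

  const fromBody = req.body?.refreshToken;
  if (fromBody && options.acceptBody) {
    options.warn("Deprecated: refresh token received in request body");
    return { token: fromBody, source: "body" };
  }
  return null;
}
```

Counting the deprecation warnings over the buffer period gives a rough signal for when body-based refresh can be switched off.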

**Step 7: Backend -- add session management endpoints**

- `GET /api/auth/sessions` -- list active sessions.
- `POST /api/auth/revoke-session` -- revoke a specific session.
- `POST /api/auth/revoke-all-sessions` -- revoke all other sessions.
- Deploy. No frontend change yet (endpoints are available but unused).
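
The logic behind the Step 7 endpoints can be sketched as pure functions over a user's session list. The `Session` shape with an `id` field and the "oldest by `createdAt`" eviction rule are assumptions made for illustration; the cap of 5 sessions comes from the Decision Summary.

```typescript
// Hypothetical session shape; `id` stands in for whatever key identifies
// a refresh token entry.
interface Session {
  id: string;
  deviceInfo: string;
  ipAddress: string;
  createdAt: Date;
  lastUsedAt: Date;
}

const MAX_SESSIONS = 5;

// GET /api/auth/sessions: marks which entry belongs to the caller.
function listSessions(sessions: Session[], currentId: string) {
  return sessions.map((s) => ({ ...s, current: s.id === currentId }));
}

// POST /api/auth/revoke-session: drops one session.
function revokeSession(sessions: Session[], id: string): Session[] {
  return sessions.filter((s) => s.id !== id);
}

// POST /api/auth/revoke-all-sessions: keeps only the caller's session.
function revokeAllOtherSessions(sessions: Session[], currentId: string): Session[] {
  return sessions.filter((s) => s.id === currentId);
}

// On login: adds a session and evicts the oldest ones beyond the cap.
function addSession(sessions: Session[], next: Session): Session[] {
  const ordered = [...sessions, next].sort(
    (a, b) => a.createdAt.getTime() - b.createdAt.getTime(),
  );
  return ordered.slice(-MAX_SESSIONS);
}
```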

**Step 8: Frontend -- add session management UI**

- Account settings page: "Active Sessions" section listing devices, locations, and last active times.
- "Revoke" button per session. "Revoke all other sessions" button.
- Deploy frontend.

**Step 9: Backend -- feature flag for passkeys**

- Add `ENABLE_PASSKEYS` env var (default `false`).
- Gate all `/api/auth/passkey/*` routes behind the flag.
- Return `404` when disabled.
- Deploy.
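
A framework-agnostic sketch of the gate in Step 9. It assumes the flag is enabled only by the exact string `true`; `gatePasskeyRoute` is a hypothetical name for whatever middleware wraps the passkey routes.

```typescript
type Env = Record<string, string | undefined>;

// Anything other than the exact string "true" (including unset) is disabled.
function passkeysEnabled(env: Env): boolean {
  return env.ENABLE_PASSKEYS === "true";
}

type GateResult = { proceed: true } | { proceed: false; status: 404 };

// Returns 404 for passkey routes while the flag is off, so the routes
// look the same as routes that do not exist.
function gatePasskeyRoute(path: string, env: Env): GateResult {
  if (path.startsWith("/api/auth/passkey/") && !passkeysEnabled(env)) {
    return { proceed: false, status: 404 };
  }
  return { proceed: true };
}
```

Treating every value except `true` as disabled keeps a typo in the environment from accidentally exposing the stubbed routes.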

**Step 10: Frontend -- feature flag for passkey UI**

- Add `NEXT_PUBLIC_ENABLE_PASSKEYS` env var (default `false`).
- Hide passkey UI components when disabled.
- Deploy frontend.

**Step 11: Backend -- admin step-up authentication**

- Add `POST /api/auth/step-up` endpoint.
- Add `requireStepUp()` middleware.
- Apply middleware to high-risk admin routes.
- Add Redis-based elevated session store.
- Deploy.
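
The step-up check in Step 11 might be structured as below. This sketch uses an in-memory map as a stand-in for the Redis store and returns a decision object rather than acting as real middleware; the class and type names are hypothetical. The 5-minute elevation window matches the figure given in Section 10.

```typescript
const ELEVATION_TTL_MS = 5 * 60 * 1000; // elevated session lasts 5 minutes

// In-memory stand-in for the Redis-based elevated session store.
class ElevatedSessionStore {
  private expiries = new Map<string, number>();

  // Called after a successful POST /api/auth/step-up.
  elevate(sessionId: string, now: number): void {
    this.expiries.set(sessionId, now + ELEVATION_TTL_MS);
  }

  isElevated(sessionId: string, now: number): boolean {
    const expiresAt = this.expiries.get(sessionId);
    if (expiresAt === undefined) return false;
    if (now >= expiresAt) {
      this.expiries.delete(sessionId); // lazy expiry; Redis would use a TTL
      return false;
    }
    return true;
  }
}

type StepUpDecision =
  | { allowed: true }
  | { allowed: false; status: 403; code: "STEP_UP_REQUIRED" };

// Decision logic a requireStepUp() middleware could wrap.
function requireStepUp(
  store: ElevatedSessionStore,
  sessionId: string,
  now: number = Date.now(),
): StepUpDecision {
  return store.isElevated(sessionId, now)
    ? { allowed: true }
    : { allowed: false, status: 403, code: "STEP_UP_REQUIRED" };
}
```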

**Step 12: Frontend -- admin step-up UI**

- Password prompt modal for step-up challenges.
- Intercept `403 STEP_UP_REQUIRED` responses and show modal.
- Retry original request after successful step-up.
- Deploy frontend.
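
The intercept-and-retry flow in Step 12 can be sketched independently of Axios. The `ApiResponse` shape and `withStepUp` are hypothetical; `promptForPassword` stands for the modal plus the call to the step-up endpoint, resolving to `true` on success.

```typescript
// Minimal response shape for illustration.
interface ApiResponse {
  status: number;
  code?: string;
  data?: unknown;
}

// Runs a request, and if the server answers 403 STEP_UP_REQUIRED, shows the
// password prompt and retries the original request once.
async function withStepUp(
  request: () => Promise<ApiResponse>,
  promptForPassword: () => Promise<boolean>,
): Promise<ApiResponse> {
  const first = await request();
  const needsStepUp = first.status === 403 && first.code === "STEP_UP_REQUIRED";
  if (!needsStepUp) return first;

  const confirmed = await promptForPassword();
  if (!confirmed) return first; // user cancelled or password was rejected

  // Single retry only, to avoid prompting in a loop.
  return request();
}
```

Limiting the flow to one retry means a second `STEP_UP_REQUIRED` response is surfaced to the caller instead of reopening the modal.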

**Step 13: Backend -- two-person approval**

- Add `PendingApproval` collection.
- Add approval workflow endpoints.
- Apply to payout/release actions above $1,000.
- Add notification logic for other admins.
- Deploy.
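
A sketch of the core rules in Step 13, assuming amounts are tracked in USD and that "above $1,000" means strictly greater than the threshold. The `PendingApproval` fields and both function names are illustrative, not the final schema.

```typescript
const APPROVAL_THRESHOLD_USD = 1000;

// Illustrative document shape for the PendingApproval collection.
interface PendingApproval {
  id: string;
  action: "payout" | "release";
  amountUsd: number;
  requestedBy: string; // admin who initiated the action
  status: "pending" | "approved" | "rejected";
  decidedBy?: string; // second admin
}

// Actions at or below the threshold proceed without a second admin.
function requiresSecondApprover(amountUsd: number): boolean {
  return amountUsd > APPROVAL_THRESHOLD_USD;
}

// Records a decision. The requester can never decide their own request,
// and a request can only be decided once.
function decideApproval(
  approval: PendingApproval,
  adminId: string,
  decision: "approved" | "rejected",
): PendingApproval {
  if (approval.status !== "pending") {
    throw new Error("Approval has already been decided");
  }
  if (adminId === approval.requestedBy) {
    throw new Error("Requester cannot decide their own request");
  }
  return { ...approval, status: decision, decidedBy: adminId };
}
```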

**Step 14: Frontend -- two-person approval UI**

- Pending approvals list in admin dashboard.
- Confirm/reject actions with step-up.
- Deploy frontend.

**Step 15: Backend -- remove body-based refresh token acceptance**

- After all frontends are migrated (Step 6 + reasonable buffer of 2 weeks), stop accepting refresh tokens from the request body.
- Accept refresh tokens only from the cookie.
- Deploy.

### Feature flags

| Flag | Default | Environments | Purpose |
|---|---|---|---|
| `ENABLE_PASSKEYS` | `false` | All | Controls passkey route registration |
| `NEXT_PUBLIC_ENABLE_PASSKEYS` | `false` | All | Controls passkey UI visibility |
| `COOKIE_REFRESH_MIGRATION` | `false` | All | Enables cookie-based refresh token issuance |
| `REQUIRE_STEP_UP` | `false` | Staging, Production | Enables step-up auth for admin actions |

### Rollback plan

If any migration step causes issues:

1. **Steps 1-2 (token lifetimes):** Revert `JWT_EXPIRES_IN` and `REFRESH_TOKEN_EXPIRES_IN` to previous values. Redeploy. No data migration to undo.
2. **Steps 3-4 (refresh token schema + cookies):** Backend continues to accept body-based refresh tokens. Frontend can revert to `localStorage` storage. The httpOnly cookie is additive; removing it does not break existing sessions.
3. **Step 5 (in-memory access token):** Frontend can revert to `localStorage`. The backend does not care where the access token comes from.
4. **Steps 7-8 (session management):** These are additive endpoints and UI. Rolling back means removing the UI and endpoints. No data is affected.
5. **Steps 9-10 (passkey feature flag):** Set flags to `true` to restore passkey access (though passkeys remain stubbed and insecure). Rolling back is simply changing env vars.
6. **Steps 11-14 (step-up and two-person approval):** Set `REQUIRE_STEP_UP` to `false` or remove the `requireStepUp()` middleware. Admin actions then proceed without step-up. This is a security regression but not a functional outage.

### Timeline estimate

| Phase | Steps | Duration | Dependencies |
|---|---|---|---|
| Token hardening | 1-2 | 1 day | None |
| Cookie migration | 3-6 | 3-5 days | Frontend + backend coordination |
| Session management | 7-8 | 2-3 days | Cookie migration complete |
| Passkey feature flag | 9-10 | 1 day | None |
| Admin step-up | 11-12 | 3-4 days | None |
| Two-person approval | 13-14 | 3-5 days | Admin step-up complete |
| Cleanup (step 15) | 15 | 1 day (after 2-week buffer) | All frontends migrated |
| **Total** | | **14-20 days** (excluding the 2-week buffer) | |

---

## Section 10: Threat Mitigation Traceability

| Decision | Threats Addressed | Risk Reduction |
|---|---|---|
| Access token in memory (not localStorage) | T13 (XSS token theft) | XSS cannot persistently steal the token; it is lost on page unload |
| Access token lifetime reduced to 15 min | T04 (stolen token reuse) | A stolen token is valid for at most 15 min instead of 7 days (672x reduction in exposure window) |
| Refresh token in httpOnly cookie | T04, T13 | XSS cannot read the refresh token; it is not accessible to JavaScript |
| Refresh token lifetime reduced to 7 days | T04 | Maximum exploitation window from a compromised refresh token is 7 days instead of 30 days |
| Refresh token rotation with reuse detection | T04 | Reuse of a rotated token triggers full session invalidation; attacker and legitimate user are forced to re-authenticate |
| SameSite=Strict on refresh cookie | T15 (CSRF) | Cookie not sent on cross-site requests; CSRF on refresh endpoint is eliminated |
| Refresh cookie scoped to `/api/auth/refresh-token` path | T15 | Cookie sent only on the refresh endpoint; not on any state-changing endpoint |
| Passkey feature flag disabled in production | T10 (passkey bypass) | Stubbed passkey implementation is unreachable in production; cannot be exploited |
| Session revocation endpoints | T04 | Users can terminate compromised sessions immediately; admins can revoke sessions for suspended users |
| Max 5 sessions per user | T04 | Limits blast radius of session accumulation; oldest sessions auto-revoked |
| Admin step-up authentication | T09 (admin privilege escalation), T18 (insider fund manipulation) | Compromised admin session cannot perform high-risk actions without re-authenticating; elevated session lasts only 5 minutes |
| Two-person approval for large payouts | T05 (double payout), T18 | No single admin can release high-value escrow; second admin must independently verify and approve |
| Audit logging for step-up and approval events | T09, T18 | All elevated-access events are recorded in tamper-evident audit trail |
| Password change revokes all sessions | T04 | If user detects compromise, password change immediately terminates all attacker sessions |
| Account suspension revokes all sessions | T09 | Compromised admin accounts are immediately locked out when suspended |
| Device/session listing | T04 | Users can detect unfamiliar sessions and revoke them; early detection of compromise |
| Email notification on new device login | T04 | User is alerted to unauthorized access within minutes |
| Verification code removal from production logs | T22 (verification code leakage) | Codes are no longer loggable in production; only non-production environments may log them for debugging |
| Access token `jti` claim | T04 | Each token has a unique identifier; enables future deny-listing of individual tokens |
| OAuth token storage same as email/password | T04, T13 | OAuth sessions receive the same protections (in-memory access, httpOnly refresh) |
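
As a check on the exposure-window figures in the table above, the reductions in token lifetime work out as follows:

$$
\frac{7 \text{ days}}{15 \text{ min}} = \frac{7 \times 24 \times 60}{15} = \frac{10080}{15} = 672,
\qquad
\frac{30 \text{ days}}{7 \text{ days}} \approx 4.3
$$

The first ratio is the access token reduction; the second is the refresh token reduction.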

### Coverage analysis

| Threat | Mitigated by this ADR? | Residual risk |
|---|---|---|
| T04 (stolen token reuse) | Yes -- 15-min access token, httpOnly refresh cookie, rotation, session revocation | Physical access to an unlocked device with an active session; keylogger capturing password during step-up |
| T10 (passkey bypass) | Yes -- feature flag disabled in production | None (passkeys are unreachable) |
| T13 (XSS token theft) | Yes -- in-memory access token, httpOnly refresh cookie | XSS running in the page can read the in-memory access token, which stays valid for at most 15 minutes; XSS cannot read the refresh token |
| T15 (CSRF) | Yes -- access token in Authorization header (unchanged), SameSite=Strict on refresh cookie | None |
| T22 (verification code leakage) | Partially -- ADR documents the requirement; implementation is a separate task | Codes still logged until code change is deployed |

### Threats NOT addressed by this ADR (addressed elsewhere)

| Threat | Document |
|---|---|
| T01 (fake payment proof) | [[Payment Provider Adapter Spec]] (future) |
| T02 (webhook replay) | [[Webhook Security Spec]] (future) |
| T03 (arbitrary socket room join) | Realtime Authorization Spec (future) |
| T05 (double payout) | [[Funds Ledger Specification]] (future) + two-person approval (this ADR) |
| T06 (dispute bypass) | Escrow State Machine (future) |
| T07 (email abuse) | Rate limiting implementation |
| T08 (AI cost abuse) | Rate limiting + auth implementation |
| T09 (admin privilege escalation) | [[Authorization Matrix]] + step-up auth (this ADR) |
| T11 (unauthenticated payment endpoints) | Auth middleware implementation |
| T12 (rate limit bypass) | Rate limiting implementation |
| T14 (supply-chain) | [[Secure Build and Supply-Chain Policy]] |
| T16 (deep-link tampering) | Telegram initData verification |
| T17 (provider outage) | Operational runbooks |
| T18 (insider manipulation) | Multi-sig wallet + funds ledger + two-person approval (this ADR) |
| T19 (price manipulation) | Offer status enforcement |
| T20 (delivery brute force) | Rate limiting + code entropy |
| T21 (data exfiltration) | Auth middleware implementation |
| T23 (state machine inconsistency) | Canonical state machine specification |

---

## Cross-references

- [[Threat Model - Amanat Escrow Platform]] -- T04, T10, T13, T15, T22
- [[Security Ownership and Launch Decision Criteria]] -- D-1 (cookie migration), D-4 (real WebAuthn), D-5 (session revocation), D-6 (admin step-up)
- [[Security Architecture]] -- current authentication implementation
- [[Authentication Flow]] -- current token lifecycle
- [[Passkey (WebAuthn) Flow]] -- current passkey implementation (stubbed)
- [[Google OAuth Flow]] -- current OAuth implementation
- [[Platform Logical Audit - 2026-05-24]] -- Findings 2, 8, 10, 12
- [[Backend Stack Security and Refactor Assessment - 2026-05-24]] -- Phase 0 hardening requirements

---

*This document was created on 2026-05-24 as part of Taskmaster task 4 (authentication and session architecture) for the Amanat escrow platform. It must be reviewed by the Backend Lead, Frontend Lead, and CTO before implementation begins. Changes to any decision in this document require sign-off per the RACI in [[Security Ownership and Launch Decision Criteria]] Section 1.*