Complete task 4 backend security architecture docs

2026-05-24 11:31:40 +04:00
parent 4cf5c49274
commit 6a451040d9
18 changed files with 1006 additions and 73 deletions
--- a/Audits/Authorization
+++ b/Audits/Authorization
@@ -37,7 +37,7 @@ reviewers: [backend, security]
 | **Admin** | Authenticated + `req.user.role === 'admin'`. All admin actions MUST be audit-logged. | `authenticateToken` + `roleGuard('admin')` or `authorizeRoles('admin')`. |
 | **Support** | Authenticated + `req.user.role === 'support'`. Read-only access to user data, dispute records, and chat. Can reset passwords and escalate to admin. Cannot modify financial records or release funds. | `authenticateToken` + `roleGuard('support')`. Controller must enforce read-only constraint. |
 | **Service** | Internal service-to-service calls. Authenticated via shared secret (`X-Internal-Secret` header) or restricted to localhost network. Not user-facing. | Custom middleware verifying internal header or `req.ip === '127.0.0.1'`. |
-| **Step-up** | Admin + re-authenticated within last 15 minutes (configurable). Required for: payout creation/release, role changes, large refunds (>$100), user deletion, admin-wallet signing. | `authenticateToken` + `roleGuard('admin')` + step-up timestamp check from Redis session. |
+| **Step-up** | Admin + re-authenticated within configured window (default 5 minutes). Required for high-risk admin actions (role changes, user deletion, payout/release, manual overrides, sensitive wallet operations). | `authenticateToken` + `roleGuard('admin')` + step-up timestamp check from Redis session. |
 | **HMAC** | No user auth. Verified via HMAC-SHA256 signature on raw body using `SHKEEPER_WEBHOOK_SECRET`. Signature-verified, not identity-verified. | `express.raw()` body parser + timing-safe HMAC comparison. |
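
The step-up timestamp check named in the Step-up row could look roughly like the sketch below. It assumes the Redis session has already been loaded; `StepUpSession`, `stepUpAt`, and `isStepUpFresh` are hypothetical names used only for illustration, and the 5-minute default mirrors the window stated in the row.

```typescript
// Hypothetical shape of a session record already loaded from Redis.
interface StepUpSession {
  userId: string;
  role: string;
  stepUpAt?: number; // epoch milliseconds of the last successful re-authentication
}

const DEFAULT_STEP_UP_WINDOW_MS = 5 * 60 * 1000; // default 5-minute window

function isStepUpFresh(
  session: StepUpSession,
  nowMs: number,
  windowMs: number = DEFAULT_STEP_UP_WINDOW_MS,
): boolean {
  if (session.role !== "admin") return false;
  if (session.stepUpAt === undefined) return false;
  const ageMs = nowMs - session.stepUpAt;
  // Reject timestamps from the future as well as stale ones.
  return ageMs >= 0 && ageMs <= windowMs;
}
```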

 ---
@@ -83,6 +83,10 @@ reviewers: [backend, security]
 | AUTH-R022 | PUT | /api/auth/profile | Authenticated | Owner | None | Tier 3 | No | Auth enforced. | Authenticated | |
 | AUTH-R023 | POST | /api/auth/update-profile | Authenticated | Owner | None | Tier 3 | No | Auth enforced. Legacy alias. | Authenticated | Duplicate of R022. |
 | AUTH-R024 | DELETE | /api/auth/account | Authenticated | Owner | Password re-verified | Tier 3 | Yes | Auth + password required. | Authenticated + audit | Permanent deletion. |
+| AUTH-R025 | POST | /api/auth/step-up | Admin | None | Valid challenge context or credentials | Tier 6 | Yes | Not implemented | Admin + Step-up | Required by ADR for high-risk admin actions. Creates 5-minute elevated session in Redis. |
+| AUTH-R026 | GET | /api/auth/sessions | Authenticated | Owner | Current refresh session exists | Tier 3 | Yes | Not implemented | Authenticated | Returns active sessions with device, IP, and session age. |
+| AUTH-R027 | POST | /api/auth/revoke-session | Authenticated | Owner | Target session belongs to user | Tier 3 | Yes | Not implemented | Authenticated + audit | Revokes one session by sessionTokenHash. |
+| AUTH-R028 | POST | /api/auth/revoke-all-sessions | Authenticated | Owner | Multiple active sessions loaded | Tier 3 | Yes | Not implemented | Authenticated + audit | Revokes all sessions except current. |

 ### 2.2 User Routes

@@ -116,6 +120,14 @@ reviewers: [backend, security]
 | UADM-R012 | PATCH | /api/users/admin/:userId/password | Admin | None | Target user exists | Tier 6 | Yes | Inline role check. | Admin + Step-up + audit | Wipes all sessions. |
 | UADM-R013 | POST | /api/users/admin/:userId/resend-verification | Admin | None | User not already verified | Tier 6 | Yes | Inline role check. | Admin + audit | Triggers email. |

+### 2.3A Admin Approval Routes
+
+| ID | Method | Path | Access Level | Ownership Check | State Preconditions | Rate-Limit Tier | Audit Log | Current State | Required State | Notes |
+|---|---|---|---|---|---|---|---|---|---|---|
+| APV-R001 | GET | /api/admin/approvals | Admin | None | None | Tier 6 | Yes | Not implemented | Admin + Step-up | Pending approval queue for high-value actions. |
+| APV-R002 | POST | /api/admin/approvals/:id/confirm | Admin | None | Approval exists, status = PENDING, approver != creator | Tier 6 | Yes | Not implemented | Admin + Step-up + audit | Confirms pending approval and executes action. |
+| APV-R003 | POST | /api/admin/approvals/:id/reject | Admin | None | Approval exists, status = PENDING | Tier 6 | Yes | Not implemented | Admin + Step-up + audit | Rejects pending approval and records reason. |
+
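The state preconditions for APV-R002 (approval exists, status = PENDING, approver != creator) can be sketched as a pure check. This is an illustrative sketch only: the `Approval` shape, the `CONFIRMED`/`REJECTED` statuses, and the reason codes are assumptions, since the matrix names only `PENDING`.

```typescript
// Only PENDING appears in the matrix; the other statuses are assumed.
type ApprovalStatus = "PENDING" | "CONFIRMED" | "REJECTED";

interface Approval {
  id: string;
  status: ApprovalStatus;
  createdBy: string; // admin who requested the high-value action
}

type ConfirmDecision =
  | { ok: true }
  | { ok: false; reason: "not_found" | "not_pending" | "self_approval" };

function canConfirmApproval(
  approval: Approval | undefined,
  approverId: string,
): ConfirmDecision {
  if (approval === undefined) return { ok: false, reason: "not_found" };
  if (approval.status !== "PENDING") return { ok: false, reason: "not_pending" };
  // Two-person rule: the creator may not confirm their own request.
  if (approval.createdBy === approverId) return { ok: false, reason: "self_approval" };
  return { ok: true };
}
```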
 ### 2.4 Address Routes

 | ID | Method | Path | Access Level | Ownership Check | State Preconditions | Rate-Limit Tier | Audit Log | Current State | Required State | Notes |
@@ -259,16 +271,16 @@ reviewers: [backend, security]

 | ID | Method | Path | Access Level | Ownership Check | State Preconditions | Rate-Limit Tier | Audit Log | Current State | Required State | Notes |
 |---|---|---|---|---|---|---|---|---|---|---|
-| REL-R001 | POST | /api/payment/shkeeper/:id/release | Admin | None | Payment funded; no active dispute (T06); escrowState=funded | Tier 6 | Yes | Auth enforced. Admin. NO dispute check. T06. | Admin + Step-up + dispute check + audit | Builds release tx payload. |
+| REL-R001 | POST | /api/payment/shkeeper/:id/release | Admin | None | Payment funded; no active dispute (T06); escrowState=funded | Tier 6 | Yes | Auth enforced. Admin. NO dispute check. T06. | Admin + Step-up + dispute check + audit + two-person approval for payout > 1000 USD equivalent (see APV-R002/APV-R003) | Builds release tx payload. |
 | REL-R002 | POST | /api/payment/shkeeper/:id/release/confirm | Admin | None | Release tx pending; valid txHash | Tier 6 | Yes | Auth enforced. Admin. | Admin + Step-up + audit | Confirms release on-chain. |
-| REL-R003 | POST | /api/payment/shkeeper/:id/refund | Admin | None | Payment funded; no active dispute; escrowState=funded | Tier 6 | Yes | Auth enforced. Admin. NO dispute check. T06. | Admin + Step-up + dispute check + audit | Builds refund tx. |
+| REL-R003 | POST | /api/payment/shkeeper/:id/refund | Admin | None | Payment funded; no active dispute; escrowState=funded | Tier 6 | Yes | Auth enforced. Admin. NO dispute check. T06. | Admin + Step-up + dispute check + audit + two-person approval for payout > 1000 USD equivalent (see APV-R002/APV-R003) | Builds refund tx. |
 | REL-R004 | POST | /api/payment/shkeeper/:id/refund/confirm | Admin | None | Refund tx pending; valid txHash | Tier 6 | Yes | Auth enforced. Admin. | Admin + Step-up + audit | Confirms refund on-chain. |

 ### 2.15 Payment Routes (SHKeeper Payout)

 | ID | Method | Path | Access Level | Ownership Check | State Preconditions | Rate-Limit Tier | Audit Log | Current State | Required State | Notes |
 |---|---|---|---|---|---|---|---|---|---|---|
-| PO-R001 | POST | /api/payment/shkeeper/payout | Admin | None | No existing pending payout for same escrow | Tier 6 | Yes | Auth enforced. Admin. | Admin + Step-up + audit | Creates payout task. T05. |
+| PO-R001 | POST | /api/payment/shkeeper/payout | Admin | None | No existing pending payout for same escrow | Tier 6 | Yes | Auth enforced. Admin. | Admin + Step-up + audit + two-person approval for payout > 1000 USD equivalent (see APV-R002/APV-R003) | Creates payout task. T05. |
 | PO-R002 | GET | /api/payment/shkeeper/payout/status/:taskId | Authenticated | Owner or Admin | Task exists | Tier 3 | No | Auth enforced. | Authenticated | Poll payout status. |
 | PO-R003 | POST | /api/payment/shkeeper/payout/webhook | HMAC | None | Signature valid | Tier 5 | Yes | HMAC verification. | HMAC + audit | Payout state changes. |

@@ -630,9 +642,10 @@ These gaps involve audit logging and presence tracking. They are important for o

 | Route Group | Endpoints |
 |---|---|
-| Auth | 24 |
+| Auth | 28 |
 | User | 9 |
 | User Admin | 13 |
+| Admin Approval | 3 |
 | Address | 5 |
 | Purchase Request | 18 |
 | Delivery Code | 4 |
@@ -656,7 +669,7 @@ These gaps involve audit logging and presence tracking. They are important for o
 | File | 9 |
 | Admin Cleanup | 7 |
 | System | 2 |
-| **Total REST Endpoints** | **248** |
+| **Total REST Endpoints** | **255** |

 ### Socket.IO Event Count

@@ -691,4 +704,4 @@ These gaps involve audit logging and presence tracking. They are important for o

 ---

-*This document was produced on 2026-05-24 as part of the Amanat authorization audit. It must be updated when: new endpoints are added, existing endpoint access levels change, new Socket.IO events are introduced, or the role model is extended. Implementation tasks should reference specific AUTH-R, USER-R, UADM-R, ADDR-R, PR-R, DC-R, OFF-R, TPL-R, SHOP-R, CAT-R, REV-R, PAY-R, SHK-R, REL-R, PO-R, DEC-R, MPAY-R, CHAT-R, NOTIF-R, DIS-R, AI-R, BLOG-R, PTS-R, FILE-R, ADM-R, SYS-R, and SOCK-E IDs from this matrix.*
+*This document was produced on 2026-05-24 as part of the Amanat authorization audit. It must be updated when: new endpoints are added, existing endpoint access levels change, new Socket.IO events are introduced, or the role model is extended. Implementation tasks should reference specific AUTH-R, USER-R, UADM-R, APV-R, ADDR-R, PR-R, DC-R, OFF-R, TPL-R, SHOP-R, CAT-R, REV-R, PAY-R, SHK-R, REL-R, PO-R, DEC-R, MPAY-R, CHAT-R, NOTIF-R, DIS-R, AI-R, BLOG-R, PTS-R, FILE-R, ADM-R, SYS-R, and SOCK-E IDs from this matrix.*
--- a/Audits/Backend
+++ b/Audits/Backend
@@ -0,0 +1,117 @@
+---
+title: Backend Core Stack Decision Record - 2026-05-24
+tags: [adr, architecture, backend]
+created: 2026-05-24
+status: approved
+reviewers: [CTO, backend, security]
+---
+
+# Backend Core Stack Decision Record - 2026-05-24
+
+## 1. Decision
+
+Keep the security-critical backend core on **TypeScript/Node** for the first 12 months.
+
+Do **not** perform a full greenfield rewrite before the payment/auth/escrow core is fully specified and observable.
+
+## 2. Why this stack (today)
+
+The highest current risk is not framework selection; it is **financial state correctness**.
+
+For the next phase, the team needs:
+
+- provider-neutral payment abstraction,
+- immutable funds ledger,
+- webhook hardening and reconciliation,
+- strict dispute hold behavior,
+- admin step-up controls,
+- production-grade operational runbooks.
+
+Moving to Go/Kotlin/Rust now would preserve existing risks while adding migration uncertainty and a delay in launch-readiness.
+
+TypeScript remains the fastest way to ship the required controls while keeping operational visibility and team velocity.
+
+## 3. Scope of extraction
+
+The “backend core” now means:
+
+- `Payment` orchestration and payout/release state transitions,
+- auth/session validation for financial actions,
+- webhook intake and reconciliation,
+- ledger-derived escrow eligibility checks,
+- admin-risk operations (payout/refund/adjustment).
+
+These modules stay in the same service boundary during migration-in-place, but all calls must go through:
+
+- [[Payment Provider Adapter Spec]]
+- [[Webhook Security Spec]]
+- [[Funds Ledger and Escrow State Machine Specification]]
+
+Non-core modules remain where they are:
+
+- marketplace browsing, templates, shop settings,
+- chat and notifications,
+- file uploads/downloads.
+
+## 4. Evaluation of alternatives
+
+### Go
+
+- **Pros:** smaller runtime and dependency surface, better static guarantees.
+- **Cons:** highest immediate migration cost, new operational tooling, delayed delivery of core money-movement correctness.
+
+### Kotlin/Java
+
+- **Pros:** strong enterprise ecosystem, mature auth/security libraries.
+- **Cons:** heavier stack and slower delivery for a small team.
+
+### Rust
+
+- **Pros:** high correctness potential.
+- **Cons:** steep delivery cost and limited team familiarity.
+
+### Keep TypeScript (selected)
+
+- **Pros:** existing team velocity, reduced migration risk, direct integration with current deployment and frontend contracts.
+- **Cons:** npm supply-chain risk remains; mitigated by [[Secure Build and Supply-Chain Policy]] and strict dependency policy.
+
+## 5. Migration and rollout plan
+
+1. **Phase A (this quarter):** lock down high-risk flows in TypeScript (ledger, adapter, webhook, auth/session, runbooks).
+2. **Phase B (next two quarters):** extract core services behind stable interfaces and add adapter-level contract tests.
+3. **Phase C (deferred):** evaluate Go/Kotlin pilot for payout+webhook worker only if:
+   - Phase A and B are stable for 60 days,
+   - team staffing supports dual-stack operations,
+   - audit requirements demand lower runtime dependency exposure.
+
+## 6. Non-goals
+
+- Full frontend rewrite.
+- New language migration without closed-loop reconciliation and signed-state invariants.
+- New provider support that bypasses the adapter contract.
+
+## 7. Rollback criteria
+
+- Incident rate more than 20% above baseline for 24 hours after migration activity.
+- Any unresolved ledger invariant violation (held + disputed + released + refunded mismatch).
+- Any provider outage recovery that requires non-operator-tuned workarounds.
+
+Rollback to prior TS implementation:
+
+- disable any split deployment feature flag,
+- switch `PAYMENT_ENABLED_PROVIDERS` back to legacy-only,
+- freeze new provider routing until incident review is complete,
+- complete post-incident update in this ADR.
+
+## 8. Ownership
+
+- **CTO:** final stack decision + dual-stack approvals.
+- **Backend Lead (BL):** contract and adapter enforcement.
+- **Security Lead (SL):** webhook/security acceptance criteria.
+- **DevOps Lead (DL):** deployment safety and rollback testing.
+
+## Related
+
+- [[Backend Stack Security and Refactor Assessment - 2026-05-24]]
+- [[Secure Build and Supply-Chain Policy]]
+- [[Backend Funds Migration and Operational Runbooks]]
--- a/Audits/Backend
+++ b/Audits/Backend
@@ -345,7 +345,7 @@ Should include:
 - Trust boundaries: browser, backend, database, Redis, provider APIs, wallet/RPC, admin UI, Socket.IO.
 - Abuse cases: fake payment proof, replayed webhook, arbitrary room join, stolen token, double payout, dispute bypass, email/AI abuse.

-### 2. Funds Ledger Specification
+### 2. Funds Ledger and Escrow State Machine Specification

 Purpose: make money movement auditable and provider-independent.

@@ -386,9 +386,7 @@ Should map every endpoint and socket event to:

 ### 5. Payment Provider Adapter Spec

-Purpose: decouple business logic from SHKeeper, Request Network, manual wallet flow, and future providers.
-
-Should define:
+Implemented as [[Payment Provider Adapter Spec]], including:

 - `createPayInIntent`
 - `getPayInStatus`
@@ -399,13 +397,11 @@ Should define:
 - `getPayoutStatus`
 - `searchProviderPayments`

-Provider-specific metadata should be namespaced and never become the canonical funds state.
+Provider-specific metadata is namespaced and never used as canonical funds state.

 ### 6. Webhook Security Spec

-Purpose: prevent forged, replayed, or silently failed provider events.
-
-Should define:
+Implemented as [[Webhook Security Spec]]:

 - Raw-body signature verification.
 - Accepted headers and algorithms.
@@ -434,6 +430,8 @@ Should define:

 ### 8. Realtime Authorization Spec

+Implemented as [[Realtime Authorization Spec]].
+
 Purpose: make Socket.IO events subject to the same security model as REST.

 Should define:
@@ -476,9 +474,7 @@ Should define:

 ### 11. Operational Runbooks

-Purpose: make security incidents and payment failures survivable.
-
-Should include:
+Implemented as [[Backend Funds Migration and Operational Runbooks]]:

 - Failed webhook.
 - Duplicate payment.
--- a/Audits/Payment
+++ b/Audits/Payment
@@ -0,0 +1,173 @@
+---
+title: Payment Provider Adapter Spec
+tags: [adapters, payments, specification, architecture]
+created: 2026-05-24
+status: advisory
+reviewers: [backend, security, product]
+---
+
+# Payment Provider Adapter Spec
+
+This specification standardizes how payment providers are plugged in so platform logic
+does not depend on SHKeeper or any single webhook implementation.
+
+The contract below replaces provider-specific branching in domain services and should
+be used by all pay-in, payout, release, and reconciliation logic.
+
+> Canonical implementation note: this is an advisory specification for Taskmaster subtask `4.6`.
+> It maps to [[Funds Ledger and Escrow State Machine Specification]] and [[Webhook Security Spec]].
+
+## 1. Core provider contract
+
+Implementations expose one typed adapter:
+
+```ts
+interface PaymentProviderAdapter {
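+  // `string & {}` keeps the union open to future providers without collapsing the known literals into plain `string`.
+  readonly provider: "shkeeper" | "request_network" | "manual_wallet" | "admin_wallet" | (string & {});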
+
+  createPayInIntent(input: PayInIntentInput): Promise<PayInIntentResult>;
+  getPayInStatus(input: PayInStatusInput): Promise<PayInStatusResult>;
+  handleProviderWebhook(input: ProviderWebhookInput): Promise<ProviderWebhookResult>;
+  createHostedPaymentLink(input: HostedLinkInput): Promise<HostedLinkResult>;
+  createReleaseInstruction(input: ReleaseInstructionInput): Promise<ReleaseInstructionResult>;
+  createRefundInstruction(input: RefundInstructionInput): Promise<RefundInstructionResult>;
+  getPayoutStatus(input: PayoutStatusInput): Promise<PayoutStatusResult>;
+  searchProviderPayments(input: ProviderSearchInput): Promise<ProviderPaymentRecord[]>;
+}
+```
+
+All adapters must return a normalized result shape:
+
+```ts
+type NormalizedProviderStatus = "pending" | "processing" | "confirmed" | "completed" | "failed" | "cancelled" | "released" | "refunded";
+type NormalizedProviderEvent = {
+  providerPaymentId: string;
+  purchaseRequestId?: string;
+  requestId?: string;
+  providerReference?: string;
+  amount: string;      // decimal string
+  currency: string;
+  status: NormalizedProviderStatus;
+  transactionHash?: string;
+  providerEventType: string;
+  receivedAt: string;  // ISO timestamp
+  rawFingerprint: string; // provider payload hash
+};
+```
+
+## 2. Method semantics
+
+### 2.1 `createPayInIntent`
+
+- Create a provider-specific payment intent from a canonical request.
+- Must return:
+  - `providerPaymentId` (source of truth for future reconciliation),
+  - canonical `status`,
+  - `payInUrl` when redirect/payment-page flow is used,
+  - an expiry timestamp.
+- Must persist provider metadata under `metadata.providers.<provider>` (see section 4).
+
+### 2.2 `getPayInStatus`
+
+- Query provider status for an existing intent.
+- Must map provider statuses into `NormalizedProviderStatus` and include a provider-specific raw snapshot.
+- Must be idempotent and side-effect free.
+
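The mapping requirement in 2.2 can be sketched as a lookup that returns both the normalized status and the raw snapshot. The raw status strings below are invented for illustration and are not taken from any provider's documentation; `normalizeStatus` and `PayInStatusSnapshot` are hypothetical names, and throwing on an unmapped status is one possible policy rather than a stated requirement.

```typescript
type NormalizedProviderStatus =
  | "pending" | "processing" | "confirmed" | "completed"
  | "failed" | "cancelled" | "released" | "refunded";

// Invented raw statuses; a real adapter would use its provider's documented values.
const EXAMPLE_STATUS_MAP = new Map<string, NormalizedProviderStatus>([
  ["created", "pending"],
  ["broadcasting", "processing"],
  ["paid", "confirmed"],
  ["expired", "cancelled"],
  ["error", "failed"],
]);

interface PayInStatusSnapshot {
  status: NormalizedProviderStatus;
  raw: { providerStatus: string }; // provider-specific raw snapshot
}

// Pure function: no writes, so repeated calls are idempotent.
function normalizeStatus(providerStatus: string): PayInStatusSnapshot {
  const mapped = EXAMPLE_STATUS_MAP.get(providerStatus.toLowerCase());
  if (mapped === undefined) {
    // Surface unknown statuses instead of guessing a funds state.
    throw new Error(`Unmapped provider status: ${providerStatus}`);
  }
  return { status: mapped, raw: { providerStatus } };
}
```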
+### 2.3 `handleProviderWebhook`
+
+- Input must include raw body bytes, headers, provider identifier, and parsed envelope.
+- Must verify signatures before parsing business fields.
+- On success, emit canonical domain events and return an idempotency decision:
+  - `processed` for first apply,
+  - `duplicate` for replay,
+  - `ignored` for unknown payment / no-op transitions.
+
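The ordering in 2.3 (verify the signature on raw bytes, then decide idempotency, then parse business fields) can be sketched with Node's `crypto` module. This is a sketch under stated assumptions: the hex-encoded signature, the in-memory `Set` standing in for a durable idempotency store, and the standalone function name are all illustrative choices, not part of the contract.

```typescript
import { createHash, createHmac, timingSafeEqual } from "node:crypto";

type WebhookDecision = "processed" | "duplicate" | "ignored";

// Compare the provided signature with the expected HMAC in constant time.
function verifySignature(rawBody: Buffer, signatureHex: string, secret: string): boolean {
  const expected = createHmac("sha256", secret).update(rawBody).digest();
  const provided = Buffer.from(signatureHex, "hex");
  // timingSafeEqual throws on unequal lengths, so check length first.
  return provided.length === expected.length && timingSafeEqual(provided, expected);
}

function handleProviderWebhook(
  rawBody: Buffer,
  signatureHex: string,
  secret: string,
  seenFingerprints: Set<string>, // stand-in for a durable idempotency store
  knownPaymentIds: ReadonlySet<string>,
): WebhookDecision {
  // 1. Verify the signature over the raw bytes before any parsing.
  if (!verifySignature(rawBody, signatureHex, secret)) {
    throw new Error("INVALID_SIGNATURE");
  }
  // 2. Detect replays using a hash of the raw payload.
  const fingerprint = createHash("sha256").update(rawBody).digest("hex");
  if (seenFingerprints.has(fingerprint)) return "duplicate";
  // 3. Only now parse business fields.
  const event = JSON.parse(rawBody.toString("utf8")) as { providerPaymentId?: string };
  if (event.providerPaymentId === undefined || !knownPaymentIds.has(event.providerPaymentId)) {
    return "ignored";
  }
  seenFingerprints.add(fingerprint);
  return "processed";
}
```

Hashing the raw body for replay detection matches the `rawFingerprint` field in the normalized event shape; a production store would also need an expiry policy, which this sketch leaves out.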
+### 2.4 `createHostedPaymentLink`
+
+- Return the user-visible payment URL + optional redirect/callback endpoints.
+- Should support provider aliases during migration (e.g., `request-network` vs `request_network`).
+
+### 2.5 `createReleaseInstruction` and `createRefundInstruction`
+
+- Produce signed/payload instructions and pre-check:
+  - account/release eligibility,
+  - dispute hold not active,
+  - sufficient releasable balance (ledger-derived),
+  - admin approval requirements if configured.
+- Must never directly mutate release state.
+- Must be idempotent by `(paymentId, actionType)` where action type is `release|refund`.
+
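A minimal sketch of the `(paymentId, actionType)` idempotency rule together with three of the pre-checks follows. The class name, error codes, and the use of integer minor units are assumptions made for illustration; account/release eligibility is omitted for brevity, and the in-memory `Map` stands in for persistent storage.

```typescript
type ActionType = "release" | "refund";

interface InstructionChecks {
  disputeHoldActive: boolean;
  releasableMinorUnits: number; // ledger-derived balance, integer minor units (assumption)
  requestedMinorUnits: number;
  approvalSatisfied: boolean;
}

interface Instruction {
  idempotencyKey: string;
  paymentId: string;
  actionType: ActionType;
  amountMinorUnits: number;
}

class InstructionBuilder {
  private readonly issued = new Map<string, Instruction>();

  create(paymentId: string, actionType: ActionType, checks: InstructionChecks): Instruction {
    // Pre-checks run on every call, including repeats.
    if (checks.disputeHoldActive) throw new Error("DISPUTE_HOLD_ACTIVE");
    if (checks.requestedMinorUnits > checks.releasableMinorUnits) {
      throw new Error("INSUFFICIENT_RELEASABLE_BALANCE");
    }
    if (!checks.approvalSatisfied) throw new Error("APPROVAL_REQUIRED");

    const idempotencyKey = `${paymentId}:${actionType}`;
    const existing = this.issued.get(idempotencyKey);
    if (existing !== undefined) return existing; // same key, same instruction

    // The builder only describes the action; it never mutates release state.
    const instruction: Instruction = {
      idempotencyKey,
      paymentId,
      actionType,
      amountMinorUnits: checks.requestedMinorUnits,
    };
    this.issued.set(idempotencyKey, instruction);
    return instruction;
  }
}
```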
+### 2.6 `getPayoutStatus`
+
+- Return the state of pending/processing payout tasks and their on-chain confirmation status.
+- Return normalized status to domain services:
+  - `processing` for payouts that are queued or broadcast but not yet finalized,
+  - `completed` for finalized payment,
+  - `failed` with provider error code when rejected.
+
+### 2.7 `searchProviderPayments`
+
+- Used for reconciliation and manual verification.
+- Must support:
+  - `providerPaymentId`/`requestId` lookup,
+  - time-window pagination,
+  - optional min/max amount filtering.
+- Must never be the primary source for state transitions without reconciliation checks.
+
+## 3. Routing and selection
+
+Provider selection follows environment-configured capability flags:
+
+- `PAYMENT_ENABLED_PROVIDERS` (comma-separated allowlist),
+- `PAYMENT_DEFAULT_PROVIDER` (read-first fallback),
+- `PAYMENT_ROLLBACK_PROVIDER` (read-only fallback target for rollbacks),
+- `PAYMENT_MODE`:
+  - `standard`: normal provider routing,
+  - `dry_run`: no writes, status-only,
+  - `read_only`: no new pay-in/intent writes.
+
+Selection rules:
+
+1. Validate provider support and provider license/credential validity.
+2. Route legacy requests to `shkeeper` when explicit migration window is active.
+3. For unknown `provider`, return a `400 Bad Request` with explicit operator-visible error code.
+4. If requested provider is disabled, return `409` with migration explanation and owner-visible hint for operator override.
+
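Selection rules 3 and 4 can be sketched as follows, together with parsing of the comma-separated `PAYMENT_ENABLED_PROVIDERS` allowlist. The error codes `PROVIDER_UNKNOWN` and `PROVIDER_DISABLED` and the function names are hypothetical; credential validation (rule 1) and legacy routing (rule 2) are not covered.

```typescript
type Selection =
  | { ok: true; provider: string }
  | { ok: false; httpStatus: 400 | 409; errorCode: string };

// Parse a comma-separated allowlist such as PAYMENT_ENABLED_PROVIDERS.
function parseAllowlist(value: string | undefined): Set<string> {
  const names = (value ?? "")
    .split(",")
    .map((name) => name.trim())
    .filter((name) => name.length > 0);
  return new Set(names);
}

function selectProvider(
  requested: string,
  knownProviders: ReadonlySet<string>,
  enabledProviders: ReadonlySet<string>,
): Selection {
  if (!knownProviders.has(requested)) {
    // Rule 3: unknown provider is a client error.
    return { ok: false, httpStatus: 400, errorCode: "PROVIDER_UNKNOWN" };
  }
  if (!enabledProviders.has(requested)) {
    // Rule 4: known but disabled provider is a conflict with current configuration.
    return { ok: false, httpStatus: 409, errorCode: "PROVIDER_DISABLED" };
  }
  return { ok: true, provider: requested };
}
```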
+## 4. Canonical metadata contract
+
+Payment documents keep provider-specific data namespaced under:
+
+- `metadata.providers.<provider>.rawPayload`
+- `metadata.providers.<provider>.rawEvents[]`
+- `metadata.providers.<provider>.providerPaymentId`
+- `metadata.providers.<provider>.lastWebhookAt`
+
+Domain services must never read `metadata.providers.*` as mutable funds state. They must use ledger-derived balances and canonical status fields only.
+
+## 5. Error contract
+
+All adapter methods return standard failure modes:
+
+- `retryable: true` for transient provider errors (timeouts, 5xx, queue backpressure).
+- `retryable: false` for invalid payloads, invalid signatures, and authorization failures.
+- `errorCode` must be stable across retries for auditability.
+
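One way to express the failure modes above is a small classifier that always yields the same `errorCode` for the same kind of failure. The outcome shape and the specific codes are assumptions for illustration; treating HTTP 429 as the signal for queue backpressure is also an assumption.

```typescript
interface AdapterFailure {
  retryable: boolean;
  errorCode: string; // stable for a given failure kind, so retries log the same code
}

type ProviderOutcome =
  | { kind: "timeout" }
  | { kind: "invalid_signature" }
  | { kind: "http"; status: number };

function classifyFailure(outcome: ProviderOutcome): AdapterFailure {
  switch (outcome.kind) {
    case "timeout":
      return { retryable: true, errorCode: "PROVIDER_TIMEOUT" };
    case "invalid_signature":
      return { retryable: false, errorCode: "INVALID_SIGNATURE" };
    case "http":
      if (outcome.status >= 500 || outcome.status === 429) {
        // Transient: server errors and (assumed) backpressure signal.
        return { retryable: true, errorCode: "PROVIDER_UNAVAILABLE" };
      }
      if (outcome.status === 401 || outcome.status === 403) {
        return { retryable: false, errorCode: "PROVIDER_AUTH_FAILED" };
      }
      // Remaining client errors, e.g. invalid payloads.
      return { retryable: false, errorCode: "PROVIDER_REJECTED" };
  }
}
```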
+## 6. Test coverage required
+
+- Contract tests per adapter:
+  - `createPayInIntent`, status polling, webhook handling
+  - invalid/absent signature behavior
+  - duplicate webhook idempotency
+  - unknown payment reference behavior
+  - rollback selection and read-only mode behavior.
+- Reconciliation tests:
+  - provider backfill for missing payment references,
+  - status drift correction,
+  - duplicate/missing event merge.
+
+## Related
+
+- [[Webhook Security Spec]]
+- [[Funds Ledger and Escrow State Machine Specification]]
+- [[Backend Core Stack Decision Record - 2026-05-24]]
+- [[Backend Funds Migration and Operational Runbooks]]
--- a/Audits/Realtime
+++ b/Audits/Realtime
@@ -0,0 +1,153 @@
+---
+title: Realtime Authorization Spec
+tags: [security, realtime, socketio, authorization]
+created: 2026-05-24
+status: advisory
+---
+
+# Realtime Authorization Spec
+
+This document defines the target authorization model for Socket.IO events in the
+escrow platform. It closes Taskmaster subtask 4.4 alongside
+[[Authorization Matrix - REST and Socket.IO]].
+
+## 1. Decision
+
+Socket.IO must use the same trust boundary as REST:
+
+- every socket connection is authenticated during the handshake,
+- room membership is derived by the server,
+- clients cannot subscribe to rooms by supplying arbitrary user, request, chat,
+  seller, or buyer IDs,
+- server-to-client emissions are targeted to authorized rooms only,
+- sensitive payment, payout, dispute, delivery-code, and chat payloads are never
+  sent through global broadcasts.
+
+## 2. Handshake Authentication
+
+Client connects with an access token in `handshake.auth.token`.
+
+Server requirements:
+
+1. verify the JWT signature and standard claims,
+2. reject expired, malformed, revoked, or missing tokens,
+3. attach `{ userId, roles, sessionId, jti }` to `socket.data`,
+4. disconnect immediately when authentication fails,
+5. log authentication failures without recording token values.
+
+Refresh tokens are not accepted by Socket.IO. Clients must refresh through REST
+and reconnect with a fresh access token.
+
+## 3. Server-Derived Base Rooms
+
+On successful connection the server may join only rooms derivable from the
+authenticated principal:
+
+| Room | Eligibility | Source |
+|---|---|---|
+| `user-{userId}` | authenticated user | JWT subject |
+| `seller-{userId}` | authenticated user with seller role | JWT roles or user record |
+| `buyer-{userId}` | authenticated user with buyer role | JWT roles or user record |
+| `sellers` | authenticated seller | JWT roles or user record |
+| `buyers` | authenticated buyer | JWT roles or user record |
+
+Clients must not provide `userId`, `sellerId`, or `buyerId` to join these rooms.
+
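The base-room table can be sketched as a pure function of the authenticated principal. The `SocketPrincipal` shape and the role strings `"seller"` and `"buyer"` are assumptions; the room names follow the table.

```typescript
// Assumed shape of the data attached to socket.data after the handshake.
interface SocketPrincipal {
  userId: string;
  roles: string[];
}

// Rooms are derived only from the authenticated principal, never from client input.
function deriveBaseRooms(principal: SocketPrincipal): string[] {
  const rooms = [`user-${principal.userId}`];
  if (principal.roles.includes("seller")) {
    rooms.push(`seller-${principal.userId}`, "sellers");
  }
  if (principal.roles.includes("buyer")) {
    rooms.push(`buyer-${principal.userId}`, "buyers");
  }
  return rooms;
}
```

Because the function takes no client-supplied identifiers, a client cannot influence which `user-*`, `seller-*`, or `buyer-*` room it lands in.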
+## 4. Resource Rooms
+
+Resource rooms require database authorization before join.
+
+| Room | Eligibility | Authorization Query |
+|---|---|---|
+| `request-{requestId}` | buyer, selected seller, assigned admin/moderator | purchase request participant check |
+| `chat-{chatId}` | chat participant or assigned support/admin user | chat participant check |
+| `dispute-{disputeId}` | dispute party, assigned moderator, admin | dispute participant/assignment check |
+| `template-checkout-{checkoutId}` | checkout owner or service-controlled UI session | checkout ownership check |
+
+Membership must be rechecked when ownership or state changes. If a request,
+chat, or dispute loses a participant, the server must remove that user's sockets
+from the associated room.
+
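For the `request-{requestId}` row, the participant check could be sketched as below. The `RequestParticipants` shape and the synchronous `lookup` callback are stand-ins for the real database query and are assumptions; `userId` is expected to come from `socket.data`, not from the event payload.

```typescript
// Assumed projection of a purchase request used for authorization.
interface RequestParticipants {
  buyerId: string;
  selectedSellerId?: string;
  assignedStaffIds: string[]; // assigned admins/moderators
}

type ParticipantLookup = (requestId: string) => RequestParticipants | undefined;

// Returns the room name to join, or null when the join must be denied.
function authorizeRequestRoomJoin(
  userId: string,
  requestId: string,
  lookup: ParticipantLookup,
): string | null {
  const participants = lookup(requestId);
  if (participants === undefined) return null; // unknown request
  const allowed =
    participants.buyerId === userId ||
    participants.selectedSellerId === userId ||
    participants.assignedStaffIds.includes(userId);
  return allowed ? `request-${requestId}` : null;
}
```

Running the same check again when ownership or state changes gives a simple way to decide which sockets to evict from the room.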
+## 5. Client Event Policy
+
+Allowed client-originated events:
+
+| Event | Required Authorization | Notes |
+|---|---|---|
+| `join-request-room` | participant check | May remain only as a request for server validation. |
+| `leave-request-room` | current membership | User may leave an allowed room. |
+| `join-chat-room` | participant check | May remain only as a request for server validation. |
+| `leave-chat-room` | current membership | User may leave an allowed room. |
+| `typing-start` / `typing-stop` | current `chat-{chatId}` membership | `userId` in payload is ignored; server derives sender. |
+
+Removed or deprecated client-originated events:
+
+| Event | Replacement |
+|---|---|
+| `join-user-room` | server auto-join on handshake |
+| `join-seller-room` / `join-buyer-room` | server auto-join from authenticated role |
+| `user-online` | server emits presence after authenticated connection |
+
+## 6. Emission Policy
+
+Server emissions must target the narrowest authorized room:
+
+| Data Class | Allowed Target |
+|---|---|
+| user notifications | `user-{recipientId}` |
+| buyer/seller offer updates | relevant `user-*`, `buyer-*`, `seller-*`, or `request-*` room |
+| payment status | buyer and seller user rooms, request room if both parties may see it |
+| payout status | seller user room and admin operations room only |
+| delivery code | seller user room only |
+| chat messages | `chat-{chatId}` |
+| dispute events | `dispute-{disputeId}` and assigned admin/moderator room |
+
+Global payment and payout events are prohibited because they expose financial
+metadata to unrelated users.
+
+## 7. Payload Rules
+
+- Never trust `userId`, `role`, `sellerId`, or `buyerId` from socket payloads.
+- Derive sender identity from `socket.data.userId`.
+- Do not emit delivery verification codes to buyer-visible rooms.
+- Redact wallet addresses, tx hashes, and provider references unless the target
+  user is a party to the transaction or an authorized operator.
+- Keep payload schemas consistent with REST read permissions.
+
+## 8. Rate Limiting and Audit
+
+Socket event rate limits:
+
+| Event Class | Limit |
+|---|---|
+| room join attempts | 30 per 15 minutes per user |
+| typing events | 120 per minute per socket |
+| chat message events | same policy as REST chat message creation |
+| failed authorization checks | 10 per 15 minutes per user, then disconnect |
+
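The limits in the table can be enforced with a per-key counter. The sketch below uses a fixed-window strategy, which is an assumption since the spec does not say whether windows are fixed or sliding; the class name is hypothetical and the in-memory `Map` would need a shared store in a multi-instance deployment.

```typescript
interface WindowState {
  startMs: number;
  count: number;
}

class FixedWindowLimiter {
  private readonly limit: number;
  private readonly windowMs: number;
  private readonly windows = new Map<string, WindowState>();

  constructor(limit: number, windowMs: number) {
    this.limit = limit;
    this.windowMs = windowMs;
  }

  // Returns true when the event is allowed, false when the limit is exhausted.
  allow(key: string, nowMs: number): boolean {
    const state = this.windows.get(key);
    if (state === undefined || nowMs - state.startMs >= this.windowMs) {
      // Start a new window for this key.
      this.windows.set(key, { startMs: nowMs, count: 1 });
      return true;
    }
    if (state.count >= this.limit) return false;
    state.count += 1;
    return true;
  }
}

// Example: room join attempts, 30 per 15 minutes per user.
const roomJoinLimiter = new FixedWindowLimiter(30, 15 * 60 * 1000);
```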
+Audit log required for:
+
+- failed room authorization checks,
+- admin/moderator joins to dispute or request rooms,
+- attempts to join user/seller/buyer rooms for another principal,
+- global payment or payout emission rejection.
+
+## 9. Tests
+
+Minimum verification before launch:
+
+1. invalid or missing JWT cannot connect,
+2. user cannot join another user's `user-*`, `seller-*`, or `buyer-*` room,
+3. user cannot join a request/chat/dispute room without participant status,
+4. removed participant is evicted from the resource room,
+5. payment and payout events are not emitted globally,
+6. delivery code is emitted only to the seller,
+7. socket event rate limits disconnect abusive clients,
+8. audit events are written for denied room joins.
+
+## Related
+
+- [[Authorization Matrix - REST and Socket.IO]]
+- [[Threat Model - Amanat Escrow Platform]]
+- [[Session and Authentication Architecture Decision]]
+- [[Backend Stack Security and Refactor Assessment - 2026-05-24]]
--- a/Audits/Session
+++ b/Audits/Session
@@ -350,6 +350,16 @@ High-risk admin actions require re-authentication. Upon successful re-authentica
 8. Frontend retries the original high-risk action.
 9. The action proceeds.

+### Traceability to Authorization Matrix
+
+This ADR maps to the following rows in [[Authorization Matrix - REST and Socket.IO]]:
+
+- `AUTH-R025` (`POST /api/auth/step-up`) for the step-up API entry point.
+- `AUTH-R026` (`GET /api/auth/sessions`), `AUTH-R027` (`POST /api/auth/revoke-session`), `AUTH-R028` (`POST /api/auth/revoke-all-sessions`) for session controls.
+- `APV-R001`, `APV-R002`, `APV-R003` for approval queue + confirm/reject workflow.
+
+Status: these rows are marked **Not implemented** in the matrix while this ADR remains in planning/rollout state.
+
 ### Two-person approval flow

 For actions requiring two-person approval:
@@ -659,19 +669,19 @@ If any migration step causes issues:

 | Threat | Document |
 |---|---|
-| T01 (fake payment proof) | [[Payment Provider Adapter Spec]] (future) |
-| T02 (webhook replay) | [[Webhook Security Spec]] (future) |
-| T03 (arbitrary socket room join) | Realtime Authorization Spec (future) |
-| T05 (double payout) | [[Funds Ledger Specification]] (future) |
-| T06 (dispute bypass) | Escrow State Machine (future) |
+| T01 (fake payment proof) | [[Funds Ledger and Escrow State Machine Specification]], [[Payment Provider Adapter Spec]] |
+| T02 (webhook replay) | [[Webhook Security Spec]] |
+| T03 (arbitrary socket room join) | [[Realtime Authorization Spec]] |
+| T05 (double payout) | [[Funds Ledger and Escrow State Machine Specification]] |
+| T06 (dispute bypass) | [[Funds Ledger and Escrow State Machine Specification]] |
 | T07 (email abuse) | Rate limiting implementation |
 | T08 (AI cost abuse) | Rate limiting + auth implementation |
-| T09 (admin privilege escalation) | [[Authorization Matrix]] + step-up auth (this ADR) |
+| T09 (admin privilege escalation) | [[Authorization Matrix - REST and Socket.IO]] + step-up auth (this ADR) |
 | T11 (unauthenticated payment endpoints) | Auth middleware implementation |
 | T12 (rate limit bypass) | Rate limiting implementation |
 | T14 (supply-chain) | [[Secure Build and Supply-Chain Policy]] |
 | T16 (deep-link tampering) | Telegram initData verification |
-| T17 (provider outage) | Operational runbooks |
+| T17 (provider outage) | [[Backend Funds Migration and Operational Runbooks]] |
 | T18 (insider manipulation) | Multi-sig wallet + funds ledger + two-person approval (this ADR) |
 | T19 (price manipulation) | Offer status enforcement |
 | T20 (delivery brute force) | Rate limiting + code entropy |
--- a/Audits/Task
+++ b/Audits/Task
@@ -0,0 +1,106 @@
+---
+title: Task 4 Backend Security Architecture Verification Report
+tags: [taskmaster, verification, security, backend]
+created: 2026-05-24
+status: complete
+---
+
+# Task 4 Backend Security Architecture Verification Report
+
+Taskmaster task 4 is complete as an advisory architecture and handoff package.
+The task defines how the backend security/refactor assessment is converted into
+implementation criteria without rewriting or disrupting the current backend
+model.
+
+## 1. Deliverable map
+
+| Taskmaster item | Deliverable |
+|---|---|
+| 4.1 Security ownership and launch criteria | [[Security Ownership and Launch Decision Criteria]] |
+| 4.2 Escrow platform threat model | [[Threat Model - Amanat Escrow Platform]] |
+| 4.3 Funds ledger and escrow state machine | [[Funds Ledger and Escrow State Machine Specification]] |
+| 4.4 REST and Socket.IO authorization matrix | [[Authorization Matrix - REST and Socket.IO]], [[Realtime Authorization Spec]] |
+| 4.5 Session, passkey, and admin step-up architecture | [[Session and Authentication Architecture Decision]] |
+| 4.6 Webhook security and payment adapter contracts | [[Webhook Security Spec]], [[Payment Provider Adapter Spec]] |
+| 4.7 Secure build and supply-chain policy | [[Secure Build and Supply-Chain Policy]] |
+| 4.8 Backend-core stack decision | [[Backend Core Stack Decision Record - 2026-05-24]] |
+| 4.9 Migration and operational runbooks | [[Backend Funds Migration and Operational Runbooks]] |
+
+## 2. Architecture decisions verified
+
+- The current TypeScript/Node backend remains the production delivery path for
+  the next security-hardening phase.
+- A full backend rewrite is explicitly out of scope until ledger, webhook,
+  provider, auth/session, and reconciliation contracts are stable and observable.
+- Payment providers are optional and provider-neutral behind adapter contracts.
+- Webhooks must use raw-body signature verification, replay prevention,
+  idempotency, and dead-letter capture.
+- Funds movement must be derived from the canonical ledger and escrow state
+  machine, not provider metadata.
+- Admin release, refund, payout, role, and destructive account operations require
+  step-up authentication and audit logging; high-risk payouts require
+  two-person approval.
+- Socket.IO room membership must be server-derived and authorization checked,
+  with global financial broadcasts prohibited.
+
+## 3. Verification commands
+
+Executed from `nick-doc` on 2026-05-24:
+
+```bash
+npx task-master show 4
+npx task-master set-status --id=4.6 --status=done
+npx task-master set-status --id=4.7 --status=done
+npx task-master set-status --id=4.8 --status=done
+npx task-master set-status --id=4.9 --status=done
+npx task-master set-status --id=4.3 --status=done
+npx task-master set-status --id=4.4 --status=done
+npx task-master set-status --id=4.5 --status=done
+npx task-master set-status --id=4 --status=done
+node - <<'NODE'
+const fs=require('fs');
+const data=JSON.parse(fs.readFileSync('.taskmaster/tasks/tasks.json','utf8'));
+const t=data.master.tasks.find(x=>String(x.id)==='4');
+console.log(JSON.stringify({
+  task:t.status,
+  subtasks:t.subtasks.map(s=>({id:s.id,status:s.status,title:s.title}))
+}, null, 2));
+NODE
+```
+
+A one-off Node link checker also parsed Task 4 wiki links and verified they
+resolve to markdown files; threat IDs such as `[[T05]]` were treated as allowed
+shorthand references.
+
+## 4. Verification result
+
+- Taskmaster JSON reports task 4 as `done`.
+- Taskmaster JSON reports subtasks 4.1 through 4.9 as `done`.
+- `Authorization Matrix - REST and Socket.IO` now links directly to [[Realtime Authorization Spec]].
+- Task 4 wiki links resolve to existing markdown files, excluding threat-ID
+  shorthand references such as `[[T05]]`.
+- Incident ownership in the task 4 runbook was replaced with explicit role owners
+  that can be mapped to named responders before production launch.
+- Remaining implementation tests belong to follow-up backend tasks because task
+  4 is a documentation and architecture handoff task.
+
+## 5. Follow-up implementation test requirements
+
+Implementation tasks derived from task 4 must include:
+
+- ledger invariant unit tests for every escrow transition,
+- payment provider adapter contract tests for SHKeeper, Request Network, manual
+  wallet, and disabled-provider modes,
+- webhook signature, replay, duplicate, and DLQ tests,
+- REST authorization tests for every gap listed in the authorization matrix,
+- Socket.IO handshake, room authorization, targeted emission, and rate-limit
+  tests,
+- session rotation, revocation, passkey disabled/enabled, and admin step-up
+  tests,
+- runbook drills for provider outage, leaked webhook secret, stuck release,
+  suspicious payment proof, and compromised admin.
+
+## 6. Residual risk
+
+This report verifies the Task 4 architecture package, not production behavior.
+Backend implementation work must still enforce these controls before launch.
--- a/Audits/Threat
+++ b/Audits/Threat
@@ -557,15 +557,15 @@ The following remediation documents (from the recommended documentation set in [

 | Remediation Document | Threats Addressed |
 |---|---|
-| Funds Ledger Specification | T05, T18, T23 |
-| Escrow State Machine | T06, T19, T23 |
+| Funds Ledger and Escrow State Machine Specification | T05, T18, T23 |
+| Funds Ledger and Escrow State Machine Specification | T06, T19, T23 |
 | Authorization Matrix | T09, T21 |
 | Webhook Security Spec | T02 |
 | Session and Auth Architecture | T04, T10, T13, T22 |
 | Realtime Authorization Spec | T03 |
 | Payment Provider Adapter Spec | T01, T11, T17 |
 | Secure Build and Supply-Chain Policy | T14 |
-| Operational Runbooks | T17 |
+| Backend Funds Migration and Operational Runbooks | T17 |

 ---

--- a/Audits/Webhook
+++ b/Audits/Webhook
@@ -0,0 +1,150 @@
+---
+title: Webhook Security Spec
+tags: [webhooks, security, audit, payments]
+created: 2026-05-24
+status: advisory
+reviewers: [backend, security, operations]
+---
+
+# Webhook Security Spec
+
+This document defines signed callback handling for all payment and payout providers.
+It closes the gaps in [[Security Architecture]] by turning webhook behavior into an explicit,
+auditable contract.
+
+The scope is inbound callbacks only:
+
+- SHKeeper pay-in (`/api/payment/shkeeper/webhook`)
+- SHKeeper payout (`/api/payment/shkeeper/payout/webhook`)
+- Request Network (`/api/payment/request-network/webhook`)
+- Manual/admin reconciliation channels (where applicable)
+
+## 1. Canonical event envelope
+
+All callbacks are normalized by [[Payment Provider Adapter Spec]] into:
+
+```ts
+type ProviderCallback = {
+  provider: "shkeeper" | "request_network" | "manual_wallet" | "admin_wallet" | string;
+  providerPaymentId: string;
+  purchaseRequestId?: string;
+  requestId?: string;
+  deliveryId?: string;
+  eventType: string;        // e.g., paid, payout_completed, status_update
+  status: string;           // provider-specific raw status
+  normalizedStatus: "pending" | "completed" | "failed" | "cancelled" | "released" | "refunded";
+  amount?: string;
+  currency?: string;
+  transactionHash?: string;
+  occurredAt?: string;      // ISO 8601 if provided
+  receivedAt: string;       // server-side receive time
+  rawFingerprint: string;    // sha256(raw_body)
+};
+```
+
+Callbacks are processed only through adapter entry points; provider-specific parsing remains private to the adapter.
+
+## 2. Signature verification
+
+### 2.1 Required mechanics
+
+- Verify signatures against raw request bytes, **before JSON parsing**.
+- Compare signatures in constant time and reject with `401` on mismatch (see section 4), before any state is read or written.
+- Never disable verification outside local-only test tooling.
+- Store raw payload hash (`rawFingerprint`) for forensics and idempotency checks.
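
The mechanics above can be sketched as follows. This is a minimal illustration, not the project's implementation: the helper names `verifyWebhookSignature` and `computeRawFingerprint` are hypothetical, and the hex encoding of the signature header is an assumption, since the spec does not state how providers encode it.

```ts
import { createHash, createHmac, timingSafeEqual } from "node:crypto";

// Hypothetical helper. Assumes the provider sends a hex-encoded HMAC-SHA256
// of the raw request bytes; the encoding is an assumption, not from the spec.
function verifyWebhookSignature(
  rawBody: Buffer,
  signatureHex: string | undefined,
  secret: string,
): boolean {
  // An absent or malformed header is rejected without computing anything.
  if (!signatureHex || !/^[0-9a-f]{64}$/i.test(signatureHex)) return false;
  const expected = createHmac("sha256", secret).update(rawBody).digest();
  const received = Buffer.from(signatureHex, "hex");
  // timingSafeEqual throws on length mismatch, so check lengths first.
  return received.length === expected.length && timingSafeEqual(received, expected);
}

// sha256 of the raw body, as used for the rawFingerprint field.
function computeRawFingerprint(rawBody: Buffer): string {
  return createHash("sha256").update(rawBody).digest("hex");
}
```

Both functions take the unparsed body, so a handler would call them before `JSON.parse` and return `401` when verification fails.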
+
+### 2.2 Provider headers
+
+| Provider | Header(s) |
+|---|---|
+| SHKeeper | `x-shkeeper-signature` |
+| Request Network | `x-request-network-signature` |
+| Test override (local only) | explicitly documented in deployment notes, never in production |
+
+If the expected signature header is absent or malformed, treat the request as a non-retryable client error.
+
+## 3. Replay prevention and idempotency
+
+For each callback, store and enforce one of the following idempotency keys:
+
+- `(provider, deliveryId, eventType)`, or
+- `(provider, providerPaymentId, normalizedStatus)` when the provider supplies no delivery ID.
+
+Replay rules:
+
+- First successful write path = **processed**.
+- Same key seen again with no state change = **duplicate** (HTTP 200 response, no side effects).
+- Same key seen with a different payload hash = **conflict** (HTTP 409, captured to DLQ).
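
A minimal sketch of these replay rules, assuming an in-memory map stands in for the real store (a production version would more likely rely on a database unique constraint). The names `ReplayGuard`, `keyFor`, and `classify` are hypothetical, and "no state change" is interpreted here as "same `rawFingerprint`", which is an assumption.

```ts
type ReplayOutcome = "processed" | "duplicate" | "conflict";

interface CallbackIdentity {
  provider: string;
  eventType: string;
  providerPaymentId: string;
  normalizedStatus: string;
  deliveryId?: string;
}

// Hypothetical in-memory guard; illustrates the decision logic only.
class ReplayGuard {
  // idempotency key -> rawFingerprint of the first accepted payload
  private readonly seen = new Map<string, string>();

  // Prefer the delivery id; fall back to payment id + normalized status.
  static keyFor(cb: CallbackIdentity): string {
    return cb.deliveryId
      ? JSON.stringify(["delivery", cb.provider, cb.deliveryId, cb.eventType])
      : JSON.stringify(["payment", cb.provider, cb.providerPaymentId, cb.normalizedStatus]);
  }

  classify(key: string, rawFingerprint: string): ReplayOutcome {
    const prior = this.seen.get(key);
    if (prior === undefined) {
      this.seen.set(key, rawFingerprint);
      return "processed"; // first successful write path
    }
    // Same key, same bytes: harmless replay. Same key, different bytes: conflict.
    return prior === rawFingerprint ? "duplicate" : "conflict";
  }
}
```

Serializing the key parts with `JSON.stringify` avoids ambiguity if an identifier happens to contain the separator character.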
+
+## 4. Unknown and duplicate behavior
+
+| Condition | Response | Side effects |
+|---|---|---|
+| Signature valid, unknown `providerPaymentId` | `200` (`unknown_payment`) in v1 mode / `404` in strict mode | no state write, record DLQ entry for operator review |
+| Known `providerPaymentId`, already terminal | `200` (`duplicate_terminal`) | no state write |
+| Known `providerPaymentId`, stale status transition | `200` (`duplicate_or_out_of_order`) | no state write |
+| Invalid signature | `401` | no state write |
+| Malformed payload | `400` | no state write |
+
+## 5. Retry semantics
+
+- Callback senders (providers) may retry on:
+  - transient network failures,
+  - 5xx/provider internal timeouts,
+  - explicit retryable status from endpoint.
+- Retry is triggered only on non-2xx codes for SHKeeper and Request Network.
+- Recommended handler mapping:
+  - `401/400` = do not retry (hard fail),
+  - `409` = do not retry until manual release,
+  - `500/503` = retry.
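
The recommended mapping can be expressed as a small lookup table. This is a sketch under assumptions: the outcome names and the `retryIntended` helper are hypothetical, and actual retry behavior is decided by each provider, not by this code.

```ts
// Hypothetical handler outcomes; names are illustrative only.
type HandlerOutcome =
  | "processed"
  | "duplicate"
  | "malformed"
  | "signature_failure"
  | "conflict"
  | "transient_error";

// Status codes follow the recommended mapping in this section.
const STATUS_BY_OUTCOME: Record<HandlerOutcome, number> = {
  processed: 200,
  duplicate: 200,
  malformed: 400,
  signature_failure: 401,
  conflict: 409,
  transient_error: 503,
};

// True only for responses where a provider retry is the intended result.
function retryIntended(status: number): boolean {
  return status >= 500 && status <= 599;
}
```

Keeping the mapping in one table makes it straightforward to assert in tests that no hard-fail outcome is ever returned with a 5xx status.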
+
+## 6. Dead-letter and replay storage
+
+Persist all failed callbacks for at least 7 days in append-only storage:
+
+- `providerWebhookFailures`
+- key fields: `provider`, `deliveryId`, `providerPaymentId`, `requestPath`, `requestHeaders`, `rawFingerprint`, `statusCode`, `errorCode`, `attemptCount`, `nextRetryAt`, `rawBodyRef`, `createdAt`.
+- If storage is unavailable, fail closed and raise a high-severity ops alert.
+
+Retention policy:
+
+- 30 days for `success==true`,
+- 180 days for `unknown_payment`, `repeated_conflict`, `signature_failure`,
+- immediate alert if retry queue exceeds 500 entries for a provider.
+
+## 7. Alerting thresholds
+
+- `failed_webhook_count` over 1 minute:
+  - warning at `> 20`,
+  - critical at `> 100`.
+- signature failures:
+  - warning at `> 5` in 5 minutes,
+  - critical at `> 20` in 5 minutes.
+- duplicate ratio:
+  - warning if `duplicates / total >= 0.15` for 10 minutes.
+- dead-letter growth:
+  - warning at `+200` new entries/hour,
+  - critical at `+500`/hour.
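
As an illustration of how the first three thresholds might be evaluated, the sketch below grades pre-aggregated window counts. The `WebhookWindowStats` shape and function names are hypothetical, and the duplicate ratio is simplified to a single 10-minute aggregate rather than a sustained condition.

```ts
type Severity = "ok" | "warning" | "critical";

// Hypothetical pre-aggregated counters; how they are collected is out of scope.
interface WebhookWindowStats {
  failedLastMinute: number;
  signatureFailuresLast5Min: number;
  duplicatesLast10Min: number;
  totalLast10Min: number;
}

// Strictly-greater-than comparison, matching the "> N" thresholds above.
function grade(value: number, warnAbove: number, criticalAbove: number): Severity {
  if (value > criticalAbove) return "critical";
  return value > warnAbove ? "warning" : "ok";
}

function evaluateAlerts(stats: WebhookWindowStats): Record<string, Severity> {
  const duplicateRatio =
    stats.totalLast10Min > 0 ? stats.duplicatesLast10Min / stats.totalLast10Min : 0;
  return {
    failed_webhook_count: grade(stats.failedLastMinute, 20, 100),
    signature_failures: grade(stats.signatureFailuresLast5Min, 5, 20),
    // Simplification: one aggregate ratio instead of a sustained 10-minute check.
    duplicate_ratio: duplicateRatio >= 0.15 ? "warning" : "ok",
  };
}
```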
+
+## 8. Required operator signals
+
+Webhook health checks should expose:
+
+- last-seen timestamp by provider,
+- delivery backlog depth,
+- per-status counters (`processed`, `duplicate`, `unknown`, `conflict`, `signature_failure`),
+- DLQ length and oldest entry age.
+
+## 9. Testing requirements
+
+- Signature bypass tests (any verification bypass must remain disabled in staging/prod),
+- replay/delivery-id duplicate tests,
+- malformed payload tests,
+- unknown payment tests,
+- non-terminal duplicate suppression tests.
+
+## Related
+
+- [[Payment Provider Adapter Spec]]
+- [[Error Codes]]
+- [[Backend Funds Migration and Operational Runbooks]]