Complete task 4 backend security architecture docs

This commit is contained in:
Siavash Sameni
2026-05-24 11:31:40 +04:00
parent 4cf5c49274
commit 6a451040d9
18 changed files with 1006 additions and 73 deletions

View File

@@ -37,7 +37,7 @@ reviewers: [backend, security]
| **Admin** | Authenticated + `req.user.role === 'admin'`. All admin actions MUST be audit-logged. | `authenticateToken` + `roleGuard('admin')` or `authorizeRoles('admin')`. |
| **Support** | Authenticated + `req.user.role === 'support'`. Read-only access to user data, dispute records, and chat. Can reset passwords and escalate to admin. Cannot modify financial records or release funds. | `authenticateToken` + `roleGuard('support')`. Controller must enforce read-only constraint. |
| **Service** | Internal service-to-service calls. Authenticated via shared secret (`X-Internal-Secret` header) or restricted to localhost network. Not user-facing. | Custom middleware verifying internal header or `req.ip === '127.0.0.1'`. |
| **Step-up** | Admin + re-authenticated within last 15 minutes (configurable). Required for: payout creation/release, role changes, large refunds (>$100), user deletion, admin-wallet signing. | `authenticateToken` + `roleGuard('admin')` + step-up timestamp check from Redis session. |
| **Step-up** | Admin + re-authenticated within configured window (default 5 minutes). Required for high-risk admin actions (role changes, user deletion, payout/release, manual overrides, sensitive wallet operations). | `authenticateToken` + `roleGuard('admin')` + step-up timestamp check from Redis session. |
| **HMAC** | No user auth. Verified via HMAC-SHA256 signature on raw body using `SHKEEPER_WEBHOOK_SECRET`. Signature-verified, not identity-verified. | `express.raw()` body parser + timing-safe HMAC comparison. |
---
@@ -83,6 +83,10 @@ reviewers: [backend, security]
| AUTH-R022 | PUT | /api/auth/profile | Authenticated | Owner | None | Tier 3 | No | Auth enforced. | Authenticated | |
| AUTH-R023 | POST | /api/auth/update-profile | Authenticated | Owner | None | Tier 3 | No | Auth enforced. Legacy alias. | Authenticated | Duplicate of R022. |
| AUTH-R024 | DELETE | /api/auth/account | Authenticated | Owner | Password re-verified | Tier 3 | Yes | Auth + password required. | Authenticated + audit | Permanent deletion. |
| AUTH-R025 | POST | /api/auth/step-up | Admin | None | Valid challenge context or credentials | Tier 6 | Yes | Not implemented | Admin + Step-up | Required by ADR for high-risk admin actions. Creates 5-minute elevated session in Redis. |
| AUTH-R026 | GET | /api/auth/sessions | Authenticated | Owner | Current refresh session exists | Tier 3 | Yes | Not implemented | Authenticated | Returns active sessions with device, IP, and session age. |
| AUTH-R027 | POST | /api/auth/revoke-session | Authenticated | Owner | Target session belongs to user | Tier 3 | Yes | Not implemented | Authenticated + audit | Revokes one session by sessionTokenHash. |
| AUTH-R028 | POST | /api/auth/revoke-all-sessions | Authenticated | Owner | Multiple active sessions loaded | Tier 3 | Yes | Not implemented | Authenticated + audit | Revokes all sessions except current. |
### 2.2 User Routes
@@ -116,6 +120,14 @@ reviewers: [backend, security]
| UADM-R012 | PATCH | /api/users/admin/:userId/password | Admin | None | Target user exists | Tier 6 | Yes | Inline role check. | Admin + Step-up + audit | Wipes all sessions. |
| UADM-R013 | POST | /api/users/admin/:userId/resend-verification | Admin | None | User not already verified | Tier 6 | Yes | Inline role check. | Admin + audit | Triggers email. |
### 2.3A Admin Approval Routes
| ID | Method | Path | Access Level | Ownership Check | State Preconditions | Rate-Limit Tier | Audit Log | Current State | Required State | Notes |
|---|---|---|---|---|---|---|---|---|---|---|
| APV-R001 | GET | /api/admin/approvals | Admin | None | None | Tier 6 | Yes | Not implemented | Admin + Step-up | Pending approval queue for high-value actions. |
| APV-R002 | POST | /api/admin/approvals/{id}/confirm | Admin | None | Approval exists, status = PENDING, approver != creator | Tier 6 | Yes | Not implemented | Admin + Step-up + audit | Confirms pending approval and executes action. |
| APV-R003 | POST | /api/admin/approvals/{id}/reject | Admin | None | Approval exists, status = PENDING | Tier 6 | Yes | Not implemented | Admin + Step-up + audit | Rejects pending approval and records reason. |
### 2.4 Address Routes
| ID | Method | Path | Access Level | Ownership Check | State Preconditions | Rate-Limit Tier | Audit Log | Current State | Required State | Notes |
@@ -259,16 +271,16 @@ reviewers: [backend, security]
| ID | Method | Path | Access Level | Ownership Check | State Preconditions | Rate-Limit Tier | Audit Log | Current State | Required State | Notes |
|---|---|---|---|---|---|---|---|---|---|---|
| REL-R001 | POST | /api/payment/shkeeper/:id/release | Admin | None | Payment funded; no active dispute (T06); escrowState=funded | Tier 6 | Yes | Auth enforced. Admin. NO dispute check. T06. | Admin + Step-up + dispute check + audit | Builds release tx payload. |
| REL-R001 | POST | /api/payment/shkeeper/:id/release | Admin | None | Payment funded; no active dispute (T06); escrowState=funded | Tier 6 | Yes | Auth enforced. Admin. NO dispute check. T06. | Admin + Step-up + dispute check + audit, + two-person approval for payout > 1000 USD equivalent (see APV-R002/APV-R003) | Builds release tx payload. |
| REL-R002 | POST | /api/payment/shkeeper/:id/release/confirm | Admin | None | Release tx pending; valid txHash | Tier 6 | Yes | Auth enforced. Admin. | Admin + Step-up + audit | Confirms release on-chain. |
| REL-R003 | POST | /api/payment/shkeeper/:id/refund | Admin | None | Payment funded; no active dispute; escrowState=funded | Tier 6 | Yes | Auth enforced. Admin. NO dispute check. T06. | Admin + Step-up + dispute check + audit | Builds refund tx. |
| REL-R003 | POST | /api/payment/shkeeper/:id/refund | Admin | None | Payment funded; no active dispute; escrowState=funded | Tier 6 | Yes | Auth enforced. Admin. NO dispute check. T06. | Admin + Step-up + dispute check + audit, + two-person approval for payout > 1000 USD equivalent (see APV-R002/APV-R003) | Builds refund tx. |
| REL-R004 | POST | /api/payment/shkeeper/:id/refund/confirm | Admin | None | Refund tx pending; valid txHash | Tier 6 | Yes | Auth enforced. Admin. | Admin + Step-up + audit | Confirms refund on-chain. |
### 2.15 Payment Routes (SHKeeper Payout)
| ID | Method | Path | Access Level | Ownership Check | State Preconditions | Rate-Limit Tier | Audit Log | Current State | Required State | Notes |
|---|---|---|---|---|---|---|---|---|---|---|
| PO-R001 | POST | /api/payment/shkeeper/payout | Admin | None | No existing pending payout for same escrow | Tier 6 | Yes | Auth enforced. Admin. | Admin + Step-up + audit | Creates payout task. T05. |
| PO-R001 | POST | /api/payment/shkeeper/payout | Admin | None | No existing pending payout for same escrow | Tier 6 | Yes | Auth enforced. Admin. | Admin + Step-up + audit, + two-person approval for payout > 1000 USD equivalent (see APV-R002/APV-R003) | Creates payout task. T05. |
| PO-R002 | GET | /api/payment/shkeeper/payout/status/:taskId | Authenticated | Owner or Admin | Task exists | Tier 3 | No | Auth enforced. | Authenticated | Poll payout status. |
| PO-R003 | POST | /api/payment/shkeeper/payout/webhook | HMAC | None | Signature valid | Tier 5 | Yes | HMAC verification. | HMAC + audit | Payout state changes. |
@@ -630,9 +642,10 @@ These gaps involve audit logging and presence tracking. They are important for o
| Route Group | Endpoints |
|---|---|
| Auth | 24 |
| Auth | 28 |
| User | 9 |
| User Admin | 13 |
| Admin Approval | 3 |
| Address | 5 |
| Purchase Request | 18 |
| Delivery Code | 4 |
@@ -656,7 +669,7 @@ These gaps involve audit logging and presence tracking. They are important for o
| File | 9 |
| Admin Cleanup | 7 |
| System | 2 |
| **Total REST Endpoints** | **248** |
| **Total REST Endpoints** | **255** |
### Socket.IO Event Count
@@ -691,4 +704,4 @@ These gaps involve audit logging and presence tracking. They are important for o
---
*This document was produced on 2026-05-24 as part of the Amanat authorization audit. It must be updated when: new endpoints are added, existing endpoint access levels change, new Socket.IO events are introduced, or the role model is extended. Implementation tasks should reference specific AUTH-R, USER-R, UADM-R, ADDR-R, PR-R, DC-R, OFF-R, TPL-R, SHOP-R, CAT-R, REV-R, PAY-R, SHK-R, REL-R, PO-R, DEC-R, MPAY-R, CHAT-R, NOTIF-R, DIS-R, AI-R, BLOG-R, PTS-R, FILE-R, ADM-R, SYS-R, and SOCK-E IDs from this matrix.*
*This document was produced on 2026-05-24 as part of the Amanat authorization audit. It must be updated when: new endpoints are added, existing endpoint access levels change, new Socket.IO events are introduced, or the role model is extended. Implementation tasks should reference specific AUTH-R, USER-R, UADM-R, APV-R, ADDR-R, PR-R, DC-R, OFF-R, TPL-R, SHOP-R, CAT-R, REV-R, PAY-R, SHK-R, REL-R, PO-R, DEC-R, MPAY-R, CHAT-R, NOTIF-R, DIS-R, AI-R, BLOG-R, PTS-R, FILE-R, ADM-R, SYS-R, and SOCK-E IDs from this matrix.*

View File

@@ -0,0 +1,117 @@
---
title: Backend Core Stack Decision Record - 2026-05-24
tags: [adr, architecture, backend]
created: 2026-05-24
status: approved
reviewers: [CTO, backend, security]
---
# Backend Core Stack Decision Record - 2026-05-24
## 1. Decision
Keep the security-critical backend core on **TypeScript/Node** in the first 12 months.
Do **not** perform a full greenfield rewrite before the payment/auth/escrow core is fully specified and observable.
## 2. Why this stack (today)
The highest current risk is not framework selection; it is **financial state correctness**.
For the next phase, the team needs:
- provider-neutral payment abstraction,
- immutable funds ledger,
- webhook hardening and reconciliation,
- strict dispute hold behavior,
- admin step-up controls,
- production-grade operational runbooks.
Moving to Go/Kotlin/Rust now would preserve existing risks while adding migration uncertainty and a delay in launch-readiness.
TypeScript remains the fastest way to ship the required controls while keeping operational visibility and team velocity.
## 3. Scope of extraction
The “backend core” now means:
- `Payment` orchestration and payout/release state transitions,
- auth/session validation for financial actions,
- webhook intake and reconciliation,
- ledger-derived escrow eligibility checks,
- admin-risk operations (payout/refund/adjustment).
These modules stay in the same service boundary during migration-in-place, but all calls must go through:
- [[Payment Provider Adapter Spec]]
- [[Webhook Security Spec]]
- [[Funds Ledger and Escrow State Machine Specification]]
Non-core modules remain where they are:
- marketplace browsing, templates, shop settings,
- chat and notifications,
- file uploads/downloads.
## 4. Evaluation of alternatives
### Go
- **Pros:** smaller runtime and dependency surface, better static guarantees.
- **Cons:** highest immediate migration cost, new operational tooling, delayed delivery of core money-movement correctness.
### Kotlin/Java
- **Pros:** strong enterprise ecosystem, mature auth/security libraries.
- **Cons:** heavier stack and slower delivery for a small team.
### Rust
- **Pros:** high correctness potential.
- **Cons:** steep delivery cost and limited team familiarity.
### Keep TypeScript (selected)
- **Pros:** existing team velocity, reduced migration risk, direct integration with current deployment and frontend contracts.
- **Cons:** npm supply-chain risk remains; mitigated by [[Secure Build and Supply-Chain Policy]] and strict dependency policy.
## 5. Migration and rollout plan
1. **Phase A (this quarter):** lock down high-risk flows in TypeScript (ledger, adapter, webhook, auth/session, runbooks).
2. **Phase B (next two quarters):** extract core services behind stable interfaces and add adapter-level contract tests.
3. **Phase C (deferred):** evaluate Go/Kotlin pilot for payout+webhook worker only if:
- Phase A and B are stable for 60 days,
- team staffing supports dual-stack operations,
- audit requirements demand lower runtime dependency exposure.
## 6. Non-goals
- Full frontend rewrite.
- New language migration without closed-loop reconciliation and signed-state invariants.
- New provider support that bypasses the adapter contract.
## 7. Rollback criteria
- Any increase in incident rate above baseline +20% for 24h after migration activity.
- Any unresolved ledger invariant violation (held + disputed + released + refunded mismatch).
- Any provider outage recovery that requires non-operator-tuned workarounds.
Rollback to prior TS implementation:
- disable any split deployment feature flag,
- switch `PAYMENT_ENABLED_PROVIDERS` back to legacy-only,
- freeze new provider routing until incident review is complete,
- complete post-incident update in this ADR.
## 8. Ownership
- **CTO:** final stack decision + dual-stack approvals.
- **Backend Lead (BL):** contract and adapter enforcement.
- **Security Lead (SL):** webhook/security acceptance criteria.
- **DevOps Lead (DL):** deployment safety and rollback testing.
## Related
- [[Backend Stack Security and Refactor Assessment - 2026-05-24]]
- [[Secure Build and Supply-Chain Policy]]
- [[Backend Funds Migration and Operational Runbooks]]

View File

@@ -345,7 +345,7 @@ Should include:
- Trust boundaries: browser, backend, database, Redis, provider APIs, wallet/RPC, admin UI, Socket.IO.
- Abuse cases: fake payment proof, replayed webhook, arbitrary room join, stolen token, double payout, dispute bypass, email/AI abuse.
### 2. Funds Ledger Specification
### 2. Funds Ledger and Escrow State Machine Specification
Purpose: make money movement auditable and provider-independent.
@@ -386,9 +386,7 @@ Should map every endpoint and socket event to:
### 5. Payment Provider Adapter Spec
Purpose: decouple business logic from SHKeeper, Request Network, manual wallet flow, and future providers.
Should define:
Implemented as [[Payment Provider Adapter Spec]], including:
- `createPayInIntent`
- `getPayInStatus`
@@ -399,13 +397,11 @@ Should define:
- `getPayoutStatus`
- `searchProviderPayments`
Provider-specific metadata should be namespaced and never become the canonical funds state.
Provider-specific metadata is namespaced and never used as canonical funds state.
### 6. Webhook Security Spec
Purpose: prevent forged, replayed, or silently failed provider events.
Should define:
Implemented as [[Webhook Security Spec]]:
- Raw-body signature verification.
- Accepted headers and algorithms.
@@ -434,6 +430,8 @@ Should define:
### 8. Realtime Authorization Spec
Implemented as [[Realtime Authorization Spec]].
Purpose: make Socket.IO events subject to the same security model as REST.
Should define:
@@ -476,9 +474,7 @@ Should define:
### 11. Operational Runbooks
Purpose: make security incidents and payment failures survivable.
Should include:
Implemented as [[Backend Funds Migration and Operational Runbooks]]:
- Failed webhook.
- Duplicate payment.

View File

@@ -0,0 +1,173 @@
---
title: Payment Provider Adapter Spec
tags: [adapters, payments, specification, architecture]
created: 2026-05-24
status: advisory
reviewers: [backend, security, product]
---
# Payment Provider Adapter Spec
This specification standardizes how payment providers are plugged in so platform logic
does not depend on SHKeeper or any single webhook implementation.
The contract below replaces provider-specific branching in domain services and should
be used by all pay-in, payout, release, and reconciliation logic.
> Canonical implementation note: this is an advisory ADR for tasks in `task 4.6`.
> It maps to [[Funds Ledger and Escrow State Machine Specification]] and [[Webhook Security Spec]].
## 1. Core provider contract
Implementations expose one typed adapter:
```ts
interface PaymentProviderAdapter {
readonly provider: "shkeeper" | "request_network" | "manual_wallet" | "admin_wallet" | string;
createPayInIntent(input: PayInIntentInput): Promise<PayInIntentResult>;
getPayInStatus(input: PayInStatusInput): Promise<PayInStatusResult>;
handleProviderWebhook(input: ProviderWebhookInput): Promise<ProviderWebhookResult>;
createHostedPaymentLink(input: HostedLinkInput): Promise<HostedLinkResult>;
createReleaseInstruction(input: ReleaseInstructionInput): Promise<ReleaseInstructionResult>;
createRefundInstruction(input: RefundInstructionInput): Promise<RefundInstructionResult>;
getPayoutStatus(input: PayoutStatusInput): Promise<PayoutStatusResult>;
searchProviderPayments(input: ProviderSearchInput): Promise<ProviderPaymentRecord[]>;
}
```
All adapters must return a normalized result shape:
```ts
type NormalizedProviderStatus = "pending" | "processing" | "confirmed" | "completed" | "failed" | "cancelled" | "released" | "refunded";
type NormalizedProviderEvent = {
providerPaymentId: string;
purchaseRequestId?: string;
requestId?: string;
providerReference?: string;
amount: string; // decimal string
currency: string;
status: NormalizedProviderStatus;
transactionHash?: string;
providerEventType: string;
receivedAt: string; // ISO timestamp
rawFingerprint: string; // provider payload hash
};
```
## 2. Method semantics
### 2.1 `createPayInIntent`
- Create a provider-specific payment intent from a canonical request.
- Must return:
- `providerPaymentId` (source of truth for future reconciliation),
- canonical `status`,
- `payInUrl` when redirect/payment-page flow is used,
- an expiry timestamp.
- Must persist provider metadata under `payment.providerData.<provider>`.
### 2.2 `getPayInStatus`
- Query provider status for an existing intent.
- Must map provider statuses into `NormalizedProviderStatus` and include a provider-specific raw snapshot.
- Must be idempotent and side-effect free.
### 2.3 `handleProviderWebhook`
- Input must include raw body bytes, headers, provider identifier, and parsed envelope.
- Must verify signatures before parsing business fields.
- On success, emit canonical domain events and return an idempotency decision:
- `processed` for first apply,
- `duplicate` for replay,
- `ignored` for unknown payment / no-op transitions.
### 2.4 `createHostedPaymentLink`
- Return the user-visible payment URL + optional redirect/callback endpoints.
- Should support provider aliases (for migration aliasing, e.g., `request-network` vs `request_network`).
### 2.5 `createReleaseInstruction` and `createRefundInstruction`
- Produce signed/payload instructions and pre-check:
- account/release eligibility,
- dispute hold not active,
- sufficient releasable balance (ledger-derived),
- admin approval requirements if configured.
- Must never directly mutate release state.
- Must be idempotent by `(paymentId, actionType)` where action type is `release|refund`.
### 2.6 `getPayoutStatus`
- Return state of pending/processing payout tasks and chain/on-chain confirmation status.
- Return normalized status to domain services:
- `processing` for queued/broadcast not-finalized,
- `completed` for finalized payment,
- `failed` with provider error code when rejected.
### 2.7 `searchProviderPayments`
- Used for reconciliation and manual verification.
- Must support:
- `providerPaymentId`/`requestId` lookup,
- time-window pagination,
- optional min/max amount filtering.
- Must never be the primary source for state transitions without reconciliation checks.
## 3. Routing and selection
Provider selection follows environment-configured capability flags:
- `PAYMENT_ENABLED_PROVIDERS` (comma-separated allowlist),
- `PAYMENT_DEFAULT_PROVIDER` (read-first fallback),
- `PAYMENT_ROLLBACK_PROVIDER` (read-only fallback target for cutbacks),
- `PAYMENT_MODE`:
- `standard`: normal provider routing,
- `dry_run`: no writes, status-only,
- `read_only`: no new pay-in/intent writes.
Selection rules:
1. Validate provider support and provider license/credential validity.
2. Route legacy requests to `shkeeper` when explicit migration window is active.
3. For unknown `provider`, return a `400 Bad Request` with explicit operator-visible error code.
4. If requested provider is disabled, return `409` with migration explanation and owner-visible hint for operator override.
## 4. Canonical metadata contract
Payment documents keep provider-specific data namespaced under:
- `metadata.providers.<provider>.rawPayload`
- `metadata.providers.<provider>.rawEvents[]`
- `metadata.providers.<provider>.providerPaymentId`
- `metadata.providers.<provider>.lastWebhookAt`
Domain services must never read `metadata.providers.*` as mutable funds state. They must use ledger-derived balances and canonical status fields only.
## 5. Error contract
All adapter methods return standard failure modes:
- `retryable: true` for transient provider errors (timeouts, 5xx, queue backpressure).
- `retryable: false` for invalid payloads, invalid signatures, and authorization failures.
- `errorCode` must be stable across retries for auditability.
## 6. Test coverage required
- Contract tests per adapter:
- `createPayInIntent`, status polling, webhook handling
- invalid/absent signature behavior
- duplicate webhook idempotency
- unknown payment reference behavior
- rollback selection and read-only mode behavior.
- Reconciliation tests:
- provider backfill for missing payment references,
- status drift correction,
- duplicate/missing event merge.
## Related
- [[Webhook Security Spec]]
- [[Funds Ledger and Escrow State Machine Specification]]
- [[Backend Core Stack Decision Record - 2026-05-24]]
- [[Backend Funds Migration and Operational Runbooks]]

View File

@@ -0,0 +1,153 @@
---
title: Realtime Authorization Spec
tags: [security, realtime, socketio, authorization]
created: 2026-05-24
status: advisory
---
# Realtime Authorization Spec
This document defines the target authorization model for Socket.IO events in the
escrow platform. It closes Taskmaster subtask 4.4 alongside
[[Authorization Matrix - REST and Socket.IO]].
## 1. Decision
Socket.IO must use the same trust boundary as REST:
- every socket connection is authenticated during the handshake,
- room membership is derived by the server,
- clients cannot subscribe to rooms by supplying arbitrary user, request, chat,
seller, or buyer IDs,
- server-to-client emissions are targeted to authorized rooms only,
- sensitive payment, payout, dispute, delivery-code, and chat payloads are never
sent through global broadcasts.
## 2. Handshake Authentication
Client connects with an access token in `handshake.auth.token`.
Server requirements:
1. verify the JWT signature and standard claims,
2. reject expired, malformed, revoked, or missing tokens,
3. attach `{ userId, roles, sessionId, jti }` to `socket.data`,
4. disconnect immediately when authentication fails,
5. log authentication failures without recording token values.
Refresh tokens are not accepted by Socket.IO. Clients must refresh through REST
and reconnect with a fresh access token.
## 3. Server-Derived Base Rooms
On successful connection the server may join only rooms derivable from the
authenticated principal:
| Room | Eligibility | Source |
|---|---|---|
| `user-{userId}` | authenticated user | JWT subject |
| `seller-{userId}` | authenticated user with seller role | JWT roles or user record |
| `buyer-{userId}` | authenticated user with buyer role | JWT roles or user record |
| `sellers` | authenticated seller | JWT roles or user record |
| `buyers` | authenticated buyer | JWT roles or user record |
Clients must not provide `userId`, `sellerId`, or `buyerId` to join these rooms.
## 4. Resource Rooms
Resource rooms require database authorization before join.
| Room | Eligibility | Authorization Query |
|---|---|---|
| `request-{requestId}` | buyer, selected seller, assigned admin/moderator | purchase request participant check |
| `chat-{chatId}` | chat participant or assigned support/admin user | chat participant check |
| `dispute-{disputeId}` | dispute party, assigned moderator, admin | dispute participant/assignment check |
| `template-checkout-{checkoutId}` | checkout owner or service-controlled UI session | checkout ownership check |
Membership must be rechecked when ownership or state changes. If a request,
chat, or dispute loses a participant, the server must remove that user's sockets
from the associated room.
## 5. Client Event Policy
Allowed client-originated events:
| Event | Required Authorization | Notes |
|---|---|---|
| `join-request-room` | participant check | May remain only as a request for server validation. |
| `leave-request-room` | current membership | User may leave an allowed room. |
| `join-chat-room` | participant check | May remain only as a request for server validation. |
| `leave-chat-room` | current membership | User may leave an allowed room. |
| `typing-start` / `typing-stop` | current `chat-{chatId}` membership | `userId` in payload is ignored; server derives sender. |
Removed or deprecated client-originated events:
| Event | Replacement |
|---|---|
| `join-user-room` | server auto-join on handshake |
| `join-seller-room` / `join-buyer-room` | server auto-join from authenticated role |
| `user-online` | server emits presence after authenticated connection |
## 6. Emission Policy
Server emissions must target the narrowest authorized room:
| Data Class | Allowed Target |
|---|---|
| user notifications | `user-{recipientId}` |
| buyer/seller offer updates | relevant `user-*`, `buyer-*`, `seller-*`, or `request-*` room |
| payment status | buyer and seller user rooms, request room if both parties may see it |
| payout status | seller user room and admin operations room only |
| delivery code | seller user room only |
| chat messages | `chat-{chatId}` |
| dispute events | `dispute-{disputeId}` and assigned admin/moderator room |
Global payment and payout events are prohibited because they expose financial
metadata to unrelated users.
## 7. Payload Rules
- Never trust `userId`, `role`, `sellerId`, or `buyerId` from socket payloads.
- Derive sender identity from `socket.data.userId`.
- Do not emit delivery verification codes to buyer-visible rooms.
- Redact wallet addresses, tx hashes, and provider references unless the target
user is a party to the transaction or an authorized operator.
- Keep payload schemas consistent with REST read permissions.
## 8. Rate Limiting and Audit
Socket event rate limits:
| Event Class | Limit |
|---|---|
| room join attempts | 30 per 15 minutes per user |
| typing events | 120 per minute per socket |
| chat message events | same policy as REST chat message creation |
| failed authorization checks | 10 per 15 minutes per user, then disconnect |
Audit log required for:
- failed room authorization checks,
- admin/moderator joins to dispute or request rooms,
- attempts to join user/seller/buyer rooms for another principal,
- global payment or payout emission rejection.
## 9. Tests
Minimum verification before launch:
1. invalid or missing JWT cannot connect,
2. user cannot join another user's `user-*`, `seller-*`, or `buyer-*` room,
3. user cannot join a request/chat/dispute room without participant status,
4. removed participant is evicted from the resource room,
5. payment and payout events are not emitted globally,
6. delivery code is emitted only to the seller,
7. socket event rate limits disconnect abusive clients,
8. audit events are written for denied room joins.
## Related
- [[Authorization Matrix - REST and Socket.IO]]
- [[Threat Model - Amanat Escrow Platform]]
- [[Session and Authentication Architecture Decision]]
- [[Backend Stack Security and Refactor Assessment - 2026-05-24]]

View File

@@ -350,6 +350,16 @@ High-risk admin actions require re-authentication. Upon successful re-authentica
8. Frontend retries the original high-risk action.
9. The action proceeds.
### Traceability to Authorization Matrix
This matrix maps to:
- `AUTH-R025` (`POST /api/auth/step-up`) for the step-up API entry point.
- `AUTH-R026` (`GET /api/auth/sessions`), `AUTH-R027` (`POST /api/auth/revoke-session`), `AUTH-R028` (`POST /api/auth/revoke-all-sessions`) for session controls.
- `APV-R001`, `APV-R002`, `APV-R003` for approval queue + confirm/reject workflow.
Status: these rows are marked **Not implemented** in the matrix while this ADR remains in planning/rollout state.
### Two-person approval flow
For actions requiring two-person approval:
@@ -659,19 +669,19 @@ If any migration step causes issues:
| Threat | Document |
|---|---|
| T01 (fake payment proof) | [[Payment Provider Adapter Spec]] (future) |
| T02 (webhook replay) | [[Webhook Security Spec]] (future) |
| T03 (arbitrary socket room join) | Realtime Authorization Spec (future) |
| T05 (double payout) | [[Funds Ledger Specification]] (future) |
| T06 (dispute bypass) | Escrow State Machine (future) |
| T01 (fake payment proof) | [[Funds Ledger and Escrow State Machine Specification]], [[Payment Provider Adapter Spec]] |
| T02 (webhook replay) | [[Webhook Security Spec]] |
| T03 (arbitrary socket room join) | [[Realtime Authorization Spec]] |
| T05 (double payout) | [[Funds Ledger and Escrow State Machine Specification]] |
| T06 (dispute bypass) | [[Funds Ledger and Escrow State Machine Specification]] |
| T07 (email abuse) | Rate limiting implementation |
| T08 (AI cost abuse) | Rate limiting + auth implementation |
| T09 (admin privilege escalation) | [[Authorization Matrix]] + step-up auth (this ADR) |
| T09 (admin privilege escalation) | [[Authorization Matrix - REST and Socket.IO]] + step-up auth (this ADR) |
| T11 (unauthenticated payment endpoints) | Auth middleware implementation |
| T12 (rate limit bypass) | Rate limiting implementation |
| T14 (supply-chain) | [[Secure Build and Supply-Chain Policy]] |
| T16 (deep-link tampering) | Telegram initData verification |
| T17 (provider outage) | Operational runbooks |
| T17 (provider outage) | [[Backend Funds Migration and Operational Runbooks]] |
| T18 (insider manipulation) | Multi-sig wallet + funds ledger + two-person approval (this ADR) |
| T19 (price manipulation) | Offer status enforcement |
| T20 (delivery brute force) | Rate limiting + code entropy |

View File

@@ -0,0 +1,106 @@
---
title: Task 4 Backend Security Architecture Verification Report
tags: [taskmaster, verification, security, backend]
created: 2026-05-24
status: complete
---
# Task 4 Backend Security Architecture Verification Report
Taskmaster task 4 is complete as an advisory architecture and handoff package.
The task defines how the backend security/refactor assessment is converted into
implementation criteria without rewriting or disrupting the current backend
model.
## 1. Deliverable map
| Taskmaster item | Deliverable |
|---|---|
| 4.1 Security ownership and launch criteria | [[Security Ownership and Launch Decision Criteria]] |
| 4.2 Escrow platform threat model | [[Threat Model - Amanat Escrow Platform]] |
| 4.3 Funds ledger and escrow state machine | [[Funds Ledger and Escrow State Machine Specification]] |
| 4.4 REST and Socket.IO authorization matrix | [[Authorization Matrix - REST and Socket.IO]], [[Realtime Authorization Spec]] |
| 4.5 Session, passkey, and admin step-up architecture | [[Session and Authentication Architecture Decision]] |
| 4.6 Webhook security and payment adapter contracts | [[Webhook Security Spec]], [[Payment Provider Adapter Spec]] |
| 4.7 Secure build and supply-chain policy | [[Secure Build and Supply-Chain Policy]] |
| 4.8 Backend-core stack decision | [[Backend Core Stack Decision Record - 2026-05-24]] |
| 4.9 Migration and operational runbooks | [[Backend Funds Migration and Operational Runbooks]] |
## 2. Architecture decisions verified
- The current TypeScript/Node backend remains the production delivery path for
the next security-hardening phase.
- A full backend rewrite is explicitly out of scope until ledger, webhook,
provider, auth/session, and reconciliation contracts are stable and observable.
- Payment providers are optional and provider-neutral behind adapter contracts.
- Webhooks must use raw-body signature verification, replay prevention,
idempotency, and dead-letter capture.
- Funds movement must be derived from the canonical ledger and escrow state
machine, not provider metadata.
- Admin release, refund, payout, role, and destructive account operations require
step-up authentication and audit logging; high-risk payouts require
two-person approval.
- Socket.IO room membership must be server-derived and authorization checked,
with global financial broadcasts prohibited.
## 3. Verification commands
Executed from `nick-doc` on 2026-05-24:
```bash
npx task-master show 4
npx task-master set-status --id=4.6 --status=done
npx task-master set-status --id=4.7 --status=done
npx task-master set-status --id=4.8 --status=done
npx task-master set-status --id=4.9 --status=done
npx task-master set-status --id=4.3 --status=done
npx task-master set-status --id=4.4 --status=done
npx task-master set-status --id=4.5 --status=done
npx task-master set-status --id=4 --status=done
node - <<'NODE'
const fs=require('fs');
const data=JSON.parse(fs.readFileSync('.taskmaster/tasks/tasks.json','utf8'));
const t=data.master.tasks.find(x=>String(x.id)==='4');
console.log(JSON.stringify({
task:t.status,
subtasks:t.subtasks.map(s=>({id:s.id,status:s.status,title:s.title}))
}, null, 2));
NODE
```
A one-off Node link checker also parsed Task 4 wiki links and verified they
resolve to markdown files; threat IDs such as `[[T05]]` were treated as allowed
shorthand references.
## 4. Verification result
- Taskmaster JSON reports task 4 as `done`.
- Taskmaster JSON reports subtasks 4.1 through 4.9 as `done`.
- `Authorization Matrix - REST and Socket.IO` now links directly to [[Realtime Authorization Spec]].
- Task 4 wiki links resolve to existing markdown files, excluding threat-ID
shorthand references such as `[[T05]]`.
- Incident ownership in the task 4 runbook was replaced with explicit role owners
that can be mapped to named responders before production launch.
- Remaining implementation tests belong to follow-up backend tasks because task
4 is a documentation and architecture handoff task.
## 5. Follow-up implementation test requirements
Implementation tasks derived from task 4 must include:
- ledger invariant unit tests for every escrow transition,
- payment provider adapter contract tests for SHKeeper, Request Network, manual
wallet, and disabled-provider modes,
- webhook signature, replay, duplicate, and DLQ tests,
- REST authorization tests for every gap listed in the authorization matrix,
- Socket.IO handshake, room authorization, targeted emission, and rate-limit
tests,
- session rotation, revocation, passkey disabled/enabled, and admin step-up
tests,
- runbook drills for provider outage, leaked webhook secret, stuck release,
suspicious payment proof, and compromised admin.
## 6. Residual risk
This report verifies the Task 4 architecture package, not production behavior.
Backend implementation work must still enforce these controls before launch.

View File

@@ -557,15 +557,15 @@ The following remediation documents (from the recommended documentation set in [
| Remediation Document | Threats Addressed |
|---|---|
| Funds Ledger Specification | T05, T18, T23 |
| Escrow State Machine | T06, T19, T23 |
| Funds Ledger and Escrow State Machine Specification | T05, T18, T23 |
| Funds Ledger and Escrow State Machine Specification | T06, T19, T23 |
| Authorization Matrix | T09, T21 |
| Webhook Security Spec | T02 |
| Session and Auth Architecture | T04, T10, T13, T22 |
| Realtime Authorization Spec | T03 |
| Payment Provider Adapter Spec | T01, T11, T17 |
| Secure Build and Supply-Chain Policy | T14 |
| Operational Runbooks | T17 |
| Backend Funds Migration and Operational Runbooks | T17 |
---

View File

@@ -0,0 +1,150 @@
---
title: Webhook Security Spec
tags: [webhooks, security, audit, payments]
created: 2026-05-24
status: advisory
reviewers: [backend, security, operations]
---
# Webhook Security Spec
This document defines signed callback handling for all payment and payout providers.
It closes the gaps in [[Security Architecture]] by turning webhook behavior into an explicit,
auditable contract.
The scope is inbound callbacks only:
- SHKeeper pay-in (`/api/payment/shkeeper/webhook`)
- SHKeeper payout (`/api/payment/shkeeper/payout/webhook`)
- Request Network (`/api/payment/request-network/webhook`)
- Manual/admin reconciliation channels (where applicable)
## 1. Canonical event envelope
All callbacks are normalized by [[Payment Provider Adapter Spec]] into:
```ts
type ProviderCallback = {
provider: "shkeeper" | "request_network" | "manual_wallet" | "admin_wallet" | string;
providerPaymentId: string;
purchaseRequestId?: string;
requestId?: string;
deliveryId?: string;
eventType: string; // e.g., paid, payout_completed, status_update
status: string; // provider-specific raw status
normalizedStatus: "pending" | "completed" | "failed" | "cancelled" | "released" | "refunded";
amount?: string;
currency?: string;
transactionHash?: string;
occurredAt?: string; // ISO 8601 if provided
receivedAt: string; // server-side receive time
rawFingerprint: string; // sha256(raw_body)
};
```
Callbacks are processed only through adapter entry points; provider-specific parsing remains private to the adapter.
## 2. Signature verification
### 2.1 Required mechanics
- Verify signatures against raw request bytes, **before JSON parsing**.
- Use constant-time comparison and short-circuit to 401/403 on mismatch.
- Never disable verification outside local-only test tooling.
- Store raw payload hash (`rawFingerprint`) for forensics and idempotency checks.
### 2.2 Provider headers
| Provider | Header(s) |
|---|---|
| SHKeeper | `x-shkeeper-signature` |
| Request Network | `x-request-network-signature` |
| Test override (local only) | explicitly documented in deployment notes, never in production |
If expected signature header is absent or malformed, treat as a non-retryable client error.
## 3. Replay prevention and idempotency
For each callback store and enforce one of:
- `deliveryId` + `provider` + `eventType`, or
- `(providerPaymentId, normalizedStatus, provider)` when provider has no delivery id.
Replay rules:
- First successful write path = **processed**.
- Same key seen again with no state change = **duplicate** (HTTP 200 response, no side effects).
- Same key seen for different payload hash = **conflict** (HTTP 409, captured to DLQ).
## 4. Unknown and duplicate behavior
| Condition | Response | Side effects |
|---|---|---|
| Signature valid, unknown `providerPaymentId` | `200` (`unknown_payment`) in v1 mode / `404` in strict mode | no state write, record DLQ entry for operator review |
| Known `providerPaymentId`, already terminal | `200` (`duplicate_terminal`) | no state write |
| Known `providerPaymentId`, stale status transition | `200` (`duplicate_or_out_of_order`) | no state write |
| Unknown signature | `401` | no state write |
| Malformed payload | `400` | no state write |
## 5. Retry semantics
- Callback consumers (providers) may retry:
- transient network failures,
- 5xx/provider internal timeouts,
- explicit retryable status from endpoint.
- Retry is triggered only on non-2xx codes for SHKeeper and Request Network.
- Recommended handler mapping:
- `401/400` = do not retry (hard fail),
- `409` = do not retry until manual release,
- `500/503` = retry.
## 6. Dead-letter and replay storage
Persist all failed callbacks for at least 7 days in append-only storage:
- `providerWebhookFailures`
- key fields: `provider`, `deliveryId`, `providerPaymentId`, `requestPath`, `requestHeaders`, `rawFingerprint`, `statusCode`, `errorCode`, `attemptCount`, `nextRetryAt`, `rawBodyRef`, `createdAt`.
- If storage is unavailable, fail closed and raise a high-severity ops alert.
Retention policy:
- 30 days for `success==true`,
- 180 days for `unknown_payment`, `repeated_conflict`, `signature_failure`,
- immediate alert if retry queue exceeds 500 entries for a provider.
## 7. Alerting thresholds
- `failed_webhook_count` over 1 minute:
- warning at `> 20`,
- critical at `> 100`.
- signature failures:
- warning at `> 5` in 5 minutes,
- critical at `> 20` in 5 minutes.
- duplicate ratio:
- warning if `duplicates / total >= 0.15` for 10 minutes.
- dead-letter growth:
- warning at `+200` new entries/hour,
- critical at `+500`/hour.
## 8. Required operator signals
Webhook health checks should expose:
- last-seen timestamp by provider,
- delivery backlog depth,
- per-status counters (`processed`, `duplicate`, `unknown`, `conflict`, `signature_failure`),
- DLQ length and oldest entry age.
## 9. Testing requirements
- Signature bypass tests (must remain false in staging/prod),
- replay/delivery-id duplicate tests,
- malformed payload tests,
- unknown payment tests,
- non-terminal duplicate suppression tests.
## Related
- [[Payment Provider Adapter Spec]]
- [[Error Codes]]
- [[Backend Funds Migration and Operational Runbooks]]