Siavash Sameni 4cf5c49274 docs(audit): align documentation with post-remediation backend reality

- Update data model enums to match backend models
- Update API reference auth requirements
- Add dispute module references and warning blocks
- Add 2026-05-24 audit remediation callout to Overview
- Generate task breakdowns and audit artifacts
- Add doc alignment report (.taskmaster/reports/)

2026-05-24 11:16:29 +04:00

Security Ownership and Launch Decision Criteria

Decision document. Answers open questions 9 and 10 from Backend Stack Security and Refactor Assessment - 2026-05-24: who owns security decisions, and what must be true before public launch.

This document is binding for the Amanat platform launch cycle. Changes require written sign-off from the roles listed in Section 1.

1. Security Ownership RACI

Roles: PO = Product Owner, BL = Backend Lead, DI = DevOps/Infra, FL = Frontend Lead, SO = Security Owner (if designated), CTO = CTO/Leadership.

R = Responsible (does the work), A = Accountable (final decision authority), C = Consulted, I = Informed.

Decision Area	PO	BL	DI	FL	SO	CTO
Authentication changes (token storage, session model, passkey scope)	I	R	C	C	A	I
Payment/funds changes (ledger, state machine, release/refund logic)	C	R	I	I	A	I
Provider integrations (SHKeeper, Request Network, new providers)	C	R	C	I	A	I
Webhook handling (signature verification, idempotency, DLQ)	I	R	C	I	A	I
Rate limiting (tiers, thresholds, enforcement points)	I	R	A	I	C	I
Admin access (role definitions, step-up auth, audit logging)	C	R	I	C	C	A
Dependency updates (lockfile policy, provenance, vulnerability triage)	I	R	A	C	C	I
Incident response (runbook ownership, escalation, postmortem)	I	C	R	I	A	I
Cross-cutting security architecture (service split, stack migration)	C	R	C	C	C	A
External penetration testing (scope, timing, vendor selection)	I	C	C	I	R	A

RACI rules

If no Security Owner is designated, accountability for rows marked SO defaults to CTO.
BL is responsible for all implementation work on backend security items. FL is responsible for frontend-side changes (cookie migration, CSP hardening, token storage) and is consulted on rows that affect the frontend.
DI owns rate limiting configuration, dependency pipeline, and infrastructure-level controls.
A role marked A must approve in writing (PR review, doc sign-off, or Slack confirmation logged in the decision register) before the change ships.
Any role marked R or A can escalate to CTO for final arbitration.

2. Launch Safety Gate Checklist

Each item is classified as:

Required -- blocks launch. Must be verified complete before any public-facing deployment.
Strongly Recommended -- should block launch. Can be accepted with a documented risk entry (risk description, owner, remediation deadline) signed by the accountable role from Section 1.
Deferred -- explicitly deferred to post-launch. Must appear in Section 5 (Deferred Decisions Register).
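
The three classifications above amount to a small decision rule, which can be sketched in TypeScript. This is a minimal illustration, not part of the platform: the `ChecklistItem` shape and the `launchBlockers` function are hypothetical names, and the boolean fields stand in for whatever evidence the team actually records.

```typescript
type Classification = "Required" | "Strongly Recommended" | "Deferred";

// Hypothetical record for one checklist row.
interface ChecklistItem {
  id: string; // checklist number, e.g. "2.1.1"
  classification: Classification;
  verified: boolean; // condition has been verified complete
  riskAccepted: boolean; // signed risk entry exists (description, owner, deadline)
  inDeferredRegister: boolean; // item appears in the Section 5 register
}

// Decides whether a single item blocks launch under the rules above.
function blocksLaunch(item: ChecklistItem): boolean {
  if (item.verified) return false;
  switch (item.classification) {
    case "Required":
      return true; // no risk acceptance path for Required items
    case "Strongly Recommended":
      return !item.riskAccepted;
    case "Deferred":
      return !item.inDeferredRegister;
  }
}

// Returns the ids of all items that currently block launch.
function launchBlockers(items: ChecklistItem[]): string[] {
  return items.filter(blocksLaunch).map((item) => item.id);
}
```

In this sketch a Required item cannot be unblocked by a risk acceptance, which matches the wording above: only Strongly Recommended items carry that option.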

2.1 Authentication and Session Hardening

#	Condition	Classification	Source
2.1.1	All financial endpoints require Bearer JWT authentication	Required	Platform Logical Audit - 2026-05-24 item 3
2.1.2	Ownership checks enforced on all `:userId` parameterized endpoints	Required	Platform Logical Audit - 2026-05-24 item 3
2.1.3	Admin role checks enforced on all admin routes	Required	Backend Stack Security and Refactor Assessment - 2026-05-24 Phase 0
2.1.4	Test/demo payment and email endpoints disabled or auth-protected in production	Required	Backend Stack Security and Refactor Assessment - 2026-05-24 Phase 0
2.1.5	Access token lifetime reduced to 60 minutes or less	Strongly Recommended	Platform Logical Audit - 2026-05-24 item 10
2.1.6	Refresh tokens moved to `httpOnly` cookies (or risk accepted with documented rationale)	Strongly Recommended	Security Architecture section 11
2.1.7	Passkey/WebAuthn disabled in production until real cryptographic implementation is complete	Required	Platform Logical Audit - 2026-05-24 item 2
2.1.8	Passkey RP ID set to production domain (not `localhost`)	Required	Security Architecture section 2.3
2.1.9	Device/session revocation functional	Deferred	Post-launch auth hardening

2.2 Payment and Funds Integrity

#	Condition	Classification	Source
2.2.1	Dispute creation enforces escrow hold (`disputed` state) that blocks release and refund	Required	Platform Logical Audit - 2026-05-24 item 1
2.2.2	Web3 verification decodes Transfer event and validates recipient, token contract, and amount	Required	Platform Logical Audit - 2026-05-24 item 4
2.2.3	Payment mutations route through centralized service methods only (no direct controller mutation)	Required	Backend Stack Security and Refactor Assessment - 2026-05-24 Phase 0
2.2.4	Release/refund eligibility enforced through escrow state, not controller-level flags	Required	Backend Stack Security and Refactor Assessment - 2026-05-24 payment risks
2.2.5	Seller cannot update offer price after acceptance	Strongly Recommended	Platform Logical Audit - 2026-05-24 item 18
2.2.6	Immutable funds ledger operational for new payments	Deferred	Backend Stack Security and Refactor Assessment - 2026-05-24 Phase 2
2.2.7	Provider-neutral payment abstraction layer	Deferred	Backend Stack Security and Refactor Assessment - 2026-05-24 Phase 2
2.2.8	Payment state enums unified across data model, API, and flow documents	Required	Platform Logical Audit - 2026-05-24 item 9
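
Items 2.2.1 and 2.2.4 both describe eligibility that follows from escrow state rather than from controller flags. A minimal sketch of that idea is a transition table. Only the `disputed` state name comes from this document. The other state and action names (`funded`, `released`, `refunded`, `open_dispute`, `resolve_release`, `resolve_refund`) are assumptions for illustration, and the real state set must follow the unified enums required by 2.2.8.

```typescript
// Hypothetical escrow states and actions; only "disputed" is named in the checklist.
type EscrowState = "funded" | "disputed" | "released" | "refunded";
type EscrowAction =
  | "release"
  | "refund"
  | "open_dispute"
  | "resolve_release"
  | "resolve_refund";

// Allowed transitions. Anything not listed here is rejected.
const TRANSITIONS: Record<EscrowState, Partial<Record<EscrowAction, EscrowState>>> = {
  funded: { release: "released", refund: "refunded", open_dispute: "disputed" },
  // While disputed, plain release and refund are absent: only a resolution can move funds.
  disputed: { resolve_release: "released", resolve_refund: "refunded" },
  released: {},
  refunded: {},
};

// Single entry point for state changes, so eligibility lives in one place.
function transition(state: EscrowState, action: EscrowAction): EscrowState {
  const next = TRANSITIONS[state][action];
  if (next === undefined) {
    throw new Error(`Action "${action}" is not allowed in state "${state}"`);
  }
  return next;
}
```

Calling `transition("disputed", "release")` throws in this sketch, which is the hold behaviour item 2.2.1 asks for. Routing every mutation through one function of this kind is also one way to satisfy 2.2.3.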

2.3 Authorization Enforcement

#	Condition	Classification	Source
2.3.1	Every endpoint mapped to required role (public, authenticated, owner, admin) in authorization matrix	Required	Backend Stack Security and Refactor Assessment - 2026-05-24 doc requirement 4
2.3.2	`assertRole` or equivalent guard present in all admin and payment service methods	Required	Backend Stack Security and Refactor Assessment - 2026-05-24 Phase 0
2.3.3	Arbitrary `userId` from client no longer accepted for private data; server derives identity from JWT	Required	Backend Stack Security and Refactor Assessment - 2026-05-24 Phase 0
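
The guard pattern in 2.3.2 and the identity rule in 2.3.3 can be sketched as follows. The checklist names `assertRole` but does not give its signature, so the signature here is an assumption, as are `AuthContext`, `assertOwner`, `ForbiddenError`, and the `Payment` shape. JWT verification itself is out of scope for the sketch: `AuthContext` represents the identity after the server has verified the token.

```typescript
type Role = "authenticated" | "admin";

// Identity established by server-side JWT verification (not shown here).
interface AuthContext {
  userId: string;
  role: Role;
}

class ForbiddenError extends Error {}

// Hypothetical guard: throws unless the caller holds exactly the required role.
function assertRole(ctx: AuthContext, required: Role): void {
  if (ctx.role !== required) {
    throw new ForbiddenError(`role "${required}" required`);
  }
}

// Hypothetical ownership check for resources that carry an owner id.
function assertOwner(ctx: AuthContext, ownerId: string): void {
  if (ctx.userId !== ownerId) {
    throw new ForbiddenError("caller does not own this resource");
  }
}

interface Payment {
  id: string;
  buyerId: string;
  amount: number;
}

// Private reads take no userId argument: the verified identity is the filter.
function listMyPayments(ctx: AuthContext, payments: Payment[]): Payment[] {
  return payments.filter((p) => p.buyerId === ctx.userId);
}
```

Because `listMyPayments` has no `userId` parameter, a client cannot request another user's data by changing a path or query value, which is the failure mode 2.3.3 targets.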

2.4 Rate Limiting

#	Condition	Classification	Source
2.4.1	Global rate limiting enabled	Required	Platform Logical Audit - 2026-05-24 item 13
2.4.2	Auth endpoints: 5 req/5 min/IP	Required	Security Architecture section 9
2.4.3	Payment endpoints: 20 req/15 min/IP	Required	Backend Stack Security and Refactor Assessment - 2026-05-24 Phase 0
2.4.4	AI endpoints: 10 req/15 min/authenticated-user	Required	Platform Logical Audit - 2026-05-24 item 3
2.4.5	File upload endpoints: 10 req/15 min/authenticated-user	Strongly Recommended	--
2.4.6	Delivery confirmation code: max 5 verification attempts per 15 min per request	Required	Platform Logical Audit - 2026-05-24 item 8
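
The tiers in the table above can be expressed as data plus a limiter keyed by tier and subject. The sketch below uses a fixed-window counter held in process memory, which is an assumption made for brevity. A multi-instance deployment would need a shared store, and the team may prefer an existing middleware. `TIERS`, `TierName`, and `FixedWindowLimiter` are hypothetical names.

```typescript
interface Tier {
  limit: number; // maximum requests per window
  windowMs: number; // window length in milliseconds
}

type TierName = "auth" | "payment" | "ai" | "upload" | "deliveryCode";

const MINUTE = 60 * 1000;

// Thresholds copied from items 2.4.2 to 2.4.6.
const TIERS: Record<TierName, Tier> = {
  auth: { limit: 5, windowMs: 5 * MINUTE }, // key: client IP
  payment: { limit: 20, windowMs: 15 * MINUTE }, // key: client IP
  ai: { limit: 10, windowMs: 15 * MINUTE }, // key: authenticated user id
  upload: { limit: 10, windowMs: 15 * MINUTE }, // key: authenticated user id
  deliveryCode: { limit: 5, windowMs: 15 * MINUTE }, // key: request id
};

class FixedWindowLimiter {
  private windows = new Map<string, { start: number; count: number }>();

  // Returns true if the call is within the tier's limit and records it.
  allow(tier: TierName, key: string, now: number = Date.now()): boolean {
    const { limit, windowMs } = TIERS[tier];
    const id = `${tier}:${key}`;
    const current = this.windows.get(id);
    if (current === undefined || now - current.start >= windowMs) {
      this.windows.set(id, { start: now, count: 1 }); // start a fresh window
      return true;
    }
    if (current.count >= limit) return false;
    current.count += 1;
    return true;
  }
}
```

The `key` argument carries the enforcement point from the table: an IP address for auth and payment endpoints, a user id for AI and upload endpoints, and a request id for delivery confirmation attempts.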

2.5 Webhook Security

#	Condition	Classification	Source
2.5.1	SHKeeper webhook uses raw-body HMAC verification (not reconstructed JSON)	Required	Backend Stack Security and Refactor Assessment - 2026-05-24 webhook risks
2.5.2	Webhook handler is idempotent (duplicate delivery = no-op)	Required	Security Architecture section 5
2.5.3	Webhook returns proper HTTP codes: 400 for bad input, 500 for server error, 200 for success	Required	Platform Logical Audit - 2026-05-24 item 11
2.5.4	Webhook failures logged to dead-letter storage or alerting channel	Strongly Recommended	Platform Logical Audit - 2026-05-24 item 11
2.5.5	Provider callbacks create reconciliation events; do not directly release funds	Strongly Recommended	Backend Stack Security and Refactor Assessment - 2026-05-24 webhook risks
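
Items 2.5.1 to 2.5.3 can be illustrated together. The sketch below assumes a hex-encoded HMAC-SHA256 signature and a string `eventId` field in the payload. Both are assumptions for illustration, and the actual header, algorithm, and event identifier must come from the provider integration. It treats a failed signature as bad input (400), following the status codes listed in 2.5.3. The in-memory `Set` stands in for durable storage.

```typescript
import { Buffer } from "node:buffer";
import { createHmac, timingSafeEqual } from "node:crypto";

// Hex HMAC-SHA256 over the exact bytes received (algorithm and encoding assumed).
function signBody(rawBody: Buffer, secret: string): string {
  return createHmac("sha256", secret).update(rawBody).digest("hex");
}

function verifySignature(rawBody: Buffer, signatureHex: string, secret: string): boolean {
  const expected = Buffer.from(signBody(rawBody, secret), "hex");
  const received = Buffer.from(signatureHex, "hex");
  // timingSafeEqual throws on length mismatch, so compare lengths first.
  return received.length === expected.length && timingSafeEqual(received, expected);
}

interface WebhookResult {
  status: 200 | 400;
  applied: boolean; // true only when this delivery changed state
}

function handleWebhook(
  rawBody: Buffer,
  signatureHex: string,
  secret: string,
  processedIds: Set<string>, // stand-in for a durable idempotency store
): WebhookResult {
  // Verify against the raw bytes before any parsing or re-serialization.
  if (!verifySignature(rawBody, signatureHex, secret)) {
    return { status: 400, applied: false };
  }
  let parsed: unknown;
  try {
    parsed = JSON.parse(rawBody.toString("utf8"));
  } catch {
    return { status: 400, applied: false };
  }
  const eventId =
    typeof parsed === "object" && parsed !== null
      ? (parsed as { eventId?: unknown }).eventId
      : undefined;
  if (typeof eventId !== "string") return { status: 400, applied: false };

  if (processedIds.has(eventId)) return { status: 200, applied: false }; // duplicate: no-op
  processedIds.add(eventId);
  return { status: 200, applied: true };
}
```

Verifying the raw bytes matters because re-serializing parsed JSON can change whitespace or key order, so a signature computed over reconstructed JSON may not match what the provider signed. In line with 2.5.5, the `applied: true` branch would record a reconciliation event rather than release funds directly.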

2.6 Socket.IO Authorization

#	Condition	Classification	Source
2.6.1	Socket.IO room membership derived from authenticated socket identity, not client-supplied user IDs	Required	Platform Logical Audit - 2026-05-24 item 12
2.6.2	Socket handshake requires valid JWT	Required	Backend Stack Security and Refactor Assessment - 2026-05-24 realtime risks
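
The rule in 2.6.1 can be reduced to a pure authorization function that a Socket.IO join handler would call. The sketch below does not use the Socket.IO API. `SocketIdentity` represents the identity attached to the socket after the handshake JWT has been verified (2.6.2), and the `user:<id>` room naming and the `roomMembers` lookup are assumptions for illustration.

```typescript
// Identity attached to the socket after the handshake JWT is verified.
interface SocketIdentity {
  userId: string;
}

// Personal room name is computed from the verified identity, never from client input.
function personalRoom(identity: SocketIdentity): string {
  return `user:${identity.userId}`;
}

// Decides whether the socket may join a requested room.
// roomMembers maps a room name to the user ids allowed in it (server-side data).
function canJoinRoom(
  identity: SocketIdentity,
  room: string,
  roomMembers: Map<string, Set<string>>,
): boolean {
  if (room === personalRoom(identity)) return true;
  const members = roomMembers.get(room);
  return members !== undefined && members.has(identity.userId);
}
```

A join request naming another user's personal room is refused in this sketch because the only personal room a socket can match is the one derived from its own verified identity.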

2.7 Supply-Chain Controls

#	Condition	Classification	Source
2.7.1	Lockfile reviewed and updated for known vulnerable packages (Multer <2.1.0, Axios compromise, TanStack compromise)	Required	Backend Stack Security and Refactor Assessment - 2026-05-24 supply-chain risks
2.7.2	`npm audit` / `yarn audit` run and all high/critical CVEs triaged	Required	Security Architecture section 12
2.7.3	CI install mode uses frozen lockfile	Strongly Recommended	Backend Stack Security and Refactor Assessment - 2026-05-24 doc requirement 10
2.7.4	No test/demo routes in production builds	Required	Backend Stack Security and Refactor Assessment - 2026-05-24 Phase 0

2.8 Monitoring, Alerting, and Runbooks

#	Condition	Classification	Source
2.8.1	Backend error monitoring active (Sentry or equivalent with source maps)	Strongly Recommended	Security Architecture section 12
2.8.2	Structured logging for payment state transitions (actor, target, before/after)	Strongly Recommended	Security Architecture section 10
2.8.3	Runbook exists for: failed webhook, duplicate payment, stuck release, compromised admin, leaked API key	Strongly Recommended	Backend Stack Security and Refactor Assessment - 2026-05-24 doc requirement 11
2.8.4	Alerting for: repeated webhook signature failures, unusual payment volume, admin actions on own disputes	Strongly Recommended	--
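
Item 2.8.2 names the fields a payment transition log should carry: actor, target, and the before/after states. A minimal sketch of one JSON log line with those fields follows. The field names, the `payment.state_transition` event name, and the function name are hypothetical, and the output would be handed to whatever logger the backend already uses.

```typescript
// Hypothetical shape for one structured log entry.
interface PaymentTransitionLog {
  event: "payment.state_transition";
  actorId: string; // who triggered the change
  paymentId: string; // target of the change
  before: string; // state prior to the transition
  after: string; // state after the transition
  at: string; // ISO 8601 timestamp
}

// Builds a single-line JSON entry suitable for a line-oriented log pipeline.
function paymentTransitionLine(
  actorId: string,
  paymentId: string,
  before: string,
  after: string,
  at: Date = new Date(),
): string {
  const entry: PaymentTransitionLog = {
    event: "payment.state_transition",
    actorId,
    paymentId,
    before,
    after,
    at: at.toISOString(),
  };
  return JSON.stringify(entry);
}
```

Emitting every transition through one function keeps the field set uniform, which makes alert rules such as those in 2.8.4 easier to write against the log stream.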

2.9 External Penetration Testing

#	Condition	Classification	Source
2.9.1	External pentest of payment + dispute + auth flows completed before general public launch	Strongly Recommended	Security Architecture section 12, Open question 9
2.9.2	Pentest findings triaged; all critical/high items resolved or risk-accepted before launch	Required (if pentest performed)	--

2.10 Infrastructure and Operations

#	Condition	Classification	Source
2.10.1	All dev-seeded credentials rotated	Required	Security Architecture section 12
2.10.2	`NODE_ENV=production` confirmed in production backend	Required	Security Architecture section 12
2.10.3	`NEXT_PUBLIC_IS_DEVELOPMENT` and `ENABLE_DEBUG` disabled in production	Required	Security Architecture section 12
2.10.4	Production Watchtower pinned to versioned tag (not `latest`)	Strongly Recommended	Platform Logical Audit - 2026-05-24 item 27
2.10.5	Committed or publicly visible secrets rotated	Required	Backend Stack Security and Refactor Assessment - 2026-05-24 Phase 0

3. Launch Priority Decision

Decision: launch prioritizes immediate hardening of the current Node/Express stack. Backend-core redesign is deferred to post-launch.

Rationale

The audit findings in Backend Stack Security and Refactor Assessment - 2026-05-24 and Platform Logical Audit - 2026-05-24 identify the dominant risks as domain-level security failures, not framework-level weaknesses:

The most dangerous issues are authorization and state-machine bugs, not Node/Express itself. Unauthenticated financial endpoints, client-controlled socket room membership, missing dispute-escrow holds, and broken Web3 verification are independent of the backend language.
A rewrite does not fix the core problems. Moving to Go or Kotlin without first specifying the funds ledger, escrow state machine, and authorization matrix would transplant the same logic gaps into a new codebase. The audit explicitly states: "the larger issue is that the current backend mixes high-risk financial state transitions... in one Express application" -- but a rewrite that does not first solve the domain model problem is wasted effort.
Hardening is faster. The Phase 0 actions from Backend Stack Security and Refactor Assessment - 2026-05-24 (disable unsafe routes, add auth checks, enable rate limiting, fix Web3 verification, fix Socket.IO auth) are discrete, testable tasks that can be completed in days, not months.
The rewrite carries re-introduction risk. The product has working business flows. A full or partial rewrite risks reintroducing escrow and payment bugs that have already been found and can be fixed in place.

Concrete launch sequence

Phase	Work	Timeline
Phase 0: Containment	Complete all Required items from Section 2 checklist. Disable unsafe routes, add auth/ownership enforcement, enable rate limiting, fix dispute-escrow hold, fix Web3 verification, fix Socket.IO auth, disable passkeys, rotate secrets.	Immediate
Phase 1: Documentation	Produce the 11 required documents listed in Backend Stack Security and Refactor Assessment - 2026-05-24 (threat model, funds ledger spec, escrow state machine, authorization matrix, payment provider adapter spec, webhook security spec, session/auth architecture, realtime auth spec, migration plan, supply-chain policy, operational runbooks).	Parallel with Phase 0
Phase 2: Controlled launch	Public launch proceeds once all Required checklist items pass verification. Strongly Recommended items are either completed or have documented risk acceptances.	After Phase 0
Phase 3: Payment/ledger extraction	Build provider-neutral payment layer and immutable ledger. This is the first post-launch engineering priority.	Post-launch
Phase 4: Core migration evaluation	Decide on Go/Kotlin backend-core rewrite based on team capacity, Phase 3 outcomes, and operational experience. No migration begins until Phase 3 is stable.	Post-launch, after Phase 3

4. External Penetration Testing Decision

Decision: yes, commission an external penetration test before general public launch.

Rationale

Amanat is a financial escrow platform handling crypto payments. The attack surface includes webhook processing, payment state machines, Web3 transaction verification, and fund release flows. This is materially different from a typical web application.
The audit identified critical findings (unauthenticated financial endpoints, Web3 verification bypass, dispute-escrow race condition) that an external tester would also find. An external pentest validates that the Phase 0 hardening actually closed these gaps.
Evidence of supply-chain compromises in 2026 (Axios, TanStack, Express Multer) demonstrates an active threat to the npm ecosystem the platform depends on.

Timeline and scope

Attribute	Value
When	After Phase 0 hardening is complete, before Phase 2 public launch
Scope	Payment flows (SHKeeper pay-in, Web3 verification, payout/release/refund), dispute/escrow state transitions, authentication (login, token refresh, OAuth, session management), admin operations, webhook handling, Socket.IO authorization
Out of scope	Marketplace browsing/listing, blog, points/leaderboard, file upload (assessed via code review instead)
Depth	Black-box or grey-box at tester's discretion, with access to API documentation and a funded test environment
Deliverable	Report with severity ratings, reproduction steps, and remediation recommendations. Findings mapped to checklist items in Section 2.
Gate	All critical and high findings must be resolved or risk-accepted (with CTO sign-off) before launch proceeds

If pentest is delayed or unavailable

If the external pentest cannot be scheduled before the desired launch date, the following compensating controls must be in place:

Complete internal code review of all payment, auth, and webhook code paths by someone other than the original author.
Automated security test suite covering: unauthenticated access denial on all financial endpoints, webhook signature rejection, dispute-escrow hold enforcement, Web3 verification with wrong recipient/amount, Socket.IO unauthorized room join.
Documented risk acceptance signed by CTO acknowledging that external validation was not performed.
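
As one illustration of the automated checks listed above, the Web3 verification case (wrong recipient or amount) can be exercised against a small validation function. The sketch assumes the payment token is an ERC-20 contract, whose `Transfer(address,address,uint256)` event has the well-known topic hash below, with the sender and recipient in indexed topics and the amount in the data field. The `TransferLog` and `ExpectedTransfer` shapes are hypothetical, and treating the expected amount as a minimum is an assumption.

```typescript
// keccak256("Transfer(address,address,uint256)"), the standard ERC-20 event topic.
const TRANSFER_TOPIC =
  "0xddf252ad1be2c89b69c2b068fc378daa952ba7f163c4a11628f55a4df523b3ef";

// Hypothetical minimal view of a transaction receipt log entry.
interface TransferLog {
  address: string; // contract that emitted the event
  topics: string[]; // [event signature, from, to] for ERC-20 Transfer
  data: string; // 32-byte hex-encoded amount
}

interface ExpectedTransfer {
  tokenContract: string;
  recipient: string;
  minAmount: bigint; // in the token's smallest unit
}

// Indexed address topics are left-padded to 32 bytes; the address is the last 20 bytes.
function topicToAddress(topic: string): string {
  return "0x" + topic.slice(-40).toLowerCase();
}

function matchesExpectedTransfer(log: TransferLog, expected: ExpectedTransfer): boolean {
  const signature = log.topics[0];
  const to = log.topics[2];
  if (log.topics.length !== 3 || signature === undefined || to === undefined) return false;
  if (signature.toLowerCase() !== TRANSFER_TOPIC) return false;
  if (log.address.toLowerCase() !== expected.tokenContract.toLowerCase()) return false;
  if (topicToAddress(to) !== expected.recipient.toLowerCase()) return false;
  if (!/^0x[0-9a-fA-F]{64}$/.test(log.data)) return false;
  return BigInt(log.data) >= expected.minAmount;
}
```

Checking the emitting contract address is what stops a transfer of an unrelated or worthless token from being accepted as payment, which is why item 2.2.2 lists the token contract alongside recipient and amount.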

5. Deferred Decisions Register

Every item deferred from the launch checklist is recorded here with an owner, risk statement, and decision deadline.

#	Decision	Risk	Owner	Decision Deadline
D-1	Move access/refresh tokens from `localStorage` to `httpOnly` cookies	XSS in any frontend dependency or user-generated content leads to full session hijack. Access token at 60 min expiry limits window, but refresh token at 30 days is high value.	SO (or BL if no SO)	Within 30 days post-launch
D-2	Implement immutable funds ledger for new payments	Without a ledger, payment state is mutable and auditable only through application logs. Reconciliation depends on provider records. Overpayments, partial refunds, and fee calculations have no single source of truth.	BL	Phase 3 start (within 60 days post-launch)
D-3	Build provider-neutral payment abstraction layer	Current SHKeeper coupling means changing providers requires modifying core business logic. Provider-specific metadata may become canonical state by accident.	BL	Phase 3 start (within 60 days post-launch)
D-4	Implement real WebAuthn/passkey authentication	Passkeys remain disabled. Users limited to password + OAuth. No phishing-resistant second factor available.	BL	Within 90 days post-launch
D-5	Device and session revocation	Users cannot revoke individual sessions. Compromised refresh token remains valid until natural expiry or password change.	BL	Within 60 days post-launch
D-6	Admin step-up authentication for payouts and role changes	Admin with compromised session can approve payouts or escalate roles without additional verification.	CTO	Before platform processes real funds at volume
D-7	Production staging pipeline (replace Watchtower auto-deploy on `latest`)	Unvalidated images promoted to production. No health check gate, no rollback automation.	DI	Within 30 days post-launch
D-8	Frontend Docker image runtime configuration injection	Same image cannot be promoted across environments without rebuild. Increases risk of configuration drift or misbuilt production images.	FL	Within 45 days post-launch
D-9	Webhook dead-letter queue and structured failure alerting	Failed webhooks are silently swallowed. Reconciliation depends on manual monitoring or provider retry behavior.	BL	Within 30 days post-launch
D-10	Backend-core stack migration decision (Go, Kotlin, or remain TypeScript)	Continued npm supply-chain exposure for payment core. Express flexibility allows route-level exceptions to accumulate. Decision delayed until payment layer is stable and team capacity is assessed.	CTO	After Phase 3 stability milestone (target: 120 days post-launch)
D-11	Append-only audit log for payment/payout/role-change operations	Payment actions are logged via ad-hoc logger calls, not a tamper-evident audit trail. Required for dispute resolution and regulatory confidence.	BL	Within 45 days post-launch
D-12	ClamAV or equivalent virus scanning on user-uploaded files	Uploaded dispute evidence and attachments served to other users without content scanning.	DI	Within 60 days post-launch

Governance

The accountable owner for each deferred item is responsible for tracking progress and raising blockers.
Items past their decision deadline without resolution escalate to CTO.
This register is reviewed at each engineering standup or weekly review meeting until all items are resolved or reassigned.

Cross-references

Backend Stack Security and Refactor Assessment - 2026-05-24 -- primary audit, open questions 9 and 10
Platform Logical Audit - 2026-05-24 -- detailed findings referenced in checklist items
Security Architecture -- current security architecture and pre-launch hardening checklist
PRD - Platform Audit Remediation Plan (2026-05-24) -- tactical remediation plan (if available)
