---
title: Security Ownership and Launch Decision Criteria
tags: [audit, security, governance, launch, raci]
created: 2026-05-24
status: decision
---

# Security Ownership and Launch Decision Criteria

**Decision document.** Answers open questions 9 and 10 from [[Backend Stack Security and Refactor Assessment - 2026-05-24]]: who owns security decisions, and what must be true before public launch.

This document is binding for the Amanat platform launch cycle. Changes require written sign-off from the roles listed in Section 1.

---

## 1. Security Ownership RACI

Roles: **PO** = Product Owner, **BL** = Backend Lead, **DI** = DevOps/Infra, **FL** = Frontend Lead, **SO** = Security Owner (if designated), **CTO** = CTO/Leadership.

R = Responsible (does the work), A = Accountable (final decision authority), C = Consulted, I = Informed.

| Decision Area | PO | BL | DI | FL | SO | CTO |
|---|---|---|---|---|---|---|
| Authentication changes (token storage, session model, passkey scope) | I | R | C | C | A | I |
| Payment/funds changes (ledger, state machine, release/refund logic) | C | R | I | I | A | I |
| Provider integrations (SHKeeper, Request Network, new providers) | C | R | C | I | A | I |
| Webhook handling (signature verification, idempotency, DLQ) | I | R | C | I | A | I |
| Rate limiting (tiers, thresholds, enforcement points) | I | R | A | I | C | I |
| Admin access (role definitions, step-up auth, audit logging) | C | R | I | C | C | A |
| Dependency updates (lockfile policy, provenance, vulnerability triage) | I | R | A | C | C | I |
| Incident response (runbook ownership, escalation, postmortem) | I | C | R | I | A | I |
| Cross-cutting security architecture (service split, stack migration) | C | R | C | C | C | A |
| External penetration testing (scope, timing, vendor selection) | I | C | C | I | R | A |

### RACI rules

- If no Security Owner is designated, accountability for rows where **SO** is marked **A** defaults to **CTO**.
- **BL** is responsible for all implementation work on backend security items. **FL** is responsible for frontend-side changes (cookie migration, CSP hardening, token storage) and is consulted on rows that affect the frontend.
- **DI** owns rate limiting configuration, dependency pipeline, and infrastructure-level controls.
- A role marked **A** must approve in writing (PR review, doc sign-off, or Slack confirmation logged in the decision register) before the change ships.
- Any role marked **R** or **A** can escalate to **CTO** for final arbitration.

---

## 2. Launch Safety Gate Checklist

Each item is classified as:

- **Required** -- blocks launch. Must be verified complete before any public-facing deployment.
- **Strongly Recommended** -- should block launch. Can be accepted with a documented risk entry (risk description, owner, remediation deadline) signed by the accountable role from Section 1.
- **Deferred** -- explicitly deferred to post-launch. Must appear in Section 5 (Deferred Decisions Register).
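
The three classifications above amount to a small decision rule, which can be sketched in code. The following is a minimal illustration only: `ChecklistItem`, `GateResult`, and `evaluateLaunchGate` are hypothetical names that do not exist in the platform codebase, and treating a Deferred item that is missing from the Section 5 register as a launch blocker is an assumption made for this sketch.

```typescript
// Hypothetical types for illustration; none of these names exist in the platform codebase.
type Classification = "Required" | "Strongly Recommended" | "Deferred";

interface ChecklistItem {
  id: string;                   // checklist number, e.g. "2.1.1"
  classification: Classification;
  verified: boolean;            // true once the condition is verified complete
  riskAcceptedBy?: string;      // accountable role that signed a risk entry, if any
  deferredRegisterId?: string;  // Section 5 register entry, e.g. "D-5"
}

interface GateResult {
  canLaunch: boolean;
  blockers: string[];
}

function evaluateLaunchGate(items: ChecklistItem[]): GateResult {
  const blockers: string[] = [];
  for (const item of items) {
    if (item.classification === "Required") {
      // Required items block launch until verified; no risk acceptance path.
      if (!item.verified) {
        blockers.push(`${item.id}: Required item not verified`);
      }
    } else if (item.classification === "Strongly Recommended") {
      // Either completed or covered by a signed risk entry.
      if (!item.verified && !item.riskAcceptedBy) {
        blockers.push(`${item.id}: needs completion or a signed risk entry`);
      }
    } else if (!item.deferredRegisterId) {
      // Assumption: a Deferred item without a register entry is treated as a blocker.
      blockers.push(`${item.id}: Deferred item missing from Section 5 register`);
    }
  }
  return { canLaunch: blockers.length === 0, blockers };
}
```

A check like this could run against an exported copy of the checklist before the launch review, so that the sign-off discussion starts from an explicit list of blockers rather than a manual read-through.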

### 2.1 Authentication and Session Hardening

| # | Condition | Classification | Source |
|---|---|---|---|
| 2.1.1 | All financial endpoints require Bearer JWT authentication | Required | [[Platform Logical Audit - 2026-05-24]] item 3 |
| 2.1.2 | Ownership checks enforced on all `:userId` parameterized endpoints | Required | [[Platform Logical Audit - 2026-05-24]] item 3 |
| 2.1.3 | Admin role checks enforced on all admin routes | Required | [[Backend Stack Security and Refactor Assessment - 2026-05-24]] Phase 0 |
| 2.1.4 | Test/demo payment and email endpoints disabled or auth-protected in production | Required | [[Backend Stack Security and Refactor Assessment - 2026-05-24]] Phase 0 |
| 2.1.5 | Access token lifetime reduced to 60 minutes or less | Strongly Recommended | [[Platform Logical Audit - 2026-05-24]] item 10 |
| 2.1.6 | Refresh tokens moved to `httpOnly` cookies (or risk accepted with documented rationale) | Strongly Recommended | [[Security Architecture]] section 11 |
| 2.1.7 | Passkey/WebAuthn disabled in production until real cryptographic implementation is complete | Required | [[Platform Logical Audit - 2026-05-24]] item 2 |
| 2.1.8 | Passkey RP ID set to production domain (not `localhost`) | Required | [[Security Architecture]] section 2.3 |
| 2.1.9 | Device/session revocation functional | Deferred | Post-launch auth hardening |

### 2.2 Payment and Funds Integrity

| # | Condition | Classification | Source |
|---|---|---|---|
| 2.2.1 | Dispute creation enforces escrow hold (`disputed` state) that blocks release and refund | Required | [[Platform Logical Audit - 2026-05-24]] item 1 |
| 2.2.2 | Web3 verification decodes Transfer event and validates recipient, token contract, and amount | Required | [[Platform Logical Audit - 2026-05-24]] item 4 |
| 2.2.3 | Payment mutations route through centralized service methods only (no direct controller mutation) | Required | [[Backend Stack Security and Refactor Assessment - 2026-05-24]] Phase 0 |
| 2.2.4 | Release/refund eligibility enforced through escrow state, not controller-level flags | Required | [[Backend Stack Security and Refactor Assessment - 2026-05-24]] payment risks |
| 2.2.5 | Seller cannot update offer price after acceptance | Strongly Recommended | [[Platform Logical Audit - 2026-05-24]] item 18 |
| 2.2.6 | Immutable funds ledger operational for new payments | Deferred | [[Backend Stack Security and Refactor Assessment - 2026-05-24]] Phase 2 |
| 2.2.7 | Provider-neutral payment abstraction layer | Deferred | [[Backend Stack Security and Refactor Assessment - 2026-05-24]] Phase 2 |
| 2.2.8 | Payment state enums unified across data model, API, and flow documents | Required | [[Platform Logical Audit - 2026-05-24]] item 9 |

### 2.3 Authorization Enforcement

| # | Condition | Classification | Source |
|---|---|---|---|
| 2.3.1 | Every endpoint mapped to required role (public, authenticated, owner, admin) in authorization matrix | Required | [[Backend Stack Security and Refactor Assessment - 2026-05-24]] doc requirement 4 |
| 2.3.2 | `assertRole` or equivalent guard present in all admin and payment service methods | Required | [[Backend Stack Security and Refactor Assessment - 2026-05-24]] Phase 0 |
| 2.3.3 | Arbitrary `userId` from client no longer accepted for private data; server derives identity from JWT | Required | [[Backend Stack Security and Refactor Assessment - 2026-05-24]] Phase 0 |

### 2.4 Rate Limiting

| # | Condition | Classification | Source |
|---|---|---|---|
| 2.4.1 | Global rate limiting enabled | Required | [[Platform Logical Audit - 2026-05-24]] item 13 |
| 2.4.2 | Auth endpoints: 5 req/5 min/IP | Required | [[Security Architecture]] section 9 |
| 2.4.3 | Payment endpoints: 20 req/15 min/IP | Required | [[Backend Stack Security and Refactor Assessment - 2026-05-24]] Phase 0 |
| 2.4.4 | AI endpoints: 10 req/15 min/authenticated-user | Required | [[Platform Logical Audit - 2026-05-24]] item 3 |
| 2.4.5 | File upload endpoints: 10 req/15 min/authenticated-user | Strongly Recommended | -- |
| 2.4.6 | Delivery confirmation code: max 5 verification attempts per 15 min per request | Required | [[Platform Logical Audit - 2026-05-24]] item 8 |

### 2.5 Webhook Security

| # | Condition | Classification | Source |
|---|---|---|---|
| 2.5.1 | SHKeeper webhook uses raw-body HMAC verification (not reconstructed JSON) | Required | [[Backend Stack Security and Refactor Assessment - 2026-05-24]] webhook risks |
| 2.5.2 | Webhook handler is idempotent (duplicate delivery = no-op) | Required | [[Security Architecture]] section 5 |
| 2.5.3 | Webhook returns proper HTTP codes: 400 for bad input, 500 for server error, 200 for success | Required | [[Platform Logical Audit - 2026-05-24]] item 11 |
| 2.5.4 | Webhook failures logged to dead-letter storage or alerting channel | Strongly Recommended | [[Platform Logical Audit - 2026-05-24]] item 11 |
| 2.5.5 | Provider callbacks create reconciliation events; do not directly release funds | Strongly Recommended | [[Backend Stack Security and Refactor Assessment - 2026-05-24]] webhook risks |

### 2.6 Socket.IO Authorization

| # | Condition | Classification | Source |
|---|---|---|---|
| 2.6.1 | Socket.IO room membership derived from authenticated socket identity, not client-supplied user IDs | Required | [[Platform Logical Audit - 2026-05-24]] item 12 |
| 2.6.2 | Socket handshake requires valid JWT | Required | [[Backend Stack Security and Refactor Assessment - 2026-05-24]] realtime risks |

### 2.7 Supply-Chain Controls

| # | Condition | Classification | Source |
|---|---|---|---|
| 2.7.1 | Lockfile reviewed and updated for known vulnerable packages (Multer <2.1.0, Axios compromise, TanStack compromise) | Required | [[Backend Stack Security and Refactor Assessment - 2026-05-24]] supply-chain risks |
| 2.7.2 | `npm audit` / `yarn audit` run and all high/critical CVEs triaged | Required | [[Security Architecture]] section 12 |
| 2.7.3 | CI install mode uses frozen lockfile | Strongly Recommended | [[Backend Stack Security and Refactor Assessment - 2026-05-24]] doc requirement 10 |
| 2.7.4 | No test/demo routes in production builds | Required | [[Backend Stack Security and Refactor Assessment - 2026-05-24]] Phase 0 |

### 2.8 Monitoring, Alerting, and Runbooks

| # | Condition | Classification | Source |
|---|---|---|---|
| 2.8.1 | Backend error monitoring active (Sentry or equivalent with source maps) | Strongly Recommended | [[Security Architecture]] section 12 |
| 2.8.2 | Structured logging for payment state transitions (actor, target, before/after) | Strongly Recommended | [[Security Architecture]] section 10 |
| 2.8.3 | Runbook exists for: failed webhook, duplicate payment, stuck release, compromised admin, leaked API key | Strongly Recommended | [[Backend Stack Security and Refactor Assessment - 2026-05-24]] doc requirement 11 |
| 2.8.4 | Alerting for: repeated webhook signature failures, unusual payment volume, admin actions on own disputes | Strongly Recommended | -- |

### 2.9 External Penetration Testing

| # | Condition | Classification | Source |
|---|---|---|---|
| 2.9.1 | External pentest of payment + dispute + auth flows completed before general public launch | Strongly Recommended | [[Security Architecture]] section 12, Open question 9 |
| 2.9.2 | Pentest findings triaged; all critical/high items resolved or risk-accepted before launch | Required (if pentest performed) | -- |

### 2.10 Infrastructure and Operations

| # | Condition | Classification | Source |
|---|---|---|---|
| 2.10.1 | All dev-seeded credentials rotated | Required | [[Security Architecture]] section 12 |
| 2.10.2 | `NODE_ENV=production` confirmed in production backend | Required | [[Security Architecture]] section 12 |
| 2.10.3 | `NEXT_PUBLIC_IS_DEVELOPMENT` and `ENABLE_DEBUG` disabled in production | Required | [[Security Architecture]] section 12 |
| 2.10.4 | Production Watchtower pinned to versioned tag (not `latest`) | Strongly Recommended | [[Platform Logical Audit - 2026-05-24]] item 27 |
| 2.10.5 | Committed or publicly visible secrets rotated | Required | [[Backend Stack Security and Refactor Assessment - 2026-05-24]] Phase 0 |

---

## 3. Launch Priority Decision

**Decision: launch prioritizes immediate hardening of the current Node/Express stack. Backend-core redesign is deferred to post-launch.**

### Rationale

The audit findings in [[Backend Stack Security and Refactor Assessment - 2026-05-24]] and [[Platform Logical Audit - 2026-05-24]] identify the dominant risks as domain-level security failures, not framework-level weaknesses:

1. **The most dangerous issues are authorization and state-machine bugs**, not Node/Express itself. Unauthenticated financial endpoints, client-controlled socket room membership, missing dispute-escrow holds, and broken Web3 verification are independent of the backend language.
2. **A rewrite does not fix the core problems.** Moving to Go or Kotlin without first specifying the funds ledger, escrow state machine, and authorization matrix would transplant the same logic gaps into a new codebase. The audit explicitly states: "the larger issue is that the current backend mixes high-risk financial state transitions... in one Express application" -- but a rewrite that does not first solve the domain model problem is wasted effort.
3. **Hardening is faster.** The Phase 0 actions from [[Backend Stack Security and Refactor Assessment - 2026-05-24]] (disable unsafe routes, add auth checks, enable rate limiting, fix Web3 verification, fix Socket.IO auth) are discrete, testable tasks that can be completed in days, not months.
4. **The rewrite carries re-introduction risk.** The product has working business flows. A full or partial rewrite risks reintroducing escrow and payment bugs that have already been found and can be fixed in place.

### Concrete launch sequence

| Phase | Work | Timeline |
|---|---|---|
| **Phase 0: Containment** | Complete all Required items from Section 2 checklist. Disable unsafe routes, add auth/ownership enforcement, enable rate limiting, fix dispute-escrow hold, fix Web3 verification, fix Socket.IO auth, disable passkeys, rotate secrets. | Immediate |
| **Phase 1: Documentation** | Produce the 11 required documents listed in [[Backend Stack Security and Refactor Assessment - 2026-05-24]] (threat model, funds ledger spec, escrow state machine, authorization matrix, payment provider adapter spec, webhook security spec, session/auth architecture, realtime auth spec, migration plan, supply-chain policy, operational runbooks). | Parallel with Phase 0 |
| **Phase 2: Controlled launch** | Public launch proceeds once all Required checklist items pass verification. Strongly Recommended items are either completed or have documented risk acceptances. | After Phase 0 |
| **Phase 3: Payment/ledger extraction** | Build provider-neutral payment layer and immutable ledger. This is the first post-launch engineering priority. | Post-launch |
| **Phase 4: Core migration evaluation** | Decide on Go/Kotlin backend-core rewrite based on team capacity, Phase 3 outcomes, and operational experience. No migration begins until Phase 3 is stable. | Post-launch, after Phase 3 |

---

## 4. External Penetration Testing Decision

**Decision: yes, commission an external penetration test before general public launch.**

### Rationale

- Amanat is a financial escrow platform handling crypto payments. The attack surface includes webhook processing, payment state machines, Web3 transaction verification, and fund release flows. This is materially different from a typical web application.
- The audit identified critical findings (unauthenticated financial endpoints, Web3 verification bypass, dispute-escrow race condition) that an external tester would also find. An external pentest validates that the Phase 0 hardening actually closed these gaps.
- Supply-chain compromise evidence from 2026 (Axios, TanStack, Express Multer) demonstrates an active threat against the npm ecosystem the platform depends on.

### Timeline and scope

| Attribute | Value |
|---|---|
| **When** | After Phase 0 hardening is complete, before Phase 2 public launch |
| **Scope** | Payment flows (SHKeeper pay-in, Web3 verification, payout/release/refund), dispute/escrow state transitions, authentication (login, token refresh, OAuth, session management), admin operations, webhook handling, Socket.IO authorization |
| **Out of scope** | Marketplace browsing/listing, blog, points/leaderboard, file upload (assessed via code review instead) |
| **Depth** | Black-box or grey-box at tester's discretion, with access to API documentation and a funded test environment |
| **Deliverable** | Report with severity ratings, reproduction steps, and remediation recommendations. Findings mapped to checklist items in Section 2. |
| **Gate** | All critical and high findings must be resolved or risk-accepted (with CTO sign-off) before launch proceeds |

### If pentest is delayed or unavailable

If the external pentest cannot be scheduled before the desired launch date, the following compensating controls must be in place:

1. Complete internal code review of all payment, auth, and webhook code paths by someone other than the original author.
2. Automated security test suite covering: unauthenticated access denial on all financial endpoints, webhook signature rejection, dispute-escrow hold enforcement, Web3 verification with wrong recipient/amount, Socket.IO unauthorized room join.
3. Documented risk acceptance signed by CTO acknowledging that external validation was not performed.

---

## 5. Deferred Decisions Register

Every item deferred from the launch checklist is recorded here with an owner, risk statement, and decision deadline.
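
Most deadlines in this register are stated as a number of days after launch, so entries that have passed their deadline (and therefore escalate to CTO under the Governance rules of this section) can be flagged mechanically. A minimal sketch, assuming a hypothetical `RegisterEntry` shape and an `overdueEntries` helper that are not part of the platform codebase; event-based deadlines such as "before platform processes real funds at volume" are represented as `null` and left to manual review.

```typescript
// Hypothetical shapes for illustration; not part of the platform codebase.
interface RegisterEntry {
  id: string;                            // register number, e.g. "D-1"
  owner: string;                         // role from Section 1, e.g. "BL"
  deadlineDaysPostLaunch: number | null; // null when the deadline is event-based
  resolved: boolean;
}

const MS_PER_DAY = 24 * 60 * 60 * 1000;

// Returns unresolved entries whose day-based deadline has passed.
function overdueEntries(
  register: RegisterEntry[],
  launchDate: Date,
  today: Date
): RegisterEntry[] {
  const daysSinceLaunch = Math.floor(
    (today.getTime() - launchDate.getTime()) / MS_PER_DAY
  );
  return register.filter(
    (entry) =>
      !entry.resolved &&
      entry.deadlineDaysPostLaunch !== null &&
      daysSinceLaunch > entry.deadlineDaysPostLaunch
  );
}
```

The result could feed the weekly register review, so that escalation does not depend on someone remembering to count days.
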

| # | Decision | Risk | Owner | Decision Deadline |
|---|---|---|---|---|
| D-1 | Move access/refresh tokens from `localStorage` to `httpOnly` cookies | XSS in any frontend dependency or user-generated content leads to full session hijack. Access token at 60 min expiry limits window, but refresh token at 30 days is high value. | SO (or BL if no SO) | Within 30 days post-launch |
| D-2 | Implement immutable funds ledger for new payments | Without a ledger, payment state is mutable and auditable only through application logs. Reconciliation depends on provider records. Overpayments, partial refunds, and fee calculations have no single source of truth. | BL | Phase 3 start (within 60 days post-launch) |
| D-3 | Build provider-neutral payment abstraction layer | Current SHKeeper coupling means changing providers requires modifying core business logic. Provider-specific metadata may become canonical state by accident. | BL | Phase 3 start (within 60 days post-launch) |
| D-4 | Implement real WebAuthn/passkey authentication | Passkeys remain disabled. Users limited to password + OAuth. No phishing-resistant second factor available. | BL | Within 90 days post-launch |
| D-5 | Device and session revocation | Users cannot revoke individual sessions. Compromised refresh token remains valid until natural expiry or password change. | BL | Within 60 days post-launch |
| D-6 | Admin step-up authentication for payouts and role changes | Admin with compromised session can approve payouts or escalate roles without additional verification. | CTO | Before platform processes real funds at volume |
| D-7 | Production staging pipeline (replace Watchtower auto-deploy on `latest`) | Unvalidated images promoted to production. No health check gate, no rollback automation. | DI | Within 30 days post-launch |
| D-8 | Frontend Docker image runtime configuration injection | Same image cannot be promoted across environments without rebuild. Increases risk of configuration drift or misbuilt production images. | FL | Within 45 days post-launch |
| D-9 | Webhook dead-letter queue and structured failure alerting | Failed webhooks are silently swallowed. Reconciliation depends on manual monitoring or provider retry behavior. | BL | Within 30 days post-launch |
| D-10 | Backend-core stack migration decision (Go, Kotlin, or remain TypeScript) | Continued npm supply-chain exposure for payment core. Express flexibility allows route-level exceptions to accumulate. Decision delayed until payment layer is stable and team capacity is assessed. | CTO | After Phase 3 stability milestone (target: 120 days post-launch) |
| D-11 | Append-only audit log for payment/payout/role-change operations | Payment actions are logged via ad-hoc logger calls, not a tamper-evident audit trail. Required for dispute resolution and regulatory confidence. | BL | Within 45 days post-launch |
| D-12 | ClamAV or equivalent virus scanning on user-uploaded files | Uploaded dispute evidence and attachments served to other users without content scanning. | DI | Within 60 days post-launch |

### Governance

- The accountable owner for each deferred item is responsible for tracking progress and raising blockers.
- Items past their decision deadline without resolution escalate to CTO.
- This register is reviewed at each engineering standup or weekly review until all items are resolved or reassigned.

---

## Cross-references

- [[Backend Stack Security and Refactor Assessment - 2026-05-24]] -- primary audit, open questions 9 and 10
- [[Platform Logical Audit - 2026-05-24]] -- detailed findings referenced in checklist items
- [[Security Architecture]] -- current security architecture and pre-launch hardening checklist
- [[PRD - Platform Audit Remediation Plan (2026-05-24)]] -- tactical remediation plan (if available)