| title | tags | created | status |
|---|---|---|---|
| Backend Stack Security and Refactor Assessment | | 2026-05-24 | advisory |
Backend Stack Security and Refactor Assessment
Purpose
This document records an advisory assessment of whether Amanat should keep the current Node/Express backend, harden it in place, or migrate at least the security-critical backend surface to another technology stack.
The conclusion is intentionally strategic rather than implementation-heavy. It should be used as input for architecture review, security planning, and refactor scoping.
Executive summary
Amanat is not a normal CRUD marketplace. It is a financial escrow platform with authentication, realtime communication, crypto payment intake, payout/release flows, provider webhooks, and dispute-sensitive fund movement.
The main security risk is not simply "Node is insecure." The larger issue is that the current backend mixes high-risk financial state transitions, webhook handling, realtime room membership, admin operations, test/demo endpoints, and ordinary marketplace APIs in one Express application.
Moving away from Node/Express may reduce npm supply-chain exposure and improve long-term auditability, but it will not automatically fix the most important risks. The immediate priority should be to define and enforce the correct security architecture:
- A canonical funds ledger.
- A strict escrow/payment/dispute state machine.
- Centralized authorization and ownership checks.
- Signed webhook handling with idempotency.
- Server-derived realtime authorization.
- Secure session handling.
- A provider-neutral payment abstraction.
Recommended approach:
- Harden the existing backend immediately.
- Define the target payment, ledger, and auth architecture in documentation.
- Extract or rewrite only the security-critical backend core if the team can support the new stack.
- Keep lower-risk marketplace, chat, notification, and dashboard APIs in TypeScript until the core is stable.
Default recommendation: do not rewrite the entire backend at once. If a rewrite is chosen, start with payment/auth/escrow core services, preferably in Go or Kotlin/Java, while preserving current product behavior behind stable API contracts.
Current system profile
Observed architecture:
- Frontend: Next.js, React, MUI, Web3, Socket.IO client.
- Backend: Express 5, TypeScript, Mongoose, Socket.IO, SHKeeper, Web3 transaction verification, SMTP, OpenAI integration.
- Storage: MongoDB and Redis, though Redis is not consistently used as a shared state authority for all security-sensitive flows.
- Realtime: Socket.IO rooms for user, buyer, seller, chat, and purchase-request updates.
- Payments: SHKeeper pay-in, SHKeeper payout, decentralized/Web3 payment verification, manual/admin payout paths.
- Docs: existing logical audit and remediation documents already identify several critical flaws.
The backend currently acts as:
- API server.
- Realtime server.
- Payment orchestrator.
- Webhook processor.
- Background-job runner.
- File upload server.
- Auth/session issuer.
- Admin operations surface.
That is too much responsibility in one process for a financial platform unless the architecture is very tightly controlled.
Code-backed security observations
These findings are consistent with the existing audit docs and representative source review.
Payment and funds risks
- Payment state is largely represented by mutable `Payment.status` and `escrowState` fields rather than an immutable funds ledger.
- Pay-in, manual confirmation, wallet monitoring, webhook handling, and payout flows can converge on the same records through different paths.
- Release/refund eligibility is not fully centralized around ledger invariants.
- The existing docs identify a dispute/escrow race: disputes do not reliably create an enforceable hold before release.
- `Payment` uses mixed/string-compatible references for some core links, reducing referential integrity and query safety.
- Some payment mutation/history routes were exposed without sufficient authentication or ownership enforcement.
- Web3 verification has been documented as relying primarily on transaction receipt success rather than strict token, recipient, and amount verification.
Security implication: a backend stack change alone will not fix this. The platform needs a funds ledger and state machine first.
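As an illustration of the stricter Web3 check mentioned above, the following is a minimal sketch, assuming the payment is an ERC-20 transfer. The `Receipt`, `TransferLog`, and `ExpectedPayment` shapes and the `verifyTransfer` name are hypothetical simplifications, not the project's actual types; a real receipt would come from the RPC or Web3 client.

```typescript
// Hypothetical, simplified shapes for illustration only.
interface TransferLog {
  address: string;  // contract that emitted the log
  topics: string[]; // topics[0] = event signature, [1] = from, [2] = to
  data: string;     // hex-encoded uint256 amount
}

interface Receipt {
  status: boolean;
  logs: TransferLog[];
}

interface ExpectedPayment {
  token: string;     // expected token contract address
  recipient: string; // expected escrow or merchant address
  minAmount: bigint; // minimum amount in token base units
}

// keccak256("Transfer(address,address,uint256)"), the standard ERC-20 event topic.
const TRANSFER_TOPIC =
  "0xddf252ad1be2c89b69c2b068fc378daa952ba7f163c4a11628f55a4df523b3ef";

// Indexed addresses are left-padded to 32 bytes; keep the last 20 bytes.
function topicToAddress(topic: string): string {
  return "0x" + topic.slice(-40).toLowerCase();
}

// Receipt success alone is not enough: the token contract, the recipient,
// and the amount must all match what the platform expected.
function verifyTransfer(receipt: Receipt, expected: ExpectedPayment): boolean {
  if (!receipt.status) return false; // reverted transactions never count
  return receipt.logs.some(
    (log) =>
      log.address.toLowerCase() === expected.token.toLowerCase() &&
      log.topics.length === 3 &&
      log.topics[0].toLowerCase() === TRANSFER_TOPIC &&
      topicToAddress(log.topics[2]) === expected.recipient.toLowerCase() &&
      BigInt(log.data) >= expected.minAmount,
  );
}
```

A production check would likely also need confirmation depth, chain ID, and protection against reusing the same transaction hash for two payments, none of which this sketch covers.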
Authentication and session risks
- Browser tokens are stored in `localStorage`, increasing impact from XSS.
- Passkey/WebAuthn behavior is described in the audit docs as stubbed/incomplete, and challenge storage is process-local.
- Refresh-token behavior differs between auth paths.
- Admin-sensitive routes need explicit role enforcement, not just authentication.
Security implication: migration should include a session architecture decision, not just a framework change.
Realtime risks
- Socket.IO room joins are client-driven through events such as `join-user-room`, `join-buyer-room`, and `join-seller-room`, which take client-supplied IDs.
- The server should derive room membership from authenticated socket identity, not trust client-supplied user IDs.
Security implication: realtime authorization needs to be treated like API authorization.
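One way to picture server-derived room membership is the sketch below. It assumes the identity has already been verified during the socket handshake; the `SocketIdentity` shape, the `roomsFor` and `mayJoin` helpers, and the room naming scheme are all hypothetical.

```typescript
// Hypothetical identity shape, assumed to come from a verified token at handshake.
type Role = "buyer" | "seller" | "admin";

interface SocketIdentity {
  userId: string;
  roles: Role[];
}

// Rooms are computed from the verified identity only.
// Nothing from the client payload is used.
function roomsFor(identity: SocketIdentity): string[] {
  const rooms = [`user:${identity.userId}`];
  if (identity.roles.includes("buyer")) rooms.push(`buyer:${identity.userId}`);
  if (identity.roles.includes("seller")) rooms.push(`seller:${identity.userId}`);
  return rooms;
}

// A client-requested join is honored only if the room is in the derived set.
function mayJoin(identity: SocketIdentity, requestedRoom: string): boolean {
  return roomsFor(identity).includes(requestedRoom);
}
```

With this shape, a join request naming another user's ID simply fails the `mayJoin` check, so the existing join events could be kept for compatibility while no longer being trusted.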
Rate limiting and abuse controls
- Global rate limiting is explicitly disabled in the Express app.
- Sensitive paths need tiered limits: auth, verification, file upload, AI, payment, webhook, chat.
- AI endpoints and email endpoints can create cost or abuse exposure if not authenticated and rate-limited.
Security implication: this is an immediate hardening task regardless of backend stack.
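The tiered limits described above could be expressed as data plus a small limiter, as in this sketch. The tier names follow the list above; the numeric limits and the `FixedWindowLimiter` class are illustrative assumptions, and the in-memory map would need to be replaced by shared storage such as Redis in a multi-instance deployment.

```typescript
type Tier = "auth" | "payment" | "webhook" | "ai" | "upload" | "chat";

interface Limit {
  max: number;      // allowed requests per window
  windowMs: number; // window length in milliseconds
}

// Illustrative numbers only; real limits should come from traffic data.
const LIMITS: Record<Tier, Limit> = {
  auth: { max: 5, windowMs: 60000 },
  payment: { max: 10, windowMs: 60000 },
  webhook: { max: 120, windowMs: 60000 },
  ai: { max: 20, windowMs: 3600000 },
  upload: { max: 10, windowMs: 600000 },
  chat: { max: 60, windowMs: 60000 },
};

// Fixed-window counter kept in process memory for the sketch.
class FixedWindowLimiter {
  private hits = new Map<string, { count: number; windowStart: number }>();

  // Returns true if the request is within the tier's limit for this client.
  allow(tier: Tier, clientKey: string, now: number = Date.now()): boolean {
    const { max, windowMs } = LIMITS[tier];
    const key = `${tier}:${clientKey}`;
    const entry = this.hits.get(key);
    if (!entry || now - entry.windowStart >= windowMs) {
      // First request, or the previous window has expired: start a new one.
      this.hits.set(key, { count: 1, windowStart: now });
      return true;
    }
    entry.count += 1;
    return entry.count <= max;
  }
}
```

Keeping the tiers in one table makes it easier to review limits alongside the authorization matrix rather than scattering them across route files.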
Webhook and provider risks
- Webhooks must be verified using raw-body signatures, not reconstructed JSON when signatures depend on raw bytes.
- Webhook delivery must be idempotent.
- Unknown, duplicate, malformed, and failed webhooks should be visible in structured records or dead-letter storage.
- Provider callbacks should create reconciliation events, not directly release funds.
Security implication: payment provider integration should be isolated behind a provider-neutral service contract.
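To show why raw bytes matter for signature checks, here is a minimal sketch that assumes a provider signing the raw request body with HMAC-SHA256 and sending the hex digest in a header. That scheme is an assumption for illustration; the actual algorithm and header for any given provider must be taken from that provider's documentation. `verifyWebhookSignature` is a hypothetical helper name.

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";

// Assumption: HMAC-SHA256 over the raw request bytes, hex-encoded.
// The raw body must be captured before any JSON parsing, because
// re-serialized JSON may differ byte-for-byte from what was signed.
function verifyWebhookSignature(
  rawBody: Buffer,
  signatureHex: string,
  secret: string,
): boolean {
  const expected = createHmac("sha256", secret).update(rawBody).digest();
  const received = Buffer.from(signatureHex, "hex");
  // timingSafeEqual throws on length mismatch, so compare lengths first.
  if (received.length !== expected.length) return false;
  return timingSafeEqual(received, expected);
}
```

In an Express application this typically means registering a raw-body parser on the webhook route instead of the JSON parser, so the handler sees the original bytes.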
Supply-chain risks
The Node/npm ecosystem has real and recurring supply-chain risk. For this codebase, that risk matters because both frontend and backend depend heavily on npm packages.
Relevant 2026 context:
- Express published February 2026 security releases, including high-severity Multer issues affecting versions before 2.1.0. The backend manifest currently specifies `multer: ^2.0.2`, so the resolved lockfile version should be reviewed and updated if necessary.
- Node.js published March 2026 security releases across active release lines.
- Microsoft reported an Axios npm supply-chain compromise in March 2026. This project uses Axios on frontend and backend.
- TanStack published a May 2026 npm compromise postmortem. This project uses `@tanstack/react-query`.
References:
- Express security release, 2026-02-27: https://expressjs.com/2026/02/27/security-releases.html
- Node.js March 2026 security releases: https://nodejs.org/en/blog/vulnerability/march-2026-security-releases
- Microsoft on Axios npm supply-chain compromise: https://www.microsoft.com/en-us/security/blog/2026/04/01/mitigating-the-axios-npm-supply-chain-compromise/
- TanStack npm supply-chain compromise postmortem: https://tanstack.com/blog/npm-supply-chain-compromise-postmortem
Security implication: npm supply-chain controls are required even if the backend is rewritten, because the frontend remains npm-based.
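The Multer note above turns on the difference between a manifest range and a resolved version: `^2.0.2` admits both 2.0.2 and 2.1.0, so only the lockfile shows which one is installed. A minimal sketch of that comparison follows; `isBelow` is a hypothetical helper that handles plain `major.minor.patch` strings only.

```typescript
// Compares plain "major.minor.patch" versions.
// Pre-release and build tags are not handled in this sketch.
function isBelow(version: string, floor: string): boolean {
  const a = version.split(".").map(Number);
  const b = floor.split(".").map(Number);
  for (let i = 0; i < 3; i++) {
    const x = a[i] ?? 0;
    const y = b[i] ?? 0;
    if (x !== y) return x < y; // first differing component decides
  }
  return false; // equal versions are not below the floor
}
```

For example, `isBelow("2.0.2", "2.1.0")` is true, meaning a lockfile that resolved to 2.0.2 would still be in the affected range even though the manifest range allows the fixed release.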
Should the backend move away from Node/Express?
Reasons to keep and harden first
- The product already exists and has working business flows.
- A full rewrite risks reintroducing escrow/payment bugs.
- The most dangerous issues are domain/state/authorization issues, not syntax or framework issues.
- Hardening can reduce immediate exposure faster than a rewrite.
- The team may currently be more productive in TypeScript.
Reasons to migrate at least the backend core
- Financial backend code benefits from a smaller, stricter dependency footprint.
- Payment, ledger, webhook, and payout flows need strong invariants and auditability.
- Express makes it easy to accumulate route-level exceptions, test endpoints, and inconsistent middleware.
- Node/npm supply-chain exposure is material and recurring.
- TypeScript runtime enforcement is limited unless paired with strict schema validation everywhere.
- A separate payment core can be more easily audited, threat-modeled, tested, and locked down.
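The point about limited runtime enforcement can be made concrete with a short sketch. TypeScript annotations are erased at compile time, so a request body typed as `ReleaseRequest` is not actually checked unless code does it. The `ReleaseRequest` shape and `parseReleaseRequest` function below are hypothetical; a schema library could play the same role.

```typescript
// Hypothetical request shape for illustration.
interface ReleaseRequest {
  paymentId: string;
  amountMinor: number; // integer amount in the smallest currency unit
}

// Type annotations vanish at runtime, so untrusted input is checked explicitly.
function parseReleaseRequest(input: unknown): ReleaseRequest {
  if (typeof input !== "object" || input === null) {
    throw new Error("body must be an object");
  }
  const { paymentId, amountMinor } = input as Record<string, unknown>;
  if (typeof paymentId !== "string" || paymentId.length === 0) {
    throw new Error("paymentId must be a non-empty string");
  }
  if (
    typeof amountMinor !== "number" ||
    !Number.isSafeInteger(amountMinor) ||
    amountMinor <= 0
  ) {
    throw new Error("amountMinor must be a positive integer");
  }
  // Only the validated fields are returned; extra keys are dropped.
  return { paymentId, amountMinor };
}
```

Rejecting non-string identifiers at the boundary also keeps object-valued input, such as a query-operator object, from reaching database queries.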
Balanced conclusion
From a security standpoint, it is reasonable to move the highest-risk backend core away from Node/Express, but only after the target security model is specified.
Do not begin with a full product rewrite. Begin with a security-critical core extraction:
- Auth/session/token authority.
- Payment intent creation.
- Provider webhook processing.
- Funds ledger and reconciliation.
- Release/refund/dispute-hold enforcement.
- Admin payout approval and audit logging.
Keep lower-risk modules in the current stack until the core is stable:
- Marketplace browsing/listing.
- Request templates.
- Chat and notifications, after socket authorization fixes.
- Admin dashboard reads.
- File upload, after hardening or moving to object storage.
Stack options
Go
Best fit if the team wants a smaller, operationally simple, security-focused payment core.
Strengths:
- Small binaries and deployment footprint.
- Lower dependency surface than typical Node services.
- Strong standard library for HTTP, crypto, JSON, and concurrency.
- Good fit for webhook receivers, ledger services, workers, and reconciliation jobs.
- Easy to run static analysis and produce reproducible builds.
Weaknesses:
- Less ergonomic than TypeScript for rapid product iteration.
- Requires team comfort with Go idioms.
- API/schema generation must be designed deliberately.
Assessment: recommended first choice for a payment/ledger/auth core if the team can maintain Go.
Kotlin/Java with Spring Boot
Best fit if the team wants enterprise-grade structure, mature auth patterns, and strong ecosystem support.
Strengths:
- Mature security and validation ecosystem.
- Strong typing and tooling.
- Good for complex domain services and audit-heavy systems.
- Well-understood operational patterns.
Weaknesses:
- Heavier runtime and framework footprint.
- More ceremony.
- Slower iteration for a small team.
Assessment: strong choice for a larger engineering team or enterprise-style compliance needs.
Rust
Best fit if maximum memory safety and correctness are worth slower delivery.
Strengths:
- Strong compile-time safety.
- Good for cryptographic and high-assurance components.
- Very low runtime footprint.
Weaknesses:
- Higher implementation cost.
- Smaller hiring pool.
- Web API development may be slower.
Assessment: attractive for narrow cryptographic or transaction-verification components, but probably too costly for the whole backend unless the team is already strong in Rust.
Python/FastAPI
Best fit if rapid backend development and clean API typing are more important than strict compile-time guarantees.
Strengths:
- Fast development.
- Good validation with Pydantic.
- Good for admin tools and internal services.
Weaknesses:
- Supply-chain risk remains.
- Runtime typing and async behavior require discipline.
- Less compelling than Go/Kotlin for a financial core.
Assessment: acceptable for internal services, not the preferred payment-core target.
Continue TypeScript/Node with stronger architecture
Best fit if team capacity cannot support another backend language yet.
Required conditions:
- Strict route registration policy.
- Runtime validation on every boundary.
- No test/demo routes in production builds.
- Full lockfile and package provenance controls.
- Centralized auth, ownership, and role guards.
- Ledger-first payment architecture.
- Secure cookies or a documented token-storage risk acceptance.
- Socket auth middleware.
- Redis-backed challenge/idempotency/rate-limit storage.
Assessment: viable short term, but the security bar must be raised significantly.
Recommended target architecture
Phase 0: Immediate containment
Goal: reduce current high-risk exposure without broad redesign.
Actions:
- Disable or protect test/demo payment and email endpoints in production.
- Require authentication and ownership checks on all payment, notification, AI, and file routes.
- Re-enable rate limiting with stricter limits on auth, payment, AI, file upload, and webhook paths.
- Add admin role checks to admin routes.
- Stop accepting arbitrary `userId` from clients for private data.
- Validate all payment mutations through centralized service methods.
- Lock Socket.IO room membership to server-verified identity.
- Review and update lockfiles for known vulnerable packages.
- Rotate any committed or publicly visible secrets.
Phase 1: Architecture specification
Goal: define the new security model before implementation.
Documents to produce are listed in the "Required documentation" section below.
Phase 2: Payment and ledger extraction
Goal: move funds logic behind a provider-neutral service.
Introduce:
- `FundsAccount`
- `LedgerEntry`
- `FundsBalance`
- `PaymentIntent`
- `PaymentProviderEvent`
- `ReleaseInstruction`
- `RefundInstruction`
- `DisputeHold`
Key rule: provider webhooks do not directly release funds. They create verified events and ledger entries.
Phase 3: Backend-core rewrite or service split
Goal: decide whether the extracted core remains TypeScript or moves to Go/Kotlin.
Recommended split:
- `core-payments`: payment intent, webhook, ledger, release/refund, reconciliation.
- `core-auth`: sessions, passkeys, OAuth, token issuance, session revocation.
- `marketplace-api`: purchase requests, offers, categories, templates.
- `realtime-api`: chat, notifications, socket rooms.
The split can be logical first, physical later.
Phase 4: Full migration only if justified
Goal: avoid rewriting stable lower-risk product surfaces prematurely.
Only consider full backend migration after:
- Payment core is stable.
- Auth/session model is stable.
- API contracts are documented and tested.
- Legacy payment records are migrated or safely read-only.
- Team has demonstrated production maintenance ability in the new stack.
Required documentation before refactor
1. Threat Model
Purpose: identify what must be protected and how it can be attacked.
Should include:
- Assets: user accounts, admin accounts, wallet addresses, payment records, funds, webhook secrets, API keys, private notifications.
- Actors: buyer, seller, admin, support, unauthenticated attacker, compromised user, compromised admin, provider, malicious webhook sender.
- Trust boundaries: browser, backend, database, Redis, provider APIs, wallet/RPC, admin UI, Socket.IO.
- Abuse cases: fake payment proof, replayed webhook, arbitrary room join, stolen token, double payout, dispute bypass, email/AI abuse.
2. Funds Ledger Specification
Purpose: make money movement auditable and provider-independent.
Should define:
- Account model per purchase request/order.
- Immutable ledger entry types.
- Derived balance model.
- Gross amount, provider fees, platform fees, held amount, disputed amount, releasable amount, released amount, refunded amount.
- Idempotency keys.
- Reconciliation behavior.
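A minimal sketch of the derived-balance idea is shown below, under several assumptions: the entry types are hypothetical, amounts are plain integers in a smallest unit (a real ledger for token amounts would probably need `bigint` or a decimal type), and the entries passed in all belong to one account.

```typescript
// Hypothetical entry types; the real set belongs in the ledger specification.
type EntryType =
  | "pay_in" | "provider_fee" | "platform_fee"
  | "dispute_hold" | "dispute_release" | "release" | "refund";

interface LedgerEntry {
  readonly id: string;          // doubles as the idempotency key in this sketch
  readonly type: EntryType;
  readonly amountMinor: number; // positive integer, smallest currency unit
}

interface Balance {
  gross: number;
  fees: number;
  disputed: number;
  released: number;
  refunded: number;
  releasable: number;
}

// Balances are never stored as mutable fields; they are folded from entries.
function deriveBalance(entries: readonly LedgerEntry[]): Balance {
  const sum = (...types: EntryType[]): number =>
    entries
      .filter((e) => types.includes(e.type))
      .reduce((total, e) => total + e.amountMinor, 0);
  const gross = sum("pay_in");
  const fees = sum("provider_fee", "platform_fee");
  const disputed = sum("dispute_hold") - sum("dispute_release");
  const released = sum("release");
  const refunded = sum("refund");
  const releasable = Math.max(0, gross - fees - disputed - released - refunded);
  return { gross, fees, disputed, released, refunded, releasable };
}

// Appending returns a new list; existing entries are never edited.
function appendEntry(
  entries: readonly LedgerEntry[],
  entry: LedgerEntry,
): LedgerEntry[] {
  if (entries.some((e) => e.id === entry.id)) return [...entries]; // replayed entry
  if (entry.type === "release" && entry.amountMinor > deriveBalance(entries).releasable) {
    throw new Error("release exceeds releasable amount");
  }
  return [...entries, entry];
}
```

Because a dispute hold reduces `releasable`, a release attempted during a dispute fails the invariant check instead of depending on a status field being updated in time.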
3. Escrow State Machine
Purpose: define legal transitions once.
Should include:
- Purchase request states.
- Payment states.
- Escrow/funds states.
- Dispute states.
- Valid transitions and forbidden transitions.
- Who or what can trigger each transition.
- Required preconditions for release, refund, cancellation, dispute hold, and admin override.
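Defining legal transitions once can be as simple as a lookup table consulted by every code path. The sketch below uses hypothetical states, actors, and transitions purely to show the shape; the real values are what the state machine document needs to decide.

```typescript
// Hypothetical states and actors for illustration.
type EscrowState =
  | "awaiting_payment" | "funded" | "disputed" | "released" | "refunded";
type Actor = "provider_event" | "buyer" | "seller" | "admin";

// For each state: which target states are legal, and who may trigger them.
// Anything not listed is forbidden.
const TRANSITIONS: Record<EscrowState, Partial<Record<EscrowState, Actor[]>>> = {
  awaiting_payment: { funded: ["provider_event"] },
  funded: {
    disputed: ["buyer", "seller"],
    released: ["buyer", "admin"],
    refunded: ["admin"],
  },
  disputed: { released: ["admin"], refunded: ["admin"] },
  released: {}, // terminal
  refunded: {}, // terminal
};

// Single entry point for state changes; throws on any forbidden move.
function transition(from: EscrowState, to: EscrowState, actor: Actor): EscrowState {
  const allowed = TRANSITIONS[from][to];
  if (!allowed || !allowed.includes(actor)) {
    throw new Error(`forbidden transition ${from} -> ${to} by ${actor}`);
  }
  return to;
}
```

In this illustrative table a buyer can release from `funded` but not from `disputed`, which is one way to encode the requirement that a dispute creates an enforceable hold.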
4. Authorization Matrix
Purpose: remove route-by-route ambiguity.
Should map every endpoint and socket event to:
- Public, authenticated, owner, seller, buyer, admin, support, or service role.
- Required ownership checks.
- Required object state.
- Rate-limit tier.
- Audit-log requirement.
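The matrix itself can live as data that middleware consults, which makes gaps visible in review. This is a sketch under assumptions: the routes, the socket event key, and the `RouteRule` fields are hypothetical examples, and the role check shown is only the first gate before ownership and object-state checks.

```typescript
type Role =
  | "public" | "authenticated" | "buyer" | "seller"
  | "admin" | "support" | "service";
type RateTier = "default" | "auth" | "payment" | "webhook";

interface RouteRule {
  roles: Role[];           // any one of these roles passes the role gate
  ownershipCheck: boolean; // caller must also own the target object
  rateTier: RateTier;
  audit: boolean;          // whether the call must be audit-logged
}

// Hypothetical entries; the real matrix should cover every endpoint and socket event.
const MATRIX: Record<string, RouteRule> = {
  "GET /listings": { roles: ["public"], ownershipCheck: false, rateTier: "default", audit: false },
  "POST /payments/:id/release": { roles: ["buyer", "admin"], ownershipCheck: true, rateTier: "payment", audit: true },
  "POST /webhooks/provider": { roles: ["service"], ownershipCheck: false, rateTier: "webhook", audit: true },
  "socket:join-user-room": { roles: ["authenticated"], ownershipCheck: true, rateTier: "default", audit: false },
};

// Deny by default: an endpoint missing from the matrix is never served.
function isAllowed(endpoint: string, callerRoles: Role[]): boolean {
  const rule = MATRIX[endpoint];
  if (!rule) return false;
  if (rule.roles.includes("public")) return true;
  return callerRoles.some((role) => rule.roles.includes(role));
}
```

The deny-by-default lookup is the design choice that matters most here, since it turns an unregistered test or demo route into a refusal rather than an exposure.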
5. Payment Provider Adapter Spec
Purpose: decouple business logic from SHKeeper, Request Network, manual wallet flow, and future providers.
Should define:
- `createPayInIntent`
- `getPayInStatus`
- `handleProviderWebhook`
- `createHostedPaymentLink`
- `createReleaseInstruction`
- `createRefundInstruction`
- `getPayoutStatus`
- `searchProviderPayments`
Provider-specific metadata should be namespaced and never become the canonical funds state.
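As a sketch of how the adapter contract might look in TypeScript, the interface below uses the method names from the list above. Every parameter type, return type, and supporting shape is a hypothetical placeholder, as is the `namespaceMetadata` helper.

```typescript
// Hypothetical supporting shapes.
type ProviderStatus = "pending" | "confirmed" | "failed";

interface PayInIntent { intentId: string; providerRef: string; amountMinor: number; }
interface VerifiedProviderEvent { eventId: string; providerRef: string; status: ProviderStatus; }
interface Instruction { instructionId: string; providerRef: string; amountMinor: number; }

// Method names come from the spec list; signatures are placeholders.
interface PaymentProviderAdapter {
  createPayInIntent(orderId: string, amountMinor: number): Promise<PayInIntent>;
  getPayInStatus(providerRef: string): Promise<ProviderStatus>;
  handleProviderWebhook(rawBody: Uint8Array, headers: Record<string, string>): Promise<VerifiedProviderEvent>;
  createHostedPaymentLink(intentId: string): Promise<string>;
  createReleaseInstruction(orderId: string, amountMinor: number): Promise<Instruction>;
  createRefundInstruction(orderId: string, amountMinor: number): Promise<Instruction>;
  getPayoutStatus(instructionId: string): Promise<ProviderStatus>;
  searchProviderPayments(query: { orderId?: string }): Promise<PayInIntent[]>;
}

// Provider metadata is stored under the provider's own key so it cannot
// collide with, or be mistaken for, canonical funds state.
function namespaceMetadata(
  provider: string,
  metadata: Record<string, unknown>,
): Record<string, Record<string, unknown>> {
  return { [provider]: { ...metadata } };
}
```

Business logic would depend only on `PaymentProviderAdapter`, so adding or replacing a provider means writing one new implementation rather than touching release or refund rules.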
6. Webhook Security Spec
Purpose: prevent forged, replayed, or silently failed provider events.
Should define:
- Raw-body signature verification.
- Accepted headers and algorithms.
- Replay prevention.
- Delivery ID/idempotency handling.
- Unknown payment behavior.
- Duplicate event behavior.
- Retry semantics.
- Dead-letter/replay storage.
- Alerting thresholds.
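Duplicate and unknown-payment handling from the list above could be sketched as a small inbox in front of the business handler. The `WebhookInbox` class, the `ProviderEvent` shape, and the outcome names are hypothetical, and in-memory storage stands in for what would need to be durable, shared storage.

```typescript
type Outcome = "processed" | "duplicate" | "dead_letter";

// Hypothetical event shape, assumed to be produced after signature verification.
interface ProviderEvent {
  deliveryId: string; // provider's unique delivery identifier
  paymentRef: string; // reference to a payment the platform should know
}

class WebhookInbox {
  private seen = new Set<string>();
  private knownPayments: Set<string>;
  readonly deadLetters: ProviderEvent[] = [];

  constructor(knownPayments: Set<string>) {
    this.knownPayments = knownPayments;
  }

  receive(event: ProviderEvent, handle: (e: ProviderEvent) => void): Outcome {
    // A delivery that was already processed is acknowledged but not re-applied.
    if (this.seen.has(event.deliveryId)) return "duplicate";
    // Events for unknown payments are kept for operators instead of dropped.
    if (!this.knownPayments.has(event.paymentRef)) {
      this.deadLetters.push(event);
      return "dead_letter";
    }
    // The delivery is marked only after the handler succeeds, so a thrown
    // error leaves it eligible for the provider's retry.
    handle(event);
    this.seen.add(event.deliveryId);
    return "processed";
  }
}
```

In a real service the handler's effect and the delivery marker would need to be written in one transaction; the sketch only shows the ordering and the three visible outcomes.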
7. Session and Auth Architecture
Purpose: decide how browser sessions should work for a financial platform.
Should define:
- Access token lifetime.
- Refresh token lifetime and rotation.
- Whether tokens move from `localStorage` to `httpOnly` cookies.
- CSRF strategy if cookies are used.
- Passkey/WebAuthn implementation requirements.
- OAuth requirements.
- Device/session revocation.
- Admin step-up authentication for payouts or role changes.
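If the decision is to move tokens into cookies, the cookie attributes carry most of the protection. The sketch below builds a `Set-Cookie` header value by hand to make those attributes explicit; `buildSessionCookie` and its options are hypothetical, and a framework's cookie helper would normally be used instead.

```typescript
interface SessionCookieOptions {
  name: string;
  value: string;
  maxAgeSeconds: number;
}

// HttpOnly keeps the token out of reach of page scripts, Secure keeps it
// off plain HTTP, and SameSite=Strict withholds it from cross-site requests.
function buildSessionCookie(opts: SessionCookieOptions): string {
  return [
    `${opts.name}=${encodeURIComponent(opts.value)}`,
    "HttpOnly",
    "Secure",
    "SameSite=Strict",
    "Path=/",
    `Max-Age=${opts.maxAgeSeconds}`,
  ].join("; ");
}
```

`SameSite` narrows CSRF exposure but is usually treated as one layer rather than the whole CSRF strategy, which is why the list above asks for that strategy to be defined separately.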
8. Realtime Authorization Spec
Purpose: make Socket.IO events subject to the same security model as REST.
Should define:
- Socket handshake authentication.
- Server-derived room membership.
- Which rooms exist.
- Who may join each room.
- Whether room membership changes with request/payment/dispute state.
- Event payload privacy rules.
9. Migration Plan
Purpose: avoid breaking current payments and historical records.
Should include:
- SHKeeper legacy read path.
- New provider feature flag.
- Ledger backfill strategy.
- Data validation report before enforcement.
- Rollback criteria.
- Cutover date for old webhook routes.
- Operator manual reconciliation workflow.
10. Secure Build and Supply-Chain Policy
Purpose: reduce npm and dependency compromise risk.
Should define:
- Package manager and lockfile policy.
- CI install mode.
- Dependency update cadence.
- Security advisory monitoring.
- npm provenance/signature policy where available.
- Secrets handling.
- Production build reproducibility.
- Separation of frontend npm risk from backend core risk.
11. Operational Runbooks
Purpose: make security incidents and payment failures survivable.
Should include:
- Failed webhook.
- Duplicate payment.
- Missing payment.
- Stuck release.
- Disputed release attempt.
- Compromised admin.
- Leaked API key.
- Provider outage.
- Chain/RPC outage.
- Suspicious payment proof.
- npm/package compromise.
Decision framework
Use the following questions before choosing a rewrite:
- Is the current goal safe launch, or long-term platform rebuild?
- Is the team willing to delay feature work for a payment-core redesign?
- Can the team maintain Go/Kotlin/Rust in production?
- Is the biggest current risk supply chain, or incorrect money movement?
- Are admin actions trusted, or should high-risk actions require step-up approval?
- Should Amanat custody funds, or should the provider/payment network hold or route them?
- Are disputes central to the product, or rare manual exceptions?
- Is auditability a regulatory/business requirement or only an internal safety goal?
Recommended decision
Near term:
- Harden the current Express backend.
- Disable unsafe production routes.
- Add centralized authorization and rate limiting.
- Fix Web3 verification.
- Fix Socket.IO authorization.
- Disable passkeys unless implemented with real WebAuthn.
- Begin ledger/state-machine documentation immediately.
Medium term:
- Build a provider-neutral payment and funds layer.
- Add immutable ledger entries.
- Move release/refund/dispute-hold checks into the central payment/funds service.
- Keep SHKeeper compatibility read-only for legacy records.
- Add Request Network or another provider behind the adapter if desired.
Long term:
- Rewrite the payment/auth/escrow core in Go or Kotlin/Java if the team can support it.
- Do not rewrite the entire backend until the core is proven.
- Keep lower-risk modules in TypeScript until there is a business or operational reason to migrate them.
Open questions for leadership and engineering
- Is launch timeline more important than a full payment/funds redesign?
- Should passkeys be removed from launch scope until production-grade WebAuthn is implemented?
- Should browser auth move to `httpOnly` cookies even if that requires CSRF work and frontend changes?
- Should every payout require admin step-up authentication or two-person approval?
- Should Amanat keep funds in a platform-controlled escrow wallet, or should provider-mediated payment pages become the default?
- Is Request Network a desired provider migration, or just one option being explored?
- What new backend stack can the team realistically operate for the next two years?
- What is the acceptable level of temporary dual-stack complexity during migration?
- Do we need formal external penetration testing before public launch?
- Who owns security decisions: product, backend, DevOps, or a dedicated security owner?
Relationship to existing docs
This assessment complements:
- Platform Logical Audit - 2026-05-24
- PRD - Platform Audit Remediation Plan (2026-05-24)
- PRD - Request Network Migration and Funds Management
- Security Architecture
- Payment Flow - SHKeeper
- Payment Flow - DePay & Web3
- Escrow Flow
- Dispute Flow
The existing remediation PRD is the tactical hardening plan. This document is the strategic backend-stack and refactor assessment.