Complete task 4 backend security architecture docs
117
09 - Audits/Backend Core Stack Decision Record - 2026-05-24.md
Normal file
@@ -0,0 +1,117 @@
---
title: Backend Core Stack Decision Record - 2026-05-24
tags: [adr, architecture, backend]
created: 2026-05-24
status: approved
reviewers: [CTO, backend, security]
---

# Backend Core Stack Decision Record - 2026-05-24

## 1. Decision

Keep the security-critical backend core on **TypeScript/Node** for the first 12 months.

Do **not** perform a full greenfield rewrite before the payment/auth/escrow core is fully specified and observable.

## 2. Why this stack (today)

The highest current risk is not framework selection; it is **financial state correctness**.

For the next phase, the team needs:

- a provider-neutral payment abstraction,
- an immutable funds ledger,
- webhook hardening and reconciliation,
- strict dispute hold behavior,
- admin step-up controls,
- production-grade operational runbooks.

Moving to Go/Kotlin/Rust now would leave the existing risks in place while adding migration uncertainty and delaying launch readiness.

TypeScript remains the fastest way to ship the required controls while keeping operational visibility and team velocity.
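
To make the "immutable funds ledger" requirement above more concrete, here is a minimal in-memory sketch of an append-only ledger in TypeScript. The entry shape, the `AppendOnlyLedger` class, and all field names are hypothetical illustrations, not taken from [[Funds Ledger and Escrow State Machine Specification]]; a real implementation would persist entries and enforce immutability at the storage layer as well.

```typescript
// Hypothetical entry kinds and fields, chosen for illustration only.
type LedgerEntryKind = "hold" | "dispute" | "release" | "refund";

interface LedgerEntry {
  readonly id: number; // sequence number assigned on append
  readonly paymentId: string;
  readonly kind: LedgerEntryKind;
  readonly amountMinor: number; // integer minor units (e.g. cents)
  readonly recordedAt: string; // ISO-8601 timestamp supplied by the caller
}

class AppendOnlyLedger {
  private readonly log: LedgerEntry[] = [];

  // Appends a new entry. Existing entries are never updated or deleted;
  // a correction is expressed as a further entry.
  append(
    paymentId: string,
    kind: LedgerEntryKind,
    amountMinor: number,
    recordedAt: string,
  ): LedgerEntry {
    if (!Number.isInteger(amountMinor) || amountMinor <= 0) {
      throw new Error("amountMinor must be a positive integer");
    }
    const entry: LedgerEntry = Object.freeze({
      id: this.log.length + 1,
      paymentId,
      kind,
      amountMinor,
      recordedAt,
    });
    this.log.push(entry);
    return entry;
  }

  // Returns a new array, so callers cannot reorder or truncate the internal log.
  entriesFor(paymentId: string): readonly LedgerEntry[] {
    return this.log.filter((e) => e.paymentId === paymentId);
  }
}
```

The class exposes no update or delete method, and each entry is frozen, so the only way to change the recorded state of a payment in this sketch is to append another entry.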

## 3. Scope of extraction

The “backend core” now means:

- `Payment` orchestration and payout/release state transitions,
- auth/session validation for financial actions,
- webhook intake and reconciliation,
- ledger-derived escrow eligibility checks,
- high-risk admin operations (payout/refund/adjustment).

These modules stay within the same service boundary during the in-place migration, but all calls must go through:

- [[Payment Provider Adapter Spec]]
- [[Webhook Security Spec]]
- [[Funds Ledger and Escrow State Machine Specification]]

Non-core modules remain where they are:

- marketplace browsing, templates, shop settings,
- chat and notifications,
- file uploads/downloads.
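
As an illustration of the rule above that core calls go through an adapter contract rather than a provider SDK directly, the following TypeScript sketch shows one possible shape for a provider-neutral interface and a registry. Every name here (`PaymentProviderAdapter`, `ChargeRequest`, `AdapterRegistry`, `FakeProvider`) is an assumption made for the example; the actual contract is whatever [[Payment Provider Adapter Spec]] defines.

```typescript
// Hypothetical request/response shapes; not taken from the adapter spec.
interface ChargeRequest {
  paymentId: string;
  amountMinor: number; // integer minor units
  currency: string;
  idempotencyKey: string;
}

interface ChargeResult {
  providerRef: string;
  status: "authorized" | "failed";
}

// The only surface the backend core is allowed to call in this sketch.
interface PaymentProviderAdapter {
  readonly name: string;
  charge(req: ChargeRequest): Promise<ChargeResult>;
}

// Central lookup, so routing never reaches a provider that was not registered.
class AdapterRegistry {
  private readonly adapters = new Map<string, PaymentProviderAdapter>();

  register(adapter: PaymentProviderAdapter): void {
    if (this.adapters.has(adapter.name)) {
      throw new Error(`adapter already registered: ${adapter.name}`);
    }
    this.adapters.set(adapter.name, adapter);
  }

  get(name: string): PaymentProviderAdapter {
    const adapter = this.adapters.get(name);
    if (adapter === undefined) {
      throw new Error(`no adapter registered for provider: ${name}`);
    }
    return adapter;
  }
}

// In-memory stand-in, useful for adapter-level contract tests.
class FakeProvider implements PaymentProviderAdapter {
  readonly name = "fake";

  async charge(req: ChargeRequest): Promise<ChargeResult> {
    return {
      providerRef: `fake-${req.idempotencyKey}`,
      status: req.amountMinor > 0 ? "authorized" : "failed",
    };
  }
}
```

Because callers depend only on the interface, a new provider can be added by registering another adapter, and the same contract tests can be run against each implementation.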

## 4. Evaluation of alternatives

### Go

- **Pros:** smaller runtime and dependency surface, better static guarantees.
- **Cons:** highest immediate migration cost, new operational tooling, delayed delivery of core money-movement correctness.

### Kotlin/Java

- **Pros:** strong enterprise ecosystem, mature auth/security libraries.
- **Cons:** heavier stack and slower delivery for a small team.

### Rust

- **Pros:** high correctness potential.
- **Cons:** steep delivery cost and limited team familiarity.

### Keep TypeScript (selected)

- **Pros:** existing team velocity, reduced migration risk, direct integration with current deployment and frontend contracts.
- **Cons:** npm supply-chain risk remains; mitigated by [[Secure Build and Supply-Chain Policy]] and strict dependency policy.

## 5. Migration and rollout plan

1. **Phase A (this quarter):** lock down high-risk flows in TypeScript (ledger, adapter, webhook, auth/session, runbooks).
2. **Phase B (next two quarters):** extract core services behind stable interfaces and add adapter-level contract tests.
3. **Phase C (deferred):** evaluate a Go/Kotlin pilot for the payout and webhook worker only if:
   - Phases A and B have been stable for 60 days,
   - team staffing supports dual-stack operations,
   - audit requirements demand lower runtime dependency exposure.
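
The Phase C entry conditions can be written down as an explicit check. The sketch below assumes that all three conditions must hold together, which the list implies but does not state; the `PhaseCInputs` shape and the `phaseCBlockers` function are hypothetical names introduced for this example.

```typescript
// Hypothetical inputs; how "stable" is measured is left to the runbooks.
interface PhaseCInputs {
  stableDays: number; // consecutive days Phases A and B have been stable
  staffingSupportsDualStack: boolean;
  auditRequiresLowerDependencyExposure: boolean;
}

const REQUIRED_STABLE_DAYS = 60; // from the 60-day condition in Phase C

// Returns the unmet conditions; an empty array means the pilot may be evaluated.
// Assumption: all three conditions are required (logical AND).
function phaseCBlockers(inputs: PhaseCInputs): string[] {
  const blockers: string[] = [];
  if (inputs.stableDays < REQUIRED_STABLE_DAYS) {
    blockers.push(
      `Phases A and B stable for ${inputs.stableDays} of ${REQUIRED_STABLE_DAYS} days`,
    );
  }
  if (!inputs.staffingSupportsDualStack) {
    blockers.push("team staffing does not support dual-stack operations");
  }
  if (!inputs.auditRequiresLowerDependencyExposure) {
    blockers.push("no audit requirement for lower runtime dependency exposure");
  }
  return blockers;
}
```

Returning the list of unmet conditions, rather than a single boolean, keeps the reason for a deferral visible when the check is recorded in a review.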

## 6. Non-goals

- Full frontend rewrite.
- Migration to a new language before closed-loop reconciliation and signed-state invariants are in place.
- New provider support that bypasses the adapter contract.

## 7. Rollback criteria

- Any incident rate exceeding baseline +20% for 24h after migration activity.
- Any unresolved ledger invariant violation (held + disputed + released + refunded mismatch).
- Any provider outage recovery that requires non-operator-tuned workarounds.
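
The first two triggers above are mechanical enough to sketch as checks. The ADR does not say what the four buckets are compared against, so this example assumes they must sum to the total funded amount for a payment; that assumption, along with the `EscrowBuckets` shape and both function names, is hypothetical.

```typescript
// Hypothetical per-payment bucket totals, in integer minor units.
interface EscrowBuckets {
  heldMinor: number;
  disputedMinor: number;
  releasedMinor: number;
  refundedMinor: number;
}

// Assumed invariant: held + disputed + released + refunded === funded.
// Negative or non-integer bucket values also count as a violation.
function ledgerInvariantHolds(fundedMinor: number, b: EscrowBuckets): boolean {
  const values = [b.heldMinor, b.disputedMinor, b.releasedMinor, b.refundedMinor];
  if (values.some((v) => !Number.isInteger(v) || v < 0)) {
    return false;
  }
  const total = values.reduce((sum, v) => sum + v, 0);
  return total === fundedMinor;
}

// True when the observed incident rate is more than 20% above baseline.
// Both rates must use the same unit (e.g. incidents per 24h window).
function incidentRateBreached(baselineRate: number, observedRate: number): boolean {
  return observedRate > baselineRate * 1.2;
}
```

Using integer minor units keeps the equality check exact; the same comparison on floating-point currency amounts could report spurious mismatches.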

To roll back to the prior TypeScript implementation:

- disable any split-deployment feature flag,
- switch `PAYMENT_ENABLED_PROVIDERS` back to legacy-only,
- freeze new provider routing until incident review is complete,
- record the post-incident update in this ADR.
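
A rollback test can assert that the provider setting really is back to legacy-only. The format of `PAYMENT_ENABLED_PROVIDERS` is not specified here, so the sketch below assumes a comma-separated list of provider names and a legacy provider called `legacy`; both are assumptions, as are the function names.

```typescript
// Assumed format: comma-separated provider names, e.g. "legacy,provider-b".
function parseEnabledProviders(raw: string | undefined): string[] {
  return (raw ?? "")
    .split(",")
    .map((name) => name.trim().toLowerCase())
    .filter((name) => name.length > 0);
}

// True only when exactly one provider is enabled and it is the legacy one.
// The default name "legacy" is a placeholder for the real identifier.
function isLegacyOnly(raw: string | undefined, legacyName: string = "legacy"): boolean {
  const providers = parseEnabledProviders(raw);
  return providers.length === 1 && providers[0] === legacyName;
}
```

An unset or empty value is treated as "not legacy-only" in this sketch, so a missing variable fails the rollback check instead of passing it silently.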

## 8. Ownership

- **CTO:** final stack decision + dual-stack approvals.
- **Backend Lead (BL):** contract and adapter enforcement.
- **Security Lead (SL):** webhook/security acceptance criteria.
- **DevOps Lead (DL):** deployment safety and rollback testing.

## Related

- [[Backend Stack Security and Refactor Assessment - 2026-05-24]]
- [[Secure Build and Supply-Chain Policy]]
- [[Backend Funds Migration and Operational Runbooks]]