diff --git a/01 - Architecture/Database Strategy - Mongo vs Postgres Assessment.md b/01 - Architecture/Database Strategy - Mongo vs Postgres Assessment.md
new file mode 100644
index 0000000..ea1106d
--- /dev/null
+++ b/01 - Architecture/Database Strategy - Mongo vs Postgres Assessment.md
@@ -0,0 +1,132 @@
+# Database Strategy — Mongo vs Postgres Assessment
+
+**Status:** Living assessment. Not a decision yet. Written 2026-05-28.
+**Owner:** nick + claude
+**Decision deadline:** Open. Re-evaluate when one of the trigger conditions below fires.
+
+---
+
+## TL;DR
+
+Amanat runs on MongoDB (primary store) + Redis (cache/sessions/rate limits). For an escrow product that moves money, Postgres would be the structurally better fit — FK constraints, ACID across rows, mature audit/reporting tooling. But a full migration today is a **3–6 month, single-engineer-equivalent project with high schedule risk** and zero user-visible value during the cutover.
+
+**Current recommendation:** Don't migrate. Pay down the specific weaknesses Mongo creates (cross-collection consistency, audit trails, FK-shaped bugs) with targeted in-place hardening. Revisit the decision when one of the trigger conditions below fires.
+
+---
+
+## What we run today
+
+| Store | Use | Notes |
+|---|---|---|
+| MongoDB (Mongoose 8.x) | Primary store — all domain data | 22 models, ~454 query call sites across 171 backend TS files |
+| Redis | Sessions, cache, rate limits (paymentLimiter etc.) | Not in scope for any migration. Keep as-is either way. |
+
+### Mongoose models (22)
+
+Ranked by how naturally they map to a relational schema:
+
+| Tier | Models | Relational fit |
+|---|---|---|
+| **Core financial** | `Payment`, `FundsLedgerEntry`, `PurchaseRequest`, `DerivedDestination`, `Dispute` | Strong. These are where FK constraints + ACID earn their keep. The orphan-payment deletion bug we hit on 2026-05-28 (`provider:` filter missing) lives here — an FK would have prevented it structurally.
|
+| **Marketplace** | `SellerOffer`, `RequestTemplate`, `Category`, `Address`, `Review` | Strong. Already relational in shape. |
+| **Identity** | `User`, `TelegramLink`, `TelegramSession`, `TempVerification`, `TrezorAccount` | Strong. Clean 1-to-many. |
+| **Document-shaped** | `Chat`, `Notification`, `BlogPost`, `PointTransaction`, `LevelConfig`, `ShopSettings` | Weak. Chat especially — message arrays prefer either Mongo or Postgres JSONB. |
+
+### Mongo-specific patterns we lean on
+
+These are the patterns that get expensive to migrate:
+
+- **Atomic upsert counters** — `Counter.findByIdAndUpdate({_id:'derived_destination_index'}, {$inc:{seq:1}}, {new:true, upsert:true})` in `derivedDestinations.ts`. The Postgres equivalent (a `SERIAL` column or `nextval('seq')`) is trivial, but every existing call site has to change.
+- **Embedded `metadata` blobs** — `Payment.metadata.requestNetworkData`, `.derivedDestination`, `.transactionSafety`. Used heavily for RN raw payloads and per-payment overrides. Two migration paths in Postgres: a JSONB column (cheap, but no FKs or column constraints inside the blob, and queries on its fields need GIN or expression indexes) or normalized side tables (lots of work, lots of joins).
+- **Single-document atomicity assumption** — `grep -rE 'startSession|withTransaction'` finds **1 file** in the codebase using Mongo transactions. The remaining ~454 query sites implicitly rely on single-document atomicity. Going relational forces explicit transaction demarcation everywhere money moves; this is where post-migration bugs hide.
+- **Aggregation pipelines** — 11 files use `.aggregate()`. Each is a custom rewrite to SQL.
+
+---
+
+## Cost of a full migration
+
+One-engineer-equivalent, full-time, not parallel with feature work:
+
+| Phase | Scope | Estimate |
+|---|---|---|
+| Schema design + ERD | 22 models → relational schema, decide JSONB vs normalized for each `metadata` field | 1–2 weeks |
+| ORM swap (Prisma/Drizzle/TypeORM) | Rewrite 22 models, 454 query sites.
~80% mechanical, ~20% (aggregations, atomic upserts) need genuine rethinking | 6–10 weeks |
+| Data backfill scripts | Mongo → Postgres ETL per collection. ObjectId → uuid/int FK resolution, embedded subdoc unrolling | 2–3 weeks |
+| Cutover infra | Dual-write window, shadow reads, rollback plan, point-in-time backups | 1–2 weeks |
+| Test fix-up | 36 backend test files mock/seed Mongo; rewrite harness, fixtures, in-memory DB | 2–3 weeks |
+| Stabilization | Production incidents you didn't predict; the long tail | 2–4 weeks |
+| **Total** | | **14–24 weeks (roughly 3–6 months)** |
+
+### Multipliers specific to this codebase
+
+- Only 1 file uses Mongo transactions today → most boundaries are implicit. Going relational means *finding* and explicitly wrapping every multi-row money operation. High bug yield.
+- Heavy `metadata` blob usage → either accept weaker typing and constraints (JSONB) or pay the normalization cost (side tables + joins everywhere).
+- Multiple agents (nick + claude + kimi + moojttaba) commit weekly. A 4-month migration branch will rot constantly; rebasing it against a fast-moving main is a tax on every other feature.
+- 36 test files all assume Mongo. Either keep both DBs in CI during transition, or rewrite the whole test harness up front.
+
+---
+
+## What we'd actually gain
+
+Honest accounting:
+
+| Win | Real value |
+|---|---|
+| FK constraints | Would have caught the 2026-05-28 orphan-payment bug (Payment cleanup with missing `provider:` filter). Will catch similar bugs in the future. |
+| Multi-row ACID | Real value for escrow release + dispute resolution + payment-to-request creation. Today these rely on app-level invariants. |
+| Audit / financial reporting | SQL is much friendlier for accountants, auditors, and ad-hoc analytical queries. |
+| Mature tooling | pg_dump, point-in-time recovery, logical replication, Metabase/Superset integration. |
+| Hiring | More backend engineers know SQL well than Mongo well.
|
+
+| Non-win (claimed but not real) | Why it doesn't materialize |
+|---|---|
+| "Better performance" | Mongo handles this app's load fine; we're nowhere near needing it to scale further. |
+| "Better schemas" | Mongoose already enforces schemas at the app layer. The structural integrity gain is FKs, not types. |
+| "Fewer bugs" | Most bugs we've hit (`rn_webhook_event_field`, `backend_rate_limits`, `woodpecker_silent_build_fail`, telegram parse_mode) are application logic, not DB choice. Postgres wouldn't have caught any of them. |
+
+---
+
+## The structurally better path: targeted hardening (~2 weeks)
+
+Get most of the relational wins without the migration:
+
+1. **Append-only ledger as source of truth.** Promote `FundsLedgerEntry` (or a new collection) to the authoritative record of every money movement. Strict invariants enforced in a single service. It becomes the audit log that accountants and dispute handling consume.
+2. **Explicit transaction boundaries.** Identify the ~5 places where multi-collection atomicity actually matters: Payment + PurchaseRequest creation, escrow release, dispute resolution, sweep + DerivedDestination update, refund. Wrap each in `mongoose.startSession() + session.withTransaction(...)`. This requires Mongo to be a replica set in prod (which it already is for our deployment).
+3. **App-layer FK enforcement.** Mongoose `pre('save')` and `pre('deleteOne')` hooks that verify referenced documents exist before mutating. Catches the orphan-deletion class of bug for single-document writes; bulk `deleteMany()`/`updateMany()` calls do not trigger these hooks and need their own query middleware. Cheap.
+4. **Cleanup-query lint.** Codify the [[feedback-payment-cleanup-provider-filter]] rule: any `Payment.find()/.deleteMany()/.updateMany()` over the payments collection without a `provider:` filter is a bug. Custom ESLint rule or just a grep in CI.
+
+Estimated cost: ~2 weeks. Catches the bugs that actually hurt. Leaves the migration option open.
+
+---
+
+## When to revisit (trigger conditions)
+
+Pull this doc out and re-evaluate when **any** of these fires:
+
+1.
**Compliance / audit requirement** — a regulator, payment partner, or auditor demands a relational ledger we can't easily produce from Mongo.
+2. **Schema-flexibility cost has gone to zero** — feature velocity is no longer dominated by changing the shape of `Payment.metadata`, `RequestTemplate`, `PurchaseRequest`. If the schema has stabilized, the migration's main friction (rewriting too many evolving entities) is gone.
+3. **The bug pattern has repeated** — we hit ≥3 incidents shaped like "missing referential integrity" or "no cross-collection transaction" within 6 months. Then the targeted hardening above wasn't enough and migration starts paying for itself.
+4. **A green-field rewrite is happening anyway** — e.g. a major v2 architecture refactor, microservice split, or rewrite of the payments subsystem. Combine the migration with that work; don't do it standalone.
+5. **Reporting needs blow up** — finance/ops team wants live SQL-driven dashboards and our Mongo aggregation pipelines + Metabase plugins can't keep up.
+
+If none of the above fires, **stay on Mongo**.
+
+---
+
+## If we ever do migrate — order of operations
+
+For when a trigger condition fires. Don't do it standalone — pair it with another large refactor.
+
+1. Start with the **financial-tier models only** (Payment, FundsLedgerEntry, PurchaseRequest, DerivedDestination, Dispute). These are 5 of 22 models. Dual-store: Postgres for these, Mongo for the rest, with a sync layer or service-per-store boundary.
+2. Validate for 3+ months on dev + prod-shadow before any cutover.
+3. Migrate the marketplace + identity tiers next (10 more models). Document-shaped models (Chat, Notification, etc.) probably never need to migrate — they're happier in Mongo or as Postgres JSONB.
+4. Use Drizzle or Prisma. Prefer Drizzle if you want migrations-as-code and don't want a heavy runtime; Prisma if the team prefers a higher-level abstraction.
+5. **Don't** dual-write the same record.
Pick one source of truth per model and don't compromise on it.
+
+---
+
+## Related
+
+- [[feedback-payment-cleanup-provider-filter]] — the bug that prompted this discussion (Payment cleanup missing `provider:` filter destroyed multi-seller cart records).
+- `PRD - Wallet, Multichain, Confirmations, AML, Trezor.md` — Task #7 (derived destinations) is the most Mongo-shaped feature we've shipped recently; reference for how atomic upserts and embedded metadata are used.
+- `01 - Architecture/Request Network In-House Checkout.md` — RN integration relies heavily on `Payment.metadata.requestNetworkData` blob storage.
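
The cleanup-query lint from the targeted-hardening list ("any `Payment.find()/.deleteMany()/.updateMany()` without a `provider:` filter is a bug") could be prototyped as a plain text scan before investing in an ESLint rule. The sketch below is one possible shape, not the project's actual check: the names `Finding`, `firstArgument`, and `findUnscopedPaymentCalls` are hypothetical, the `'rn'` provider value in the sample is made up, and the bracket matching is a heuristic that does not understand string literals or comments.

```typescript
// Hypothetical CI check: flag Payment bulk calls whose filter never mentions `provider`.
interface Finding {
  line: number; // 1-based line number of the call
  call: string; // "find", "deleteMany", or "updateMany"
}

// Return the text of the first call argument, given the index of the opening parenthesis.
// Heuristic only: brackets inside string literals or comments would confuse it.
function firstArgument(source: string, openParen: number): string {
  let depth = 0;
  for (let i = openParen; i < source.length; i++) {
    const ch = source[i];
    if (ch === "(" || ch === "{" || ch === "[") {
      depth++;
    } else if (ch === ")" || ch === "}" || ch === "]") {
      depth--;
      if (depth === 0) return source.slice(openParen + 1, i); // end of the call
    } else if (ch === "," && depth === 1) {
      return source.slice(openParen + 1, i); // top-level comma ends the filter argument
    }
  }
  return source.slice(openParen + 1); // unbalanced input: return the rest
}

function findUnscopedPaymentCalls(source: string): Finding[] {
  const findings: Finding[] = [];
  const pattern = /\bPayment\.(find|deleteMany|updateMany)\(/g;
  let match: RegExpExecArray | null;
  while ((match = pattern.exec(source)) !== null) {
    const openParen = match.index + match[0].length - 1;
    const filter = firstArgument(source, openParen);
    if (!/\bprovider\b/.test(filter)) {
      const line = source.slice(0, match.index).split("\n").length;
      findings.push({ line, call: match[1] ?? "" });
    }
  }
  return findings;
}

// Usage: only the second call scopes its filter by provider.
const sample = [
  "await Payment.deleteMany({ status: 'expired' });",
  "await Payment.deleteMany({ provider: 'rn', status: 'expired' });",
  "await Payment.updateMany({ status: 'x' }, { $set: { provider: 'rn' } });",
].join("\n");

console.log(findUnscopedPaymentCalls(sample));
```

On the sample, the first and third calls are flagged. The third is flagged because only the filter argument is inspected, so a `provider` key in the update document does not count. An AST-based ESLint rule would be more robust against formatting and aliasing; this version only shows that the rule is cheap to enforce mechanically.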