Files
nick-doc/01 - Architecture/Database Strategy - Mongo vs Postgres Assessment.md
Siavash Sameni 825d7870b3 Add Mongo vs Postgres database-strategy assessment
Records the current recommendation (stay on Mongo + targeted hardening),
the realistic full-migration cost (3.5–6 months), and the trigger
conditions under which we should revisit the decision. Prompted by the
multi-seller orphan-payment bug on 2026-05-28 — exactly the FK-shaped
class of bug Postgres would prevent, but not by itself worth a migration.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-28 19:13:50 +04:00

133 lines
9.8 KiB
Markdown
Raw Blame History

This file contains ambiguous Unicode characters
This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.
# Database Strategy — Mongo vs Postgres Assessment
**Status:** Living assessment. Not a decision yet. Written 2026-05-28.
**Owner:** nick + claude
**Decision deadline:** Open. Re-evaluate when one of the trigger conditions below fires.
---
## TL;DR
Amanat runs on MongoDB (primary store) + Redis (cache/sessions/rate limits). For an escrow product that moves money, Postgres would be the structurally better fit — FK constraints, ACID across rows, mature audit/reporting tooling. But a full migration today is a **36 month, single-engineer-equivalent project with high schedule risk** and zero user-visible value during the cutover.
**Current recommendation:** Don't migrate. Pay down the specific weaknesses Mongo creates (cross-collection consistency, audit trails, FK-shaped bugs) with targeted in-place hardening. Revisit the decision when one of the trigger conditions below fires.
---
## What we run today
| Store | Use | Notes |
|---|---|---|
| MongoDB (Mongoose 8.x) | Primary store — all domain data | 22 models, ~454 query call sites across 171 backend TS files |
| Redis | Sessions, cache, rate limits (paymentLimiter etc.) | Not in scope for any migration. Keep as-is either way. |
### Mongoose models (22)
Ranked by how naturally they map to a relational schema:
| Tier | Models | Relational fit |
|---|---|---|
| **Core financial** | `Payment`, `FundsLedgerEntry`, `PurchaseRequest`, `DerivedDestination`, `Dispute` | Strong. These are where FK constraints + ACID earn their keep. The orphan-payment deletion bug we hit on 2026-05-28 (`provider:` filter missing) lives here — an FK would have prevented it structurally. |
| **Marketplace** | `SellerOffer`, `RequestTemplate`, `Category`, `Address`, `Review` | Strong. Already relational in shape. |
| **Identity** | `User`, `TelegramLink`, `TelegramSession`, `TempVerification`, `TrezorAccount` | Strong. Clean 1-to-many. |
| **Document-shaped** | `Chat`, `Notification`, `BlogPost`, `PointTransaction`, `LevelConfig`, `ShopSettings` | Weak. Chat especially — message arrays prefer either Mongo or Postgres JSONB. |
### Mongo-specific patterns we lean on
These are the patterns that get expensive to migrate:
- **Atomic upsert counters** — `Counter.findByIdAndUpdate({_id:'derived_destination_index'}, {$inc:{seq:1}}, {new:true, upsert:true})` in `derivedDestinations.ts`. Postgres equivalent is a `SERIAL` column or `nextval('seq')`, trivial — but every existing call site has to change.
- **Embedded `metadata` blobs** — `Payment.metadata.requestNetworkData`, `.derivedDestination`, `.transactionSafety`. Used heavily for RN raw payloads and per-payment overrides. Two migration paths in Postgres: JSONB column (cheap, loses indexed query-ability) or normalized side tables (lots of work, lots of joins).
- **Single-document atomicity assumption** — `grep -rE 'startSession|withTransaction'` finds **1 file** in the codebase using Mongo transactions. The remaining ~454 query sites implicitly rely on single-document atomicity. Going relational forces explicit transaction demarcation everywhere money moves; this is where post-migration bugs hide.
- **Aggregation pipelines** — 11 files use `.aggregate()`. Each is a custom rewrite to SQL.
---
## Cost of a full migration
One-engineer-equivalent, full-time, not parallel with feature work:
| Phase | Scope | Estimate |
|---|---|---|
| Schema design + ERD | 22 models → relational schema, decide JSONB vs normalized for each `metadata` field | 12 weeks |
| ORM swap (Prisma/Drizzle/TypeORM) | Rewrite 22 models, 454 query sites. ~80% mechanical, ~20% (aggregations, atomic upserts) need genuine rethinking | 610 weeks |
| Data backfill scripts | Mongo → Postgres ETL per collection. ObjectId → uuid/int FK resolution, embedded subdoc unrolling | 23 weeks |
| Cutover infra | Dual-write window, shadow reads, rollback plan, point-in-time backups | 12 weeks |
| Test fix-up | 36 backend test files mock/seed Mongo; rewrite harness, fixtures, in-memory DB | 23 weeks |
| Stabilization | Production incidents you didn't predict; the long tail | 24 weeks |
| **Total** | | **1424 weeks (3.56 months)** |
### Multipliers specific to this codebase
- Only 1 file uses Mongo transactions today → most boundaries are implicit. Going relational means *finding* and explicitly wrapping every multi-row money operation. High bug yield.
- Heavy `metadata` blob usage → either lose query-ability (JSONB) or pay normalization cost (side tables + joins everywhere).
- Multiple agents (nick + claude + kimi + moojttaba) commit weekly. A 4-month migration branch will rot constantly; rebasing it against a fast-moving main is a tax on every other feature.
- 36 test files all assume Mongo. Either keep both DBs in CI during transition, or rewrite the whole test harness up front.
---
## What we'd actually gain
Honest accounting:
| Win | Real value |
|---|---|
| FK constraints | Would have caught the 2026-05-28 orphan-payment bug (Payment cleanup with missing `provider:` filter). Will catch similar bugs in the future. |
| Multi-row ACID | Real value for escrow release + dispute resolution + payment-to-request creation. Today these rely on app-level invariants. |
| Audit / financial reporting | SQL is much friendlier for accountants, auditors, and ad-hoc analytical queries. |
| Mature tooling | pg_dump, point-in-time recovery, logical replication, Metabase/Superset integration. |
| Hiring | More backend engineers know SQL well than Mongo well. |
| Non-win (claimed but not real) | Why it doesn't materialize |
|---|---|
| "Better performance" | Mongo handles this app's load fine; we're nowhere near needing it to scale further. |
| "Better schemas" | Mongoose already enforces schemas at the app layer. The structural integrity gain is FKs, not types. |
| "Fewer bugs" | Most bugs we've hit (`rn_webhook_event_field`, `backend_rate_limits`, `woodpecker_silent_build_fail`, telegram parse_mode) are application logic, not DB choice. Postgres wouldn't have caught any of them. |
---
## The structurally better path: targeted hardening (~2 weeks)
Get most of the relational wins without the migration:
1. **Append-only ledger as source of truth.** Promote `FundsLedgerEntry` (or a new collection) to the authoritative record of every money movement. Strict invariants enforced in a single service. Becomes the audit log accountants and disputes consume.
2. **Explicit transaction boundaries.** Identify the ~5 places where multi-collection atomicity actually matters: Payment + PurchaseRequest creation, escrow release, dispute resolution, sweep + DerivedDestination update, refund. Wrap each in `mongoose.startSession() + session.withTransaction(...)`. This requires Mongo to be a replica set in prod (which it already is for our deployment).
3. **App-layer FK enforcement.** Mongoose `pre('save')` and `pre('deleteOne')` hooks that verify referenced documents exist before mutating. Catches the orphan-deletion class of bug. Cheap.
4. **Cleanup-query lint.** Codify the [[feedback-payment-cleanup-provider-filter]] rule: any `Payment.find()/.deleteMany()/.updateMany()` over the payments collection without a `provider:` filter is a bug. Custom ESLint rule or just a grep in CI.
Estimated cost: ~2 weeks. Catches the bugs that actually hurt. Leaves the migration option open.
---
## When to revisit (trigger conditions)
Pull this doc out and re-evaluate when **any** of these fires:
1. **Compliance / audit requirement** — a regulator, payment partner, or auditor demands a relational ledger we can't easily produce from Mongo.
2. **Schema-flexibility cost has gone to zero** — feature velocity is no longer dominated by changing the shape of `Payment.metadata`, `RequestTemplate`, `PurchaseRequest`. If the schema has stabilized, the migration's main friction (rewriting too many evolving entities) is gone.
3. **The bug pattern has repeated** — we hit ≥3 incidents shaped like "missing referential integrity" or "no cross-collection transaction" within 6 months. Then the targeted hardening above wasn't enough and migration starts paying for itself.
4. **A green-field rewrite is happening anyway** — e.g. a major v2 architecture refactor, microservice split, or rewrite of the payments subsystem. Combine the migration with that work; don't do it standalone.
5. **Reporting needs blow up** — finance/ops team wants live SQL-driven dashboards and our Mongo aggregation pipelines + Metabase plugins can't keep up.
If none of the above fires, **stay on Mongo**.
---
## If we ever do migrate — order of operations
For when the trigger condition fires. Don't do it standalone — pair it with another large refactor.
1. Start with the **financial-tier models only** (Payment, FundsLedgerEntry, PurchaseRequest, DerivedDestination, Dispute). These are 5 of 22 models. Dual-store: Postgres for these, Mongo for the rest, with a sync layer or service-per-store boundary.
2. Validate for 3+ months on dev + prod-shadow before any cutover.
3. Migrate the marketplace + identity tiers next (10 more models). Document-shaped models (Chat, Notification, etc.) probably never need to migrate — they're happier in Mongo or as Postgres JSONB.
4. Use Drizzle or Prisma. Prefer Drizzle if you want migrations-as-code and don't want a heavy runtime; Prisma if the team prefers a higher-level abstraction.
5. **Don't** dual-write the same record. Pick one source of truth per model and don't compromise on it.
---
## Related
- [[feedback-payment-cleanup-provider-filter]] — the bug that prompted this discussion (Payment cleanup missing `provider:` filter destroyed multi-seller cart records).
- `PRD - Wallet, Multichain, Confirmations, AML, Trezor.md` — Task #7 (derived destinations) is the most Mongo-shaped feature we've shipped recently; reference for how atomic upserts and embedded metadata are used.
- `01 - Architecture/Request Network In-House Checkout.md` — RN integration relies heavily on `Payment.metadata.requestNetworkData` blob storage.