Siavash Sameni 825d7870b3 Add Mongo vs Postgres database-strategy assessment

Records the current recommendation (stay on Mongo + targeted hardening),
the realistic full-migration cost (3.5–6 months), and the trigger
conditions under which we should revisit the decision. Prompted by the
multi-seller orphan-payment bug on 2026-05-28 — exactly the FK-shaped
class of bug Postgres would prevent, but not by itself worth a migration.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

2026-05-28 19:13:50 +04:00

Database Strategy — Mongo vs Postgres Assessment

Status: Living assessment. Not a decision yet. Written 2026-05-28. Owner: nick + claude. Decision deadline: Open. Re-evaluate when one of the trigger conditions below fires.

TL;DR

Amanat runs on MongoDB (primary store) + Redis (cache/sessions/rate limits). For an escrow product that moves money, Postgres would be the structurally better fit: FK constraints, ACID across rows, mature audit/reporting tooling. But a full migration today is a 3.5–6 month, single-engineer-equivalent project with high schedule risk and zero user-visible value during the cutover.

Current recommendation: Don't migrate. Pay down the specific weaknesses Mongo creates (cross-collection consistency, audit trails, FK-shaped bugs) with targeted in-place hardening. Revisit the decision when one of the trigger conditions below fires.

What we run today

Store	Use	Notes
MongoDB (Mongoose 8.x)	Primary store — all domain data	22 models, ~454 query call sites across 171 backend TS files
Redis	Sessions, cache, rate limits (paymentLimiter etc.)	Not in scope for any migration. Keep as-is either way.

Mongoose models (22)

Ranked by how naturally they map to a relational schema:

Tier	Models	Relational fit
Core financial	`Payment`, `FundsLedgerEntry`, `PurchaseRequest`, `DerivedDestination`, `Dispute`	Strong. These are where FK constraints + ACID earn their keep. The orphan-payment deletion bug we hit on 2026-05-28 (`provider:` filter missing) lives here — an FK would have prevented it structurally.
Marketplace	`SellerOffer`, `RequestTemplate`, `Category`, `Address`, `Review`	Strong. Already relational in shape.
Identity	`User`, `TelegramLink`, `TelegramSession`, `TempVerification`, `TrezorAccount`	Strong. Clean 1-to-many.
Document-shaped	`Chat`, `Notification`, `BlogPost`, `PointTransaction`, `LevelConfig`, `ShopSettings`	Weak. Chat especially — message arrays prefer either Mongo or Postgres JSONB.

Mongo-specific patterns we lean on

These are the patterns that get expensive to migrate:

Atomic upsert counters — Counter.findByIdAndUpdate({_id:'derived_destination_index'}, {$inc:{seq:1}}, {new:true, upsert:true}) in derivedDestinations.ts. Postgres equivalent is a SERIAL column or nextval('seq'), trivial — but every existing call site has to change.
Embedded metadata blobs: Payment.metadata.requestNetworkData, .derivedDestination, .transactionSafety. Used heavily for RN raw payloads and per-payment overrides. Two migration paths in Postgres: a JSONB column (cheap, but fields inside the blob get no FK or column-level constraints, and querying them efficiently needs GIN or expression indexes) or normalized side tables (lots of work, lots of joins).
Single-document atomicity assumption — grep -rE 'startSession|withTransaction' finds 1 file in the codebase using Mongo transactions. The remaining ~454 query sites implicitly rely on single-document atomicity. Going relational forces explicit transaction demarcation everywhere money moves; this is where post-migration bugs hide.
Aggregation pipelines — 11 files use .aggregate(). Each is a custom rewrite to SQL.
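
One way to contain the "every existing call site has to change" cost for counters is to put the sequence behind a small interface before any migration, so that only one implementation changes later. The sketch below is a minimal illustration, not code from the repo: `SequenceSource`, `InMemorySequence`, and `allocateDerivedDestinationIndex` are hypothetical names, and the in-memory class stands in for a real store so the example runs on its own.

```typescript
// Hypothetical abstraction: call sites depend on this interface only.
interface SequenceSource {
  /** Returns the next value of the named sequence. */
  next(name: string): Promise<number>;
}

// In-memory stand-in so the sketch runs without a database.
// A Mongo-backed version would wrap the existing Counter.findByIdAndUpdate
// call with $inc and upsert; a Postgres-backed one would call nextval() on
// a sequence. Neither is shown here.
class InMemorySequence implements SequenceSource {
  private readonly counters = new Map<string, number>();

  async next(name: string): Promise<number> {
    // Mirrors upsert + $inc: a missing counter counts as 0, then is incremented.
    const value = (this.counters.get(name) ?? 0) + 1;
    this.counters.set(name, value);
    return value;
  }
}

// Example call site: it does not know which store backs the sequence.
async function allocateDerivedDestinationIndex(seq: SequenceSource): Promise<number> {
  return seq.next("derived_destination_index");
}
```

With this shape, swapping stores touches one class rather than every caller.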

Cost of a full migration

One-engineer-equivalent, full-time, not parallel with feature work:

Phase	Scope	Estimate
Schema design + ERD	22 models → relational schema, decide JSONB vs normalized for each `metadata` field	1–2 weeks
ORM swap (Prisma/Drizzle/TypeORM)	Rewrite 22 models, 454 query sites. ~80% mechanical, ~20% (aggregations, atomic upserts) need genuine rethinking	6–10 weeks
Data backfill scripts	Mongo → Postgres ETL per collection. ObjectId → uuid/int FK resolution, embedded subdoc unrolling	2–3 weeks
Cutover infra	Dual-write window, shadow reads, rollback plan, point-in-time backups	1–2 weeks
Test fix-up	36 backend test files mock/seed Mongo; rewrite harness, fixtures, in-memory DB	2–3 weeks
Stabilization	Production incidents you didn't predict; the long tail	2–4 weeks
Total		14–24 weeks (3.5–6 months)
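
The total can be checked by summing the low and high ends of each phase independently. The snippet below reproduces that arithmetic; the only assumption is the 4-weeks-per-month conversion, which is what the 3.5–6 month figure implies.

```typescript
// Phase estimates in weeks, copied from the table above as [low, high].
const phases: Record<string, [number, number]> = {
  "Schema design + ERD": [1, 2],
  "ORM swap": [6, 10],
  "Data backfill scripts": [2, 3],
  "Cutover infra": [1, 2],
  "Test fix-up": [2, 3],
  "Stabilization": [2, 4],
};

// Sum the low ends and the high ends separately.
function totalWeeks(p: Record<string, [number, number]>): [number, number] {
  let low = 0;
  let high = 0;
  for (const [lo, hi] of Object.values(p)) {
    low += lo;
    high += hi;
  }
  return [low, high];
}

// Assumption: 4 weeks per month.
const WEEKS_PER_MONTH = 4;
const [lowWeeks, highWeeks] = totalWeeks(phases);
const months: [number, number] = [lowWeeks / WEEKS_PER_MONTH, highWeeks / WEEKS_PER_MONTH];

console.log(`${lowWeeks}-${highWeeks} weeks, ${months[0]}-${months[1]} months`);
// prints "14-24 weeks, 3.5-6 months"
```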

Multipliers specific to this codebase

Only 1 file uses Mongo transactions today → most boundaries are implicit. Going relational means finding and explicitly wrapping every multi-row money operation. High bug yield.
Heavy metadata blob usage → either give up column-level constraints and simple indexed queries (JSONB) or pay normalization cost (side tables + joins everywhere).
Multiple agents (nick + claude + kimi + moojttaba) commit weekly. A 4-month migration branch will rot constantly; rebasing it against a fast-moving main is a tax on every other feature.
36 test files all assume Mongo. Either keep both DBs in CI during transition, or rewrite the whole test harness up front.
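
To make the first multiplier concrete, this is roughly what explicitly wrapping a multi-document money operation looks like. It is a sketch under assumptions: `runInTransaction` and `TxSession` are hypothetical names, the interface is a minimal structural stand-in for the session that `mongoose.startSession()` returns, and the model calls in the trailing comment use illustrative field names.

```typescript
// Minimal structural type so the sketch compiles without Mongoose installed.
// Assumption: the session returned by mongoose.startSession() has
// withTransaction() and endSession() methods compatible with this shape.
interface TxSession {
  withTransaction<T>(fn: () => Promise<T>): Promise<T>;
  endSession(): Promise<void>;
}

/**
 * Hypothetical helper: runs `work` inside one transaction and always
 * releases the session, whether the transaction committed or aborted.
 */
async function runInTransaction<S extends TxSession, T>(
  startSession: () => Promise<S>,
  work: (session: S) => Promise<T>,
): Promise<T> {
  const session = await startSession();
  try {
    return await session.withTransaction(() => work(session));
  } finally {
    await session.endSession();
  }
}

// Possible call site (not executed here; field names are illustrative):
//   await runInTransaction(() => mongoose.startSession(), async (session) => {
//     const [payment] = await Payment.create([paymentDoc], { session });
//     await PurchaseRequest.updateOne(
//       { _id: requestId },
//       { $set: { paymentId: payment._id } },
//       { session },
//     );
//   });
```

The part that is easy to get wrong is inside `work`: by default each operation has to be passed the session, otherwise it runs outside the transaction. That is the kind of implicit boundary the migration would have to find everywhere.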

What we'd actually gain

Honest accounting:

Win	Real value
FK constraints	Would have caught the 2026-05-28 orphan-payment bug (Payment cleanup with missing `provider:` filter). Will catch similar bugs in the future.
Multi-row ACID	Real value for escrow release + dispute resolution + payment-to-request creation. Today these rely on app-level invariants.
Audit / financial reporting	SQL is much friendlier for accountants, auditors, and ad-hoc analytical queries.
Mature tooling	pg_dump, point-in-time recovery, logical replication, Metabase/Superset integration.
Hiring	More backend engineers know SQL well than Mongo well.

Non-win (claimed but not real)	Why it doesn't materialize
"Better performance"	Mongo handles this app's load fine; we're nowhere near needing it to scale further.
"Better schemas"	Mongoose already enforces schemas at the app layer. The structural integrity gain is FKs, not types.
"Fewer bugs"	Most bugs we've hit (`rn_webhook_event_field`, `backend_rate_limits`, `woodpecker_silent_build_fail`, telegram parse_mode) are application logic, not DB choice. Postgres wouldn't have caught any of them.

The structurally better path: targeted hardening (~2 weeks)

Get most of the relational wins without the migration:

Append-only ledger as source of truth. Promote FundsLedgerEntry (or a new collection) to the authoritative record of every money movement. Strict invariants enforced in a single service. Becomes the audit log accountants and disputes consume.
Explicit transaction boundaries. Identify the ~5 places where multi-collection atomicity actually matters: Payment + PurchaseRequest creation, escrow release, dispute resolution, sweep + DerivedDestination update, refund. Wrap each in mongoose.startSession() + session.withTransaction(...). This requires Mongo to be a replica set in prod (which it already is for our deployment).
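App-layer FK enforcement. Mongoose pre('save') and pre('deleteOne') hooks that verify referenced documents exist before mutating. Bulk operations such as deleteMany() and updateMany() do not run document middleware, so they need query middleware (pre('deleteMany'), pre('updateMany')) as well. Catches the orphan-deletion class of bug. Cheap.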
Cleanup-query lint. Codify the feedback-payment-cleanup-provider-filter rule: any Payment.find()/.deleteMany()/.updateMany() over the payments collection without a provider: filter is a bug. Custom ESLint rule or just a grep in CI.
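
The provider-filter rule can also be enforced at runtime, as a complement to the lint. The guard below is a minimal sketch: `assertProviderScoped` is a hypothetical name, and the filter fields other than `provider` are made up for illustration. The trailing comment shows one possible way to wire it into Mongoose query middleware, assuming a `paymentSchema` variable.

```typescript
type Filter = Record<string, unknown>;

/**
 * Hypothetical guard: throws if a Payment filter is not pinned to a provider.
 * A missing key or an undefined/null value does not pin the query to one provider.
 */
function assertProviderScoped(operation: string, filter: Filter | null | undefined): void {
  const provider = filter ? filter["provider"] : undefined;
  if (provider === undefined || provider === null) {
    throw new Error(`Payment.${operation} called without a provider filter`);
  }
}

// Possible wiring as query middleware (not executed here):
//   paymentSchema.pre("deleteMany", function () {
//     assertProviderScoped("deleteMany", this.getFilter());
//   });
//   paymentSchema.pre("updateMany", function () {
//     assertProviderScoped("updateMany", this.getFilter());
//   });
```

A runtime guard catches filters that are built dynamically, which a grep-based CI check would miss.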

Estimated cost: ~2 weeks. Catches the bugs that actually hurt. Leaves the migration option open.
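
For the append-only ledger item above, the core idea is that balances are derived from entries rather than stored, and every write goes through one place that checks invariants. The sketch below is illustrative only: the real `FundsLedgerEntry` fields are not shown in this document, so `LedgerEntry`, `Ledger`, and the three entry kinds are assumptions, and an in-memory array stands in for the collection.

```typescript
// Hypothetical entry shape; not the actual FundsLedgerEntry schema.
type EntryKind = "deposit" | "release" | "refund";

interface LedgerEntry {
  readonly paymentId: string;
  readonly kind: EntryKind;
  readonly amountMinor: number; // integer minor units, to avoid fractional money
}

class Ledger {
  private readonly entries: LedgerEntry[] = [];

  /** Funds still held for a payment: deposits minus releases and refunds. */
  heldBalance(paymentId: string): number {
    let balance = 0;
    for (const e of this.entries) {
      if (e.paymentId !== paymentId) continue;
      balance += e.kind === "deposit" ? e.amountMinor : -e.amountMinor;
    }
    return balance;
  }

  /** Appends an entry after checking invariants; existing entries are never edited or removed. */
  append(entry: LedgerEntry): void {
    if (!Number.isSafeInteger(entry.amountMinor) || entry.amountMinor <= 0) {
      throw new Error("amount must be a positive integer in minor units");
    }
    if (entry.kind !== "deposit" && entry.amountMinor > this.heldBalance(entry.paymentId)) {
      throw new Error("cannot move more than is currently held");
    }
    this.entries.push(Object.freeze({ ...entry }));
  }
}
```

In a real implementation the invariant check and the insert would need to run in the same transaction, otherwise two concurrent releases could both pass the check.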

When to revisit (trigger conditions)

Pull this doc out and re-evaluate when any of these fires:

Compliance / audit requirement — a regulator, payment partner, or auditor demands a relational ledger we can't easily produce from Mongo.
Schema flexibility has stopped paying for itself: feature work no longer depends on reshaping Payment.metadata, RequestTemplate, or PurchaseRequest. Once the schema has stabilized, the migration's main friction (rewriting entities that are still evolving) is gone.
The bug pattern has repeated — we hit ≥3 incidents shaped like "missing referential integrity" or "no cross-collection transaction" within 6 months. Then the targeted hardening above wasn't enough and migration starts paying for itself.
A green-field rewrite is happening anyway — e.g. a major v2 architecture refactor, microservice split, or rewrite of the payments subsystem. Combine the migration with that work; don't do it standalone.
Reporting needs blow up — finance/ops team wants live SQL-driven dashboards and our Mongo aggregation pipelines + Metabase plugins can't keep up.
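
The "bug pattern has repeated" trigger is the only one with a numeric threshold, so it can be checked mechanically against an incident log. The sketch below assumes incidents are recorded as millisecond timestamps and approximates "6 months" as 183 days; both are assumptions, and `bugPatternTriggerFired` is a hypothetical name.

```typescript
// Assumption: "6 months" is approximated as 183 days.
const WINDOW_MS = 183 * 24 * 60 * 60 * 1000;
const THRESHOLD = 3;

/** True if any 183-day window contains at least THRESHOLD incidents. */
function bugPatternTriggerFired(incidentTimes: number[]): boolean {
  const sorted = [...incidentTimes].sort((a, b) => a - b);
  // Check each run of THRESHOLD consecutive incidents: if the first and last
  // of the run fall within the window, the trigger has fired.
  for (let i = 0; i + THRESHOLD - 1 < sorted.length; i++) {
    if (sorted[i + THRESHOLD - 1] - sorted[i] <= WINDOW_MS) {
      return true;
    }
  }
  return false;
}
```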

If none of the above fires, stay on Mongo.

If we ever do migrate — order of operations

For when a trigger condition fires. Don't do it standalone; pair it with another large refactor.

Start with the financial-tier models only (Payment, FundsLedgerEntry, PurchaseRequest, DerivedDestination, Dispute). These are 5 of 22 models. Dual-store: Postgres for these, Mongo for the rest, with a sync layer or service-per-store boundary.
Validate for 3+ months on dev + prod-shadow before any cutover.
Migrate the marketplace + identity tiers next (10 more models). Document-shaped models (Chat, Notification, etc.) probably never need to migrate — they're happier in Mongo or as Postgres JSONB.
Use Drizzle or Prisma. Prefer Drizzle if you want migrations-as-code and don't want a heavy runtime; Prisma if the team prefers a higher-level abstraction.
Don't dual-write the same record. Pick one source of truth per model and don't compromise on it.
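
The "one source of truth per model" rule can be made checkable by recording ownership in a single typed map. The sketch below is an illustration under assumptions: `ModelName` lists only a subset of the 22 models, and `PHASE_1_OWNERSHIP` and `storeFor` are hypothetical names reflecting the first step above (financial tier on Postgres, everything else on Mongo).

```typescript
type Store = "mongo" | "postgres";

// Subset of the model names from the tier table, for brevity.
type ModelName =
  | "Payment"
  | "FundsLedgerEntry"
  | "PurchaseRequest"
  | "DerivedDestination"
  | "Dispute"
  | "SellerOffer"
  | "User"
  | "Chat";

// Record<ModelName, Store> forces exactly one store per model: a model
// missing from the map is a compile error, and a key can only appear once.
const PHASE_1_OWNERSHIP: Record<ModelName, Store> = {
  Payment: "postgres",
  FundsLedgerEntry: "postgres",
  PurchaseRequest: "postgres",
  DerivedDestination: "postgres",
  Dispute: "postgres",
  SellerOffer: "mongo",
  User: "mongo",
  Chat: "mongo",
};

/** Returns the single store that owns writes for a model in phase 1. */
function storeFor(model: ModelName): Store {
  return PHASE_1_OWNERSHIP[model];
}
```

Moving a model between stores then becomes a one-line change that is easy to review.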

Related

feedback-payment-cleanup-provider-filter: the bug that prompted this discussion (Payment cleanup missing provider: filter destroyed multi-seller cart records).
PRD - Wallet, Multichain, Confirmations, AML, Trezor.md — Task #7 (derived destinations) is the most Mongo-shaped feature we've shipped recently; reference for how atomic upserts and embedded metadata are used.
01 - Architecture/Request Network In-House Checkout.md — RN integration relies heavily on Payment.metadata.requestNetworkData blob storage.
