Files

Siavash Sameni 7868d94340 DB strategy: add dual-DB partial-migration analysis

Three scoping tiers (ledger-only / +Payment+Dispute / all five financial
models) with concrete time estimates grounded in actual reference counts
from the codebase. Recommends Option 1 (ledger only, 3–4 weeks) as the
right dual-DB shape if a forcing function appears, and explains why it's
not yet worth doing over the 2-week in-place hardening.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

2026-05-28 19:17:43 +04:00

15 KiB

Raw Blame History

Database Strategy — Mongo vs Postgres Assessment

Status: Living assessment. Not a decision yet. Written 2026-05-28. Owner: nick + claude Decision deadline: Open. Re-evaluate when one of the trigger conditions below fires.

TL;DR

Amanat runs on MongoDB (primary store) + Redis (cache/sessions/rate limits). For an escrow product that moves money, Postgres would be the structurally better fit — FK constraints, ACID across rows, mature audit/reporting tooling. But a full migration today is a 3–6 month, single-engineer-equivalent project with high schedule risk and zero user-visible value during the cutover.

Current recommendation: Don't migrate. Pay down the specific weaknesses Mongo creates (cross-collection consistency, audit trails, FK-shaped bugs) with targeted in-place hardening. Revisit the decision when one of the trigger conditions below fires.

What we run today

Store	Use	Notes
MongoDB (Mongoose 8.x)	Primary store — all domain data	22 models, ~454 query call sites across 171 backend TS files
Redis	Sessions, cache, rate limits (paymentLimiter etc.)	Not in scope for any migration. Keep as-is either way.

Mongoose models (22)

Ranked by how naturally they map to a relational schema:

Tier	Models	Relational fit
Core financial	`Payment`, `FundsLedgerEntry`, `PurchaseRequest`, `DerivedDestination`, `Dispute`	Strong. These are where FK constraints + ACID earn their keep. The orphan-payment deletion bug we hit on 2026-05-28 (`provider:` filter missing) lives here — an FK would have prevented it structurally.
Marketplace	`SellerOffer`, `RequestTemplate`, `Category`, `Address`, `Review`	Strong. Already relational in shape.
Identity	`User`, `TelegramLink`, `TelegramSession`, `TempVerification`, `TrezorAccount`	Strong. Clean 1-to-many.
Document-shaped	`Chat`, `Notification`, `BlogPost`, `PointTransaction`, `LevelConfig`, `ShopSettings`	Weak. Chat especially — message arrays prefer either Mongo or Postgres JSONB.

Mongo-specific patterns we lean on

These are the patterns that get expensive to migrate:

Atomic upsert counters — Counter.findByIdAndUpdate({_id:'derived_destination_index'}, {$inc:{seq:1}}, {new:true, upsert:true}) in derivedDestinations.ts. Postgres equivalent is a SERIAL column or nextval('seq'), trivial — but every existing call site has to change.
Embedded metadata blobs — Payment.metadata.requestNetworkData, .derivedDestination, .transactionSafety. Used heavily for RN raw payloads and per-payment overrides. Two migration paths in Postgres: JSONB column (cheap, loses indexed query-ability) or normalized side tables (lots of work, lots of joins).
Single-document atomicity assumption — grep -rE 'startSession|withTransaction' finds 1 file in the codebase using Mongo transactions. The remaining ~454 query sites implicitly rely on single-document atomicity. Going relational forces explicit transaction demarcation everywhere money moves; this is where post-migration bugs hide.
Aggregation pipelines — 11 files use .aggregate(). Each is a custom rewrite to SQL.

Cost of a full migration

One-engineer-equivalent, full-time, not parallel with feature work:

Phase	Scope	Estimate
Schema design + ERD	22 models → relational schema, decide JSONB vs normalized for each `metadata` field	1–2 weeks
ORM swap (Prisma/Drizzle/TypeORM)	Rewrite 22 models, 454 query sites. ~80% mechanical, ~20% (aggregations, atomic upserts) need genuine rethinking	6–10 weeks
Data backfill scripts	Mongo → Postgres ETL per collection. ObjectId → uuid/int FK resolution, embedded subdoc unrolling	2–3 weeks
Cutover infra	Dual-write window, shadow reads, rollback plan, point-in-time backups	1–2 weeks
Test fix-up	36 backend test files mock/seed Mongo; rewrite harness, fixtures, in-memory DB	2–3 weeks
Stabilization	Production incidents you didn't predict; the long tail	2–4 weeks
Total		14–24 weeks (3.5–6 months)

Multipliers specific to this codebase

Only 1 file uses Mongo transactions today → most boundaries are implicit. Going relational means finding and explicitly wrapping every multi-row money operation. High bug yield.
Heavy metadata blob usage → either lose query-ability (JSONB) or pay normalization cost (side tables + joins everywhere).
Multiple agents (nick + claude + kimi + moojttaba) commit weekly. A 4-month migration branch will rot constantly; rebasing it against a fast-moving main is a tax on every other feature.
36 test files all assume Mongo. Either keep both DBs in CI during transition, or rewrite the whole test harness up front.

What we'd actually gain

Honest accounting:

Win	Real value
FK constraints	Would have caught the 2026-05-28 orphan-payment bug (Payment cleanup with missing `provider:` filter). Will catch similar bugs in the future.
Multi-row ACID	Real value for escrow release + dispute resolution + payment-to-request creation. Today these rely on app-level invariants.
Audit / financial reporting	SQL is much friendlier for accountants, auditors, and ad-hoc analytical queries.
Mature tooling	pg_dump, point-in-time recovery, logical replication, Metabase/Superset integration.
Hiring	More backend engineers know SQL well than Mongo well.

Non-win (claimed but not real)	Why it doesn't materialize
"Better performance"	Mongo handles this app's load fine; we're nowhere near needing it to scale further.
"Better schemas"	Mongoose already enforces schemas at the app layer. The structural integrity gain is FKs, not types.
"Fewer bugs"	Most bugs we've hit (`rn_webhook_event_field`, `backend_rate_limits`, `woodpecker_silent_build_fail`, telegram parse_mode) are application logic, not DB choice. Postgres wouldn't have caught any of them.

The structurally better path: targeted hardening (~2 weeks)

Get most of the relational wins without the migration:

Append-only ledger as source of truth. Promote FundsLedgerEntry (or a new collection) to the authoritative record of every money movement. Strict invariants enforced in a single service. Becomes the audit log accountants and disputes consume.
Explicit transaction boundaries. Identify the ~5 places where multi-collection atomicity actually matters: Payment + PurchaseRequest creation, escrow release, dispute resolution, sweep + DerivedDestination update, refund. Wrap each in mongoose.startSession() + session.withTransaction(...). This requires Mongo to be a replica set in prod (which it already is for our deployment).
App-layer FK enforcement. Mongoose pre('save') and pre('deleteOne') hooks that verify referenced documents exist before mutating. Catches the orphan-deletion class of bug. Cheap.
Cleanup-query lint. Codify the feedback-payment-cleanup-provider-filter rule: any Payment.find()/.deleteMany()/.updateMany() over the payments collection without a provider: filter is a bug. Custom ESLint rule or just a grep in CI.

Estimated cost: ~2 weeks. Catches the bugs that actually hurt. Leaves the migration option open.

Partial-migration option: dual-DB for financial models only

A narrower question worth its own analysis: what if we keep Mongo for the bulk of the app but move the financial/ledger operations to Postgres just to get ACID where money is involved?

Reference-surface in the current backend

Model	Files referencing it
`Payment`	33
`PurchaseRequest`	25
`FundsLedgerEntry`	4
`DerivedDestination`	4
`Dispute`	2

That gives three natural scoping tiers, each with very different cost.

Option 1 — Ledger only (~3–4 weeks) — recommended dual-DB shape

Move just FundsLedgerEntry to Postgres. Keep everything else on Mongo. The ledger becomes the append-only authoritative record of every money movement, written through a single LedgerService.

Phase	Work	Estimate
Postgres infra	docker-compose, dev seed, prod provisioning, backups, PITR	3–4 days
Schema + Drizzle setup	One table + indexes, migrations	2 days
Service boundary	`LedgerService` is the only writer; everywhere else reads	3–4 days
Rewrite the 4 call sites	Mechanical	2 days
Outbox pattern	Mongo write → outbox row → worker drains into Postgres. Survives crashes between the two writes.	4–5 days
Reconciliation job	Nightly diff between ledger sum and Mongo-derived balances; alerts on drift	2–3 days
Tests	Harness for both stores, ~10 new tests	4–5 days
Total		3–4 weeks

What you get: Audit-grade money trail, ACID guarantee on the ledger itself, SQL-driven reporting for finance/regulators. No FK constraints across stores (does NOT solve the FK-shaped bug class — Mongo entities still can't reference Postgres rows with integrity), but the financial record is bulletproof.

Risk: The outbox is the load-bearing piece. If Mongo writes succeed and the worker crashes before the outbox drains, the ledger is briefly behind. Reconciliation closes the gap within 24h. Acceptable for typical regulatory regimes; not for high-frequency real-time settlement.

Reusable foundation: The outbox + reconciliation pattern built here is the template if you later expand to Option 2. None of the work is wasted.

Option 2 — Ledger + Payment + Dispute (~10–14 weeks)

Move FundsLedgerEntry + Payment + Dispute to Postgres. Keep PurchaseRequest, User, marketplace data in Mongo.

The hard part is not the 33 Payment refs — it's that Payment refers to User, SellerOffer, PurchaseRequest, all of which live in Mongo. Every cross-store join becomes an app-layer lookup. Queries like "find all Payments for users created last week" need a two-stage fetch.

Phase	Work	Estimate
Everything from Option 1		3 weeks
Payment + Dispute schema design	Including JSONB-vs-normalized for `Payment.metadata.requestNetworkData`, `.derivedDestination`, `.transactionSafety`	1–2 weeks
Rewrite 33 + 2 = 35 call sites	Mix of mechanical + `populate('userId')` → manual lookup conversions	3–4 weeks
Cross-store query helpers	Layer that fetches Payment from PG and enriches with User from Mongo. Pagination becomes painful.	1–2 weeks
Dual-store transactional discipline	Payment update + PurchaseRequest update needs outbox + saga	2 weeks
Tests rewrite	36 test files, most touch Payment	2 weeks
Stabilization	Cross-store bugs you didn't predict	1–2 weeks
Total		10–14 weeks

What you get: ACID across the entire payment lifecycle. But you've introduced a permanent cross-store consistency problem and queries got more complex everywhere.

Option 3 — All five financial models (~16–20 weeks)

Move all of FundsLedgerEntry + Payment + PurchaseRequest + Dispute + DerivedDestination. At this point you're approaching the full-migration cost (14–24 weeks) without the full-migration cleanliness — you still own a cross-store boundary, just relocated to the User/marketplace edge.

Skip this option. If you're going this far, commit to the full migration plan in the section above instead of leaving an awkward two-store seam through the middle of the domain.

Recommendation among dual-DB options

Option 1 (ledger only, 3–4 weeks). Smallest blast radius, cleanest service boundary, 80% of the auditor/regulator/finance-team value. Postgres becomes the source of truth for "did money move," not for "what's the order status." Revisit Option 2 only if a specific compliance ask or repeated cross-Payment consistency bugs force it.

Avoid Option 2 unless there's a concrete forcing function. The permanent cross-store query pain is real and rarely worth it for the marginal ACID gain over Option 1 + good service discipline.

How dual-DB Option 1 differs from "stay on Mongo + targeted hardening"

The 2-week in-place hardening above (append-only ledger collection, withTransaction on the 5 money-paths, pre('save') FK hooks, cleanup-query lint) gets you a Mongo-native version of most of Option 1's wins. The reasons to do Option 1 anyway:

Regulator/auditor specifically wants SQL for ledger queries.
Finance team wants Metabase/Superset/BigQuery sync with relational primitives, not Mongo aggregations.
A future financial product (settlement netting, on-chain accounting export, multi-currency reconciliation) is on the roadmap and would be substantially easier in Postgres.

If none of those apply yet, the 2-week targeted hardening is still the right first step. Option 1 builds on top of it cleanly.

When to revisit (trigger conditions)

Pull this doc out and re-evaluate when any of these fires:

Compliance / audit requirement — a regulator, payment partner, or auditor demands a relational ledger we can't easily produce from Mongo.
Schema-flexibility cost has gone to zero — feature velocity is no longer dominated by changing the shape of Payment.metadata, RequestTemplate, PurchaseRequest. If the schema has stabilized, the migration's main friction (rewriting too many evolving entities) is gone.
The bug pattern has repeated — we hit ≥3 incidents shaped like "missing referential integrity" or "no cross-collection transaction" within 6 months. Then the targeted hardening above wasn't enough and migration starts paying for itself.
A green-field rewrite is happening anyway — e.g. a major v2 architecture refactor, microservice split, or rewrite of the payments subsystem. Combine the migration with that work; don't do it standalone.
Reporting needs blow up — finance/ops team wants live SQL-driven dashboards and our Mongo aggregation pipelines + Metabase plugins can't keep up.

If none of the above fires, stay on Mongo.

If we ever do migrate — order of operations

For when the trigger condition fires. Don't do it standalone — pair it with another large refactor.

Start with the financial-tier models only (Payment, FundsLedgerEntry, PurchaseRequest, DerivedDestination, Dispute). These are 5 of 22 models. Dual-store: Postgres for these, Mongo for the rest, with a sync layer or service-per-store boundary.
Validate for 3+ months on dev + prod-shadow before any cutover.
Migrate the marketplace + identity tiers next (10 more models). Document-shaped models (Chat, Notification, etc.) probably never need to migrate — they're happier in Mongo or as Postgres JSONB.
Use Drizzle or Prisma. Prefer Drizzle if you want migrations-as-code and don't want a heavy runtime; Prisma if the team prefers a higher-level abstraction.
Don't dual-write the same record. Pick one source of truth per model and don't compromise on it.

feedback-payment-cleanup-provider-filter — the bug that prompted this discussion (Payment cleanup missing provider: filter destroyed multi-seller cart records).
PRD - Wallet, Multichain, Confirmations, AML, Trezor.md — Task #7 (derived destinations) is the most Mongo-shaped feature we've shipped recently; reference for how atomic upserts and embedded metadata are used.
01 - Architecture/Request Network In-House Checkout.md — RN integration relies heavily on Payment.metadata.requestNetworkData blob storage.

15 KiB Raw Blame History Unescape Escape