17 KiB
Database Strategy — Mongo vs Postgres Assessment
Status: RESOLVED — Full PostgreSQL migration complete as of 2026-06-06, backend v2.9.12. Document retained as historical reference. Owner: nick + claude Decision: Proceed with a staged hybrid migration, not an immediate full cutover.
[!success] Migration Complete — 2026-06-06 The migration to PostgreSQL is complete as of backend v2.9.12. MongoDB and Mongoose have been fully removed from the runtime codebase. This document is retained as historical context for the assessment and decision-making process.
TL;DR
Amanat still runs on MongoDB (primary store) + Redis (cache/sessions/rate limits). Backend 2.6.79 adds Postgres 18 support, Drizzle schemas/migrations, repository implementations, backfill/verify tooling, and conditional payment_quotes persistence, but this is not a full runtime cutover.
Current recommendation: continue the staged hybrid migration. Keep Mongo authoritative for live traffic until each domain is wired through the repository layer, backfilled, dual-written, shadow-read, and explicitly flipped.
See Postgres Runtime Cutover Status for the current line between code that can use Postgres and code that still uses Mongo.
What we run today
| Store | Use | Notes |
|---|---|---|
| MongoDB (Mongoose 8.x) | Primary runtime store — normal domain traffic | 22 models, ~454 query call sites across 171 backend TS files |
| PostgreSQL 18 + Drizzle | Migration target and conditional oracle quote store | Schemas/migrations through 0008, repo implementations, backfill/verify tooling; broad service wiring still pending |
| Redis | Sessions, cache, rate limits (paymentLimiter etc.) | Not in scope for any migration. Keep as-is either way. |
Current Postgres implementation state (2026-05-31)
| Implemented | Not yet cut over |
|---|---|
src/db/client.ts fail-fast PG client, Drizzle schema/index barrel, migrations through 0008, id_map, pg_dualwrite_gaps, payment_quotes |
Service layer still imports Mongoose models directly; no broad runtime use of createRepositories() / get*Repo() factory |
| Drizzle/Mongo/Dual repository classes for user, payment, points, marketplace | Auth, marketplace, payment, wallet, points, chat, notification, dispute, and admin paths still use Mongoose directly |
Backfill and verification scripts guarded by MIGRATION_PG_URL |
Backfills are not auto-run and no domain is verified as PG-authoritative |
Oracle quote persistence can write PG payment_quotes when ORACLE_QUOTING_ENABLED=true |
Payment records themselves are still created/updated in Mongo; PG quote insert depends on a resolvable PG parent row |
Mongoose models (22)
Ranked by how naturally they map to a relational schema:
| Tier | Models | Relational fit |
|---|---|---|
| Core financial | Payment, FundsLedgerEntry, PurchaseRequest, DerivedDestination, Dispute |
Strong. These are where FK constraints + ACID earn their keep. The orphan-payment deletion bug we hit on 2026-05-28 (provider: filter missing) lives here — an FK would have prevented it structurally. |
| Marketplace | SellerOffer, RequestTemplate, Category, Address, Review |
Strong. Already relational in shape. |
| Identity | User, TelegramLink, TelegramSession, TempVerification, TrezorAccount |
Strong. Clean 1-to-many. |
| Document-shaped | Chat, Notification, BlogPost, PointTransaction, LevelConfig, ShopSettings |
Weak. Chat especially — message arrays prefer either Mongo or Postgres JSONB. |
Mongo-specific patterns we lean on
These are the patterns that get expensive to migrate:
- Atomic upsert counters —
Counter.findByIdAndUpdate({_id:'derived_destination_index'}, {$inc:{seq:1}}, {new:true, upsert:true})inderivedDestinations.ts. Postgres equivalent is aSERIALcolumn ornextval('seq'), trivial — but every existing call site has to change. - Embedded
metadatablobs —Payment.metadata.requestNetworkData,.derivedDestination,.transactionSafety. Used heavily for RN raw payloads and per-payment overrides. Two migration paths in Postgres: JSONB column (cheap, loses indexed query-ability) or normalized side tables (lots of work, lots of joins). - Single-document atomicity assumption —
grep -rE 'startSession|withTransaction'finds 1 file in the codebase using Mongo transactions. The remaining ~454 query sites implicitly rely on single-document atomicity. Going relational forces explicit transaction demarcation everywhere money moves; this is where post-migration bugs hide. - Aggregation pipelines — 11 files use
.aggregate(). Each is a custom rewrite to SQL.
Cost of a full migration
One-engineer-equivalent, full-time, not parallel with feature work:
| Phase | Scope | Estimate |
|---|---|---|
| Schema design + ERD | 22 models → relational schema, decide JSONB vs normalized for each metadata field |
1–2 weeks |
| ORM swap (Prisma/Drizzle/TypeORM) | Rewrite 22 models, 454 query sites. ~80% mechanical, ~20% (aggregations, atomic upserts) need genuine rethinking | 6–10 weeks |
| Data backfill scripts | Mongo → Postgres ETL per collection. ObjectId → uuid/int FK resolution, embedded subdoc unrolling | 2–3 weeks |
| Cutover infra | Dual-write window, shadow reads, rollback plan, point-in-time backups | 1–2 weeks |
| Test fix-up | 36 backend test files mock/seed Mongo; rewrite harness, fixtures, in-memory DB | 2–3 weeks |
| Stabilization | Production incidents you didn't predict; the long tail | 2–4 weeks |
| Total | 14–24 weeks (3.5–6 months) |
Multipliers specific to this codebase
- Only 1 file uses Mongo transactions today → most boundaries are implicit. Going relational means finding and explicitly wrapping every multi-row money operation. High bug yield.
- Heavy
metadatablob usage → either lose query-ability (JSONB) or pay normalization cost (side tables + joins everywhere). - Multiple agents (nick + claude + kimi + moojttaba) commit weekly. A 4-month migration branch will rot constantly; rebasing it against a fast-moving main is a tax on every other feature.
- 36 test files all assume Mongo. Either keep both DBs in CI during transition, or rewrite the whole test harness up front.
What we'd actually gain
Honest accounting:
| Win | Real value |
|---|---|
| FK constraints | Would have caught the 2026-05-28 orphan-payment bug (Payment cleanup with missing provider: filter). Will catch similar bugs in the future. |
| Multi-row ACID | Real value for escrow release + dispute resolution + payment-to-request creation. Today these rely on app-level invariants. |
| Audit / financial reporting | SQL is much friendlier for accountants, auditors, and ad-hoc analytical queries. |
| Mature tooling | pg_dump, point-in-time recovery, logical replication, Metabase/Superset integration. |
| Hiring | More backend engineers know SQL well than Mongo well. |
| Non-win (claimed but not real) | Why it doesn't materialize |
|---|---|
| "Better performance" | Mongo handles this app's load fine; we're nowhere near needing it to scale further. |
| "Better schemas" | Mongoose already enforces schemas at the app layer. The structural integrity gain is FKs, not types. |
| "Fewer bugs" | Most bugs we've hit (rn_webhook_event_field, backend_rate_limits, woodpecker_silent_build_fail, telegram parse_mode) are application logic, not DB choice. Postgres wouldn't have caught any of them. |
The structurally better path: targeted hardening (~2 weeks)
Get most of the relational wins without the migration:
- Append-only ledger as source of truth. Promote
FundsLedgerEntry(or a new collection) to the authoritative record of every money movement. Strict invariants enforced in a single service. Becomes the audit log accountants and disputes consume. - Explicit transaction boundaries. Identify the ~5 places where multi-collection atomicity actually matters: Payment + PurchaseRequest creation, escrow release, dispute resolution, sweep + DerivedDestination update, refund. Wrap each in
mongoose.startSession() + session.withTransaction(...). This requires Mongo to be a replica set in prod (which it already is for our deployment). - App-layer FK enforcement. Mongoose
pre('save')andpre('deleteOne')hooks that verify referenced documents exist before mutating. Catches the orphan-deletion class of bug. Cheap. - Cleanup-query lint. Codify the feedback-payment-cleanup-provider-filter rule: any
Payment.find()/.deleteMany()/.updateMany()over the payments collection without aprovider:filter is a bug. Custom ESLint rule or just a grep in CI.
Estimated cost: ~2 weeks. Catches the bugs that actually hurt. Leaves the migration option open.
Partial-migration option: dual-DB for financial models only
A narrower question worth its own analysis: what if we keep Mongo for the bulk of the app but move the financial/ledger operations to Postgres just to get ACID where money is involved?
Reference-surface in the current backend
| Model | Files referencing it |
|---|---|
Payment |
33 |
PurchaseRequest |
25 |
FundsLedgerEntry |
4 |
DerivedDestination |
4 |
Dispute |
2 |
That gives three natural scoping tiers, each with very different cost.
Option 1 — Ledger only (~3–4 weeks) — recommended dual-DB shape
Move just FundsLedgerEntry to Postgres. Keep everything else on Mongo. The ledger becomes the append-only authoritative record of every money movement, written through a single LedgerService.
| Phase | Work | Estimate |
|---|---|---|
| Postgres infra | docker-compose, dev seed, prod provisioning, backups, PITR | 3–4 days |
| Schema + Drizzle setup | One table + indexes, migrations | 2 days |
| Service boundary | LedgerService is the only writer; everywhere else reads |
3–4 days |
| Rewrite the 4 call sites | Mechanical | 2 days |
| Outbox pattern | Mongo write → outbox row → worker drains into Postgres. Survives crashes between the two writes. | 4–5 days |
| Reconciliation job | Nightly diff between ledger sum and Mongo-derived balances; alerts on drift | 2–3 days |
| Tests | Harness for both stores, ~10 new tests | 4–5 days |
| Total | 3–4 weeks |
What you get: Audit-grade money trail, ACID guarantee on the ledger itself, SQL-driven reporting for finance/regulators. No FK constraints across stores (does NOT solve the FK-shaped bug class — Mongo entities still can't reference Postgres rows with integrity), but the financial record is bulletproof.
Risk: The outbox is the load-bearing piece. If Mongo writes succeed and the worker crashes before the outbox drains, the ledger is briefly behind. Reconciliation closes the gap within 24h. Acceptable for typical regulatory regimes; not for high-frequency real-time settlement.
Reusable foundation: The outbox + reconciliation pattern built here is the template if you later expand to Option 2. None of the work is wasted.
Option 2 — Ledger + Payment + Dispute (~10–14 weeks)
Move FundsLedgerEntry + Payment + Dispute to Postgres. Keep PurchaseRequest, User, marketplace data in Mongo.
The hard part is not the 33 Payment refs — it's that Payment refers to User, SellerOffer, PurchaseRequest, all of which live in Mongo. Every cross-store join becomes an app-layer lookup. Queries like "find all Payments for users created last week" need a two-stage fetch.
| Phase | Work | Estimate |
|---|---|---|
| Everything from Option 1 | 3 weeks | |
| Payment + Dispute schema design | Including JSONB-vs-normalized for Payment.metadata.requestNetworkData, .derivedDestination, .transactionSafety |
1–2 weeks |
| Rewrite 33 + 2 = 35 call sites | Mix of mechanical + populate('userId') → manual lookup conversions |
3–4 weeks |
| Cross-store query helpers | Layer that fetches Payment from PG and enriches with User from Mongo. Pagination becomes painful. | 1–2 weeks |
| Dual-store transactional discipline | Payment update + PurchaseRequest update needs outbox + saga | 2 weeks |
| Tests rewrite | 36 test files, most touch Payment | 2 weeks |
| Stabilization | Cross-store bugs you didn't predict | 1–2 weeks |
| Total | 10–14 weeks |
What you get: ACID across the entire payment lifecycle. But you've introduced a permanent cross-store consistency problem and queries got more complex everywhere.
Option 3 — All five financial models (~16–20 weeks)
Move all of FundsLedgerEntry + Payment + PurchaseRequest + Dispute + DerivedDestination. At this point you're approaching the full-migration cost (14–24 weeks) without the full-migration cleanliness — you still own a cross-store boundary, just relocated to the User/marketplace edge.
Skip this option. If you're going this far, commit to the full migration plan in the section above instead of leaving an awkward two-store seam through the middle of the domain.
Recommendation among dual-DB options
Option 1 (ledger only, 3–4 weeks). Smallest blast radius, cleanest service boundary, 80% of the auditor/regulator/finance-team value. Postgres becomes the source of truth for "did money move," not for "what's the order status." Revisit Option 2 only if a specific compliance ask or repeated cross-Payment consistency bugs force it.
Avoid Option 2 unless there's a concrete forcing function. The permanent cross-store query pain is real and rarely worth it for the marginal ACID gain over Option 1 + good service discipline.
How dual-DB Option 1 differs from "stay on Mongo + targeted hardening"
The 2-week in-place hardening above (append-only ledger collection, withTransaction on the 5 money-paths, pre('save') FK hooks, cleanup-query lint) gets you a Mongo-native version of most of Option 1's wins. The reasons to do Option 1 anyway:
- Regulator/auditor specifically wants SQL for ledger queries.
- Finance team wants Metabase/Superset/BigQuery sync with relational primitives, not Mongo aggregations.
- A future financial product (settlement netting, on-chain accounting export, multi-currency reconciliation) is on the roadmap and would be substantially easier in Postgres.
If none of those apply yet, the 2-week targeted hardening is still the right first step. Option 1 builds on top of it cleanly.
When to revisit (trigger conditions)
Pull this doc out and re-evaluate when any of these fires:
- Compliance / audit requirement — a regulator, payment partner, or auditor demands a relational ledger we can't easily produce from Mongo.
- Schema-flexibility cost has gone to zero — feature velocity is no longer dominated by changing the shape of
Payment.metadata,RequestTemplate,PurchaseRequest. If the schema has stabilized, the migration's main friction (rewriting too many evolving entities) is gone. - The bug pattern has repeated — we hit ≥3 incidents shaped like "missing referential integrity" or "no cross-collection transaction" within 6 months. Then the targeted hardening above wasn't enough and migration starts paying for itself.
- A green-field rewrite is happening anyway — e.g. a major v2 architecture refactor, microservice split, or rewrite of the payments subsystem. Combine the migration with that work; don't do it standalone.
- Reporting needs blow up — finance/ops team wants live SQL-driven dashboards and our Mongo aggregation pipelines + Metabase plugins can't keep up.
If none of the above fires, stay on Mongo.
If we ever do migrate — order of operations
For when the trigger condition fires. Don't do it standalone — pair it with another large refactor.
- Start with the financial-tier models only (Payment, FundsLedgerEntry, PurchaseRequest, DerivedDestination, Dispute). These are 5 of 22 models. Dual-store: Postgres for these, Mongo for the rest, with a sync layer or service-per-store boundary.
- Validate for 3+ months on dev + prod-shadow before any cutover.
- Migrate the marketplace + identity tiers next (10 more models). Document-shaped models (Chat, Notification, etc.) probably never need to migrate — they're happier in Mongo or as Postgres JSONB.
- Use Drizzle or Prisma. Prefer Drizzle if you want migrations-as-code and don't want a heavy runtime; Prisma if the team prefers a higher-level abstraction.
- Don't dual-write the same record. Pick one source of truth per model and don't compromise on it.
Related
- feedback-payment-cleanup-provider-filter — the bug that prompted this discussion (Payment cleanup missing
provider:filter destroyed multi-seller cart records). PRD - Wallet, Multichain, Confirmations, AML, Trezor.md— Task #7 (derived destinations) is the most Mongo-shaped feature we've shipped recently; reference for how atomic upserts and embedded metadata are used.01 - Architecture/Request Network In-House Checkout.md— RN integration relies heavily onPayment.metadata.requestNetworkDatablob storage.