DB strategy: add dual-DB partial-migration analysis

Three scoping tiers (ledger-only / +Payment+Dispute / all five financial
models) with concrete time estimates grounded in actual reference counts
from the codebase. Recommends Option 1 (ledger only, 3–4 weeks) as the
right dual-DB shape if a forcing function appears, and explains why it's
not yet worth doing over the 2-week in-place hardening.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Siavash Sameni
2026-05-28 19:17:43 +04:00
parent 825d7870b3
commit 7868d94340


@@ -99,6 +99,86 @@ Estimated cost: ~2 weeks. Catches the bugs that actually hurt. Leaves the migrat
---
## Partial-migration option: dual-DB for financial models only
A narrower question worth its own analysis: *what if we keep Mongo for the bulk of the app but move the financial/ledger operations to Postgres just to get ACID where money is involved?*
### Reference surface in the current backend
| Model | Files referencing it |
|---|---|
| `Payment` | 33 |
| `PurchaseRequest` | 25 |
| `FundsLedgerEntry` | 4 |
| `DerivedDestination` | 4 |
| `Dispute` | 2 |
That gives three natural scoping tiers, each with very different cost.
### Option 1: Ledger only (~3–4 weeks), **recommended dual-DB shape**
Move just `FundsLedgerEntry` to Postgres. Keep everything else on Mongo. The ledger becomes the append-only authoritative record of every money movement, written through a single `LedgerService`.
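A minimal sketch of the single-writer, append-only idea, assuming an in-memory array as a stand-in for the Postgres table. The entry fields, the `idempotencyKey`, and the method names are hypothetical and do not come from the codebase; a real version would issue an `INSERT` inside a transaction.

```typescript
// Hypothetical entry shape; field names are illustrative only.
type LedgerEntry = {
  readonly idempotencyKey: string; // one key per money movement
  readonly accountId: string;
  readonly amountMinor: number; // signed integer in minor units (e.g. cents)
  readonly createdAt: Date;
};

type NewLedgerEntry = {
  idempotencyKey: string;
  accountId: string;
  amountMinor: number;
};

class LedgerService {
  // Stand-in for the Postgres ledger table.
  private readonly entries: LedgerEntry[] = [];
  private readonly seenKeys = new Set<string>();

  // The only write path. There is deliberately no update or delete method.
  // Returns false when the same movement was already recorded.
  append(entry: NewLedgerEntry): boolean {
    if (this.seenKeys.has(entry.idempotencyKey)) {
      return false;
    }
    this.seenKeys.add(entry.idempotencyKey);
    this.entries.push(Object.freeze({ ...entry, createdAt: new Date() }));
    return true;
  }

  // Read path: a balance is always derived by summing entries.
  balanceOf(accountId: string): number {
    return this.entries
      .filter((e) => e.accountId === accountId)
      .reduce((sum, e) => sum + e.amountMinor, 0);
  }
}
```

Corrections in this model are new compensating entries rather than edits, which is what keeps the record audit-friendly.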
| Phase | Work | Estimate |
|---|---|---|
| Postgres infra | docker-compose, dev seed, prod provisioning, backups, PITR | 3–4 days |
| Schema + Drizzle setup | One table + indexes, migrations | 2 days |
| Service boundary | `LedgerService` is the only writer; everywhere else reads | 3–4 days |
| Rewrite the 4 call sites | Mechanical | 2 days |
| Outbox pattern | Mongo write → outbox row → worker drains into Postgres. Survives crashes between the two writes. | 4–5 days |
| Reconciliation job | Nightly diff between ledger sum and Mongo-derived balances; alerts on drift | 2–3 days |
| Tests | Harness for both stores, ~10 new tests | 4–5 days |
| **Total** | | **3–4 weeks** |
**What you get:** Audit-grade money trail, ACID guarantee on the ledger itself, SQL-driven reporting for finance/regulators. No FK constraints across stores (does NOT solve the FK-shaped bug class — Mongo entities still can't reference Postgres rows with integrity), but the *financial record* is bulletproof.
**Risk:** The outbox is the load-bearing piece. If Mongo writes succeed and the worker crashes before the outbox drains, the ledger is briefly behind. Reconciliation closes the gap within 24h. Acceptable for typical regulatory regimes; not for high-frequency real-time settlement.
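The crash window described above can be sketched as follows, assuming in-memory stand-ins for the Mongo outbox collection and the Postgres ledger table. The row shapes, the `drainOutbox` name, and the `crashBeforeMark` switch are hypothetical and exist only to simulate a crash. The key point is that the ledger write is keyed by the outbox row id, so replaying a row after a crash does not double-count it.

```typescript
// Hypothetical row shapes; not taken from the codebase.
type OutboxRow = {
  id: string;
  accountId: string;
  amountMinor: number;
  drained: boolean;
};
type LedgerRow = { accountId: string; amountMinor: number };

// Drains undrained outbox rows into the ledger and returns how many rows
// were marked as drained. `crashBeforeMark` simulates the worker dying
// after the ledger write but before the outbox row is marked.
function drainOutbox(
  outbox: OutboxRow[],
  ledger: Map<string, LedgerRow>,
  crashBeforeMark = false,
): number {
  let drainedCount = 0;
  for (const row of outbox) {
    if (row.drained) continue;
    // Idempotent write: in Postgres this would be an insert that ignores
    // conflicts on a unique outbox-id column.
    if (!ledger.has(row.id)) {
      ledger.set(row.id, { accountId: row.accountId, amountMinor: row.amountMinor });
    }
    if (crashBeforeMark) {
      throw new Error("simulated crash between the two writes");
    }
    row.drained = true;
    drainedCount += 1;
  }
  return drainedCount;
}
```

After a simulated crash the row is still undrained, so the next run picks it up again, finds the ledger entry already present, and only marks the row. The ledger ends up with exactly one entry per outbox row.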
**Reusable foundation:** The outbox + reconciliation pattern built here is the template if you later expand to Option 2. None of the work is wasted.
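The reconciliation half of that pattern is a diff between two sets of per-account totals. A minimal sketch, assuming both sides have already been loaded into memory; the `findDrift` name and the shapes are hypothetical, and a real job would read the ledger sums with a SQL `GROUP BY` and the Mongo side with whatever query derives balances today.

```typescript
// Hypothetical shapes for illustration.
type LedgerAmount = { accountId: string; amountMinor: number };
type Drift = { accountId: string; ledgerMinor: number; mongoMinor: number };

// Returns one Drift record per account whose ledger sum disagrees with the
// Mongo-derived balance. Accounts missing on either side count as zero.
function findDrift(
  ledgerEntries: LedgerAmount[],
  mongoBalances: Map<string, number>,
): Drift[] {
  const ledgerSums = new Map<string, number>();
  for (const entry of ledgerEntries) {
    ledgerSums.set(entry.accountId, (ledgerSums.get(entry.accountId) ?? 0) + entry.amountMinor);
  }

  // Union of account ids from both stores, sorted for stable output.
  const allIds = Array.from(ledgerSums.keys()).concat(Array.from(mongoBalances.keys()));
  const accountIds = Array.from(new Set(allIds)).sort();

  const drift: Drift[] = [];
  for (const accountId of accountIds) {
    const ledgerMinor = ledgerSums.get(accountId) ?? 0;
    const mongoMinor = mongoBalances.get(accountId) ?? 0;
    if (ledgerMinor !== mongoMinor) {
      drift.push({ accountId, ledgerMinor, mongoMinor });
    }
  }
  return drift;
}
```

An empty result means the two stores agree; anything else is what the nightly alert would report.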
### Option 2: Ledger + Payment + Dispute (~10–14 weeks)
Move `FundsLedgerEntry` + `Payment` + `Dispute` to Postgres. Keep `PurchaseRequest`, `User`, marketplace data in Mongo.
The hard part is not the 33 Payment refs — it's that **Payment refers to User, SellerOffer, PurchaseRequest, all of which live in Mongo**. Every cross-store join becomes an app-layer lookup. Queries like "find all Payments for users created last week" need a two-stage fetch.
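The two-stage fetch mentioned above looks roughly like this. It is a sketch over in-memory arrays, with hypothetical `User` and `Payment` shapes and a hypothetical function name; in practice stage 1 would be a Mongo query and stage 2 a Postgres query filtered by the collected ids.

```typescript
// Hypothetical minimal shapes; the real models carry many more fields.
type User = { id: string; createdAt: Date };
type Payment = { id: string; userId: string; amountMinor: number };

function paymentsForUsersCreatedSince(
  users: User[],
  payments: Payment[],
  since: Date,
): Payment[] {
  // Stage 1 (Mongo side): find the ids of users created on or after `since`.
  const userIds = new Set(
    users.filter((u) => u.createdAt.getTime() >= since.getTime()).map((u) => u.id),
  );
  // Stage 2 (Postgres side): fetch payments whose userId is in that id set.
  return payments.filter((p) => userIds.has(p.userId));
}
```

What a single-store join does in one query becomes two round trips plus an id list held in application memory, which is also why pagination and sorting across the boundary get awkward.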
| Phase | Work | Estimate |
|---|---|---|
| Everything from Option 1 | | 3 weeks |
| Payment + Dispute schema design | Including JSONB-vs-normalized for `Payment.metadata.requestNetworkData`, `.derivedDestination`, `.transactionSafety` | 1–2 weeks |
| Rewrite 33 + 2 = 35 call sites | Mix of mechanical + `populate('userId')` → manual lookup conversions | 3–4 weeks |
| Cross-store query helpers | Layer that fetches Payment from PG and enriches with User from Mongo. Pagination becomes painful. | 1–2 weeks |
| Dual-store transactional discipline | Payment update + PurchaseRequest update needs outbox + saga | 2 weeks |
| Tests rewrite | 36 test files, most touch Payment | 2 weeks |
| Stabilization | Cross-store bugs you didn't predict | 1–2 weeks |
| **Total** | | **10–14 weeks** |
**What you get:** ACID across the entire payment lifecycle. But you've introduced a permanent cross-store consistency problem and queries got more complex everywhere.
### Option 3: All five financial models (~16–20 weeks)
Move all of `FundsLedgerEntry` + `Payment` + `PurchaseRequest` + `Dispute` + `DerivedDestination`. At this point you're approaching the full-migration cost (14–24 weeks) without the full-migration cleanliness: you still own a cross-store boundary, just relocated to the User/marketplace edge.
**Skip this option.** If you're going this far, commit to the full migration plan in the section above instead of leaving an awkward two-store seam through the middle of the domain.
### Recommendation among dual-DB options
**Option 1 (ledger only, 3–4 weeks).** Smallest blast radius, cleanest service boundary, 80% of the auditor/regulator/finance-team value. Postgres becomes the source of truth for "did money move," not for "what's the order status." Revisit Option 2 only if a specific compliance ask or repeated cross-Payment consistency bugs force it.
**Avoid Option 2** unless there's a concrete forcing function. The permanent cross-store query pain is real and rarely worth it for the marginal ACID gain over Option 1 + good service discipline.
### How dual-DB Option 1 differs from "stay on Mongo + targeted hardening"
The 2-week in-place hardening above (append-only ledger collection, `withTransaction` on the 5 money-paths, `pre('save')` FK hooks, cleanup-query lint) gets you a *Mongo-native* version of most of Option 1's wins. The reasons to do Option 1 anyway:
- **Regulator/auditor specifically wants SQL** for ledger queries.
- **Finance team wants Metabase/Superset/BigQuery sync** with relational primitives, not Mongo aggregations.
- **A future financial product** (settlement netting, on-chain accounting export, multi-currency reconciliation) is on the roadmap and would be substantially easier in Postgres.
If none of those apply yet, the 2-week targeted hardening is still the right first step. Option 1 builds on top of it cleanly.
---
## When to revisit (trigger conditions)
Pull this doc out and re-evaluate when **any** of these fires: