Files
nick-doc/02 - Data Models/MongoDB to PostgreSQL Migration Plan (Drizzle).md
2026-06-01 22:38:33 +04:00

447 lines
28 KiB
Markdown
Raw Permalink Blame History

This file contains ambiguous Unicode characters
This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.
---
title: MongoDB → PostgreSQL Migration Plan (Drizzle)
tags: [data-model, migration, postgres, drizzle, plan, runbook]
aliases: [Drizzle Migration Plan, PG Migration Plan]
created: 2026-05-31
companion: "[[MongoDB to PostgreSQL Migration Guide]]"
updated: 2026-06-01 for backend integrate-main-into-development@2c5c3c7 backend 2.8.20 + deployment main@38cb75b
---
# MongoDB → PostgreSQL Migration Plan (Drizzle)
> [!abstract] What this is
> The **execution plan** for the recommendation in [[MongoDB to PostgreSQL Migration Guide]]: a **hybrid target** (Postgres for the money/relational core, Mongo retained for Chat/Notification/TTL-session collections) reached via the **strangler pattern with dual-write**, using **Drizzle ORM** + **drizzle-kit** migrations.
>
> It is opinionated and concrete: a repository seam, an `id_map` bridge, Drizzle schema sketches for the hard cases (Mixed ids, embedded arrays, partial-unique idempotency, TTL), per-phase backfill/verify/cutover mechanics, and a rollback runbook. Where it references fields it uses the **real schema** from `backend/src/models/`.
>
> **Scope reminder:** partial migration (Phases 05) is the recommended stopping point — ≈1628 engineer-weeks. Full migration of Chat/Notification/sessions is explicitly deferred.
> [!warning] Current implementation status
> Backend `2.8.20` has started the runtime cutover with store-specific raw Postgres facades: auth-owned users/Telegram auth records behind `AUTH_STORE=postgres`, confirmation-threshold config/history behind `CONFIG_STORE=postgres`, user address CRUD behind `ADDRESS_STORE=postgres`, and the first marketplace/reference domains behind `CATEGORY_STORE=postgres`, `LEVEL_CONFIG_STORE=postgres`, `SHOP_SETTINGS_STORE=postgres`, and `REVIEW_STORE=postgres`. Category PG mode now deactivates duplicate active names and enforces an active normalized-name unique index. It also contains the broader `src/db/` Drizzle schemas through `0010`, repository implementations/factory, id-map bridge, and backfill runner described below. RequestTemplate now has a PG table/backfill. Funds ledger appends and balance reads now route through `getPaymentRepo()` and can be controlled by `REPO_PAYMENT`, but broad marketplace/payment/points services are still mostly not wired through their factory repos. Code defaults remain Mongo unless a per-store flag is explicitly flipped; dev deployment `38cb75b` now flips the seven PG-capable store flags to Postgres by default. See [[Postgres Runtime Cutover Status]].
---
## 0. Guiding principles
1. **Never cut over without a soak.** Every collection goes through backfill → dual-write → shadow-read verify → flip reads → soak → decommission. Rollback at any point = flip reads back to Mongo.
2. **The repository layer is the only thing that knows where data lives.** Services must stop calling Mongoose directly. This seam is what makes the swap invisible and per-collection reversible.
3. **Parents before children.** FK remapping flows through `id_map`; you cannot migrate `Payment` before `User` exists in PG with stable uuids.
4. **Money correctness is the point.** The migration's payoff is real ACID transactions around payment + ledger + dispute flows that today lean on Mongo per-document atomicity. Treat every money write as transactional from day one in PG.
5. **No feature work during migration.** No new fields, no behavior changes. A migration that also ships features cannot be verified by row-count + checksum equality.
6. **Mongo stays authoritative until cutover.** Dual-write writes both; reads come from Mongo until a collection's shadow-read window is clean.
---
## 1. Target architecture
```
┌─────────────────────────────────────────────┐
│ Service layer │
│ (marketplace, payment, dispute, points, …) │
└───────────────────────┬─────────────────────┘
│ calls interfaces only
┌───────────────────────▼─────────────────────┐
│ Repository layer │
│ IUserRepo, IPaymentRepo, IPurchaseRepo, … │
│ ── feature-flagged per collection ── │
└───────┬───────────────────────────┬─────────┘
reads/writes reads/writes
│ │
┌───────────▼─────────┐ ┌───────────▼─────────┐
│ MongoRepo (today) │ │ DrizzleRepo (new) │
│ Mongoose models │ │ Postgres + Drizzle │
└─────────────────────┘ └─────────────────────┘
│ │
┌─────▼─────┐ ┌─────▼─────┐
│ MongoDB │◄── id_map ──────►│ Postgres │
└───────────┘ (bridge) └───────────┘
Permanent on Mongo: Chat, Notification, TelegramSession,
TempVerification, TelegramLink-state. Redis untouched.
```
Each domain gets an interface (`IPaymentRepo`), a `MongoPaymentRepo` (wraps today's Mongoose calls verbatim), a `DrizzlePaymentRepo` (new), and a `DualWritePaymentRepo` (delegates reads to one, writes to both, behind a flag). A factory picks the implementation per collection from config:
```ts
// repos/factory.ts
type Mode = 'mongo' | 'dual' | 'pg';
const MODE: Record<string, Mode> = {
user: env.REPO_USER ?? 'mongo',
payment: env.REPO_PAYMENT ?? 'mongo',
// …per collection
};
export const paymentRepo: IPaymentRepo =
MODE.payment === 'pg' ? new DrizzlePaymentRepo()
: MODE.payment === 'dual' ? new DualWritePaymentRepo(new MongoPaymentRepo(), new DrizzlePaymentRepo())
: new MongoPaymentRepo();
```
A collection's migration is then just three flag flips: `mongo → dual → pg`.
---
## 2. Drizzle & infra setup (Phase 0)
### Packages
```
pnpm add drizzle-orm pg
pnpm add -D drizzle-kit @types/pg
```
### Layout
```
backend/src/db/
schema/ # one file per table group
users.ts
payments.ts
purchaseRequests.ts
...
idMap.ts
index.ts # re-exports all tables + relations
client.ts # drizzle(pg.Pool) singleton
migrations/ # drizzle-kit generated SQL
repositories/
interfaces/ # IUserRepo, IPaymentRepo, …
mongo/ # MongoUserRepo (wraps existing Mongoose)
drizzle/ # DrizzleUserRepo
dual/ # DualWriteUserRepo
factory.ts
backfill/ # per-collection batch copiers
verify/ # row-count + checksum + shadow-read harness
drizzle.config.ts
```
### `drizzle.config.ts`
```ts
import { defineConfig } from 'drizzle-kit';
export default defineConfig({
schema: './src/db/schema/index.ts',
out: './src/db/migrations',
dialect: 'postgresql',
dbCredentials: { url: process.env.PG_URL! },
strict: true,
verbose: true,
});
```
### Client
```ts
// src/db/client.ts
import { drizzle } from 'drizzle-orm/node-postgres';
import { Pool } from 'pg';
import * as schema from './schema';
export const pool = new Pool({ connectionString: process.env.PG_URL, max: 10 });
export const db = drizzle(pool, { schema });
```
> Mirror the current Mongo pool size (`maxPoolSize: 10` in `connection.ts`). Keep `mongoose.connect` alive in parallel — both drivers run for the whole migration.
### Migration workflow
- Author tables in `schema/*.ts``pnpm drizzle-kit generate` → review the SQL in `migrations/``pnpm drizzle-kit migrate` in CI per environment.
- **Migrations are versioned, reviewed, and reversible.** This is brand-new discipline — there is no migration framework today.
---
## 3. The `id_map` bridge
ObjectIds become uuids. Every legacy id is recorded so FKs can be remapped and dual-writes stay idempotent.
```ts
// src/db/schema/idMap.ts
import { pgTable, uuid, text, timestamp, uniqueIndex } from 'drizzle-orm/pg-core';
export const idMap = pgTable('id_map', {
collection: text('collection').notNull(), // 'users', 'payments', …
legacyId: text('legacy_object_id').notNull(), // 24-char hex
newId: uuid('new_id').notNull().defaultRandom(),
createdAt: timestamp('created_at', { withTimezone: true }).defaultNow(),
}, (t) => ({
uq: uniqueIndex('id_map_collection_legacy_uq').on(t.collection, t.legacyId),
}));
```
Rules:
- Backfill allocates `new_id` once per `(collection, legacyId)` and upserts here. Re-running backfill is safe.
- Resolving a foreign reference = look up the parent's `legacyId` in `id_map` to get its `new_id`. **A child cannot backfill until its parents are mapped** (enforces parents-before-children).
- Keep `legacy_object_id` as a real column on each migrated table too, for traceability and for the dual-write path to match Mongo docs.
---
## 4. Resolving the hard data-modeling cases in Drizzle
These are the patterns from §3 of the guide, made concrete. Get these right once; they recur.
### 4.1 Mixed / polymorphic ids — `Payment`, `FundsLedgerEntry`, `DerivedDestination`
Today `Payment.purchaseRequestId`, `sellerOfferId`, `sellerId` are `Schema.Types.Mixed` — an ObjectId for normal flows, a **string** for template checkout. **Never** store "uuid-or-string" in one PG column. Split into a typed FK + a nullable free-text ref + a discriminator.
```ts
// src/db/schema/payments.ts
import { pgTable, uuid, text, numeric, boolean, timestamp, jsonb, pgEnum, index, uniqueIndex } from 'drizzle-orm/pg-core';
export const paymentProvider = pgEnum('payment_provider', ['request.network','amn.scanner','shkeeper','other']);
export const paymentDirection = pgEnum('payment_direction', ['in','out','refund']);
export const paymentStatus = pgEnum('payment_status', ['pending','processing','completed','failed','cancelled','refunded']); // confirm full enum from model
export const escrowState = pgEnum('escrow_state', ['funded','releasable','released','refunded','releasing','failed','cancelled','partial']);
export const refKind = pgEnum('ref_kind', ['entity','template']); // discriminator
export const payments = pgTable('payments', {
id: uuid('id').primaryKey().defaultRandom(),
legacyObjectId: text('legacy_object_id'),
// purchaseRequestId (Mixed) → typed FK OR free string
purchaseRequestRefKind: refKind('purchase_request_ref_kind').notNull(),
purchaseRequestId: uuid('purchase_request_id').references(() => purchaseRequests.id), // null when template
purchaseRequestExternalRef: text('purchase_request_external_ref'), // set when template
// sellerOfferId (Mixed) → same shape
sellerOfferRefKind: refKind('seller_offer_ref_kind').notNull(),
sellerOfferId: uuid('seller_offer_id').references(() => sellerOffers.id),
sellerOfferExternalRef: text('seller_offer_external_ref'),
buyerId: uuid('buyer_id').notNull().references(() => users.id),
// sellerId (Mixed)
sellerRefKind: refKind('seller_ref_kind').notNull(),
sellerId: uuid('seller_id').references(() => users.id),
sellerExternalRef: text('seller_external_ref'),
// amount subdoc → inline columns
amount: numeric('amount', { precision: 38, scale: 18 }).notNull(),
currency: text('currency').notNull().default('USDT'),
provider: paymentProvider('provider').notNull().default('request.network'),
direction: paymentDirection('direction').notNull().default('in'),
status: paymentStatus('status').notNull().default('pending'),
escrowState: escrowState('escrow_state'),
providerPaymentId: text('provider_payment_id'),
blockchain: jsonb('blockchain'), // transactionHash etc. — read-as-blob, GIN if filtered
metadata: jsonb('metadata'), // provider-specific, schema-varying
isRefunded: boolean('is_refunded').notNull().default(false),
completedAt: timestamp('completed_at', { withTimezone: true }),
createdAt: timestamp('created_at', { withTimezone: true }).defaultNow(),
updatedAt: timestamp('updated_at', { withTimezone: true }).defaultNow(),
}, (t) => ({
byStatusCreated: index('payments_status_created_idx').on(t.status, t.createdAt),
byBuyerStatus: index('payments_buyer_status_idx').on(t.buyerId, t.status),
bySellerStatus: index('payments_seller_status_idx').on(t.sellerId, t.status),
txHash: index('payments_tx_hash_idx').on(t.providerPaymentId),
// Partial-unique idempotency — the real Mongo index 'uniq_pending_request_network_by_buyer_session_offer'
pendingRnUq: uniqueIndex('uniq_pending_rn_by_buyer_offer')
.on(t.buyerId, t.purchaseRequestId, t.sellerOfferId, t.provider, t.direction)
.where(sql`provider = 'request.network' AND direction = 'in' AND status = 'pending'`),
}));
```
Add a CHECK so a discriminator always agrees with which column is populated:
```sql
ALTER TABLE payments ADD CONSTRAINT payments_pr_ref_ck CHECK (
(purchase_request_ref_kind = 'entity' AND purchase_request_id IS NOT NULL AND purchase_request_external_ref IS NULL) OR
(purchase_request_ref_kind = 'template' AND purchase_request_id IS NULL AND purchase_request_external_ref IS NOT NULL)
);
```
`FundsLedgerEntry` has the same Mixed `purchaseRequestId`/`paymentId` plus a **`idempotencyKey` sparse-unique** → partial unique index `WHERE idempotency_key IS NOT NULL`.
### 4.2 Embedded arrays → child tables
| Source (embedded) | PG | Notes |
|---|---|---|
| `PurchaseRequest.offers[]` (array of SellerOffer ids) | junction `purchase_request_offers(pr_id, offer_id)` | FK integrity; also drop the denormalized array. |
| `PurchaseRequest.preferredSellerIds[]` | junction `pr_preferred_sellers(pr_id, user_id)` | — |
| `PurchaseRequest.deliveryInfo / serviceInfo` (nested subdocs) | child tables `pr_delivery_info`, `pr_service_info` (1:1) | queried logistics; not blobbed. |
| `Dispute.evidence[]`, `Dispute.timeline[]` | `dispute_evidence`, `dispute_timeline` | timeline pre-save append → explicit INSERT. |
| `User.passkeys[]`, `User.refreshTokens[]` | `user_passkeys`, `user_refresh_tokens` | append/revoke + lookup semantics. |
| `DerivedDestination` sweep history, `TrezorAccount.addresses[]` | child tables | per-address rows referenced by payments. |
| `Payment.blockchain`, `Payment.metadata`, `Notification.metadata`, `PointTransaction.metadata` | **JSONB** | read-as-blob, never filtered/joined. |
Rule: **child table when you query/index/FK/aggregate it; JSONB when you read it whole and never filter on it.**
### 4.3 Self-referential FK — `Category`
```ts
export const categories = pgTable('categories', {
id: uuid('id').primaryKey().defaultRandom(),
legacyObjectId: text('legacy_object_id'),
name: text('name').notNull(),
nameEn: text('name_en'),
parentId: uuid('parent_id'), // self-FK, see relations
isActive: boolean('is_active').notNull().default(true),
}, (t) => ({
parentIdx: index('categories_parent_idx').on(t.parentId),
activeIdx: index('categories_active_idx').on(t.isActive),
activeNameNormUq: uniqueIndex('categories_active_name_norm_uq')
.on(sql`lower(btrim(${t.name}))`)
.where(sql`${t.isActive} = true`),
}));
// relations(): parentId → categories.id, ON DELETE SET NULL
```
`Category.parentId` is itself Mixed (ObjectId | string) in the model — verify all rows are ObjectIds during the pre-migration audit; treat stray strings as data errors to clean.
Active categories must also be unique by normalized visible name; migration `0009_unique_active_categories.sql` deactivates duplicate active rows and repoints category references before adding the unique index.
### 4.4 Sparse-unique → partial unique index — `User.email`, `User.referralCode`
The runtime code in `connection.ts` rebuilds `users.email` as unique+sparse. In PG:
```ts
emailUq: uniqueIndex('users_email_uq').on(t.email).where(sql`email IS NOT NULL`),
referralUq: uniqueIndex('users_referral_uq').on(t.referralCode).where(sql`referral_code IS NOT NULL`),
```
Reimplement `toJSON()` password/token stripping in the repository's read mapper (it deletes `refreshTokens`, `emailVerification*` before returning).
### 4.5 Atomic counter — `DerivedDestination.derivationIndex`
Today allocation relies on Mongo atomicity. In PG use a real transaction with `SELECT … FOR UPDATE` on a per-(buyer,chain) counter row, or a dedicated sequence per chain. The `uniq_destination_by_buyer_seller_chain` unique index ports directly. `status` enum `('active','swept','sweeping','quarantined')``pgEnum`.
### 4.6 TTL → `pg_cron`
`TempVerification` and `TelegramSession` stay on Mongo (ephemeral, recommended). If `Notification` (90-day TTL) ever moves: monthly range-partition + drop, or
```sql
SELECT cron.schedule('notifications_ttl', '0 3 * * *',
$$DELETE FROM notifications WHERE created_at < now() - interval '90 days'$$);
```
---
## 5. The dual-write seam (the mechanic that makes it safe)
```ts
// repositories/dual/DualWritePaymentRepo.ts
export class DualWritePaymentRepo implements IPaymentRepo {
constructor(private mongo: IPaymentRepo, private pg: IPaymentRepo) {}
// READS: source of truth = Mongo until cutover
findById(id) { return this.mongo.findById(id); }
// WRITES: both, idempotently. Mongo first (authoritative); PG must not break the request.
async create(input) {
const m = await this.mongo.create(input); // returns doc incl. _id
try {
await this.pg.upsertFromMongo(m); // keyed by legacyObjectId / idempotencyKey
} catch (e) {
metrics.dualWriteError('payments', 'create', e); // alert, do NOT throw
}
return m;
}
async update(id, patch) {
const m = await this.mongo.update(id, patch);
try { await this.pg.upsertFromMongo(m); } catch (e) { metrics.dualWriteError('payments','update',e); }
return m;
}
}
```
- **Mongo write is authoritative and must succeed**; PG write failures are logged + alerted, never surfaced to the user, during `dual` mode. (Once in `pg` mode, PG is authoritative and wrapped in real transactions.)
- All PG writes are **idempotent upserts** keyed on `legacyObjectId` (or natural idempotency keys: `Payment` partial-unique set, `FundsLedgerEntry.idempotencyKey`). This lets backfill and live dual-write overlap without double-insert.
- `$inc`/`$push` translate inside the repo: `$inc points``UPDATE … SET points = points + $1` in a transaction; `$push offers``INSERT INTO purchase_request_offers …`.
---
## 6. Phased execution
Same phases as the guide §2, here with Drizzle-concrete entry/exit gates. Each phase ends with a collection in `pg` mode and dual-write removed only after the soak.
### Phase 0 — Foundations (25 wk) — *no data moves*
- Stand up Postgres (per env), Drizzle, drizzle-kit, CI migrations. **Status 2026-05-31:** implemented in code and dev stack, but migrations must still be applied per target DB.
- Build repository interfaces + `MongoRepo` wrappers for the relational-core domains (refactor services to call repos, not Mongoose directly). **Status 2026-05-31:** repo interfaces/implementations exist; service-layer wiring remains the bulk of the cutover risk.
- Create `id_map`, the verification harness (§7), and the backfill batch runner skeleton.
- **Exit:** all relational-core services call repositories; PG reachable everywhere; `id_map` + verify harness exist; CI runs migrations.
### Phase 1 — Address pilot (12 wk)
- Smallest real domain; proves backfill → dual-write → verify → cutover end-to-end.
- **Status 2026-06-01:** `/api/addresses` has an opt-in PG runtime path through `ADDRESS_STORE=postgres`; PG writes/deletes mirror to Mongo for rollback.
- Reimplement the **one-primary-per-user** pre-save invariant as either a partial unique index `UNIQUE (user_id) WHERE primary = true` or a trigger.
- **Exit:** `addresses` in `pg` mode in prod, invariant proven under concurrent writes, verify green, dual-write removed.
### Phase 2 — Reference/config (23 wk)
- `Category` (self-FK, soft-delete), `LevelConfig`, `ConfigSetting`, `ConfigSettingHistory`, `ShopSettings`, `Review`.
- **Status 2026-06-01:** confirmation-threshold `ConfigSetting` / `ConfigSettingHistory`, categories, level config, shop settings, and reviews have opt-in PG runtime paths through their per-store flags; writes mirror back to Mongo where still-Mongo consumers need compatibility. Categories now enforce one active row per normalized visible name in PG mode.
- Port seeds to run in dependency order. Enforce `ShopSettings.sellerId` unique, Category `parentId` ON DELETE SET NULL, and Category active normalized-name uniqueness.
- **Exit:** these read from PG; seeds run in PG.
### Phase 3 — User + auth core (35 wk)
- `User` is the FK hub — **must precede the money core** so `id_map` for users is authoritative.
- **Status 2026-06-01:** auth-owned user data is opt-in PG-backed through `AUTH_STORE=postgres`, with a Mongo legacy mirror for still-Mongo consumers. Broader user consumers are not fully cut over.
- Normalize `profile`/`preferences`/`points`/`referralStats` into columns; extract `passkeys[]`, `refreshTokens[]` to child tables; partial-unique `email`/`referralCode`; reimplement `toJSON()` stripping; passkey `default: Date.now()` in app code.
- Redis session/rate-limit + in-memory passkey challenge store stay as-is.
- **Exit:** `users` in `pg` mode; referral self-FK intact; all auth flows pass; user uuids authoritative in `id_map`.
### Phase 4 — Money core (610 wk) — *the point of the project*
- `PurchaseRequest`, `SellerOffer`, `RequestTemplate`, `Payment`, `FundsLedgerEntry`, `DerivedDestination`, `TrezorAccount`, `PointTransaction`.
- **Status 2026-06-01:** Drizzle schemas and backfill scripts exist for PurchaseRequest/SellerOffer/RequestTemplate. Backend `2.8.19` hardens the marketplace-core backfill path with `npm run backfill:marketplace-core:postgres`, fixed PurchaseRequest timestamp/preferred-seller writes, a RequestTemplate backfill step, a post-SellerOffer selected-offer remap step, and category duplicate cleanup/unique active-name enforcement. Backend `2.8.20` wires the funds-ledger service through `getPaymentRepo()`, fixes Mongo/Drizzle payment-stat parity for future service wiring, and makes the repo factory lazy-load PG/dual implementations so Mongo mode does not require `PG_URL`. Runtime marketplace services still call Mongoose directly and must not be flipped with `REPO_MARKETPLACE` until service wiring plus shadow-read checks land.
- Apply §4.1 (Mixed→discriminator+FK), §4.2 (offers/preferredSellers junctions, deliveryInfo/serviceInfo child tables), §4.5 (derivation counter).
- **Wrap in real PG transactions the multi-doc writes that today have none:** `raiseDispute` (PurchaseRequest + Payment), payment confirm + `FundsLedgerEntry` AML-fee insert, referral reward (points + referralStats), PointsService flows (migrate its 2 `withTransaction` sites to PG `BEGIN/COMMIT`).
- Preserve the `Payment` partial-unique idempotency index and `FundsLedgerEntry.idempotencyKey` uniqueness.
- **Exit:** money core in `pg` mode; checksum equality on `funds_ledger_entries` sums & `payments` amounts across a full soak; idempotency + escrow-hold invariants pass concurrency tests.
### Phase 5 — Dispute + delivery (24 wk)
- `Dispute.evidence[]`/`timeline[]` → child tables; pre-save timeline-append → explicit INSERT; delivery `$set/$push` nested updates → SQL.
- `Dispute ↔ Chat` becomes a **cross-store call** (Chat stays on Mongo) — define the boundary API.
- **Exit:** dispute lifecycle in `pg` mode; release-hold sync transactional.
### Phase 6 (deferred / optional) — `BlogPost`
- Behind a search abstraction; `$regex` → PG trigram/FTS only if migrated. Otherwise leave on Mongo. RequestTemplate schema/backfill moved into Phase 4 because template checkout creates PurchaseRequest/SellerOffer rows.
### Permanent on Mongo
`Chat`, `Notification`, `TelegramSession`, `TempVerification`, `TelegramLink` link-state. Revisit only if dual-stack ops cost exceeds migration cost.
---
## 7. Verification (gate for every cutover)
Three layers, **all green before any read flip**:
1. **Row counts** — per collection and per FK relationship, Mongo vs PG. Catches dropped/dangling rows. Run continuously during dual-write.
2. **Checksums** — column-level hashes; special attention to financial sums (`SUM(funds_ledger_entries.amount)`, `SUM(payments.amount)` grouped by status/provider) and the partial-unique idempotency set.
3. **Shadow reads** — in prod, serve from Mongo, asynchronously read PG for the same key, diff, alert on mismatch. **A clean shadow-read window (e.g. 7 days, zero diffs on hot paths) is the exit criterion for cutover.**
```ts
// verify/shadow.ts — wrap a repo read in dual mode
async function shadowRead(key, mongoFn, pgFn) {
const m = await mongoFn(key);
pgFn(key).then(p => { if (!deepEqualNormalized(m, p)) metrics.shadowMismatch(key, diff(m, p)); })
.catch(e => metrics.shadowError(key, e));
return m; // user always gets Mongo result
}
```
---
## 8. Cutover & rollback runbook (per collection)
1. **Backfill** in batches with checkpointing; allocate uuids → `id_map`; remap FKs from already-migrated parents. Re-runnable (idempotent upserts).
- Marketplace-core operator path: `MIGRATION_MONGO_URL=... MIGRATION_PG_URL=... npm run backfill:marketplace-core:postgres:dry-run`, then `npm run backfill:marketplace-core:postgres` in non-prod. The group now includes RequestTemplate before PurchaseRequest/SellerOffer. Run `scripts/smoke/marketplace-core-postgres-backfill.sh` with the same DSNs to exercise the static backfill invariants and dry-run.
2. **Enable `dual`** (flag) — writes go to both; shadow-read diffing on. Backfill the delta accumulated during step 1.
3. **Soak** until row-count + checksum + shadow-read are clean for the agreed window.
4. **Flip reads to `pg`** (flag). Keep dual-write on.
5. **Soak again** (shorter). Rollback = flip reads back to `mongo`; data still mirrored, so rollback is instant.
6. **Decommission**: stop writing Mongo for that collection; archive the collection.
> Near-zero downtime: there is no global write freeze except, optionally, a brief one during final ledger reconciliation for the money core.
---
## 9. First two weeks — concrete starter checklist
- [ ] Add `drizzle-orm`, `pg`, `drizzle-kit`; create `src/db/{schema,client.ts,migrations}` + `drizzle.config.ts`.
- [x] Provision Postgres in dev (compose) + define `PG_URL`; keep Mongo running alongside. Use Postgres 18 volume mount `/var/lib/postgresql`, not `/var/lib/postgresql/data`.
- [ ] Write `id_map` schema; generate + run the first migration in CI.
- [ ] Define `IAddressRepo`; implement `MongoAddressRepo` by moving the existing Mongoose calls behind it; refactor address service to use the repo. **No behavior change** — prove the seam is invisible (existing tests pass).
- [ ] Build the verification harness (row count + checksum) against `addresses`.
- [ ] Author `addresses` Drizzle schema (incl. one-primary partial unique index) + `DrizzleAddressRepo` + `DualWriteAddressRepo`.
- [ ] Write the batch backfill for `addresses`; run dev backfill; confirm verify is green.
- [ ] Flip dev to `dual`, then `pg`; document the flag flips. This is the template for all later phases.
---
## 10. Effort recap (from the guide)
| Scope | Eng-weeks | Notes |
|---|---|---|
| **Partial — money/relational core (Phases 05 + cross-cutting)** | **~1628** | Recommended stopping point; captures ~90% of value (ACID money + relational integrity). |
| Full — all 23 collections | ~2340 | Extra 712+ wks mostly buys Chat/Notification normalization the access patterns don't reward. |
Add ~20% contingency for data-audit surprises in the Mixed-id fields. One focused engineer assumed; parallelize to compress wall-clock, not effort.
---
> [!warning] Before trusting the code sketches
> Drizzle schemas above use the real field names from `backend/src/models/` but are **first-pass sketches** — confirm the full `Payment.status` enum, the exact `amount` precision/scale your tokens need (USDT/USDC decimals), and audit which `Mixed` rows are actually strings vs ObjectIds **before** writing the money-core migration. See [[MongoDB to PostgreSQL Migration Guide]] §3/§5 for the authoritative per-field detail.