---
title: MongoDB → PostgreSQL Migration Plan (Drizzle)
tags: [data-model, migration, postgres, drizzle, plan, runbook]
aliases: [Drizzle Migration Plan, PG Migration Plan]
created: 2026-05-31
companion: "[[MongoDB to PostgreSQL Migration Guide]]"
updated: 2026-05-31 for backend integrate-main-into-development@cab0719
---

# MongoDB → PostgreSQL Migration Plan (Drizzle)

> [!abstract] What this is
> The **execution plan** for the recommendation in [[MongoDB to PostgreSQL Migration Guide]]: a **hybrid target** (Postgres for the money/relational core, Mongo retained for Chat/Notification/TTL-session collections) reached via the **strangler pattern with dual-write**, using **Drizzle ORM** + **drizzle-kit** migrations.
>
> It is opinionated and concrete: a repository seam, an `id_map` bridge, Drizzle schema sketches for the hard cases (Mixed ids, embedded arrays, partial-unique idempotency, TTL), per-phase backfill/verify/cutover mechanics, and a rollback runbook. Where it references fields it uses the **real schema** from `backend/src/models/`.
>
> **Scope reminder:** partial migration (Phases 0–5) is the recommended stopping point — ≈16–28 engineer-weeks. Full migration of Chat/Notification/sessions is explicitly deferred.

> [!warning] Current implementation status
> Backend `2.6.80` has completed the first implementation slice of this plan: Postgres/Drizzle infra, schemas/migrations through `0008`, `id_map`, `pg_dualwrite_gaps`, Drizzle/Mongo/Dual repo implementations, backfill/verify tooling, conditional oracle `payment_quotes` persistence, and the `PurchaseRequest`/`RequestTemplate` budget enum alignment with PG `budget_currency`. It has **not** completed service-layer wiring or runtime cutover. Mongo remains authoritative for normal traffic. See [[Postgres Runtime Cutover Status]].

---

## 0. Guiding principles

1. **Never cut over without a soak.** Every collection goes through backfill → dual-write → shadow-read verify → flip reads → soak → decommission.
   Rollback at any point = flip reads back to Mongo.
2. **The repository layer is the only thing that knows where data lives.** Services must stop calling Mongoose directly. This seam is what makes the swap invisible and per-collection reversible.
3. **Parents before children.** FK remapping flows through `id_map`; you cannot migrate `Payment` before `User` exists in PG with stable uuids.
4. **Money correctness is the point.** The migration's payoff is real ACID transactions around payment + ledger + dispute flows that today lean on Mongo per-document atomicity. Treat every money write as transactional from day one in PG.
5. **No feature work during migration.** No new fields, no behavior changes. A migration that also ships features cannot be verified by row-count + checksum equality.
6. **Mongo stays authoritative until cutover.** Dual-write writes both; reads come from Mongo until a collection's shadow-read window is clean.

---

## 1. Target architecture

```
    ┌─────────────────────────────────────────────┐
    │                Service layer                │
    │ (marketplace, payment, dispute, points, …)  │
    └───────────────────────┬─────────────────────┘
                            │ calls interfaces only
    ┌───────────────────────▼─────────────────────┐
    │              Repository layer               │
    │  IUserRepo, IPaymentRepo, IPurchaseRepo, …  │
    │    ── feature-flagged per collection ──     │
    └───────┬───────────────────────────┬─────────┘
      reads/writes                reads/writes
            │                           │
┌───────────▼─────────┐     ┌───────────▼─────────┐
│  MongoRepo (today)  │     │  DrizzleRepo (new)  │
│  Mongoose models    │     │  Postgres + Drizzle │
└─────────────────────┘     └─────────────────────┘
            │                           │
      ┌─────▼─────┐               ┌─────▼─────┐
      │  MongoDB  │◄── id_map ───►│ Postgres  │
      └───────────┘   (bridge)    └───────────┘

Permanent on Mongo: Chat, Notification, TelegramSession,
TempVerification, TelegramLink-state. Redis untouched.
```

Each domain gets an interface (`IPaymentRepo`), a `MongoPaymentRepo` (wraps today's Mongoose calls verbatim), a `DrizzlePaymentRepo` (new), and a `DualWritePaymentRepo` (delegates reads to one, writes to both, behind a flag). A factory picks the implementation per collection from config:

```ts
// repos/factory.ts
type Mode = 'mongo' | 'dual' | 'pg';
const MODE: Record<string, Mode> = {
  user: env.REPO_USER ?? 'mongo',
  payment: env.REPO_PAYMENT ?? 'mongo',
  // …per collection
};
export const paymentRepo: IPaymentRepo =
  MODE.payment === 'pg' ? new DrizzlePaymentRepo()
  : MODE.payment === 'dual' ? new DualWritePaymentRepo(new MongoPaymentRepo(), new DrizzlePaymentRepo())
  : new MongoPaymentRepo();
```

A collection's migration is then just two flag flips: `mongo → dual → pg`.

---

## 2. Drizzle & infra setup (Phase 0)

### Packages

```
pnpm add drizzle-orm pg
pnpm add -D drizzle-kit @types/pg
```

### Layout

```
backend/src/db/
  schema/            # one file per table group
    users.ts
    payments.ts
    purchaseRequests.ts
    ...
    idMap.ts
    index.ts         # re-exports all tables + relations
  client.ts          # drizzle(pg.Pool) singleton
  migrations/        # drizzle-kit generated SQL
  repositories/
    interfaces/      # IUserRepo, IPaymentRepo, …
    mongo/           # MongoUserRepo (wraps existing Mongoose)
    drizzle/         # DrizzleUserRepo
    dual/            # DualWriteUserRepo
    factory.ts
  backfill/          # per-collection batch copiers
  verify/            # row-count + checksum + shadow-read harness
drizzle.config.ts
```

### `drizzle.config.ts`

```ts
import { defineConfig } from 'drizzle-kit';
export default defineConfig({
  schema: './src/db/schema/index.ts',
  out: './src/db/migrations',
  dialect: 'postgresql',
  dbCredentials: { url: process.env.PG_URL!
  },
  strict: true,
  verbose: true,
});
```

### Client

```ts
// src/db/client.ts
import { drizzle } from 'drizzle-orm/node-postgres';
import { Pool } from 'pg';
import * as schema from './schema';

export const pool = new Pool({ connectionString: process.env.PG_URL, max: 10 });
export const db = drizzle(pool, { schema });
```

> Mirror the current Mongo pool size (`maxPoolSize: 10` in `connection.ts`). Keep `mongoose.connect` alive in parallel — both drivers run for the whole migration.

### Migration workflow

- Author tables in `schema/*.ts` → `pnpm drizzle-kit generate` → review the SQL in `migrations/` → `pnpm drizzle-kit migrate` in CI per environment.
- **Migrations are versioned, reviewed, and reversible.** This is brand-new discipline — there is no migration framework today.

---

## 3. The `id_map` bridge

ObjectIds become uuids. Every legacy id is recorded so FKs can be remapped and dual-writes stay idempotent.

```ts
// src/db/schema/idMap.ts
import { pgTable, uuid, text, timestamp, uniqueIndex } from 'drizzle-orm/pg-core';

export const idMap = pgTable('id_map', {
  collection: text('collection').notNull(),      // 'users', 'payments', …
  legacyId: text('legacy_object_id').notNull(),  // 24-char hex
  newId: uuid('new_id').notNull().defaultRandom(),
  createdAt: timestamp('created_at', { withTimezone: true }).defaultNow(),
}, (t) => ({
  uq: uniqueIndex('id_map_collection_legacy_uq').on(t.collection, t.legacyId),
}));
```

Rules:

- Backfill allocates `new_id` once per `(collection, legacyId)` and upserts here. Re-running backfill is safe.
- Resolving a foreign reference = look up the parent's `legacyId` in `id_map` to get its `new_id`. **A child cannot backfill until its parents are mapped** (enforces parents-before-children).
- Keep `legacy_object_id` as a real column on each migrated table too, for traceability and for the dual-write path to match Mongo docs.

---

## 4. Resolving the hard data-modeling cases in Drizzle

These are the patterns from §3 of the guide, made concrete.
Get these right once; they recur.

### 4.1 Mixed / polymorphic ids — `Payment`, `FundsLedgerEntry`, `DerivedDestination`

Today `Payment.purchaseRequestId`, `sellerOfferId`, `sellerId` are `Schema.Types.Mixed` — an ObjectId for normal flows, a **string** for template checkout. **Never** store "uuid-or-string" in one PG column. Split into a typed FK + a nullable free-text ref + a discriminator.

```ts
// src/db/schema/payments.ts
import { sql } from 'drizzle-orm'; // needed for the partial-index predicate
import { pgTable, uuid, text, numeric, boolean, timestamp, jsonb, pgEnum, index, uniqueIndex } from 'drizzle-orm/pg-core';
// purchaseRequests, sellerOffers, users come from the sibling schema files (imports omitted in this sketch)

export const paymentProvider = pgEnum('payment_provider', ['request.network','amn.scanner','shkeeper','other']);
export const paymentDirection = pgEnum('payment_direction', ['in','out','refund']);
export const paymentStatus = pgEnum('payment_status', ['pending','processing','completed','failed','cancelled','refunded']); // confirm full enum from model
export const escrowState = pgEnum('escrow_state', ['funded','releasable','released','refunded','releasing','failed','cancelled','partial']);
export const refKind = pgEnum('ref_kind', ['entity','template']); // discriminator

export const payments = pgTable('payments', {
  id: uuid('id').primaryKey().defaultRandom(),
  legacyObjectId: text('legacy_object_id'),

  // purchaseRequestId (Mixed) → typed FK OR free string
  purchaseRequestRefKind: refKind('purchase_request_ref_kind').notNull(),
  purchaseRequestId: uuid('purchase_request_id').references(() => purchaseRequests.id), // null when template
  purchaseRequestExternalRef: text('purchase_request_external_ref'), // set when template

  // sellerOfferId (Mixed) → same shape
  sellerOfferRefKind: refKind('seller_offer_ref_kind').notNull(),
  sellerOfferId: uuid('seller_offer_id').references(() => sellerOffers.id),
  sellerOfferExternalRef: text('seller_offer_external_ref'),

  buyerId: uuid('buyer_id').notNull().references(() => users.id),

  // sellerId (Mixed)
  sellerRefKind: refKind('seller_ref_kind').notNull(),
  sellerId:
    uuid('seller_id').references(() => users.id),
  sellerExternalRef: text('seller_external_ref'),

  // amount subdoc → inline columns
  amount: numeric('amount', { precision: 38, scale: 18 }).notNull(),
  currency: text('currency').notNull().default('USDT'),

  provider: paymentProvider('provider').notNull().default('request.network'),
  direction: paymentDirection('direction').notNull().default('in'),
  status: paymentStatus('status').notNull().default('pending'),
  escrowState: escrowState('escrow_state'),

  providerPaymentId: text('provider_payment_id'),
  blockchain: jsonb('blockchain'), // transactionHash etc. — read-as-blob, GIN if filtered
  metadata: jsonb('metadata'),     // provider-specific, schema-varying

  isRefunded: boolean('is_refunded').notNull().default(false),
  completedAt: timestamp('completed_at', { withTimezone: true }),
  createdAt: timestamp('created_at', { withTimezone: true }).defaultNow(),
  updatedAt: timestamp('updated_at', { withTimezone: true }).defaultNow(),
}, (t) => ({
  byStatusCreated: index('payments_status_created_idx').on(t.status, t.createdAt),
  byBuyerStatus: index('payments_buyer_status_idx').on(t.buyerId, t.status),
  bySellerStatus: index('payments_seller_status_idx').on(t.sellerId, t.status),
  txHash: index('payments_tx_hash_idx').on(t.providerPaymentId),
  // Partial-unique idempotency — the real Mongo index 'uniq_pending_request_network_by_buyer_session_offer'
  pendingRnUq: uniqueIndex('uniq_pending_rn_by_buyer_offer')
    .on(t.buyerId, t.purchaseRequestId, t.sellerOfferId, t.provider, t.direction)
    .where(sql`provider = 'request.network' AND direction = 'in' AND status = 'pending'`),
}));
```

Add a CHECK so a discriminator always agrees with which column is populated:

```sql
ALTER TABLE payments ADD CONSTRAINT payments_pr_ref_ck CHECK (
  (purchase_request_ref_kind = 'entity' AND purchase_request_id IS NOT NULL AND purchase_request_external_ref IS NULL)
  OR (purchase_request_ref_kind = 'template' AND purchase_request_id IS NULL AND purchase_request_external_ref IS NOT
      NULL)
);
```

`FundsLedgerEntry` has the same Mixed `purchaseRequestId`/`paymentId` plus an **`idempotencyKey` sparse-unique** → partial unique index `WHERE idempotency_key IS NOT NULL`.

### 4.2 Embedded arrays → child tables

| Source (embedded) | PG | Notes |
|---|---|---|
| `PurchaseRequest.offers[]` (array of SellerOffer ids) | junction `purchase_request_offers(pr_id, offer_id)` | FK integrity; also drop the denormalized array. |
| `PurchaseRequest.preferredSellerIds[]` | junction `pr_preferred_sellers(pr_id, user_id)` | — |
| `PurchaseRequest.deliveryInfo / serviceInfo` (nested subdocs) | child tables `pr_delivery_info`, `pr_service_info` (1:1) | queried logistics; not blobbed. |
| `Dispute.evidence[]`, `Dispute.timeline[]` | `dispute_evidence`, `dispute_timeline` | timeline pre-save append → explicit INSERT. |
| `User.passkeys[]`, `User.refreshTokens[]` | `user_passkeys`, `user_refresh_tokens` | append/revoke + lookup semantics. |
| `DerivedDestination` sweep history, `TrezorAccount.addresses[]` | child tables | per-address rows referenced by payments. |
| `Payment.blockchain`, `Payment.metadata`, `Notification.metadata`, `PointTransaction.metadata` | **JSONB** | read-as-blob, never filtered/joined. |

Rule: **child table when you query/index/FK/aggregate it; JSONB when you read it whole and never filter on it.**

### 4.3 Self-referential FK — `Category`

```ts
export const categories = pgTable('categories', {
  id: uuid('id').primaryKey().defaultRandom(),
  legacyObjectId: text('legacy_object_id'),
  name: text('name').notNull(),
  nameEn: text('name_en'),
  parentId: uuid('parent_id'), // self-FK, see relations
  isActive: boolean('is_active').notNull().default(true),
}, (t) => ({
  parentIdx: index('categories_parent_idx').on(t.parentId),
  activeIdx: index('categories_active_idx').on(t.isActive),
}));
// relations(): parentId → categories.id, ON DELETE SET NULL
```

`Category.parentId` is itself Mixed (ObjectId | string) in the model — verify all rows are ObjectIds during the pre-migration audit; treat stray strings as data errors to clean.

### 4.4 Sparse-unique → partial unique index — `User.email`, `User.referralCode`

The runtime code in `connection.ts` rebuilds `users.email` as unique+sparse. In PG:

```ts
emailUq: uniqueIndex('users_email_uq').on(t.email).where(sql`email IS NOT NULL`),
referralUq: uniqueIndex('users_referral_uq').on(t.referralCode).where(sql`referral_code IS NOT NULL`),
```

Reimplement `toJSON()` password/token stripping in the repository's read mapper (it deletes `refreshTokens`, `emailVerification*` before returning).

### 4.5 Atomic counter — `DerivedDestination.derivationIndex`

Today allocation relies on Mongo atomicity. In PG use a real transaction with `SELECT … FOR UPDATE` on a per-(buyer,chain) counter row, or a dedicated sequence per chain. The `uniq_destination_by_buyer_seller_chain` unique index ports directly. `status` enum `('active','swept','sweeping','quarantined')` → `pgEnum`.

### 4.6 TTL → `pg_cron`

`TempVerification` and `TelegramSession` stay on Mongo (ephemeral, recommended).
If `Notification` (90-day TTL) ever moves: monthly range-partition + drop, or

```sql
SELECT cron.schedule('notifications_ttl', '0 3 * * *',
  $$DELETE FROM notifications WHERE created_at < now() - interval '90 days'$$);
```

---

## 5. The dual-write seam (the mechanic that makes it safe)

```ts
// repositories/dual/DualWritePaymentRepo.ts
export class DualWritePaymentRepo implements IPaymentRepo {
  constructor(private mongo: IPaymentRepo, private pg: IPaymentRepo) {}

  // READS: source of truth = Mongo until cutover
  findById(id) { return this.mongo.findById(id); }

  // WRITES: both, idempotently. Mongo first (authoritative); PG must not break the request.
  async create(input) {
    const m = await this.mongo.create(input); // returns doc incl. _id
    try {
      await this.pg.upsertFromMongo(m); // keyed by legacyObjectId / idempotencyKey
    } catch (e) {
      metrics.dualWriteError('payments', 'create', e); // alert, do NOT throw
    }
    return m;
  }

  async update(id, patch) {
    const m = await this.mongo.update(id, patch);
    try { await this.pg.upsertFromMongo(m); }
    catch (e) { metrics.dualWriteError('payments','update',e); }
    return m;
  }
}
```

- **Mongo write is authoritative and must succeed**; PG write failures are logged + alerted, never surfaced to the user, during `dual` mode. (Once in `pg` mode, PG is authoritative and wrapped in real transactions.)
- All PG writes are **idempotent upserts** keyed on `legacyObjectId` (or natural idempotency keys: `Payment` partial-unique set, `FundsLedgerEntry.idempotencyKey`). This lets backfill and live dual-write overlap without double-insert.
- `$inc`/`$push` translate inside the repo: `$inc points` → `UPDATE … SET points = points + $1` in a transaction; `$push offers` → `INSERT INTO purchase_request_offers …`.

---

## 6. Phased execution

Same phases as the guide §2, here with Drizzle-concrete entry/exit gates. Each phase ends with a collection in `pg` mode and dual-write removed only after the soak.
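
The gate that every phase below has to pass before a collection leaves `dual` can be sketched as a small transition guard. This is a minimal illustration, not code from the backend: `VerifyReport`, `isGreen`, `nextMode`, and the 7-day constant are hypothetical names and values, with the window taken from the "e.g. 7 days" figure used in the verification section.

```ts
// Hypothetical sketch: VerifyReport, isGreen and nextMode are illustrative names,
// not identifiers from the codebase.
type Mode = 'mongo' | 'dual' | 'pg';

interface VerifyReport {
  rowCountsMatch: boolean; // Mongo vs PG row counts agree
  checksumsMatch: boolean; // column-level and financial-sum checksums agree
  shadowDiffs: number;     // shadow-read mismatches seen in the current window
  cleanSoakDays: number;   // consecutive days with all checks clean
}

// Assumption: a 7-day window; the real window is whatever the team agrees on.
const REQUIRED_SOAK_DAYS = 7;

function isGreen(report: VerifyReport): boolean {
  return (
    report.rowCountsMatch &&
    report.checksumsMatch &&
    report.shadowDiffs === 0 &&
    report.cleanSoakDays >= REQUIRED_SOAK_DAYS
  );
}

// Returns the mode a collection may move to next; it never skips a step.
function nextMode(current: Mode, report: VerifyReport): Mode {
  switch (current) {
    case 'mongo':
      return 'dual'; // simplification: assumes backfill has already run
    case 'dual':
      return isGreen(report) ? 'pg' : 'dual'; // hold in dual until the soak is clean
    case 'pg':
      return 'pg'; // terminal here; rollback is a manual flag flip, not modeled
  }
}
```

Keeping the decision in one pure function would make the exit criteria testable and identical across collections, although the actual flag flip stays a deliberate human action.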
### Phase 0 — Foundations (2–5 wk) — *no data moves*

- Stand up Postgres (per env), Drizzle, drizzle-kit, CI migrations. **Status 2026-05-31:** implemented in code and dev stack, but migrations must still be applied per target DB.
- Build repository interfaces + `MongoRepo` wrappers for the relational-core domains (refactor services to call repos, not Mongoose directly). **Status 2026-05-31:** repo interfaces/implementations exist; service-layer wiring remains the bulk of the cutover risk.
- Create `id_map`, the verification harness (§7), and the backfill batch runner skeleton.
- **Exit:** all relational-core services call repositories; PG reachable everywhere; `id_map` + verify harness exist; CI runs migrations.

### Phase 1 — Address pilot (1–2 wk)

- Smallest real domain; proves backfill → dual-write → verify → cutover end-to-end.
- Reimplement the **one-primary-per-user** pre-save invariant as either a partial unique index `UNIQUE (user_id) WHERE primary = true` or a trigger.
- **Exit:** `addresses` in `pg` mode in prod, invariant proven under concurrent writes, verify green, dual-write removed.

### Phase 2 — Reference/config (2–3 wk)

- `Category` (self-FK, soft-delete), `LevelConfig`, `ConfigSetting`, `ConfigSettingHistory`, `ShopSettings`, `Review`.
- Port seeds to run in dependency order. Enforce `ShopSettings.sellerId` unique, Category `parentId` ON DELETE SET NULL.
- **Exit:** these read from PG; seeds run in PG.

### Phase 3 — User + auth core (3–5 wk)

- `User` is the FK hub — **must precede the money core** so `id_map` for users is authoritative.
- Normalize `profile`/`preferences`/`points`/`referralStats` into columns; extract `passkeys[]`, `refreshTokens[]` to child tables; partial-unique `email`/`referralCode`; reimplement `toJSON()` stripping; passkey `default: Date.now()` in app code.
- Redis session/rate-limit + in-memory passkey challenge store stay as-is.
- **Exit:** `users` in `pg` mode; referral self-FK intact; all auth flows pass; user uuids authoritative in `id_map`.

### Phase 4 — Money core (6–10 wk) — *the point of the project*

- `PurchaseRequest`, `SellerOffer`, `Payment`, `FundsLedgerEntry`, `DerivedDestination`, `TrezorAccount`, `PointTransaction`.
- Apply §4.1 (Mixed→discriminator+FK), §4.2 (offers/preferredSellers junctions, deliveryInfo/serviceInfo child tables), §4.5 (derivation counter).
- **Wrap in real PG transactions the multi-doc writes that today have none:** `raiseDispute` (PurchaseRequest + Payment), payment confirm + `FundsLedgerEntry` AML-fee insert, referral reward (points + referralStats), PointsService flows (migrate its 2 `withTransaction` sites to PG `BEGIN/COMMIT`).
- Preserve the `Payment` partial-unique idempotency index and `FundsLedgerEntry.idempotencyKey` uniqueness.
- **Exit:** money core in `pg` mode; checksum equality on `funds_ledger_entries` sums & `payments` amounts across a full soak; idempotency + escrow-hold invariants pass concurrency tests.

### Phase 5 — Dispute + delivery (2–4 wk)

- `Dispute.evidence[]`/`timeline[]` → child tables; pre-save timeline-append → explicit INSERT; delivery `$set/$push` nested updates → SQL.
- `Dispute ↔ Chat` becomes a **cross-store call** (Chat stays on Mongo) — define the boundary API.
- **Exit:** dispute lifecycle in `pg` mode; release-hold sync transactional.

### Phase 6 (deferred / optional) — `RequestTemplate`, `BlogPost`

- Behind a search abstraction; `$regex` → PG trigram/FTS only if migrated. Otherwise leave on Mongo.

### Permanent on Mongo

`Chat`, `Notification`, `TelegramSession`, `TempVerification`, `TelegramLink` link-state. Revisit only if dual-stack ops cost exceeds migration cost.

---

## 7. Verification (gate for every cutover)

Three layers, **all green before any read flip**:

1. **Row counts** — per collection and per FK relationship, Mongo vs PG. Catches dropped/dangling rows. Run continuously during dual-write.
2. **Checksums** — column-level hashes; special attention to financial sums (`SUM(funds_ledger_entries.amount)`, `SUM(payments.amount)` grouped by status/provider) and the partial-unique idempotency set.
3. **Shadow reads** — in prod, serve from Mongo, asynchronously read PG for the same key, diff, alert on mismatch. **A clean shadow-read window (e.g. 7 days, zero diffs on hot paths) is the exit criterion for cutover.**

```ts
// verify/shadow.ts — wrap a repo read in dual mode
async function shadowRead(key, mongoFn, pgFn) {
  const m = await mongoFn(key);
  pgFn(key)
    .then(p => { if (!deepEqualNormalized(m, p)) metrics.shadowMismatch(key, diff(m, p)); })
    .catch(e => metrics.shadowError(key, e));
  return m; // user always gets Mongo result
}
```

---

## 8. Cutover & rollback runbook (per collection)

1. **Backfill** in batches with checkpointing; allocate uuids → `id_map`; remap FKs from already-migrated parents. Re-runnable (idempotent upserts).
2. **Enable `dual`** (flag) — writes go to both; shadow-read diffing on. Backfill the delta accumulated during step 1.
3. **Soak** until row-count + checksum + shadow-read are clean for the agreed window.
4. **Flip reads to `pg`** (flag). Keep dual-write on.
5. **Soak again** (shorter). Rollback = flip reads back to `mongo`; data still mirrored, so rollback is instant.
6. **Decommission**: stop writing Mongo for that collection; archive the collection.

> Near-zero downtime: there is no global write freeze except, optionally, a brief one during final ledger reconciliation for the money core.

---

## 9. First two weeks — concrete starter checklist

- [ ] Add `drizzle-orm`, `pg`, `drizzle-kit`; create `src/db/{schema,client.ts,migrations}` + `drizzle.config.ts`.
- [x] Provision Postgres in dev (compose) + define `PG_URL`; keep Mongo running alongside. Use Postgres 18 volume mount `/var/lib/postgresql`, not `/var/lib/postgresql/data`.
- [ ] Write `id_map` schema; generate + run the first migration in CI.
- [ ] Define `IAddressRepo`; implement `MongoAddressRepo` by moving the existing Mongoose calls behind it; refactor address service to use the repo. **No behavior change** — prove the seam is invisible (existing tests pass).
- [ ] Build the verification harness (row count + checksum) against `addresses`.
- [ ] Author `addresses` Drizzle schema (incl. one-primary partial unique index) + `DrizzleAddressRepo` + `DualWriteAddressRepo`.
- [ ] Write the batch backfill for `addresses`; run dev backfill; confirm verify is green.
- [ ] Flip dev to `dual`, then `pg`; document the flag flips. This is the template for all later phases.

---

## 10. Effort recap (from the guide)

| Scope | Eng-weeks | Notes |
|---|---|---|
| **Partial — money/relational core (Phases 0–5 + cross-cutting)** | **~16–28** | Recommended stopping point; captures ~90% of value (ACID money + relational integrity). |
| Full — all 23 collections | ~23–40 | Extra 7–12+ wks mostly buys Chat/Notification normalization the access patterns don't reward. |

Add ~20% contingency for data-audit surprises in the Mixed-id fields. One focused engineer assumed; parallelize to compress wall-clock, not effort.

---

> [!warning] Before trusting the code sketches
> Drizzle schemas above use the real field names from `backend/src/models/` but are **first-pass sketches** — confirm the full `Payment.status` enum, the exact `amount` precision/scale your tokens need (USDT/USDC decimals), and audit which `Mixed` rows are actually strings vs ObjectIds **before** writing the money-core migration. See [[MongoDB to PostgreSQL Migration Guide]] §3/§5 for the authoritative per-field detail.
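
The Mixed-id audit called for above could start from a classifier along these lines. This is a hedged sketch: `classifyMixedId` and `tallyMixedIds` are hypothetical helper names, and it assumes that any value whose string form is exactly 24 hexadecimal characters should be treated as an ObjectId reference. A template string that happens to look like 24-hex would be misclassified, so ambiguous rows still need a manual look.

```ts
// Hypothetical audit helper: classifyMixedId and tallyMixedIds are illustrative
// names, not functions from the codebase.
type MixedKind = 'objectId' | 'string' | 'missing';

// An ObjectId renders as exactly 24 hexadecimal characters.
const OBJECT_ID_HEX = /^[0-9a-f]{24}$/i;

function classifyMixedId(value: unknown): MixedKind {
  if (value === null || value === undefined) return 'missing';
  // ObjectId-like objects are compared via their string form; plain strings pass through.
  const text = typeof value === 'string' ? value : String(value);
  return OBJECT_ID_HEX.test(text) ? 'objectId' : 'string';
}

// Counts how many values fall into each kind, e.g. for one Mixed field across a collection.
function tallyMixedIds(values: unknown[]): Record<MixedKind, number> {
  const counts: Record<MixedKind, number> = { objectId: 0, string: 0, missing: 0 };
  for (const value of values) counts[classifyMixedId(value)] += 1;
  return counts;
}
```

Run per field (`purchaseRequestId`, `sellerOfferId`, `sellerId`, `Category.parentId`), the tallies would indicate how many rows map to a typed FK versus an external ref before any backfill is written.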