Files
nick-doc/02 - Data Models/Money-Core Migration Workflow — Audit Report.md

15 KiB
Raw Blame History

title, tags, created, target, verdict, companion, updated
title tags created target verdict companion updated
Money-Core Migration Workflow — Audit Report
migration
audit
workflow
postgres
drizzle
2026-05-31 .claude/workflows/pg-money-core-migration.js go-with-changes (0 blockers) MongoDB to PostgreSQL Migration Plan (Drizzle) 2026-05-31 after backend integrate-main-into-development@3a50dc4

[!info] Provenance Generated 2026-05-31 by a 29-agent read-only audit workflow (4 review dimensions → adversarial verification → synthesis) over the pg-money-core-migration workflow design + the real money-core backend code. All recommended changes below have since been applied to the workflow script.

Audit Report — pg-money-core-migration.js Workflow

Target: /Users/manwe/CascadeProjects/escrow/.claude/workflows/pg-money-core-migration.js Status: Not yet run. 8 phases (Preflight → Infra → Schema → Repos → Backfill → Tests → VerifyGate → SelfAudit). Adversarial verification: Every high/blocker finding was challenged against the actual script. All of them were refuted. The remaining open items are medium/low enforcement-gap concerns that were not independently re-verified (n/a).

[!warning] Historical scope This is a workflow audit, not the current runtime status. The backend branch now contains the generated Postgres layer, but Mongo remains authoritative for normal traffic and service-layer wiring is still incomplete. Use Postgres Runtime Cutover Status for the current code/runtime boundary.


The workflow cannot itself cause fraud or fund/order dataloss, and that is the load-bearing question. Three independent, code-level facts make it safe to run:

  1. It never touches production data. SAFETY rule 1 (lines 3436, 67) binds every agent to code/SQL/test generation only; any DB the harness spins up is an ephemeral local throwaway. The backfill scripts are generated as artifacts for humans, explicitly never auto-run (line 160), and are gated on a non-prod MIGRATION_PG_URL allowlist.
  2. It cannot deploy. It works on an isolated branch feat/pg-money-core-migration cut from origin/main (lines 5455, 91), is additive-only under backend/src/db/** (rule 2), never bumps version (rule 3), and never pushes / opens a PR / flips a flag (rule 4). Per the CI audit (BRANCH_001, DEPLOY_001), creating or committing to this branch fires no CI; only a human push to main/master or a v* tag deploys — outside the workflow's reach.
  3. The worst realistic failure is recoverable. VerifyGate commits only on green typecheck+tests (lines 216217), and to a local branch. Even the post-commit SelfAudit ordering (SAFETY_001, refuted) is benign because the commit is local and reversible; the verdict still surfaces to the human before any push.

Every adversarially-tested blocker/high claim was refuted because the corresponding safety constraint is present in the prompts (SAFETY rules 17, plus per-phase reinforcement on schema indexes line 127, transactions line 145, dual-write line 146, decimals line 126, backfill guard line 160). The legitimate residual risk is that these are prompt-level instructions, not code-enforced gates — an agent could under-implement one and VerifyGate (which only runs typecheck+tests) would not catch it. That is a quality/trust gap, not a safety gap, and it is fully contained by the mandatory human review before cutover. Hence: go, but tighten the gates below first so the human reviewer is handed verifiable evidence instead of having to re-derive it.

Confirmed blockers: 0.


2. Confirmed Blockers (must fix before first run)

None. All blocker/high-severity findings were adversarially verified and refuted — the safety contract they claimed was missing is in fact specified in the workflow prompts. The workflow is safe to run as-is. The items in §3 are improvements that raise trust and catch agent under-implementation; they are not gating.

ID Why it is NOT a blocker
IDEMPOTENCY_INDEX_PRESERVATION / FK-010 Line 127 explicitly mandates the partial WHERE provider='request.network' AND direction='in' AND status='pending' and the ledger sparse-unique. Not enforced in code, but specified + audited by SelfAudit (line 240).
DECIMAL_PRECISION_TRUNCATION_RISK / FK-009 Line 126 mandates numeric(38,18), never float; SelfAudit hunts float money (line 239). Token-decimal scale flagged as a risk to surface (line 131).
MULTI_DOC_WRITE / FK-011 Line 145 requires db.transaction(...) for payment+ledger, referral reward, dispute hold/release; SAFETY rule 7 repeats it.
LOCKFILE_001 / BRANCH_002 / SAFETY_002 Workflow never pushes; lockfile/npm audit are CI-merge-time concerns the human owns. Refuted as in-workflow blockers.
MIXED_ID / immutability / prod-guard / version-bump All specified in prompts (lines 124, 48/line 70, 160, rule 3).

These convert prompt-level promises into verifiable evidence so the human reviewer and VerifyGate can confirm them, and they fix the two genuine structural risks (factory race, test-skip-as-pass).

Phase ID Change Severity
Preflight PREFLIGHT_BRANCH_ALREADY_EXISTS_REUSE_RISK After git switch ${BRANCH} (line 91, reuse path), run git status --porcelain on the target branch and STOP if dirty — re-running must not accumulate prior uncommitted work. medium
Repos PARALLEL_AGENTS_RACE / CONFLICT_001 Structural fix. Four parallel domain agents all "update factory.ts" (lines 138, 147); last writer wins, silently dropping 3 domains' wiring. Split Repos into: (a) parallel — each agent writes only its own repo files, no factory.ts; (b) sequential aggregator agent that builds one unified factory.ts. Then have VerifyGate assert factory.ts imports/exports all 4 domains. medium
Repos REPOS_PHASE_MONGO_WRAPPER_NOT_VERIFIED Require the MongoRepo to carry the original Mongoose call as an inline comment (or a method→source map) so the human can confirm "verbatim, no behavior change" (line 144). medium
Repos FK-012 For multi-doc money writes, the dual-write "log+alert, don't throw" (line 146) can leave Mongo funds released with no PG ledger row during the dual-write window. Add an explicit escalation/alert requirement and a reconcile sweep for these gaps (covered by reconcile.ts, line 170). medium
Schema IDEMPOTENCY_INDEX_REPRODUCTION_NOT_TESTED / SCHEMA_AGENTS_INDEX_REPRODUCTION_NO_VERIFY / FK-001/FK-006/FK-008 Have each schema agent emit the generated SQL (or index/CHECK list) in its return. Add a schema-audit step in VerifyGate that greps the generated migration for: (a) the partial WHERE on the RN idempotency index, (b) ledger idempotency_key IS NOT NULL, (c) each Mixed-id CHECK constraint, (d) presence of every Mongo index. Missing partial-WHERE or CHECK → blocker. high (where) / medium
Backfill NO_PROD_BACKFILL_RUNAWAY_GUARD / FK-003/FK-004/FK-005/FK-007 Make the non-prod guard a whitelist (localhost/127.0.0.1/named staging hosts), not a !includes('prod') blacklist, and require a guard unit-test that aborts on a mock prod DSN. Add per-FK parent-completion checks (TrezorAccount addresses[].paymentId, PointTransaction order/referredUser, Category parentId string audit) before child upsert. high (where) / medium
Backfill (verify) VERIFICATION_CHECKSUM_PRECISION / LEDGER_IMMUTABILITY In checksums.ts (line 168) compare fund sums as ::numeric cast to text, not float. In reconcile.ts (line 170) add an assertion that ledger rows are never UPDATE/DELETE (replicating the Mongo pre-save immutability hook) — via a PG trigger or a repo guard plus a test. medium / high (where)
Tests TEST_SKIP_WITHOUT_TEST_PG_SILENTLY_PASSES / TEST_001 Structural fix. Tests "skip if no test PG" (line 190) but VerifyGate treats skip as pass. Return {testsPassed, testsSkipped, skipReason}; VerifyGate must fail the commit if testsSkipped is true, so money-safety tests are never silently bypassed. medium
Tests PAYMENT_CONFIRM_LEDGER_ATOMICITY / MISSING_CONSTRAINT_SERIALIZATION / DUAL_WRITE idempotency Expand the atomicity test (line 182) to a concrete template: inject a mid-transaction ledger failure → assert payment status rolled back → retry → assert success (idempotent). Add a 2-concurrent-confirm test and document the required isolation level / FOR UPDATE locking to prevent double-release. high (where) / medium
VerifyGate VERSION_BUMP_DETECTION / VERSION_001 Add a hard diff check: `git diff ${BASE}...HEAD -- backend/package.json grep '"version"'` → if non-empty, BLOCKER and do not commit. Cheap insurance against an accidental deploy-triggering bump.
SelfAudit SELF_AUDIT_VERDICT_NOT_GATED After SelfAudit, gate the final return: if verdict === 'unsafe' or mustFixBeforeBackfill is non-empty, return an error/flag prominently (the commit is local and reversible, so this is advisory, but it must not be buried in the JSON). medium
New phase NO_MANUAL_BACKFILL_RUNBOOK_GENERATED Add a Documentation phase that emits backend/src/db/BACKFILL_RUNBOOK.md: dependency order, per-script invocation + env, checksum/row-count verification, dual-write enablement, soak/monitor, rollback. This is the cutover team's source of truth. medium

4. What the Workflow Does Well (trust these parts)

  • Airtight non-prod / non-deploy posture. SAFETY rules 14 are stated globally and re-stated in every phase prompt. Branch off origin/main, additive-only, no version bump, no push, no auto-backfill. CI audit independently confirms the branch cannot trigger a deploy.
  • Correct scope and FK ordering. Tier A (money/orders) + Tier B parents (User, Category) migrated first; COLLECTIONS ordered parents-first (line 59); backfill runner enforces User, Category → PurchaseRequest, SellerOffer → Payment, ... (line 159). Out-of-scope set (Chat, Notification, Address, Review) is sensible.
  • Money-safety modeling is explicitly specified. numeric(38,18) not float (line 126); the exact RN partial-unique WHERE, ledger sparse-unique, and derived-destination uniqueness (line 127); Mixed-id discriminator+typed-FK+external-ref+CHECK pattern (line 124); ledger append-only (rule 6); multi-doc writes in real PG transactions (rule 7, line 145).
  • Dual-write direction is correct. Reads authoritative from Mongo, writes Mongo-first then PG idempotent upsert, PG failure does not block Mongo (line 146) — the safe ordering for a not-yet-cut-over store.
  • Real verification harness + reconciliation. Row counts, fund-sum checksums, shadow-read diffing, and a money-specific reconcile (released/refunded payments have matching ledger entries, no double-release, monotonic escrow state) — lines 167170.
  • Layered safety net. A BULK(sonnet)/BRAIN(opus) model split that reserves Opus for the commit gate and the adversarial fund-safety SelfAudit (lines 1620, 247), which hunts exactly the right failure modes (float money, dropped idempotency, non-transactional writes, collapsed Mixed ids, prod backfill).

5. Refuted / Non-Issues (do not re-raise)

All of the following were challenged and confirmed already handled by the workflow prompts or by the no-push/no-prod design. They are not action items:

  • Prod backfill guard, version-bump, client.ts throw-on-unset-PG_URL, ledger immutability, Mixed-id CHECK, idempotency index WHERE, decimal-as-numeric, multi-doc transactions — all specified in SAFETY rules 17 and per-phase prompts (lines 105, 124, 126, 127, 145, 146, 160, rule 3/6/7). The "not code-enforced" critique is valid as a trust improvement (see §3) but does not make the workflow unsafe to run.
  • SAFETY_001 (self-audit after commit): refuted — commit is to a local, reversible, non-deploying branch; the verdict still reaches the human.
  • LOCKFILE_001 / BRANCH_002 / SAFETY_002 (lockfile, npm audit): refuted as in-workflow blockers — the workflow never installs, never pushes; these are CI-merge-time concerns owned by the human at integration. (Note: CI is Gitea Actions, not Woodpecker as the SAFETY comment says — a cosmetic doc fix only.)
  • BRANCH_001 / BRANCH_003 / DEPLOY_001 / VERSION_001 / MERGE_001 / PARALLEL_001: refuted — branch isolation, additive package.json edits, and "never push" are operationally sound; the residual is human discipline at push time, covered by §6.

6. Pre-Run Checklist & Human-Only Gates

Before invoking the workflow

  1. Apply the two structural §3 fixes first — they are cheap and prevent silent breakage:
    • Repos factory.ts race: split into parallel-repo-files + sequential-aggregator.
    • Tests skip-as-pass: return testsSkipped and have VerifyGate fail on skip.
  2. Confirm a clean working tree on the repo (Preflight will STOP otherwise — line 90) and that origin is fetched.
  3. Confirm branch state: if feat/pg-money-core-migration already exists, verify it has no stray uncommitted work (PREFLIGHT_BRANCH_ALREADY_EXISTS).
  4. Provide an ephemeral local test PG (container/throwaway) so money-safety tests actually run instead of skipping. If you cannot, accept that VerifyGate (post-fix) will refuse to commit.
  5. Verify parallel-agent scope is disjoint — per repo memory, the moojttaba agent pushes to the same branches. Ensure it is NOT touching user/payment/points/marketplace domains or backend/src/db/repositories/factory.ts.
  6. Set no production env vars in the harness shell. Ensure PG_URL (if set) and MIGRATION_PG_URL point only at local/staging.

Gates enforced during the run

  • VerifyGate commits only on green typecheck + tests; otherwise leaves work uncommitted with verbatim blockers (lines 216217). Trust this — but read the blockers array.
  • (After §3) VerifyGate also fails on version-bump diff, skipped tests, and missing partial-WHERE/CHECK in generated SQL.

After the run — human reviews REQUIRED before anything leaves the branch

  1. Read selfAudit.verdict and selfAudit.mustFixBeforeBackfill (lines 222248). Do not proceed if unsafe or non-empty must-fix.
  2. Code-review the generated schema for: partial-unique WHERE clauses, ledger immutability enforcement, Mixed-id CHECK constraints, numeric (no float), and db.transaction(...) around payment+ledger / referral / dispute writes.
  3. Confirm package.json version is unchanged and the lockfile situation is understood (update lockfiles + run npm/yarn audit at integration time, not in this workflow).

STAYS HUMAN-ONLY (workflow must never do these — and does not)

  • Running any backfill against real/staging data. Scripts are artifacts; a human runs them, in dependency order, against a whitelisted non-prod DSN, with --dry-run first and checksum verification after each step (use the §3 runbook).
  • Cutover / flipping any mongo|dual|pg rollout flag.
  • Pushing the branch, opening/merging a PR, tagging v* — any of which can trigger CI/deploy. Cherry-pick reviewed changes into a proper feature branch if needed; never push feat/pg-money-core-migration to main.