Files
nick-doc/02 - Data Models/Money-Core Migration Workflow — Audit Report.md

121 lines
15 KiB
Markdown
Raw Permalink Blame History

This file contains ambiguous Unicode characters
This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.
---
title: Money-Core Migration Workflow — Audit Report
tags: [migration, audit, workflow, postgres, drizzle]
created: 2026-05-31
target: .claude/workflows/pg-money-core-migration.js
verdict: go-with-changes (0 blockers)
companion: "[[MongoDB to PostgreSQL Migration Plan (Drizzle)]]"
updated: 2026-05-31 after backend integrate-main-into-development@3a50dc4
---
> [!info] Provenance
> Generated 2026-05-31 by a 29-agent read-only audit workflow (4 review dimensions → adversarial verification → synthesis) over the `pg-money-core-migration` workflow design + the real money-core backend code. All recommended changes below have since been **applied** to the workflow script.
# Audit Report — `pg-money-core-migration.js` Workflow
**Target:** `/Users/manwe/CascadeProjects/escrow/.claude/workflows/pg-money-core-migration.js`
**Status:** Not yet run. 8 phases (Preflight → Infra → Schema → Repos → Backfill → Tests → VerifyGate → SelfAudit).
**Adversarial verification:** Every `high`/`blocker` finding was challenged against the actual script. **All of them were refuted.** The remaining open items are `medium`/`low` enforcement-gap concerns that were not independently re-verified (`n/a`).
> [!warning] Historical scope
> This is a workflow audit, not the current runtime status. The backend branch now contains the generated Postgres layer, but Mongo remains authoritative for normal traffic and service-layer wiring is still incomplete. Use [[Postgres Runtime Cutover Status]] for the current code/runtime boundary.
---
## 1. Verdict: **GO (with recommended changes)**
The workflow **cannot itself cause fraud or fund/order dataloss**, and that is the load-bearing question. Three independent, code-level facts make it safe to *run*:
1. **It never touches production data.** SAFETY rule 1 (lines 3436, 67) binds every agent to code/SQL/test generation only; any DB the harness spins up is an ephemeral local throwaway. The backfill scripts are generated as *artifacts for humans*, explicitly never auto-run (line 160), and are gated on a non-prod `MIGRATION_PG_URL` allowlist.
2. **It cannot deploy.** It works on an isolated branch `feat/pg-money-core-migration` cut from `origin/main` (lines 5455, 91), is additive-only under `backend/src/db/**` (rule 2), never bumps version (rule 3), and never pushes / opens a PR / flips a flag (rule 4). Per the CI audit (BRANCH_001, DEPLOY_001), creating or committing to this branch fires no CI; only a human push to `main`/`master` or a `v*` tag deploys — outside the workflow's reach.
3. **The worst realistic failure is recoverable.** VerifyGate commits only on green typecheck+tests (lines 216217), and to a *local* branch. Even the post-commit SelfAudit ordering (SAFETY_001, refuted) is benign because the commit is local and reversible; the verdict still surfaces to the human before any push.
Every adversarially-tested `blocker`/`high` claim was **refuted** because the corresponding safety constraint *is* present in the prompts (SAFETY rules 17, plus per-phase reinforcement on schema indexes line 127, transactions line 145, dual-write line 146, decimals line 126, backfill guard line 160). The legitimate residual risk is that these are **prompt-level instructions, not code-enforced gates** — an agent could under-implement one and VerifyGate (which only runs typecheck+tests) would not catch it. That is a *quality/trust* gap, not a *safety* gap, and it is fully contained by the mandatory human review before cutover. Hence: go, but tighten the gates below first so the human reviewer is handed verifiable evidence instead of having to re-derive it.
**Confirmed blockers: 0.**
---
## 2. Confirmed Blockers (must fix before first run)
**None.** All `blocker`/`high`-severity findings were adversarially verified and **refuted** — the safety contract they claimed was missing is in fact specified in the workflow prompts. The workflow is safe to run as-is. The items in §3 are improvements that raise trust and catch agent under-implementation; they are not gating.
| ID | Why it is NOT a blocker |
|---|---|
| IDEMPOTENCY_INDEX_PRESERVATION / FK-010 | Line 127 explicitly mandates the partial `WHERE provider='request.network' AND direction='in' AND status='pending'` and the ledger sparse-unique. Not enforced in code, but specified + audited by SelfAudit (line 240). |
| DECIMAL_PRECISION_TRUNCATION_RISK / FK-009 | Line 126 mandates `numeric(38,18)`, never float; SelfAudit hunts float money (line 239). Token-decimal scale flagged as a risk to surface (line 131). |
| MULTI_DOC_WRITE / FK-011 | Line 145 requires `db.transaction(...)` for payment+ledger, referral reward, dispute hold/release; SAFETY rule 7 repeats it. |
| LOCKFILE_001 / BRANCH_002 / SAFETY_002 | Workflow never pushes; lockfile/`npm audit` are CI-merge-time concerns the human owns. Refuted as in-workflow blockers. |
| MIXED_ID / immutability / prod-guard / version-bump | All specified in prompts (lines 124, 48/line 70, 160, rule 3). |
---
## 3. Recommended Changes Before Running (high / medium), by phase
These convert prompt-level promises into **verifiable evidence** so the human reviewer and VerifyGate can confirm them, and they fix the two genuine structural risks (factory race, test-skip-as-pass).
| Phase | ID | Change | Severity |
|---|---|---|---|
| **Preflight** | PREFLIGHT_BRANCH_ALREADY_EXISTS_REUSE_RISK | After `git switch ${BRANCH}` (line 91, reuse path), run `git status --porcelain` on the target branch and STOP if dirty — re-running must not accumulate prior uncommitted work. | medium |
| **Repos** | PARALLEL_AGENTS_RACE / CONFLICT_001 | **Structural fix.** Four parallel domain agents all "update `factory.ts`" (lines 138, 147); last writer wins, silently dropping 3 domains' wiring. Split Repos into: (a) parallel — each agent writes only its own repo files, no `factory.ts`; (b) sequential aggregator agent that builds one unified `factory.ts`. Then have VerifyGate assert `factory.ts` imports/exports all 4 domains. | medium |
| **Repos** | REPOS_PHASE_MONGO_WRAPPER_NOT_VERIFIED | Require the MongoRepo to carry the original Mongoose call as an inline comment (or a method→source map) so the human can confirm "verbatim, no behavior change" (line 144). | medium |
| **Repos** | FK-012 | For multi-doc money writes, the dual-write "log+alert, don't throw" (line 146) can leave Mongo funds released with no PG ledger row during the dual-write window. Add an explicit escalation/alert requirement and a reconcile sweep for these gaps (covered by reconcile.ts, line 170). | medium |
| **Schema** | IDEMPOTENCY_INDEX_REPRODUCTION_NOT_TESTED / SCHEMA_AGENTS_INDEX_REPRODUCTION_NO_VERIFY / FK-001/FK-006/FK-008 | Have each schema agent emit the generated SQL (or index/CHECK list) in its return. Add a schema-audit step in VerifyGate that greps the generated migration for: (a) the partial `WHERE` on the RN idempotency index, (b) ledger `idempotency_key IS NOT NULL`, (c) each Mixed-id CHECK constraint, (d) presence of every Mongo index. Missing partial-`WHERE` or CHECK → blocker. | high (where) / medium |
| **Backfill** | NO_PROD_BACKFILL_RUNAWAY_GUARD / FK-003/FK-004/FK-005/FK-007 | Make the non-prod guard a **whitelist** (`localhost`/`127.0.0.1`/named staging hosts), not a `!includes('prod')` blacklist, and require a guard unit-test that aborts on a mock prod DSN. Add per-FK parent-completion checks (TrezorAccount `addresses[].paymentId`, PointTransaction `order`/`referredUser`, Category `parentId` string audit) before child upsert. | high (where) / medium |
| **Backfill (verify)** | VERIFICATION_CHECKSUM_PRECISION / LEDGER_IMMUTABILITY | In `checksums.ts` (line 168) compare fund sums as `::numeric` cast to **text**, not float. In `reconcile.ts` (line 170) add an assertion that ledger rows are never UPDATE/DELETE (replicating the Mongo pre-save immutability hook) — via a PG trigger or a repo guard plus a test. | medium / high (where) |
| **Tests** | TEST_SKIP_WITHOUT_TEST_PG_SILENTLY_PASSES / TEST_001 | **Structural fix.** Tests "skip if no test PG" (line 190) but VerifyGate treats skip as pass. Return `{testsPassed, testsSkipped, skipReason}`; VerifyGate must **fail the commit if `testsSkipped` is true**, so money-safety tests are never silently bypassed. | medium |
| **Tests** | PAYMENT_CONFIRM_LEDGER_ATOMICITY / MISSING_CONSTRAINT_SERIALIZATION / DUAL_WRITE idempotency | Expand the atomicity test (line 182) to a concrete template: inject a mid-transaction ledger failure → assert payment status rolled back → retry → assert success (idempotent). Add a 2-concurrent-confirm test and document the required isolation level / `FOR UPDATE` locking to prevent double-release. | high (where) / medium |
| **VerifyGate** | VERSION_BUMP_DETECTION / VERSION_001 | Add a hard diff check: `git diff ${BASE}...HEAD -- backend/package.json | grep '"version"'` → if non-empty, BLOCKER and do not commit. Cheap insurance against an accidental deploy-triggering bump. | high (where) / low |
| **SelfAudit** | SELF_AUDIT_VERDICT_NOT_GATED | After SelfAudit, gate the final return: if `verdict === 'unsafe'` or `mustFixBeforeBackfill` is non-empty, return an error/flag prominently (the commit is local and reversible, so this is advisory, but it must not be buried in the JSON). | medium |
| **New phase** | NO_MANUAL_BACKFILL_RUNBOOK_GENERATED | Add a Documentation phase that emits `backend/src/db/BACKFILL_RUNBOOK.md`: dependency order, per-script invocation + env, checksum/row-count verification, dual-write enablement, soak/monitor, rollback. This is the cutover team's source of truth. | medium |
---
## 4. What the Workflow Does Well (trust these parts)
- **Airtight non-prod / non-deploy posture.** SAFETY rules 14 are stated globally and re-stated in every phase prompt. Branch off `origin/main`, additive-only, no version bump, no push, no auto-backfill. CI audit independently confirms the branch cannot trigger a deploy.
- **Correct scope and FK ordering.** Tier A (money/orders) + Tier B parents (User, Category) migrated first; `COLLECTIONS` ordered parents-first (line 59); backfill runner enforces `User, Category → PurchaseRequest, SellerOffer → Payment, ...` (line 159). Out-of-scope set (Chat, Notification, Address, Review) is sensible.
- **Money-safety modeling is explicitly specified.** `numeric(38,18)` not float (line 126); the exact RN partial-unique `WHERE`, ledger sparse-unique, and derived-destination uniqueness (line 127); Mixed-id discriminator+typed-FK+external-ref+CHECK pattern (line 124); ledger append-only (rule 6); multi-doc writes in real PG transactions (rule 7, line 145).
- **Dual-write direction is correct.** Reads authoritative from Mongo, writes Mongo-first then PG idempotent upsert, PG failure does not block Mongo (line 146) — the safe ordering for a not-yet-cut-over store.
- **Real verification harness + reconciliation.** Row counts, fund-sum checksums, shadow-read diffing, and a money-specific reconcile (released/refunded payments have matching ledger entries, no double-release, monotonic escrow state) — lines 167170.
- **Layered safety net.** A `BULK`(sonnet)/`BRAIN`(opus) model split that reserves Opus for the commit gate and the adversarial fund-safety SelfAudit (lines 1620, 247), which hunts exactly the right failure modes (float money, dropped idempotency, non-transactional writes, collapsed Mixed ids, prod backfill).
---
## 5. Refuted / Non-Issues (do not re-raise)
All of the following were challenged and confirmed already handled by the workflow prompts or by the no-push/no-prod design. They are **not** action items:
- **Prod backfill guard, version-bump, client.ts throw-on-unset-PG_URL, ledger immutability, Mixed-id CHECK, idempotency index `WHERE`, decimal-as-numeric, multi-doc transactions** — all specified in SAFETY rules 17 and per-phase prompts (lines 105, 124, 126, 127, 145, 146, 160, rule 3/6/7). The "not code-enforced" critique is valid as a *trust* improvement (see §3) but does not make the workflow unsafe to run.
- **SAFETY_001 (self-audit after commit):** refuted — commit is to a local, reversible, non-deploying branch; the verdict still reaches the human.
- **LOCKFILE_001 / BRANCH_002 / SAFETY_002 (lockfile, npm audit):** refuted as in-workflow blockers — the workflow never installs, never pushes; these are CI-merge-time concerns owned by the human at integration. (Note: CI is Gitea Actions, not Woodpecker as the SAFETY comment says — a cosmetic doc fix only.)
- **BRANCH_001 / BRANCH_003 / DEPLOY_001 / VERSION_001 / MERGE_001 / PARALLEL_001:** refuted — branch isolation, additive package.json edits, and "never push" are operationally sound; the residual is human discipline at push time, covered by §6.
---
## 6. Pre-Run Checklist & Human-Only Gates
### Before invoking the workflow
1. **Apply the two structural §3 fixes first** — they are cheap and prevent silent breakage:
- Repos `factory.ts` race: split into parallel-repo-files + sequential-aggregator.
- Tests skip-as-pass: return `testsSkipped` and have VerifyGate fail on skip.
2. **Confirm a clean working tree** on the repo (Preflight will STOP otherwise — line 90) and that `origin` is fetched.
3. **Confirm branch state:** if `feat/pg-money-core-migration` already exists, verify it has no stray uncommitted work (PREFLIGHT_BRANCH_ALREADY_EXISTS).
4. **Provide an ephemeral local test PG** (container/throwaway) so money-safety tests actually run instead of skipping. If you cannot, accept that VerifyGate (post-fix) will refuse to commit.
5. **Verify parallel-agent scope is disjoint** — per repo memory, the moojttaba agent pushes to the same branches. Ensure it is NOT touching `user/payment/points/marketplace` domains or `backend/src/db/repositories/factory.ts`.
6. **Set no production env vars** in the harness shell. Ensure `PG_URL` (if set) and `MIGRATION_PG_URL` point only at local/staging.
### Gates enforced during the run
- VerifyGate commits **only** on green typecheck + tests; otherwise leaves work uncommitted with verbatim blockers (lines 216217). Trust this — but read the `blockers` array.
- (After §3) VerifyGate also fails on version-bump diff, skipped tests, and missing partial-`WHERE`/CHECK in generated SQL.
### After the run — human reviews REQUIRED before anything leaves the branch
1. Read `selfAudit.verdict` and `selfAudit.mustFixBeforeBackfill` (lines 222248). **Do not proceed if `unsafe` or non-empty must-fix.**
2. Code-review the generated schema for: partial-unique `WHERE` clauses, ledger immutability enforcement, Mixed-id CHECK constraints, `numeric` (no float), and `db.transaction(...)` around payment+ledger / referral / dispute writes.
3. Confirm `package.json` version is unchanged and the lockfile situation is understood (update lockfiles + run `npm/yarn audit` at integration time, not in this workflow).
### STAYS HUMAN-ONLY (workflow must never do these — and does not)
- **Running any backfill against real/staging data.** Scripts are artifacts; a human runs them, in dependency order, against a whitelisted non-prod DSN, with `--dry-run` first and checksum verification after each step (use the §3 runbook).
- **Cutover / flipping any `mongo|dual|pg` rollout flag.**
- **Pushing the branch, opening/merging a PR, tagging `v*`** — any of which can trigger CI/deploy. Cherry-pick reviewed changes into a proper feature branch if needed; never push `feat/pg-money-core-migration` to `main`.