docs: add notification and concurrency test procedures

2026-06-06 10:57:44 +04:00
parent 9267961909
commit bee91dd01f
6 changed files with 552 additions and 16 deletions
--- a/Testing/Concurrency
+++ b/Testing/Concurrency
@@ -0,0 +1,270 @@
+---
+title: Concurrency and Performance Profile
+tags: [testing, performance, concurrency, profiling, e2e]
+created: 2026-06-06
+---
+
+# Concurrency and Performance Profile
+
+This procedure defines the ramp test for simultaneous escrow E2E flows and the
+report format for performance characteristics.
+
+The purpose is not only load generation. The test must also prove that business
+behavior remains correct under concurrency: payments confirm exactly once,
+notifications reach the right users, and no request/offer/payment state leaks
+across parallel workers.
+
+## Test Shape
+
+One worker is one complete isolated E2E flow:
+
+```text
+buyer + sellers -> request -> bids -> accept -> payment intent -> tUSDT payment
+-> scanner confirmation -> seller delivery -> buyer confirmation
+```
+
+Each worker must use unique:
+
+- run id suffix;
+- buyer and seller users;
+- purchase request;
+- selected offer;
+- payment id;
+- scanner destination/baseline;
+- tx hash or simulated payment fixture, depending on mode.
+
+Notifications are mandatory inside every worker. See
+[[Notification Assertion Procedure]].
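+
+A minimal sketch of how per-worker identifiers could be derived so that no two
+workers share a value. The `WorkerIdentity` type, both helper functions, and
+the email naming scheme are hypothetical; only the `workerId` and `runId`
+formats are taken from the example in the Worker Result Schema section.
+
+```python
+from dataclasses import dataclass
+from datetime import date
+
+
+@dataclass(frozen=True)
+class WorkerIdentity:
+    worker_id: str                  # e.g. "C8-W03"
+    run_id: str                     # e.g. "20260606-perf-C8-W03"
+    buyer_email: str                # hypothetical naming scheme
+    seller_emails: tuple[str, ...]  # hypothetical naming scheme
+
+
+def make_worker_identity(stage: int, index: int, run_date: date,
+                         sellers: int = 3) -> WorkerIdentity:
+    """Derive identifiers that are unique per (date, stage, worker index)."""
+    worker_id = f"C{stage}-W{index:02d}"
+    run_id = f"{run_date:%Y%m%d}-perf-{worker_id}"
+    suffix = run_id.lower()
+    return WorkerIdentity(
+        worker_id=worker_id,
+        run_id=run_id,
+        buyer_email=f"buyer+{suffix}@example.test",
+        seller_emails=tuple(
+            f"seller{n}+{suffix}@example.test" for n in range(1, sellers + 1)
+        ),
+    )
+
+
+def assert_no_overlap(identities: list[WorkerIdentity]) -> None:
+    """Fail fast if any run id or user is shared between workers."""
+    seen: set[str] = set()
+    for ident in identities:
+        for value in (ident.run_id, ident.buyer_email, *ident.seller_emails):
+            if value in seen:
+                raise ValueError(f"identifier reused across workers: {value}")
+            seen.add(value)
+```
+
+Running `assert_no_overlap` over all identities of a stage before the workers
+start would catch test-data collisions early, before they can be mistaken for
+a cross-worker state leak in the product.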
+
+## Ramp Plan
+
+Start with one simultaneous worker and double until a stop condition is reached:
+
+| Stage | Simultaneous workers | Purpose |
+|---|---:|---|
+| C1 | `1` | Baseline correctness and latency. |
+| C2 | `2` | Detect simple race conditions. |
+| C4 | `4` | Validate small parallel seller/payment load. |
+| C8 | `8` | First meaningful contention check. |
+| C16 | `16` | Stress DB/API/socket fanout. |
+| C32 | `32` | Upper dev-stack target before release planning. |
+| C64+ | `64+` | Only if C32 passes and infrastructure headroom is clear. |
+
+Hold each stage long enough to complete at least one full E2E round per worker.
+For API-only profiling, also support a fixed-duration mode such as 5 minutes per
+stage.
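+
+The doubling ramp can be sketched as a small generator plus a driver loop.
+This is an illustration only: `run_stage` is a hypothetical callable that runs
+one stage and returns `True` when no stop condition was hit.
+
+```python
+from typing import Callable, Iterator
+
+
+def ramp_stages(max_workers: int = 32) -> Iterator[tuple[str, int]]:
+    """Yield (stage name, simultaneous workers), doubling from 1 to max_workers."""
+    workers = 1
+    while workers <= max_workers:
+        yield f"C{workers}", workers
+        workers *= 2
+
+
+def run_ramp(run_stage: Callable[[str, int], bool],
+             max_workers: int = 32) -> list[str]:
+    """Run stages in order and stop after the first stage that reports a problem."""
+    attempted = []
+    for name, workers in ramp_stages(max_workers):
+        attempted.append(name)
+        if not run_stage(name, workers):
+            break                   # do not proceed to the next stage
+    return attempted
+```
+
+With the default `max_workers=32` the generator yields C1 through C32; passing
+`64` or more corresponds to the optional C64+ row.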
+
+## Modes
+
+| Mode | Payment behavior | Use |
+|---|---|---|
+| Live-chain mode | Real BSC Testnet tUSDT transfers | Final confidence at low concurrency; expensive/slower; consumes gas. |
+| Scanner fixture mode | Deterministic scanner/balance fixture or controlled test endpoint | High concurrency without chain bottleneck. Must not be enabled in production. |
+| API-only dry run | Runs request/offer/delivery and skips payment finalization | Marketplace/notification profiling without chain variables. |
+
+Live-chain mode should usually stop at low concurrency unless enough tBNB/tUSDT
+is available and the chain/RPC is reliable. Higher stages should use scanner
+fixture mode once it is implemented.
+
+## Metrics To Collect
+
+### Business correctness
+
+| Metric | Target |
+|---|---|
+| completed worker success rate | `100%` for C1-C8, `>= 99%` for C16+ after retries are classified |
+| duplicate payment credit count | `0` |
+| wrong-recipient notification count | `0` |
+| cross-worker state leak count | `0` |
+| successful non-buyer delivery confirmation count | `0` |
+| ledger inconsistency count | `0` |
+
+### API latency
+
+Initial performance goals for dev profiling:
+
+| Operation | p50 goal | p95 goal | p99 watch |
+|---|---:|---:|---:|
+| login | `< 300 ms` | `< 1 s` | `< 2 s` |
+| create request | `< 400 ms` | `< 1.5 s` | `< 3 s` |
+| create offer | `< 400 ms` | `< 1.5 s` | `< 3 s` |
+| accept offer | `< 500 ms` | `< 2 s` | `< 4 s` |
+| create payment intent | `< 750 ms` | `< 3 s` | `< 6 s` |
+| scanner balance check | `< 1 s` | `< 5 s` | `< 10 s` |
+| seller delivery | `< 500 ms` | `< 2 s` | `< 4 s` |
+| buyer delivery confirmation | `< 500 ms` | `< 2 s` | `< 4 s` |
+| notification visibility | `< 1 s` | `< 5 s` | `< 10 s` |
+
+These are starting goals, not final SLOs. The first complete C1-C32 run should
+produce a baseline report; adjust the targets afterward based on that evidence.
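+
+One way to compare collected timings against these goals is a nearest-rank
+percentile, sketched below. The choice of nearest-rank is an assumption, since
+this procedure does not prescribe a percentile method. `GOALS_MS` copies only
+two rows of the table, and its keys reuse the `timingsMs` names from the
+worker result example.
+
+```python
+import math
+
+# (p50 goal, p95 goal) in milliseconds; a subset of the table above.
+GOALS_MS = {
+    "login": (300, 1000),
+    "createPaymentIntent": (750, 3000),
+}
+
+
+def percentile(samples: list[float], pct: float) -> float:
+    """Nearest-rank percentile: the smallest sample covering pct% of the data."""
+    if not samples:
+        raise ValueError("no samples")
+    ordered = sorted(samples)
+    rank = max(1, math.ceil(pct * len(ordered) / 100))
+    return ordered[rank - 1]
+
+
+def check_goals(timings: dict[str, list[float]]) -> dict[str, bool]:
+    """Report, per operation with samples, whether the p50 and p95 goals are met."""
+    results = {}
+    for op, (p50_goal, p95_goal) in GOALS_MS.items():
+        samples = timings.get(op, [])
+        if not samples:
+            continue                # nothing measured for this operation
+        results[op] = (percentile(samples, 50) < p50_goal
+                       and percentile(samples, 95) < p95_goal)
+    return results
+```
+
+At low stages the sample count per operation is small, so p95 and p99 collapse
+onto the slowest sample. Fixed-duration mode gives more meaningful tail values.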
+
+### Infrastructure
+
+Collect per stage:
+
+- backend CPU and memory;
+- frontend CPU and memory;
+- scanner CPU and memory;
+- MongoDB CPU, memory, connections, slow queries;
+- Postgres CPU, memory, connections, locks;
+- Redis CPU, memory, connected clients;
+- container restarts;
+- Docker image/version;
+- BSC Testnet RPC latency/error rate;
+- Socket.IO connected clients and emitted event count;
+- notification insert count and error count.
+
+Suggested host commands:
+
+```bash
+docker stats --no-stream
+docker ps --format '{{.Names}}\t{{.Image}}\t{{.Status}}'
+docker logs --since 5m escrow-backend
+docker logs --since 5m escrow-scanner
+```
+
+Do not paste secrets from environment output into reports.
+
+## Stop Conditions
+
+Stop the ramp immediately if any P0 condition appears:
+
+- payment marked paid without correct chain/token/destination/amount evidence;
+- duplicate ledger credit;
+- notification delivered to wrong user;
+- expected notification missing for a step, with no approved known-gap classification;
+- backend, scanner, Mongo, Postgres, or Redis container restarts;
+- sustained HTTP 5xx rate above `1%`;
+- p95 create payment intent exceeds `10 s` for two consecutive stages;
+- scanner confirmation/check p95 exceeds `30 s` outside known BSC Testnet RPC issues;
+- queue/backlog grows without draining after the stage ends;
+- host CPU remains above `85%` or memory above `90%` after cooldown.
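+
+A subset of these conditions can be evaluated mechanically after each stage.
+The sketch below is an illustration under assumptions: every `StageMetrics`
+field name is hypothetical, the "sustained" 5xx check is simplified to a
+single per-stage rate, and the payment-evidence, scanner p95, and backlog
+conditions are left out because they need data this sketch does not model.
+
+```python
+from dataclasses import dataclass
+from typing import Optional
+
+
+@dataclass
+class StageMetrics:
+    # Field names are hypothetical; fill them from worker results and host metrics.
+    duplicate_ledger_credits: int = 0
+    wrong_recipient_notifications: int = 0
+    unclassified_missing_notifications: int = 0
+    container_restarts: int = 0
+    http_5xx_rate: float = 0.0          # fraction of requests; 0.01 means 1%
+    p95_payment_intent_ms: float = 0.0
+    cooldown_cpu_pct: float = 0.0
+    cooldown_mem_pct: float = 0.0
+
+
+def stop_reasons(current: StageMetrics,
+                 previous: Optional[StageMetrics] = None) -> list[str]:
+    """Return the stop conditions triggered by this stage; empty means continue."""
+    reasons = []
+    if current.duplicate_ledger_credits > 0:
+        reasons.append("duplicate ledger credit")
+    if current.wrong_recipient_notifications > 0:
+        reasons.append("notification delivered to wrong user")
+    if current.unclassified_missing_notifications > 0:
+        reasons.append("expected notification missing without known-gap classification")
+    if current.container_restarts > 0:
+        reasons.append("container restart")
+    if current.http_5xx_rate > 0.01:
+        reasons.append("HTTP 5xx rate above 1%")
+    # This rule only fires when two consecutive stages both exceed 10 s.
+    if (previous is not None
+            and previous.p95_payment_intent_ms > 10_000
+            and current.p95_payment_intent_ms > 10_000):
+        reasons.append("p95 create payment intent above 10 s for two consecutive stages")
+    if current.cooldown_cpu_pct > 85 or current.cooldown_mem_pct > 90:
+        reasons.append("host CPU or memory still high after cooldown")
+    return reasons
+```
+
+Returning every triggered reason, rather than stopping at the first, keeps the
+full evidence available for the report.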
+
+## Stage Procedure
+
+For each stage:
+
+1. Verify dev stack health.
+2. Capture container stats baseline.
+3. Create isolated worker test data.
+4. Start all workers at a barrier time.
+5. For every worker, execute full E2E and notification assertions.
+6. Capture per-operation timings.
+7. Capture infrastructure metrics during run.
+8. Wait for queues/notifications to settle.
+9. Capture cooldown metrics.
+10. Classify failures:
+    - product bug;
+    - test data/setup bug;
+    - BSC Testnet/RPC external issue;
+    - infrastructure capacity issue;
+    - known product gap.
+11. Decide whether to proceed to the next stage.
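+
+Step 4, starting all workers at a barrier, can be sketched with the standard
+library. This is a minimal thread-based illustration, not the harness itself:
+`flow` is a hypothetical callable that runs one full E2E flow for a worker
+index, and the returned dictionary is a reduced stand-in for the worker result.
+
+```python
+import threading
+import time
+from concurrent.futures import ThreadPoolExecutor
+from typing import Callable
+
+
+def run_workers_at_barrier(workers: int,
+                           flow: Callable[[int], None]) -> list[dict]:
+    """Start `workers` copies of `flow` together and collect one result each."""
+    barrier = threading.Barrier(workers)
+
+    def worker(index: int) -> dict:
+        barrier.wait()              # block until every worker is ready
+        started = time.perf_counter()
+        try:
+            flow(index)
+            status, errors = "pass", []
+        except Exception as exc:    # record the failure, keep the stage running
+            status, errors = "fail", [repr(exc)]
+        total_ms = (time.perf_counter() - started) * 1000
+        return {"index": index, "status": status,
+                "totalMs": total_ms, "errors": errors}
+
+    # One thread per worker, so every worker can reach the barrier.
+    with ThreadPoolExecutor(max_workers=workers) as pool:
+        return list(pool.map(worker, range(1, workers + 1)))
+```
+
+The pool size must equal the barrier size. A smaller pool would leave some
+workers unscheduled and the barrier would never release.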
+
+## Worker Result Schema
+
+Each worker should produce a JSON result:
+
+```json
+{
+  "workerId": "C8-W03",
+  "stage": 8,
+  "runId": "20260606-perf-C8-W03",
+  "status": "pass",
+  "buyerUserId": "<id>",
+  "sellerUserIds": ["<id>", "<id>", "<id>"],
+  "purchaseRequestId": "<uuid>",
+  "selectedOfferId": "<uuid>",
+  "paymentId": "<uuid>",
+  "txHash": "0x...",
+  "timingsMs": {
+    "login": 180,
+    "createRequest": 420,
+    "createOffers": 910,
+    "acceptOffer": 330,
+    "createPaymentIntent": 850,
+    "scannerConfirm": 4200,
+    "sellerDelivery": 380,
+    "buyerConfirmDelivery": 410,
+    "total": 12100
+  },
+  "notifications": [
+    {
+      "step": "seller_offer_created",
+      "recipient": "buyer",
+      "observed": true,
+      "latencyMs": 640
+    }
+  ],
+  "errors": []
+}
+```
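+
+A light structural check on each result can catch harness bugs before they
+distort the report. The sketch below is illustrative and makes assumptions:
+`"fail"` is taken to be the counterpart of `"pass"`, and `txHash` is treated
+as optional because it depends on the payment mode.
+
+```python
+import json
+
+# Keys from the example above, minus txHash (assumed to be mode-dependent).
+REQUIRED_KEYS = {
+    "workerId", "stage", "runId", "status", "buyerUserId", "sellerUserIds",
+    "purchaseRequestId", "selectedOfferId", "paymentId",
+    "timingsMs", "notifications", "errors",
+}
+
+
+def validate_worker_result(raw: str) -> list[str]:
+    """Return the problems found in one worker result; empty means well-formed."""
+    result = json.loads(raw)
+    problems = [f"missing key: {key}"
+                for key in sorted(REQUIRED_KEYS - result.keys())]
+    if problems:
+        return problems
+    if result["status"] not in {"pass", "fail"}:
+        problems.append(f"unexpected status: {result['status']}")
+    if result["status"] == "pass" and result["errors"]:
+        problems.append("status is pass but errors is not empty")
+    for name, value in result["timingsMs"].items():
+        if not isinstance(value, (int, float)) or value < 0:
+            problems.append(f"invalid timing for {name}: {value!r}")
+    for note in result["notifications"]:
+        # A passing worker should have observed every notification it lists.
+        if result["status"] == "pass" and not note.get("observed"):
+            problems.append(f"pass with unobserved notification: {note.get('step')}")
+    return problems
+```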
+
+## Report Template
+
+Create one report per full ramp:
+
+```markdown
+# Performance Profile Report - <date>
+
+## Summary
+
+- Target:
+- Backend/frontend/scanner versions:
+- Commit SHAs:
+- Payment mode: live-chain / scanner fixture / API-only
+- Ramp stages completed:
+- Overall result:
+
+## Key Findings
+
+| Finding | Severity | Evidence | Next action |
+|---|---|---|---|
+
+## Stage Results
+
+| Stage | Workers | Pass | Fail | p95 total | p95 payment intent | p95 scanner | p95 notification | 5xx rate |
+|---|---:|---:|---:|---:|---:|---:|---:|---:|
+
+## Notification Results
+
+| Step | Expected | Observed | Missing | Wrong recipient | p95 latency |
+|---|---:|---:|---:|---:|---:|
+
+## Infrastructure
+
+| Stage | Backend CPU/mem | Scanner CPU/mem | Mongo | Postgres | Redis | Restarts |
+|---|---|---|---|---|---|---:|
+
+## Payment Correctness
+
+- Duplicate credits:
+- Under/overpayment anomalies:
+- Scanner mismatches:
+- Ledger mismatches:
+
+## Bottlenecks
+
+- API:
+- Database:
+- Scanner:
+- Socket/notifications:
+- RPC/chain:
+
+## Decisions
+
+- Current safe dev concurrency:
+- Recommended production target:
+- Required fixes before next ramp:
+```
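+
+The expected, observed, and missing columns of the Notification Results table
+could be filled from the worker results, as sketched below. The function name
+is hypothetical, and the sketch assumes each worker lists every expected
+notification with `observed` set to `true` or `false`. The wrong-recipient and
+latency columns are not derived here.
+
+```python
+from collections import defaultdict
+
+
+def notification_counts(worker_results: list[dict]) -> dict[str, dict[str, int]]:
+    """Count expected, observed, and missing notifications per step."""
+    counts = defaultdict(lambda: {"expected": 0, "observed": 0, "missing": 0})
+    for result in worker_results:
+        for note in result.get("notifications", []):
+            entry = counts[note["step"]]
+            entry["expected"] += 1
+            if note.get("observed"):
+                entry["observed"] += 1
+            else:
+                entry["missing"] += 1
+    return dict(counts)
+```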
+
+## Initial Performance Characteristic Hypotheses
+
+These are the expectations to validate:
+
+- Request/offer APIs should scale mostly with Mongo/Postgres write throughput.
+- Notification latency will become a visible bottleneck before raw API latency if every offer/status change creates individual Mongo inserts and socket emits.
+- Scanner live-chain checks are likely bounded by BSC Testnet RPC latency and should be separated from API-only profiling.
+- Payment intent creation may become slower if destination derivation, token registry lookup, and scanner registration are serial.
+- Socket fanout should be watched at C16+ because each worker has multiple actors and multiple tabs/devices may multiply room membership.
+