docs: add notification and concurrency test procedures
11 - Testing/Concurrency and Performance Profile.md
@@ -0,0 +1,270 @@
---
title: Concurrency and Performance Profile
tags: [testing, performance, concurrency, profiling, e2e]
created: 2026-06-06
---

# Concurrency and Performance Profile

This procedure defines the ramp test for simultaneous escrow E2E flows and the
report format for performance characteristics.

Load generation is only part of the purpose. The ramp must also prove that
business behavior remains correct under concurrency: each payment confirms
exactly once, notifications are issued to the right users, and no
request/offer/payment state leaks across parallel workers.

## Test Shape

One worker is one complete isolated E2E flow:

```text
buyer + sellers -> request -> bids -> accept -> payment intent -> tUSDT payment
-> scanner confirmation -> seller delivery -> buyer confirmation
```

Each worker must use unique:

- run id suffix;
- buyer and seller users;
- purchase request;
- selected offer;
- payment id;
- scanner destination/baseline;
- tx hash or simulated payment fixture, depending on mode.
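
A minimal sketch of how a harness might derive per-worker identifiers so that nothing is shared between parallel workers. The `worker_id` and `run_id` shapes follow the example values in the worker result schema later in this document. The dataclass, the function name, the email format, and the `example.test` domain are illustrative assumptions, not the project's actual fixtures.

```python
import uuid
from dataclasses import dataclass


@dataclass(frozen=True)
class WorkerIdentity:
    worker_id: str
    run_id: str
    buyer_email: str
    seller_emails: tuple[str, ...]


def make_worker_identity(date: str, stage: int, index: int, sellers: int = 3) -> WorkerIdentity:
    """Build identifiers that are unique per worker and per run (hypothetical helper)."""
    worker_id = f"C{stage}-W{index:02d}"      # e.g. "C8-W03"
    run_id = f"{date}-perf-{worker_id}"       # e.g. "20260606-perf-C8-W03"
    # A random nonce keeps users unique even if the same run id is reused on a rerun.
    nonce = uuid.uuid4().hex[:8]
    suffix = f"{run_id}-{nonce}".lower()
    buyer_email = f"buyer+{suffix}@example.test"
    seller_emails = tuple(f"seller{n}+{suffix}@example.test" for n in range(1, sellers + 1))
    return WorkerIdentity(worker_id, run_id, buyer_email, seller_emails)
```

Requests, offers, and payments created by these users inherit the isolation, provided no worker reuses another worker's users.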

Notification assertions are mandatory inside every worker. See
[[Notification Assertion Procedure]].

## Ramp Plan

Start with one simultaneous worker and double until a stop condition is reached:

| Stage | Simultaneous workers | Purpose |
|---|---:|---|
| C1 | `1` | Baseline correctness and latency. |
| C2 | `2` | Detect simple race conditions. |
| C4 | `4` | Validate small parallel seller/payment load. |
| C8 | `8` | First meaningful contention check. |
| C16 | `16` | Stress DB/API/socket fanout. |
| C32 | `32` | Upper dev-stack target before release planning. |
| C64+ | `64+` | Only if C32 passes and infrastructure headroom is clear. |

Hold each stage long enough to complete at least one full E2E round per worker.
For API-only profiling, also support a fixed-duration mode such as 5 minutes per
stage.
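
The doubling ramp can be sketched as a small driver loop. This is an illustration only: `run_stage` is a hypothetical callback that executes one stage and returns `True` when no stop condition fired, and the default ceiling of 32 workers mirrors the C32 row above.

```python
from typing import Callable, Iterator


def ramp_stages(max_workers: int = 32) -> Iterator[int]:
    """Yield worker counts 1, 2, 4, ... up to and including max_workers."""
    workers = 1
    while workers <= max_workers:
        yield workers
        workers *= 2


def run_ramp(run_stage: Callable[[int], bool], max_workers: int = 32) -> list[int]:
    """Run stages in order and return the worker counts that completed cleanly."""
    completed: list[int] = []
    for workers in ramp_stages(max_workers):
        if not run_stage(workers):
            break  # a stop condition fired, so do not escalate further
        completed.append(workers)
    return completed
```

For example, `run_ramp(lambda workers: workers < 8)` returns `[1, 2, 4]`, because the simulated C8 stage reports a stop condition.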

## Modes

| Mode | Payment behavior | Use |
|---|---|---|
| Live-chain mode | Real BSC Testnet tUSDT transfers | Final confidence at low concurrency; expensive/slower; consumes gas. |
| Scanner fixture mode | Deterministic scanner/balance fixture or controlled test endpoint | High concurrency without chain bottleneck. Must not be enabled in production. |
| API-only dry run | Runs request/offer/delivery and skips payment finalization | Marketplace/notification profiling without chain variables. |

Live-chain mode should usually stop at low concurrency unless there is enough
tBNB/tUSDT and the chain/RPC is reliable. Higher stages should use scanner
fixture mode once implemented.
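
A harness could guard these rules before a stage starts. The sketch below is hypothetical: the enum values, the `"production"` environment name, and especially the `LIVE_CHAIN_MAX_WORKERS = 4` threshold are assumptions, since this document only says "low concurrency" and does not fix a number.

```python
from enum import Enum


class PaymentMode(Enum):
    LIVE_CHAIN = "live-chain"
    SCANNER_FIXTURE = "scanner-fixture"
    API_ONLY = "api-only"


# Assumption: the procedure says "low concurrency" without naming a limit.
LIVE_CHAIN_MAX_WORKERS = 4


def mode_problems(mode: PaymentMode, workers: int, environment: str) -> list[str]:
    """Return reasons why this mode should not be used for the given stage."""
    problems: list[str] = []
    if mode is PaymentMode.SCANNER_FIXTURE and environment == "production":
        problems.append("scanner fixture mode must not be enabled in production")
    if mode is PaymentMode.LIVE_CHAIN and workers > LIVE_CHAIN_MAX_WORKERS:
        problems.append(
            f"live-chain mode at {workers} workers exceeds the assumed limit "
            f"of {LIVE_CHAIN_MAX_WORKERS}; prefer scanner fixture mode"
        )
    return problems
```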

## Metrics To Collect

### Business correctness

| Metric | Target |
|---|---|
| completed worker success rate | `100%` for C1-C8, `>= 99%` for C16+ after retries are classified |
| duplicate payment credit count | `0` |
| wrong-recipient notification count | `0` |
| cross-worker state leak count | `0` |
| non-buyer delivery confirmation success | `0` |
| ledger inconsistency count | `0` |
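
These targets can be checked mechanically at the end of each stage. The following is a sketch under assumptions: the function names and the shape of the `counters` mapping are invented for illustration, and `passed` is expected to be counted after retries have been classified.

```python
def required_success_percent(workers: int) -> int:
    """100% for C1-C8, 99% for C16 and above, per the table above."""
    return 100 if workers <= 8 else 99


def correctness_violations(workers: int, passed: int, counters: dict[str, int]) -> list[str]:
    """Return one message per violated business-correctness target."""
    violations: list[str] = []
    required = required_success_percent(workers)
    # Integer arithmetic avoids floating-point edge cases at exactly 99%.
    if passed * 100 < workers * required:
        violations.append(f"success rate {passed}/{workers} is below {required}%")
    # Every counter in the table has a target of zero.
    for name, count in counters.items():
        if count != 0:
            violations.append(f"{name} = {count}, expected 0")
    return violations
```

Note that at C16 a single failed worker already drops the rate to 93.75%, so the `>= 99%` allowance only tolerates a failure at 100 workers or more.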

### API latency

Initial performance goals for dev profiling:

| Operation | p50 goal | p95 goal | p99 watch |
|---|---:|---:|---:|
| login | `< 300 ms` | `< 1 s` | `< 2 s` |
| create request | `< 400 ms` | `< 1.5 s` | `< 3 s` |
| create offer | `< 400 ms` | `< 1.5 s` | `< 3 s` |
| accept offer | `< 500 ms` | `< 2 s` | `< 4 s` |
| create payment intent | `< 750 ms` | `< 3 s` | `< 6 s` |
| scanner balance check | `< 1 s` | `< 5 s` | `< 10 s` |
| seller delivery | `< 500 ms` | `< 2 s` | `< 4 s` |
| buyer delivery confirmation | `< 500 ms` | `< 2 s` | `< 4 s` |
| notification visibility | `< 1 s` | `< 5 s` | `< 10 s` |

These are starting goals, not final SLOs. The first complete C1-C32 run should
produce a baseline report and then adjust targets with evidence.
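
One way to compare collected timings against the goals is a nearest-rank percentile. This is a sketch, not the project's reporting code: the nearest-rank method is a choice made here, `GOALS_MS` copies only three rows of the table for brevity, and the function names are illustrative.

```python
import math

# (p50 goal, p95 goal, p99 watch) in milliseconds, copied from the table above.
GOALS_MS = {
    "login": (300, 1000, 2000),
    "create payment intent": (750, 3000, 6000),
    "notification visibility": (1000, 5000, 10000),
}


def percentile(samples: list[float], pct: int) -> float:
    """Nearest-rank percentile: the smallest sample with at least pct% at or below it."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def missed_goals(operation: str, samples_ms: list[float]) -> list[str]:
    """List the percentiles that do not meet the strict '<' goals."""
    limits = dict(zip(("p50", "p95", "p99"), GOALS_MS[operation]))
    observed = {name: percentile(samples_ms, int(name[1:])) for name in limits}
    return [
        f"{operation} {name}: {observed[name]} ms is not below {limits[name]} ms"
        for name in limits
        if observed[name] >= limits[name]
    ]
```

With few samples per stage (for example, 8 workers at C8), p95 and p99 collapse onto the slowest sample, so low-concurrency percentiles should be read as rough indicators.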

### Infrastructure

Collect per stage:

- backend CPU and memory;
- frontend CPU and memory;
- scanner CPU and memory;
- MongoDB CPU, memory, connections, slow queries;
- Postgres CPU, memory, connections, locks;
- Redis CPU, memory, connected clients;
- container restarts;
- Docker image/version;
- BSC Testnet RPC latency/error rate;
- Socket.IO connected clients and emitted event count;
- notification insert count and error count.

Suggested host commands:

```bash
# One-shot CPU/memory snapshot for every running container.
docker stats --no-stream
# Container names, images, and status (uptime or restart state).
docker ps --format '{{.Names}}\t{{.Image}}\t{{.Status}}'
# Recent backend and scanner logs covering the last 5 minutes.
docker logs --since 5m escrow-backend
docker logs --since 5m escrow-scanner
```

Do not paste secrets from environment output into reports.
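
A simple redaction pass can reduce the risk before log or environment excerpts are copied into a report. The sketch below is a heuristic only, and the keyword list is an assumption. It masks `KEY=value` assignments whose key contains a sensitive word, and it does not catch credentials embedded in URLs or quoted values containing spaces, so a manual review is still needed.

```python
import re

# Assumed keyword list; extend it to match the variable names used by the stack.
SENSITIVE_ASSIGNMENT = re.compile(
    r"\b(\w*(?:SECRET|TOKEN|PASSWORD|PRIVATE_KEY|MNEMONIC|API_KEY)\w*)=(\S+)",
    re.IGNORECASE,
)


def redact(text: str) -> str:
    """Replace the value of every sensitive-looking KEY=value pair."""
    return SENSITIVE_ASSIGNMENT.sub(r"\1=<redacted>", text)
```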

## Stop Conditions

Stop the ramp immediately if any P0 condition appears:

- payment marked paid without correct chain/token/destination/amount evidence;
- duplicate ledger credit;
- notification delivered to wrong user;
- expected notification missing for a step that has no approved known-gap classification;
- backend, scanner, Mongo, Postgres, or Redis container restarts;
- sustained HTTP 5xx rate above `1%`;
- p95 for create payment intent exceeds `10 s` for two consecutive stages;
- p95 for scanner confirmation/check exceeds `30 s` outside known BSC Testnet RPC issues;
- queue/backlog grows without draining after the stage ends;
- host CPU remains above `85%` or memory above `90%` after cooldown.
- host CPU remains above `85%` or memory above `90%` after cooldown.
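
The numeric conditions in this list lend themselves to an automated check between stages. The sketch below covers only those numeric conditions. The correctness conditions (payment evidence, duplicate credits, notification routing) need their own assertions. The `StageObservation` fields are hypothetical names, and treating the whole-stage 5xx rate as "sustained" is an assumption.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class StageObservation:
    http_5xx_rate: float          # fraction of responses, 0.0 to 1.0
    p95_payment_intent_s: float
    p95_scanner_check_s: float
    container_restarts: int
    cooldown_cpu_pct: float
    cooldown_mem_pct: float
    known_rpc_issue: bool = False  # set when BSC Testnet RPC trouble is confirmed


def stop_reasons(current: StageObservation,
                 previous: Optional[StageObservation] = None) -> list[str]:
    """Return the numeric stop conditions that fired for this stage."""
    reasons: list[str] = []
    if current.container_restarts > 0:
        reasons.append("container restart observed")
    if current.http_5xx_rate > 0.01:
        reasons.append("HTTP 5xx rate above 1%")
    # The payment intent rule needs two consecutive slow stages.
    if (previous is not None
            and previous.p95_payment_intent_s > 10
            and current.p95_payment_intent_s > 10):
        reasons.append("p95 create payment intent above 10 s for two stages")
    if current.p95_scanner_check_s > 30 and not current.known_rpc_issue:
        reasons.append("p95 scanner check above 30 s")
    if current.cooldown_cpu_pct > 85 or current.cooldown_mem_pct > 90:
        reasons.append("host CPU or memory still high after cooldown")
    return reasons
```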

## Stage Procedure

For each stage:

1. Verify dev stack health.
2. Capture container stats baseline.
3. Create isolated worker test data.
4. Start all workers at a barrier time.
5. For every worker, execute full E2E and notification assertions.
6. Capture per-operation timings.
7. Capture infrastructure metrics during run.
8. Wait for queues/notifications to settle.
9. Capture cooldown metrics.
10. Classify failures:
    - product bug;
    - test data/setup bug;
    - BSC Testnet/RPC external issue;
    - infrastructure capacity issue;
    - known product gap.
11. Decide whether to proceed to the next stage.
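
Steps 4 to 6 can be sketched with a thread barrier, so that every worker begins its flow at the same moment instead of as threads happen to start. This is an illustration under assumptions: `flow` is a hypothetical callable that runs one complete E2E flow for a worker index, threads stand in for whatever concurrency model the real harness uses, and the result dictionary is a reduced form of the worker result schema.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Callable


def run_stage(workers: int, flow: Callable[[int], None],
              barrier_timeout_s: float = 30.0) -> list[dict]:
    """Start all workers together and collect one result per worker."""
    barrier = threading.Barrier(workers)

    def one_worker(index: int) -> dict:
        # Block until every worker thread is ready, then release all at once.
        barrier.wait(timeout=barrier_timeout_s)
        started = time.perf_counter()
        errors: list[str] = []
        try:
            flow(index)
        except Exception as exc:  # record the failure instead of aborting the stage
            errors.append(f"{type(exc).__name__}: {exc}")
        total_ms = round((time.perf_counter() - started) * 1000)
        return {
            "worker": index,
            "status": "fail" if errors else "pass",
            "totalMs": total_ms,
            "errors": errors,
        }

    # max_workers must equal the barrier size, or the barrier would never release.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(one_worker, range(1, workers + 1)))
```

Recording a worker's exception rather than raising it keeps the other workers running, which matters for step 10: the failure still has to be classified after the stage completes.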

## Worker Result Schema

Each worker should produce a JSON result:

```json
{
  "workerId": "C8-W03",
  "stage": 8,
  "runId": "20260606-perf-C8-W03",
  "status": "pass",
  "buyerUserId": "<id>",
  "sellerUserIds": ["<id>", "<id>", "<id>"],
  "purchaseRequestId": "<uuid>",
  "selectedOfferId": "<uuid>",
  "paymentId": "<uuid>",
  "txHash": "0x...",
  "timingsMs": {
    "login": 180,
    "createRequest": 420,
    "createOffers": 910,
    "acceptOffer": 330,
    "createPaymentIntent": 850,
    "scannerConfirm": 4200,
    "sellerDelivery": 380,
    "buyerConfirmDelivery": 410,
    "total": 12100
  },
  "notifications": [
    {
      "step": "seller_offer_created",
      "recipient": "buyer",
      "observed": true,
      "latencyMs": 640
    }
  ],
  "errors": []
}
```
```
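
Before aggregation, each result can be sanity-checked against this shape. The sketch below is an assumption-laden illustration: `"fail"` as the second status value is not stated in this document, `txHash` is treated as optional because simulated-payment modes may not produce one, and the rule that a passing worker has no errors and no unobserved notifications is an interpretation of the procedure.

```python
import json

# txHash is left out on purpose: fixture and API-only modes may not have one.
REQUIRED_KEYS = {
    "workerId", "stage", "runId", "status", "buyerUserId", "sellerUserIds",
    "purchaseRequestId", "selectedOfferId", "paymentId", "timingsMs",
    "notifications", "errors",
}


def result_problems(raw: str) -> list[str]:
    """Return a list of problems found in one worker result JSON document."""
    result = json.loads(raw)
    problems = [f"missing key: {key}" for key in sorted(REQUIRED_KEYS - result.keys())]
    if result.get("status") not in {"pass", "fail"}:  # "fail" is an assumed value
        problems.append(f"unexpected status: {result.get('status')!r}")
    for name, value in result.get("timingsMs", {}).items():
        if not isinstance(value, (int, float)) or value < 0:
            problems.append(f"invalid timing for {name}: {value!r}")
    if result.get("status") == "pass":
        if result.get("errors"):
            problems.append("status is pass but errors is not empty")
        for note in result.get("notifications", []):
            if not note.get("observed"):
                problems.append(f"status is pass but notification missing: {note.get('step')}")
    return problems
```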

## Report Template

Create one report per full ramp:

```markdown
# Performance Profile Report - <date>

## Summary

- Target:
- Backend/frontend/scanner versions:
- Commit SHAs:
- Payment mode: live-chain / scanner fixture / API-only
- Ramp stages completed:
- Overall result:

## Key Findings

| Finding | Severity | Evidence | Next action |
|---|---|---|---|

## Stage Results

| Stage | Workers | Pass | Fail | p95 total | p95 payment intent | p95 scanner | p95 notification | 5xx rate |
|---|---:|---:|---:|---:|---:|---:|---:|---:|

## Notification Results

| Step | Expected | Observed | Missing | Wrong recipient | p95 latency |
|---|---:|---:|---:|---:|---:|

## Infrastructure

| Stage | Backend CPU/mem | Scanner CPU/mem | Mongo | Postgres | Redis | Restarts |
|---|---|---|---|---|---|---:|

## Payment Correctness

- Duplicate credits:
- Under/overpayment anomalies:
- Scanner mismatches:
- Ledger mismatches:

## Bottlenecks

- API:
- Database:
- Scanner:
- Socket/notifications:
- RPC/chain:

## Decisions

- Current safe dev concurrency:
- Recommended production target:
- Required fixes before next ramp:
```

## Initial Performance Characteristic Hypotheses

These are the expectations to validate:

- Request/offer APIs should scale mostly with Mongo/Postgres write throughput.
- Notification latency will become a visible bottleneck before raw API latency if every offer/status change creates individual Mongo inserts and socket emits.
- Scanner live-chain checks are likely bounded by BSC Testnet RPC latency and should be separated from API-only profiling.
- Payment intent creation may become slower if destination derivation, token registry lookup, and scanner registration are serial.
- Socket fanout should be watched at C16+ because each worker has multiple actors and multiple tabs/devices may multiply room membership.