---
title: Concurrency and Performance Profile
tags: [testing, performance, concurrency, profiling, e2e]
created: 2026-06-06
---

# Concurrency and Performance Profile

This procedure defines the ramp test for simultaneous escrow E2E flows and the
report format for performance characteristics.

The purpose is not only load generation. The ramp must also prove that business
behavior remains correct under concurrency: payments confirm once, notifications
are issued to the right users, and no request/offer/payment state leaks across
parallel workers.

## Test Shape

One worker is one complete isolated E2E flow:

```text
buyer + sellers -> request -> bids -> accept -> payment intent -> tUSDT payment
-> scanner confirmation -> seller delivery -> buyer confirmation
```

Each worker must use unique:

- run id suffix;
- buyer and seller users;
- purchase request;
- selected offer;
- payment id;
- scanner destination/baseline;
- tx hash or simulated payment fixture, depending on mode.

Notification assertions are mandatory inside every worker. See
[[Notification Assertion Procedure]].

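
The isolation rule above can be sketched as a small identity factory. This is a minimal illustration, not the project's fixture code: `WorkerIdentity`, `make_worker_identity`, `assert_no_overlap`, and the `example.test` e-mail pattern are all hypothetical names. Only the `C8-W03` and `20260606-perf-C8-W03` formats come from the worker result schema in this document.

```python
import uuid
from dataclasses import dataclass
from typing import Iterable, Tuple


@dataclass(frozen=True)
class WorkerIdentity:
    """Identifiers owned by exactly one worker (hypothetical shape)."""
    worker_id: str
    run_id: str
    buyer_email: str
    seller_emails: Tuple[str, ...]


def make_worker_identity(date: str, stage: int, index: int, sellers: int = 3) -> WorkerIdentity:
    worker_id = f"C{stage}-W{index:02d}"
    run_id = f"{date}-perf-{worker_id}"
    # A random suffix keeps users unique even if a stage is re-run on the same day.
    nonce = uuid.uuid4().hex[:8]
    buyer = f"buyer-{run_id}-{nonce}@example.test".lower()
    seller_emails = tuple(
        f"seller{n}-{run_id}-{nonce}@example.test".lower() for n in range(1, sellers + 1)
    )
    return WorkerIdentity(worker_id, run_id, buyer, seller_emails)


def assert_no_overlap(identities: Iterable[WorkerIdentity]) -> None:
    """Fail fast if two workers would share a run id or a user."""
    seen = set()
    for ident in identities:
        for key in (ident.run_id, ident.buyer_email, *ident.seller_emails):
            if key in seen:
                raise ValueError(f"identifier reused across workers: {key}")
            seen.add(key)
```

Calling `assert_no_overlap` on the full stage roster before any worker starts turns a test-data collision into a setup failure rather than a false state-leak finding.
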
## Ramp Plan

Start with one simultaneous worker and double until a stop condition is reached:

| Stage | Simultaneous workers | Purpose |
|---|---:|---|
| C1 | `1` | Baseline correctness and latency. |
| C2 | `2` | Detect simple race conditions. |
| C4 | `4` | Validate small parallel seller/payment load. |
| C8 | `8` | First meaningful contention check. |
| C16 | `16` | Stress DB/API/socket fanout. |
| C32 | `32` | Upper dev-stack target before release planning. |
| C64+ | `64+` | Only if C32 passes and infrastructure headroom is clear. |

Hold each stage long enough to complete at least one full E2E round per worker.
For API-only profiling, also support a fixed-duration mode such as 5 minutes per
stage.

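
The doubling schedule can be sketched as a generator plus a driver loop. `ramp_stages` and `run_ramp` are hypothetical names, and the `run_stage` callback stands in for whatever executes a stage and reports whether it is safe to continue.

```python
from typing import Callable, Iterator, List, Tuple


def ramp_stages(max_workers: int = 32) -> Iterator[Tuple[str, int]]:
    """Yield (stage name, worker count) pairs: C1, C2, C4, ... up to max_workers."""
    workers = 1
    while workers <= max_workers:
        yield f"C{workers}", workers
        workers *= 2


def run_ramp(run_stage: Callable[[str, int], bool], max_workers: int = 32) -> List[str]:
    """Run stages in order and stop at the first one that reports a stop condition."""
    completed = []
    for name, workers in ramp_stages(max_workers):
        if not run_stage(name, workers):
            break
        completed.append(name)
    return completed
```

With the default cap the generator ends at C32, so C64 and beyond require an explicit `max_workers`, which matches the rule that those stages run only after C32 passes.
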
## Modes

| Mode | Payment behavior | Use |
|---|---|---|
| Live-chain mode | Real BSC Testnet tUSDT transfers | Final confidence at low concurrency; expensive/slower; consumes gas. |
| Scanner fixture mode | Deterministic scanner/balance fixture or controlled test endpoint | High concurrency without chain bottleneck. Must not be enabled in production. |
| API-only dry run | Runs request/offer/delivery and skips payment finalization | Marketplace/notification profiling without chain variables. |

Live-chain mode should usually stop at low concurrency unless there is enough
tBNB/tUSDT and the chain/RPC is reliable. Higher stages should use scanner
fixture mode once implemented.

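
A pre-flight check for the mode rules might look like the sketch below. The mode strings, the `environment` values, and especially the live-chain cap of 4 workers are assumptions made for illustration; this document only says "low concurrency" and does not fix a number.

```python
from enum import Enum
from typing import List


class PaymentMode(Enum):
    LIVE_CHAIN = "live-chain"
    SCANNER_FIXTURE = "scanner-fixture"
    API_ONLY = "api-only"


def mode_problems(
    mode: PaymentMode,
    environment: str,
    workers: int,
    live_chain_max_workers: int = 4,  # assumed cap, not taken from the procedure
) -> List[str]:
    """Return reasons this mode should not run; an empty list means it may proceed."""
    problems = []
    if mode is PaymentMode.SCANNER_FIXTURE and environment == "production":
        problems.append("scanner fixture mode must not be enabled in production")
    if mode is PaymentMode.LIVE_CHAIN and workers > live_chain_max_workers:
        problems.append(
            f"live-chain mode at {workers} workers exceeds the assumed cap of "
            f"{live_chain_max_workers}; use scanner fixture mode for higher stages"
        )
    return problems
```
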
## Metrics To Collect

### Business correctness

| Metric | Target |
|---|---|
| completed worker success rate | `100%` for C1-C8, `>= 99%` for C16+ after retries are classified |
| duplicate payment credit count | `0` |
| wrong-recipient notification count | `0` |
| cross-worker state leak count | `0` |
| non-buyer delivery confirmation success | `0` |
| ledger inconsistency count | `0` |

### API latency

Initial performance goals for dev profiling:

| Operation | p50 goal | p95 goal | p99 watch |
|---|---:|---:|---:|
| login | `< 300 ms` | `< 1 s` | `< 2 s` |
| create request | `< 400 ms` | `< 1.5 s` | `< 3 s` |
| create offer | `< 400 ms` | `< 1.5 s` | `< 3 s` |
| accept offer | `< 500 ms` | `< 2 s` | `< 4 s` |
| create payment intent | `< 750 ms` | `< 3 s` | `< 6 s` |
| scanner balance check | `< 1 s` | `< 5 s` | `< 10 s` |
| seller delivery | `< 500 ms` | `< 2 s` | `< 4 s` |
| buyer delivery confirmation | `< 500 ms` | `< 2 s` | `< 4 s` |
| notification visibility | `< 1 s` | `< 5 s` | `< 10 s` |

These are starting goals, not final SLOs. The first complete C1-C32 run should
produce a baseline report and then adjust targets with evidence.

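
One way to compare collected timings against these goals is a nearest-rank percentile over the per-operation samples. The sketch below is an assumption about how the comparison could be done, not the project's tooling. The keys reuse the `timingsMs` names from the worker result schema, and only operations whose mapping to the table is unambiguous are included. `createOffers` and `scannerConfirm` are left out because they may not measure the same thing as "create offer" and "scanner balance check".

```python
import math
from typing import Dict, List, Sequence, Tuple

# (p50 goal, p95 goal, p99 watch) in milliseconds, taken from the goals table.
GOALS_MS: Dict[str, Tuple[int, int, int]] = {
    "login": (300, 1000, 2000),
    "createRequest": (400, 1500, 3000),
    "acceptOffer": (500, 2000, 4000),
    "createPaymentIntent": (750, 3000, 6000),
    "sellerDelivery": (500, 2000, 4000),
    "buyerConfirmDelivery": (500, 2000, 4000),
}


def percentile(samples: Sequence[float], pct: int) -> float:
    """Nearest-rank percentile: the smallest sample with at least pct% at or below it."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def goal_violations(samples_by_op: Dict[str, Sequence[float]]) -> List[str]:
    """List every p50/p95/p99 value that is not strictly below its goal."""
    violations = []
    for op, samples in samples_by_op.items():
        if op not in GOALS_MS or not samples:
            continue
        for pct, limit in zip((50, 95, 99), GOALS_MS[op]):
            observed = percentile(samples, pct)
            if observed >= limit:
                violations.append(f"{op} p{pct} = {observed} ms (goal < {limit} ms)")
    return violations
```

Nearest-rank is chosen here because it always returns an observed sample, which keeps small stages such as C1 and C2 easy to reason about. An interpolating method would be equally defensible once sample counts grow.
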
### Infrastructure

Collect per stage:

- backend CPU and memory;
- frontend CPU and memory;
- scanner CPU and memory;
- MongoDB CPU, memory, connections, slow queries;
- Postgres CPU, memory, connections, locks;
- Redis CPU, memory, connected clients;
- container restarts;
- Docker image/version;
- BSC Testnet RPC latency/error rate;
- Socket.IO connected clients and emitted event count;
- notification insert count and error count.

Suggested host commands:

```bash
docker stats --no-stream
docker ps --format '{{.Names}}\t{{.Image}}\t{{.Status}}'
docker logs --since 5m escrow-backend
docker logs --since 5m escrow-scanner
```

Do not paste secrets from environment output into reports.

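
For machine-readable snapshots, `docker stats` can emit one JSON object per container with `--format '{{json .}}'`. The parser below is a sketch that assumes the `Name`, `CPUPerc`, and `MemPerc` fields of that output; `parse_docker_stats` is a hypothetical helper, and values that are not percentages (for example a stopped container) are stored as `None`.

```python
import json
from typing import Dict, Iterable, Optional


def _pct(value: str) -> Optional[float]:
    """Convert a string such as '12.50%' to 12.5; return None if it is not a number."""
    try:
        return float(value.rstrip("%"))
    except ValueError:
        return None


def parse_docker_stats(lines: Iterable[str]) -> Dict[str, Dict[str, Optional[float]]]:
    """Map container name -> CPU and memory percentages from JSON stats lines."""
    stats = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue
        row = json.loads(line)
        stats[row["Name"]] = {
            "cpu_pct": _pct(row["CPUPerc"]),
            "mem_pct": _pct(row["MemPerc"]),
        }
    return stats


# Possible usage on the host (not executed here):
#   out = subprocess.run(
#       ["docker", "stats", "--no-stream", "--format", "{{json .}}"],
#       capture_output=True, text=True, check=True,
#   ).stdout
#   snapshot = parse_docker_stats(out.splitlines())
```
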
## Stop Conditions

Stop the ramp immediately if any P0 condition appears:

- payment marked paid without correct chain/token/destination/amount evidence;
- duplicate ledger credit;
- notification delivered to wrong user;
- expected notification missing for a step without approved known-gap classification;
- backend, scanner, Mongo, Postgres, or Redis container restarts;
- sustained HTTP 5xx rate above `1%`;
- p95 create payment intent exceeds `10 s` for two consecutive stages;
- scanner confirmation/check p95 exceeds `30 s` outside known BSC Testnet RPC issues;
- queue/backlog grows without draining after the stage ends;
- host CPU remains above `85%` or memory above `90%` after cooldown.

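
The numeric stop conditions lend themselves to an automated gate. The sketch below covers only the conditions that reduce to a counter or threshold. Evidence checks, missing-notification classification, and backlog draining still need their own assertions. `StageSnapshot` and `p0_reasons` are hypothetical names, and treating the stage-wide 5xx fraction as "sustained" is a simplifying assumption.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class StageSnapshot:
    """Per-stage summary values (hypothetical shape)."""
    duplicate_credits: int = 0
    wrong_recipient_notifications: int = 0
    container_restarts: int = 0
    http_5xx_rate: float = 0.0        # fraction of responses; 0.01 means 1%
    p95_payment_intent_ms: float = 0.0
    p95_scanner_ms: float = 0.0
    known_rpc_issue: bool = False     # set when a BSC Testnet RPC incident is confirmed
    cooldown_cpu_pct: float = 0.0
    cooldown_mem_pct: float = 0.0


def p0_reasons(current: StageSnapshot, previous: Optional[StageSnapshot] = None) -> List[str]:
    """Return the P0 stop conditions triggered by this stage (empty list: keep ramping)."""
    reasons = []
    if current.duplicate_credits > 0:
        reasons.append("duplicate ledger credit")
    if current.wrong_recipient_notifications > 0:
        reasons.append("notification delivered to wrong user")
    if current.container_restarts > 0:
        reasons.append("container restart")
    if current.http_5xx_rate > 0.01:
        reasons.append("HTTP 5xx rate above 1%")
    # The payment-intent rule needs two consecutive stages over the limit.
    if (
        previous is not None
        and current.p95_payment_intent_ms > 10_000
        and previous.p95_payment_intent_ms > 10_000
    ):
        reasons.append("p95 create payment intent above 10 s for two consecutive stages")
    if current.p95_scanner_ms > 30_000 and not current.known_rpc_issue:
        reasons.append("scanner p95 above 30 s")
    if current.cooldown_cpu_pct > 85 or current.cooldown_mem_pct > 90:
        reasons.append("host CPU or memory still high after cooldown")
    return reasons
```
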
## Stage Procedure

For each stage:

1. Verify dev stack health.
2. Capture container stats baseline.
3. Create isolated worker test data.
4. Start all workers at a barrier time.
5. For every worker, execute full E2E and notification assertions.
6. Capture per-operation timings.
7. Capture infrastructure metrics during run.
8. Wait for queues/notifications to settle.
9. Capture cooldown metrics.
10. Classify failures:
    - product bug;
    - test data/setup bug;
    - BSC Testnet/RPC external issue;
    - infrastructure capacity issue;
    - known product gap.
11. Decide whether to proceed to the next stage.

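
Step 4 (starting all workers at a barrier) can be sketched with `threading.Barrier` and a thread pool. This is one possible harness shape, not the project's runner: `run_stage` and the `flow` callback are hypothetical, and a real harness might use processes or an async client instead of threads.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List


def run_stage(worker_count: int, flow: Callable[[int], None]) -> List[Dict]:
    """Release all workers together, run one flow per worker, and time each one."""
    barrier = threading.Barrier(worker_count)

    def worker(index: int) -> Dict:
        # Block until every worker is ready. If a worker never arrives, the wait
        # raises BrokenBarrierError after the timeout instead of hanging forever.
        barrier.wait(timeout=30)
        started = time.monotonic()
        try:
            flow(index)
            status, errors = "pass", []
        except Exception as exc:  # record the failure so it can be classified later
            status, errors = "fail", [repr(exc)]
        return {
            "workerId": f"C{worker_count}-W{index:02d}",
            "status": status,
            "totalMs": round((time.monotonic() - started) * 1000),
            "errors": errors,
        }

    # The pool must allow one thread per worker, otherwise the barrier never trips.
    with ThreadPoolExecutor(max_workers=worker_count) as pool:
        return list(pool.map(worker, range(1, worker_count + 1)))
```

Catching the exception inside the worker keeps one failed flow from aborting the rest of the stage, so every worker still yields a result that can be classified in step 10.
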
## Worker Result Schema

Each worker should produce a JSON result:

```json
{
  "workerId": "C8-W03",
  "stage": 8,
  "runId": "20260606-perf-C8-W03",
  "status": "pass",
  "buyerUserId": "<id>",
  "sellerUserIds": ["<id>", "<id>", "<id>"],
  "purchaseRequestId": "<uuid>",
  "selectedOfferId": "<uuid>",
  "paymentId": "<uuid>",
  "txHash": "0x...",
  "timingsMs": {
    "login": 180,
    "createRequest": 420,
    "createOffers": 910,
    "acceptOffer": 330,
    "createPaymentIntent": 850,
    "scannerConfirm": 4200,
    "sellerDelivery": 380,
    "buyerConfirmDelivery": 410,
    "total": 12100
  },
  "notifications": [
    {
      "step": "seller_offer_created",
      "recipient": "buyer",
      "observed": true,
      "latencyMs": 640
    }
  ],
  "errors": []
}
```

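
A light structural check on each result file can catch malformed output before aggregation. The sketch below is illustrative only: `result_problems` is a hypothetical helper, the list of required keys is a judgment call, and `"fail"` as the second status value is an assumption, since the example above only shows `"pass"`.

```python
from typing import Any, Dict, List

REQUIRED_KEYS = ("workerId", "stage", "runId", "status", "timingsMs", "notifications", "errors")


def result_problems(result: Dict[str, Any]) -> List[str]:
    """Return structural problems in one worker result; an empty list means it looks sane."""
    problems = [f"missing key: {key}" for key in REQUIRED_KEYS if key not in result]
    if problems:
        return problems
    if result["status"] not in ("pass", "fail"):  # "fail" is an assumed value
        problems.append(f"unknown status: {result['status']!r}")
    for name, value in result["timingsMs"].items():
        is_number = isinstance(value, (int, float)) and not isinstance(value, bool)
        if not is_number or value < 0:
            problems.append(f"invalid timing {name}: {value!r}")
    for note in result["notifications"]:
        if not note.get("observed"):
            problems.append(
                f"notification not observed: {note.get('step')} -> {note.get('recipient')}"
            )
    if result["status"] == "pass" and result["errors"]:
        problems.append("status is pass but errors is not empty")
    return problems
```

The check deliberately does not require `total` to equal the sum of the step timings. In the example above the steps add up to 7680 ms against a total of 12100 ms, so the total evidently includes time outside the listed operations.
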
## Report Template

Create one report per full ramp:

```markdown
# Performance Profile Report - <date>

## Summary

- Target:
- Backend/frontend/scanner versions:
- Commit SHAs:
- Payment mode: live-chain / scanner fixture / API-only
- Ramp stages completed:
- Overall result:

## Key Findings

| Finding | Severity | Evidence | Next action |
|---|---|---|---|

## Stage Results

| Stage | Workers | Pass | Fail | p95 total | p95 payment intent | p95 scanner | p95 notification | 5xx rate |
|---|---:|---:|---:|---:|---:|---:|---:|---:|

## Notification Results

| Step | Expected | Observed | Missing | Wrong recipient | p95 latency |
|---|---:|---:|---:|---:|---:|

## Infrastructure

| Stage | Backend CPU/mem | Scanner CPU/mem | Mongo | Postgres | Redis | Restarts |
|---|---|---|---|---|---|---:|

## Payment Correctness

- Duplicate credits:
- Under/overpayment anomalies:
- Scanner mismatches:
- Ledger mismatches:

## Bottlenecks

- API:
- Database:
- Scanner:
- Socket/notifications:
- RPC/chain:

## Decisions

- Current safe dev concurrency:
- Recommended production target:
- Required fixes before next ramp:
```

## Initial Performance Characteristic Hypotheses

These are the expectations to validate:

- Request/offer APIs should scale mostly with Mongo/Postgres write throughput.
- Notification latency will become a visible bottleneck before raw API latency if every offer/status change creates individual Mongo inserts and socket emits.
- Scanner live-chain checks are likely bounded by BSC Testnet RPC latency and should be separated from API-only profiling.
- Payment intent creation may become slower if destination derivation, token registry lookup, and scanner registration are serial.
- Socket fanout should be watched at C16+ because each worker has multiple actors and multiple tabs/devices may multiply room membership.