Complete task 4 backend security architecture docs

Siavash Sameni
2026-05-24 11:31:40 +04:00
parent 4cf5c49274
commit 6a451040d9
18 changed files with 1006 additions and 73 deletions

@@ -0,0 +1,150 @@
---
title: Webhook Security Spec
tags: [webhooks, security, audit, payments]
created: 2026-05-24
status: advisory
reviewers: [backend, security, operations]
---
# Webhook Security Spec
This document defines signed callback handling for all payment and payout providers.
It closes the gaps in [[Security Architecture]] by turning webhook behavior into an explicit,
auditable contract.
The scope is inbound callbacks only:
- SHKeeper pay-in (`/api/payment/shkeeper/webhook`)
- SHKeeper payout (`/api/payment/shkeeper/payout/webhook`)
- Request Network (`/api/payment/request-network/webhook`)
- Manual/admin reconciliation channels (where applicable)
## 1. Canonical event envelope
All callbacks are normalized by [[Payment Provider Adapter Spec]] into:
```ts
type ProviderCallback = {
provider: "shkeeper" | "request_network" | "manual_wallet" | "admin_wallet" | string;
providerPaymentId: string;
purchaseRequestId?: string;
requestId?: string;
deliveryId?: string;
eventType: string; // e.g., paid, payout_completed, status_update
status: string; // provider-specific raw status
normalizedStatus: "pending" | "completed" | "failed" | "cancelled" | "released" | "refunded";
amount?: string;
currency?: string;
transactionHash?: string;
occurredAt?: string; // ISO 8601 if provided
receivedAt: string; // server-side receive time
rawFingerprint: string; // sha256(raw_body)
};
```
Callbacks are processed only through adapter entry points; provider-specific parsing remains private to the adapter.
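As an illustration of the `rawFingerprint` field, the sketch below hashes the exact bytes received, before any parsing. The helper name and the hex encoding are assumptions made for this example; the envelope only specifies `sha256(raw_body)`.

```ts
import { createHash } from "node:crypto";

// Hypothetical helper: SHA-256 over the untouched request bytes.
// Hex output is an assumption; the spec does not fix an encoding.
function rawFingerprint(rawBody: Buffer): string {
  return createHash("sha256").update(rawBody).digest("hex");
}

// Example: fingerprint an illustrative payload exactly as it arrived on the wire.
const exampleBody = Buffer.from('{"status":"PAID"}', "utf8");
const exampleFingerprint = rawFingerprint(exampleBody);
```

Hashing the bytes rather than a re-serialized object matters because JSON key order and whitespace can change on re-serialization, which would break both forensics and payload comparison.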
## 2. Signature verification
### 2.1 Required mechanics
- Verify signatures against raw request bytes, **before JSON parsing**.
- Compare signatures in constant time; on mismatch, stop processing immediately and respond with `401`/`403`.
- Never disable verification outside local-only test tooling.
- Store raw payload hash (`rawFingerprint`) for forensics and idempotency checks.
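The mechanics above could look roughly like the following. This is a minimal sketch that assumes the provider signs the raw body with HMAC-SHA256 and sends a hex digest; neither the algorithm nor the encoding is stated in this spec, and `verifySignature` is a hypothetical name.

```ts
import { createHmac, timingSafeEqual } from "node:crypto";

// Assumption: HMAC-SHA256 over the raw body, hex-encoded by the provider.
function verifySignature(rawBody: Buffer, signatureHex: string, secret: string): boolean {
  const expected = createHmac("sha256", secret).update(rawBody).digest();
  const received = Buffer.from(signatureHex, "hex");
  // timingSafeEqual throws when lengths differ, so reject those up front.
  if (received.length !== expected.length) return false;
  return timingSafeEqual(received, expected);
}

// Usage with illustrative values only.
const demoSecret = "local-test-secret";
const demoBody = Buffer.from('{"status":"PAID"}', "utf8");
const demoSignature = createHmac("sha256", demoSecret).update(demoBody).digest("hex");
const demoOk = verifySignature(demoBody, demoSignature, demoSecret);
```

The length check leaks only the digest length, which is public, while the byte comparison itself runs in constant time.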
### 2.2 Provider headers
| Provider | Header(s) |
|---|---|
| SHKeeper | `x-shkeeper-signature` |
| Request Network | `x-request-network-signature` |
| Test override (local only) | explicitly documented in deployment notes, never in production |
If the expected signature header is absent or malformed, treat the request as a non-retryable client error.
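A header lookup along these lines keeps the per-provider names in one place. The function name, the result shape, and the definition of "malformed" (empty or repeated header) are assumptions for this sketch; only the two header names come from the table.

```ts
// Header names from the provider table; keys are lowercase, as Node delivers them.
const SIGNATURE_HEADERS: Record<string, string> = {
  shkeeper: "x-shkeeper-signature",
  request_network: "x-request-network-signature",
};

type HeaderResult =
  | { ok: true; signature: string }
  | { ok: false; retryable: false; reason: "unsupported_provider" | "missing_header" | "malformed_header" };

// Hypothetical helper: returns the signature or a non-retryable failure.
function readSignatureHeader(
  provider: string,
  headers: Record<string, string | string[] | undefined>,
): HeaderResult {
  const name = SIGNATURE_HEADERS[provider];
  if (name === undefined) return { ok: false, retryable: false, reason: "unsupported_provider" };
  const value = headers[name];
  if (value === undefined) return { ok: false, retryable: false, reason: "missing_header" };
  // Assumption: a repeated or blank header counts as malformed.
  if (typeof value !== "string" || value.trim() === "") {
    return { ok: false, retryable: false, reason: "malformed_header" };
  }
  return { ok: true, signature: value.trim() };
}
```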
## 3. Replay prevention and idempotency
For each callback, store and enforce exactly one idempotency key:
- `(provider, deliveryId, eventType)`, or
- `(provider, providerPaymentId, normalizedStatus)` when the provider supplies no delivery ID.
Replay rules:
- First successful write path = **processed**.
- Same key seen again with no state change = **duplicate** (HTTP 200 response, no side effects).
- Same key seen with a different payload hash = **conflict** (HTTP 409, captured to DLQ).
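The key selection and replay rules can be sketched as below. The in-memory `Map` stands in for a persistent store with a unique constraint, and "no state change" is interpreted here as "same payload hash"; both are assumptions, as are the type and function names.

```ts
type KeyInput = {
  provider: string;
  eventType: string;
  deliveryId?: string;
  providerPaymentId: string;
  normalizedStatus: string;
  rawFingerprint: string;
};

// Prefer the delivery-based key; fall back to the payment/status key.
function idempotencyKey(cb: KeyInput): string {
  return cb.deliveryId !== undefined
    ? JSON.stringify(["delivery", cb.provider, cb.deliveryId, cb.eventType])
    : JSON.stringify(["payment", cb.provider, cb.providerPaymentId, cb.normalizedStatus]);
}

type ReplayOutcome = "processed" | "duplicate" | "conflict";

// Stand-in for durable storage: idempotency key -> first payload hash seen.
const seenKeys = new Map<string, string>();

function classifyReplay(cb: KeyInput): ReplayOutcome {
  const key = idempotencyKey(cb);
  const firstHash = seenKeys.get(key);
  if (firstHash === undefined) {
    seenKeys.set(key, cb.rawFingerprint);
    return "processed"; // first successful write
  }
  // Same key again: identical payload is a duplicate, anything else a conflict.
  return firstHash === cb.rawFingerprint ? "duplicate" : "conflict";
}
```

In a real deployment the lookup and insert would need to be a single atomic operation (for example, an insert guarded by a unique index), otherwise two concurrent deliveries could both be classified as `processed`.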
## 4. Unknown and duplicate behavior
| Condition | Response | Side effects |
|---|---|---|
| Signature valid, unknown `providerPaymentId` | `200` (`unknown_payment`) in v1 mode / `404` in strict mode | no state write, record DLQ entry for operator review |
| Known `providerPaymentId`, already terminal | `200` (`duplicate_terminal`) | no state write |
| Known `providerPaymentId`, stale status transition | `200` (`duplicate_or_out_of_order`) | no state write |
| Invalid or missing signature | `401` | no state write |
| Malformed payload | `400` | no state write |
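The table above maps naturally onto a pure decision function. The sketch below is one possible encoding; the `Condition` labels, the `Mode` type, and the `operatorReview` flag are names invented for this example, and the `404` branch carries no response code because the table does not name one.

```ts
type Mode = "v1" | "strict";
type Condition =
  | "unknown_payment"
  | "already_terminal"
  | "stale_transition"
  | "bad_signature"
  | "malformed_payload";

// None of these outcomes writes payment state; operatorReview marks the one
// row that the table says must also leave a DLQ entry.
type Decision = { status: number; code?: string; operatorReview: boolean };

function decideResponse(condition: Condition, mode: Mode): Decision {
  switch (condition) {
    case "unknown_payment":
      return mode === "strict"
        ? { status: 404, operatorReview: true }
        : { status: 200, code: "unknown_payment", operatorReview: true };
    case "already_terminal":
      return { status: 200, code: "duplicate_terminal", operatorReview: false };
    case "stale_transition":
      return { status: 200, code: "duplicate_or_out_of_order", operatorReview: false };
    case "bad_signature":
      return { status: 401, operatorReview: false };
    case "malformed_payload":
      return { status: 400, operatorReview: false };
  }
}
```

Keeping the mapping free of I/O makes it straightforward to unit-test each row of the table in isolation.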
## 5. Retry semantics
- Callback senders (providers) may retry on:
  - transient network failures,
  - 5xx responses or provider-internal timeouts,
  - an explicit retryable status from the endpoint.
- Retries are triggered only on non-2xx codes for SHKeeper and Request Network.
- Recommended handler mapping:
  - `400`/`401` = do not retry (hard fail),
  - `409` = do not retry until manual release,
  - `500`/`503` = retry.
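The recommended mapping can be expressed as a small classifier, for example when deciding what to record in `nextRetryAt`. The `RetryAdvice` labels are hypothetical, and the handling of status codes not listed above (other 5xx as retryable, everything else as a hard fail) is an assumption of this sketch.

```ts
type RetryAdvice = "ack" | "no_retry" | "hold_for_manual_release" | "retry";

function retryAdvice(status: number): RetryAdvice {
  if (status >= 200 && status < 300) return "ack"; // 2xx never triggers a retry
  if (status === 400 || status === 401) return "no_retry"; // hard fail
  if (status === 409) return "hold_for_manual_release";
  // 500 and 503 are listed explicitly; other 5xx codes are assumed retryable too.
  if (status >= 500 && status < 600) return "retry";
  // Assumption: any remaining code is treated as a hard fail.
  return "no_retry";
}
```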
## 6. Dead-letter and replay storage
Persist all failed callbacks for at least 7 days in append-only storage:
- `providerWebhookFailures`
- key fields: `provider`, `deliveryId`, `providerPaymentId`, `requestPath`, `requestHeaders`, `rawFingerprint`, `statusCode`, `errorCode`, `attemptCount`, `nextRetryAt`, `rawBodyRef`, `createdAt`.
- If storage is unavailable, fail closed and raise a high-severity ops alert.
Retention policy:
- 30 days for `success==true`,
- 180 days for `unknown_payment`, `repeated_conflict`, `signature_failure`,
- immediate alert if retry queue exceeds 500 entries for a provider.
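A possible shape for a `providerWebhookFailures` record, together with the retention and queue-depth rules, is sketched below. The field types, optionality, and helper names are assumptions; only the field names and the numeric limits come from this section. Outcomes not listed in the retention policy fall back to the 7-day minimum.

```ts
// Field names from the spec; types and optionality are assumed.
type ProviderWebhookFailure = {
  provider: string;
  deliveryId?: string;
  providerPaymentId?: string;
  requestPath: string;
  requestHeaders: Record<string, string>;
  rawFingerprint: string;
  statusCode: number;
  errorCode: string;
  attemptCount: number;
  nextRetryAt?: string; // ISO 8601
  rawBodyRef: string; // pointer to the stored raw body
  createdAt: string; // ISO 8601
};

const LONG_RETENTION = new Set(["unknown_payment", "repeated_conflict", "signature_failure"]);

// Retention in days for a stored callback outcome.
function retentionDays(outcome: string): number {
  if (outcome === "success") return 30;
  if (LONG_RETENTION.has(outcome)) return 180;
  return 7; // minimum stated for failed callbacks
}

// Immediate alert once a provider's retry queue exceeds 500 entries.
function retryQueueNeedsAlert(queueDepth: number): boolean {
  return queueDepth > 500;
}
```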
## 7. Alerting thresholds
- `failed_webhook_count` over 1 minute:
  - warning at `> 20`,
  - critical at `> 100`.
- signature failures:
  - warning at `> 5` in 5 minutes,
  - critical at `> 20` in 5 minutes.
- duplicate ratio:
  - warning if `duplicates / total >= 0.15` for 10 minutes.
- dead-letter growth:
  - warning at `+200` new entries/hour,
  - critical at `+500`/hour.
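These thresholds can be evaluated with a single grading helper, sketched below. All function names are hypothetical. The sketch treats the dead-letter limits as inclusive ("at +200") and the others as strict (`>`), and it does not model the 10-minute sustain window for the duplicate ratio; callers are assumed to pass values already aggregated over the stated windows.

```ts
type Severity = "ok" | "warning" | "critical";

// Compare a windowed value against warning/critical limits.
function grade(value: number, warn: number, crit: number, inclusive = false): Severity {
  const hit = (limit: number): boolean => (inclusive ? value >= limit : value > limit);
  if (hit(crit)) return "critical";
  if (hit(warn)) return "warning";
  return "ok";
}

const failedWebhookSeverity = (countPerMinute: number): Severity => grade(countPerMinute, 20, 100);
const signatureFailureSeverity = (countPer5Min: number): Severity => grade(countPer5Min, 5, 20);
const dlqGrowthSeverity = (newEntriesPerHour: number): Severity => grade(newEntriesPerHour, 200, 500, true);

// Warning only; the spec defines no critical level for the duplicate ratio.
function duplicateRatioSeverity(duplicates: number, total: number): Severity {
  if (total === 0) return "ok"; // avoid dividing by zero on idle providers
  return duplicates / total >= 0.15 ? "warning" : "ok";
}
```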
## 8. Required operator signals
Webhook health checks should expose:
- last-seen timestamp by provider,
- delivery backlog depth,
- per-status counters (`processed`, `duplicate`, `unknown`, `conflict`, `signature_failure`),
- DLQ length and oldest entry age.
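One way to carry these signals is a per-provider snapshot type, sketched here with a helper for the oldest DLQ entry age. The type and field names are illustrative assumptions; only the list of signals and the counter labels come from this section.

```ts
type StatusCounters = Record<
  "processed" | "duplicate" | "unknown" | "conflict" | "signature_failure",
  number
>;

// Hypothetical health snapshot for a single provider.
type ProviderHealth = {
  provider: string;
  lastSeenAt: string | null; // ISO 8601, null if never seen
  backlogDepth: number;
  counters: StatusCounters;
  dlqLength: number;
  dlqOldestEntryAgeSeconds: number | null;
};

// Age of the oldest DLQ entry, given ISO 8601 creation timestamps.
function oldestEntryAgeSeconds(createdAts: string[], now: Date): number | null {
  if (createdAts.length === 0) return null;
  const oldest = Math.min(...createdAts.map((t) => Date.parse(t)));
  return Math.floor((now.getTime() - oldest) / 1000);
}
```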
## 9. Testing requirements
- Signature bypass tests (any verification bypass must remain disabled in staging/prod),
- replay/delivery-id duplicate tests,
- malformed payload tests,
- unknown payment tests,
- non-terminal duplicate suppression tests.
## Related
- [[Payment Provider Adapter Spec]]
- [[Error Codes]]
- [[Backend Funds Migration and Operational Runbooks]]