nick-doc/09 - Audits/Webhook Security Spec.md
2026-05-24 11:31:40 +04:00

title: Webhook Security Spec
tags: webhooks, security, audit, payments
created: 2026-05-24
status: advisory
reviewers: backend, security, operations

Webhook Security Spec

This document defines signed callback handling for all payment and payout providers. It closes the gaps in Security Architecture by turning webhook behavior into an explicit, auditable contract.

The scope is inbound callbacks only:

  • SHKeeper pay-in (/api/payment/shkeeper/webhook)
  • SHKeeper payout (/api/payment/shkeeper/payout/webhook)
  • Request Network (/api/payment/request-network/webhook)
  • Manual/admin reconciliation channels (where applicable)

1. Canonical event envelope

All callbacks are normalized, as defined in Payment Provider Adapter Spec, into:

type ProviderCallback = {
  provider: "shkeeper" | "request_network" | "manual_wallet" | "admin_wallet" | string;
  providerPaymentId: string;
  purchaseRequestId?: string;
  requestId?: string;
  deliveryId?: string;
  eventType: string;        // e.g., paid, payout_completed, status_update
  status: string;           // provider-specific raw status
  normalizedStatus: "pending" | "completed" | "failed" | "cancelled" | "released" | "refunded";
  amount?: string;
  currency?: string;
  transactionHash?: string;
  occurredAt?: string;      // ISO 8601 if provided
  receivedAt: string;       // server-side receive time
  rawFingerprint: string;   // sha256(raw_body)
};

Callbacks are processed only through adapter entry points; provider-specific parsing remains private to the adapter.
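
As a rough illustration of an adapter entry point producing this envelope, the sketch below fingerprints the raw body and maps a raw status to a normalized one. The raw payload fields (`id`, `event`, `status`), the `STATUS_MAP` contents, and the hex encoding of the hash are assumptions made for the example; real provider parsing stays private to each adapter.

```typescript
import { createHash } from "node:crypto";

type NormalizedStatus = "pending" | "completed" | "failed" | "cancelled" | "released" | "refunded";

// Subset of the ProviderCallback envelope, enough for this sketch.
type EnvelopeSubset = {
  provider: string;
  providerPaymentId: string;
  eventType: string;
  status: string;
  normalizedStatus: NormalizedStatus;
  receivedAt: string;
  rawFingerprint: string;
};

// Hypothetical raw-status mapping; the real table belongs to each adapter.
const STATUS_MAP: Record<string, NormalizedStatus | undefined> = {
  paid: "completed",
  pending: "pending",
  failed: "failed",
};

// sha256(raw_body); hex encoding is an assumption of this sketch.
function fingerprint(rawBody: Buffer): string {
  return createHash("sha256").update(rawBody).digest("hex");
}

function normalize(provider: string, rawBody: Buffer, receivedAt: Date): EnvelopeSubset {
  // Parse only after the signature over these same bytes has been verified.
  const payload = JSON.parse(rawBody.toString("utf8")) as { id: string; event: string; status: string };
  const normalizedStatus = STATUS_MAP[payload.status];
  if (normalizedStatus === undefined) {
    throw new Error(`unmapped provider status: ${payload.status}`);
  }
  return {
    provider,
    providerPaymentId: payload.id,
    eventType: payload.event,
    status: payload.status,
    normalizedStatus,
    receivedAt: receivedAt.toISOString(),
    rawFingerprint: fingerprint(rawBody),
  };
}
```

Throwing on an unmapped status keeps unknown provider states from silently becoming a normalized status.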

2. Signature verification

2.1 Required mechanics

  • Verify signatures against raw request bytes, before JSON parsing.
  • Compare signatures in constant time; on mismatch, return 401/403 immediately without further processing.
  • Never disable verification outside local-only test tooling.
  • Store raw payload hash (rawFingerprint) for forensics and idempotency checks.
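
The mechanics above can be sketched as follows. This spec does not name the signature algorithm, so the example assumes a hex-encoded HMAC-SHA256 over the raw body with a shared secret; `verifySignature` is a hypothetical helper name. It uses Node's `crypto.timingSafeEqual` for the constant-time comparison.

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";

// Assumption: the provider sends hex(HMAC-SHA256(secret, rawBody)).
function verifySignature(rawBody: Buffer, signatureHex: string | undefined, secret: string): boolean {
  // Reject absent or malformed values before any cryptographic work.
  if (signatureHex === undefined || !/^[0-9a-f]{64}$/i.test(signatureHex)) {
    return false;
  }
  // The MAC is computed over the raw bytes, never over re-serialized JSON.
  const expected = createHmac("sha256", secret).update(rawBody).digest();
  const received = Buffer.from(signatureHex, "hex");
  // timingSafeEqual throws on unequal lengths; the regex above guarantees 32 bytes.
  return timingSafeEqual(expected, received);
}
```

A handler would call this with the untouched request bytes and return 401 on `false` before attempting `JSON.parse`.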

2.2 Provider headers

| Provider | Header(s) |
| --- | --- |
| SHKeeper | x-shkeeper-signature |
| Request Network | x-request-network-signature |
| Test override (local only) | explicitly documented in deployment notes, never in production |

If the expected signature header is absent or malformed, treat the request as a non-retryable client error.
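
A minimal sketch of the header lookup, using the two header names listed for SHKeeper and Request Network. The choice of 401 for an absent or malformed header, the `reason` strings, and the rule that an empty or repeated header counts as malformed are assumptions; this spec only requires a non-retryable client error.

```typescript
// Node's http module lowercases incoming header names, so lookups use lowercase keys.
const SIGNATURE_HEADERS = new Map<string, string>([
  ["shkeeper", "x-shkeeper-signature"],
  ["request_network", "x-request-network-signature"],
]);

type SignatureLookup =
  | { ok: true; signature: string }
  | { ok: false; httpStatus: 401; retryable: false; reason: string };

function readSignatureHeader(
  provider: string,
  headers: Record<string, string | string[] | undefined>,
): SignatureLookup {
  const reject = (reason: string): SignatureLookup => ({ ok: false, httpStatus: 401, retryable: false, reason });

  const headerName = SIGNATURE_HEADERS.get(provider);
  if (headerName === undefined) return reject("unknown_provider");

  const value = headers[headerName];
  if (value === undefined) return reject("missing_signature");
  // Some frameworks surface repeated headers as arrays; treated as malformed here.
  if (Array.isArray(value) || value.trim() === "") return reject("malformed_signature");

  return { ok: true, signature: value.trim() };
}
```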

3. Replay prevention and idempotency

For each callback, store and enforce uniqueness on one of the following keys:

  • (deliveryId, provider, eventType), or
  • (providerPaymentId, normalizedStatus, provider) when the provider supplies no delivery ID.

Replay rules:

  • The first delivery whose write path succeeds = processed.
  • Same key seen again with no state change = duplicate (HTTP 200 response, no side effects).
  • Same key seen with a different payload hash = conflict (HTTP 409, captured to the DLQ).
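
The key selection and replay rules can be sketched as below. `ReplayGuard` and its in-memory `Map` are hypothetical stand-ins for a durable store with a unique index; a real implementation would record the key only once the write path has succeeded.

```typescript
type ReplayInput = {
  provider: string;
  deliveryId?: string;
  eventType: string;
  providerPaymentId: string;
  normalizedStatus: string;
  rawFingerprint: string;
};

type ReplayOutcome =
  | { outcome: "processed" }
  | { outcome: "duplicate"; httpStatus: 200 }
  | { outcome: "conflict"; httpStatus: 409 };

// JSON-encode the tuple so separator characters inside IDs cannot cause collisions.
function idempotencyKey(cb: ReplayInput): string {
  return cb.deliveryId
    ? JSON.stringify(["delivery", cb.provider, cb.deliveryId, cb.eventType])
    : JSON.stringify(["payment", cb.provider, cb.providerPaymentId, cb.normalizedStatus]);
}

class ReplayGuard {
  // key -> rawFingerprint of the first processed delivery
  private readonly seen = new Map<string, string>();

  classify(cb: ReplayInput): ReplayOutcome {
    const key = idempotencyKey(cb);
    const firstFingerprint = this.seen.get(key);
    if (firstFingerprint === undefined) {
      this.seen.set(key, cb.rawFingerprint);
      return { outcome: "processed" };
    }
    // Same key and same bytes: harmless redelivery. Same key, different bytes: conflict.
    return firstFingerprint === cb.rawFingerprint
      ? { outcome: "duplicate", httpStatus: 200 }
      : { outcome: "conflict", httpStatus: 409 };
  }
}
```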

4. Unknown and duplicate behavior

| Condition | Response | Side effects |
| --- | --- | --- |
| Signature valid, unknown providerPaymentId | 200 (unknown_payment) in v1 mode / 404 in strict mode | no state write, record DLQ entry for operator review |
| Known providerPaymentId, already terminal | 200 (duplicate_terminal) | no state write |
| Known providerPaymentId, stale status transition | 200 (duplicate_or_out_of_order) | no state write |
| Unknown signature | 401 | no state write |
| Malformed payload | 400 | no state write |
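
One way to encode these conditions is a pure decision function, sketched below. The input shape (`CallbackContext`), the order of the checks, and the `signature_failure`, `malformed_payload`, and `processed` codes are assumptions; which statuses count as terminal or stale is left to the caller because this spec does not define the transition graph.

```typescript
type Decision = {
  httpStatus: 200 | 400 | 401 | 404;
  code: string;
  writeState: boolean;
  recordDlq: boolean;
};

type CallbackContext = {
  signatureValid: boolean;
  payloadWellFormed: boolean;
  // null = providerPaymentId not found in our records
  payment: null | { terminal: boolean; transitionIsStale: boolean };
  strictMode: boolean;
};

function decide(ctx: CallbackContext): Decision {
  const noWrite = (httpStatus: Decision["httpStatus"], code: string, recordDlq = false): Decision =>
    ({ httpStatus, code, writeState: false, recordDlq });

  // Signature is checked first because verification happens before JSON parsing.
  if (!ctx.signatureValid) return noWrite(401, "signature_failure");
  if (!ctx.payloadWellFormed) return noWrite(400, "malformed_payload");
  if (ctx.payment === null) {
    return noWrite(ctx.strictMode ? 404 : 200, "unknown_payment", true);
  }
  if (ctx.payment.terminal) return noWrite(200, "duplicate_terminal");
  if (ctx.payment.transitionIsStale) return noWrite(200, "duplicate_or_out_of_order");

  // Only this path is allowed to write state.
  return { httpStatus: 200, code: "processed", writeState: true, recordDlq: false };
}
```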

5. Retry semantics

  • Callback senders (providers) may retry on:
    • transient network failures,
    • 5xx responses or provider-internal timeouts,
    • an explicit retryable status from the endpoint.
  • For SHKeeper and Request Network, retries are triggered only by non-2xx response codes.
  • Recommended handler mapping:
    • 400/401 = do not retry (hard fail),
    • 409 = do not retry until manual release,
    • 500/503 = retry.
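
The recommended handler mapping can be written as a small lookup, sketched below. `expectedProviderAction` is a hypothetical helper; treating every 5xx as retryable and every other non-2xx code as a hard fail goes beyond the codes listed above and is an assumption.

```typescript
type ProviderAction = "ack" | "no_retry" | "hold_for_manual_release" | "retry";

// Maps the status our endpoint returns to the retry behavior we intend to signal.
function expectedProviderAction(httpStatus: number): ProviderAction {
  if (httpStatus >= 200 && httpStatus < 300) return "ack";
  if (httpStatus === 409) return "hold_for_manual_release";
  if (httpStatus >= 500) return "retry"; // 500/503 per the mapping; other 5xx by assumption
  return "no_retry"; // 400/401 per the mapping; other non-2xx by assumption
}
```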

6. Dead-letter and replay storage

Persist every failed callback for at least 7 days in append-only storage:

  • store: providerWebhookFailures,
  • key fields: provider, deliveryId, providerPaymentId, requestPath, requestHeaders, rawFingerprint, statusCode, errorCode, attemptCount, nextRetryAt, rawBodyRef, createdAt,
  • if storage is unavailable, fail closed and raise a high-severity ops alert.

Retention policy:

  • 30 days for entries with success==true,
  • 180 days for unknown_payment, repeated_conflict, and signature_failure entries.

Raise an immediate alert if the retry queue for any single provider exceeds 500 entries.
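
As a sketch of the record shape and retention rules, the types below mirror the key fields listed for providerWebhookFailures. The field types, which fields are optional, and the use of the 7-day minimum for failure codes not named in the retention policy are assumptions.

```typescript
type WebhookFailureRecord = {
  provider: string;
  deliveryId?: string;
  providerPaymentId?: string;
  requestPath: string;
  requestHeaders: Record<string, string>;
  rawFingerprint: string;
  statusCode: number;
  errorCode: string;
  attemptCount: number;
  nextRetryAt?: string; // ISO 8601; absent when no retry is scheduled
  rawBodyRef: string;   // pointer to the stored raw body, not the body itself
  createdAt: string;    // ISO 8601
};

const LONG_RETENTION_CODES = new Set(["unknown_payment", "repeated_conflict", "signature_failure"]);

function retentionDays(entry: { success: boolean; errorCode?: string }): number {
  if (entry.errorCode !== undefined && LONG_RETENTION_CODES.has(entry.errorCode)) return 180;
  if (entry.success) return 30;
  return 7; // minimum retention for any other failed callback (assumption)
}

// Earliest time a failure record may be purged, as an ISO 8601 string.
function purgeAfter(record: WebhookFailureRecord): string {
  const days = retentionDays({ success: false, errorCode: record.errorCode });
  return new Date(Date.parse(record.createdAt) + days * 86_400_000).toISOString();
}
```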

7. Alerting thresholds

  • failed_webhook_count over 1 minute:
    • warning at > 20,
    • critical at > 100.
  • signature failures:
    • warning at > 5 in 5 minutes,
    • critical at > 20 in 5 minutes.
  • duplicate ratio:
    • warning if duplicates / total >= 0.15 for 10 minutes.
  • dead-letter growth:
    • warning at +200 new entries/hour,
    • critical at +500/hour.
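
The thresholds above translate into a simple evaluator, sketched below. The `WindowCounts` shape is hypothetical and assumes the caller has already aggregated counts over the stated windows; reading "at +200" and "at +500" as inclusive bounds is also an assumption.

```typescript
type Severity = "ok" | "warning" | "critical";

type WindowCounts = {
  failedLastMinute: number;
  signatureFailuresLast5Min: number;
  duplicatesLast10Min: number;
  totalLast10Min: number;
  dlqNewLastHour: number;
};

function evaluateAlerts(w: WindowCounts): Record<string, Severity> {
  // Strictly-greater-than thresholds, as written in the spec ("> 20", "> 100").
  const above = (value: number, warn: number, crit: number): Severity =>
    value > crit ? "critical" : value > warn ? "warning" : "ok";

  // Avoid dividing by zero when no callbacks arrived in the window.
  const duplicateRatio = w.totalLast10Min === 0 ? 0 : w.duplicatesLast10Min / w.totalLast10Min;

  return {
    failedWebhooks: above(w.failedLastMinute, 20, 100),
    signatureFailures: above(w.signatureFailuresLast5Min, 5, 20),
    duplicateRatio: duplicateRatio >= 0.15 ? "warning" : "ok",
    dlqGrowth: w.dlqNewLastHour >= 500 ? "critical" : w.dlqNewLastHour >= 200 ? "warning" : "ok",
  };
}
```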

8. Required operator signals

Webhook health checks should expose:

  • last-seen timestamp by provider,
  • delivery backlog depth,
  • per-status counters (processed, duplicate, unknown, conflict, signature_failure),
  • DLQ length and oldest entry age.
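
A possible shape for the health payload is sketched below, together with a helper that derives the DLQ figures. The type and field names (`WebhookHealth`, `oldestEntryAgeSeconds`, and so on) are hypothetical; only the listed signals come from this spec.

```typescript
type StatusCounter = "processed" | "duplicate" | "unknown" | "conflict" | "signature_failure";

type WebhookHealth = {
  lastSeenAt: Record<string, string | null>; // provider -> ISO 8601, null if never seen
  backlogDepth: number;
  counters: Record<StatusCounter, number>;
  dlq: { length: number; oldestEntryAgeSeconds: number | null };
};

// Derives DLQ length and oldest entry age from the createdAt values of DLQ entries.
function summarizeDlq(createdAts: string[], now: Date): WebhookHealth["dlq"] {
  if (createdAts.length === 0) return { length: 0, oldestEntryAgeSeconds: null };
  const oldestMs = Math.min(...createdAts.map((t) => Date.parse(t)));
  return {
    length: createdAts.length,
    oldestEntryAgeSeconds: Math.floor((now.getTime() - oldestMs) / 1000),
  };
}
```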

9. Testing requirements

  • signature bypass tests (the bypass must remain disabled in staging/prod),
  • replay/delivery-id duplicate tests,
  • malformed payload tests,
  • unknown payment tests,
  • non-terminal duplicate suppression tests.
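
The first requirement could be checked with a guard like the one below, written as a plain function so it works with any test runner. The environment names and the `signatureBypassEnabled` helper are hypothetical; the point is that the bypass resolves to `false` everywhere except local tooling, even when it is explicitly requested.

```typescript
// Hypothetical environment names; only "local" may ever enable the bypass.
function signatureBypassEnabled(environment: string, bypassRequested: boolean): boolean {
  return bypassRequested && environment === "local";
}

// Test-style check: the bypass must stay off in staging and production.
function bypassViolations(environments: string[]): string[] {
  return environments.filter((env) => env !== "local" && signatureBypassEnabled(env, true));
}
```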