nick-doc/09 - Audits/Task 5.9 QA Rollout Analytics and Launch Runbooks.md

title: Task 5.9 QA, Rollout, Analytics, and Launch Runbooks
tags: taskmaster, telegram, qa, rollout, analytics, runbook
created: 2026-05-24
status: draft

Task 5.9 QA, Rollout, Analytics, and Launch Runbooks

Source: /.taskmaster/docs/prd-telegram-native-app-bot-wallet.md

1) QA scope for launch readiness

1.1 Client matrix (required)

  • Telegram iOS
  • Telegram Android
  • Telegram Desktop
  • Telegram Web
  • Light / dark themes
  • Compact / fullscreen modes
  • Normal and slow network
  • Blocked bot scenario
  • Expired / stale session scenario
  • Payment cancellation and abort
  • Unlinked user and re-link path

1.2 Functional QA checklist

  1. Identity and linking
  2. Request listing/detail in both bot and Mini App
  3. Offer review flow
  4. Payment initiation and cancel path
  5. Delivery evidence upload
  6. Dispute open/respond and status progression
  7. Notification quiet/error state
  8. Error and blocked-bot behavior
  9. Support escalation handoff

1.3 Security/abuse QA

  • forged/invalid initData rejection
  • callback replayed twice: the first delivery succeeds, the second is a no-op
  • deep-link tampering
  • wallet proof mismatch
  • callback processing under invalid provider secrets
  • admin override behavior and audit event capture
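The forged/invalid initData case can be exercised against a validator like the sketch below. It follows the hash check Telegram documents for Mini Apps (HMAC-SHA256 over the sorted fields, keyed with a secret derived from the bot token and the constant "WebAppData"). The function names, the `max_age_seconds` default, and the `sign_init_data` fixture helper are assumptions for illustration, not names taken from the codebase.

```python
import hashlib
import hmac
import time
from typing import Dict, Optional
from urllib.parse import parse_qsl, urlencode


def _expected_hash(fields: Dict[str, str], bot_token: str) -> str:
    # Data-check string: every field except "hash", sorted by key, joined by newlines.
    data_check_string = "\n".join(f"{k}={v}" for k, v in sorted(fields.items()))
    # The secret key is HMAC-SHA256 of the bot token, keyed with the constant "WebAppData".
    secret_key = hmac.new(b"WebAppData", bot_token.encode(), hashlib.sha256).digest()
    return hmac.new(secret_key, data_check_string.encode(), hashlib.sha256).hexdigest()


def verify_init_data(
    init_data: str,
    bot_token: str,
    max_age_seconds: int = 3600,  # assumed freshness window, not from the PRD
    now: Optional[float] = None,
) -> bool:
    """Return True only for correctly signed, non-stale initData."""
    fields = dict(parse_qsl(init_data, keep_blank_values=True))
    received = fields.pop("hash", "")
    if not received:
        return False
    expected = _expected_hash(fields, bot_token)
    # Constant-time comparison to avoid leaking how many characters matched.
    if not hmac.compare_digest(expected.encode(), received.encode()):
        return False
    try:
        auth_date = int(fields.get("auth_date", ""))
    except ValueError:
        return False
    current = time.time() if now is None else now
    return current - auth_date <= max_age_seconds


def sign_init_data(fields: Dict[str, str], bot_token: str) -> str:
    """Hypothetical QA fixture helper: build a validly signed initData string."""
    return urlencode({**fields, "hash": _expected_hash(fields, bot_token)})
```

A QA fixture can sign a payload with `sign_init_data`, then confirm that changing any field, using a different bot token, or letting `auth_date` age past the window makes `verify_init_data` return False.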

2) Environments and rollout

2.1 Environment separation

  • telegram-dev-bot and telegram-prod-bot tokens and webhook endpoints must be distinct.
  • No shared webhook secret between environments.
  • QA and production payment fixtures remain isolated.
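The separation rules above could be enforced with a start-up or CI check along these lines. This is a minimal sketch: the `BotEnvConfig` shape and its field names are hypothetical, and real values would come from each environment's secret store.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class BotEnvConfig:
    # Hypothetical config shape; field names are illustrative only.
    name: str
    bot_token: str
    webhook_url: str
    webhook_secret: str


def shared_settings(a: BotEnvConfig, b: BotEnvConfig) -> List[str]:
    """Return the names of settings that two environments wrongly share."""
    shared = []
    for field_name in ("bot_token", "webhook_url", "webhook_secret"):
        if getattr(a, field_name) == getattr(b, field_name):
            shared.append(field_name)
    return shared
```

An empty list means the two environments are distinct on all three settings; any returned name is a separation violation to fix before rollout.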

2.2 Feature flag sequence

  1. Development: flag off, no surface exposed
  2. Internal allowlist: selected users only (buyer/seller/admin)
  3. Beta cohort: controlled percentage and fixed org list
  4. Production enablement: after runbook and KPI thresholds pass
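The four stages can be read as a single gate function. The sketch below is one possible shape, not the project's implementation: `Stage`, `RolloutConfig`, and `surface_enabled` are hypothetical names, and it assumes that beta membership is granted by either the fixed org list or the percentage bucket.

```python
import hashlib
from dataclasses import dataclass
from enum import Enum
from typing import FrozenSet, Optional


class Stage(Enum):
    OFF = "off"                # 1. development, flag off
    INTERNAL = "internal"      # 2. internal allowlist
    BETA = "beta"              # 3. beta cohort
    PRODUCTION = "production"  # 4. production enablement


@dataclass(frozen=True)
class RolloutConfig:
    stage: Stage
    internal_user_ids: FrozenSet[str] = frozenset()
    beta_org_ids: FrozenSet[str] = frozenset()
    beta_percent: int = 0  # 0..100


def _bucket(user_id: str) -> int:
    # Stable 0..99 bucket so a user stays in or out of the cohort across requests.
    digest = hashlib.sha256(user_id.encode()).digest()
    return int.from_bytes(digest[:4], "big") % 100


def surface_enabled(cfg: RolloutConfig, user_id: str, org_id: Optional[str] = None) -> bool:
    if cfg.stage is Stage.OFF:
        return False
    if cfg.stage is Stage.PRODUCTION:
        return True
    if user_id in cfg.internal_user_ids:
        return True  # allowlisted users keep access in both INTERNAL and BETA
    if cfg.stage is Stage.INTERNAL:
        return False
    # BETA (assumption): fixed org list OR percentage bucket grants access.
    if org_id is not None and org_id in cfg.beta_org_ids:
        return True
    return _bucket(user_id) < cfg.beta_percent
```

Hashing the user ID rather than sampling randomly keeps cohort membership stable, which makes the KPI comparison between stages easier to interpret.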

2.3 Deployment safety

  • If the new surface increases payment mismatches or callback failures, immediately turn off TELEGRAM_SURFACE_ENABLED and keep payment providers in read-only mode.
  • Use existing rollback flow from incident operations and deployment runbooks.

3) Analytics and launch KPIs

Track these metrics daily for 14 days after stage advancement:

  • activation rate (activatedTelegramUsers / startedTelegramUsers)
  • link completion rate (linkedUsers / startedLink)
  • request creation from Telegram (telegramRequestsCreated)
  • offer response completion (offerResponses / offersOpened)
  • payments started / completed / failed (telegramPaymentStart, telegramPaymentComplete, telegramPaymentFail)
  • dispute activity (disputesOpened, disputesResolvedInTelegram)
  • release approvals from Telegram context (telegramReleaseApprovals)
  • notification opt-outs (notificationsOptOutRate)
  • callback duplicate ratio (callbackReplay / callbackTotal)
  • context resume latency (min and p95)
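The ratio metrics above can be derived from daily counters as in the sketch below. The counter names are the ones listed in this section; the output keys, the `paymentCompletionRate` definition (completed over started), and the nearest-rank p95 method are assumptions for illustration.

```python
from typing import Dict, List, Optional


def safe_ratio(numerator: int, denominator: int) -> Optional[float]:
    # None (rather than 0.0) marks "no data yet", so dashboards do not show a false zero.
    return None if denominator == 0 else numerator / denominator


def daily_kpis(counters: Dict[str, int]) -> Dict[str, Optional[float]]:
    def get(name: str) -> int:
        return counters.get(name, 0)

    return {
        "activationRate": safe_ratio(get("activatedTelegramUsers"), get("startedTelegramUsers")),
        "linkCompletionRate": safe_ratio(get("linkedUsers"), get("startedLink")),
        "offerResponseCompletion": safe_ratio(get("offerResponses"), get("offersOpened")),
        # Assumed definition: completed payments over started payments.
        "paymentCompletionRate": safe_ratio(get("telegramPaymentComplete"), get("telegramPaymentStart")),
        "callbackDuplicateRatio": safe_ratio(get("callbackReplay"), get("callbackTotal")),
    }


def p95(samples_ms: List[float]) -> Optional[float]:
    """Nearest-rank 95th percentile; returns None for an empty sample."""
    if not samples_ms:
        return None
    ordered = sorted(samples_ms)
    rank = (95 * len(ordered) + 99) // 100  # integer ceil(0.95 * n)
    return ordered[rank - 1]
```

Running `daily_kpis` once per day over the 14-day window gives a series that can be compared against the previous stage.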

Reporting destinations

  • Sentry for exception and failure spikes
  • application logs for workflow events
  • existing monitoring dashboards for rate/latency anomalies

4) Launch runbooks

All runbooks are mandatory for Stage-1 rollout and post-launch incidents.

4.1 Bot outage

  1. Validate webhook endpoint response health.
  2. Switch status to notification-only mode where possible.
  3. Confirm bot token and webhook URL.
  4. Re-route urgent flows to web fallback.
  5. Restore the Telegram webhook and replay the backlog after recovery.

4.2 Telegram API outage

  1. Confirm external Telegram API status.
  2. Temporarily disable deep-link / in-app actions that require Telegram callbacks.
  3. Notify users of delayed updates.
  4. Keep pending payment states in read-only mode until callback channel is restored.

4.3 Payment provider outage

  1. Identify affected provider via provider mode and provider health flags.
  2. Switch to read-only or alternative provider mode where configured.
  3. Run reconciliation before re-enabling full writes.
  4. Track the age of stale pending payments and trigger the support contact workflow.

4.4 Stuck payment

  1. Check payment reconciliation queue and provider status.
  2. Verify callback proof and on-chain confirmation.
  3. Manually reconcile if allowed by protocol and policy.
  4. Escalate if the payment has been stale for more than 24h in the funded or processing state.

4.5 Duplicate callback

  1. Validate idempotency path executed correctly.
  2. Confirm callback dedupe key retention window.
  3. Compare event fingerprint for payload divergence.
  4. Mark one path as duplicate no-op and keep audit trail.
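The checks above map onto a dedupe store that remembers each callback key with a payload fingerprint for a retention window. The sketch below is an in-memory illustration only: `CallbackDeduper`, the outcome strings, and the SHA-256 over canonical JSON fingerprint are assumptions, and a production version would presumably live in a shared datastore.

```python
import hashlib
import json
import time
from typing import Any, Dict, List, Optional, Tuple


def fingerprint(payload: Dict[str, Any]) -> str:
    # Canonical JSON so key order does not change the fingerprint.
    canonical = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode()).hexdigest()


class CallbackDeduper:
    def __init__(self, retention_seconds: float) -> None:
        self.retention_seconds = retention_seconds
        self._seen: Dict[str, Tuple[float, str]] = {}  # key -> (first_seen, fingerprint)
        self.audit_log: List[Tuple[str, str]] = []     # (key, outcome) for every delivery

    def handle(self, dedupe_key: str, payload: Dict[str, Any], now: Optional[float] = None) -> str:
        current = time.time() if now is None else now
        # Drop entries older than the retention window.
        self._seen = {
            key: entry for key, entry in self._seen.items()
            if current - entry[0] < self.retention_seconds
        }
        fp = fingerprint(payload)
        previous = self._seen.get(dedupe_key)
        if previous is None:
            self._seen[dedupe_key] = (current, fp)
            outcome = "processed"
        elif previous[1] == fp:
            outcome = "duplicate_noop"       # same payload replayed: safe no-op
        else:
            outcome = "duplicate_divergent"  # same key, different payload: investigate
        self.audit_log.append((dedupe_key, outcome))
        return outcome
```

A replay that arrives after the retention window is processed again in this sketch, which is why step 2 asks to confirm that the window is longer than the provider's maximum retry horizon.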

4.6 Suspicious wallet proof

  1. Block automated release/refund for the request.
  2. Flag payment and mark for manual ops review.
  3. Verify recipient, amount, and tx hash against chain/provider data.
  4. Resume only after explicit approval.

4.7 Compromised bot token

  1. Rotate bot token immediately.
  2. Disable bot endpoints and clear webhook secret for 1 hour.
  3. Validate callback signatures with new secret.
  4. Resume in staged rollout mode with monitoring for 24h.
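One possible reading of step 3: if the webhook is registered with the Bot API `secret_token` parameter, Telegram sends that value in the `X-Telegram-Bot-Api-Secret-Token` header on every webhook request, so after rotation the handler should accept only the new value. The sketch below assumes that setup; the function name and the plain-dict header representation are hypothetical.

```python
import hmac
from typing import Mapping

SECRET_HEADER = "X-Telegram-Bot-Api-Secret-Token"


def is_authentic_webhook(headers: Mapping[str, str], expected_secret: str) -> bool:
    """Accept a webhook request only if it carries the currently configured secret."""
    # HTTP header names are case-insensitive, so match on the lowercased name.
    received = next(
        (value for name, value in headers.items() if name.lower() == SECRET_HEADER.lower()),
        "",
    )
    if not received or not expected_secret:
        return False
    # Constant-time comparison avoids leaking how much of the secret matched.
    return hmac.compare_digest(received.encode(), expected_secret.encode())
```

During the 24h monitoring period, a rising count of rejected requests that still carry the old secret would indicate that something other than Telegram is calling the endpoint.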

5) Stage exit criteria

  • All required QA scenarios pass on iOS/Android/desktop/web.
  • No critical webhook/payload mismatch regressions in 24h observation window.
  • No unresolved stuck payments older than 24h after manual triage.
  • Incident owners can execute all seven runbooks.
  • Rollout metrics show non-degrading trend for the first two days.

6) Known rollout gaps

  1. Fine-grained feature toggles for Telegram in existing observability dashboards are pending.
  2. Admin analytics for Telegram-originated releases are schema-dependent and need implementation wiring.
  3. Deep-link recovery behavior after prolonged Telegram link expiry still needs UX polishing.