wz-phone/docs/PRD/README.md
2026-05-11 12:37:32 +04:00

# PRD Index — Protocol v2, Video, Abuse Mitigation
> Coordinated worklist that addresses (a) the P0/P1 findings in `docs/PROTOCOL-AUDIT.md`, (b) the video roadmap in `docs/ROAD-TO-VIDEO.md`, and (c) the relay abuse vectors in `docs/ATTACK-SURFACE-RELAY-ABUSE.md`. Each item below links to its own PRD.
## Why a combined plan
The three documents share substantial structure:
- **Wire format v2** (audit P0: W1, W4, W9, W10) is the prerequisite for video framing **and** for per-`MediaType` conformance enforcement against abuse. One change resolves three pressures.
- **TransportFeedback + BWE** (audit P1: W6, W14) is mandatory for video, materially improves audio adaptation, and gives the relay another observable for abuse detection.
- **Relay conformance enforcement** (attack surface Tiers A-G) is independently valuable for audio today, and the v2 `MediaType` bit lets it scale cleanly to video.
Sequencing matters. Landing the v2 wire format **before** any video work or deep abuse mitigation means clients go through one compatibility break instead of a separate break for each.
## PRD catalog
| # | PRD | Resolves | Status |
|---|---|---|---|
| 1 | [PRD-wire-format-v2](./PRD-wire-format-v2.md) | Audit W1, W4, W9, W10; prereq for #5/#6/#7/#8 and Tier F of #2 | proposed |
| 2 | [PRD-relay-conformance](./PRD-relay-conformance.md) | Attack-surface Tiers A-G | proposed |
| 3 | [PRD-transport-feedback-bwe](./PRD-transport-feedback-bwe.md) | Audit W6, W14 | proposed |
| 4 | [PRD-protocol-hardening](./PRD-protocol-hardening.md) | Audit W2, W3, W5, W11, W12, W13 (security + correctness batch) | proposed |
| 5 | [PRD-video-v1](./PRD-video-v1.md) | Road-to-video Phases V3 + V4 (H.264 single-layer, NACK, keyframe cache) | proposed |
| 6 | [PRD-video-multicodec](./PRD-video-multicodec.md) | H.265 + AV1 negotiation (road-to-video Phase V3 codec rollout) | proposed |
| 7 | [PRD-video-quality-priority](./PRD-video-quality-priority.md) | Road-to-video Phase V5 (VideoQualityController + PriorityMode + ScreenShare) | proposed |
| 8 | [PRD-video-simulcast](./PRD-video-simulcast.md) | Road-to-video Phases V5 + V6 (simulcast, per-receiver layer selection at SFU) | proposed |
Native capture pipelines (road-to-video Phase V7) are out of scope here — they sit downstream of #5 and are platform team work; tracked separately.
## Dependency graph
```
                     ┌───────────────────────────────┐
                     │ #1 Wire format v2 (keystone)  │
                     └────────┬──────────────────────┘
       ┌──────────────────────┼────────────────────────┐
       │                      │                        │
       ▼                      ▼                        ▼
┌───────────────┐    ┌──────────────────┐   ┌──────────────────────┐
│ #2 Conformance│    │ #3 Transport     │   │ #4 Protocol          │
│ Tier A-G      │    │ Feedback + BWE   │   │ Hardening            │
└──────┬────────┘    └────────┬─────────┘   └──────────────────────┘
       │ Tier A-D first       │
       │ Tier F needs traffic │
       │ baseline             │
       │                      │
       │              ┌───────▼────────┐
       │              │ #5 Video v1    │
       │              │ (H.264 + NACK) │
       │              └───────┬────────┘
       │                      │
       │       ┌──────────────┼──────────────┐
       │       │              │              │
       │       ▼              ▼              ▼
       │  ┌────────┐  ┌──────────────┐  ┌──────────────┐
       │  │ #6     │  │ #7 Video     │  │ #8 Simulcast │
       │  │ Multi- │  │ Quality +    │  │              │
       │  │ codec  │  │ Priority     │  │              │
       │  └────────┘  └──────────────┘  └──────────────┘
       └──> #2 Tier F (video) — needs #5 in production traffic to baseline
```
## Combined task list
Ordered by dependency and risk. Each task references its PRD.
### Wave 1 — Foundation (week 1)
| Task | PRD | Effort | Output |
|---|---|---|---|
| T1.1 Land 16 B MediaHeader v2 + 5 B MiniHeader v2 in `wzp-proto` | #1 | 1 d | New types behind feature flag; old paths still work |
| T1.2 Update `wzp-codec` + `wzp-client` + `wzp-relay` to emit v2 | #1 | 1 d | All audio tests pass under v2 |
| T1.3 Protocol version negotiation in `CallOffer`/`CallAnswer` (typed `Hangup::ProtocolVersionMismatch`) | #1 + #4 (W12) | 0.5 d | v1 clients rejected with a clear reason |
| T1.4 `QualityReport` trailer moved inside AEAD payload (or AAD-bound) | #4 (W5) | 0.5 d | Security fix, audit log |
| T1.5 Anti-replay window made per-stream and per-MediaType configurable | #4 (W11) | 0.5 d | Audio=64, video=1024 ready |
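
As an illustration of T1.5, here is a minimal sketch of an anti-replay window whose size depends on the media type. The window sizes (64 and 1024) come from the table above. Everything else is an assumption for illustration: the `MediaType` variants, the `ReplayWindow` type, and the premise that sequence numbers have already been extended to a non-wrapping `u64`. It is not the `wzp-proto` implementation.

```rust
/// Hypothetical media kinds; the real `MediaType` in `wzp-proto` may differ.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum MediaType {
    Audio,
    Video,
}

impl MediaType {
    /// Window sizes from the T1.5 output column (audio = 64, video = 1024).
    pub fn replay_window(self) -> u64 {
        match self {
            MediaType::Audio => 64,
            MediaType::Video => 1024,
        }
    }
}

/// Sliding-window replay filter for a single stream.
pub struct ReplayWindow {
    size: u64,
    highest: Option<u64>, // highest sequence number accepted so far
    bits: Vec<u64>,       // ring bitmap: bit (seq % size) is set once seq is seen
}

impl ReplayWindow {
    pub fn new(media: MediaType) -> Self {
        let size = media.replay_window();
        ReplayWindow {
            size,
            highest: None,
            bits: vec![0; ((size + 63) / 64) as usize],
        }
    }

    /// Maps a sequence number to (word index, bit mask) in the ring bitmap.
    fn slot(&self, seq: u64) -> (usize, u64) {
        let pos = seq % self.size;
        ((pos / 64) as usize, 1u64 << (pos % 64))
    }

    /// Returns true and records `seq` if it is fresh; false if replayed or too old.
    pub fn check_and_update(&mut self, seq: u64) -> bool {
        let highest = self.highest;
        match highest {
            None => {}
            Some(h) if seq > h => {
                // The window slides forward: clear the slots that now belong
                // to the sequence numbers in (h, seq].
                let advance = (seq - h).min(self.size);
                for s in (seq - advance + 1)..=seq {
                    let (word, mask) = self.slot(s);
                    self.bits[word] &= !mask;
                }
            }
            Some(h) => {
                if h - seq >= self.size {
                    return false; // older than the window
                }
                let (word, mask) = self.slot(seq);
                if self.bits[word] & mask != 0 {
                    return false; // already seen
                }
            }
        }
        let (word, mask) = self.slot(seq);
        self.bits[word] |= mask;
        if highest.map_or(true, |h| seq > h) {
            self.highest = Some(seq);
        }
        true
    }
}
```

Keeping one `ReplayWindow` per stream means a burst of reordered video packets cannot push audio sequence numbers out of their much smaller window.
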
### Wave 2 — Feedback + abuse mitigation (week 2)
| Task | PRD | Effort | Output |
|---|---|---|---|
| T2.1 `SignalMessage::TransportFeedback` variant | #3 | 1 d | Wire path; not yet consumed |
| T2.2 `BandwidthEstimator` in `wzp-proto` (cwnd + remb fusion) | #3 | 2 d | Prometheus output |
| T2.3 `AdaptiveQualityController` consumes BWE | #3 | 1 d | Audio upgrade decisions use bandwidth, not just loss |
| T2.4 `wzp-relay/src/conformance.rs` — Tier A (bitrate ceilings per CodecID) | #2 | 1 d | Bulk-tunnel abuse killed |
| T2.5 Tier B (packet-rate cap) + Tier C (timestamp consistency) | #2 | 1 d | Loud abuse caught |
| T2.6 Prometheus: `relay_conformance_*` counters + observable histograms | #2 | 0.5 d | Baseline data collection starts |
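
The Tier A check in T2.4 can be pictured as a per-stream byte counter compared against a ceiling. The sketch below is a rough illustration under stated assumptions: a fixed one-second window, a ceiling of 1.5 × the codec's nominal bitrate (the multiplier used in the risk register), and a counter-only mode that records violations without dropping. `BitrateCeiling`, `Mode`, and `Verdict` are hypothetical names, not the contents of `conformance.rs`.

```rust
use std::time::{Duration, Instant};

/// Counter-only mode records violations but still forwards the packet.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Mode {
    CountOnly,
    Enforce,
}

#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Verdict {
    Forward,
    Drop,
}

/// Per-stream bitrate ceiling over a fixed one-second window (an assumption).
pub struct BitrateCeiling {
    ceiling_bytes: u64, // bytes allowed per window
    window: Duration,
    window_start: Instant,
    bytes_in_window: u64,
    mode: Mode,
    pub violations: u64, // would feed a `relay_conformance_*` counter
}

impl BitrateCeiling {
    /// `nominal_bps` is the highest legitimate bitrate for the stream's codec.
    pub fn new(nominal_bps: u64, mode: Mode, now: Instant) -> Self {
        BitrateCeiling {
            ceiling_bytes: nominal_bps * 3 / 2 / 8, // x1.5 headroom, bits to bytes
            window: Duration::from_secs(1),
            window_start: now,
            bytes_in_window: 0,
            mode,
            violations: 0,
        }
    }

    /// Accounts for one packet and decides whether the relay forwards it.
    pub fn on_packet(&mut self, len: usize, now: Instant) -> Verdict {
        if now.duration_since(self.window_start) >= self.window {
            self.window_start = now;
            self.bytes_in_window = 0;
        }
        self.bytes_in_window += len as u64;
        if self.bytes_in_window <= self.ceiling_bytes {
            return Verdict::Forward;
        }
        self.violations += 1;
        match self.mode {
            Mode::CountOnly => Verdict::Forward,
            Mode::Enforce => Verdict::Drop,
        }
    }
}
```

Passing `now` in explicitly keeps the check deterministic in tests; a relay would typically supply `Instant::now()` on the packet path.
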
### Wave 3 — Protocol hardening (week 3)
| Task | PRD | Effort | Output |
|---|---|---|---|
| T3.1 `fec_block_id` widened to u16 in v2 | #4 (W2) | 0.5 d | No FEC collisions on slow joiners |
| T3.2 Document `timestamp_ms` rebase behavior at rekey | #4 (W3) | 0.5 d | Spec clarity |
| T3.3 `SignalMessage` variants prefixed with `version: u8` | #4 (W12) | 0.5 d | Future-proof signaling |
| T3.4 `RoomManager` migrated to `DashMap<RoomId, Arc<RwLock<Room>>>` | #4 (W13) | 2 d | No per-packet global lock |
| T3.5 Tier E (per-fingerprint / per-IP token bucket) wired to featherChat auth | #2 | 1.5 d | Aggregate quota enforced |
| T3.6 Tier D (per-codec packet-size sanity) | #2 | 0.5 d | Sneaky-payload class caught |
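
For T3.5, a token bucket keyed by fingerprint or IP is one common way to enforce an aggregate quota. The following is a sketch only: `TokenBucket`, `QuotaTable`, the string keys, and the capacity and refill numbers used with it are all hypothetical, and the featherChat auth integration is not shown.

```rust
use std::collections::HashMap;
use std::time::Instant;

/// Classic token bucket: refills continuously, spends one token per byte.
pub struct TokenBucket {
    capacity: f64,
    refill_per_sec: f64,
    tokens: f64,
    last: Instant,
}

impl TokenBucket {
    pub fn new(capacity: f64, refill_per_sec: f64, now: Instant) -> Self {
        TokenBucket { capacity, refill_per_sec, tokens: capacity, last: now }
    }

    /// Refills for the elapsed time, then tries to spend `cost` tokens.
    pub fn try_take(&mut self, cost: f64, now: Instant) -> bool {
        let elapsed = now.saturating_duration_since(self.last).as_secs_f64();
        self.tokens = (self.tokens + elapsed * self.refill_per_sec).min(self.capacity);
        self.last = now;
        if self.tokens >= cost {
            self.tokens -= cost;
            true
        } else {
            false
        }
    }
}

/// One bucket per identity key (fingerprint or IP, rendered as a string here).
pub struct QuotaTable {
    capacity: f64,
    refill_per_sec: f64,
    buckets: HashMap<String, TokenBucket>,
}

impl QuotaTable {
    pub fn new(capacity: f64, refill_per_sec: f64) -> Self {
        QuotaTable { capacity, refill_per_sec, buckets: HashMap::new() }
    }

    /// Returns true if `bytes` fits within the quota for `key`.
    pub fn admit(&mut self, key: &str, bytes: usize, now: Instant) -> bool {
        let (cap, refill) = (self.capacity, self.refill_per_sec);
        self.buckets
            .entry(key.to_string())
            .or_insert_with(|| TokenBucket::new(cap, refill, now))
            .try_take(bytes as f64, now)
    }
}
```

Because the bucket is keyed by identity, not by stream, the quota holds even when one client opens many streams, which is the aggregate case Tier E targets.
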
### Wave 4 — Video v1 (weeks 4-6)
| Task | PRD | Effort | Output |
|---|---|---|---|
| T4.1 `wzp-video` crate scaffold; H.264 framer + depacketizer | #5 | 4 d | NAL fragmentation, access-unit reassembly |
| T4.2 VideoToolbox encoder + decoder (macOS) | #5 | 3 d | Unidirectional video macOS↔macOS |
| T4.3 MediaCodec encoder + decoder (Android, via JNI) | #5 | 5 d | Android video path |
| T4.4 NACK loop (`SignalMessage::Nack`) + RTT-gated policy | #5 | 2 d | P-frame loss recovery |
| T4.5 Dynamic FEC ratio on I-frames (encoder hint to FEC layer) | #5 | 1 d | I-frame survivability without round trip |
| T4.6 SFU keyframe cache per (room, sender, stream) | #5 | 2 d | < 200 ms join-to-first-frame |
| T4.7 PLI suppression at SFU | #5 | 1 d | Bounded upstream PLI rate |
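
The keyframe cache in T4.6 is keyed by (room, sender, stream). A minimal sketch of that data structure, assuming hypothetical integer identifier types and a `Keyframe` struct invented for illustration, might look like this:

```rust
use std::collections::HashMap;

// Hypothetical identifier types; the real ones live in the relay crates.
pub type RoomId = u64;
pub type SenderId = u64;
pub type StreamId = u8;

#[derive(Clone, Debug, PartialEq, Eq)]
pub struct Keyframe {
    pub timestamp_ms: u32,
    pub payload: Vec<u8>,
}

/// Holds the most recent keyframe per (room, sender, stream).
#[derive(Default)]
pub struct KeyframeCache {
    latest: HashMap<(RoomId, SenderId, StreamId), Keyframe>,
}

impl KeyframeCache {
    /// Called for every forwarded frame; only keyframes replace the cached entry.
    pub fn on_frame(
        &mut self,
        key: (RoomId, SenderId, StreamId),
        is_keyframe: bool,
        timestamp_ms: u32,
        payload: &[u8],
    ) {
        if is_keyframe {
            self.latest.insert(key, Keyframe { timestamp_ms, payload: payload.to_vec() });
        }
    }

    /// Keyframes a new joiner in `room` can decode immediately, one per cached stream.
    pub fn for_joiner(&self, room: RoomId) -> Vec<&Keyframe> {
        self.latest
            .iter()
            .filter(|(k, _)| k.0 == room)
            .map(|(_, v)| v)
            .collect()
    }

    /// Drops all cached keyframes of a sender that left the room.
    pub fn evict_sender(&mut self, room: RoomId, sender: SenderId) {
        self.latest.retain(|k, _| !(k.0 == room && k.1 == sender));
    }
}
```

Serving a joiner from the cache avoids waiting for the next natural keyframe or a PLI round trip to the sender, which is what makes a join-to-first-frame target in the low hundreds of milliseconds plausible.
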
### Wave 5 — Quality, codecs, simulcast (weeks 7-9)
| Task | PRD | Effort | Output |
|---|---|---|---|
| T5.1 `PriorityMode` enum on `QualityProfile` + `SignalMessage::SetPriorityMode` | #7 | 1 d | Wire path |
| T5.2 `VideoQualityController` with per-mode allocation gates | #7 | 3 d | AudioFirst / VideoFirst / Balanced live |
| T5.3 ScreenShare mode: slide-fallback encoder policy | #7 | 2 d | Presentation use case viable |
| T5.4 H.265 encoder/decoder (reuse framer) | #6 | 3 d | Codec negotiation cascade live |
| T5.5 Simulcast: encoder emits 3 layers; `stream_id` carries layer | #8 | 4 d | Layer-tagged uplink |
| T5.6 Per-receiver layer selection at SFU | #8 | 3 d | Mixed-quality rooms work |
| T5.7 Tier F (entropy scorer) — audio variant first, baselined from Wave 2/3 data | #2 | 3 d | Covert-tunnel pressure |
| T5.8 Tier G (response policy + audit log) | #2 | 1 d | Operational |
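
Per-receiver layer selection (T5.6) can be sketched as choosing the highest simulcast layer that fits a receiver's bandwidth estimate once audio has been reserved. The `Layer` struct, the `select_layer` function, and all bitrate figures in the usage below are assumptions for illustration; the actual policy is defined in PRD #8.

```rust
/// One simulcast layer as advertised by the sender (hypothetical shape).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct Layer {
    pub stream_id: u8, // per T5.5, `stream_id` carries the layer
    pub bitrate_bps: u64,
}

/// Picks the highest-bitrate layer that fits the receiver's estimate after
/// reserving `audio_reserve_bps`. Returns None when no layer fits, in which
/// case the SFU would forward audio only.
pub fn select_layer(
    layers: &[Layer],
    receiver_bwe_bps: u64,
    audio_reserve_bps: u64,
) -> Option<Layer> {
    let budget = receiver_bwe_bps.saturating_sub(audio_reserve_bps);
    layers
        .iter()
        .copied()
        .filter(|l| l.bitrate_bps <= budget)
        .max_by_key(|l| l.bitrate_bps)
}
```

With three layers at 150 kbps, 500 kbps, and 1.5 Mbps and a 64 kbps audio reserve, a receiver estimated at 700 kbps would get the middle layer, while one estimated at 100 kbps would get no video at all.
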
### Wave 6 — AV1 + Tier F video (weeks 10+)
| Task | PRD | Effort | Output |
|---|---|---|---|
| T6.1 AV1 encoder/decoder with HW detection (SVT-AV1 fallback) | #6 | 5 d | Top-tier efficiency on capable HW |
| T6.2 Tier F video scorer (keyframe periodicity, I/P frame-size ratio, BWE responsiveness) | #2 | 3 d | Video abuse detection |
| T6.3 Federated reputation gossip (optional) | #2 | 4 d | Cross-relay abuse mitigation |
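
Two of the T6.2 features, keyframe periodicity and the I/P frame-size ratio, can be computed from per-frame metadata alone. The sketch below only derives the raw features; the `FrameStat` struct and function names are hypothetical, and no thresholds or verdict logic are implied.

```rust
/// Per-frame metadata the relay could observe (hypothetical shape).
#[derive(Clone, Copy, Debug)]
pub struct FrameStat {
    pub is_keyframe: bool,
    pub size_bytes: u32,
    pub timestamp_ms: u64,
}

/// Arithmetic mean, or None for an empty slice.
fn mean(values: &[f64]) -> Option<f64> {
    if values.is_empty() {
        None
    } else {
        Some(values.iter().sum::<f64>() / values.len() as f64)
    }
}

/// Mean I-frame size divided by mean P-frame size.
/// None if either class is missing from the window or P-frames are empty.
pub fn ip_size_ratio(frames: &[FrameStat]) -> Option<f64> {
    let sizes = |want_key: bool| -> Vec<f64> {
        frames
            .iter()
            .filter(|f| f.is_keyframe == want_key)
            .map(|f| f.size_bytes as f64)
            .collect()
    };
    let mean_i = mean(&sizes(true))?;
    let mean_p = mean(&sizes(false))?;
    if mean_p == 0.0 {
        None
    } else {
        Some(mean_i / mean_p)
    }
}

/// Mean gap between consecutive keyframes, in milliseconds.
/// None if the window holds fewer than two keyframes.
pub fn mean_keyframe_interval_ms(frames: &[FrameStat]) -> Option<f64> {
    let stamps: Vec<u64> = frames
        .iter()
        .filter(|f| f.is_keyframe)
        .map(|f| f.timestamp_ms)
        .collect();
    let gaps: Vec<f64> = stamps
        .windows(2)
        .map(|w| w[1].saturating_sub(w[0]) as f64)
        .collect();
    mean(&gaps)
}
```

Both functions return `None` on sparse windows, which lets a scorer withhold judgment until enough traffic has been observed.
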
## Risk register
| Risk | Likelihood | Impact | Mitigation |
|---|---|---|---|
| v2 wire format break strands old clients | High | High | Typed `Hangup::ProtocolVersionMismatch`, clear UI, force update prompt |
| BWE oscillation regresses audio adaptation | Med | Med | Behind feature flag; A/B with shadow Prometheus before flipping default |
| Conformance Tier A false positives | Low | High | Math-derived ceilings × 1.5; counter-only mode for 1 week before enforcement |
| `DashMap` migration regresses room semantics | Med | Med | Integration tests for federation + trunking before merging |
| Android MediaCodec edge cases (Nothing A059 baseline) | High | Med | Per-device test matrix; software fallback path |
| AV1 software encode torches battery | High | Low | HW probe at session start; refuse AV1 if no HW encode |
| Tier F false positives on edge cases (e.g., long silences in lectures) | Med | High | Verdict-only mode + 30 s window minimum + Suspect tier escalation |
## Open product questions (not blocking)
- Anonymous vs. authenticated quota split — numbers TBD pending Prometheus baseline.
- Whether to expose `PriorityMode` UI for end users or only via product preset (call vs. screen-share).
- AV1 rollout gate: what share of sessions (5 %? 20 %?) must report HW support before AV1 is enabled by default?
- Federated reputation gossip is powerful but introduces a poisoning surface; decision deferred to after Wave 5.