---
tags: [prd, wzp]
type: prd
---

# PRD: Video v1 — H.264 Single-Layer

> **Status:** proposed
> **Resolves:** Road-to-video Phases V3 + V4 (encoder/decoder, framer, NACK, keyframe cache).
> **Depends on:** PRD #1 (wire format v2), PRD #3 (TransportFeedback + BWE).

## Problem

WZP has no video path. Add a working unidirectional video call (macOS↔macOS first, then Android↔macOS) using H.264 baseline, with loss recovery appropriate for lossy mobile links.

## Goals

- New `wzp-video` crate parallel to `wzp-codec`.
- H.264 baseline encode/decode using platform hardware encoders.
- NAL fragmentation and access-unit reassembly conformant to our 16 B `MediaHeader` v2.
- NACK loop for P-frame loss (RTT-gated).
- Dynamic FEC ratio boost on I-frame packets.
- SFU keyframe cache for fast join-to-first-frame.
- PLI suppression at SFU to bound upstream keyframe-request traffic.

## Non-goals

- Multi-codec negotiation (PRD #6).
- Simulcast or per-receiver layer selection (PRD #8).
- VideoQualityController logic beyond a fixed bitrate target (PRD #7).
- Native camera capture pipelines (separate platform work).

## Design

### `wzp-video` crate

```
wzp-video/
  src/
    encoder.rs        # trait VideoEncoder
                      #   VideoToolboxEncoder (macOS)
                      #   MediaCodecEncoder (Android, JNI)
                      #   OpenH264Encoder (software fallback)
    decoder.rs        # trait VideoDecoder; mirror per-platform
    framer.rs         # H.264 NAL fragmentation to MTU-sized chunks
    depacketizer.rs   # Reassemble NALs, emit access units
    keyframe.rs       # Keyframe request handling, sender + receiver
    config.rs         # SPS/PPS shipment over signal stream
```

### Framing

One access unit (frame) → N packets, each ≤ `MTU - 16 (header) - 16 (AEAD tag)`.

- `sequence` is global per (session, stream_id) and advances per packet.
- `timestamp_ms` is presentation time, equal across all packets of a single access unit.
- `KeyFrame` bit is set on every packet of an I-frame.
- `FrameEnd` bit is set on the last packet of the access unit.
- `fec_block_id` is assigned per access unit (u16 in v2, large blocks).

Parameter sets (SPS/PPS) ride on the **signal stream**, not media datagrams. They are sent at session start and again on codec change. Delivery is reliable, ordered, and one-time.

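The framing rules above can be sketched roughly as follows. This is a minimal illustration, not the `framer.rs` design: `FramedPacket` and `fragment_access_unit` are hypothetical names, the struct is not the wire encoding of `MediaHeader` v2, and the sketch cuts the access unit into plain byte chunks, whereas a real framer would presumably fragment along NAL boundaries.

```rust
/// Per-packet overheads quoted in the framing rule.
const HEADER_LEN: usize = 16; // MediaHeader v2
const AEAD_TAG_LEN: usize = 16;

/// Hypothetical in-memory view of one media packet (not the wire layout).
#[derive(Debug, Clone, PartialEq)]
pub struct FramedPacket {
    pub sequence: u32,
    pub timestamp_ms: u32,
    pub key_frame: bool,
    pub frame_end: bool,
    pub fec_block_id: u16,
    pub payload: Vec<u8>,
}

/// Split one access unit into packets whose payload fits the MTU budget.
/// `next_seq` is the per-(session, stream_id) counter; it advances per packet.
pub fn fragment_access_unit(
    access_unit: &[u8],
    mtu: usize,
    next_seq: &mut u32,
    timestamp_ms: u32,
    is_keyframe: bool,
    fec_block_id: u16,
) -> Vec<FramedPacket> {
    let budget = mtu.saturating_sub(HEADER_LEN + AEAD_TAG_LEN);
    if budget == 0 || access_unit.is_empty() {
        return Vec::new();
    }
    let chunks: Vec<&[u8]> = access_unit.chunks(budget).collect();
    let last = chunks.len() - 1;
    let mut packets = Vec::with_capacity(chunks.len());
    for (i, chunk) in chunks.into_iter().enumerate() {
        packets.push(FramedPacket {
            sequence: *next_seq,
            timestamp_ms,           // same for every packet of the access unit
            key_frame: is_keyframe, // set on every packet of an I-frame
            frame_end: i == last,   // set only on the final packet
            fec_block_id,
            payload: chunk.to_vec(),
        });
        *next_seq = next_seq.wrapping_add(1);
    }
    packets
}
```

With an assumed MTU of 1232 bytes the payload budget is 1200 bytes, so a 2500-byte access unit yields three packets of 1200, 1200, and 100 bytes, with `frame_end` set only on the third.
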
### NACK loop

```
SignalMessage::Nack {
    version: u8,
    stream_id: u8,
    seqs: Vec<u32>,   // missing P-frame packets
}
```

Receiver behavior:
- If an access unit is still incomplete after `frame_interval` ms:
  - If `RTT < 2 × frame_interval`: emit `Nack`.
  - Else: emit `PictureLossIndication`.
- Backoff: max 1 Nack per (stream, seq) per 2 × RTT.

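The receiver rules above could be expressed as a small decision function. This is a sketch under assumptions: `Recovery`, `NackScheduler`, and `on_incomplete` are hypothetical names, the caller is assumed to invoke it only once the `frame_interval` wait has elapsed, and the bookkeeping map is never pruned here.

```rust
use std::collections::HashMap;
use std::time::{Duration, Instant};

/// What the receiver should do about an incomplete access unit.
#[derive(Debug, PartialEq)]
pub enum Recovery {
    Nack(Vec<u32>),
    PictureLossIndication,
    Wait, // every missing seq was requested too recently
}

/// Hypothetical per-receiver state for the backoff rule.
#[derive(Default)]
pub struct NackScheduler {
    // (stream_id, seq) -> time of the last Nack sent for that packet
    last_nack: HashMap<(u8, u32), Instant>,
}

impl NackScheduler {
    pub fn on_incomplete(
        &mut self,
        stream_id: u8,
        missing: &[u32],
        rtt: Duration,
        frame_interval: Duration,
        now: Instant,
    ) -> Recovery {
        // RTT gate: retransmission would arrive too late, ask for a keyframe.
        if rtt >= frame_interval * 2 {
            return Recovery::PictureLossIndication;
        }
        // Backoff: at most one Nack per (stream, seq) per 2 x RTT.
        let backoff = rtt * 2;
        let due: Vec<u32> = missing
            .iter()
            .copied()
            .filter(|seq| {
                self.last_nack
                    .get(&(stream_id, *seq))
                    .map_or(true, |t| now.duration_since(*t) >= backoff)
            })
            .collect();
        if due.is_empty() {
            return Recovery::Wait;
        }
        for seq in &due {
            self.last_nack.insert((stream_id, *seq), now);
        }
        Recovery::Nack(due)
    }
}
```

At 30 fps (`frame_interval` ≈ 33 ms), a 50 ms RTT stays on the `Nack` path, while a 300 ms RTT falls through to `PictureLossIndication`.
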
Sender behavior:
- On `Nack`: retransmit if the packet is still in the send buffer (last 500 ms).
- On `PictureLossIndication`: emit a fresh I-frame within 200 ms.

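A minimal sketch of the sender-side retransmit buffer, assuming packets are recorded in send order and identified by `sequence` alone (one buffer per stream). `SendBuffer`, `record`, and `on_nack` are hypothetical names; the 500 ms retention comes from the rule above.

```rust
use std::collections::VecDeque;
use std::time::{Duration, Instant};

/// Retention window from the sender rule.
const RETENTION: Duration = Duration::from_millis(500);

/// Hypothetical per-stream buffer of recently sent packets.
#[derive(Default)]
pub struct SendBuffer {
    // (send time, sequence, encoded packet), oldest first
    sent: VecDeque<(Instant, u32, Vec<u8>)>,
}

impl SendBuffer {
    /// Remember a packet right after it has been sent.
    pub fn record(&mut self, seq: u32, packet: Vec<u8>, now: Instant) {
        self.sent.push_back((now, seq, packet));
        self.evict(now);
    }

    /// Drop packets older than the retention window.
    fn evict(&mut self, now: Instant) {
        while self
            .sent
            .front()
            .map_or(false, |(t, _, _)| now.duration_since(*t) > RETENTION)
        {
            self.sent.pop_front();
        }
    }

    /// Return the packets to retransmit; seqs no longer buffered are skipped.
    pub fn on_nack(&mut self, seqs: &[u32], now: Instant) -> Vec<Vec<u8>> {
        self.evict(now);
        seqs.iter()
            .filter_map(|s| {
                self.sent
                    .iter()
                    .find(|(_, q, _)| q == s)
                    .map(|(_, _, p)| p.clone())
            })
            .collect()
    }
}
```

The linear scan in `on_nack` is adequate for a half-second window; an index keyed by sequence would be the obvious refinement if profiling called for it.
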
### Dynamic FEC on I-frames

The encoder marks packets belonging to I-frames. The FEC layer applies a higher ratio (default 0.5) to I-frame blocks than the nominal ratio (0.1) it uses for P-frames. The ratios are configurable.

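As a rough illustration of the ratio selection, the sketch below assumes that "ratio" means repair packets per source packet and that the count is rounded up so that any non-empty block gets at least one repair packet. Neither assumption is stated in this PRD. `FecConfig` and `repair_packets` are hypothetical names, and the ratios are stored as integer percentages to avoid floating-point rounding.

```rust
/// Hypothetical FEC configuration; defaults mirror the 0.5 / 0.1 ratios above.
#[derive(Debug, Clone, Copy)]
pub struct FecConfig {
    pub keyframe_percent: usize, // 50 => ratio 0.5 for I-frame blocks
    pub delta_percent: usize,    // 10 => ratio 0.1 for P-frame blocks
}

impl Default for FecConfig {
    fn default() -> Self {
        FecConfig { keyframe_percent: 50, delta_percent: 10 }
    }
}

/// Number of repair packets for a block of `source_packets` packets,
/// rounded up (assumption) so small blocks still get some protection.
pub fn repair_packets(cfg: &FecConfig, source_packets: usize, is_keyframe: bool) -> usize {
    let percent = if is_keyframe { cfg.keyframe_percent } else { cfg.delta_percent };
    (source_packets * percent + 99) / 100
}
```

Under these assumptions a 20-packet I-frame block carries 10 repair packets, while a 20-packet P-frame block carries 2.
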
### SFU keyframe cache

`RoomManager` maintains per `(room, sender, stream_id)`:

```rust
use bytes::Bytes; // assumed: `Bytes` comes from the `bytes` crate

struct KeyframeCache {
    packets: Vec<Bytes>,   // most recent complete I-frame
    timestamp_ms: u32,
    sequence_first: u32,
}
```

When a new participant joins, the cache is replayed before live forwarding starts. This eliminates the 2 s black screen on join.

Cache TTL: the entry is replaced whenever a new complete I-frame arrives.

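One way the replace-on-complete rule might look is sketched below. It is heavily simplified: packets are assumed to arrive at the SFU in order, "complete" is judged by counting packets between the first `KeyFrame` packet and the `FrameEnd` packet, and `Vec<u8>` stands in for `Bytes` so the sketch needs no external crate. `CachedKeyframe` and `KeyframeTracker` are hypothetical names; the 200 KB cap is the mitigation listed under Risks.

```rust
/// Size cap from the Risks section; an oversized I-frame is not cached.
const MAX_CACHE_BYTES: usize = 200_000;

/// Stand-in for the cache entry, using Vec<u8> instead of Bytes.
pub struct CachedKeyframe {
    packets: Vec<Vec<u8>>,
    timestamp_ms: u32,
    sequence_first: u32,
}

/// Hypothetical per-(room, sender, stream_id) tracker.
#[derive(Default)]
pub struct KeyframeTracker {
    pending: Option<CachedKeyframe>, // I-frame currently being collected
    cached: Option<CachedKeyframe>,  // most recent complete I-frame
}

impl KeyframeTracker {
    /// Feed every forwarded packet of the stream, in arrival order.
    pub fn on_packet(&mut self, seq: u32, timestamp_ms: u32, key_frame: bool, frame_end: bool, packet: &[u8]) {
        if !key_frame {
            self.pending = None; // a P-frame packet ends any I-frame in progress
            return;
        }
        let same_frame = matches!(&self.pending, Some(p) if p.timestamp_ms == timestamp_ms);
        if !same_frame {
            self.pending = Some(CachedKeyframe { packets: Vec::new(), timestamp_ms, sequence_first: seq });
        }
        if let Some(p) = self.pending.as_mut() {
            p.packets.push(packet.to_vec());
        }
        if frame_end {
            if let Some(done) = self.pending.take() {
                let expected = seq.wrapping_sub(done.sequence_first).wrapping_add(1);
                let complete = done.packets.len() as u32 == expected;
                let bytes: usize = done.packets.iter().map(Vec::len).sum();
                if complete {
                    // Replace the old entry; drop it entirely if the new one is too large.
                    self.cached = if bytes <= MAX_CACHE_BYTES { Some(done) } else { None };
                }
            }
        }
    }

    /// Packets to replay to a newly joined participant (empty if cold).
    pub fn replay(&self) -> &[Vec<u8>] {
        match &self.cached {
            Some(c) => c.packets.as_slice(),
            None => &[],
        }
    }
}
```

An I-frame with a gap in its sequence range leaves the previous entry in place, so a joiner still gets the last good keyframe rather than a partial one.
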
### PLI suppression

If ≥ 2 receivers send a PLI within 200 ms for the same `(sender, stream_id)`, the SFU emits one `KeyframeRequest` upstream rather than one per receiver. State is tracked per (sender, stream).

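The suppression rule can be sketched as a timestamp per `(sender, stream_id)`. This version makes one design choice the text leaves open: the first PLI in a window is forwarded immediately and later ones inside the 200 ms window are absorbed, so no PLI is delayed. `PliSuppressor` is a hypothetical name and the `u64` sender ID is a placeholder type.

```rust
use std::collections::HashMap;
use std::time::{Duration, Instant};

/// Coalescing window from the rule above.
const PLI_WINDOW: Duration = Duration::from_millis(200);

/// Hypothetical SFU-side state, keyed by (sender, stream_id).
#[derive(Default)]
pub struct PliSuppressor {
    last_request: HashMap<(u64, u8), Instant>,
}

impl PliSuppressor {
    /// Called for every PLI received from any receiver.
    /// Returns true when a KeyframeRequest should be sent upstream.
    pub fn on_pli(&mut self, sender: u64, stream_id: u8, now: Instant) -> bool {
        let key = (sender, stream_id);
        let suppressed = self
            .last_request
            .get(&key)
            .map_or(false, |t| now.duration_since(*t) < PLI_WINDOW);
        if !suppressed {
            self.last_request.insert(key, now);
        }
        !suppressed
    }
}
```

With this scheme the upstream request rate for one stream is bounded by one per 200 ms regardless of how many receivers report loss.
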
## Implementation outline

1. `wzp-video` crate scaffold (T4.1).
2. Framer/depacketizer with property tests (T4.1).
3. VideoToolbox encoder/decoder (macOS) (T4.2).
4. MediaCodec encoder/decoder (Android, JNI) (T4.3).
5. NACK signal + sender/receiver state machines (T4.4).
6. I-frame FEC ratio hint plumbed from encoder to FEC layer (T4.5).
7. SFU keyframe cache (T4.6).
8. PLI suppression (T4.7).
9. End-to-end test: macOS sender → relay → macOS receiver, 5 min call on a network with < 1 % loss.

## Acceptance criteria

- Unidirectional H.264 720p30 call macOS↔macOS, CPU < 5 % on M1.
- Android↔macOS works with MediaCodec (surface-texture path).
- Black-screen-on-join < 200 ms when keyframe cache is warm.
- With 5 % synthetic packet loss at 50 ms RTT: NACK recovery keeps video smooth, < 1 keyframe / 2 s.
- With 5 % synthetic packet loss at 300 ms RTT: PLI fallback fires, keyframe rate ~ 1 / s.
- Upstream PLI traffic at SFU < 2 / s under simulated mass packet loss with 8 receivers.

## Risks

- **MediaCodec surface-texture edge cases.** Requires a per-device test matrix; a software fallback path is mandatory.
- **VideoToolbox H.264 baseline restrictions** (some profiles are main-only in HW). Mitigation: profile detection at session start.
- **NACK storm under heavy loss.** Mitigation: rate cap (max 50 Nacks/s/receiver) and exponential backoff.
- **Keyframe cache memory footprint** (one I-frame per active stream per room). Mitigation: cap cache at 200 KB; if exceeded, drop and rely on PLI.

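The NACK-storm mitigation could be approximated as below. This is a sketch only: a fixed one-second window is used as the simplest possible rate cap (a token bucket would smooth bursts better), and the backoff base and ceiling are caller-supplied because this PRD does not specify them. `NackRateCap` and `backoff_delay` are hypothetical names.

```rust
use std::time::{Duration, Instant};

/// Per-receiver cap from the mitigation above.
const MAX_NACKS_PER_SEC: u32 = 50;

/// Hypothetical fixed-window rate cap, one instance per receiver.
pub struct NackRateCap {
    window_start: Instant,
    sent_in_window: u32,
}

impl NackRateCap {
    pub fn new(now: Instant) -> Self {
        NackRateCap { window_start: now, sent_in_window: 0 }
    }

    /// Returns true if one more Nack may be sent now, and counts it.
    pub fn try_send(&mut self, now: Instant) -> bool {
        if now.duration_since(self.window_start) >= Duration::from_secs(1) {
            self.window_start = now;
            self.sent_in_window = 0;
        }
        if self.sent_in_window < MAX_NACKS_PER_SEC {
            self.sent_in_window += 1;
            true
        } else {
            false
        }
    }
}

/// Exponential backoff: base * 2^attempt, clamped to `max`.
pub fn backoff_delay(base: Duration, attempt: u32, max: Duration) -> Duration {
    let factor = 1u32 << attempt.min(16); // bound the shift to avoid overflow
    base.checked_mul(factor).map_or(max, |d| d.min(max))
}
```

For example, with a 100 ms base and a 1 s ceiling the retry delays run 100 ms, 200 ms, 400 ms, 800 ms and then stay at 1 s.
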
## Effort

~3 weeks (Wave 4 tasks T4.1–T4.7).