PRD: Video v1 — H.264 Single-Layer
Status: proposed
Resolves: Road-to-video Phases V3 + V4 (encoder/decoder, framer, NACK, keyframe cache).
Depends on: PRD #1 (wire format v2), PRD #3 (TransportFeedback + BWE).
Problem
WZP has no video path. Add a working unidirectional video call (macOS↔macOS first, then Android↔macOS) using H.264 baseline, with loss recovery appropriate for lossy mobile links.
Goals
- New wzp-video crate parallel to wzp-codec.
- H.264 baseline encode/decode using platform hardware encoders.
- NAL fragmentation and access-unit reassembly conformant to our 16 B MediaHeader v2.
- NACK loop for P-frame loss (RTT-gated).
- Dynamic FEC ratio boost on I-frame packets.
- SFU keyframe cache for fast join-to-first-frame.
- PLI suppression at SFU to bound upstream keyframe-request traffic.
Non-goals
- Multi-codec negotiation (PRD #6).
- Simulcast or per-receiver layer selection (PRD #8).
- VideoQualityController logic beyond a fixed bitrate target (PRD #7).
- Native camera capture pipelines (separate platform work).
Design
wzp-video crate
wzp-video/
  src/
    encoder.rs         # trait VideoEncoder
                       #   VideoToolboxEncoder (macOS)
                       #   MediaCodecEncoder (Android, JNI)
                       #   OpenH264Encoder (software fallback)
    decoder.rs         # trait VideoDecoder; mirror per-platform
    framer.rs          # H.264 NAL fragmentation to MTU-sized chunks
    depacketizer.rs    # Reassemble NALs, emit access units
    keyframe.rs        # Keyframe request handling, sender + receiver
    config.rs          # SPS/PPS shipment over signal stream
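The layout above names a VideoEncoder trait but does not specify its shape. The following is only a sketch of what such a trait might look like. The RawFrame and EncodedFrame types, the method names, and the PassthroughEncoder stand-in are all hypothetical and not taken from this PRD. The stand-in does no compression; it exists only to show how a platform backend could sit behind the trait.

```rust
/// Hypothetical raw frame handed to an encoder (field layout is an assumption).
pub struct RawFrame {
    pub width: u32,
    pub height: u32,
    pub timestamp_ms: u32,
    pub data: Vec<u8>,
}

/// Hypothetical encoder output: one access unit plus the metadata the framer needs.
pub struct EncodedFrame {
    pub timestamp_ms: u32,
    pub is_keyframe: bool,
    pub access_unit: Vec<u8>,
}

/// Assumed trait shape; the PRD only names the trait.
pub trait VideoEncoder {
    /// Encode one frame into one access unit.
    fn encode(&mut self, frame: &RawFrame) -> Result<EncodedFrame, String>;
    /// Ask for the next encoded frame to be an I-frame (e.g. after a PLI).
    fn request_keyframe(&mut self);
}

/// Stand-in backend that copies bytes through without compressing them.
pub struct PassthroughEncoder {
    keyframe_pending: bool,
}

impl PassthroughEncoder {
    pub fn new() -> Self {
        // The first frame of a stream is always a keyframe.
        Self { keyframe_pending: true }
    }
}

impl VideoEncoder for PassthroughEncoder {
    fn encode(&mut self, frame: &RawFrame) -> Result<EncodedFrame, String> {
        // Consume the pending-keyframe flag so only one frame is marked.
        let is_keyframe = std::mem::replace(&mut self.keyframe_pending, false);
        Ok(EncodedFrame {
            timestamp_ms: frame.timestamp_ms,
            is_keyframe,
            access_unit: frame.data.clone(),
        })
    }

    fn request_keyframe(&mut self) {
        self.keyframe_pending = true;
    }
}
```

Keeping the trait object-safe, as in this sketch, would let session code hold a `Box<dyn VideoEncoder>` and pick the VideoToolbox, MediaCodec, or OpenH264 backend at runtime.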
Framing
One access unit (frame) → N packets, each ≤ MTU - 16 (header) - 16 (AEAD tag).
- sequence is global per (session, stream_id) and advances per packet.
- timestamp_ms is presentation time, equal across all packets of a single access unit.
- KeyFrame bit set on every packet of an I-frame.
- FrameEnd bit set on the last packet of the access unit.
- fec_block_id per access unit (u16 in v2, large blocks).
Parameter sets (SPS/PPS) ride on the signal stream, not media datagrams. Sent at session start and on codec change. Reliable, ordered, one-time.
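The per-packet framing rules above can be sketched as follows. This is a simplified illustration, not the wzp-video framer. It treats the access unit as an opaque byte slice and ignores NAL boundaries, and the FramedPacket struct and function names are hypothetical. Only the 16 B header, the 16 B AEAD tag, and the field semantics come from this PRD.

```rust
/// Sizes stated in the PRD: 16 B MediaHeader v2 and 16 B AEAD tag.
const HEADER_LEN: usize = 16;
const AEAD_TAG_LEN: usize = 16;

/// Hypothetical in-memory view of one media packet before serialization.
#[derive(Debug, Clone, PartialEq)]
pub struct FramedPacket {
    pub sequence: u32,
    pub timestamp_ms: u32,
    pub key_frame: bool,
    pub frame_end: bool,
    pub fec_block_id: u16,
    pub payload: Vec<u8>,
}

/// Payload budget per packet: MTU - header - AEAD tag.
pub fn max_payload(mtu: usize) -> usize {
    mtu.saturating_sub(HEADER_LEN + AEAD_TAG_LEN)
}

/// Split one access unit into packets that each fit the payload budget.
/// `next_seq` is the per-(session, stream_id) counter and advances per packet.
pub fn frame_access_unit(
    access_unit: &[u8],
    mtu: usize,
    next_seq: &mut u32,
    timestamp_ms: u32,
    is_keyframe: bool,
    fec_block_id: u16,
) -> Vec<FramedPacket> {
    let budget = max_payload(mtu);
    if budget == 0 || access_unit.is_empty() {
        return Vec::new();
    }
    let chunks: Vec<&[u8]> = access_unit.chunks(budget).collect();
    let last = chunks.len() - 1;
    chunks
        .into_iter()
        .enumerate()
        .map(|(i, chunk)| {
            let sequence = *next_seq;
            *next_seq = next_seq.wrapping_add(1);
            FramedPacket {
                sequence,
                timestamp_ms,           // identical for every packet of the access unit
                key_frame: is_keyframe, // set on every packet of an I-frame
                frame_end: i == last,   // set only on the final packet
                fec_block_id,           // one FEC block per access unit
                payload: chunk.to_vec(),
            }
        })
        .collect()
}
```

With a 1200-byte MTU the budget is 1168 bytes, so a 3000-byte access unit becomes three packets, the last carrying 664 bytes and the FrameEnd bit.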
NACK loop
SignalMessage::Nack {
    version: u8,
    stream_id: u8,
    seqs: Vec<u32>, // missing P-frame packets
}
Receiver behavior:
- If access unit incomplete after frame_interval ms:
  - If RTT < 2 × frame_interval: emit Nack.
  - Else: emit PictureLossIndication.
- Backoff: max 1 Nack per (stream, seq) per 2 × RTT.
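The receiver rules above amount to a small decision function plus per-sequence backoff state. The sketch below is one possible reading of them. The Recovery enum, the NackScheduler type, and its method signature are hypothetical, and the caller is assumed to supply how long the access unit has been incomplete.

```rust
use std::collections::HashMap;
use std::time::{Duration, Instant};

/// Hypothetical outcome of one receiver-side check on an incomplete access unit.
#[derive(Debug, PartialEq)]
pub enum Recovery {
    Wait,
    Nack(Vec<u32>),
    PictureLossIndication,
}

/// Tracks when each (stream_id, seq) was last NACKed to enforce the backoff.
pub struct NackScheduler {
    last_sent: HashMap<(u8, u32), Instant>,
}

impl NackScheduler {
    pub fn new() -> Self {
        Self { last_sent: HashMap::new() }
    }

    pub fn decide(
        &mut self,
        stream_id: u8,
        missing: &[u32],
        waited: Duration,
        frame_interval: Duration,
        rtt: Duration,
        now: Instant,
    ) -> Recovery {
        // Nothing to do until the access unit has been incomplete for frame_interval.
        if missing.is_empty() || waited < frame_interval {
            return Recovery::Wait;
        }
        // RTT gate: a retransmission would arrive too late, so ask for a keyframe.
        if rtt >= frame_interval * 2 {
            return Recovery::PictureLossIndication;
        }
        // Backoff: at most one Nack per (stream, seq) per 2 x RTT.
        let backoff = rtt * 2;
        let mut due = Vec::new();
        for &seq in missing {
            let key = (stream_id, seq);
            let recently_sent = self
                .last_sent
                .get(&key)
                .map_or(false, |&t| now.duration_since(t) < backoff);
            if !recently_sent {
                self.last_sent.insert(key, now);
                due.push(seq);
            }
        }
        if due.is_empty() { Recovery::Wait } else { Recovery::Nack(due) }
    }
}
```

A production version would also need to drop entries from `last_sent` once a sequence is received or its frame is abandoned. The sketch omits that for brevity.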
Sender behavior:
- On Nack: retransmit if packet is still in send buffer (last 500 ms).
- On PictureLossIndication: emit a fresh I-frame within 200 ms.
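The Nack half of the sender behavior depends on a time-bounded send buffer. A minimal sketch of that buffer follows. The SendBuffer and SentPacket names and the linear lookup are illustrative assumptions; only the 500 ms retention window comes from this PRD.

```rust
use std::collections::VecDeque;
use std::time::{Duration, Instant};

/// One retained packet (hypothetical layout).
struct SentPacket {
    seq: u32,
    sent_at: Instant,
    bytes: Vec<u8>,
}

/// Keeps recently sent packets so they can be retransmitted on Nack.
pub struct SendBuffer {
    retention: Duration,
    entries: VecDeque<SentPacket>,
}

impl SendBuffer {
    pub fn new(retention: Duration) -> Self {
        Self { retention, entries: VecDeque::new() }
    }

    /// Drop packets older than the retention window (oldest are at the front).
    fn evict(&mut self, now: Instant) {
        while self
            .entries
            .front()
            .map_or(false, |p| now.duration_since(p.sent_at) > self.retention)
        {
            self.entries.pop_front();
        }
    }

    /// Remember a packet at the moment it is sent.
    pub fn record(&mut self, seq: u32, bytes: Vec<u8>, now: Instant) {
        self.evict(now);
        self.entries.push_back(SentPacket { seq, sent_at: now, bytes });
    }

    /// Return the packets named in a Nack that are still retained.
    /// Sequences that have aged out are silently skipped.
    pub fn on_nack(&mut self, seqs: &[u32], now: Instant) -> Vec<Vec<u8>> {
        self.evict(now);
        seqs.iter()
            .filter_map(|seq| {
                self.entries
                    .iter()
                    .find(|p| p.seq == *seq)
                    .map(|p| p.bytes.clone())
            })
            .collect()
    }
}
```

At 720p30 a 500 ms window holds only a few hundred packets, so the linear scan is likely adequate. An index keyed by sequence would be the obvious change if it is not.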
Dynamic FEC on I-frames
The encoder marks packets belonging to I-frames. The FEC layer applies a higher ratio (default 0.5) to I-frame blocks than the nominal ratio (0.1) used for P-frames. Both ratios are configurable.
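To make the ratios concrete, the sketch below turns them into a repair-packet count per FEC block. The FecConfig struct and repair_packets function are hypothetical, and rounding up to a whole packet is an assumption; this PRD does not say how fractional counts are handled.

```rust
/// Hypothetical configuration carrying the two ratios from the PRD.
pub struct FecConfig {
    pub keyframe_ratio: f64,
    pub nominal_ratio: f64,
}

impl Default for FecConfig {
    fn default() -> Self {
        // Defaults stated in the PRD: 0.5 for I-frame blocks, 0.1 for P-frames.
        Self { keyframe_ratio: 0.5, nominal_ratio: 0.1 }
    }
}

/// Number of repair packets for one FEC block (one block per access unit).
/// Assumption: fractional results round up, so any non-empty block gets
/// at least one repair packet.
pub fn repair_packets(cfg: &FecConfig, source_packets: usize, is_keyframe: bool) -> usize {
    let ratio = if is_keyframe { cfg.keyframe_ratio } else { cfg.nominal_ratio };
    (source_packets as f64 * ratio).ceil() as usize
}
```

Under these assumptions a 20-packet I-frame gets 10 repair packets, while a 20-packet P-frame gets 2.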
SFU keyframe cache
RoomManager maintains per (room, sender, stream_id):
struct KeyframeCache {
    packets: Vec<Bytes>, // most recent complete I-frame
    timestamp_ms: u32,
    sequence_first: u32,
}
On new participant join, the cache is replayed before live forwarding starts. This eliminates the 2 s black screen on join.
Cache TTL: replaced whenever a new complete I-frame arrives.
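The replace-on-new-I-frame rule, combined with the 200 KB cap listed under Risks, could look roughly like the sketch below. It is std-only, so `Vec<u8>` stands in for `Bytes`. The KeyframeCaches container, the u64 room and sender ids, and the reading of 200 KB as 200 × 1024 bytes are all assumptions.

```rust
use std::collections::HashMap;

/// (room, sender, stream_id); the id types are hypothetical.
pub type CacheKey = (u64, u64, u8);

pub struct KeyframeCache {
    pub packets: Vec<Vec<u8>>, // most recent complete I-frame
    pub timestamp_ms: u32,
    pub sequence_first: u32,
}

/// Holds one cached I-frame per (room, sender, stream_id).
pub struct KeyframeCaches {
    max_bytes: usize,
    entries: HashMap<CacheKey, KeyframeCache>,
}

impl KeyframeCaches {
    pub fn new(max_bytes: usize) -> Self {
        Self { max_bytes, entries: HashMap::new() }
    }

    /// Store a newly completed I-frame, replacing any previous one.
    /// An oversized frame is not cached and the stale entry is dropped,
    /// so joiners fall back to PLI. Returns whether the frame was cached.
    pub fn store(&mut self, key: CacheKey, frame: KeyframeCache) -> bool {
        let size: usize = frame.packets.iter().map(|p| p.len()).sum();
        if size > self.max_bytes {
            self.entries.remove(&key);
            return false;
        }
        self.entries.insert(key, frame);
        true
    }

    /// Packets to replay to a new participant before live forwarding starts.
    pub fn replay(&self, key: &CacheKey) -> &[Vec<u8>] {
        match self.entries.get(key) {
            Some(cache) => cache.packets.as_slice(),
            None => &[],
        }
    }
}
```

Dropping the stale entry when an oversized frame arrives avoids replaying an I-frame that the live stream's P-frames no longer reference.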
PLI suppression
If ≥ 2 receivers send a PLI within 200 ms for the same (sender, stream_id), the SFU emits one KeyframeRequest upstream instead of N. State is tracked per (sender, stream_id).
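One way to get this coalescing is to remember when the last KeyframeRequest was forwarded for each (sender, stream_id) and swallow any PLI that arrives inside the window. The sketch below takes that approach. The PliSuppressor type, the u64 sender id, and the choice to start the window at the first forwarded request are assumptions.

```rust
use std::collections::HashMap;
use std::time::{Duration, Instant};

/// Coalesces PLIs from many receivers into one upstream KeyframeRequest.
pub struct PliSuppressor {
    window: Duration,
    last_forwarded: HashMap<(u64, u8), Instant>,
}

impl PliSuppressor {
    pub fn new(window: Duration) -> Self {
        Self { window, last_forwarded: HashMap::new() }
    }

    /// Called for every PLI received from any receiver.
    /// Returns true if the SFU should emit a KeyframeRequest upstream,
    /// false if one was already sent for this (sender, stream_id) in the window.
    pub fn on_pli(&mut self, sender: u64, stream_id: u8, now: Instant) -> bool {
        let key = (sender, stream_id);
        let suppressed = self
            .last_forwarded
            .get(&key)
            .map_or(false, |&t| now.duration_since(t) < self.window);
        if suppressed {
            return false;
        }
        self.last_forwarded.insert(key, now);
        true
    }
}
```

With a 200 ms window this forwards at most 5 requests per second per stream, so the acceptance target of fewer than 2 per second would need either a longer window or an extra rate cap. This PRD leaves that choice open.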
Implementation outline
- wzp-video crate scaffold (T4.1).
- Framer/depacketizer with property tests (T4.1).
- VideoToolbox encoder/decoder (macOS) (T4.2).
- MediaCodec encoder/decoder (Android, JNI) (T4.3).
- NACK signal + sender/receiver state machines (T4.4).
- I-frame FEC ratio hint plumbed from encoder to FEC layer (T4.5).
- SFU keyframe cache (T4.6).
- PLI suppression (T4.7).
- End-to-end test: macOS sender → relay → macOS receiver, 5 min call, < 1 % loss network.
Acceptance criteria
- Unidirectional H.264 720p30 call macOS↔macOS, CPU < 5 % on M1.
- Android↔macOS works with MediaCodec (surface-texture path).
- Black-screen-on-join < 200 ms when keyframe cache is warm.
- Under 5 % synthetic packet loss at 50 ms RTT: NACK recovery keeps video smooth, < 1 keyframe / 2 s.
- Under 5 % synthetic packet loss at 300 ms RTT: PLI fallback fires, keyframe rate ~ 1 / s.
- Upstream PLI traffic at SFU < 2 / s under simulated mass packet loss with 8 receivers.
Risks
- MediaCodec surface-texture edge cases. Per-device matrix; software fallback path mandatory.
- VideoToolbox H.264 baseline restrictions (some profiles are main-only in HW). Mitigation: profile detection at session start.
- NACK storm under heavy loss. Mitigation: rate cap (max 50 Nacks/s/receiver) and exponential backoff.
- Keyframe cache memory footprint (one I-frame per active stream per room). Mitigation: cap cache at 200 KB; if exceeded, drop and rely on PLI.
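For the NACK-storm mitigation above, the rate cap could be a sliding one-second window per receiver, as sketched below. The NackRateCap type is hypothetical, and the exponential-backoff half of the mitigation is not shown.

```rust
use std::collections::VecDeque;
use std::time::{Duration, Instant};

/// Sliding-window cap on Nacks emitted by one receiver.
pub struct NackRateCap {
    max_per_sec: usize,
    sent: VecDeque<Instant>,
}

impl NackRateCap {
    pub fn new(max_per_sec: usize) -> Self {
        Self { max_per_sec, sent: VecDeque::new() }
    }

    /// Returns true and records the send if a Nack may be emitted now;
    /// returns false if the cap for the trailing second is already used up.
    pub fn try_send(&mut self, now: Instant) -> bool {
        // Forget sends that are at least one second old.
        while self
            .sent
            .front()
            .map_or(false, |&t| now.duration_since(t) >= Duration::from_secs(1))
        {
            self.sent.pop_front();
        }
        if self.sent.len() < self.max_per_sec {
            self.sent.push_back(now);
            true
        } else {
            false
        }
    }
}
```

A receiver would construct this with `NackRateCap::new(50)` and call `try_send` before emitting each Nack. When it returns false, escalating to PictureLossIndication is one plausible fallback.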
Effort
~3 weeks (Wave 4 tasks T4.1–T4.7).