---
tags:
type: prd
---
# PRD: Video v1 — H.264 Single-Layer
- Status: proposed
- Resolves: Road-to-video Phases V3 + V4 (encoder/decoder, framer, NACK, keyframe cache).
- Depends on: PRD #1 (wire format v2), PRD #3 (TransportFeedback + BWE).
## Problem
WZP has no video path. Add a working unidirectional video call (macOS↔macOS first, then Android↔macOS) using H.264 baseline, with loss recovery appropriate for lossy mobile links.
## Goals
- New `wzp-video` crate parallel to `wzp-codec`.
- H.264 baseline encode/decode using platform hardware codecs, with a software fallback.
- NAL fragmentation and access-unit reassembly conformant to our 16 B `MediaHeader` v2.
- NACK loop for P-frame loss (RTT-gated).
- Dynamic FEC ratio boost on I-frame packets.
- SFU keyframe cache for fast join-to-first-frame.
- PLI suppression at the SFU to bound upstream keyframe-request traffic.
## Non-goals
- Multi-codec negotiation (PRD #6).
- Simulcast or per-receiver layer selection (PRD #8).
- VideoQualityController logic beyond a fixed bitrate target (PRD #7).
- Native camera capture pipelines (separate platform work).
## Design
### `wzp-video` crate
```text
wzp-video/
  src/
    encoder.rs       # trait VideoEncoder
                     # VideoToolboxEncoder (macOS)
                     # MediaCodecEncoder (Android, JNI)
                     # OpenH264Encoder (software fallback)
    decoder.rs       # trait VideoDecoder; mirror per-platform
    framer.rs        # H.264 NAL fragmentation to MTU-sized chunks
    depacketizer.rs  # Reassemble NALs, emit access units
    keyframe.rs      # Keyframe request handling, sender + receiver
    config.rs        # SPS/PPS shipment over signal stream
```
### Framing
One access unit (frame) → N packets, each carrying at most MTU - 16 B (header) - 16 B (AEAD tag) of payload.
- `sequence` is global per (session, stream_id) and advances per packet.
- `timestamp_ms` is presentation time, equal across all packets of a single access unit.
- `KeyFrame` bit set on every packet of an I-frame.
- `FrameEnd` bit set on the last packet of the access unit.
- `fec_block_id` per access unit (u16 in v2, large blocks).
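The framing rules can be sketched in Rust as a fragment/reassemble pair. This is a minimal illustration, not the `framer.rs` implementation: `Packet`, `fragment`, `reassemble` and the bit positions of `KeyFrame`/`FrameEnd` are hypothetical, and the access unit is split at byte boundaries, whereas a real H.264 framer fragments per NAL unit (in the style of RTP's FU-A packetization).

```rust
// Hypothetical sketch: names, flag bit positions and the byte-level split
// are assumptions for illustration, not the wzp-video API.
const MEDIA_HEADER_LEN: usize = 16; // MediaHeader v2
const AEAD_TAG_LEN: usize = 16;
const FLAG_KEYFRAME: u8 = 0b01; // assumed bit position
const FLAG_FRAME_END: u8 = 0b10; // assumed bit position

#[derive(Debug, Clone, PartialEq)]
struct Packet {
    sequence: u32,
    timestamp_ms: u32,
    flags: u8,
    fec_block_id: u16,
    payload: Vec<u8>,
}

/// Split one access unit into MTU-sized packets. `next_seq` is the
/// per-(session, stream_id) counter; it advances by one per packet.
fn fragment(
    access_unit: &[u8],
    mtu: usize,
    timestamp_ms: u32,
    is_keyframe: bool,
    fec_block_id: u16,
    next_seq: &mut u32,
) -> Vec<Packet> {
    assert!(mtu > MEDIA_HEADER_LEN + AEAD_TAG_LEN, "MTU too small");
    let max_payload = mtu - MEDIA_HEADER_LEN - AEAD_TAG_LEN;
    let chunks: Vec<&[u8]> = access_unit.chunks(max_payload).collect();
    let last = chunks.len().saturating_sub(1);
    chunks
        .into_iter()
        .enumerate()
        .map(|(i, chunk)| {
            let mut flags = 0u8;
            if is_keyframe {
                flags |= FLAG_KEYFRAME; // every packet of an I-frame
            }
            if i == last {
                flags |= FLAG_FRAME_END; // last packet of the access unit
            }
            let sequence = *next_seq;
            *next_seq = next_seq.wrapping_add(1);
            // Same timestamp_ms and fec_block_id on every packet of the unit.
            Packet { sequence, timestamp_ms, flags, fec_block_id, payload: chunk.to_vec() }
        })
        .collect()
}

/// Reassemble the packets of one access unit (sorted by sequence).
/// Returns None until the FrameEnd packet has arrived and there are no gaps.
/// Simplification: loss of the leading packets is not detected here.
fn reassemble(packets: &[Packet]) -> Option<Vec<u8>> {
    let last = packets.last()?;
    if (last.flags & FLAG_FRAME_END) == 0 {
        return None;
    }
    let contiguous = packets
        .windows(2)
        .all(|w| w[1].sequence == w[0].sequence.wrapping_add(1));
    if !contiguous {
        return None; // gap: leave it to the NACK / PLI path
    }
    Some(packets.iter().flat_map(|p| p.payload.iter().copied()).collect())
}
```

With a 1200-byte MTU, for example, each packet carries at most 1200 - 32 = 1168 payload bytes, so a 2500-byte access unit becomes three packets (1168 + 1168 + 164 bytes).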
Parameter sets (SPS/PPS) ride on the signal stream, not in media datagrams. They are sent once at session start and again on codec change, with reliable, ordered delivery.
### NACK loop
```rust
pub enum SignalMessage {
    // Other variants omitted.
    Nack {
        version: u8,
        stream_id: u8,
        seqs: Vec<u32>, // sequence numbers of missing P-frame packets
    },
}
```
Receiver behavior:

- If an access unit is still incomplete after `frame_interval` ms:
  - If `RTT < 2 × frame_interval`: emit `Nack`.
  - Else: emit `PictureLossIndication`.
- Backoff: at most 1 `Nack` per (stream, seq) per 2 × RTT.
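The receiver rules map onto a small decision function. A sketch, assuming a hypothetical per-stream `NackState` that records when each sequence number was last NACKed, with all times in milliseconds; `Recovery` and its variants are illustrative names, not the crate's API.

```rust
use std::collections::HashMap;

/// What the receiver does about an access unit that is still incomplete.
#[derive(Debug, PartialEq)]
enum Recovery {
    Nack(Vec<u32>),        // request retransmission of these sequence numbers
    PictureLossIndication, // ask the sender for a fresh I-frame
    Wait,                  // all missing seqs were NACKed less than 2 × RTT ago
}

/// Hypothetical per-stream receiver state (times in milliseconds).
/// A real implementation would also prune old entries.
struct NackState {
    last_nack_ms: HashMap<u32, u64>, // seq -> time of the last Nack sent for it
}

impl NackState {
    fn new() -> Self {
        Self { last_nack_ms: HashMap::new() }
    }

    /// Called once an access unit has been incomplete for `frame_interval_ms`.
    fn on_incomplete(&mut self, missing: &[u32], rtt_ms: u64, frame_interval_ms: u64, now_ms: u64) -> Recovery {
        // A retransmission can only help if RTT < 2 × frame_interval.
        if rtt_ms >= 2 * frame_interval_ms {
            return Recovery::PictureLossIndication;
        }
        // Backoff: at most one Nack per seq per 2 × RTT.
        let seqs: Vec<u32> = missing
            .iter()
            .copied()
            .filter(|seq| {
                self.last_nack_ms
                    .get(seq)
                    .map_or(true, |&sent| now_ms.saturating_sub(sent) >= 2 * rtt_ms)
            })
            .collect();
        if seqs.is_empty() {
            return Recovery::Wait;
        }
        for &seq in &seqs {
            self.last_nack_ms.insert(seq, now_ms);
        }
        Recovery::Nack(seqs)
    }
}
```

At 30 fps (`frame_interval` ≈ 33 ms) the NACK path is taken only while RTT < 66 ms, so the 50 ms RTT scenario in the acceptance criteria uses NACK and the 300 ms RTT scenario goes straight to PLI.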
Sender behavior:

- On `Nack`: retransmit the packet if it is still in the send buffer (last 500 ms).
- On `PictureLossIndication`: emit a fresh I-frame within 200 ms.
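On the sender side, the 500 ms retransmission window can be sketched as a FIFO of recently sent packets. `SendBuffer` and its methods are hypothetical, and packets are stored as `Vec<u8>` to keep the example self-contained.

```rust
use std::collections::VecDeque;

/// Hypothetical sender-side retransmission buffer: keeps packets sent within
/// the last `window_ms` so that NACKed sequence numbers can be re-sent.
struct SendBuffer {
    window_ms: u64,
    packets: VecDeque<(u32, u64, Vec<u8>)>, // (sequence, sent_at_ms, packet bytes)
}

impl SendBuffer {
    fn new(window_ms: u64) -> Self {
        Self { window_ms, packets: VecDeque::new() }
    }

    /// Record a packet as it goes out, then drop anything older than the window.
    fn on_sent(&mut self, seq: u32, now_ms: u64, packet: Vec<u8>) {
        self.packets.push_back((seq, now_ms, packet));
        self.evict(now_ms);
    }

    /// Packets to retransmit for a Nack; seqs that aged out are skipped.
    fn on_nack(&mut self, seqs: &[u32], now_ms: u64) -> Vec<Vec<u8>> {
        self.evict(now_ms);
        seqs.iter()
            .filter_map(|s| self.packets.iter().find(|(seq, _, _)| seq == s))
            .map(|(_, _, p)| p.clone())
            .collect()
    }

    /// Packets are appended in send order, so expired ones sit at the front.
    fn evict(&mut self, now_ms: u64) {
        while let Some(&(_, sent, _)) = self.packets.front() {
            if now_ms.saturating_sub(sent) > self.window_ms {
                self.packets.pop_front();
            } else {
                break;
            }
        }
    }
}
```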
### Dynamic FEC on I-frames
The encoder marks packets belonging to I-frames, and the FEC layer applies a higher ratio (default 0.5) to I-frame blocks than the nominal ratio (0.1) used for P-frames. Both ratios are configurable.
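To make the ratios concrete, assume a ratio means repair packets per source packet, rounded up so that any non-empty block gets at least one repair packet; the PRD does not define rounding, so this is an assumption. The hypothetical helper below uses integer percentages to avoid floating-point rounding surprises.

```rust
// Assumed interpretation: ratio = repair packets / source packets, rounded up.
const KEYFRAME_FEC_PERCENT: usize = 50; // default I-frame ratio 0.5
const NOMINAL_FEC_PERCENT: usize = 10; // default P-frame ratio 0.1

/// Hypothetical helper: repair packets for one FEC block of `source` packets.
fn repair_packets(source: usize, is_keyframe: bool) -> usize {
    let pct = if is_keyframe { KEYFRAME_FEC_PERCENT } else { NOMINAL_FEC_PERCENT };
    (source * pct + 99) / 100 // integer ceiling of source * pct / 100
}
```

Under that reading, a 20-packet I-frame block gets 10 repair packets instead of the 2 it would get at the nominal ratio, while a 3-packet P-frame still gets 1.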
### SFU keyframe cache
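`RoomManager` maintains one cache entry per (room, sender, stream_id):

```rust
use bytes::Bytes;

struct KeyframeCache {
    packets: Vec<Bytes>, // packets of the most recent complete I-frame
    timestamp_ms: u32,   // presentation time of that I-frame
    sequence_first: u32, // sequence number of its first packet
}
```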
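When a new participant joins, the cached I-frame is replayed to them before live forwarding starts. Without the cache, the new receiver cannot decode anything until the sender's next I-frame; replaying it eliminates the 2 s black screen on join.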
Cache lifetime: an entry is replaced whenever a newer complete I-frame arrives for the same (room, sender, stream_id).
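Replacement and join-time replay can be sketched as below. The key layout, `KeyframeCaches` and its methods are hypothetical; `Vec<u8>` stands in for `Bytes` so the example needs no external crate, and the 200 KB cap is taken from the Risks section.

```rust
use std::collections::HashMap;

/// Hypothetical key: (room, sender, stream_id). The integer types are assumptions.
type CacheKey = (u64, u64, u8);

const MAX_CACHE_BYTES: usize = 200 * 1024; // 200 KB cap from the Risks section

#[allow(dead_code)] // timestamp_ms / sequence_first mirror the PRD struct; unused here
struct KeyframeCache {
    packets: Vec<Vec<u8>>, // Vec<u8> stands in for bytes::Bytes
    timestamp_ms: u32,
    sequence_first: u32,
}

#[derive(Default)]
struct KeyframeCaches {
    by_stream: HashMap<CacheKey, KeyframeCache>,
}

impl KeyframeCaches {
    /// Store a newly completed I-frame, replacing the previous one.
    fn on_complete_keyframe(&mut self, key: CacheKey, packets: Vec<Vec<u8>>, timestamp_ms: u32, sequence_first: u32) {
        let size: usize = packets.iter().map(|p| p.len()).sum();
        if size > MAX_CACHE_BYTES {
            // Over the cap: drop the entry (the older I-frame is stale now)
            // and let joining receivers fall back to PLI.
            self.by_stream.remove(&key);
            return;
        }
        self.by_stream.insert(key, KeyframeCache { packets, timestamp_ms, sequence_first });
    }

    /// Packets to replay to a joining participant before live forwarding.
    fn replay_for_join(&self, key: &CacheKey) -> &[Vec<u8>] {
        self.by_stream.get(key).map(|c| c.packets.as_slice()).unwrap_or(&[])
    }
}
```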
### PLI suppression
If ≥ 2 receivers send a PLI for the same (sender, stream_id) within 200 ms, the SFU emits one `KeyframeRequest` upstream, not N. Suppression state is tracked per (sender, stream_id).
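One way to implement this is leading-edge coalescing: forward the first PLI for a (sender, stream_id) and drop further PLIs for that stream until 200 ms have passed. The `PliSuppressor` type is hypothetical, and choosing the leading edge (rather than waiting out the window) is an assumption.

```rust
use std::collections::HashMap;

const PLI_WINDOW_MS: u64 = 200;

/// Hypothetical SFU-side coalescer keyed by (sender, stream_id).
#[derive(Default)]
struct PliSuppressor {
    last_request_ms: HashMap<(u64, u8), u64>, // time the last KeyframeRequest went upstream
}

impl PliSuppressor {
    /// Returns true if this PLI should be forwarded upstream as a
    /// KeyframeRequest, false if one already went out within the window.
    fn on_pli(&mut self, sender: u64, stream_id: u8, now_ms: u64) -> bool {
        let key = (sender, stream_id);
        let suppress = matches!(
            self.last_request_ms.get(&key),
            Some(&t) if now_ms.saturating_sub(t) < PLI_WINDOW_MS
        );
        if suppress {
            return false;
        }
        self.last_request_ms.insert(key, now_ms);
        true
    }
}
```

A 200 ms window on its own caps upstream requests at 5 per second per (sender, stream_id), regardless of how many receivers send PLIs.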
## Implementation outline
- `wzp-video` crate scaffold (T4.1).
- Framer/depacketizer with property tests (T4.1).
- VideoToolbox encoder/decoder (macOS) (T4.2).
- MediaCodec encoder/decoder (Android, JNI) (T4.3).
- NACK signal + sender/receiver state machines (T4.4).
- I-frame FEC ratio hint plumbed from encoder to FEC layer (T4.5).
- SFU keyframe cache (T4.6).
- PLI suppression (T4.7).
- End-to-end test: macOS sender → relay → macOS receiver, 5-minute call over a network with < 1 % loss.
## Acceptance criteria
- Unidirectional H.264 720p30 call macOS↔macOS, CPU < 5 % on M1.
- Android↔macOS works with MediaCodec (surface-texture path).
- Black-screen-on-join < 200 ms when keyframe cache is warm.
- Under 5 % synthetic packet loss at 50 ms RTT: NACK recovery keeps video smooth, < 1 keyframe / 2 s.
- Under 5 % synthetic packet loss at 300 ms RTT: PLI fallback fires, keyframe rate ~ 1 / s.
- Upstream PLI traffic at SFU < 2 / s under simulated mass packet loss with 8 receivers.
## Risks
- MediaCodec surface-texture edge cases. Mitigation: per-device matrix; software fallback path mandatory.
- VideoToolbox H.264 baseline restrictions (some profiles are main-only in HW). Mitigation: profile detection at session start.
- NACK storm under heavy loss. Mitigation: rate cap (max 50 Nacks/s/receiver) and exponential backoff.
- Keyframe cache memory footprint (one I-frame per active stream per room). Mitigation: cap cache at 200 KB; if exceeded, drop and rely on PLI.
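The NACK-storm mitigation can be sketched as a per-receiver sliding-window limiter plus a doubling re-NACK delay. Both helpers are hypothetical; the 50/s cap and the 2 × RTT base come from this PRD, while the sliding-window form and the 32× backoff ceiling are assumptions.

```rust
use std::collections::VecDeque;

const MAX_NACKS_PER_SEC: usize = 50;

/// Hypothetical per-receiver limiter: allows at most 50 Nacks in any 1 s window.
struct NackRateLimiter {
    sent_ms: VecDeque<u64>, // send times of Nacks within the last second
}

impl NackRateLimiter {
    fn new() -> Self {
        Self { sent_ms: VecDeque::new() }
    }

    /// Returns true (and records the send) if another Nack is allowed now.
    fn try_send(&mut self, now_ms: u64) -> bool {
        while let Some(&t) = self.sent_ms.front() {
            if now_ms.saturating_sub(t) >= 1000 {
                self.sent_ms.pop_front();
            } else {
                break;
            }
        }
        if self.sent_ms.len() < MAX_NACKS_PER_SEC {
            self.sent_ms.push_back(now_ms);
            true
        } else {
            false
        }
    }
}

/// Delay before re-NACKing the same seq: 2 × RTT, doubling per attempt
/// (attempt 0 = first retry), capped at 32 × the base (cap is an assumption).
fn nack_backoff_ms(rtt_ms: u64, attempt: u32) -> u64 {
    2 * rtt_ms * (1u64 << attempt.min(5))
}
```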
## Effort
~3 weeks (Wave 4 tasks T4.1–T4.7).