wz-phone/docs/PRD/PRD-video-v1.md
2026-05-11 12:37:32 +04:00

PRD: Video v1 — H.264 Single-Layer

Status: proposed
Resolves: Road-to-video Phases V3 + V4 (encoder/decoder, framer, NACK, keyframe cache).
Depends on: PRD #1 (wire format v2), PRD #3 (TransportFeedback + BWE).

Problem

WZP has no video path. Add a working unidirectional video call (macOS↔macOS first, then Android↔macOS) using H.264 baseline, with loss recovery appropriate for lossy mobile links.

Goals

  • New wzp-video crate parallel to wzp-codec.
  • H.264 baseline encode/decode using platform hardware encoders.
  • NAL fragmentation and access-unit reassembly conformant to our 16 B MediaHeader v2.
  • NACK loop for P-frame loss (RTT-gated).
  • Dynamic FEC ratio boost on I-frame packets.
  • SFU keyframe cache for fast join-to-first-frame.
  • PLI suppression at SFU to bound upstream keyframe-request traffic.

Non-goals

  • Multi-codec negotiation (PRD #6).
  • Simulcast or per-receiver layer selection (PRD #8).
  • VideoQualityController logic beyond a fixed bitrate target (PRD #7).
  • Native camera capture pipelines (separate platform work).

Design

wzp-video crate

wzp-video/
  src/
    encoder.rs       # trait VideoEncoder
                     # VideoToolboxEncoder (macOS)
                     # MediaCodecEncoder (Android, JNI)
                     # OpenH264Encoder (software fallback)
    decoder.rs       # trait VideoDecoder; mirror per-platform
    framer.rs        # H.264 NAL fragmentation to MTU-sized chunks
    depacketizer.rs  # Reassemble NALs, emit access units
    keyframe.rs      # Keyframe request handling, sender + receiver
    config.rs        # SPS/PPS shipment over signal stream

Framing

One access unit (frame) maps to N packets, each carrying a payload of at most MTU - 16 (header) - 16 (AEAD tag) bytes.

  • sequence is global per (session, stream_id) and advances with every packet.
  • timestamp_ms is presentation time, equal across all packets of a single access unit.
  • KeyFrame bit set on every packet of an I-frame.
  • FrameEnd bit set on the last packet of the access unit.
  • fec_block_id is assigned per access unit (u16 in v2, large blocks).
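
The framing rules above can be sketched as follows. This is a minimal illustration, not the wzp-video framer: `PacketMeta` and `packetize` are hypothetical names, the real MediaHeader v2 layout lives in PRD #1, and NAL-boundary handling is ignored so the access unit is treated as an opaque byte slice.

```rust
/// Bytes reserved in every datagram, per the framing rule above.
const HEADER_LEN: usize = 16; // MediaHeader v2
const AEAD_TAG_LEN: usize = 16;

/// Hypothetical per-packet metadata; field names are illustrative only.
#[derive(Debug, Clone, PartialEq)]
pub struct PacketMeta {
    pub sequence: u32,
    pub timestamp_ms: u32,
    pub key_frame: bool,
    pub frame_end: bool,
    pub fec_block_id: u16,
}

/// Splits one access unit into (meta, payload) chunks that fit the MTU.
/// Returns None if the MTU leaves no payload room or the access unit is empty.
pub fn packetize<'a>(
    access_unit: &'a [u8],
    mtu: usize,
    first_sequence: u32,
    timestamp_ms: u32,
    is_keyframe: bool,
    fec_block_id: u16,
) -> Option<Vec<(PacketMeta, &'a [u8])>> {
    let max_payload = mtu
        .checked_sub(HEADER_LEN + AEAD_TAG_LEN)
        .filter(|&n| n > 0)?;
    let chunks: Vec<&[u8]> = access_unit.chunks(max_payload).collect();
    let last = chunks.len().checked_sub(1)?;
    Some(
        chunks
            .into_iter()
            .enumerate()
            .map(|(i, payload)| {
                let meta = PacketMeta {
                    // The sequence advances per packet.
                    sequence: first_sequence.wrapping_add(i as u32),
                    // Same presentation time on every packet of the frame.
                    timestamp_ms,
                    // KeyFrame is set on every packet of an I-frame.
                    key_frame: is_keyframe,
                    // FrameEnd is set only on the last packet.
                    frame_end: i == last,
                    fec_block_id,
                };
                (meta, payload)
            })
            .collect(),
    )
}
```

With an MTU of 1232 bytes the payload budget is 1200 bytes, so a 2500-byte access unit becomes three packets of 1200, 1200, and 100 bytes.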

Parameter sets (SPS/PPS) ride on the signal stream, not media datagrams. Sent at session start and on codec change. Reliable, ordered, one-time.

NACK loop

SignalMessage::Nack {
    version: u8,
    stream_id: u8,
    seqs: Vec<u32>,    // missing P-frame packets
}

Receiver behavior:

  • If an access unit is still incomplete after frame_interval ms:
    • If RTT < 2 × frame_interval: emit Nack.
    • Else: emit PictureLossIndication.
  • Backoff: at most one Nack per (stream, seq) per 2 × RTT.
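
A minimal sketch of the receiver decision above, written as a pure function so it can be unit-tested without timers. `LossAction` and `on_incomplete_access_unit` are hypothetical names, and the backoff is simplified to a single sequence number rather than a per-(stream, seq) table.

```rust
use std::time::Duration;

/// What the receiver does about an incomplete access unit.
#[derive(Debug, PartialEq, Eq)]
pub enum LossAction {
    Wait,
    Nack,
    PictureLossIndication,
}

/// `waited` is how long the access unit has been incomplete;
/// `since_last_nack` is None if this sequence was never NACKed.
pub fn on_incomplete_access_unit(
    waited: Duration,
    rtt: Duration,
    frame_interval: Duration,
    since_last_nack: Option<Duration>,
) -> LossAction {
    // Give late packets one frame interval to arrive.
    if waited < frame_interval {
        return LossAction::Wait;
    }
    // RTT gate: a retransmit would arrive too late, so ask for a keyframe.
    if rtt >= frame_interval * 2 {
        return LossAction::PictureLossIndication;
    }
    // Backoff: at most one Nack per 2 x RTT for the same sequence.
    match since_last_nack {
        Some(elapsed) if elapsed < rtt * 2 => LossAction::Wait,
        _ => LossAction::Nack,
    }
}
```

At 30 fps the frame interval is about 33 ms, so a 50 ms RTT stays on the Nack path while a 300 ms RTT falls through to PictureLossIndication, which lines up with the two loss scenarios in the acceptance criteria.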

Sender behavior:

  • On Nack: retransmit the packet if it is still in the send buffer (which retains the last 500 ms).
  • On PictureLossIndication: emit a fresh I-frame within 200 ms.
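
The sender's retransmit lookup could be backed by a time-bounded buffer along these lines. This is a sketch under stated assumptions: `SendBuffer` is a hypothetical type, packets are pushed in send order, and payloads are plain `Vec<u8>` to keep the example dependency-free.

```rust
use std::collections::VecDeque;
use std::time::{Duration, Instant};

/// Hypothetical send buffer that retains packets for a fixed duration.
pub struct SendBuffer {
    retention: Duration,
    // (sequence, send time, payload), oldest first.
    packets: VecDeque<(u32, Instant, Vec<u8>)>,
}

impl SendBuffer {
    pub fn new(retention: Duration) -> Self {
        Self { retention, packets: VecDeque::new() }
    }

    /// Records a packet at send time.
    pub fn push(&mut self, sequence: u32, now: Instant, payload: Vec<u8>) {
        self.evict(now);
        self.packets.push_back((sequence, now, payload));
    }

    /// Returns the payload for a NACKed sequence if it is still retained.
    pub fn lookup(&mut self, sequence: u32, now: Instant) -> Option<&[u8]> {
        self.evict(now);
        self.packets
            .iter()
            .find(|entry| entry.0 == sequence)
            .map(|entry| entry.2.as_slice())
    }

    /// Drops packets older than the retention window.
    fn evict(&mut self, now: Instant) {
        let retention = self.retention;
        while self
            .packets
            .front()
            .map_or(false, |entry| now.duration_since(entry.1) > retention)
        {
            self.packets.pop_front();
        }
    }
}
```

A `None` from `lookup` means the Nack arrived too late to serve; the receiver's PictureLossIndication path then covers recovery.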

Dynamic FEC on I-frames

The encoder marks packets belonging to I-frames. The FEC layer applies a higher ratio (default 0.5) to I-frame blocks than the nominal ratio (0.1) used for P-frames. Both ratios are configurable.
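
To make the two ratios concrete, the sketch below assumes the ratio means repair packets per source packet, rounded up; the PRD does not define this, so treat it as one possible reading. `FecRatios` and `repair_packet_count` are hypothetical names.

```rust
/// Hypothetical FEC configuration; defaults follow the text above.
pub struct FecRatios {
    pub i_frame: f32,
    pub p_frame: f32,
}

impl Default for FecRatios {
    fn default() -> Self {
        Self { i_frame: 0.5, p_frame: 0.1 }
    }
}

/// Number of repair packets for one FEC block (one access unit).
/// Assumption: ratio = repair packets / source packets, rounded up.
pub fn repair_packet_count(
    source_packets: usize,
    is_keyframe: bool,
    ratios: &FecRatios,
) -> usize {
    let ratio = if is_keyframe { ratios.i_frame } else { ratios.p_frame };
    (source_packets as f32 * ratio).ceil() as usize
}
```

Under this reading, a 20-packet I-frame carries 10 repair packets, while a 3-packet P-frame carries 1.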

SFU keyframe cache

RoomManager maintains per (room, sender, stream_id):

struct KeyframeCache {
    packets: Vec<Bytes>,        // most recent complete I-frame
    timestamp_ms: u32,
    sequence_first: u32,
}

When a new participant joins, the cache is replayed before live forwarding starts. This eliminates the 2 s black screen on join.

Cache TTL: replaced whenever a new complete I-frame arrives.
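
One way the SFU could fill and replace the cache is sketched below. It is illustrative only: `KeyframeCacheSlot` and `observe` are hypothetical, packets are `Vec<u8>` instead of `Bytes` to stay dependency-free, the SFU is assumed to see the stream from its first packet and in order, and the 200 KB cap is borrowed from the Risks section.

```rust
/// Mirrors the struct above, with Vec<u8> standing in for Bytes.
#[derive(Debug, Clone)]
pub struct KeyframeCache {
    pub packets: Vec<Vec<u8>>,
    pub timestamp_ms: u32,
    pub sequence_first: u32,
}

const MAX_CACHE_BYTES: usize = 200 * 1024; // cap from the Risks section

/// Hypothetical per-(room, sender, stream_id) slot.
#[derive(Default)]
pub struct KeyframeCacheSlot {
    cached: Option<KeyframeCache>,
    pending: Option<KeyframeCache>,
    prev: Option<(u32, bool)>, // (sequence, frame_end) of the previous packet
}

impl KeyframeCacheSlot {
    pub fn cached(&self) -> Option<&KeyframeCache> {
        self.cached.as_ref()
    }

    /// Feed every forwarded packet through this, in arrival order.
    pub fn observe(&mut self, sequence: u32, timestamp_ms: u32,
                   key_frame: bool, frame_end: bool, packet: &[u8]) {
        let in_order = matches!(self.prev, Some((s, _)) if s.wrapping_add(1) == sequence);
        // A frame starts right after a FrameEnd packet, or at stream start.
        let starts_frame = match self.prev {
            None => true,
            Some((s, end)) => end && s.wrapping_add(1) == sequence,
        };
        self.prev = Some((sequence, frame_end));

        if key_frame && starts_frame {
            self.pending = Some(KeyframeCache {
                packets: Vec::new(), timestamp_ms, sequence_first: sequence,
            });
        } else if !(key_frame && in_order) {
            // A P-frame packet or a sequence gap: the pending I-frame is unusable.
            self.pending = None;
        }

        let (mut too_big, mut complete) = (false, false);
        if let Some(frame) = self.pending.as_mut() {
            frame.packets.push(packet.to_vec());
            let size: usize = frame.packets.iter().map(Vec::len).sum();
            too_big = size > MAX_CACHE_BYTES;
            complete = frame_end;
        }
        if too_big {
            self.pending = None; // over the cap: drop and rely on PLI
        } else if complete {
            // Replace the cache only with a complete I-frame.
            self.cached = self.pending.take();
        }
    }
}
```

The cached frame stays valid while later P-frames flow past, and an I-frame with a missing packet never displaces it.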

PLI suppression

If two or more receivers send a PLI within 200 ms for the same (sender, stream_id), the SFU emits one KeyframeRequest upstream instead of one per receiver. Suppression state is tracked per (sender, stream_id).
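
A sketch of the suppression window, assuming leading-edge behavior: the first PLI is forwarded immediately and later ones inside the window are absorbed. `PliSuppressor` and the `u32` sender id are hypothetical; the PRD does not say whether the request goes out at the start or the end of the window.

```rust
use std::collections::HashMap;
use std::time::{Duration, Instant};

/// Hypothetical SFU-side PLI coalescer.
pub struct PliSuppressor {
    window: Duration,
    // Time of the last upstream KeyframeRequest per (sender, stream_id).
    last_request: HashMap<(u32, u8), Instant>,
}

impl PliSuppressor {
    pub fn new(window: Duration) -> Self {
        Self { window, last_request: HashMap::new() }
    }

    /// Call on every PLI from any receiver. Returns true if the SFU
    /// should send a KeyframeRequest upstream for this PLI.
    pub fn on_pli(&mut self, sender: u32, stream_id: u8, now: Instant) -> bool {
        let key = (sender, stream_id);
        let window = self.window;
        let suppressed = self
            .last_request
            .get(&key)
            .map_or(false, |&at| now.duration_since(at) < window);
        if !suppressed {
            self.last_request.insert(key, now);
        }
        !suppressed
    }
}
```

With a 200 ms window, eight receivers reporting loss at once produce a single upstream request per (sender, stream_id).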

Implementation outline

  1. wzp-video crate scaffold (T4.1).
  2. Framer/depacketizer with property tests (T4.1).
  3. VideoToolbox encoder/decoder (macOS) (T4.2).
  4. MediaCodec encoder/decoder (Android, JNI) (T4.3).
  5. NACK signal + sender/receiver state machines (T4.4).
  6. I-frame FEC ratio hint plumbed from encoder to FEC layer (T4.5).
  7. SFU keyframe cache (T4.6).
  8. PLI suppression (T4.7).
  9. End-to-end test: macOS sender → relay → macOS receiver, 5 min call, over a network with < 1 % packet loss.

Acceptance criteria

  • Unidirectional H.264 720p30 call macOS↔macOS, CPU < 5 % on M1.
  • Android↔macOS works with MediaCodec (surface-texture path).
  • Black-screen-on-join < 200 ms when keyframe cache is warm.
  • Under 5 % synthetic packet loss at 50 ms RTT: NACK recovery keeps video smooth, < 1 keyframe / 2 s.
  • Under 5 % synthetic packet loss at 300 ms RTT: PLI fallback fires, keyframe rate ~ 1 / s.
  • Upstream PLI traffic at SFU < 2 / s under simulated mass packet loss with 8 receivers.

Risks

  • MediaCodec surface-texture edge cases. Per-device matrix; software fallback path mandatory.
  • VideoToolbox H.264 baseline restrictions (some profiles are main-only in HW). Mitigation: profile detection at session start.
  • NACK storm under heavy loss. Mitigation: rate cap (max 50 Nacks/s/receiver) and exponential backoff.
  • Keyframe cache memory footprint (one I-frame per active stream per room). Mitigation: cap cache at 200 KB; if exceeded, drop and rely on PLI.
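
The NACK-storm mitigation listed above (rate cap plus exponential backoff) might look like the following on the receiver. Both pieces are assumptions: the cap is a simple fixed one-second window rather than a token bucket, and the backoff doubles the 2 × RTT base from the NACK loop per retry, with a hypothetical ceiling of 2^5.

```rust
use std::time::{Duration, Instant};

/// Hypothetical per-receiver Nack rate cap using a fixed 1 s window.
pub struct NackRateCap {
    max_per_second: u32,
    window_start: Instant,
    sent_in_window: u32,
}

impl NackRateCap {
    pub fn new(max_per_second: u32, now: Instant) -> Self {
        Self { max_per_second, window_start: now, sent_in_window: 0 }
    }

    /// Returns true if one more Nack may be sent now.
    pub fn try_acquire(&mut self, now: Instant) -> bool {
        if now.duration_since(self.window_start) >= Duration::from_secs(1) {
            // Start a fresh window.
            self.window_start = now;
            self.sent_in_window = 0;
        }
        if self.sent_in_window < self.max_per_second {
            self.sent_in_window += 1;
            true
        } else {
            false
        }
    }
}

/// Delay before retry number `attempt` (0-based) for the same sequence:
/// 2 x RTT, doubled per attempt, with the exponent capped at 5.
pub fn nack_backoff(rtt: Duration, attempt: u32) -> Duration {
    (rtt * 2).saturating_mul(1u32 << attempt.min(5))
}
```

At a 50 ms RTT the retry delays run 100 ms, 200 ms, 400 ms, and so on, which in practice hands recovery to the PLI path after a couple of attempts.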

Effort

~3 weeks (Wave 4 tasks T4.1 through T4.7).