docs: protocol audit 2026-05-25, update architecture + Obsidian vault
Audit:
- docs/AUDIT-2026-05-25.md: full protocol audit covering 8 findings (4 critical, 2 high, 5 medium, 4 low) with code references and fix effort estimates
- vault/Audit/Tasks.md: Obsidian Tasks plugin file tracking all audit items with priorities, due dates, and per-step checklists

Architecture docs updated for Wire format v2 and Wave 5/6 features:
- ARCHITECTURE.md: adds wzp-video to dependency graph and project structure; wire format updated to v2 (16B header, 5B MiniHeader); relay concurrency section corrected (DashMap+RwLock is current, not a future optimization); test count 571→702; Android note
- PROGRESS.md: Wave 5 and Wave 6 sections appended; test count 372→702; current status and open blockers as of 2026-05-25
- ROAD-TO-VIDEO.md: implementation status table inserted (✅/🟡/🔴/🔲 per phase); 6-step critical path to first video call
- WZP-SPEC.md: MediaHeader updated to v2 (16B byte-aligned); MiniHeader updated to 5B with seq_delta; codec IDs 9-12 added (H.264/H.265/AV1); version negotiation section added

Obsidian vault (vault/):
- 114 files across Architecture/, PRDs/, Reports/, Android/, Reference/, Audit/ with YAML frontmatter
- 00 - Home.md index note with wiki links
- .obsidian/app.json config

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
137
vault/PRDs/PRD-video-v1.md
Normal file
---
tags: [prd, wzp]
type: prd
---

# PRD: Video v1 — H.264 Single-Layer

> **Status:** proposed
> **Resolves:** Road-to-video Phases V3 + V4 (encoder/decoder, framer, NACK, keyframe cache).
> **Depends on:** PRD #1 (wire format v2), PRD #3 (TransportFeedback + BWE).

## Problem

WZP has no video path. This PRD adds a working unidirectional video call (macOS↔macOS first, then Android↔macOS) using H.264 baseline, with loss recovery suited to lossy mobile links.

## Goals

- New `wzp-video` crate parallel to `wzp-codec`.
- H.264 baseline encode/decode using platform hardware codecs (VideoToolbox on macOS, MediaCodec on Android), with an OpenH264 software fallback.
- NAL fragmentation and access-unit reassembly conforming to our 16 B `MediaHeader` v2.
- NACK loop for P-frame loss (RTT-gated).
- Dynamic FEC ratio boost on I-frame packets.
- SFU keyframe cache for fast join-to-first-frame.
- PLI suppression at the SFU to bound upstream keyframe-request traffic.

## Non-goals

- Multi-codec negotiation (PRD #6).
- Simulcast or per-receiver layer selection (PRD #8).
- VideoQualityController logic beyond a fixed bitrate target (PRD #7).
- Native camera capture pipelines (separate platform work).

## Design

### `wzp-video` crate

```
wzp-video/
  src/
    encoder.rs       # trait VideoEncoder
                     #   VideoToolboxEncoder (macOS)
                     #   MediaCodecEncoder (Android, JNI)
                     #   OpenH264Encoder (software fallback)
    decoder.rs       # trait VideoDecoder; per-platform impls mirror the encoders
    framer.rs        # H.264 NAL fragmentation into MTU-sized chunks
    depacketizer.rs  # reassemble NALs, emit access units
    keyframe.rs      # keyframe request handling, sender + receiver
    config.rs        # SPS/PPS shipment over the signal stream
```
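
The layout names a `VideoEncoder` trait but not its methods. One possible shape is sketched below; `RawFrame`, `EncodedFrame`, `encode`, `request_keyframe`, `set_bitrate_bps`, and `FakeEncoder` are all hypothetical names, not the crate's API. It highlights the two hooks the rest of this design relies on: the encoder reports whether each access unit is an I-frame (which drives the `KeyFrame` bit and the I-frame FEC boost), and a keyframe request forces the next output to be an I-frame.

```rust
/// Hypothetical input frame; the pixel format of `data` is an assumption.
#[derive(Debug, Clone)]
pub struct RawFrame {
    pub width: u32,
    pub height: u32,
    pub timestamp_ms: u32,
    pub data: Vec<u8>,
}

/// Hypothetical encoder output: one access unit.
#[derive(Debug, Clone)]
pub struct EncodedFrame {
    pub timestamp_ms: u32,
    pub is_keyframe: bool,  // feeds the KeyFrame bit and the I-frame FEC boost
    pub nals: Vec<Vec<u8>>, // NAL units of this access unit
}

/// Hypothetical shape of the `VideoEncoder` trait named in encoder.rs.
pub trait VideoEncoder {
    fn encode(&mut self, frame: &RawFrame) -> Result<EncodedFrame, String>;
    /// Force the next encoded frame to be an I-frame (PLI / KeyframeRequest path).
    fn request_keyframe(&mut self);
    /// v1 uses a fixed bitrate target.
    fn set_bitrate_bps(&mut self, bps: u32);
}

/// Toy encoder for exercising the trait: I-frame on the first frame,
/// every `gop` frames after that, or right after `request_keyframe`.
pub struct FakeEncoder {
    gop: u32,
    frames_since_key: u32,
    force_key: bool,
    bitrate_bps: u32,
}

impl FakeEncoder {
    pub fn new(gop: u32) -> Self {
        FakeEncoder { gop, frames_since_key: 0, force_key: true, bitrate_bps: 0 }
    }
}

impl VideoEncoder for FakeEncoder {
    fn encode(&mut self, frame: &RawFrame) -> Result<EncodedFrame, String> {
        if frame.data.is_empty() {
            return Err("empty frame".to_string());
        }
        let is_keyframe = self.force_key || self.frames_since_key >= self.gop;
        if is_keyframe {
            self.frames_since_key = 0;
            self.force_key = false;
        }
        self.frames_since_key += 1;
        // 0x65 = NAL header of an IDR slice (type 5); 0x41 = non-IDR slice (type 1).
        let nal_header = if is_keyframe { 0x65 } else { 0x41 };
        Ok(EncodedFrame { timestamp_ms: frame.timestamp_ms, is_keyframe, nals: vec![vec![nal_header]] })
    }

    fn request_keyframe(&mut self) {
        self.force_key = true;
    }

    fn set_bitrate_bps(&mut self, bps: u32) {
        self.bitrate_bps = bps;
    }
}
```

The platform encoders (VideoToolbox, MediaCodec, OpenH264) would each implement the same trait, so the framer and keyframe logic stay platform-independent.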

### Framing

One access unit (frame) → N packets, each carrying at most `MTU - 16 (header) - 16 (AEAD tag)` bytes of payload.

- `sequence` is global per `(session, stream_id)` and advances by one per packet.
- `timestamp_ms` is the presentation time, identical across all packets of a single access unit.
- The `KeyFrame` bit is set on every packet of an I-frame.
- The `FrameEnd` bit is set on the last packet of the access unit.
- `fec_block_id` is assigned per access unit (u16 in v2); one block spans the whole access unit, so blocks can be large.
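
The framing rules above can be sketched as follows. This is a simplification under stated assumptions: the access unit is treated as one opaque byte string (a real framer would split at NAL boundaries and mark fragments, e.g. FU-A style), `PacketMeta`, `Framer`, and `reassemble` are hypothetical names, and `PacketMeta` holds only the values the framer fills in, not the actual 16 B `MediaHeader` v2 layout. The receiver is assumed to know the first sequence number of each access unit (the previous `FrameEnd` sequence plus one).

```rust
/// Values the framer fills in per packet. Field names follow the PRD, but this
/// is NOT the 16 B MediaHeader v2 wire layout.
#[derive(Debug, Clone, PartialEq)]
pub struct PacketMeta {
    pub sequence: u32,
    pub timestamp_ms: u32,
    pub key_frame: bool,
    pub frame_end: bool,
    pub fec_block_id: u16,
}

const MEDIA_HEADER_LEN: usize = 16;
const AEAD_TAG_LEN: usize = 16;

/// Largest payload that fits in one datagram: MTU - header - AEAD tag.
pub fn max_payload(mtu: usize) -> usize {
    mtu.saturating_sub(MEDIA_HEADER_LEN + AEAD_TAG_LEN)
}

/// Hypothetical per-(session, stream_id) framer state.
pub struct Framer {
    next_sequence: u32,
    next_fec_block: u16,
}

impl Framer {
    pub fn new() -> Self {
        Framer { next_sequence: 0, next_fec_block: 0 }
    }

    /// Split one access unit into MTU-sized packets (an empty AU yields none).
    pub fn packetize(&mut self, au: &[u8], timestamp_ms: u32, key_frame: bool, mtu: usize) -> Vec<(PacketMeta, Vec<u8>)> {
        let chunk_len = max_payload(mtu);
        assert!(chunk_len > 0, "MTU too small for header + AEAD tag");
        let fec_block_id = self.next_fec_block; // one FEC block per access unit
        self.next_fec_block = self.next_fec_block.wrapping_add(1);
        let chunks: Vec<&[u8]> = au.chunks(chunk_len).collect();
        let last = chunks.len().saturating_sub(1);
        let mut out = Vec::with_capacity(chunks.len());
        for (i, chunk) in chunks.into_iter().enumerate() {
            let meta = PacketMeta {
                sequence: self.next_sequence,
                timestamp_ms,         // same for every packet of the AU
                key_frame,            // KeyFrame bit on every I-frame packet
                frame_end: i == last, // FrameEnd only on the last packet
                fec_block_id,
            };
            self.next_sequence = self.next_sequence.wrapping_add(1);
            out.push((meta, chunk.to_vec()));
        }
        out
    }
}

/// Reassemble one access unit whose first packet has sequence `first_seq`.
/// Returns None while any packet up to FrameEnd is missing (a NACK candidate).
/// Assumes no duplicate packets.
pub fn reassemble(first_seq: u32, packets: &[(PacketMeta, Vec<u8>)]) -> Option<Vec<u8>> {
    let mut sorted: Vec<&(PacketMeta, Vec<u8>)> = packets.iter().collect();
    sorted.sort_by_key(|(meta, _)| meta.sequence.wrapping_sub(first_seq));
    let mut au = Vec::new();
    for (i, (meta, payload)) in sorted.into_iter().enumerate() {
        if meta.sequence.wrapping_sub(first_seq) != i as u32 {
            return None; // gap before FrameEnd
        }
        au.extend_from_slice(payload);
        if meta.frame_end {
            return Some(au);
        }
    }
    None // FrameEnd not received yet
}
```

At a 1200-byte MTU each packet carries at most 1200 - 16 - 16 = 1168 bytes of payload, so a 3000-byte access unit becomes three packets (1168 + 1168 + 664).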

Parameter sets (SPS/PPS) travel on the **signal stream**, not in media datagrams. They are sent at session start and on each codec change: reliably, in order, and once per change.

### NACK loop

```rust
enum SignalMessage {
    Nack {
        version: u8,
        stream_id: u8,
        seqs: Vec<u32>, // sequence numbers of missing P-frame packets
    },
    // ... other signal variants
}
```

Receiver behavior:
- If an access unit is still incomplete after `frame_interval` ms:
  - If `RTT < 2 × frame_interval`: emit `Nack`.
  - Else: emit `PictureLossIndication`.
- Backoff: at most one `Nack` per `(stream, seq)` every 2 × RTT.
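
The receiver rules can be sketched as a single decision function, assuming the caller invokes it once an access unit has stayed incomplete for `frame_interval` and already knows which sequence numbers are missing. `NackTracker`, `Recovery`, and `on_incomplete` are hypothetical names. Backoff is tracked per `(stream_id, seq)` as specified; the 50 Nacks/s/receiver cap from the Risks section is not shown, and a real tracker would prune old entries.

```rust
use std::collections::HashMap;
use std::time::{Duration, Instant};

/// What the receiver sends for one incomplete access unit.
#[derive(Debug, PartialEq)]
pub enum Recovery {
    Nack(Vec<u32>),        // sequence numbers to request
    PictureLossIndication, // RTT too high for retransmission to arrive in time
    Wait,                  // every missing seq is still inside its backoff
}

/// Hypothetical receiver-side NACK state.
#[derive(Default)]
pub struct NackTracker {
    // (stream_id, seq) -> when it was last NACKed. Never pruned in this sketch.
    last_nack: HashMap<(u8, u32), Instant>,
}

impl NackTracker {
    /// Call once an access unit has been incomplete for `frame_interval`.
    pub fn on_incomplete(
        &mut self,
        stream_id: u8,
        missing: &[u32],
        rtt: Duration,
        frame_interval: Duration,
        now: Instant,
    ) -> Recovery {
        if rtt >= frame_interval * 2 {
            return Recovery::PictureLossIndication;
        }
        let backoff = rtt * 2; // at most one Nack per (stream, seq) per 2 x RTT
        let mut seqs = Vec::new();
        for &seq in missing {
            let due = match self.last_nack.get(&(stream_id, seq)) {
                Some(&sent) => now.duration_since(sent) >= backoff,
                None => true,
            };
            if due {
                self.last_nack.insert((stream_id, seq), now);
                seqs.push(seq);
            }
        }
        if seqs.is_empty() { Recovery::Wait } else { Recovery::Nack(seqs) }
    }
}
```

At 30 fps (`frame_interval` ≈ 33 ms) the NACK path applies only while RTT is below about 66 ms, so at 50 ms RTT the receiver NACKs and at 300 ms it falls back to PLI, matching the two loss scenarios in the acceptance criteria.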

Sender behavior:
- On `Nack`: retransmit the packet if it is still in the send buffer (which holds the last 500 ms).
- On `PictureLossIndication`: emit a fresh I-frame within 200 ms.
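
The sender's 500 ms retransmission window can be kept as a time-ordered queue that evicts from the front. This is a sketch under stated assumptions: `SendBuffer` and its methods are hypothetical, the stored bytes are whatever the sender put on the wire (whether a retransmission is re-sealed is left open), and lookup is a linear scan, which stays cheap for the packets in a 500 ms window but could be replaced by indexing on `sequence`.

```rust
use std::collections::VecDeque;
use std::time::{Duration, Instant};

/// Hypothetical sender-side retransmission buffer.
pub struct SendBuffer {
    window: Duration,
    // (sequence, sent_at, packet bytes), oldest first
    packets: VecDeque<(u32, Instant, Vec<u8>)>,
}

impl SendBuffer {
    pub fn new(window: Duration) -> Self {
        SendBuffer { window, packets: VecDeque::new() }
    }

    /// Record a packet as it is sent.
    pub fn push(&mut self, seq: u32, now: Instant, packet: Vec<u8>) {
        self.packets.push_back((seq, now, packet));
        self.evict(now);
    }

    /// Drop packets older than the window from the front of the queue.
    fn evict(&mut self, now: Instant) {
        while let Some((_, sent_at, _)) = self.packets.front() {
            if now.duration_since(*sent_at) > self.window {
                self.packets.pop_front();
            } else {
                break;
            }
        }
    }

    /// Packets to retransmit for a Nack; seqs no longer buffered are skipped.
    pub fn on_nack(&mut self, seqs: &[u32], now: Instant) -> Vec<Vec<u8>> {
        self.evict(now);
        seqs.iter()
            .filter_map(|s| self.packets.iter().find(|(q, _, _)| q == s).map(|(_, _, p)| p.clone()))
            .collect()
    }
}
```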

### Dynamic FEC on I-frames

The encoder marks packets belonging to I-frames. The FEC layer applies a higher repair ratio to I-frame blocks (default 0.5) than to P-frame blocks (nominal 0.1). Both ratios are configurable.
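
A sketch of how the ratio might translate into repair-packet counts, assuming a block FEC where each block gets ceil(ratio × source packets) repair packets; the PRD does not pin down that mapping. Ratios are held as integer percentages so the rounding is exact, and `FecConfig` and `repair_count` are hypothetical names.

```rust
/// Hypothetical FEC ratio config; 50 % and 10 % are the PRD's defaults.
#[derive(Debug, Clone, Copy)]
pub struct FecConfig {
    pub keyframe_pct: usize, // applied to I-frame blocks
    pub nominal_pct: usize,  // applied to P-frame blocks
}

impl Default for FecConfig {
    fn default() -> Self {
        FecConfig { keyframe_pct: 50, nominal_pct: 10 }
    }
}

/// Repair packets for one FEC block: ceil(source_packets * pct / 100).
pub fn repair_count(cfg: &FecConfig, source_packets: usize, is_keyframe: bool) -> usize {
    let pct = if is_keyframe { cfg.keyframe_pct } else { cfg.nominal_pct };
    (source_packets * pct + 99) / 100 // integer ceiling division
}
```

With the defaults, a 40-packet I-frame block gets 20 repair packets, while a 40-packet P-frame block gets 4.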

### SFU keyframe cache

`RoomManager` maintains one cache entry per `(room, sender, stream_id)`:

```rust
use bytes::Bytes;

struct KeyframeCache {
    packets: Vec<Bytes>, // all packets of the most recent complete I-frame
    timestamp_ms: u32,   // presentation time of that I-frame
    sequence_first: u32, // sequence number of its first packet
}
```

When a new participant joins, the SFU replays the cached packets to them before live forwarding starts. This eliminates the 2 s black screen on join.

Cache lifetime: there is no time-based TTL; the entry is replaced whenever a new complete I-frame arrives.
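
The cache update rule can be sketched as below, assuming packets reach the SFU in order. `CachedKeyframe` mirrors the three fields of the PRD's `KeyframeCache`; `KeyframeSlot` (hypothetical) adds the I-frame currently being received, and `Vec<u8>` stands in for `bytes::Bytes` so the sketch has no dependencies. An I-frame is committed only when its `FrameEnd` packet arrives with no sequence gap, and the 200 KB cap from the Risks section is interpreted here as a per-entry limit.

```rust
const CACHE_CAP_BYTES: usize = 200 * 1024; // per-entry cap (interpretation of Risks)

/// Mirrors the PRD's KeyframeCache fields; Vec<u8> stands in for bytes::Bytes.
#[derive(Debug, Clone)]
pub struct CachedKeyframe {
    pub packets: Vec<Vec<u8>>,
    pub timestamp_ms: u32,
    pub sequence_first: u32,
}

/// Hypothetical per-(room, sender, stream_id) slot.
#[derive(Default)]
pub struct KeyframeSlot {
    current: Option<CachedKeyframe>,  // last complete I-frame, replayed on join
    building: Option<CachedKeyframe>, // I-frame currently arriving
    building_bytes: usize,
    next_seq: u32,
    broken: bool, // gap or size overflow seen in `building`
}

impl KeyframeSlot {
    /// Feed every forwarded packet; only KeyFrame packets are retained.
    pub fn on_packet(&mut self, seq: u32, timestamp_ms: u32, key_frame: bool, frame_end: bool, payload: &[u8]) {
        if !key_frame {
            return;
        }
        let starts_new = self.building.as_ref().map_or(true, |b| b.timestamp_ms != timestamp_ms);
        if starts_new {
            self.building = Some(CachedKeyframe { packets: Vec::new(), timestamp_ms, sequence_first: seq });
            self.building_bytes = 0;
            self.next_seq = seq;
            self.broken = false;
        }
        if seq != self.next_seq {
            self.broken = true; // lost or reordered packet: this I-frame is not cached
        }
        self.next_seq = seq.wrapping_add(1);
        self.building_bytes += payload.len();
        if self.building_bytes > CACHE_CAP_BYTES {
            self.broken = true; // too large: drop it and rely on PLI instead
        }
        if let Some(b) = self.building.as_mut() {
            if !self.broken {
                b.packets.push(payload.to_vec());
            }
        }
        if frame_end {
            let done = self.building.take();
            if !self.broken {
                self.current = done; // replace the previous entry
            }
        }
    }

    /// Packets to send a newly joined receiver before live forwarding.
    pub fn replay(&self) -> Option<&[Vec<u8>]> {
        self.current.as_ref().map(|k| k.packets.as_slice())
    }
}
```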

### PLI suppression

If two or more receivers send a PLI for the same `(sender, stream_id)` within 200 ms, the SFU emits one `KeyframeRequest` upstream rather than one per receiver. State is tracked per `(sender, stream_id)`.
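
One way to implement this is leading-edge suppression: forward the first PLI immediately and drop later ones for the same `(sender, stream_id)` until 200 ms have passed. That choice is an assumption (the SFU could instead wait out the window and send one batched request), and `PliSuppressor`, `on_pli`, and the `u32` sender id are hypothetical.

```rust
use std::collections::HashMap;
use std::time::{Duration, Instant};

/// PLIs for the same (sender, stream_id) within this window coalesce.
const PLI_WINDOW: Duration = Duration::from_millis(200);

/// Hypothetical SFU-side suppressor; `sender_id` is an assumed u32 participant id.
#[derive(Default)]
pub struct PliSuppressor {
    last_forwarded: HashMap<(u32, u8), Instant>,
}

impl PliSuppressor {
    /// Returns true if this receiver's PLI should become an upstream KeyframeRequest.
    pub fn on_pli(&mut self, sender_id: u32, stream_id: u8, now: Instant) -> bool {
        let key = (sender_id, stream_id);
        let suppressed = matches!(
            self.last_forwarded.get(&key),
            Some(&sent) if now.duration_since(sent) < PLI_WINDOW
        );
        if suppressed {
            return false; // a request for this stream already went upstream recently
        }
        self.last_forwarded.insert(key, now);
        true
    }
}
```

Eight receivers losing the same frame then produce one upstream request instead of eight. Note that a fixed 200 ms window on its own still allows up to 5 requests/s per `(sender, stream_id)` under sustained loss.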

## Implementation outline

1. `wzp-video` crate scaffold (T4.1).
2. Framer/depacketizer with property tests (T4.1).
3. VideoToolbox encoder/decoder (macOS) (T4.2).
4. MediaCodec encoder/decoder (Android, JNI) (T4.3).
5. NACK signal + sender/receiver state machines (T4.4).
6. I-frame FEC ratio hint plumbed from encoder to FEC layer (T4.5).
7. SFU keyframe cache (T4.6).
8. PLI suppression (T4.7).
9. End-to-end test: macOS sender → relay → macOS receiver, 5 min call on a network with < 1 % loss.

## Acceptance criteria

- Unidirectional H.264 720p30 call macOS↔macOS with CPU < 5 % on M1.
- Android↔macOS works with MediaCodec (surface-texture path).
- Black screen on join lasts < 200 ms when the keyframe cache is warm.
- Under 5 % synthetic packet loss at 50 ms RTT: NACK recovery keeps video smooth, with fewer than 1 keyframe per 2 s.
- Under 5 % synthetic packet loss at 300 ms RTT: PLI fallback fires, keyframe rate ~1/s.
- Upstream PLI traffic from the SFU stays < 2/s under simulated mass packet loss with 8 receivers.

## Risks

- **MediaCodec surface-texture edge cases.** Mitigation: per-device test matrix; the software fallback path is mandatory.
- **VideoToolbox H.264 baseline restrictions** (some profiles are main-only in hardware). Mitigation: profile detection at session start.
- **NACK storm under heavy loss.** Mitigation: rate cap (max 50 Nacks/s per receiver) and exponential backoff.
- **Keyframe cache memory footprint** (one I-frame per active stream per room). Mitigation: cap the cache at 200 KB; if exceeded, drop the entry and rely on PLI.

## Effort

~3 weeks (Wave 4 tasks T4.1–T4.7).