| tags | type | ||
|---|---|---|---|
|
prd |
PRD: Multi-Codec Video Negotiation (H.264 + H.265 + AV1)
Status: proposed
Resolves: Road-to-video Phase V3 codec rollout; reserves `CodecID` slots 9–13.
Depends on: PRD #5 (video v1 working with H.264).
Problem
H.264 baseline ships first because it has universal hardware encode coverage. H.265 offers roughly 30 % better compression efficiency (lower bitrate at equal quality) and is now broadly supported in HW (Apple A10+, Snapdragon since ~2017, NVENC since GTX 9xx). AV1 is the long-term target, but hardware encode is limited (Apple M3/A17+, Snapdragon 8 Gen 3+, RTX 40+).
We need codec negotiation so that each session uses the best mutually supported codec without manual configuration, and so that AV1 can be rolled out gated on real telemetry.
Goals
- `CodecID` assignments for H.264 baseline (9), H.264 main (10), H.265 main (11), AV1 (12), VP9 reserved (13).
- Capability declaration in `CallOffer.supported_codecs`.
- Picker logic: highest mutually-supported codec from a deterministic preference cascade.
- Hardware-encode detection at session start; refuse codecs requiring SW encode on battery-powered devices.
- Existing framer/depacketizer reused — only the codec wrapper changes.
Non-goals
- New codecs beyond this list.
- Per-receiver codec selection (one codec per stream for v1; could be revisited with simulcast).
Design
Codec capability declaration
pub struct CodecCapability {
    pub codec_id: u8,
    pub max_resolution: (u16, u16),
    pub max_fps: u8,
    pub hardware: bool, // true if HW encode available
}

pub struct CallOffer {
    ...
    pub supported_codecs: Vec<CodecCapability>,
}
Preference cascade
preference: [AV1, H.265 main, H.264 main, H.264 baseline]
pick = first codec in `preference` where:
    caller.supported.contains(codec)
    AND callee.supported.contains(codec)
    AND (codec.hardware on both sides OR codec.allow_software)
`allow_software` defaults to false for AV1 (battery cost too high) and true for H.264 (cheap SW fallback). The x265 SW fallback for H.265 is likewise disabled by default (see the per-codec table).
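The cascade above could be sketched in Rust roughly as follows. This is a minimal illustration, not the shipped picker: `PREFERENCE`, `allow_software`, `find` and `pick_codec` are hypothetical names, and treating `allow_software` as a fixed per-codec policy (true only for the two H.264 IDs) is an assumption based on the defaults described above.

```rust
/// Codec IDs from this PRD in preference order (best first):
/// AV1 (12), H.265 main (11), H.264 main (10), H.264 baseline (9).
const PREFERENCE: [u8; 4] = [12, 11, 10, 9];

#[derive(Clone, Copy, Debug, PartialEq)]
pub struct CodecCapability {
    pub codec_id: u8,
    pub max_resolution: (u16, u16),
    pub max_fps: u8,
    pub hardware: bool, // true if HW encode available
}

/// Assumed policy: software encode is acceptable only for H.264 (IDs 9 and 10).
fn allow_software(codec_id: u8) -> bool {
    matches!(codec_id, 9 | 10)
}

/// Looks up one side's declared capability for a codec, if any.
fn find(caps: &[CodecCapability], codec_id: u8) -> Option<&CodecCapability> {
    caps.iter().find(|c| c.codec_id == codec_id)
}

/// Returns the first codec in `PREFERENCE` that both sides declare and that
/// is either hardware-backed on both sides or allowed to run in software.
pub fn pick_codec(caller: &[CodecCapability], callee: &[CodecCapability]) -> Option<u8> {
    PREFERENCE.iter().copied().find(|&id| {
        match (find(caller, id), find(callee, id)) {
            (Some(a), Some(b)) => (a.hardware && b.hardware) || allow_software(id),
            _ => false,
        }
    })
}
```

Because the function depends only on the two capability lists and a constant ordering, the same inputs always yield the same codec, which is the determinism the acceptance criteria ask for.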
Per-codec details
| ID | Codec | Encoder priority |
|---|---|---|
| 9 | H.264 baseline | VideoToolbox / MediaCodec / NVENC / QSV / AMF / VAAPI; OpenH264 SW |
| 10 | H.264 main | Same HW; same SW |
| 11 | H.265 main | VideoToolbox A10+ / MediaCodec / NVENC GTX 9xx+ / QSV Skylake+; x265 SW (slow, disabled by default) |
| 12 | AV1 | VideoToolbox M3+/A17+ / MediaCodec SD8G3+ / NVENC RTX 40+; SVT-AV1 SW (gated) |
| 13 | VP9 | Reserved; may not implement |
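As an illustration of how the ID assignments in the table might be represented, the sketch below maps the wire byte to a typed enum. The enum name `CodecId`, its variant names, and the `software_encoder` helper are assumptions made for this example; only the numeric values and encoder names come from the table.

```rust
use std::convert::TryFrom;

/// Video codec IDs reserved by this PRD (values taken from the table above).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
#[repr(u8)]
pub enum CodecId {
    H264Baseline = 9,
    H264Main = 10,
    H265Main = 11,
    Av1 = 12,
    Vp9Reserved = 13,
}

impl TryFrom<u8> for CodecId {
    /// The unrecognised raw byte is handed back as the error value.
    type Error = u8;

    fn try_from(raw: u8) -> Result<Self, Self::Error> {
        match raw {
            9 => Ok(CodecId::H264Baseline),
            10 => Ok(CodecId::H264Main),
            11 => Ok(CodecId::H265Main),
            12 => Ok(CodecId::Av1),
            13 => Ok(CodecId::Vp9Reserved),
            other => Err(other),
        }
    }
}

impl CodecId {
    /// Software encoder named in the table, if one is listed for the codec.
    pub fn software_encoder(self) -> Option<&'static str> {
        match self {
            CodecId::H264Baseline | CodecId::H264Main => Some("OpenH264"),
            CodecId::H265Main => Some("x265"),
            CodecId::Av1 => Some("SVT-AV1"),
            CodecId::Vp9Reserved => None,
        }
    }
}
```

Returning the raw byte on failure lets a caller log an unknown ID without guessing what it meant.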
Framer reuse
The 16 B MediaHeader carries codec_id. The framer doesn't care which codec — it fragments NALs (for H.264/H.265) or OBUs (for AV1) into MTU-sized chunks, sets KeyFrame/FrameEnd bits, and passes payload through. Per-codec parameter sets (SPS/PPS for H.264/H.265, sequence header OBU for AV1) ship on the signal stream.
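The codec-agnostic fragmentation step described above might look like the following sketch. It does not reproduce the real 16 B MediaHeader layout: the `Fragment` struct, the flag bit positions, and the choice to set the keyframe bit on every fragment of a keyframe are all assumptions for illustration.

```rust
/// Hypothetical flag bits; the real positions are defined by the MediaHeader spec.
pub const FLAG_KEYFRAME: u8 = 0b01;
pub const FLAG_FRAME_END: u8 = 0b10;

/// Simplified stand-in for a packet: only the fields the framer touches.
#[derive(Debug, PartialEq)]
pub struct Fragment<'a> {
    pub codec_id: u8,
    pub flags: u8,
    pub payload: &'a [u8],
}

/// Splits one encoded unit (NAL or OBU) into chunks of at most `max_payload`
/// bytes. The payload is passed through untouched; `codec_id` is only copied.
pub fn fragment_frame<'a>(
    codec_id: u8,
    frame: &'a [u8],
    keyframe: bool,
    max_payload: usize,
) -> Vec<Fragment<'a>> {
    assert!(max_payload > 0, "max_payload must be non-zero");
    let chunks: Vec<&[u8]> = frame.chunks(max_payload).collect();
    let last = chunks.len().saturating_sub(1);
    chunks
        .into_iter()
        .enumerate()
        .map(|(i, payload)| {
            let mut flags = 0;
            if keyframe {
                flags |= FLAG_KEYFRAME; // assumption: set on every fragment of a keyframe
            }
            if i == last {
                flags |= FLAG_FRAME_END; // marks the final fragment of the unit
            }
            Fragment { codec_id, flags, payload }
        })
        .collect()
}
```

Nothing in the function branches on `codec_id`, which is why adding H.265 or AV1 should only require a new codec wrapper.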
Mid-call codec switch
Optional in v1. If implemented:
- Sender sends `SignalMessage::CodecSwitch { stream_id, new_codec_id, parameter_sets }`.
- Receiver swaps decoder and emits PLI to force a clean keyframe.
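A minimal sketch of the receiver side of this exchange is shown below. The field types, the `Pli` variant (the source does not say how PLI is carried), and the `Receiver` struct are hypothetical; only the `CodecSwitch` variant name and its three field names come from the text above.

```rust
#[derive(Debug, Clone, PartialEq)]
pub enum SignalMessage {
    /// Field types are assumptions; the PRD names only the fields.
    CodecSwitch {
        stream_id: u32,
        new_codec_id: u8,
        parameter_sets: Vec<Vec<u8>>,
    },
    /// Hypothetical picture-loss indication asking the sender for a keyframe.
    Pli { stream_id: u32 },
}

/// Stand-in for the receiver's per-stream decoder state.
pub struct Receiver {
    pub stream_id: u32,
    pub active_codec_id: u8,
    pub parameter_sets: Vec<Vec<u8>>,
}

impl Receiver {
    /// Applies a codec switch for this stream and returns the PLI to send
    /// back. Messages for other streams are ignored.
    pub fn handle(&mut self, msg: SignalMessage) -> Option<SignalMessage> {
        match msg {
            SignalMessage::CodecSwitch { stream_id, new_codec_id, parameter_sets }
                if stream_id == self.stream_id =>
            {
                // "Swap decoder": a real receiver would tear down and rebuild here.
                self.active_codec_id = new_codec_id;
                self.parameter_sets = parameter_sets;
                Some(SignalMessage::Pli { stream_id })
            }
            _ => None,
        }
    }
}
```

Requesting a keyframe immediately after the swap means the new decoder never has to interpret frames predicted from pictures it has not seen.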
Implementation outline
- `CodecCapability` declaration + serde (additive change).
- HW probe at session start (per platform).
- Picker logic in `CallOffer`/`CallAnswer` flow.
- H.265 encoder/decoder wrappers (VideoToolbox + MediaCodec).
- AV1 encoder/decoder wrappers, gated on HW (SVT-AV1 fallback behind flag).
- Prometheus: `wzp_session_codec_id_total{codec}` for telemetry on actual codec usage.
Acceptance criteria
- Two macOS clients (M1 + M3) pick H.265 by default; M3 + iPhone 15 Pro pick AV1.
- M1 + Android device without H.265 HW picks H.264.
- Codec selection is deterministic given both sides' capabilities.
- AV1 refused on devices without HW unless the `allow_software` flag is explicitly set.
Rollout gates
- H.264 baseline + main: ship with PRD #5.
- H.265: enable by default once HW probe accuracy verified on 5+ macOS + 5+ Android devices.
- AV1: at least 20 % of session-start probes must report HW encode capability before it is enabled by default. Until then, it is available only via a debug flag.
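The AV1 gate reduces to a ratio check over probe telemetry, sketched below. The function and constant names are hypothetical, and reading the threshold as "at least 20 %" (inclusive) is an assumption; integer arithmetic is used so the result does not depend on floating-point rounding.

```rust
/// Share of session-start probes that must report AV1 HW encode, in percent.
pub const AV1_GATE_PERCENT: u128 = 20;

/// Returns true once enough probes report AV1 hardware encode.
/// With no probes recorded yet, the gate stays closed.
pub fn av1_gate_open(probes_with_hw: u64, total_probes: u64) -> bool {
    // Widen to u128 so the multiplications cannot overflow.
    let with_hw = probes_with_hw as u128;
    let total = total_probes as u128;
    total > 0 && with_hw * 100 >= total * AV1_GATE_PERCENT
}
```

For example, 1 capable probe out of 5 opens the gate, while 19 out of 100 does not.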
Risks
- AV1 SW encode torches battery. Mitigation: HW gate is mandatory; SW fallback off by default.
- H.265 patent surface. Mitigation: rely on platform-provided HW encoders (license covered upstream); avoid shipping x265 binary.
- HW probe lies on some Android devices. Mitigation: in-session fallback if encoder errors at start; degrade one codec tier.
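The in-session fallback from the last mitigation could be sketched as follows, under stated assumptions: `degrade_one_tier` is a hypothetical helper, "one tier" is read as the next codec down the preference order that both sides still support, and the caller is assumed to already hold the list of mutually supported codec IDs.

```rust
/// Preference order from this PRD: AV1 (12), H.265 main (11),
/// H.264 main (10), H.264 baseline (9).
const PREFERENCE: [u8; 4] = [12, 11, 10, 9];

/// Given the codec whose encoder failed at start, returns the next lower
/// codec in `PREFERENCE` that both sides support, or `None` if there is
/// nothing left to fall back to (or the failed ID is not a known codec).
pub fn degrade_one_tier(failed: u8, mutually_supported: &[u8]) -> Option<u8> {
    let pos = PREFERENCE.iter().position(|&id| id == failed)?;
    PREFERENCE[pos + 1..]
        .iter()
        .copied()
        .find(|id| mutually_supported.contains(id))
}
```

Walking the same ordering as the initial picker keeps the fallback deterministic, so both ends can reason about which codec comes next after a failure.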
Effort
- H.265 wrappers: 3 d (T5.4)
- AV1 wrappers + HW gate: 5 d (T6.1)
- Picker + capability declaration: 1 d
Total: ~9 engineer-days, in Waves 5–6.