wz-phone/docs/WZP-SPEC.md
Siavash Sameni ed8a7ae5aa docs: protocol audit 2026-05-25, update architecture + Obsidian vault
Audit:
- docs/AUDIT-2026-05-25.md: full protocol audit covering 8 findings
  (4 critical, 2 high, 5 medium, 4 low) with code references and fix
  effort estimates
- vault/Audit/Tasks.md: Obsidian Tasks plugin file tracking all audit
  items with priorities, due dates, and per-step checklists

Architecture docs updated for Wire format v2 and Wave 5/6 features:
- ARCHITECTURE.md: adds wzp-video to dependency graph and project
  structure; wire format updated to v2 (16B header, 5B MiniHeader);
  relay concurrency section corrected (DashMap+RwLock is current, not
  a future optimization); test count 571→702; Android note
- PROGRESS.md: Wave 5 and Wave 6 sections appended; test count 372→702;
  current status and open blockers as of 2026-05-25
- ROAD-TO-VIDEO.md: implementation status table inserted (✅/🟡/🔴/🔲
  per phase); 6-step critical path to first video call
- WZP-SPEC.md: MediaHeader updated to v2 (16B byte-aligned); MiniHeader
  updated to 5B with seq_delta; codec IDs 9-12 added (H.264/H.265/AV1);
  version negotiation section added

Obsidian vault (vault/):
- 114 files across Architecture/, PRDs/, Reports/, Android/,
  Reference/, Audit/ with YAML frontmatter
- 00 - Home.md index note with wiki links
- .obsidian/app.json config

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-25 06:00:17 +04:00


WZP Protocol Specification (one-page reference)

Distilled from docs/ARCHITECTURE.md and the wzp-proto crate. Authoritative wire details live in crates/wzp-proto/src/packet.rs.

Status: v2 is the deployed protocol (audio + video, 16 B header, MediaType, u32 seq). v1 clients are rejected with Hangup::ProtocolVersionMismatch.

Layer summary

| Layer | WZP | FaceTime equivalent |
|---|---|---|
| Transport | QUIC datagrams (Quinn), PLPMTUD 1200 → 1452 | RTP/SRTP over UDP, ICE |
| Signaling | SignalMessage (bincode) over a QUIC stream, SNI = hashed room name | APNs-tunneled binary plist |
| Identity | Ed25519 + X25519 from BIP39 seed; fingerprint = SHA-256(pubkey)[..16] | IDS RSA + ECDSA per device |
| Key agreement | X25519 DH + HKDF, Ed25519 signatures, rekey every 65,536 packets | Per-call DH signed by IDS keys |
| Bulk crypto | ChaCha20-Poly1305, 64-packet sliding anti-replay | SRTP (AES-CTR + HMAC) |
| Loss recovery | RaptorQ FEC + Opus DRED + classical PLC | NACK / PLI + reference-picture selection |
| Adaptive | 3-tier hysteresis (Good / Degraded / Catastrophic) + continuous DRED tuner | Per-frame bitrate ladder |
| Topology | SFU rooms + inter-relay federation + P2P via ICE | Mesh ≤ ~3, SFU above, Apple relays |
| Header | 16 B MediaHeader v2 / 5 B MiniHeader (49 of 50), 4 B QualityReport trailer | RTP 12 B + extensions |
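
The "64-packet sliding anti-replay" entry under Bulk crypto can be illustrated with a bitmap window. This is a minimal sketch of the general technique, not the wzp-proto implementation: the `ReplayWindow` type and its method are hypothetical names, and u32 sequence wraparound is deliberately left unhandled.

```rust
/// 64-packet sliding anti-replay window over u32 sequence numbers.
/// Hypothetical type for illustration; sequence wraparound is not handled.
#[derive(Debug, Default)]
pub struct ReplayWindow {
    highest: u32,   // highest sequence number accepted so far
    bitmap: u64,    // bit i set => (highest - i) has already been seen
    seen_any: bool, // false until the first packet arrives
}

impl ReplayWindow {
    pub fn new() -> Self {
        Self::default()
    }

    /// Returns true if `seq` is fresh (and records it), false if it is a
    /// replay or has fallen more than 63 packets behind the newest one.
    pub fn check_and_update(&mut self, seq: u32) -> bool {
        if !self.seen_any {
            self.seen_any = true;
            self.highest = seq;
            self.bitmap = 1;
            return true;
        }
        if seq > self.highest {
            // Newer packet: slide the window forward.
            let shift = seq - self.highest;
            self.bitmap = if shift >= 64 { 1 } else { (self.bitmap << shift) | 1 };
            self.highest = seq;
            true
        } else {
            let offset = self.highest - seq;
            if offset >= 64 {
                return false; // too old to track
            }
            let bit = 1u64 << offset;
            if self.bitmap & bit != 0 {
                false // already seen: replay
            } else {
                self.bitmap |= bit;
                true
            }
        }
    }
}
```

A receiver would call `check_and_update` only after the AEAD tag has verified, so that forged packets cannot advance the window.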

Distinctive choices

  • QUIC datagrams instead of raw UDP + SRTP. Brings TLS 1.3, PLPMTUD, path migration, and ACK-based RTT/loss estimation for free.
  • Continuous DRED tuning. Maps live (loss%, RTT, jitter) to a continuous Opus DRED lookback window. Most stacks treat DRED as discrete tiers.
  • MiniHeader (5 B for 49/50 packets). Saves 11 B on each compressed packet versus the full 16 B header, roughly 540 B/s per stream at 50 pps (49 × 11 B).
  • E2E-preserving SFU. The relay forwards encrypted datagrams; it never decrypts media. Room membership uses SNI = hash(room_name).
  • Codec coordination via QualityReport trailer. Receivers attach 4-byte loss/RTT/jitter/cap to media packets; the SFU broadcasts QualityDirective so all senders in a room converge on the same tier.

Wire format (current — v2)

MediaHeader v2 (16 bytes, byte-aligned)

Byte 0:      version        (u8)   0x02
Byte 1:      flags          (u8)   [T:1][Q:1][KeyFrame:1][FrameEnd:1][reserved:4]
Byte 2:      media_type     (u8)   0=audio, 1=video, 2=data, 3=control
Byte 3:      codec_id       (u8)   0-255 (see codec table)
Byte 4:      stream_id      (u8)   simulcast layer; 0=base
Byte 5:      fec_ratio      (u8)   0..200 → 0.0..2.0
Bytes 6-9:   sequence       (u32 BE)
Bytes 10-13: timestamp_ms   (u32 BE)
Bytes 14-15: fec_block_id   (u16 BE)

| Field | Bits | Meaning |
|---|---|---|
| version | 8 | Must be 0x02; v1 clients receive Hangup::ProtocolVersionMismatch |
| T (bit 7 of flags) | 1 | 1 = FEC repair packet |
| Q (bit 6 of flags) | 1 | QualityReport trailer present |
| KeyFrame (bit 5 of flags) | 1 | Packet belongs to a video I-frame |
| FrameEnd (bit 4 of flags) | 1 | Last packet of an access unit |
| reserved (bits 3-0 of flags) | 4 | Must be zero |
| media_type | 8 | 0=audio, 1=video, 2=data, 3=control |
| codec_id | 8 | See codec table (widened from v1's 4-bit field) |
| stream_id | 8 | Simulcast layer; 0=base layer |
| fec_ratio | 8 | 0..200 → 0.0..2.0 |
| sequence | 32 | Monotonically increasing packet seq (not reset by rekey) |
| timestamp_ms | 32 | ms since session start. Monotonic across the full session; not reset by rekey |
| fec_block_id | 16 | FEC source block ID |
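
As an illustration of the layout above, the following sketch encodes and decodes the 16-byte header with the standard library only. The type, constant, and error names are hypothetical and are not taken from crates/wzp-proto/src/packet.rs; rejecting non-zero reserved bits on decode is an assumption drawn from the "must be zero" rule.

```rust
/// Sketch of the 16-byte MediaHeader v2 layout (names are illustrative).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct MediaHeaderV2 {
    pub flags: u8,
    pub media_type: u8,
    pub codec_id: u8,
    pub stream_id: u8,
    pub fec_ratio: u8,
    pub sequence: u32,
    pub timestamp_ms: u32,
    pub fec_block_id: u16,
}

pub const VERSION_V2: u8 = 0x02;
pub const FLAG_T: u8 = 0x80; // bit 7: FEC repair packet
pub const FLAG_Q: u8 = 0x40; // bit 6: QualityReport trailer present
pub const FLAG_KEYFRAME: u8 = 0x20; // bit 5: part of a video I-frame
pub const FLAG_FRAME_END: u8 = 0x10; // bit 4: last packet of an access unit
pub const FLAGS_RESERVED: u8 = 0x0F; // bits 3-0: must be zero

#[derive(Debug, PartialEq, Eq)]
pub enum HeaderError {
    VersionMismatch(u8),
    ReservedBitsSet,
}

impl MediaHeaderV2 {
    pub fn encode(&self) -> [u8; 16] {
        let mut b = [0u8; 16];
        b[0] = VERSION_V2;
        b[1] = self.flags & !FLAGS_RESERVED; // never emit reserved bits
        b[2] = self.media_type;
        b[3] = self.codec_id;
        b[4] = self.stream_id;
        b[5] = self.fec_ratio;
        b[6..10].copy_from_slice(&self.sequence.to_be_bytes());
        b[10..14].copy_from_slice(&self.timestamp_ms.to_be_bytes());
        b[14..16].copy_from_slice(&self.fec_block_id.to_be_bytes());
        b
    }

    pub fn decode(b: &[u8; 16]) -> Result<Self, HeaderError> {
        if b[0] != VERSION_V2 {
            return Err(HeaderError::VersionMismatch(b[0]));
        }
        // Assumption: a receiver rejects packets with reserved bits set.
        if b[1] & FLAGS_RESERVED != 0 {
            return Err(HeaderError::ReservedBitsSet);
        }
        Ok(Self {
            flags: b[1],
            media_type: b[2],
            codec_id: b[3],
            stream_id: b[4],
            fec_ratio: b[5],
            sequence: u32::from_be_bytes([b[6], b[7], b[8], b[9]]),
            timestamp_ms: u32::from_be_bytes([b[10], b[11], b[12], b[13]]),
            fec_block_id: u16::from_be_bytes([b[14], b[15]]),
        })
    }
}
```

Because every field is byte-aligned, no bit shifting is needed outside the flags byte, which is the main practical difference from a packed bitfield header.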

Codec table

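| ID | Codec | Bitrate | Sample rate | Frame |
|---|---|---|---|---|
| 0 | Opus 24k | 24 kbps | 48 kHz | 20 ms |
| 1 | Opus 16k | 16 kbps | 48 kHz | 20 ms |
| 2 | Opus 6k | 6 kbps | 48 kHz | 40 ms |
| 3 | Codec2 3200 | 3.2 kbps | 8 kHz | 20 ms |
| 4 | Codec2 1200 | 1.2 kbps | 8 kHz | 40 ms |
| 5 | ComfortNoise | 0 | 48 kHz | 20 ms |
| 6 | Opus 32k | 32 kbps | 48 kHz | 20 ms |
| 7 | Opus 48k | 48 kbps | 48 kHz | 20 ms |
| 8 | Opus 64k | 64 kbps | 48 kHz | 20 ms |
| 9 | H.264 Baseline | | | |
| 10 | H.264 Main | | | |
| 11 | H.265 Main | | | |
| 12 | AV1 Main | | | |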
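
A receiver needs to turn the codec_id byte into concrete parameters. The lookup below is a sketch that mirrors the table; the function and struct names are hypothetical, and `nominal_payload_bytes` is only the arithmetic bitrate × frame duration, not a guarantee about actual encoder output.

```rust
/// Audio parameters for one codec_id, as listed in the codec table.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct AudioParams {
    pub bitrate_bps: u32,
    pub sample_rate_hz: u32,
    pub frame_ms: u32,
}

/// Returns the audio parameters for IDs 0-8, or None for any other ID.
pub fn audio_params(codec_id: u8) -> Option<AudioParams> {
    let (bitrate_bps, sample_rate_hz, frame_ms) = match codec_id {
        0 => (24_000, 48_000, 20), // Opus 24k
        1 => (16_000, 48_000, 20), // Opus 16k
        2 => (6_000, 48_000, 40),  // Opus 6k
        3 => (3_200, 8_000, 20),   // Codec2 3200
        4 => (1_200, 8_000, 40),   // Codec2 1200
        5 => (0, 48_000, 20),      // ComfortNoise
        6 => (32_000, 48_000, 20), // Opus 32k
        7 => (48_000, 48_000, 20), // Opus 48k
        8 => (64_000, 48_000, 20), // Opus 64k
        _ => return None,
    };
    Some(AudioParams { bitrate_bps, sample_rate_hz, frame_ms })
}

/// Returns the video codec name for IDs 9-12, or None otherwise.
pub fn video_codec_name(codec_id: u8) -> Option<&'static str> {
    match codec_id {
        9 => Some("H.264 Baseline"),
        10 => Some("H.264 Main"),
        11 => Some("H.265 Main"),
        12 => Some("AV1 Main"),
        _ => None,
    }
}

/// Nominal payload size of one frame: bitrate (bit/s) * frame (ms) / 8000.
pub fn nominal_payload_bytes(p: AudioParams) -> u32 {
    p.bitrate_bps * p.frame_ms / 8_000
}
```

For example, Opus 24k at 20 ms works out to a nominal 60 bytes per frame, so the 11 bytes saved by a MiniHeader are a noticeable share of each packet.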

MiniHeader v2 (5 bytes, compressed — 49 of every 50 packets)

[FRAME_TYPE_MINI = 0x01]
Byte 0:    seq_delta           (u8)
Bytes 1-2: timestamp_delta_ms  (u16 BE)
Bytes 3-4: payload_len         (u16 BE)

Full header sent every 50th packet to resync.
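
The sketch below shows how a receiver might expand a MiniHeader back into absolute values. It covers only the 5 bytes after the frame-type tag. Two points are assumptions, since this page does not state them: deltas are taken relative to the previous packet on the stream, and packet indices 0, 50, 100, ... carry the full header. All names are hypothetical.

```rust
/// The 5-byte compressed header that follows the FRAME_TYPE_MINI tag.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct MiniHeader {
    pub seq_delta: u8,
    pub timestamp_delta_ms: u16,
    pub payload_len: u16,
}

impl MiniHeader {
    pub fn encode(&self) -> [u8; 5] {
        let ts = self.timestamp_delta_ms.to_be_bytes();
        let len = self.payload_len.to_be_bytes();
        [self.seq_delta, ts[0], ts[1], len[0], len[1]]
    }

    pub fn decode(b: &[u8; 5]) -> Self {
        Self {
            seq_delta: b[0],
            timestamp_delta_ms: u16::from_be_bytes([b[1], b[2]]),
            payload_len: u16::from_be_bytes([b[3], b[4]]),
        }
    }

    /// Reconstructs (sequence, timestamp_ms) from the previous packet's
    /// values. Assumption: deltas are relative to the previous packet.
    pub fn apply(&self, prev_seq: u32, prev_timestamp_ms: u32) -> (u32, u32) {
        (
            prev_seq.wrapping_add(u32::from(self.seq_delta)),
            prev_timestamp_ms.wrapping_add(u32::from(self.timestamp_delta_ms)),
        )
    }
}

/// Assumption: packets 0, 50, 100, ... are sent with the full header.
pub fn needs_full_header(packet_index: u64) -> bool {
    packet_index % 50 == 0
}
```

The periodic full header bounds how long a receiver that missed one stays out of sync: at 50 pps, at most about one second.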

TrunkFrame (batched, relay-internal)

[count: u16]
  [session_id: 2][len: u16][payload: len]   × count

Each frame carries up to 10 entries, or fewer if the PMTUD-discovered MTU fills first; pending entries are flushed every 5 ms.
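
A batching encoder for this layout could look like the sketch below. Big-endian byte order for count and len is an assumption (it matches the other multi-byte fields on this page but is not stated for TrunkFrame), and the function name and signature are hypothetical. The 5 ms flush timer is left out.

```rust
/// Maximum entries per TrunkFrame, per the limit stated above.
pub const MAX_TRUNK_ENTRIES: usize = 10;

/// Packs as many leading entries as fit into one frame of at most `mtu`
/// bytes. Returns the encoded frame and how many entries it consumed.
/// Assumption: count and len are big-endian.
pub fn encode_trunk(entries: &[([u8; 2], &[u8])], mtu: usize) -> (Vec<u8>, usize) {
    let mut out = vec![0u8, 0u8]; // placeholder for the u16 count
    let mut taken = 0usize;
    for (session_id, payload) in entries {
        let needed = 2 + 2 + payload.len(); // session_id + len + payload
        if taken == MAX_TRUNK_ENTRIES
            || payload.len() > usize::from(u16::MAX)
            || out.len() + needed > mtu
        {
            break; // caller sends the rest in the next frame
        }
        out.extend_from_slice(session_id);
        out.extend_from_slice(&(payload.len() as u16).to_be_bytes());
        out.extend_from_slice(payload);
        taken += 1;
    }
    out[..2].copy_from_slice(&(taken as u16).to_be_bytes());
    (out, taken)
}
```

A caller would loop, dropping the consumed entries each time, until the queue is empty. A real implementation would also need a policy for a payload that can never fit within the MTU, which this sketch does not address.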

QualityReport (4 bytes, optional inline trailer)

Byte 0: loss_pct          (0-255 → 0-100%)
Byte 1: rtt_4ms           (0-255 → 0-1020 ms)
Byte 2: jitter_ms         (0-255 ms)
Byte 3: bitrate_cap_kbps  (0-255 kbps)
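
The scaling rules above can be sketched as follows. The struct mirrors the four bytes; the constructor and accessor names are hypothetical, and saturating out-of-range measurements at 255 is an assumption rather than documented behaviour.

```rust
/// The 4-byte QualityReport trailer.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct QualityReport {
    pub loss_pct: u8,         // 0-255 maps linearly to 0-100 %
    pub rtt_4ms: u8,          // units of 4 ms
    pub jitter_ms: u8,        // milliseconds
    pub bitrate_cap_kbps: u8, // kbps
}

impl QualityReport {
    /// Quantizes raw measurements. Assumption: values saturate at 255.
    pub fn from_measurements(loss_percent: f32, rtt_ms: u32, jitter_ms: u32, cap_kbps: u32) -> Self {
        let loss = (loss_percent.clamp(0.0, 100.0) * 255.0 / 100.0).round() as u8;
        Self {
            loss_pct: loss,
            rtt_4ms: (rtt_ms / 4).min(255) as u8,
            jitter_ms: jitter_ms.min(255) as u8,
            bitrate_cap_kbps: cap_kbps.min(255) as u8,
        }
    }

    pub fn to_bytes(&self) -> [u8; 4] {
        [self.loss_pct, self.rtt_4ms, self.jitter_ms, self.bitrate_cap_kbps]
    }

    pub fn from_bytes(b: [u8; 4]) -> Self {
        Self { loss_pct: b[0], rtt_4ms: b[1], jitter_ms: b[2], bitrate_cap_kbps: b[3] }
    }

    /// Loss as a percentage in 0.0..=100.0.
    pub fn loss_percent(&self) -> f32 {
        f32::from(self.loss_pct) * 100.0 / 255.0
    }

    /// Round-trip time in milliseconds (resolution 4 ms, max 1020 ms).
    pub fn rtt_ms(&self) -> u32 {
        u32::from(self.rtt_4ms) * 4
    }
}
```

The 4 ms RTT granularity trades precision for range: one byte covers up to 1020 ms, which is enough to distinguish the adaptive tiers.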

Version negotiation

  • version=0x02 in MediaHeader is a hard switch — there is no fallback negotiation.
  • Both endpoints must speak v2. A v1 peer receives Hangup::ProtocolVersionMismatch immediately.
  • Relays inspect only version and media_type; they never downgrade or translate between versions.
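
A relay-side check that follows these rules could be as small as the sketch below. It assumes the datagram begins with a full 16-byte MediaHeader, reads only bytes 0 and 2, and uses hypothetical type names; how a rejection is turned into a Hangup signal is outside its scope.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum MediaType {
    Audio,
    Video,
    Data,
    Control,
}

#[derive(Debug, PartialEq, Eq)]
pub enum Reject {
    TooShort,
    ProtocolVersionMismatch { got: u8 },
    UnknownMediaType(u8),
}

/// Inspects only `version` (byte 0) and `media_type` (byte 2).
/// Nothing else is parsed, and no version translation is attempted.
pub fn relay_inspect(datagram: &[u8]) -> Result<MediaType, Reject> {
    if datagram.len() < 16 {
        return Err(Reject::TooShort);
    }
    if datagram[0] != 0x02 {
        return Err(Reject::ProtocolVersionMismatch { got: datagram[0] });
    }
    match datagram[2] {
        0 => Ok(MediaType::Audio),
        1 => Ok(MediaType::Video),
        2 => Ok(MediaType::Data),
        3 => Ok(MediaType::Control),
        other => Err(Reject::UnknownMediaType(other)),
    }
}
```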

Session lifecycle

Idle → Connecting → Handshaking → Active ⇄ Rekeying → Closed
  • CallOffer { identity_pub, ephemeral_pub, signature, profiles }
  • CallAnswer { identity_pub, ephemeral_pub, signature, chosen_profile }
  • session_key = HKDF(X25519_DH(eph_a, eph_b), "warzone-session-key")
  • Rekey every 65,536 packets via fresh ephemeral DH.
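
The state diagram and the rekey cadence can be sketched with the standard library alone. The names are hypothetical. The diagram does not say which states may move to Closed, so this sketch assumes any state after Idle can close (for example on failure). Key derivation itself is omitted because it needs X25519 and HKDF crates.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum SessionState {
    Idle,
    Connecting,
    Handshaking,
    Active,
    Rekeying,
    Closed,
}

impl SessionState {
    /// True if the lifecycle permits moving from `self` to `next`.
    /// Assumption: every state after Idle may transition to Closed.
    pub fn can_transition_to(self, next: SessionState) -> bool {
        use SessionState::*;
        matches!(
            (self, next),
            (Idle, Connecting)
                | (Connecting, Handshaking)
                | (Handshaking, Active)
                | (Active, Rekeying)
                | (Rekeying, Active)
                | (Connecting, Closed)
                | (Handshaking, Closed)
                | (Active, Closed)
                | (Rekeying, Closed)
        )
    }
}

/// Packets between rekeys, per the rule above.
pub const REKEY_INTERVAL: u64 = 65_536;

/// Counts packets and signals when a fresh ephemeral DH is due.
#[derive(Debug, Default)]
pub struct RekeyCounter {
    since_rekey: u64,
}

impl RekeyCounter {
    /// Call once per packet; returns true when a rekey should start.
    pub fn on_packet(&mut self) -> bool {
        self.since_rekey += 1;
        if self.since_rekey >= REKEY_INTERVAL {
            self.since_rekey = 0;
            true
        } else {
            false
        }
    }
}
```

At 50 pps per stream, 65,536 packets corresponds to a rekey roughly every 22 minutes.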

SFU forwarding rules

  1. Fan-out to all room participants except the sender.
  2. Failed sends are skipped; forwarding is best-effort.
  3. The relay never decrypts media.
  4. With trunking on, packets to the same receiver are batched (flush 5 ms).
  5. QualityDirective is broadcast when the room-wide tier degrades.
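
Rules 1 to 3 can be sketched as a single loop. The participant IDs, the `SendError` type, and the `send` callback are hypothetical stand-ins for the relay's real connection handles; the datagram is treated as opaque, so nothing here touches the payload.

```rust
/// Placeholder for whatever error a real datagram send would return.
#[derive(Debug)]
pub struct SendError;

/// Forwards one datagram to every participant except `sender`.
/// Failed sends are skipped (best-effort). Returns the number delivered.
pub fn fan_out<F>(participants: &[u32], sender: u32, mut send: F) -> usize
where
    F: FnMut(u32) -> Result<(), SendError>,
{
    let mut delivered = 0;
    for &peer in participants {
        if peer == sender {
            continue; // rule 1: never echo back to the sender
        }
        if send(peer).is_ok() {
            delivered += 1;
        }
        // rule 2: on error, move on to the next peer without retrying
    }
    delivered
}
```

For example, with participants 1 to 4, sender 2, and a failing link to 3, the datagram reaches 1 and 4 and the function returns 2.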

Adaptive quality (audio, today)

| Tier | Codec | FEC | Frame |
|---|---|---|---|
| Good | Opus 24k | 20 % | 20 ms |
| Degraded | Opus 6k | 50 % | 40 ms |
| Catastrophic | Codec2 1200 | 100 % | 40 ms |

Hysteresis: 3 reports to downgrade (2 on cellular), 10 to upgrade.
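
The hysteresis rule can be sketched as a small controller. Several details are assumptions because this page does not specify them: the reports must be consecutive, a transition moves one tier at a time, and each report has already been classified into an observed tier by some other logic. Names are hypothetical; `profile` maps each tier to the codec_id and fec_ratio byte from the tables above.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub enum Tier {
    Good,
    Degraded,
    Catastrophic,
}

impl Tier {
    /// (codec_id, fec_ratio byte, frame_ms) for this tier.
    pub fn profile(self) -> (u8, u8, u32) {
        match self {
            Tier::Good => (0, 20, 20),          // Opus 24k, 20 % FEC
            Tier::Degraded => (2, 50, 40),      // Opus 6k, 50 % FEC
            Tier::Catastrophic => (4, 100, 40), // Codec2 1200, 100 % FEC
        }
    }
    fn worse(self) -> Tier {
        if self == Tier::Good { Tier::Degraded } else { Tier::Catastrophic }
    }
    fn better(self) -> Tier {
        if self == Tier::Catastrophic { Tier::Degraded } else { Tier::Good }
    }
}

pub const UPGRADE_AFTER: u32 = 10;

pub struct Hysteresis {
    current: Tier,
    worse_streak: u32,
    better_streak: u32,
    downgrade_after: u32,
}

impl Hysteresis {
    pub fn new(cellular: bool) -> Self {
        Self {
            current: Tier::Good,
            worse_streak: 0,
            better_streak: 0,
            downgrade_after: if cellular { 2 } else { 3 },
        }
    }

    /// Feeds one classified report; returns the tier now in effect.
    pub fn observe(&mut self, observed: Tier) -> Tier {
        if observed > self.current {
            self.worse_streak += 1;
            self.better_streak = 0;
            if self.worse_streak >= self.downgrade_after {
                self.current = self.current.worse();
                self.worse_streak = 0;
            }
        } else if observed < self.current {
            self.better_streak += 1;
            self.worse_streak = 0;
            if self.better_streak >= UPGRADE_AFTER {
                self.current = self.current.better();
                self.better_streak = 0;
            }
        } else {
            // Report agrees with the current tier: reset both streaks.
            self.worse_streak = 0;
            self.better_streak = 0;
        }
        self.current
    }
}
```

The asymmetry (fast to downgrade, slow to upgrade) keeps the call intelligible when a link flaps, at the cost of staying on a lower tier for a while after conditions recover.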

NAT traversal (Phase 8)

  • Candidate types: Host, Port-mapped (NAT-PMP / PCP / UPnP), Server-reflexive (STUN), Relay.
  • Hard-NAT port prediction with classify_port_allocation() → predict_ports() → HardNatProbe signal.
  • Mid-call re-gather: CandidateUpdate { generation }.