wz-phone/docs/WZP-SPEC.md
Siavash Sameni ed8a7ae5aa docs: protocol audit 2026-05-25, update architecture + Obsidian vault
Audit:
- docs/AUDIT-2026-05-25.md: full protocol audit covering 8 findings
  (4 critical, 2 high, 5 medium, 4 low) with code references and fix
  effort estimates
- vault/Audit/Tasks.md: Obsidian Tasks plugin file tracking all audit
  items with priorities, due dates, and per-step checklists

Architecture docs updated for Wire format v2 and Wave 5/6 features:
- ARCHITECTURE.md: adds wzp-video to dependency graph and project
  structure; wire format updated to v2 (16B header, 5B MiniHeader);
  relay concurrency section corrected (DashMap+RwLock is current, not
  a future optimization); test count 571→702; Android note
- PROGRESS.md: Wave 5 and Wave 6 sections appended; test count 372→702;
  current status and open blockers as of 2026-05-25
- ROAD-TO-VIDEO.md: implementation status table inserted (/🟡/🔴/🔲
  per phase); 6-step critical path to first video call
- WZP-SPEC.md: MediaHeader updated to v2 (16B byte-aligned); MiniHeader
  updated to 5B with seq_delta; codec IDs 9-12 added (H.264/H.265/AV1);
  version negotiation section added

Obsidian vault (vault/):
- 114 files across Architecture/, PRDs/, Reports/, Android/,
  Reference/, Audit/ with YAML frontmatter
- 00 - Home.md index note with wiki links
- .obsidian/app.json config

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-25 06:00:17 +04:00

# WZP Protocol Specification (one-page reference)
> Distilled from `docs/ARCHITECTURE.md` and the `wzp-proto` crate. Authoritative wire details live in `crates/wzp-proto/src/packet.rs`.
>
> **Status:** v2 is the deployed protocol (audio + video, 16 B header, MediaType, u32 seq). v1 clients are rejected with `Hangup::ProtocolVersionMismatch`.
## Layer summary
| Layer | WZP | FaceTime equivalent |
|---|---|---|
| Transport | **QUIC datagrams** (Quinn), PLPMTUD 1200 → 1452 | RTP/SRTP over UDP, ICE |
| Signaling | `SignalMessage` (bincode) over a QUIC stream, SNI = hashed room name | APNs-tunneled binary plist |
| Identity | Ed25519 + X25519 from BIP39 seed; fingerprint = SHA-256(pubkey)[..16] | IDS RSA + ECDSA per device |
| Key agreement | X25519 DH + HKDF, Ed25519 signatures, rekey every 65,536 packets | Per-call DH signed by IDS keys |
| Bulk crypto | ChaCha20-Poly1305, 64-packet sliding anti-replay | SRTP (AES-CTR + HMAC) |
| Loss recovery | **RaptorQ FEC + Opus DRED + classical PLC** | NACK / PLI + reference-picture selection |
| Adaptive | 3-tier hysteresis (Good / Degraded / Catastrophic) + continuous DRED tuner | Per-frame bitrate ladder |
| Topology | SFU rooms + inter-relay federation + P2P via ICE | Mesh ≤ ~3, SFU above, Apple relays |
| Header | 16 B `MediaHeader` v2 / 5 B `MiniHeader` (49 of 50), 4 B `QualityReport` trailer | RTP 12 B + extensions |
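
The 64-packet sliding anti-replay window in the Bulk crypto row can be sketched with a single `u64` bitmap. This is a minimal illustration, not the `wzp-proto` implementation: the `ReplayWindow` type and its method names are hypothetical, and `u32` sequence wraparound is deliberately ignored.

```rust
/// Sliding anti-replay window over the 64 most recent sequence numbers.
/// Hypothetical type for illustration; not taken from the wzp-proto crate.
#[derive(Debug, Default)]
pub struct ReplayWindow {
    /// Highest sequence number accepted so far.
    highest: u32,
    /// Bit i is set when sequence `highest - i` has already been accepted.
    bitmap: u64,
    /// False until the first packet has been seen.
    started: bool,
}

impl ReplayWindow {
    pub const SIZE: u32 = 64;

    /// Returns true and records `seq` if it is fresh. Returns false for a
    /// replayed packet or one that fell behind the window.
    pub fn check_and_update(&mut self, seq: u32) -> bool {
        if !self.started {
            self.started = true;
            self.highest = seq;
            self.bitmap = 1;
            return true;
        }
        if seq > self.highest {
            // Slide the window forward; anything shifted out is forgotten.
            let shift = seq - self.highest;
            self.bitmap = if shift >= Self::SIZE { 0 } else { self.bitmap << shift };
            self.bitmap |= 1;
            self.highest = seq;
            return true;
        }
        let offset = self.highest - seq;
        if offset >= Self::SIZE {
            return false; // older than the window
        }
        let mask = 1u64 << offset;
        if self.bitmap & mask != 0 {
            return false; // already seen
        }
        self.bitmap |= mask;
        true
    }
}
```

Reordered packets inside the window are accepted once; anything more than 63 sequence numbers behind the newest accepted packet is dropped.
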
## Distinctive choices
- **QUIC datagrams instead of raw UDP + SRTP.** Brings TLS 1.3, PLPMTUD, path migration, and ACK-based RTT/loss estimation for free.
- **Continuous DRED tuning.** Maps live `(loss%, RTT, jitter)` to a continuous Opus DRED lookback window. Most stacks treat DRED as discrete tiers.
- **MiniHeader (5 B for 49/50 packets).** Saves 11 B on 49 of every 50 packets, roughly 540 B/s per stream at 50 pps, vs. sending the full 16 B header every time.
- **E2E-preserving SFU.** The relay forwards encrypted datagrams; it never decrypts media. Room membership uses SNI = `hash(room_name)`.
- **Codec coordination via `QualityReport` trailer.** Receivers attach 4-byte loss/RTT/jitter/cap to media packets; the SFU broadcasts `QualityDirective` so all senders in a room converge on the same tier.
## Wire format (current — v2)
### `MediaHeader` v2 (16 bytes, byte-aligned)
```
Byte 0: version (u8) 0x02
Byte 1: flags (u8) [T:1][Q:1][KeyFrame:1][FrameEnd:1][reserved:4]
Byte 2: media_type (u8) 0=audio, 1=video, 2=data, 3=control
Byte 3: codec_id (u8) 0-255 (see codec table)
Byte 4: stream_id (u8) simulcast layer; 0=base
Byte 5: fec_ratio (u8) 0..200 → 0.0..2.0
Bytes 6-9: sequence (u32 BE)
Bytes 10-13: timestamp_ms (u32 BE)
Bytes 14-15: fec_block_id (u16 BE)
```
| Field | Bits | Meaning |
|---|---|---|
| version | 8 | Must be `0x02`; v1 clients receive `Hangup::ProtocolVersionMismatch` |
| T (bit 7 of flags) | 1 | 1 = FEC repair packet |
| Q (bit 6 of flags) | 1 | QualityReport trailer present |
| KeyFrame (bit 5 of flags) | 1 | Packet belongs to a video I-frame |
| FrameEnd (bit 4 of flags) | 1 | Last packet of an access unit |
| reserved (bits 3-0 of flags) | 4 | Must be zero |
| media_type | 8 | 0=audio, 1=video, 2=data, 3=control |
| codec_id | 8 | See codec table (widened from v1's 4-bit field) |
| stream_id | 8 | Simulcast layer; 0=base layer |
| fec_ratio | 8 | 0..200 → 0.0..2.0 |
| sequence | 32 | Monotonically increasing packet seq (not reset by rekey) |
| timestamp_ms | 32 | ms since session start. Monotonic across the full session; **not reset by rekey** |
| fec_block_id | 16 | FEC source block ID |
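
The byte layout above maps directly onto a fixed-size encoder and decoder. The sketch below follows the table field by field; the struct, the `ParseError` enum, and the choice to reject non-zero reserved bits on decode are assumptions for illustration. The authoritative version lives in `crates/wzp-proto/src/packet.rs`.

```rust
/// Flag bits within byte 1 of the v2 header.
pub const FLAG_FEC_REPAIR: u8 = 0b1000_0000; // T
pub const FLAG_QUALITY: u8 = 0b0100_0000; // Q
pub const FLAG_KEYFRAME: u8 = 0b0010_0000;
pub const FLAG_FRAME_END: u8 = 0b0001_0000;
const RESERVED_MASK: u8 = 0b0000_1111;

pub const VERSION_V2: u8 = 0x02;
pub const HEADER_LEN: usize = 16;

/// Illustrative mirror of the v2 header; field names follow the table.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct MediaHeader {
    pub flags: u8,
    pub media_type: u8,
    pub codec_id: u8,
    pub stream_id: u8,
    pub fec_ratio: u8,
    pub sequence: u32,
    pub timestamp_ms: u32,
    pub fec_block_id: u16,
}

#[derive(Debug, PartialEq, Eq)]
pub enum ParseError {
    TooShort,
    VersionMismatch(u8),
    ReservedBitsSet,
}

impl MediaHeader {
    /// Serializes to the 16-byte wire form; multi-byte fields are big-endian.
    pub fn encode(&self) -> [u8; HEADER_LEN] {
        let mut out = [0u8; HEADER_LEN];
        out[0] = VERSION_V2;
        out[1] = self.flags;
        out[2] = self.media_type;
        out[3] = self.codec_id;
        out[4] = self.stream_id;
        out[5] = self.fec_ratio;
        out[6..10].copy_from_slice(&self.sequence.to_be_bytes());
        out[10..14].copy_from_slice(&self.timestamp_ms.to_be_bytes());
        out[14..16].copy_from_slice(&self.fec_block_id.to_be_bytes());
        out
    }

    /// Parses the first 16 bytes of `buf`.
    /// Rejecting non-zero reserved bits is an assumption of this sketch.
    pub fn decode(buf: &[u8]) -> Result<Self, ParseError> {
        if buf.len() < HEADER_LEN {
            return Err(ParseError::TooShort);
        }
        if buf[0] != VERSION_V2 {
            return Err(ParseError::VersionMismatch(buf[0]));
        }
        if buf[1] & RESERVED_MASK != 0 {
            return Err(ParseError::ReservedBitsSet);
        }
        Ok(Self {
            flags: buf[1],
            media_type: buf[2],
            codec_id: buf[3],
            stream_id: buf[4],
            fec_ratio: buf[5],
            sequence: u32::from_be_bytes([buf[6], buf[7], buf[8], buf[9]]),
            timestamp_ms: u32::from_be_bytes([buf[10], buf[11], buf[12], buf[13]]),
            fec_block_id: u16::from_be_bytes([buf[14], buf[15]]),
        })
    }

    /// Maps the wire value 0..200 to the ratio 0.0..2.0.
    pub fn fec_ratio_f32(&self) -> f32 {
        f32::from(self.fec_ratio.min(200)) / 100.0
    }
}
```
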
### Codec table
| ID | Codec | Bitrate | Sample | Frame |
|---|---|---|---|---|
| 0 | Opus 24k | 24 kbps | 48 kHz | 20 ms |
| 1 | Opus 16k | 16 kbps | 48 kHz | 20 ms |
| 2 | Opus 6k | 6 kbps | 48 kHz | 40 ms |
| 3 | Codec2 3200 | 3.2 kbps | 8 kHz | 20 ms |
| 4 | Codec2 1200 | 1.2 kbps | 8 kHz | 40 ms |
| 5 | ComfortNoise | 0 | 48 kHz | 20 ms |
| 6 | Opus 32k | 32 kbps | 48 kHz | 20 ms |
| 7 | Opus 48k | 48 kbps | 48 kHz | 20 ms |
| 8 | Opus 64k | 64 kbps | 48 kHz | 20 ms |
| 9 | H.264 Baseline | — | — | — |
| 10 | H.264 Main | — | — | — |
| 11 | H.265 Main | — | — | — |
| 12 | AV1 Main | — | — | — |
### `MiniHeader` v2 (5 bytes, compressed — 49 of every 50 packets)
```
[FRAME_TYPE_MINI = 0x01]
Byte 0: seq_delta (u8)
Bytes 1-2: timestamp_delta_ms (u16 BE)
Bytes 3-4: payload_len (u16 BE)
```
Full header sent every 50th packet to resync.
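
A receiver can rebuild the absolute sequence and timestamp by applying each `MiniHeader` to the state left by the last packet. The sketch below covers only the 5-byte body and makes two assumptions the layout does not settle: the `FRAME_TYPE_MINI` tag is handled by the caller, and the deltas are relative to the previous packet rather than to the last full header. `StreamCursor` and `needs_full_header` are hypothetical names.

```rust
pub const MINI_LEN: usize = 5;
/// Full header cadence from the spec: one in every 50 packets.
pub const FULL_HEADER_INTERVAL: u64 = 50;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct MiniHeader {
    pub seq_delta: u8,
    pub timestamp_delta_ms: u16,
    pub payload_len: u16,
}

impl MiniHeader {
    /// Serializes the 5-byte body; multi-byte fields are big-endian.
    pub fn encode(&self) -> [u8; MINI_LEN] {
        let ts = self.timestamp_delta_ms.to_be_bytes();
        let len = self.payload_len.to_be_bytes();
        [self.seq_delta, ts[0], ts[1], len[0], len[1]]
    }

    /// Parses the first 5 bytes of `buf`, or returns None if it is too short.
    pub fn decode(buf: &[u8]) -> Option<Self> {
        let b = buf.get(..MINI_LEN)?;
        Some(Self {
            seq_delta: b[0],
            timestamp_delta_ms: u16::from_be_bytes([b[1], b[2]]),
            payload_len: u16::from_be_bytes([b[3], b[4]]),
        })
    }
}

/// Receiver-side running state, seeded from the last full header.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct StreamCursor {
    pub sequence: u32,
    pub timestamp_ms: u32,
}

impl StreamCursor {
    /// Applies the deltas (assumed relative to the previous packet).
    pub fn advance(&mut self, mini: &MiniHeader) {
        self.sequence = self.sequence.wrapping_add(u32::from(mini.seq_delta));
        self.timestamp_ms = self
            .timestamp_ms
            .wrapping_add(u32::from(mini.timestamp_delta_ms));
    }
}

/// Sender-side choice: packet 0, 50, 100, ... carries the full header.
pub fn needs_full_header(packets_sent: u64) -> bool {
    packets_sent % FULL_HEADER_INTERVAL == 0
}
```

If a receiver misses the full header, it has no base to apply deltas to until the next resync, which bounds the outage to at most 50 packets under this cadence.
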
### `TrunkFrame` (batched, relay-internal)
```
[count: u16]
[session_id: 2][len: u16][payload: len] × count
```
A frame carries up to 10 entries or as many as fit in the PMTUD-discovered MTU; it is flushed every 5 ms.
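
The batching layout can be sketched as a length-prefixed encoder that stops at 10 entries or when the next entry would exceed the MTU. The byte order of `count` and `len` is not stated above; big-endian is assumed here to match the media headers. `TrunkEntry`, `encode_trunk`, and `decode_trunk` are hypothetical names, and the 5 ms flush timer is left to the caller.

```rust
pub const MAX_ENTRIES: usize = 10;
/// Per-entry overhead: 2-byte session_id plus 2-byte length.
const ENTRY_OVERHEAD: usize = 4;

pub struct TrunkEntry<'a> {
    pub session_id: [u8; 2],
    pub payload: &'a [u8],
}

/// Packs as many leading entries as fit. Returns the frame and the number
/// of entries consumed, so the caller can carry the rest into the next frame.
pub fn encode_trunk(entries: &[TrunkEntry<'_>], mtu: usize) -> (Vec<u8>, usize) {
    let mut out = vec![0u8, 0u8]; // placeholder for count
    let mut taken = 0usize;
    for e in entries.iter().take(MAX_ENTRIES) {
        // A payload that does not fit a u16 length cannot be trunked.
        let Ok(len) = u16::try_from(e.payload.len()) else { break };
        if out.len() + ENTRY_OVERHEAD + e.payload.len() > mtu {
            break;
        }
        out.extend_from_slice(&e.session_id);
        out.extend_from_slice(&len.to_be_bytes());
        out.extend_from_slice(e.payload);
        taken += 1;
    }
    out[..2].copy_from_slice(&(taken as u16).to_be_bytes());
    (out, taken)
}

/// Splits a frame back into (session_id, payload) pairs.
/// Returns None if the frame is truncated.
pub fn decode_trunk(buf: &[u8]) -> Option<Vec<([u8; 2], Vec<u8>)>> {
    let count = u16::from_be_bytes([*buf.first()?, *buf.get(1)?]) as usize;
    let mut pos = 2;
    let mut entries = Vec::new();
    for _ in 0..count {
        let head = buf.get(pos..pos + ENTRY_OVERHEAD)?;
        let len = u16::from_be_bytes([head[2], head[3]]) as usize;
        let start = pos + ENTRY_OVERHEAD;
        let payload = buf.get(start..start + len)?;
        entries.push(([head[0], head[1]], payload.to_vec()));
        pos = start + len;
    }
    Some(entries)
}
```
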
### `QualityReport` (4 bytes, optional inline trailer)
```
Byte 0: loss_pct (0-255 → 0-100%)
Byte 1: rtt_4ms (0-255 → 0-1020 ms)
Byte 2: jitter_ms (0-255 ms)
Byte 3: bitrate_cap_kbps (0-255 kbps)
```
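
The scaling of each byte can be made concrete with a small conversion helper. This is a sketch under assumptions: the trailer is taken to be the last 4 bytes of a packet whose `Q` flag is set, and the rounding and clamping choices in `from_measurements` are illustrative rather than taken from the crate.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct QualityReport {
    pub loss_pct: u8,
    pub rtt_4ms: u8,
    pub jitter_ms: u8,
    pub bitrate_cap_kbps: u8,
}

impl QualityReport {
    pub const LEN: usize = 4;

    /// Reads the trailer from the end of `packet` (assumed position).
    pub fn from_trailer(packet: &[u8]) -> Option<Self> {
        let start = packet.len().checked_sub(Self::LEN)?;
        let t = &packet[start..];
        Some(Self {
            loss_pct: t[0],
            rtt_4ms: t[1],
            jitter_ms: t[2],
            bitrate_cap_kbps: t[3],
        })
    }

    pub fn to_bytes(&self) -> [u8; Self::LEN] {
        [self.loss_pct, self.rtt_4ms, self.jitter_ms, self.bitrate_cap_kbps]
    }

    /// Builds a report from measured values, saturating at each field's range.
    pub fn from_measurements(loss_percent: f32, rtt_ms: u32, jitter_ms: u32, cap_kbps: u32) -> Self {
        Self {
            // 0-100 % is spread over the full 0-255 range.
            loss_pct: (loss_percent.clamp(0.0, 100.0) * 255.0 / 100.0).round() as u8,
            // RTT is stored in 4 ms units, so 255 means 1020 ms or more.
            rtt_4ms: (rtt_ms / 4).min(255) as u8,
            jitter_ms: jitter_ms.min(255) as u8,
            bitrate_cap_kbps: cap_kbps.min(255) as u8,
        }
    }

    /// Loss as a percentage in 0.0..=100.0.
    pub fn loss_percent(&self) -> f32 {
        f32::from(self.loss_pct) * 100.0 / 255.0
    }

    /// Round-trip time in milliseconds (4 ms resolution).
    pub fn rtt_ms(&self) -> u32 {
        u32::from(self.rtt_4ms) * 4
    }
}
```
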
### Version negotiation
- `version=0x02` in `MediaHeader` is a hard switch — there is no fallback negotiation.
- Both endpoints must speak v2. A v1 peer receives `Hangup::ProtocolVersionMismatch` immediately.
- Relays inspect only `version` and `media_type`; they never downgrade or translate between versions.
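
The relay-side check described above needs only two bytes of the full header. The following sketch assumes it is handed a buffer that starts with a `MediaHeader`; how `MiniHeader` packets are treated on this path is not covered here. The `Verdict` enum and the local `Hangup` stand-in are hypothetical.

```rust
pub const VERSION_V2: u8 = 0x02;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum MediaType {
    Audio,
    Video,
    Data,
    Control,
}

/// Local stand-in for the signal named in the spec.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Hangup {
    ProtocolVersionMismatch,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Verdict {
    Forward(MediaType),
    Reject(Hangup),
    Malformed,
}

/// Looks only at byte 0 (version) and byte 2 (media_type).
/// The payload is never touched, so it can stay encrypted end to end.
pub fn inspect(header: &[u8]) -> Verdict {
    let (Some(&version), Some(&media_type)) = (header.first(), header.get(2)) else {
        return Verdict::Malformed;
    };
    if version != VERSION_V2 {
        // Hard switch: no downgrade, no translation.
        return Verdict::Reject(Hangup::ProtocolVersionMismatch);
    }
    match media_type {
        0 => Verdict::Forward(MediaType::Audio),
        1 => Verdict::Forward(MediaType::Video),
        2 => Verdict::Forward(MediaType::Data),
        3 => Verdict::Forward(MediaType::Control),
        _ => Verdict::Malformed,
    }
}
```
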
## Session lifecycle
```
Idle → Connecting → Handshaking → Active ⇄ Rekeying → Closed
```
- `CallOffer { identity_pub, ephemeral_pub, signature, profiles }`
- `CallAnswer { identity_pub, ephemeral_pub, signature, chosen_profile }`
- `session_key = HKDF(X25519_DH(eph_a, eph_b), "warzone-session-key")`
- Rekey every 65,536 packets via fresh ephemeral DH.
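
The lifecycle diagram and the rekey interval can be expressed as a small transition function. The event names below are hypothetical, and allowing a hangup from any state other than `Closed` is an assumption of this sketch; the diagram itself only shows the forward path.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum SessionState {
    Idle,
    Connecting,
    Handshaking,
    Active,
    Rekeying,
    Closed,
}

/// Hypothetical event names; the spec names states, not events.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Event {
    Dial,
    TransportUp,
    HandshakeDone,
    RekeyStart,
    RekeyDone,
    Hangup,
}

impl SessionState {
    /// Returns the next state, or None if the event is not valid here.
    pub fn on(self, ev: Event) -> Option<SessionState> {
        use SessionState::*;
        match (self, ev) {
            (Idle, Event::Dial) => Some(Connecting),
            (Connecting, Event::TransportUp) => Some(Handshaking),
            (Handshaking, Event::HandshakeDone) => Some(Active),
            (Active, Event::RekeyStart) => Some(Rekeying),
            (Rekeying, Event::RekeyDone) => Some(Active),
            // Closed is terminal.
            (Closed, _) => None,
            // Assumption: any live state may be torn down.
            (_, Event::Hangup) => Some(Closed),
            _ => None,
        }
    }
}

/// Rekey interval from the spec: every 65,536 packets.
pub const REKEY_INTERVAL: u64 = 65_536;

/// True once enough packets have been sent under the current key.
pub fn rekey_due(packets_since_key: u64) -> bool {
    packets_since_key >= REKEY_INTERVAL
}
```

Because `sequence` and `timestamp_ms` are not reset by a rekey, the `Active ⇄ Rekeying` loop changes only the key material, not the packet numbering.
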
## SFU forwarding rules
1. Fan-out to all room participants except the sender.
2. Failed sends are skipped; forwarding is best-effort.
3. The relay never decrypts media.
4. With trunking on, packets to the same receiver are batched (flush 5 ms).
5. `QualityDirective` is broadcast when the room-wide tier degrades.
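
Rules 1 to 3 amount to a loop that copies opaque bytes to every participant but the sender and ignores individual send failures. The sketch below uses an in-memory sink in place of a QUIC connection; `DatagramSink`, `MemorySink`, and `Room` are hypothetical names, and trunking and `QualityDirective` are left out.

```rust
use std::collections::HashMap;

pub type ParticipantId = u32;

#[derive(Debug, PartialEq, Eq)]
pub struct SendError;

/// Minimal stand-in for a per-participant datagram channel.
pub trait DatagramSink {
    fn send(&mut self, datagram: &[u8]) -> Result<(), SendError>;
}

/// In-memory sink used only to make the sketch testable.
#[derive(Default)]
pub struct MemorySink {
    pub received: Vec<Vec<u8>>,
    pub broken: bool,
}

impl DatagramSink for MemorySink {
    fn send(&mut self, datagram: &[u8]) -> Result<(), SendError> {
        if self.broken {
            return Err(SendError);
        }
        self.received.push(datagram.to_vec());
        Ok(())
    }
}

pub struct Room<S: DatagramSink> {
    participants: HashMap<ParticipantId, S>,
}

impl<S: DatagramSink> Room<S> {
    pub fn new() -> Self {
        Self { participants: HashMap::new() }
    }

    pub fn join(&mut self, id: ParticipantId, sink: S) {
        self.participants.insert(id, sink);
    }

    pub fn sink(&self, id: ParticipantId) -> Option<&S> {
        self.participants.get(&id)
    }

    /// Fans out the still-encrypted datagram to everyone except `sender`.
    /// Failed sends are skipped; the return value counts successful ones.
    pub fn forward(&mut self, sender: ParticipantId, datagram: &[u8]) -> usize {
        let mut delivered = 0;
        for (id, sink) in self.participants.iter_mut() {
            if *id == sender {
                continue;
            }
            if sink.send(datagram).is_ok() {
                delivered += 1;
            }
        }
        delivered
    }
}
```
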
## Adaptive quality (audio, today)
| Tier | Codec | FEC | Frame |
|---|---|---|---|
| Good | Opus 24 k | 20 % | 20 ms |
| Degraded | Opus 6 k | 50 % | 40 ms |
| Catastrophic | Codec2 1200 | 100 % | 40 ms |
Hysteresis: 3 reports to downgrade (2 on cellular), 10 to upgrade.
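
The hysteresis rule can be sketched as a pair of streak counters. The controller below takes the tier that each incoming report suggests as input, since the thresholds that map loss/RTT/jitter to a tier are not given here. Moving one tier per transition and starting in `Good` are assumptions; `TierController` is a hypothetical name. The codec IDs and FEC percentages come from the tables above.

```rust
use std::cmp::Ordering;

/// Declared worst to best so the derived ordering matches quality.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub enum Tier {
    Catastrophic,
    Degraded,
    Good,
}

impl Tier {
    /// Codec ID per the codec table: Opus 24k, Opus 6k, Codec2 1200.
    pub fn codec_id(self) -> u8 {
        match self {
            Tier::Good => 0,
            Tier::Degraded => 2,
            Tier::Catastrophic => 4,
        }
    }

    /// FEC percentage per the adaptive quality table.
    pub fn fec_percent(self) -> u8 {
        match self {
            Tier::Good => 20,
            Tier::Degraded => 50,
            Tier::Catastrophic => 100,
        }
    }
}

pub struct TierController {
    current: Tier,
    worse_streak: u32,
    better_streak: u32,
    down_threshold: u32,
    up_threshold: u32,
}

impl TierController {
    /// 3 reports to downgrade (2 on cellular), 10 to upgrade.
    pub fn new(cellular: bool) -> Self {
        Self {
            current: Tier::Good, // assumed starting tier
            worse_streak: 0,
            better_streak: 0,
            down_threshold: if cellular { 2 } else { 3 },
            up_threshold: 10,
        }
    }

    /// Feeds the tier suggested by the latest report; returns the active tier.
    pub fn observe(&mut self, suggested: Tier) -> Tier {
        match suggested.cmp(&self.current) {
            Ordering::Less => {
                self.better_streak = 0;
                self.worse_streak += 1;
                if self.worse_streak >= self.down_threshold {
                    self.worse_streak = 0;
                    self.current = match self.current {
                        Tier::Good => Tier::Degraded,
                        _ => Tier::Catastrophic,
                    };
                }
            }
            Ordering::Greater => {
                self.worse_streak = 0;
                self.better_streak += 1;
                if self.better_streak >= self.up_threshold {
                    self.better_streak = 0;
                    self.current = match self.current {
                        Tier::Catastrophic => Tier::Degraded,
                        _ => Tier::Good,
                    };
                }
            }
            Ordering::Equal => {
                self.worse_streak = 0;
                self.better_streak = 0;
            }
        }
        self.current
    }
}
```

The asymmetric thresholds make the controller quick to back off and slow to climb, which avoids flapping between codecs on a marginal link.
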
## NAT traversal (Phase 8)
- Candidate types: Host, Port-mapped (NAT-PMP / PCP / UPnP), Server-reflexive (STUN), Relay.
- Hard-NAT port prediction with `classify_port_allocation()` → `predict_ports()` → `HardNatProbe` signal.
- Mid-call re-gather: `CandidateUpdate { generation }`.