Audit: - docs/AUDIT-2026-05-25.md: full protocol audit covering 8 findings (4 critical, 2 high, 5 medium, 4 low) with code references and fix effort estimates - vault/Audit/Tasks.md: Obsidian Tasks plugin file tracking all audit items with priorities, due dates, and per-step checklists Architecture docs updated for Wire format v2 and Wave 5/6 features: - ARCHITECTURE.md: adds wzp-video to dependency graph and project structure; wire format updated to v2 (16B header, 5B MiniHeader); relay concurrency section corrected (DashMap+RwLock is current, not a future optimization); test count 571→702; Android note - PROGRESS.md: Wave 5 and Wave 6 sections appended; test count 372→702; current status and open blockers as of 2026-05-25 - ROAD-TO-VIDEO.md: implementation status table inserted (✅/🟡/🔴/🔲 per phase); 6-step critical path to first video call - WZP-SPEC.md: MediaHeader updated to v2 (16B byte-aligned); MiniHeader updated to 5B with seq_delta; codec IDs 9-12 added (H.264/H.265/AV1); version negotiation section added Obsidian vault (vault/): - 114 files across Architecture/, PRDs/, Reports/, Android/, Reference/, Audit/ with YAML frontmatter - 00 - Home.md index note with wiki links - .obsidian/app.json config Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
114 lines
7.5 KiB
Markdown
114 lines
7.5 KiB
Markdown
---
|
||
tags: [architecture, wzp]
|
||
type: architecture
|
||
---
|
||
|
||
# WZP Protocol Audit
|
||
|
||
> Protocol-level review of WZP as of 2026-05-11. See `WZP-SPEC.md` for the spec being audited.
|
||
|
||
## Strengths
|
||
|
||
- **QUIC datagrams instead of raw UDP + SRTP** — buys TLS 1.3, PLPMTUD, path migration, and ACK-based loss/RTT estimation. Quinn's `PathSnapshot` feeding `DredTuner` is something WebRTC stacks build from scratch.
|
||
- **Continuous DRED tuning.** Mapping RTT / loss / jitter to a continuous Opus DRED lookback window is genuinely better than discrete tiers — most stacks treat DRED as on/off.
|
||
- **MiniHeader (49/50).** At 50 pps that is ~400 B/s saved per stream; meaningful at scale.
|
||
- **SFU never decodes.** Preserves E2E. Most SFUs (LiveKit, Janus) terminate SRTP at the SFU.
|
||
- **RaptorQ for low-bitrate Codec2 + DRED for Opus.** Correct split — DRED is cheaper than FEC at high bitrate; RaptorQ shines when you can afford many small symbols.
|
||
|
||
## Weaknesses
|
||
|
||
### W1. `u16` sequence wraps every ~21 minutes at 50 pps
|
||
Anti-replay window is 64 packets so wrap is safe for replay. **But** the jitter buffer's `BTreeMap<u16, _>` will misorder across the wrap boundary if a packet is delayed more than ~32 k frames. Widen to `u32` (or version the field).
|
||
|
||
### W2. `fec_block_id: u8` wraps every 256 blocks (~25 s at 5-frame blocks)
|
||
A late-joining peer or a slow reconstructor can collide block IDs. Widen to `u16` or carry an epoch counter.
|
||
|
||
### W3. `timestamp_ms` rebase behavior at rekey is unspecified
|
||
Rekey every 65,536 packets (~22 min). If `timestamp_ms` resets, downstream sync glitches. If it does not, document explicitly.
|
||
|
||
### W4. `MiniHeader` has no `seq`
|
||
Receiver infers absolute seq from the most recent full header + frame count. One missed full header (every 50 frames = 1 s) leaves 49 packets with unknown absolute seq. Acceptable for audio with short jitter buffers — **fatal for video** where one missed full header can desync an entire GOP. **Add `seq_delta: u8` to MiniHeader before video lands.**
|
||
|
||
### W5. `QualityReport` placement vs. AEAD
|
||
A 4-byte trailer on encrypted media is fine **iff it sits inside the AEAD payload**. If it is outside, anything stripping the last 4 bytes corrupts decryption and creates a downgrade vector. Verify in `packet.rs`; if outside, move it inside or AAD-bind it.
|
||
|
||
### W6. Adaptive controller is loss / RTT-only — no bandwidth estimator
|
||
Quinn exposes `cwnd` and `bytes_in_flight`, but `AdaptiveQualityController` does not consume them. Under low utilization you cannot detect that you *could* upgrade to Opus 64 k. **For video this is mandatory** — without BWE you will either oscillate or never use available capacity.
|
||
|
||
### W7. No NACK / explicit retransmit path
|
||
For audio with DRED + FEC this is fine. For video keyframes it is wasteful — an I-frame is 50–200 packets, protecting at 50 % FEC doubles bitrate. A NACK path is cheap and far cheaper than blanket FEC for I-frames.
|
||
|
||
### W8. TrunkFrame batching multiplies AEAD cost
|
||
Each inner payload is its own AEAD operation. At 10 entries that is 10× ChaCha calls per recv. Fine on x86 / ARM with AES-NI / NEON; profile on weak Android (Nothing A059 baseline).
|
||
|
||
### W9. `CodecID` is 4 bits → max 16 codecs; 9 already used
|
||
Adding H.264, H.265, AV1, VP9 takes you to 13. Land the widening **before** deployment — either steal from `reserved` / `csrc_count` to make CodecID 8-bit, or split into `MediaType:2 / CodecID:6`. Doing this post-deployment is painful.
|
||
|
||
### W10. No `MediaType` field
|
||
Audio vs. video vs. data is implicit in CodecID. A 2-bit `MediaType` lets the SFU apply per-type policy (drop video first under congestion, prioritize audio fan-out) without a codec lookup.
|
||
|
||
### W11. Anti-replay window 64 packets is tight for video
|
||
One keyframe burst can be 100+ packets; a single reordered earlier packet stalls the window. Bump to 256 or 1024 for video streams, or maintain a per-stream window.
|
||
|
||
### W12. `SignalMessage` has no version byte
|
||
Bincode + `#[serde(default, skip_serializing_if)]` covers field additions but not variant removal or semantic change. Lead every variant with `version: u8`.
|
||
|
||
### W13. RoomManager Mutex per-packet — **RESOLVED**
|
||
Already flagged in `ARCHITECTURE.md`. At ~1500 pps/sender for video this becomes a real ceiling.
|
||
|
||
**Resolution (T3.1):** `RoomManager` now stores `DashMap<String, Arc<RwLock<Room>>>` instead of `DashMap<String, Room>`. The DashMap guard is held only long enough to clone the `Arc`; all per-room operations (fan-out `others()`, quality `observe_quality()`, join/leave) then acquire the room-level `std::sync::RwLock`. This lets concurrent `others()` calls share a read lock while writers hold the write lock, eliminating the per-packet DashMap contention that was the original concern.
|
||
|
||
### W14. No receiver → sender congestion feedback beyond inline QualityReport
|
||
For video you need REMB-style or transport-CC-style explicit BWE feedback at ~50 ms cadence, independent of media packets.
|
||
|
||
## Priorities
|
||
|
||
| Priority | Issue | Why |
|
||
|---|---|---|
|
||
| P0 | W9 (CodecID width), W10 (MediaType), W4 (MiniHeader seq_delta) | Wire-format changes — must land before video, painful to change post-deploy |
|
||
| P0 | W1 (seq u16 → u32) | Same window; audio benefits too |
|
||
| P1 | W6 (BWE), W14 (transport feedback) | Blocking for usable video; improves audio adaptation |
|
||
| P1 | W5 (QualityReport in AEAD) | Security correctness |
|
||
| P2 | W2 (fec_block_id width), W11 (anti-replay window), W12 (signal version byte) | Long-tail correctness |
|
||
| P2 | W7 (NACK path), W13 (RoomManager lock) | Video performance, not correctness |
|
||
| P3 | W3 (timestamp rebase doc), W8 (AEAD profiling) | Documentation / measurement |
|
||
|
||
## Resolution status (2026-05-11)
|
||
|
||
The v2 wire format specified in `ROAD-TO-VIDEO.md` Phase V1 addresses:
|
||
|
||
| Issue | Resolved by |
|
||
|---|---|
|
||
| W1 (seq u16 → u32) | `sequence: u32` in MediaHeader v2 |
|
||
| W4 (MiniHeader seq) | `seq_delta: u8` added; MiniHeader v2 is 5 B |
|
||
| W9 (CodecID width) | Widened to 8-bit (room for 256) |
|
||
| W10 (MediaType) | Explicit `media_type: u8` byte |
|
||
|
||
W6 / W14 (BWE + TransportFeedback) addressed in Phase V2. W7 (NACK) addressed in Phase V2 / V4. Others remain open.
|
||
|
||
## Known pre-existing clippy debt (as of T1.5.2)
|
||
|
||
Measured at commit `c93d302` on `experimental-ui` (2026-05-11).
|
||
|
||
`cargo clippy --workspace --all-targets -- -D warnings` fails in two crates with **pre-existing** errors (verified against `HEAD~1`). These are not introduced by any Wave 1 task; they should be cleaned up in a dedicated hygiene sprint or accepted as known debt.
|
||
|
||
### `wzp-codec` — 9 errors
|
||
|
||
| Category | Count | Lint | Files |
|
||
|---|---|---|---|
|
||
| Manual saturating sub | 1 | `clippy::implicit_saturating_sub` | `aec.rs:117` |
|
||
| Needless range loop | 2 | `clippy::needless_range_loop` | `aec.rs:164`, `resample.rs:51` |
|
||
| Manual `div_ceil` | 2 | `clippy::manual_div_ceil` | `codec2_dec.rs:48`, `codec2_enc.rs:48` |
|
||
| Manual `clamp` | 2 | `clippy::manual_clamp` | `denoise.rs:59`, `opus_enc.rs:250` |
|
||
| Manual ASCII case-cmp | 1 | `clippy::manual_ascii_check` | `opus_enc.rs:99` |
|
||
| Same-item push in loop | 1 | `clippy::same_item_push` | `resample.rs:184` |
|
||
|
||
### `warzone-protocol` (submodule `deps/featherchat`) — 3 errors
|
||
|
||
| Category | Count | Lint | Files |
|
||
|---|---|---|---|
|
||
| `clone` on `Copy` type | 1 | `clippy::clone_on_copy` | `ratchet.rs:202` |
|
||
| Missing `Default` impl | 2 | `clippy::new_without_default` | `types.rs:59`, `types.rs:69` |
|
||
|
||
**Policy:** New tasks must not add *new* clippy errors in crates they touch. The 12 errors above are grandfathered; a follow-up cleanup task should be scheduled to fix them (especially the `wzp-codec` ones, which are straightforward mechanical replacements).
|