---
tags: [audit, wzp]
type: audit
created: 2026-05-25
---

# WarzonePhone Protocol Audit — 2026-05-25

**Auditor:** Claude Sonnet 4.6 (assisted)
**Branch:** `experimental-ui` @ `f3e3ee5`
**Scope:** All workspace crates (`wzp-proto`, `wzp-codec`, `wzp-fec`, `wzp-crypto`, `wzp-transport`, `wzp-relay`, `wzp-client`, `wzp-android`, `wzp-native`, `wzp-video`)
**Test baseline:** 702 passing (excludes `wzp-android`)

---

## Executive Summary

The audio call path is functionally correct and cryptographically sound on clean network paths. **There is a session-breaking bug in the crypto nonce derivation (C1) that will cause a permanent decryption failure on any out-of-order or dropped UDP datagram.** This is the single highest-priority fix: under normal internet conditions it will manifest as periodic session failures. Video has a solid architectural foundation, but three hard blockers remain before shipping: the AEAD coverage gap (C2), the dead video scorer (C3), and the Android MediaCodec compile failure (C4).

The project is in good shape overall. The crypto design (X25519, HKDF, ChaCha20-Poly1305, Ed25519 identity, SAS verification) is sound. The SFU-never-decrypts architecture is rare and valuable. The codec adaptation (Opus DRED + Codec2 RaptorQ split) is genuinely innovative. Of the eight issues in the Fix Priority Table, items 1–7 add up to roughly 9–10 engineer-hours by the per-item estimates below; M2 (bandwidth estimation) is a separate 2–3 day effort.

---

## Critical

### C1 — Nonce derives from `recv_seq` counter, not `MediaHeader.seq`

**File:** `crates/wzp-crypto/src/session.rs:132`
**Severity:** Critical — session-breaking on any packet reorder

```rust
// decrypt()
let nonce_bytes = nonce::build_nonce(&self.session_id, self.recv_seq, Direction::Send);
// ...
self.recv_seq = self.recv_seq.wrapping_add(1); // line 148
```

`recv_seq` increments once per successful `decrypt()` call. The sender's `send_seq` also increments once per `encrypt()` call (line 120). In perfect in-order delivery they stay synchronized. With any reorder or mid-stream packet loss they permanently diverge. Once diverged, every subsequent packet uses the wrong nonce → AEAD tag mismatch → every packet fails for the rest of the session.

This isn't a low-probability edge case. UDP over any internet path reorders packets routinely. The `multiple_packets_roundtrip` test (line 254) only exercises in-order delivery. HANDOFF-2026-05-12.md acknowledges this as a known latent item: *"AEAD nonce derivation: switch to `MediaHeader::seq`"*.

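
The divergence described above can be reproduced with a toy model. The sketch below is not the `wzp-crypto` code: `toy_nonce`, `Packet`, `send_all`, and both receiver functions are hypothetical stand-ins, and "decryption" is reduced to checking that the receiver derives the same nonce the sender used. It contrasts a receiver driven by a local counter with one driven by the sequence number carried in the header.

```rust
/// Toy stand-in for nonce derivation (hypothetical; not `nonce::build_nonce`).
fn toy_nonce(session_id: u32, seq: u32) -> u64 {
    (u64::from(session_id) << 32) | u64::from(seq)
}

/// A packet as seen on the wire: the header carries the sender's seq.
#[derive(Clone, Copy)]
struct Packet {
    header_seq: u32,
    nonce_used: u64,
}

/// Sender side: one packet per seq, nonce derived from that seq.
fn send_all(session_id: u32, count: u32) -> Vec<Packet> {
    (0..count)
        .map(|seq| Packet {
            header_seq: seq,
            nonce_used: toy_nonce(session_id, seq),
        })
        .collect()
}

/// Receiver modeled on the C1 bug: the nonce comes from a local counter
/// that advances only after a successful "decrypt".
/// Returns the number of packets accepted.
fn decrypt_with_counter(session_id: u32, arrivals: &[Packet]) -> usize {
    let mut recv_seq: u32 = 0;
    let mut accepted = 0;
    for pkt in arrivals {
        if toy_nonce(session_id, recv_seq) == pkt.nonce_used {
            accepted += 1;
            recv_seq = recv_seq.wrapping_add(1);
        }
    }
    accepted
}

/// Receiver modeled on the proposed fix: the nonce comes from the header,
/// so arrival order and gaps do not matter.
fn decrypt_with_header_seq(session_id: u32, arrivals: &[Packet]) -> usize {
    arrivals
        .iter()
        .filter(|pkt| toy_nonce(session_id, pkt.header_seq) == pkt.nonce_used)
        .count()
}
```

With arrival order 0, 2, 1, 3, 4, 5 the counter-based receiver accepts only two packets and then rejects everything that follows, while the header-based receiver accepts all six. Dropping packet 2 entirely produces the same stall in the counter-based model.
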
The anti-replay check at lines 152–161 already parses `MediaHeader` and has `header.seq` available. The fix is one line in `decrypt()`:

```rust
// Use sender's wire-level seq as nonce input, not a local counter.
// This survives reordering because both sides derive the same nonce from
// the same field. recv_seq was wrong: it diverged from send_seq on any
// reorder, breaking all subsequent decryptions for the session.
let header = parse_header(header_bytes)
    .ok_or_else(|| CryptoError::Internal("header parse failed".into()))?;
let nonce_bytes = nonce::build_nonce(&self.session_id, header.seq, Direction::Send);
```

Remove `recv_seq` field from `ChaChaSession` (it's now redundant — anti-replay uses `header.seq` directly). On the encrypt side, verify that `self.send_seq` equals the `seq` written into the `MediaHeader` at the call site.

**Estimated effort:** ~1 hour including test coverage for out-of-order delivery.

> **Note on rekey seq reset:** The agent initially flagged `send_seq/recv_seq = 0` in `complete_rekey()` as a separate critical issue. This is a false positive — `install_key()` rotates `session_id` (hash of new key), so pre-/post-rekey nonces live in distinct namespaces. The reset is intentional and cryptographically safe.

---

### C2 — AEAD not wired to every QUIC datagram send path

**File:** `crates/wzp-client/src/analyzer.rs:363` (only confirmed decrypt call site)
**Severity:** Critical — potential plaintext media leakage

The HANDOFF document explicitly flags this: *"Encryption is implemented in `wzp-crypto` but not yet on every QUIC datagram path."* The `analyzer.rs` path decrypts inbound packets. What needs verification: every outbound `send_datagram()` / `write_datagram()` call across `wzp-client` and `wzp-transport` must pass through `ChaChaSession::encrypt()`.

**Required action:** Grep every `send_datagram` and `write_datagram` call site. Confirm each path encrypts before transmit. Add a CI-level test, or a compile-time guard in the spirit of a `#[forbid(...)]` lint, that makes a plaintext send path impossible to merge. Until this is verified, the E2E security claim cannot be made.

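
One way to make a plaintext send path hard to merge is to encode the requirement in the type system. The following is a minimal sketch under stated assumptions: `EncryptedDatagram`, `ToySession`, and `GuardedTransport` are hypothetical names, and the XOR transform is only a placeholder standing in for `ChaChaSession::encrypt()`; it is not encryption. The point is the shape: the transport accepts a type that only the crypto module can construct.

```rust
mod crypto {
    /// Ciphertext wrapper. The inner field is private to this module, so
    /// code elsewhere can only obtain one through `ToySession::encrypt`.
    pub struct EncryptedDatagram(Vec<u8>);

    impl EncryptedDatagram {
        pub fn into_bytes(self) -> Vec<u8> {
            self.0
        }
    }

    /// Hypothetical stand-in for the real session type.
    pub struct ToySession {
        key: u8,
    }

    impl ToySession {
        pub fn new(key: u8) -> Self {
            Self { key }
        }

        /// Placeholder transform (XOR). NOT real encryption; it only marks
        /// where the real AEAD call would sit.
        pub fn encrypt(&self, plaintext: &[u8]) -> EncryptedDatagram {
            EncryptedDatagram(plaintext.iter().map(|b| b ^ self.key).collect())
        }
    }
}

use crypto::EncryptedDatagram;

/// Hypothetical transport wrapper: there is no overload taking `&[u8]`,
/// so a raw payload cannot be passed by accident.
#[derive(Default)]
pub struct GuardedTransport {
    sent: Vec<Vec<u8>>,
}

impl GuardedTransport {
    pub fn send_datagram(&mut self, datagram: EncryptedDatagram) {
        self.sent.push(datagram.into_bytes());
    }

    /// Everything handed to the transport so far (for inspection in tests).
    pub fn sent(&self) -> &[Vec<u8>] {
        &self.sent
    }
}
```

Under this design, a call such as `transport.send_datagram(raw_bytes)` fails to compile, which turns the manual grep into a check the compiler repeats on every build. Whether the real transport layer can adopt such a wrapper depends on call sites not covered here.
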
**Estimated effort:** ~1 hour audit + test.

---

### C3 — `VideoScorer::observe()` never called — scorer is dead code

**File:** `crates/wzp-relay/src/room.rs:1263–1266`
**Severity:** Critical — relay abuse control for video is completely absent

```rust
// T6.2-follow-up: feed video packets to VideoScorer here.
// video_scorer.observe(&pkt.header, pkt.payload.len(), now, bwe_kbps);
```

`video_scorer.rs` was delivered in T6.2 with legitimacy scoring, keyframe regularity checks, I/P ratio analysis, and a verdict enum. The observe call was never wired into the packet forwarding loop. The scorer compiles but accumulates no data. Any participant can flood the room with malformed video or synthetic keyframe bursts and the relay will forward everything without challenge.

**Fix:** Wire `video_scorer.observe(...)` at the TODO marker and integrate `legitimacy_score()` into the forwarding decision (drop or rate-limit streams with `Verdict::Malicious`). Add an integration test: synthetic high-frequency keyframe bursts should trigger a `Malicious` verdict within 2 seconds.

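
The wiring described in the fix can be sketched with a heavily simplified scorer. Everything here is hypothetical: `ToyVideoScorer` only counts keyframes in a sliding window, its `observe` signature is not the real one from `video_scorer.rs`, and the thresholds are illustrative. It shows the intended control flow, which is to observe first and then let the verdict gate forwarding.

```rust
use std::time::Duration;

#[derive(Debug, PartialEq, Eq, Clone, Copy)]
pub enum Verdict {
    Legitimate,
    Malicious,
}

/// Hypothetical stand-in for the real scorer: flags a stream when it sends
/// more than `max_keyframes` keyframes within `window`.
pub struct ToyVideoScorer {
    window: Duration,
    max_keyframes: usize,
    keyframe_times: Vec<Duration>, // time since stream start
}

impl ToyVideoScorer {
    pub fn new(window: Duration, max_keyframes: usize) -> Self {
        Self { window, max_keyframes, keyframe_times: Vec::new() }
    }

    /// Record one packet and expire keyframes that fell out of the window.
    pub fn observe(&mut self, is_keyframe: bool, now: Duration) {
        if is_keyframe {
            self.keyframe_times.push(now);
        }
        let cutoff = now.saturating_sub(self.window);
        self.keyframe_times.retain(|&t| t >= cutoff);
    }

    pub fn verdict(&self) -> Verdict {
        if self.keyframe_times.len() > self.max_keyframes {
            Verdict::Malicious
        } else {
            Verdict::Legitimate
        }
    }
}

/// Forwarding decision for one packet: observe, then gate on the verdict.
pub fn should_forward(scorer: &mut ToyVideoScorer, is_keyframe: bool, now: Duration) -> bool {
    scorer.observe(is_keyframe, now);
    scorer.verdict() != Verdict::Malicious
}
```

With a 2-second window and a limit of four keyframes, a burst of one keyframe every 100 ms is cut off at the fifth packet, which is the behavior the proposed integration test would assert against the real scorer.
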
**Estimated effort:** ~2 hours.

---

### C4 — `wzp-video` Android target fails to compile (31 errors)

**File:** `crates/wzp-video/src/mediacodec.rs`
**Severity:** Critical — Android video is completely blocked

Five error categories from the NDK 0.9 API migration, all documented in HANDOFF-2026-05-12.md. `dav1d`/`svt-av1` were cfg-gated off Android in `f3e3ee5`; these 31 errors are the remaining MediaCodec API mismatch.

| Error | Count | Root cause | Fix |
|---|---|---|---|
| `E0277` `NonNull<AMediaCodec>` not `Send` | ~3 | Raw pointer held across `tokio::spawn` boundary | `struct SendMediaCodec(NonNull<…>); unsafe impl Send for SendMediaCodec {}` — or use `ndk::media::MediaCodec` owned type (already `Send`) |
| `E0308` `&[MaybeUninit<u8>]` vs `&[u8]` | many | NDK 0.9 returns uninit slices | `MaybeUninit::write_slice` or transmute pattern |
| `E0425` missing `BITRATE_MODE_CBR` | 1+ | Constant renamed in NDK 0.9 | Check `ndk` crate docs for current name |
| `E0433` `ndk_sys` not a dep | several | Direct `ndk_sys` import; only `ndk = "0.9"` declared | Add `ndk-sys` as explicit dep or use safe `ndk` wrappers |
| `E0599` `InputBuffer::index()` / `OutputBuffer::index()` private | 2 | API changed in NDK 0.9 | Use buffer through safe queue/dequeue API |

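
The first two rows of the table describe patterns that can be shown without the NDK. The sketch below uses only `std`; `SendHandle` and `fill_uninit` are hypothetical names, and the generic `T` stands in for `AMediaCodec`. It assumes the wrapped object really is safe to move between threads, which is a property of the underlying library that this sketch cannot establish.

```rust
use std::mem::MaybeUninit;
use std::ptr::NonNull;

/// Hypothetical newtype around a raw handle so it can cross a spawn boundary.
pub struct SendHandle<T>(NonNull<T>);

// SAFETY (assumption the caller must uphold): the pointee may be moved to
// another thread and is only ever used from one thread at a time.
unsafe impl<T> Send for SendHandle<T> {}

impl<T> SendHandle<T> {
    pub fn new(ptr: NonNull<T>) -> Self {
        Self(ptr)
    }

    /// Access through a method so closures capture the whole wrapper,
    /// not the non-`Send` field inside it.
    pub fn as_ptr(&self) -> *mut T {
        self.0.as_ptr()
    }
}

/// Copy `src` into a possibly uninitialized buffer using stable APIs only.
/// Returns how many bytes were written (the shorter of the two lengths).
pub fn fill_uninit(dst: &mut [MaybeUninit<u8>], src: &[u8]) -> usize {
    let n = dst.len().min(src.len());
    for (slot, &byte) in dst[..n].iter_mut().zip(&src[..n]) {
        slot.write(byte);
    }
    n
}
```

The per-element `MaybeUninit::write` loop avoids both a transmute and any dependence on slice helpers whose availability varies by toolchain. Only the first `n` slots are initialized afterwards, so readers must not assume the rest of the buffer is valid.
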
Nothing live is blocked today — `wzp-video` is not yet consumed by Tauri Android. But video on Android cannot progress until this compiles.

**Reproduce:**
```bash
ssh -i ~/CascadeProjects/wzp manwe@manwehs \
  'cd ~/wzp-builder/data/source && \
   docker run --rm \
     -v ~/wzp-builder/data/source:/build/source \
     -v ~/wzp-builder/data/cache/cargo-registry:/home/builder/.cargo/registry \
     -v ~/wzp-builder/data/cache/cargo-git:/home/builder/.cargo/git \
     -v ~/wzp-builder/data/cache/target:/build/source/target \
     wzp-android-builder:latest \
     bash -c "cd /build/source && cargo build --target aarch64-linux-android -p wzp-video 2>&1 | tail -60"'
```

**Estimated effort:** ~2 hours (one commit per error category).

---

## High

### H1 — AV1 call engine wiring missing

**Source:** HANDOFF-2026-05-12.md (T6.1.2 open item)
**File:** `crates/wzp-video/src/factory.rs`

`factory.rs` and step tables landed in commit `086d0a4`. No caller yet invokes `create_video_encoder(Av1Main, ...)`. The entire AV1 path is reachable only from tests. Video on macOS/Linux desktop requires wiring `create_video_encoder` into the call engine's media negotiation path.

**Estimated effort:** ~1–2 hours.

---

### H2 — `fec_block_id: u8` wraps every ~25 seconds

**File:** `crates/wzp-fec/src/encoder.rs` (`block_id.wrapping_add(1)` on u8)
**Reference:** PROTOCOL-AUDIT.md W2 (deferred P2)

At 5 frames/block (Codec2), u8 ID wraps at block 256 ≈ 25 seconds. A slow reconstructor or late-joining peer will collide block IDs with in-flight blocks. The window distance check in `block_manager.rs` partially mitigates this but can't prevent all collisions. Widen to `u16` in the next wire-format revision.

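
The wrap interval quoted above follows from simple arithmetic. The sketch below assumes 20 ms frames, which is inferred from the ~25 second figure and not stated in the source; `seconds_until_wrap` and `next_block_id` are hypothetical helpers. Under the same assumption, a `u16` ID would wrap after roughly 6,554 seconds (about 109 minutes).

```rust
/// Seconds until a block-ID counter with `id_bits` bits wraps around.
/// `id_bits` must be below 64.
pub fn seconds_until_wrap(id_bits: u32, frames_per_block: u32, frame_ms: u32) -> f64 {
    let blocks = 1u64 << id_bits;
    let total_ms = blocks * u64::from(frames_per_block) * u64::from(frame_ms);
    total_ms as f64 / 1000.0
}

/// The increment as described for the encoder: a wrapping add on `u8`.
pub fn next_block_id(id: u8) -> u8 {
    id.wrapping_add(1)
}

/// Wrap interval for the current `u8` ID at 5 frames/block, 20 ms frames.
pub fn current_wrap_seconds() -> f64 {
    seconds_until_wrap(8, 5, 20) // 256 * 5 * 20 ms = 25.6 s
}

/// Wrap interval if the ID were widened to `u16`, same framing assumptions.
pub fn widened_wrap_seconds() -> f64 {
    seconds_until_wrap(16, 5, 20) // 65536 * 5 * 20 ms = 6553.6 s
}
```
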

---

## Medium

### M1 — `SignalMessage` has no version byte

**File:** `crates/wzp-proto/src/session.rs` (SignalMessage enum)
**Reference:** PROTOCOL-AUDIT.md W12

`bincode + serde(default)` handles field additions but not variant removal or semantic changes. Any variant deprecation is silent at the wire level. This becomes a correctness risk when federation routes `SignalMessage`s across relay versions. Add `version: u8` as a leading field to all variants before federation ships.

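
As an illustration of explicit wire versioning, the sketch below prefixes the serialized message with a single version byte. This is a frame-level variation on the per-variant `version: u8` field proposed above, not the same design, and it treats the serialized body as opaque bytes instead of calling `bincode`. `SIGNAL_WIRE_VERSION`, `encode_versioned`, `decode_versioned`, and `DecodeError` are hypothetical names.

```rust
/// Hypothetical current wire version for signal frames.
pub const SIGNAL_WIRE_VERSION: u8 = 1;

#[derive(Debug, PartialEq, Eq)]
pub enum DecodeError {
    /// The frame had no bytes at all.
    Empty,
    /// The leading byte named a version this build does not understand.
    UnsupportedVersion(u8),
}

/// Prepend the version byte to an already-serialized message body.
pub fn encode_versioned(body: &[u8]) -> Vec<u8> {
    let mut frame = Vec::with_capacity(1 + body.len());
    frame.push(SIGNAL_WIRE_VERSION);
    frame.extend_from_slice(body);
    frame
}

/// Check the version byte and return the body, or fail loudly so that a
/// version mismatch is never silently misparsed.
pub fn decode_versioned(frame: &[u8]) -> Result<&[u8], DecodeError> {
    match frame.split_first() {
        None => Err(DecodeError::Empty),
        Some((&version, body)) if version == SIGNAL_WIRE_VERSION => Ok(body),
        Some((&version, _)) => Err(DecodeError::UnsupportedVersion(version)),
    }
}
```
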

---

### M2 — BWE not consumed by `AdaptiveQualityController`

**Reference:** PROTOCOL-AUDIT.md W6, deferred to Phase V2

Quinn exposes `cwnd` and `bytes_in_flight`, but `AdaptiveQualityController` does not consume them. Loss + RTT adaptation works for audio. For video, without bandwidth estimation the encoder cannot detect available uplink capacity and will either oscillate or permanently under-utilize bandwidth. Mandatory before video production.

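
A first-order estimate can be derived from the congestion window and RTT alone: at most one window of data is delivered per round trip. The sketch below is a rough illustration of that rule of thumb, not the controller's design; `estimate_kbps` and `video_budget_kbps` are hypothetical helpers, the 80% headroom factor is an assumption, and how the inputs are read from Quinn is left out.

```rust
use std::time::Duration;

/// Rough uplink estimate in kbit/s: one congestion window per round trip.
/// Returns `None` when the RTT is zero (no sample yet).
pub fn estimate_kbps(cwnd_bytes: u64, rtt: Duration) -> Option<u64> {
    let rtt_us = rtt.as_micros();
    if rtt_us == 0 {
        return None;
    }
    // bits per millisecond equals kbit/s: (cwnd * 8) / (rtt_us / 1000)
    Some((u128::from(cwnd_bytes) * 8 * 1000 / rtt_us) as u64)
}

/// Video bitrate budget: the estimate minus an audio reservation, scaled
/// to 80% to leave headroom (illustrative factor, not a tuned value).
pub fn video_budget_kbps(estimate_kbps: u64, audio_reserve_kbps: u64) -> u64 {
    estimate_kbps.saturating_sub(audio_reserve_kbps) * 8 / 10
}
```
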

---

### M3 — PLI suppression window hardcoded at 200ms

**File:** `crates/wzp-relay/src/room.rs:1060`

The window is not adaptive to link speed. On slow links, where delivering a keyframe can take longer than 200 ms, the window may expire before the keyframe arrives and let repeated keyframe requests through. Accept for Phase 1; make configurable in Phase 2.

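
A configurable window could scale with the measured round-trip time while keeping the current 200 ms as a floor. The sketch below is one possible shape, assuming a "twice the RTT" rule of thumb that does not come from the source; `pli_suppression_window` is a hypothetical helper.

```rust
use std::time::Duration;

/// Suppression window derived from RTT and clamped to configured bounds.
/// Panics if `floor > ceiling`, which is the behavior of `Ord::clamp`.
pub fn pli_suppression_window(rtt: Duration, floor: Duration, ceiling: Duration) -> Duration {
    // Assumption: two round trips is long enough for a request to reach the
    // sender and for the start of the keyframe to come back.
    (rtt * 2).clamp(floor, ceiling)
}

/// Convenience wrapper using the current hardcoded 200 ms as the floor and
/// an illustrative 2 s ceiling.
pub fn default_pli_window(rtt: Duration) -> Duration {
    pli_suppression_window(rtt, Duration::from_millis(200), Duration::from_secs(2))
}
```
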

---

### M4 — Repair packet index wrapping in FEC encoder

**File:** `crates/wzp-fec/src/encoder.rs:140`

```rust
let idx = (num_source as u8).wrapping_add(i as u8);
```

If `num_source + repair_count > 255`, indices wrap silently. In practice bounded by `frames_per_block` (5–10), so max sum is ~20. Low risk today; widen to u16 when `fec_block_id` is widened (H2).

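
Until the field is widened, the silent wrap can be turned into an explicit failure. The sketch below shows a checked variant next to the wrapping expression for comparison; `repair_index` and `wrapping_repair_index` are hypothetical helpers, and how the encoder should react to `None` is left open.

```rust
use std::convert::TryFrom;

/// Checked index computation: `None` when the sum does not fit in a `u8`.
pub fn repair_index(num_source: usize, i: usize) -> Option<u8> {
    num_source
        .checked_add(i)
        .and_then(|sum| u8::try_from(sum).ok())
}

/// The wrapping form from the encoder, kept only to show what it yields
/// once the sum exceeds 255.
pub fn wrapping_repair_index(num_source: usize, i: usize) -> u8 {
    (num_source as u8).wrapping_add(i as u8)
}
```
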

---

### M5 — `timestamp_ms` monotonicity after rekey not enforced

**Reference:** PROTOCOL-AUDIT.md W3

Spec: `timestamp_ms` must not reset on rekey. The code correctly does not reset it, but there is no assertion to prevent regression. Add a debug assert in `complete_rekey()` that `new_session.next_timestamp >= old_session.last_timestamp`.

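
The suggested assertion might look like the following. This is a sketch with hypothetical types: `ToyRekeySession` and its fields are not the real `ChaChaSession` layout, and the `u64` timestamp type is an assumption. It shows the sequence counter resetting on rekey while the media timestamp carries over, with a `debug_assert!` guarding against a future regression.

```rust
/// Hypothetical session state, reduced to the fields relevant to M5.
#[derive(Debug, Clone, Copy)]
pub struct ToyRekeySession {
    pub key_epoch: u32,
    pub seq: u32,
    pub last_timestamp_ms: u64,
}

impl ToyRekeySession {
    /// Build the post-rekey state: a new key epoch and a fresh seq counter,
    /// but the media timestamp continues from where it was.
    pub fn complete_rekey(&self) -> ToyRekeySession {
        let next = ToyRekeySession {
            key_epoch: self.key_epoch + 1,
            seq: 0,
            last_timestamp_ms: self.last_timestamp_ms,
        };
        // Compiled out in release builds; catches regressions in tests.
        debug_assert!(
            next.last_timestamp_ms >= self.last_timestamp_ms,
            "timestamp_ms must not move backwards across a rekey"
        );
        next
    }
}
```
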

---

## Low / Accepted Debt

| ID | Description | File | Accepted in |
|---|---|---|---|
| L1 | 9 pre-existing clippy lints in `wzp-codec` | `aec.rs`, `denoise.rs`, `opus_enc.rs`, `codec2_{enc,dec}.rs`, `resample.rs` | PROTOCOL-AUDIT.md |
| L2 | 3 clippy errors in `deps/featherchat` submodule | `ratchet.rs`, `types.rs` | PROTOCOL-AUDIT.md |
| L3 | Audio anti-replay window 64 packets | `wzp-crypto/src/session.rs:89` | Accepted — jitter buffer + PLC masks loss |
| L4 | Debug tap logs at INFO with no rate limiting | `wzp-relay/src/room.rs:46–59` | Safe in dev; add 1:100 sampling for prod |

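
The 1:100 sampling suggested for L4 can be done with a shared atomic counter. The sketch below is one minimal way to do it, assuming nothing about the logging framework in use; `LogSampler` is a hypothetical type, and the caller wraps its existing log statement in `if sampler.should_log() { ... }`.

```rust
use std::sync::atomic::{AtomicU64, Ordering};

/// Lets through one call out of every `every` calls, starting with the first.
pub struct LogSampler {
    every: u64,
    counter: AtomicU64,
}

impl LogSampler {
    /// `every` of 0 or 1 disables sampling (every call is logged).
    pub const fn new(every: u64) -> Self {
        Self { every, counter: AtomicU64::new(0) }
    }

    /// Returns true when this call should emit a log line.
    /// Relaxed ordering is enough: only the count matters, not ordering
    /// relative to other memory operations.
    pub fn should_log(&self) -> bool {
        let n = self.counter.fetch_add(1, Ordering::Relaxed);
        self.every <= 1 || n % self.every == 0
    }
}
```
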

---

## What Was Not Found

These are explicitly confirmed sound after code-level verification:

- **Anti-replay bitmap** — correct u32 wrapping, per-stream isolation, window sizing by `MediaType`
- **HKDF + X25519 + Ed25519 key agreement** — standard construction, no gaps
- **SAS code derivation** — SHA-256(shared_secret)[:4] as 4-digit voice verification code
- **Rekey forward secrecy** — `session_id` rotation on rekey isolates nonce namespaces; seq counter reset is intentional and safe
- **MiniHeader v2 `seq_delta`** — fully implemented at `wzp-proto/src/packet.rs:469–526` with tests; PROTOCOL-AUDIT resolution table is accurate
- **SFU E2E preservation** — relay ciphertext passthrough, no plaintext access
- **RaptorQ for Codec2** — correct tool for the bitrate regime
- **DRED continuous tuning** — better than discrete tiers; 15% loss floor is empirically grounded
- **Jitter buffer** — BTreeMap with wrapping-aware comparisons, EWMA adaptive playout delay, solid
- **Quinn QUIC datagram transport** — correct primitives for unreliable media

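
For readers unfamiliar with the anti-replay item in the list above, the following sketch shows what a wrapping-aware sliding window over `u32` sequence numbers generally looks like. It is a generic illustration with a fixed 64-entry window, not the `wzp-crypto` implementation (which, per the list, sizes the window by `MediaType`); `ReplayWindow` is a hypothetical type.

```rust
/// Sliding anti-replay window over wrapping `u32` sequence numbers.
/// Bit `k` of `bitmap` records whether `highest - k` has been seen.
#[derive(Default)]
pub struct ReplayWindow {
    highest: u32,
    bitmap: u64,
    seen_any: bool,
}

impl ReplayWindow {
    /// Returns true and records `seq` if it is fresh; returns false if it
    /// is a replay or older than the 64-packet window.
    pub fn check_and_update(&mut self, seq: u32) -> bool {
        if !self.seen_any {
            self.seen_any = true;
            self.highest = seq;
            self.bitmap = 1;
            return true;
        }
        // Distance forward from `highest`, modulo 2^32. Values in the lower
        // half of the range count as "newer"; this is what survives wrap.
        let ahead = seq.wrapping_sub(self.highest);
        if ahead != 0 && ahead < (1u32 << 31) {
            self.bitmap = if ahead >= 64 { 0 } else { self.bitmap << ahead };
            self.bitmap |= 1;
            self.highest = seq;
            return true;
        }
        let behind = self.highest.wrapping_sub(seq);
        if behind >= 64 {
            return false; // too old to track
        }
        let mask = 1u64 << behind;
        if self.bitmap & mask != 0 {
            return false; // already seen
        }
        self.bitmap |= mask;
        true
    }
}
```
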

---

## Fix Priority Table

| # | Issue | Category | Effort | Blocks |
|---|---|---|---|---|
| 1 | C1: nonce → `MediaHeader.seq` | Crypto | 1h | All sessions on lossy paths |
| 2 | C2: verify AEAD on all datagram send paths | Crypto | 1h | E2E security claim |
| 3 | C3: wire `VideoScorer::observe()` into room | Relay | 2h | Relay abuse control for video |
| 4 | C4: NDK 0.9 `mediacodec.rs` migration (5 categories) | Android | 2h | Android video |
| 5 | H1: wire AV1 factory into call engine | Video | 1–2h | Desktop video |
| 6 | H2: widen `fec_block_id` to `u16` | FEC/Wire | 30min | Next protocol release |
| 7 | M1: `SignalMessage` version byte | Proto | 1h | Federation correctness |
| 8 | M2: BWE into `AdaptiveQualityController` | Transport | 2–3 days | Video production quality |

**Total for C1–H1 (items 1–5):** ~7–8 hours focused engineering.