Audit:
- docs/AUDIT-2026-05-25.md: full protocol audit covering 8 findings (4 critical, 2 high, 5 medium, 4 low) with code references and fix effort estimates
- vault/Audit/Tasks.md: Obsidian Tasks plugin file tracking all audit items with priorities, due dates, and per-step checklists

Architecture docs updated for Wire format v2 and Wave 5/6 features:
- ARCHITECTURE.md: adds wzp-video to dependency graph and project structure; wire format updated to v2 (16B header, 5B MiniHeader); relay concurrency section corrected (DashMap+RwLock is current, not a future optimization); test count 571→702; Android note
- PROGRESS.md: Wave 5 and Wave 6 sections appended; test count 372→702; current status and open blockers as of 2026-05-25
- ROAD-TO-VIDEO.md: implementation status table inserted (✅/🟡/🔴/🔲 per phase); 6-step critical path to first video call
- WZP-SPEC.md: MediaHeader updated to v2 (16B byte-aligned); MiniHeader updated to 5B with seq_delta; codec IDs 9-12 added (H.264/H.265/AV1); version negotiation section added

Obsidian vault (vault/):
- 114 files across Architecture/, PRDs/, Reports/, Android/, Reference/, Audit/ with YAML frontmatter
- 00 - Home.md index note with wiki links
- .obsidian/app.json config

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
WarzonePhone Protocol Audit — 2026-05-25
Auditor: Claude Sonnet 4.6 (assisted)
Branch: experimental-ui @ f3e3ee5
Scope: All workspace crates (wzp-proto, wzp-codec, wzp-fec, wzp-crypto, wzp-transport, wzp-relay, wzp-client, wzp-android, wzp-native, wzp-video)
Test baseline: 702 passing (excludes wzp-android)
Executive Summary
The audio call path is functionally correct and cryptographically sound on clean network paths. There is a session-breaking bug in the crypto nonce derivation (C1) that will cause a permanent decryption failure on any out-of-order UDP delivery. This is the single highest-priority fix — it will manifest as periodic session crashes under normal internet conditions. Video has a solid architectural foundation but three hard blockers remain before shipping: the AEAD coverage gap (C2), dead video scorer (C3), and Android MediaCodec compile failure (C4).
The project is in good shape overall. The crypto design (X25519, HKDF, ChaCha20-Poly1305, Ed25519 identity, SAS verification) is sound. The SFU-never-decrypts architecture is rare and valuable. The codec adaptation (Opus DRED + Codec2 RaptorQ split) is genuinely innovative. The eight issues below are fixable in ~12 engineer-hours.
Critical
C1 — Nonce derives from recv_seq counter, not MediaHeader.seq
File: crates/wzp-crypto/src/session.rs:132
Severity: Critical — session-breaking on any packet reorder
// decrypt()
let nonce_bytes = nonce::build_nonce(&self.session_id, self.recv_seq, Direction::Send);
// ...
self.recv_seq = self.recv_seq.wrapping_add(1); // line 148
recv_seq increments once per successful decrypt() call. The sender's send_seq also increments once per encrypt() call (line 120). In perfect in-order delivery they stay synchronized. With any reorder or mid-stream packet loss they permanently diverge. Once diverged, every subsequent packet uses the wrong nonce → AEAD tag mismatch → every packet fails for the rest of the session.
This isn't a low-probability edge case. UDP over any internet path reorders packets routinely. The multiple_packets_roundtrip test (line 254) only exercises in-order delivery. HANDOFF-2026-05-12.md acknowledges this as a known latent item: "AEAD nonce derivation: switch to MediaHeader::seq".
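The divergence can be reproduced without any real cryptography. The sketch below is a toy model, not the wzp-crypto code: `toy_nonce`, `Packet`, and both `accepted_with_*` helpers are hypothetical names, the 4-byte session id is an assumed layout, and a nonce mismatch stands in for an AEAD tag failure.

```rust
/// Toy nonce layout (an assumption for this sketch): 4-byte session id
/// followed by the 8-byte big-endian sequence number.
fn toy_nonce(session_id: [u8; 4], seq: u64) -> [u8; 12] {
    let mut nonce = [0u8; 12];
    nonce[..4].copy_from_slice(&session_id);
    nonce[4..].copy_from_slice(&seq.to_be_bytes());
    nonce
}

/// What the receiver sees: the wire-level seq and the nonce the sender used.
struct Packet {
    header_seq: u64,
    nonce_used: [u8; 12],
}

/// Sender side: send_seq is both the nonce input and the header seq.
fn send_stream(session_id: [u8; 4], count: u64) -> Vec<Packet> {
    (0..count)
        .map(|seq| Packet { header_seq: seq, nonce_used: toy_nonce(session_id, seq) })
        .collect()
}

/// Receiver that derives the nonce from a local counter which advances only
/// after a successful decrypt (the behaviour described for C1).
fn accepted_with_local_counter(session_id: [u8; 4], arrivals: &[Packet]) -> usize {
    let mut recv_seq: u64 = 0;
    let mut accepted = 0;
    for packet in arrivals {
        // A nonce mismatch models an AEAD tag failure.
        if toy_nonce(session_id, recv_seq) == packet.nonce_used {
            accepted += 1;
            recv_seq = recv_seq.wrapping_add(1);
        }
    }
    accepted
}

/// Receiver that derives the nonce from the seq carried in the header.
fn accepted_with_header_seq(session_id: [u8; 4], arrivals: &[Packet]) -> usize {
    arrivals
        .iter()
        .filter(|p| toy_nonce(session_id, p.header_seq) == p.nonce_used)
        .count()
}
```

Swapping two adjacent packets in a five-packet stream is enough: the counter-based receiver accepts the first packet and the late one, then rejects everything after, while the header-based receiver accepts all five.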
The anti-replay check at lines 152–161 already parses MediaHeader and has header.seq available. The fix is one line in decrypt():
// Use sender's wire-level seq as nonce input, not a local counter.
// This survives reordering because both sides derive the same nonce from
// the same field. recv_seq was wrong: it diverged from send_seq on any
// reorder, breaking all subsequent decryptions for the session.
let header = parse_header(header_bytes)
    .ok_or_else(|| CryptoError::Internal("header parse failed".into()))?;
let nonce_bytes = nonce::build_nonce(&self.session_id, header.seq, Direction::Send);
Remove recv_seq field from ChaChaSession (it's now redundant — anti-replay uses header.seq directly). On the encrypt side, verify that self.send_seq equals the seq written into the MediaHeader at the call site.
Estimated effort: ~1 hour including test coverage for out-of-order delivery.
Note on rekey seq reset: The agent initially flagged send_seq/recv_seq = 0 in complete_rekey() as a separate critical issue. This is a false positive — install_key() rotates session_id (hash of new key), so pre-/post-rekey nonces live in distinct namespaces. The reset is intentional and cryptographically safe.
C2 — AEAD not wired to every QUIC datagram send path
File: crates/wzp-client/src/analyzer.rs:363 (only confirmed decrypt call site)
Severity: Critical — potential plaintext media leakage
The HANDOFF document explicitly flags this: "Encryption is implemented in wzp-crypto but not yet on every QUIC datagram path." The analyzer.rs path decrypts inbound packets. What needs verification: every outbound send_datagram() / write_datagram() call across wzp-client and wzp-transport must pass through ChaChaSession::encrypt().
Required action: Grep every send_datagram call site. Confirm each path encrypts before transmit. Add a CI-level test or #[forbid(dead_code)]-style assertion that makes a plaintext send path impossible to merge. Until this is verified, the E2E security claim cannot be made.
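One way to make a plaintext send path fail at compile time, rather than relying on grep, is a sealed newtype. The sketch below is an illustration under assumptions: `Ciphertext`, `Session`, and this `send_datagram` wrapper are hypothetical names, not the wzp-client or wzp-transport API, and the XOR is a placeholder that is NOT encryption.

```rust
mod sealed {
    /// Bytes that have passed through `Session::encrypt`. The field is
    /// private, so code outside this module cannot wrap raw plaintext.
    pub struct Ciphertext(Vec<u8>);

    impl Ciphertext {
        pub fn as_bytes(&self) -> &[u8] {
            &self.0
        }
    }

    /// Placeholder session for the sketch.
    pub struct Session {
        pub key_byte: u8,
    }

    impl Session {
        /// Placeholder transform only: XOR is NOT encryption. A real
        /// implementation would call the AEAD here.
        pub fn encrypt(&self, plaintext: &[u8]) -> Ciphertext {
            Ciphertext(plaintext.iter().map(|b| b ^ self.key_byte).collect())
        }
    }
}

use sealed::{Ciphertext, Session};

/// Hypothetical send wrapper: it accepts only `Ciphertext`, so passing a
/// `Vec<u8>` or `&[u8]` of plaintext is a type error.
fn send_datagram(wire: &mut Vec<Vec<u8>>, packet: Ciphertext) {
    wire.push(packet.as_bytes().to_vec());
}
```

With this shape, `send_datagram(&mut wire, session.encrypt(frame))` compiles and `send_datagram(&mut wire, frame.to_vec())` does not, which turns the audit question into a compiler check.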
Estimated effort: ~1 hour audit + test.
C3 — VideoScorer::observe() never called — scorer is dead code
File: crates/wzp-relay/src/room.rs:1263–1266
Severity: Critical — relay abuse control for video is completely absent
// T6.2-follow-up: feed video packets to VideoScorer here.
// video_scorer.observe(&pkt.header, pkt.payload.len(), now, bwe_kbps);
video_scorer.rs was delivered in T6.2 with legitimacy scoring, keyframe regularity checks, I/P ratio analysis, and a verdict enum. The observe call was never wired into the packet forwarding loop. The scorer compiles but accumulates no data. Any participant can flood the room with malformed video or synthetic keyframe bursts and the relay will forward everything without challenge.
Fix: Wire video_scorer.observe(...) at the TODO marker and integrate legitimacy_score() into the forwarding decision (drop or rate-limit streams with Verdict::Malicious). Add an integration test: synthetic high-frequency keyframe bursts should trigger a Malicious verdict within 2 seconds.
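The wiring could look roughly like the following. This is a deliberately small stand-in, not the delivered video_scorer.rs: `ToyVideoScorer`, `should_forward`, the `Legitimate` variant, the sliding-window rule, and the thresholds are all assumptions chosen to show the forwarding decision consuming a verdict.

```rust
use std::collections::VecDeque;

#[derive(Debug, Clone, Copy, PartialEq)]
enum Verdict {
    Legitimate,
    Malicious,
}

/// Toy scorer: flags a stream that sends too many keyframes within a
/// sliding time window. Thresholds are illustrative.
struct ToyVideoScorer {
    window_ms: u64,
    max_keyframes: usize,
    keyframe_times: VecDeque<u64>,
}

impl ToyVideoScorer {
    fn new(window_ms: u64, max_keyframes: usize) -> Self {
        Self { window_ms, max_keyframes, keyframe_times: VecDeque::new() }
    }

    /// Record one packet and return the verdict for the stream so far.
    fn observe(&mut self, is_keyframe: bool, now_ms: u64) -> Verdict {
        if is_keyframe {
            self.keyframe_times.push_back(now_ms);
        }
        // Evict keyframes that have aged out of the window.
        while let Some(&oldest) = self.keyframe_times.front() {
            if now_ms.saturating_sub(oldest) > self.window_ms {
                self.keyframe_times.pop_front();
            } else {
                break;
            }
        }
        if self.keyframe_times.len() > self.max_keyframes {
            Verdict::Malicious
        } else {
            Verdict::Legitimate
        }
    }
}

/// Forwarding decision at the point where the TODO marker sits: feed the
/// scorer first, then forward only streams it still considers legitimate.
fn should_forward(scorer: &mut ToyVideoScorer, is_keyframe: bool, now_ms: u64) -> bool {
    scorer.observe(is_keyframe, now_ms) == Verdict::Legitimate
}
```

With a 1000 ms window and a limit of 4 keyframes, a synthetic burst of one keyframe every 50 ms is rejected at the fifth keyframe, well inside the 2-second target suggested for the integration test.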
Estimated effort: ~2 hours.
C4 — wzp-video Android target fails to compile (31 errors)
File: crates/wzp-video/src/mediacodec.rs
Severity: Critical — Android video is completely blocked
Five error categories from the NDK 0.9 API migration, all documented in HANDOFF-2026-05-12.md. dav1d/svt-av1 were cfg-gated off Android in f3e3ee5; these 31 errors are the remaining MediaCodec API mismatch.
| Error | Count | Root cause | Fix |
|---|---|---|---|
| E0277 NonNull<AMediaCodec> not Send | ~3 | Raw pointer held across tokio::spawn boundary | struct SendMediaCodec(NonNull<…>); unsafe impl Send for SendMediaCodec {} — or use ndk::media::MediaCodec owned type (already Send) |
| E0308 &[MaybeUninit<u8>] vs &[u8] | many | NDK 0.9 returns uninit slices | MaybeUninit::write_slice or transmute pattern |
| E0425 missing BITRATE_MODE_CBR | 1+ | Constant renamed in NDK 0.9 | Check ndk crate docs for current name |
| E0433 ndk_sys not a dep | several | Direct ndk_sys import; only ndk = "0.9" declared | Add ndk-sys as explicit dep or use safe ndk wrappers |
| E0599 InputBuffer::index() / OutputBuffer::index() private | 2 | API changed in NDK 0.9 | Use buffer through safe queue/dequeue API |
Nothing live is blocked today — wzp-video is not yet consumed by Tauri Android. But video on Android cannot progress until this compiles.
Reproduce:
ssh -i ~/CascadeProjects/wzp manwe@manwehs \
'cd ~/wzp-builder/data/source && \
docker run --rm \
-v ~/wzp-builder/data/source:/build/source \
-v ~/wzp-builder/data/cache/cargo-registry:/home/builder/.cargo/registry \
-v ~/wzp-builder/data/cache/cargo-git:/home/builder/.cargo/git \
-v ~/wzp-builder/data/cache/target:/build/source/target \
wzp-android-builder:latest \
bash -c "cd /build/source && cargo build --target aarch64-linux-android -p wzp-video 2>&1 | tail -60"'
Estimated effort: ~2 hours (one commit per error category).
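For the E0277 category, the wrapper pattern named in the table can be sketched on the host without the NDK. `OpaqueCodec` is a hypothetical stand-in for the FFI handle type and `std::thread::spawn` stands in for `tokio::spawn`; the safety argument in the comments is an assumption that has to be checked against how the real handle is used.

```rust
use std::ptr::NonNull;
use std::thread;

/// Hypothetical stand-in for an opaque FFI handle such as AMediaCodec.
struct OpaqueCodec {
    frames_encoded: u32,
}

/// Wrapper that asserts the handle may be moved to another thread.
struct SendCodec(NonNull<OpaqueCodec>);

// SAFETY (assumption of this sketch): the wrapper owns the handle
// exclusively and only one thread touches it at a time.
unsafe impl Send for SendCodec {}

fn encode_on_worker(frames: u32) -> u32 {
    let raw = Box::into_raw(Box::new(OpaqueCodec { frames_encoded: 0 }));
    let handle = SendCodec(NonNull::new(raw).expect("Box pointer is never null"));

    let worker = thread::spawn(move || {
        // Rebind the whole wrapper. With edition 2021 disjoint captures,
        // using only `handle.0` would capture the bare NonNull, which is
        // not Send, and the E0277 error would come back.
        let handle = handle;
        // SAFETY: ownership of the allocation moved into this thread.
        let mut codec = unsafe { Box::from_raw(handle.0.as_ptr()) };
        codec.frames_encoded += frames;
        codec.frames_encoded
    });
    worker.join().expect("worker thread panicked")
}
```

The rebind inside the closure is the detail most likely to be missed when applying this fix: the `unsafe impl Send` only helps if the closure captures the wrapper itself.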
High
H1 — AV1 call engine wiring missing
Source: HANDOFF-2026-05-12.md (T6.1.2 open item)
File: crates/wzp-video/src/factory.rs
factory.rs and step tables landed in commit 086d0a4. No caller yet invokes create_video_encoder(Av1Main, ...). The entire AV1 path is reachable only from tests. Video on macOS/Linux desktop requires wiring create_video_encoder into the call engine's media negotiation path.
Estimated effort: ~1–2 hours.
H2 — fec_block_id: u8 wraps every ~25 seconds
File: crates/wzp-fec/src/encoder.rs (block_id.wrapping_add(1) on u8)
Reference: PROTOCOL-AUDIT.md W2 (deferred P2)
At 5 frames/block (Codec2), u8 ID wraps at block 256 ≈ 25 seconds. A slow reconstructor or late-joining peer will collide block IDs with in-flight blocks. The window distance check in block_manager.rs partially mitigates this but can't prevent all collisions. Widen to u16 in the next wire-format revision.
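The wrap interval follows from simple arithmetic. The helper below is a hypothetical illustration; the 20 ms frame duration is an assumption that matches the ≈25 second figure above and is not taken from the encoder source.

```rust
/// Seconds until a block ID of `id_bits` bits wraps, given the number of
/// frames per FEC block and the duration of one frame in milliseconds.
fn id_wrap_seconds(id_bits: u32, frames_per_block: u64, frame_ms: u64) -> f64 {
    let blocks_before_wrap = 1u64 << id_bits;
    (blocks_before_wrap * frames_per_block * frame_ms) as f64 / 1000.0
}
```

Under these assumptions `id_wrap_seconds(8, 5, 20)` gives 25.6 seconds, and widening to 16 bits with the same parameters gives 6553.6 seconds, a little under two hours.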
Medium
M1 — SignalMessage has no version byte
File: crates/wzp-proto/src/session.rs (SignalMessage enum)
Reference: PROTOCOL-AUDIT.md W12
bincode + serde(default) handles field additions but not variant removal or semantic changes. Any variant deprecation is silent at the wire level. This becomes a correctness risk when federation routes SignalMessages across relay versions. Add version: u8 as a leading field to all variants before federation ships.
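A minimal sketch of the idea, assuming the version is carried as a single leading byte in front of the serialized message. This is an envelope-level variant rather than a `version` field inside each enum variant, and `SIGNAL_VERSION`, `frame_signal`, and `unframe_signal` are hypothetical names. The point it shows is that an unknown version becomes an explicit error instead of a silent misparse.

```rust
/// Hypothetical current signalling version.
const SIGNAL_VERSION: u8 = 1;

#[derive(Debug, PartialEq)]
enum DecodeError {
    Empty,
    UnsupportedVersion(u8),
}

/// Prefix an already serialized signal body with the version byte.
fn frame_signal(body: &[u8]) -> Vec<u8> {
    let mut wire = Vec::with_capacity(body.len() + 1);
    wire.push(SIGNAL_VERSION);
    wire.extend_from_slice(body);
    wire
}

/// Check the version byte before handing the body to the deserializer.
fn unframe_signal(wire: &[u8]) -> Result<&[u8], DecodeError> {
    match wire.split_first() {
        None => Err(DecodeError::Empty),
        Some((&version, body)) if version == SIGNAL_VERSION => Ok(body),
        Some((&version, _)) => Err(DecodeError::UnsupportedVersion(version)),
    }
}
```

A relay that receives a frame with a version it does not understand can then reject or downgrade it deliberately rather than decoding it as the wrong variant.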
M2 — BWE not consumed by AdaptiveQualityController
Reference: PROTOCOL-AUDIT.md W6, deferred to Phase V2
Quinn exposes cwnd and bytes_in_flight, but AdaptiveQualityController does not consume them. Loss + RTT adaptation works for audio. For video, without bandwidth estimation the encoder cannot detect available uplink capacity and will either oscillate or permanently under-utilize bandwidth. Mandatory before video production.
M3 — PLI suppression window hardcoded at 200ms
File: crates/wzp-relay/src/room.rs:1060
Not adaptive to link speed. On slow links 200ms may allow multiple keyframe requests. Accept for Phase 1; make configurable in Phase 2.
M4 — Repair packet index wrapping in FEC encoder
File: crates/wzp-fec/src/encoder.rs:140
let idx = (num_source as u8).wrapping_add(i as u8);
If num_source + repair_count > 255, indices wrap silently. In practice bounded by frames_per_block (5–10), so max sum is ~20. Low risk today; widen to u16 when fec_block_id is widened (H2).
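The silent wrap is easy to demonstrate in isolation. In the sketch below, `repair_index_wrapping` mirrors the quoted expression, while `repair_index_checked` is a hypothetical alternative that reports overflow instead of wrapping; neither function name comes from the encoder.

```rust
use std::convert::TryFrom;

/// Same arithmetic as the quoted line: both operands are truncated to u8
/// and the sum wraps modulo 256.
fn repair_index_wrapping(num_source: usize, i: usize) -> u8 {
    (num_source as u8).wrapping_add(i as u8)
}

/// Hypothetical checked variant: returns None when the index does not fit
/// in a u8, so the caller can refuse to emit the repair packet.
fn repair_index_checked(num_source: usize, i: usize) -> Option<u8> {
    num_source
        .checked_add(i)
        .and_then(|sum| u8::try_from(sum).ok())
}
```

For today's block sizes both functions agree, for example 10 source symbols and repair offset 5 give index 15. At 250 source symbols and offset 10, the wrapping version returns 4, colliding with a source index, while the checked version returns `None`.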
M5 — timestamp_ms monotonicity after rekey not enforced
Reference: PROTOCOL-AUDIT.md W3
Spec: timestamp_ms must not reset on rekey. The code correctly does not reset it, but there is no assertion to prevent regression. Add a debug assert in complete_rekey() that new_session.next_timestamp >= old_session.last_timestamp.
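The regression guard could be as small as the following sketch. `SessionClock`, its field name, the `u64` timestamp type, and this free-standing `complete_rekey` signature are assumptions for illustration, and wraparound of the timestamp is not handled.

```rust
/// Hypothetical slice of session state relevant to the check.
struct SessionClock {
    last_timestamp_ms: u64,
}

/// Builds the post-rekey state. In debug builds this panics if the
/// timestamp would move backwards across the rekey.
fn complete_rekey(old: &SessionClock, next_timestamp_ms: u64) -> SessionClock {
    debug_assert!(
        next_timestamp_ms >= old.last_timestamp_ms,
        "timestamp_ms must not reset on rekey: {} < {}",
        next_timestamp_ms,
        old.last_timestamp_ms
    );
    SessionClock { last_timestamp_ms: next_timestamp_ms }
}
```

Because `debug_assert!` compiles out in release builds, this costs nothing in production while still failing the test suite if a later change reintroduces a reset.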
Low / Accepted Debt
| ID | Description | File | Accepted in |
|---|---|---|---|
| L1 | 9 pre-existing clippy lints in wzp-codec | aec.rs, denoise.rs, opus_enc.rs, codec2_{enc,dec}.rs, resample.rs | PROTOCOL-AUDIT.md |
| L2 | 3 clippy errors in deps/featherchat submodule | ratchet.rs, types.rs | PROTOCOL-AUDIT.md |
| L3 | Audio anti-replay window 64 packets | wzp-crypto/src/session.rs:89 | Accepted — jitter buffer + PLC masks loss |
| L4 | Debug tap logs at INFO with no rate limiting | wzp-relay/src/room.rs:46–59 | Safe in dev; add 1:100 sampling for prod |
What Was Not Found
These are explicitly confirmed sound after code-level verification:
- Anti-replay bitmap — correct u32 wrapping, per-stream isolation, window sizing by MediaType
- HKDF + X25519 + Ed25519 key agreement — standard construction, no gaps
- SAS code derivation — SHA-256(shared_secret)[:4] as 4-digit voice verification code
- Rekey forward secrecy — session_id rotation on rekey isolates nonce namespaces; seq counter reset is intentional and safe
- MiniHeader v2 seq_delta — fully implemented at wzp-proto/src/packet.rs:469–526 with tests; PROTOCOL-AUDIT resolution table is accurate
- SFU E2E preservation — relay ciphertext passthrough, no plaintext access
- RaptorQ for Codec2 — correct tool for the bitrate regime
- DRED continuous tuning — better than discrete tiers; 15% loss floor is empirically grounded
- Jitter buffer — BTreeMap with wrapping-aware comparisons, EWMA adaptive playout delay, solid
- Quinn QUIC datagram transport — correct primitives for unreliable media
Fix Priority Table
| # | Issue | Category | Effort | Blocks |
|---|---|---|---|---|
| 1 | C1: nonce → MediaHeader.seq | Crypto | 1h | All sessions on lossy paths |
| 2 | C2: verify AEAD on all datagram send paths | Crypto | 1h | E2E security claim |
| 3 | C3: wire VideoScorer::observe() into room | Relay | 2h | Relay abuse control for video |
| 4 | C4: NDK 0.9 mediacodec.rs migration (5 categories) | Android | 2h | Android video |
| 5 | H1: wire AV1 factory into call engine | Video | 2h | Desktop video |
| 6 | H2: widen fec_block_id to u16 | FEC/Wire | 30min | Next protocol release |
| 7 | M1: SignalMessage version byte | Proto | 1h | Federation correctness |
| 8 | M2: BWE into AdaptiveQualityController | Transport | 2–3 days | Video production quality |
Total for C1–H1 (items 1–5): ~8 hours focused engineering.