wz-phone/docs/AUDIT-2026-05-25.md
Siavash Sameni ed8a7ae5aa docs: protocol audit 2026-05-25, update architecture + Obsidian vault
Audit:
- docs/AUDIT-2026-05-25.md: full protocol audit covering 8 findings
  (4 critical, 2 high, 5 medium, 4 low) with code references and fix
  effort estimates
- vault/Audit/Tasks.md: Obsidian Tasks plugin file tracking all audit
  items with priorities, due dates, and per-step checklists

Architecture docs updated for Wire format v2 and Wave 5/6 features:
- ARCHITECTURE.md: adds wzp-video to dependency graph and project
  structure; wire format updated to v2 (16B header, 5B MiniHeader);
  relay concurrency section corrected (DashMap+RwLock is current, not
  a future optimization); test count 571→702; Android note
- PROGRESS.md: Wave 5 and Wave 6 sections appended; test count 372→702;
  current status and open blockers as of 2026-05-25
- ROAD-TO-VIDEO.md: implementation status table inserted (/🟡/🔴/🔲
  per phase); 6-step critical path to first video call
- WZP-SPEC.md: MediaHeader updated to v2 (16B byte-aligned); MiniHeader
  updated to 5B with seq_delta; codec IDs 9-12 added (H.264/H.265/AV1);
  version negotiation section added

Obsidian vault (vault/):
- 114 files across Architecture/, PRDs/, Reports/, Android/,
  Reference/, Audit/ with YAML frontmatter
- 00 - Home.md index note with wiki links
- .obsidian/app.json config

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-25 06:00:17 +04:00


WarzonePhone Protocol Audit — 2026-05-25

Auditor: Claude Sonnet 4.6 (assisted)
Branch: experimental-ui @ f3e3ee5
Scope: All workspace crates (wzp-proto, wzp-codec, wzp-fec, wzp-crypto, wzp-transport, wzp-relay, wzp-client, wzp-android, wzp-native, wzp-video)
Test baseline: 702 passing (excludes wzp-android)


Executive Summary

The audio call path is functionally correct and cryptographically sound on clean network paths. There is a session-breaking bug in the crypto nonce derivation (C1) that causes a permanent decryption failure after any out-of-order or lost UDP packet. This is the single highest-priority fix: under normal internet conditions it will show up as sessions that silently stop decrypting. Video has a solid architectural foundation, but three hard blockers remain before shipping: the AEAD coverage gap (C2), the dead video scorer (C3), and the Android MediaCodec compile failure (C4).

The project is in good shape overall. The crypto design (X25519, HKDF, ChaCha20-Poly1305, Ed25519 identity, SAS verification) is sound. The SFU-never-decrypts architecture is rare and valuable. The codec adaptation (Opus DRED + Codec2 RaptorQ split) is genuinely innovative. The eight issues below are fixable in ~12 engineer-hours.


Critical

C1 — Nonce derives from recv_seq counter, not MediaHeader.seq

File: crates/wzp-crypto/src/session.rs:132
Severity: Critical — session-breaking on any packet reorder or loss

// decrypt()
let nonce_bytes = nonce::build_nonce(&self.session_id, self.recv_seq, Direction::Send);
// ...
self.recv_seq = self.recv_seq.wrapping_add(1);  // line 148

recv_seq increments once per successful decrypt() call, and the sender's send_seq increments once per encrypt() call (line 120). With perfect in-order, lossless delivery the two stay synchronized. Any reorder or mid-stream packet loss makes them diverge, and because recv_seq only advances on a successful decrypt, they never resynchronize. Once diverged, every subsequent packet uses the wrong nonce → AEAD tag mismatch → every packet fails for the rest of the session.

This isn't a low-probability edge case: UDP over real internet paths reorders and drops packets routinely. The multiple_packets_roundtrip test (line 254) only exercises in-order delivery. HANDOFF-2026-05-12.md already lists this as a known latent item: "AEAD nonce derivation: switch to MediaHeader::seq".
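The failure mode can be reproduced without any cryptography. The sketch below is a toy model, not wzp-crypto code: each packet records the seq the sender fed into its nonce, and "decryption" succeeds only when the receiver derives the same value. A single adjacent swap is enough to strand the counter-based receiver, while a receiver that reads the seq from the header decrypts everything.

```rust
/// One packet in this toy model (no real AEAD is involved).
#[derive(Clone, Copy, Debug)]
struct Packet {
    header_seq: u32,   // MediaHeader.seq as carried on the wire
    sender_nonce: u32, // seq the sender actually used when building the nonce
}

/// Sender: one packet per encrypt() call; send_seq doubles as the header seq.
fn send(n: u32) -> Vec<Packet> {
    (0..n)
        .map(|s| Packet { header_seq: s, sender_nonce: s })
        .collect()
}

/// Current behaviour: the nonce input is a local counter that only advances
/// after a successful decrypt. Returns how many packets decrypt.
fn decrypt_with_recv_counter(arrivals: &[Packet]) -> usize {
    let mut recv_seq: u32 = 0;
    let mut ok = 0;
    for p in arrivals {
        // The "tag check" passes only if both sides used the same nonce input.
        if recv_seq == p.sender_nonce {
            recv_seq = recv_seq.wrapping_add(1);
            ok += 1;
        }
    }
    ok
}

/// Proposed behaviour: the nonce input is read from the packet header.
fn decrypt_with_header_seq(arrivals: &[Packet]) -> usize {
    arrivals
        .iter()
        .filter(|p| p.header_seq == p.sender_nonce)
        .count()
}

fn main() {
    let mut reordered = send(10);
    reordered.swap(3, 4); // packets 3 and 4 arrive in the wrong order

    // prints: recv counter: 4/10, header.seq: 10/10
    println!(
        "recv counter: {}/10, header.seq: {}/10",
        decrypt_with_recv_counter(&reordered),
        decrypt_with_header_seq(&reordered)
    );
}
```

Packet 4 arrives while the counter still expects 3 and is rejected; packet 3 then succeeds, but the counter now waits for a packet 4 that has already been consumed, so packets 5 through 9 all fail.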

The anti-replay check at lines 152–161 already parses MediaHeader and has header.seq available, so the fix is essentially a one-line change in decrypt():

// Use sender's wire-level seq as nonce input, not a local counter.
// This survives reordering because both sides derive the same nonce from
// the same field. recv_seq was wrong: it diverged from send_seq on any
// reorder, breaking all subsequent decryptions for the session.
let header = parse_header(header_bytes)
    .ok_or_else(|| CryptoError::Internal("header parse failed".into()))?;
let nonce_bytes = nonce::build_nonce(&self.session_id, header.seq, Direction::Send);

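Remove the recv_seq field from ChaChaSession (it is now redundant, since anti-replay uses header.seq directly). On the encrypt side, verify that self.send_seq equals the seq written into the MediaHeader at the call site. Nonce uniqueness then rests on header.seq never repeating under the same session_id and direction.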

Estimated effort: ~1 hour including test coverage for out-of-order delivery.

Note on rekey seq reset: The agent initially flagged send_seq/recv_seq = 0 in complete_rekey() as a separate critical issue. This is a false positive — install_key() rotates session_id (hash of new key), so pre-/post-rekey nonces live in distinct namespaces. The reset is intentional and cryptographically safe.
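Strictly, ChaCha20-Poly1305 only requires nonce uniqueness per key, and the key itself changes on rekey; the session_id rotation adds a second, independent separation. The sketch below illustrates that separation with a hypothetical 12-byte layout (7 bytes of session_id, 1 direction byte, 4-byte big-endian seq). It is an assumption for illustration and not necessarily what nonce::build_nonce does; the 32-byte session_id is also assumed.

```rust
/// Direction tag; mirrors the idea of wzp-crypto's Direction, values assumed.
#[derive(Clone, Copy)]
enum Direction {
    Send = 0,
    Recv = 1,
}

/// Hypothetical 12-byte nonce: session_id prefix | direction | seq (big-endian).
fn build_nonce(session_id: &[u8; 32], seq: u32, dir: Direction) -> [u8; 12] {
    let mut n = [0u8; 12];
    n[..7].copy_from_slice(&session_id[..7]);
    n[7] = dir as u8;
    n[8..].copy_from_slice(&seq.to_be_bytes());
    n
}

fn main() {
    let old_id = [0x11u8; 32]; // stand-in for the pre-rekey session_id
    let new_id = [0x22u8; 32]; // stand-in for the post-rekey session_id

    // seq restarts at 0 after rekey, yet the nonce differs because the
    // session_id prefix changed.
    let before = build_nonce(&old_id, 0, Direction::Send);
    let after = build_nonce(&new_id, 0, Direction::Send);
    // Opposite directions never share a nonce either.
    let reverse = build_nonce(&old_id, 0, Direction::Recv);
    println!(
        "rekey distinct: {}, direction distinct: {}",
        before != after,
        before != reverse
    );
}
```

The same layout shows what the C1 fix must guarantee: seq is the only varying input within one session and direction. If audio and video streams kept separate seq counters under the same key, the nonce would also need a stream discriminator; whether that applies here is an assumption to verify against the MediaHeader definition.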


C2 — AEAD not wired to every QUIC datagram send path

File: crates/wzp-client/src/analyzer.rs:363 (only confirmed decrypt call site)
Severity: Critical — potential plaintext media leakage

The HANDOFF document explicitly flags this: "Encryption is implemented in wzp-crypto but not yet on every QUIC datagram path." The analyzer.rs path decrypts inbound packets. What needs verification: every outbound send_datagram() / write_datagram() call across wzp-client and wzp-transport must pass through ChaChaSession::encrypt().

Required action: Grep every send_datagram call site. Confirm each path encrypts before transmit. Add a CI-level test or #[forbid(dead_code)]-style assertion that makes a plaintext send path impossible to merge. Until this is verified, the E2E security claim cannot be made.
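One way to make a plaintext send path unmergeable is to let the type system enforce it: the transport's send method accepts only a type that nothing but the encrypt step can construct. The sketch below uses hypothetical names (SealedDatagram, seal, Transport) and a stand-in XOR in place of ChaChaSession::encrypt() so that it runs with the standard library alone; the XOR is not encryption.

```rust
mod sealed {
    /// Ciphertext wrapper. The field is private to this module, so code
    /// outside it cannot build a SealedDatagram from arbitrary bytes.
    pub struct SealedDatagram(Vec<u8>);

    impl SealedDatagram {
        pub fn as_bytes(&self) -> &[u8] {
            &self.0
        }
    }

    /// Stand-in for ChaChaSession::encrypt(). XOR only keeps the example
    /// dependency-free; it provides no security.
    pub fn seal(key: u8, plaintext: &[u8]) -> SealedDatagram {
        SealedDatagram(plaintext.iter().map(|b| b ^ key).collect())
    }
}

use sealed::{seal, SealedDatagram};

/// Hypothetical transport wrapper around the QUIC connection.
struct Transport {
    sent: Vec<Vec<u8>>, // records what would go on the wire
}

impl Transport {
    /// The only send API: a raw &[u8] or Vec<u8> does not type-check here.
    fn send_datagram(&mut self, d: SealedDatagram) {
        self.sent.push(d.as_bytes().to_vec());
    }
}

fn main() {
    let mut t = Transport { sent: Vec::new() };
    t.send_datagram(seal(0x5a, b"opus frame"));
    // t.send_datagram(b"opus frame".to_vec()); // would not compile
    println!("datagrams sent: {}", t.sent.len());
}
```

With a wrapper like this in place, the CI check reduces to ensuring that the raw connection's datagram send is called only inside the wrapper.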

Estimated effort: ~1 hour audit + test.


C3 — VideoScorer::observe() never called — scorer is dead code

File: crates/wzp-relay/src/room.rs:1263–1266
Severity: Critical — relay abuse control for video is completely absent

// T6.2-follow-up: feed video packets to VideoScorer here.
//     video_scorer.observe(&pkt.header, pkt.payload.len(), now, bwe_kbps);

video_scorer.rs was delivered in T6.2 with legitimacy scoring, keyframe regularity checks, I/P ratio analysis, and a verdict enum. The observe call was never wired into the packet forwarding loop. The scorer compiles but accumulates no data. Any participant can flood the room with malformed video or synthetic keyframe bursts and the relay will forward everything without challenge.

Fix: Wire video_scorer.observe(...) at the TODO marker and integrate legitimacy_score() into the forwarding decision (drop or rate-limit streams with Verdict::Malicious). Add an integration test: synthetic high-frequency keyframe bursts should trigger a Malicious verdict within 2 seconds.
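The shape of the missing wiring (observe every packet, then gate forwarding on the verdict) can be sketched with a toy scorer that only tracks keyframe rate over a sliding window. Everything here is illustrative: ToyVideoScorer, ToyVerdict, the 2-second window, and the 4-keyframe limit are assumptions. The real video_scorer.rs also weighs keyframe regularity and I/P ratio, and its API may differ.

```rust
use std::collections::VecDeque;
use std::time::{Duration, Instant};

#[derive(Debug, Clone, Copy, PartialEq)]
enum ToyVerdict {
    Legitimate,
    Malicious,
}

/// Counts keyframes seen within a sliding time window.
struct ToyVideoScorer {
    window: Duration,
    max_keyframes: usize,
    keyframes: VecDeque<Instant>,
}

impl ToyVideoScorer {
    fn new(window: Duration, max_keyframes: usize) -> Self {
        Self { window, max_keyframes, keyframes: VecDeque::new() }
    }

    /// Called once per forwarded video packet (the missing observe() call).
    fn observe(&mut self, is_keyframe: bool, now: Instant) {
        if is_keyframe {
            self.keyframes.push_back(now);
        }
        // Evict keyframes that have aged out of the window.
        while let Some(&t) = self.keyframes.front() {
            if now.duration_since(t) > self.window {
                self.keyframes.pop_front();
            } else {
                break;
            }
        }
    }

    fn verdict(&self) -> ToyVerdict {
        if self.keyframes.len() > self.max_keyframes {
            ToyVerdict::Malicious
        } else {
            ToyVerdict::Legitimate
        }
    }
}

/// Forwarding decision: drop streams judged malicious.
fn should_forward(scorer: &ToyVideoScorer) -> bool {
    scorer.verdict() != ToyVerdict::Malicious
}

/// Feeds `frames` frames at ~30 fps, marking every `kf_every`-th as a keyframe.
/// Returns the first frame index at which forwarding stops, if any.
fn first_malicious_frame(kf_every: u64, frames: u64) -> Option<u64> {
    let start = Instant::now();
    let mut s = ToyVideoScorer::new(Duration::from_secs(2), 4);
    for i in 0..frames {
        let now = start + Duration::from_millis(i * 33);
        s.observe(i % kf_every == 0, now);
        if !should_forward(&s) {
            return Some(i);
        }
    }
    None
}

fn main() {
    // Keyframe burst: every frame is a keyframe.
    println!("burst flagged at frame {:?}", first_malicious_frame(1, 300));
    // Normal stream: one keyframe every 60 frames (~2 s).
    println!("normal flagged at frame {:?}", first_malicious_frame(60, 300));
}
```

The burst is flagged at frame 4 (about 132 ms in), comfortably inside the 2-second target in the proposed integration test, while the normal stream is never flagged.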

Estimated effort: ~2 hours.


C4 — wzp-video Android target fails to compile (31 errors)

File: crates/wzp-video/src/mediacodec.rs
Severity: Critical — Android video is completely blocked

Five error categories from the NDK 0.9 API migration, all documented in HANDOFF-2026-05-12.md. dav1d/svt-av1 were cfg-gated off Android in f3e3ee5; these 31 errors are the remaining MediaCodec API mismatch.

| Error | Count | Root cause | Fix |
|---|---|---|---|
| E0277: NonNull<AMediaCodec> not Send | ~3 | Raw pointer held across a tokio::spawn boundary | struct SendMediaCodec(NonNull<…>); unsafe impl Send for SendMediaCodec {}, or use the owned ndk::media::MediaCodec type (already Send) |
| E0308: &[MaybeUninit<u8>] vs &[u8] | many | NDK 0.9 returns uninitialized slices | MaybeUninit::write_slice or a transmute pattern (transmute only after every byte has been initialized) |
| E0425: missing BITRATE_MODE_CBR | 1+ | Constant renamed in NDK 0.9 | Check the ndk crate docs for the current name |
| E0433: ndk_sys not a dependency | several | Direct ndk_sys import; only ndk = "0.9" is declared | Add ndk-sys as an explicit dependency or use the safe ndk wrappers |
| E0599: InputBuffer::index() / OutputBuffer::index() private | 2 | API changed in NDK 0.9 | Use the buffer through the safe queue/dequeue API |
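The E0277 and E0308 fixes follow well-known patterns, sketched below without depending on the NDK. OpaqueCodec is a placeholder standing in for AMediaCodec, and the unsafe impl Send is sound only under the stated assumption that the codec is owned by one task at a time and never touched concurrently; that should be checked against the NDK's threading rules. The copy helper uses only the stable MaybeUninit::write, which avoids transmuting memory that is not yet initialized.

```rust
use std::mem::MaybeUninit;
use std::ptr::NonNull;
use std::thread;

/// Placeholder for the opaque NDK type AMediaCodec (assumption: the real
/// type comes from ndk-sys on Android).
#[repr(C)]
struct OpaqueCodec {
    _private: [u8; 0],
}

/// E0277: newtype asserting that the raw handle may move between threads.
struct SendMediaCodec(NonNull<OpaqueCodec>);

// SAFETY (assumed): exactly one owner at a time, never used concurrently.
unsafe impl Send for SendMediaCodec {}

/// E0308: copy encoded input into an uninitialized codec buffer.
/// Returns the number of bytes written; bytes past that stay uninitialized.
fn copy_into_uninit(dst: &mut [MaybeUninit<u8>], src: &[u8]) -> usize {
    let n = dst.len().min(src.len());
    for (d, s) in dst[..n].iter_mut().zip(&src[..n]) {
        d.write(*s);
    }
    n
}

fn main() {
    // Non-null dangling pointer: never dereferenced, so fine for the demo.
    let handle = SendMediaCodec(NonNull::dangling());
    let moved = thread::spawn(move || {
        // Bind the whole wrapper so the closure captures SendMediaCodec,
        // not just its non-Send NonNull field.
        let h = handle;
        !h.0.as_ptr().is_null()
    })
    .join()
    .unwrap();

    let mut buf = [MaybeUninit::<u8>::uninit(); 8];
    let written = copy_into_uninit(&mut buf, &[0x00, 0x00, 0x01, 0x65]);
    println!("moved across threads: {moved}, bytes written: {written}");
}
```

The `let h = handle;` line matters under Rust 2021's disjoint closure captures: a closure that only touched `handle.0` would capture the bare NonNull field and fail with the same E0277.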

Nothing live is blocked today — wzp-video is not yet consumed by Tauri Android. But video on Android cannot progress until this compiles.

Reproduce:

ssh -i ~/CascadeProjects/wzp manwe@manwehs \
  'cd ~/wzp-builder/data/source && \
   docker run --rm \
     -v ~/wzp-builder/data/source:/build/source \
     -v ~/wzp-builder/data/cache/cargo-registry:/home/builder/.cargo/registry \
     -v ~/wzp-builder/data/cache/cargo-git:/home/builder/.cargo/git \
     -v ~/wzp-builder/data/cache/target:/build/source/target \
     wzp-android-builder:latest \
     bash -c "cd /build/source && cargo build --target aarch64-linux-android -p wzp-video 2>&1 | tail -60"'

Estimated effort: ~2 hours (one commit per error category).


High

H1 — AV1 call engine wiring missing

Source: HANDOFF-2026-05-12.md (T6.1.2 open item)
File: crates/wzp-video/src/factory.rs

factory.rs and step tables landed in commit 086d0a4. No caller yet invokes create_video_encoder(Av1Main, ...). The entire AV1 path is reachable only from tests. Video on macOS/Linux desktop requires wiring create_video_encoder into the call engine's media negotiation path.
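A sketch of where the wiring might sit, under heavy assumptions: the call engine picks the first codec in its local preference list that the peer also advertises, and only then calls the encoder factory. VideoCodec, negotiate, EncoderHandle, and the stub create_video_encoder below are hypothetical; the real factory.rs signature is not shown in this audit.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
enum VideoCodec {
    Av1Main,
    H264,
    H265,
}

/// Hypothetical encoder handle returned by the factory stub.
#[derive(Debug)]
struct EncoderHandle {
    codec: VideoCodec,
    bitrate_kbps: u32,
}

/// Stub standing in for factory.rs's create_video_encoder (real signature unknown).
fn create_video_encoder(codec: VideoCodec, bitrate_kbps: u32) -> EncoderHandle {
    EncoderHandle { codec, bitrate_kbps }
}

/// First codec in local preference order that the remote also supports.
fn negotiate(local_prefs: &[VideoCodec], remote: &[VideoCodec]) -> Option<VideoCodec> {
    local_prefs.iter().copied().find(|c| remote.contains(c))
}

fn main() {
    let local = [VideoCodec::Av1Main, VideoCodec::H265, VideoCodec::H264];
    let remote = [VideoCodec::H264, VideoCodec::Av1Main];

    match negotiate(&local, &remote) {
        Some(codec) => {
            let enc = create_video_encoder(codec, 800);
            println!("starting {:?} at {} kbps", enc.codec, enc.bitrate_kbps);
        }
        None => println!("no common video codec; audio-only call"),
    }
}
```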

Estimated effort: ~1–2 hours.


H2 — fec_block_id: u8 wraps every ~25 seconds

File: crates/wzp-fec/src/encoder.rs (block_id.wrapping_add(1) on u8)
Reference: PROTOCOL-AUDIT.md W2 (deferred P2)

At 5 frames per block (Codec2), the u8 ID wraps after 256 blocks, about 25 seconds. A slow reconstructor or late-joining peer can then see a new block reuse the ID of a block still in flight. The window-distance check in block_manager.rs partially mitigates this but cannot prevent every collision. Widen to u16 in the next wire-format revision.
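The wrap period is blocks × frames per block × frame duration. Assuming 20 ms Codec2 frames (the audit does not state the frame duration; 20 ms is what makes the ≈25 s figure come out), a u8 ID repeats after 256 × 5 × 20 ms = 25.6 s, while a u16 ID would repeat after 65,536 × 5 × 20 ms ≈ 6,554 s, roughly 109 minutes. Even with u16, comparisons should stay wrap-aware; the helper below uses serial-number arithmetic in the style of RFC 1982 and is a sketch, not the block_manager.rs logic.

```rust
/// Milliseconds until a block ID of `id_bits` bits repeats.
fn wrap_period_ms(id_bits: u32, frames_per_block: u64, frame_ms: u64) -> u64 {
    (1u64 << id_bits) * frames_per_block * frame_ms
}

/// True if block `a` is newer than block `b`, treating IDs as a wrapping
/// u16 sequence (valid while the two are fewer than 32,768 blocks apart).
fn block_is_newer(a: u16, b: u16) -> bool {
    (a.wrapping_sub(b) as i16) > 0
}

fn main() {
    println!("u8 wrap:  {} ms", wrap_period_ms(8, 5, 20));
    println!("u16 wrap: {} ms", wrap_period_ms(16, 5, 20));
    println!("1 newer than 65535: {}", block_is_newer(1, 65535));
}
```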


Medium

M1 — SignalMessage has no version byte

File: crates/wzp-proto/src/session.rs (SignalMessage enum)
Reference: PROTOCOL-AUDIT.md W12

bincode + serde(default) handles field additions but not variant removal or semantic changes. Any variant deprecation is silent at the wire level. This becomes a correctness risk when federation routes SignalMessages across relay versions. Add version: u8 as a leading field to all variants before federation ships.
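A minimal sketch of the idea, with one swap made explicit: instead of a version field inside every variant, it prefixes the serialized SignalMessage bytes with a single version byte. Because bincode writes an enum's variant tag before any of its fields, a per-variant field is only reached after the tag is decoded; an envelope byte lets the receiver reject an unknown version before any variant decode is attempted. SIGNAL_WIRE_VERSION, FrameError, encode_frame, and decode_frame are hypothetical, and the payload here is opaque bytes rather than a bincode-encoded enum.

```rust
/// Hypothetical current signalling wire version.
const SIGNAL_WIRE_VERSION: u8 = 1;

#[derive(Debug, PartialEq)]
enum FrameError {
    Empty,
    UnsupportedVersion(u8),
}

/// Prefix an already-serialized SignalMessage with the version byte.
fn encode_frame(payload: &[u8]) -> Vec<u8> {
    let mut out = Vec::with_capacity(payload.len() + 1);
    out.push(SIGNAL_WIRE_VERSION);
    out.extend_from_slice(payload);
    out
}

/// Check the version byte and return the payload for normal decoding.
fn decode_frame(frame: &[u8]) -> Result<&[u8], FrameError> {
    match frame.split_first() {
        None => Err(FrameError::Empty),
        Some((&v, rest)) if v == SIGNAL_WIRE_VERSION => Ok(rest),
        Some((&v, _)) => Err(FrameError::UnsupportedVersion(v)),
    }
}

fn main() {
    let frame = encode_frame(b"\x02\x00\x00\x00hello");
    println!("{:?}", decode_frame(&frame).map(|p| p.len()));
    println!("{:?}", decode_frame(&[9, 1, 2]));
}
```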


M2 — BWE not consumed by AdaptiveQualityController

Reference: PROTOCOL-AUDIT.md W6, deferred to Phase V2

Quinn exposes cwnd and bytes_in_flight, but AdaptiveQualityController does not consume them. Loss + RTT adaptation works for audio. For video, without bandwidth estimation the encoder cannot detect available uplink capacity and will either oscillate or permanently under-utilize bandwidth. This is mandatory before video ships to production.
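As an illustration of what consuming those stats could look like, the sketch converts a congestion window and RTT into a throughput estimate (one cwnd per RTT), applies headroom, and smooths with an EWMA so the encoder does not chase every sample. The 0.8 headroom, 0.25 smoothing factor, and bitrate bounds are assumed tuning values, and reading cwnd and RTT from Quinn is left out. This is one simple approach, not the AdaptiveQualityController design.

```rust
use std::time::Duration;

/// Assumed tuning constants (illustrative only).
const HEADROOM: f64 = 0.8; // leave room for audio, FEC, and retransmissions
const ALPHA: f64 = 0.25; // EWMA weight for the newest estimate
const MIN_KBPS: f64 = 150.0;
const MAX_KBPS: f64 = 2500.0;

/// Next video target bitrate from a cwnd/RTT sample and the previous target.
fn next_target_kbps(cwnd_bytes: u64, rtt: Duration, prev_kbps: f64) -> f64 {
    let rtt_s = rtt.as_secs_f64();
    if rtt_s <= 0.0 {
        return prev_kbps; // no usable sample; keep the current target
    }
    // Throughput a window-limited sender can sustain: one cwnd per RTT.
    let est_kbps = cwnd_bytes as f64 * 8.0 / rtt_s / 1000.0;
    let goal = (est_kbps * HEADROOM).clamp(MIN_KBPS, MAX_KBPS);
    // Move part of the way toward the goal instead of jumping to it.
    prev_kbps + ALPHA * (goal - prev_kbps)
}

fn main() {
    // 30 kB window over 100 ms RTT: ~2400 kbps raw, 1920 kbps after headroom.
    let t = next_target_kbps(30_000, Duration::from_millis(100), 1000.0);
    println!("new target: {t:.0} kbps");
}
```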


M3 — PLI suppression window hardcoded at 200ms

File: crates/wzp-relay/src/room.rs:1060

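Not adaptive to link speed. On slow or high-RTT links, a requested keyframe can take longer than 200 ms to arrive, so repeat PLIs escape suppression and trigger redundant keyframe requests. Accept for Phase 1; make configurable in Phase 2.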
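Making the window adaptive could be as simple as scaling it with RTT while keeping 200 ms as the floor. The sketch below assumes a factor of 2 × RTT (request travels up, keyframe comes back) and uses hypothetical names; neither is taken from room.rs.

```rust
use std::time::{Duration, Instant};

const PLI_FLOOR: Duration = Duration::from_millis(200); // today's fixed value

/// Suppression window: at least 200 ms, or two round trips on slow links.
fn pli_window(rtt: Duration) -> Duration {
    std::cmp::max(PLI_FLOOR, rtt * 2)
}

/// Forwards at most one PLI per window for a stream.
struct PliSuppressor {
    last_forwarded: Option<Instant>,
}

impl PliSuppressor {
    fn should_forward(&mut self, now: Instant, rtt: Duration) -> bool {
        let allow = match self.last_forwarded {
            None => true,
            Some(t) => now.duration_since(t) >= pli_window(rtt),
        };
        if allow {
            self.last_forwarded = Some(now);
        }
        allow
    }
}

fn main() {
    let t0 = Instant::now();
    let rtt = Duration::from_millis(300); // slow link: window becomes 600 ms
    let mut s = PliSuppressor { last_forwarded: None };
    let results: Vec<bool> = [0u64, 250, 500, 650]
        .iter()
        .map(|&ms| s.should_forward(t0 + Duration::from_millis(ms), rtt))
        .collect();
    println!("{results:?}"); // [true, false, false, true]
}
```

With the fixed 200 ms window, the PLIs at 250 ms and 500 ms would both have been forwarded.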


M4 — Repair packet index wrapping in FEC encoder

File: crates/wzp-fec/src/encoder.rs:140

let idx = (num_source as u8).wrapping_add(i as u8);

If num_source + repair_count > 255, indices wrap silently. In practice this is bounded by frames_per_block (5–10), so the maximum sum is ~20. Low risk today; widen to u16 when fec_block_id is widened (H2).
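Until the widening lands, the silent wrap can be turned into a loud failure. A sketch, assuming the encoder can return an error at this point (RepairIndexError is a hypothetical type). Note that the original expression also truncates each operand with `as u8` before adding.

```rust
use std::convert::TryFrom;

#[derive(Debug, PartialEq)]
struct RepairIndexError {
    num_source: usize,
    repair: usize,
}

/// Index of the i-th repair symbol, or an error instead of a silent wrap.
fn repair_index(num_source: usize, i: usize) -> Result<u8, RepairIndexError> {
    num_source
        .checked_add(i)
        .and_then(|sum| u8::try_from(sum).ok())
        .ok_or(RepairIndexError { num_source, repair: i })
}

fn main() {
    println!("{:?}", repair_index(10, 5)); // typical Codec2-sized block
    println!("{:?}", repair_index(250, 10)); // the old code would wrap to 4
}
```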


M5 — timestamp_ms monotonicity after rekey not enforced

Reference: PROTOCOL-AUDIT.md W3

Spec: timestamp_ms must not reset on rekey. The code correctly does not reset it, but there is no assertion to prevent regression. Add a debug assert in complete_rekey() that new_session.next_timestamp >= old_session.last_timestamp.
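The regression guard could look like the sketch below. Session, key_epoch, last_timestamp, next_timestamp, and this complete_rekey signature are hypothetical stand-ins. u64 is used so the plain >= comparison is meaningful; if timestamp_ms is a wrapping u32 in the real header, the assert would need a wrap-aware comparison instead.

```rust
/// Hypothetical per-session timestamp state.
struct Session {
    key_epoch: u32,
    last_timestamp: u64, // timestamp_ms of the last packet sent
    next_timestamp: u64, // timestamp_ms the next packet will carry
}

/// Rekey: rotate key material, carry the media clock forward unchanged.
fn complete_rekey(old: &Session) -> Session {
    let new_session = Session {
        key_epoch: old.key_epoch + 1,
        last_timestamp: old.last_timestamp,
        next_timestamp: old.next_timestamp, // must NOT reset to 0
    };
    // Regression guard: the media clock never moves backwards across rekey.
    debug_assert!(
        new_session.next_timestamp >= old.last_timestamp,
        "timestamp_ms went backwards across rekey"
    );
    new_session
}

fn main() {
    let old = Session { key_epoch: 3, last_timestamp: 120_000, next_timestamp: 120_020 };
    let new = complete_rekey(&old);
    println!("epoch {} continues at {} ms", new.key_epoch, new.next_timestamp);
}
```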


Low / Accepted Debt

| ID | Description | File | Accepted in |
|---|---|---|---|
| L1 | 9 pre-existing clippy lints in wzp-codec | aec.rs, denoise.rs, opus_enc.rs, codec2_{enc,dec}.rs, resample.rs | PROTOCOL-AUDIT.md |
| L2 | 3 clippy errors in deps/featherchat submodule | ratchet.rs, types.rs | PROTOCOL-AUDIT.md |
| L3 | Audio anti-replay window of 64 packets | wzp-crypto/src/session.rs:89 | Accepted: jitter buffer + PLC masks loss |
| L4 | Debug tap logs at INFO with no rate limiting | wzp-relay/src/room.rs:46–59 | Safe in dev; add 1:100 sampling for prod |

What Was Not Found

These are explicitly confirmed sound after code-level verification:

  • Anti-replay bitmap — correct u32 wrapping, per-stream isolation, window sizing by MediaType
  • HKDF + X25519 + Ed25519 key agreement — standard construction, no gaps
  • SAS code derivation — SHA-256(shared_secret)[:4] as 4-digit voice verification code
  • Rekey forward secrecy — session_id rotation on rekey isolates nonce namespaces; seq counter reset is intentional and safe
  • MiniHeader v2 seq_delta — fully implemented at wzp-proto/src/packet.rs:469–526 with tests; PROTOCOL-AUDIT resolution table is accurate
  • SFU E2E preservation — relay ciphertext passthrough, no plaintext access
  • RaptorQ for Codec2 — correct tool for the bitrate regime
  • DRED continuous tuning — better than discrete tiers; 15% loss floor is empirically grounded
  • Jitter buffer — BTreeMap with wrapping-aware comparisons, EWMA adaptive playout delay, solid
  • Quinn QUIC datagram transport — correct primitives for unreliable media
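For reference, the sliding-window bitmap technique named in the first bullet can be sketched in the classic IPsec style, with a 64-bit bitmap matching the 64-packet audio window listed in the Low / Accepted Debt table. This is a generic, single-stream sketch with wrap-aware u32 comparison, not the wzp-crypto implementation.

```rust
/// Sliding-window replay detector for one stream (64-packet window).
struct ReplayWindow {
    highest: Option<u32>, // highest seq accepted so far
    bitmap: u64,          // bit i set => seq (highest - i) already accepted
}

impl ReplayWindow {
    fn new() -> Self {
        Self { highest: None, bitmap: 0 }
    }

    /// Returns true and records `seq` if it is new; false if it is a replay
    /// or falls behind the window.
    fn check_and_update(&mut self, seq: u32) -> bool {
        let Some(high) = self.highest else {
            self.highest = Some(seq);
            self.bitmap = 1;
            return true;
        };
        // Wrap-aware signed distance from the highest seq seen.
        let diff = seq.wrapping_sub(high) as i32;
        if diff > 0 {
            // Newer packet: slide the window forward.
            let shift = diff as u32;
            self.bitmap = if shift >= 64 { 0 } else { self.bitmap << shift };
            self.bitmap |= 1;
            self.highest = Some(seq);
            true
        } else {
            let back = diff.unsigned_abs(); // 0 means seq == highest
            if back >= 64 {
                return false; // too old to judge: reject
            }
            let mask = 1u64 << back;
            if self.bitmap & mask != 0 {
                return false; // already seen
            }
            self.bitmap |= mask;
            true
        }
    }
}

fn main() {
    let mut w = ReplayWindow::new();
    let verdicts: Vec<bool> = [10u32, 12, 11, 11, 10]
        .iter()
        .map(|&s| w.check_and_update(s))
        .collect();
    println!("{verdicts:?}"); // [true, true, true, false, false]
}
```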

Fix Priority Table

| # | Issue | Category | Effort | Blocks |
|---|---|---|---|---|
| 1 | C1: nonce → MediaHeader.seq | Crypto | 1h | All sessions on lossy paths |
| 2 | C2: verify AEAD on all datagram send paths | Crypto | 1h | E2E security claim |
| 3 | C3: wire VideoScorer::observe() into room | Relay | 2h | Relay abuse control for video |
| 4 | C4: NDK 0.9 mediacodec.rs migration (5 categories) | Android | 2h | Android video |
| 5 | H1: wire AV1 factory into call engine | Video | 2h | Desktop video |
| 6 | H2: widen fec_block_id to u16 | FEC/Wire | 30min | Next protocol release |
| 7 | M1: SignalMessage version byte | Proto | 1h | Federation correctness |
| 8 | M2: BWE into AdaptiveQualityController | Transport | 2–3 days | Video production quality |

Total for C1–H1 (items 1–5): ~8 hours of focused engineering.