docs: protocol audit 2026-05-25, update architecture + Obsidian vault

Audit:
- docs/AUDIT-2026-05-25.md: full protocol audit covering 8 findings
  (4 critical, 2 high, 5 medium, 4 low) with code references and fix
  effort estimates
- vault/Audit/Tasks.md: Obsidian Tasks plugin file tracking all audit
  items with priorities, due dates, and per-step checklists

Architecture docs updated for Wire format v2 and Wave 5/6 features:
- ARCHITECTURE.md: adds wzp-video to dependency graph and project
  structure; wire format updated to v2 (16B header, 5B MiniHeader);
  relay concurrency section corrected (DashMap+RwLock is current, not
  a future optimization); test count 571→702; Android note
- PROGRESS.md: Wave 5 and Wave 6 sections appended; test count 372→702;
  current status and open blockers as of 2026-05-25
- ROAD-TO-VIDEO.md: implementation status table inserted (/🟡/🔴/🔲
  per phase); 6-step critical path to first video call
- WZP-SPEC.md: MediaHeader updated to v2 (16B byte-aligned); MiniHeader
  updated to 5B with seq_delta; codec IDs 9-12 added (H.264/H.265/AV1);
  version negotiation section added

Obsidian vault (vault/):
- 114 files across Architecture/, PRDs/, Reports/, Android/,
  Reference/, Audit/ with YAML frontmatter
- 00 - Home.md index note with wiki links
- .obsidian/app.json config

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
This commit is contained in:
Siavash Sameni
2026-05-25 06:00:17 +04:00
parent 12b0d9738f
commit ed8a7ae5aa
120 changed files with 22781 additions and 65 deletions

View File

@@ -0,0 +1,89 @@
---
tags: [prd, wzp]
type: prd
---
# PRD: QUIC Path MTU Discovery
## Problem
WarzonePhone uses conservative 1200-byte QUIC datagrams. Some network paths support larger MTUs (1400+), wasting bandwidth. Some broken paths (VPNs, tunnels, double-NAT, cellular) have MTU < 1200, causing silent packet drops — this may explain why Opus 64k fails on some paths while 24k works (larger encoded frames + FEC repair packets).
## Solution
Enable Quinn's built-in Path MTU Discovery (PMTUD) and handle edge cases:
1. PMTUD probes larger packet sizes and discovers the actual path MTU
2. Graceful fallback when datagrams exceed discovered MTU
3. Expose MTU in metrics for debugging
## Implementation
### Phase 1: Enable PMTUD in Quinn
`crates/wzp-transport/src/config.rs` — update `transport_config()`:
```rust
// Enable PMTUD (Quinn default is enabled, but we should ensure it)
config.mtu_discovery_config(Some(quinn::MtuDiscoveryConfig::default()));
// Set minimum MTU for safety (some paths can't handle 1200)
// Quinn default min is 1200, which is the QUIC spec minimum
```
Quinn's `MtuDiscoveryConfig` has:
- `interval`: how often to probe (default: 600s)
- `upper_bound`: max MTU to probe (default: 1452 for IPv4)
- `minimum_change`: min MTU increase to be worth probing (default: 20)
### Phase 2: Handle MTU-related Failures
In federation forwarding (`send_raw_datagram`), if the datagram exceeds the connection's current MTU, Quinn returns an error. Handle gracefully:
- Log warning with packet size vs MTU
- Drop the packet (don't crash)
- Track in metrics: `wzp_relay_mtu_exceeded_total`
### Phase 3: Codec-Aware MTU
When the path MTU is small, the relay or client should:
- Prefer lower-bitrate codecs (smaller packets)
- Reduce FEC ratio (fewer repair packets)
- This feeds into the adaptive quality system
### Phase 4: Expose MTU in Stats
- Add `path_mtu` to relay metrics (per peer)
- Add `path_mtu` to client stats (visible in UI)
- Log MTU on connection establishment
## Non-Goals (v1)
- Datagram fragmentation (QUIC datagrams are atomic — either fit or don't)
- Manual MTU override per relay config
- MTU-based codec selection (future, needs adaptive quality)
## Effort: 1 day
## Implementation Status (2026-04-12)
Phase 1 is now implemented:
### What was built
- **Transport config** (`crates/wzp-transport/src/config.rs`):
- `MtuDiscoveryConfig` with `upper_bound=1452`, `interval=300s`, `black_hole_cooldown=30s`
- `initial_mtu=1200` (safe QUIC minimum)
- Quinn's PLPMTUD binary-searches from 1200 up to 1452 automatically
- **`QuinnPathSnapshot::current_mtu`** (`crates/wzp-transport/src/quic.rs`):
- Reads `connection.max_datagram_size()` which reflects the PMTUD-discovered value
- Available to all callers via `transport.quinn_path_stats()`
- **Trunk batcher MTU-aware** (`crates/wzp-relay/src/room.rs`):
- `TrunkedForwarder::new()` initializes `max_bytes` from discovered MTU
- `send()` refreshes `max_bytes` on every call (cheap atomic read in quinn)
- Federation trunk frames grow automatically as PMTUD discovers larger paths
### Phases 2-3 status
- Phase 2 (handle MTU failures): Already handled — `send_media()`/`send_trunk()` check `max_datagram_size()` and return `DatagramTooLarge` errors. These are logged and the packet is dropped gracefully.
- Phase 3 (codec-aware MTU): Not yet implemented. Future video frames will need application-layer fragmentation when they exceed the discovered MTU.