docs: protocol audit 2026-05-25, update architecture + Obsidian vault
Audit: - docs/AUDIT-2026-05-25.md: full protocol audit covering 8 findings (4 critical, 2 high, 5 medium, 4 low) with code references and fix effort estimates - vault/Audit/Tasks.md: Obsidian Tasks plugin file tracking all audit items with priorities, due dates, and per-step checklists Architecture docs updated for Wire format v2 and Wave 5/6 features: - ARCHITECTURE.md: adds wzp-video to dependency graph and project structure; wire format updated to v2 (16B header, 5B MiniHeader); relay concurrency section corrected (DashMap+RwLock is current, not a future optimization); test count 571→702; Android note - PROGRESS.md: Wave 5 and Wave 6 sections appended; test count 372→702; current status and open blockers as of 2026-05-25 - ROAD-TO-VIDEO.md: implementation status table inserted (✅/🟡/🔴/🔲 per phase); 6-step critical path to first video call - WZP-SPEC.md: MediaHeader updated to v2 (16B byte-aligned); MiniHeader updated to 5B with seq_delta; codec IDs 9-12 added (H.264/H.265/AV1); version negotiation section added Obsidian vault (vault/): - 114 files across Architecture/, PRDs/, Reports/, Android/, Reference/, Audit/ with YAML frontmatter - 00 - Home.md index note with wiki links - .obsidian/app.json config Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
This commit is contained in:
89
vault/PRDs/PRD-mtu-discovery.md
Normal file
89
vault/PRDs/PRD-mtu-discovery.md
Normal file
@@ -0,0 +1,89 @@
|
||||
---
|
||||
tags: [prd, wzp]
|
||||
type: prd
|
||||
---
|
||||
|
||||
# PRD: QUIC Path MTU Discovery
|
||||
|
||||
## Problem
|
||||
|
||||
WarzonePhone uses conservative 1200-byte QUIC datagrams. Some network paths support larger MTUs (1400+), wasting bandwidth. Some broken paths (VPNs, tunnels, double-NAT, cellular) have MTU < 1200, causing silent packet drops — this may explain why Opus 64k fails on some paths while 24k works (larger encoded frames + FEC repair packets).
|
||||
|
||||
## Solution
|
||||
|
||||
Enable Quinn's built-in Path MTU Discovery (PMTUD) and handle edge cases:
|
||||
1. PMTUD probes larger packet sizes and discovers the actual path MTU
|
||||
2. Graceful fallback when datagrams exceed discovered MTU
|
||||
3. Expose MTU in metrics for debugging
|
||||
|
||||
## Implementation
|
||||
|
||||
### Phase 1: Enable PMTUD in Quinn
|
||||
|
||||
`crates/wzp-transport/src/config.rs` — update `transport_config()`:
|
||||
|
||||
```rust
|
||||
// Enable PMTUD (Quinn default is enabled, but we should ensure it)
|
||||
config.mtu_discovery_config(Some(quinn::MtuDiscoveryConfig::default()));
|
||||
|
||||
// Set minimum MTU for safety (some paths can't handle 1200)
|
||||
// Quinn default min is 1200, which is the QUIC spec minimum
|
||||
```
|
||||
|
||||
Quinn's `MtuDiscoveryConfig` has:
|
||||
- `interval`: how often to probe (default: 600s)
|
||||
- `upper_bound`: max MTU to probe (default: 1452 for IPv4)
|
||||
- `minimum_change`: min MTU increase to be worth probing (default: 20)
|
||||
|
||||
### Phase 2: Handle MTU-related Failures
|
||||
|
||||
In federation forwarding (`send_raw_datagram`), if the datagram exceeds the connection's current MTU, Quinn returns an error. Handle gracefully:
|
||||
- Log warning with packet size vs MTU
|
||||
- Drop the packet (don't crash)
|
||||
- Track in metrics: `wzp_relay_mtu_exceeded_total`
|
||||
|
||||
### Phase 3: Codec-Aware MTU
|
||||
|
||||
When the path MTU is small, the relay or client should:
|
||||
- Prefer lower-bitrate codecs (smaller packets)
|
||||
- Reduce FEC ratio (fewer repair packets)
|
||||
- This feeds into the adaptive quality system
|
||||
|
||||
### Phase 4: Expose MTU in Stats
|
||||
|
||||
- Add `path_mtu` to relay metrics (per peer)
|
||||
- Add `path_mtu` to client stats (visible in UI)
|
||||
- Log MTU on connection establishment
|
||||
|
||||
## Non-Goals (v1)
|
||||
|
||||
- Datagram fragmentation (QUIC datagrams are atomic — either fit or don't)
|
||||
- Manual MTU override per relay config
|
||||
- MTU-based codec selection (future, needs adaptive quality)
|
||||
|
||||
## Effort: 1 day
|
||||
|
||||
## Implementation Status (2026-04-12)
|
||||
|
||||
Phase 1 is now implemented:
|
||||
|
||||
### What was built
|
||||
|
||||
- **Transport config** (`crates/wzp-transport/src/config.rs`):
|
||||
- `MtuDiscoveryConfig` with `upper_bound=1452`, `interval=300s`, `black_hole_cooldown=30s`
|
||||
- `initial_mtu=1200` (safe QUIC minimum)
|
||||
- Quinn's PLPMTUD binary-searches from 1200 up to 1452 automatically
|
||||
|
||||
- **`QuinnPathSnapshot::current_mtu`** (`crates/wzp-transport/src/quic.rs`):
|
||||
- Reads `connection.max_datagram_size()` which reflects the PMTUD-discovered value
|
||||
- Available to all callers via `transport.quinn_path_stats()`
|
||||
|
||||
- **Trunk batcher MTU-aware** (`crates/wzp-relay/src/room.rs`):
|
||||
- `TrunkedForwarder::new()` initializes `max_bytes` from discovered MTU
|
||||
- `send()` refreshes `max_bytes` on every call (cheap atomic read in quinn)
|
||||
- Federation trunk frames grow automatically as PMTUD discovers larger paths
|
||||
|
||||
### Phases 2-3 status
|
||||
|
||||
- Phase 2 (handle MTU failures): Already handled — `send_media()`/`send_trunk()` check `max_datagram_size()` and return `DatagramTooLarge` errors. These are logged and the packet is dropped gracefully.
|
||||
- Phase 3 (codec-aware MTU): Not yet implemented. Future video frames will need application-layer fragmentation when they exceed the discovered MTU.
|
||||
Reference in New Issue
Block a user