Files
wz-phone/docs/PRD-mtu-discovery.md
Siavash Sameni 766c9df442 feat(dred): continuous DRED tuning, PMTUD, extended Opus6k window
- DredTuner: maps live network metrics (loss/RTT/jitter) to continuous
  DRED duration every ~500ms instead of discrete tier-locked values.
  Includes jitter-spike detection for pre-emptive Starlink-style boost.
- Opus6k DRED extended from 500ms to 1040ms (max libopus 1.5 supports)
- PMTUD: quinn MtuDiscoveryConfig with upper_bound=1452, 300s interval
- TrunkedForwarder respects discovered MTU (was hard-coded 1200)
- QuinnPathSnapshot exposes quinn internal stats + discovered MTU
- AudioEncoder trait: set_expected_loss() + set_dred_duration() methods
- PathMonitor: sliding-window jitter variance for spike detection
- Integrated into both Android and desktop send tasks in engine.rs
- 14 new tests (10 tuner unit + 4 encoder integration)
- Updated ARCHITECTURE.md, PROGRESS.md, PRD-dred-integration, PRD-mtu

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-12 19:38:37 +04:00

3.3 KiB

PRD: QUIC Path MTU Discovery

Problem

WarzonePhone uses conservative 1200-byte QUIC datagrams. Some network paths support larger MTUs (1400+), wasting bandwidth. Some broken paths (VPNs, tunnels, double-NAT, cellular) have MTU < 1200, causing silent packet drops — this may explain why Opus 64k fails on some paths while 24k works (larger encoded frames + FEC repair packets).

Solution

Enable Quinn's built-in Path MTU Discovery (PMTUD) and handle edge cases:

  1. PMTUD probes larger packet sizes and discovers the actual path MTU
  2. Graceful fallback when datagrams exceed discovered MTU
  3. Expose MTU in metrics for debugging

Implementation

Phase 1: Enable PMTUD in Quinn

crates/wzp-transport/src/config.rs — update transport_config():

// Enable PMTUD (Quinn default is enabled, but we should ensure it)
config.mtu_discovery_config(Some(quinn::MtuDiscoveryConfig::default()));

// Set minimum MTU for safety (some paths can't handle 1200)
// Quinn default min is 1200, which is the QUIC spec minimum

Quinn's MtuDiscoveryConfig has:

  • interval: how often to probe (default: 600s)
  • upper_bound: max MTU to probe (default: 1452 for IPv4)
  • minimum_change: min MTU increase to be worth probing (default: 20)

In federation forwarding (send_raw_datagram), if the datagram exceeds the connection's current MTU, Quinn returns an error. Handle gracefully:

  • Log warning with packet size vs MTU
  • Drop the packet (don't crash)
  • Track in metrics: wzp_relay_mtu_exceeded_total

Phase 3: Codec-Aware MTU

When the path MTU is small, the relay or client should:

  • Prefer lower-bitrate codecs (smaller packets)
  • Reduce FEC ratio (fewer repair packets)
  • This feeds into the adaptive quality system

Phase 4: Expose MTU in Stats

  • Add path_mtu to relay metrics (per peer)
  • Add path_mtu to client stats (visible in UI)
  • Log MTU on connection establishment

Non-Goals (v1)

  • Datagram fragmentation (QUIC datagrams are atomic — either fit or don't)
  • Manual MTU override per relay config
  • MTU-based codec selection (future, needs adaptive quality)

Effort: 1 day

Implementation Status (2026-04-12)

Phase 1 is now implemented:

What was built

  • Transport config (crates/wzp-transport/src/config.rs):

    • MtuDiscoveryConfig with upper_bound=1452, interval=300s, black_hole_cooldown=30s
    • initial_mtu=1200 (safe QUIC minimum)
    • Quinn's PLPMTUD binary-searches from 1200 up to 1452 automatically
  • QuinnPathSnapshot::current_mtu (crates/wzp-transport/src/quic.rs):

    • Reads connection.max_datagram_size() which reflects the PMTUD-discovered value
    • Available to all callers via transport.quinn_path_stats()
  • Trunk batcher MTU-aware (crates/wzp-relay/src/room.rs):

    • TrunkedForwarder::new() initializes max_bytes from discovered MTU
    • send() refreshes max_bytes on every call (cheap atomic read in quinn)
    • Federation trunk frames grow automatically as PMTUD discovers larger paths

Phases 2-3 status

  • Phase 2 (handle MTU failures): Already handled — send_media()/send_trunk() check max_datagram_size() and return DatagramTooLarge errors. These are logged and the packet is dropped gracefully.
  • Phase 3 (codec-aware MTU): Not yet implemented. Future video frames will need application-layer fragmentation when they exceed the discovered MTU.