feat(dred): continuous DRED tuning, PMTUD, extended Opus6k window

- DredTuner: maps live network metrics (loss/RTT/jitter) to continuous
  DRED duration every ~500ms instead of discrete tier-locked values.
  Includes jitter-spike detection for pre-emptive Starlink-style boost.
- Opus6k DRED extended from 500ms to 1040ms (max libopus 1.5 supports)
- PMTUD: quinn MtuDiscoveryConfig with upper_bound=1452, 300s interval
- TrunkedForwarder respects discovered MTU (was hard-coded 1200)
- QuinnPathSnapshot exposes quinn internal stats + discovered MTU
- AudioEncoder trait: set_expected_loss() + set_dred_duration() methods
- PathMonitor: sliding-window jitter variance for spike detection
- Integrated into both Android and desktop send tasks in engine.rs
- 14 new tests (10 tuner unit + 4 encoder integration)
- Updated ARCHITECTURE.md, PROGRESS.md, PRD-dred-integration, PRD-mtu

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
This commit is contained in:
Siavash Sameni
2026-04-12 19:38:37 +04:00
parent 24cc74d93c
commit 766c9df442
19 changed files with 829 additions and 32 deletions

View File

@@ -103,11 +103,13 @@ sequenceDiagram
participant RNN as RNNoise<br/>(2 x 480)
participant VAD as SilenceDetector
participant Codec as Opus / Codec2
participant DT as DredTuner<br/>(wzp-proto)
participant FEC as RaptorQ FEC
participant INT as Interleaver<br/>(depth=3)
participant HDR as MediaHeader<br/>(12B or Mini 4B)
participant Enc as ChaCha20-Poly1305
participant QUIC as QUIC Datagram
participant QPS as QuinnPathSnapshot
Mic->>Ring: f32 x 512 (macOS callback)
Ring->>Ring: Accumulate to 960 samples
@@ -118,10 +120,19 @@ sequenceDiagram
else Silence (>100ms)
VAD->>Codec: ComfortNoise (every 200ms)
end
Codec->>FEC: Compressed bytes (pad to 256B symbol)
FEC->>FEC: Accumulate block (5-10 symbols)
FEC->>INT: Source + repair symbols
INT->>HDR: Interleaved packets
Note over QPS,DT: Every 25 frames (~500ms)
QPS->>DT: loss_pct, rtt_ms, jitter_ms
DT->>Codec: set_dred_duration() + set_expected_loss()
alt Opus tier (any bitrate)
Codec->>HDR: Compressed bytes + DRED side-channel (no RaptorQ)
else Codec2 tier
Codec->>FEC: Compressed bytes (pad to 256B symbol)
FEC->>FEC: Accumulate block (5-10 symbols)
FEC->>INT: Source + repair symbols
INT->>HDR: Interleaved packets
end
HDR->>Enc: Header as AAD
Enc->>QUIC: Encrypted payload + 16B tag
```
@@ -134,6 +145,9 @@ sequenceDiagram
- Silence detection uses VAD + 100ms hangover before switching to ComfortNoise
- FEC symbols are padded to **256 bytes** with a 2-byte LE length prefix
- MiniHeaders (4 bytes) replace full headers (12 bytes) for 49 of every 50 frames
- DRED tuner polls quinn path stats every 25 frames (~500ms) and adjusts DRED lookback duration continuously
- Opus tiers bypass RaptorQ entirely -- DRED handles loss recovery at the codec layer
- Opus6k DRED window: 1040ms (maximum libopus allows)
## Audio Decode Pipeline
@@ -154,13 +168,30 @@ sequenceDiagram
Dec->>AR: Decrypt (header = AAD)
AR->>AR: Check seq window (reject replay)
AR->>HDR: Verified packet
HDR->>DEINT: MediaHeader + payload
DEINT->>FEC: Reordered symbols by block
FEC->>FEC: Attempt decode (need K of K+R)
FEC->>JIT: Recovered audio frames
alt Opus packet
HDR->>JIT: Direct to jitter buffer (no FEC/interleave)
else Codec2 packet
HDR->>DEINT: MediaHeader + payload
DEINT->>FEC: Reordered symbols by block
FEC->>FEC: Attempt decode (need K of K+R)
FEC->>JIT: Recovered audio frames
end
JIT->>JIT: BTreeMap ordered by seq
JIT->>JIT: Wait until depth >= target
JIT->>Codec: Pop lowest seq frame
alt Packet present
JIT->>Codec: Pop lowest seq frame
else Packet missing (Opus)
JIT->>Codec: DRED reconstruction (neural)
alt DRED fails or unavailable
Codec->>Codec: Classical PLC fallback
end
else Packet missing (Codec2)
Codec->>Codec: Classical PLC
end
Codec->>Ring: PCM i16 x 960
Ring->>SPK: Audio callback pulls samples
```
@@ -172,6 +203,8 @@ sequenceDiagram
- Jitter buffer target: **10 packets (200ms)** for client, **50 packets (1s)** for relay
- Desktop client uses **direct playout** (no jitter buffer) with lock-free ring
- Codec2 frames at 8 kHz are resampled to 48 kHz transparently
- DRED reconstruction: on packet loss, decoder tries neural DRED reconstruction before falling back to classical PLC
- Jitter-spike detection pre-emptively boosts DRED to ceiling when jitter variance spikes >30%
## Relay SFU Forwarding
@@ -348,7 +381,7 @@ Used for 49 of every 50 frames (~1s cycle). Saves 8 bytes per packet (67% header
[session_id: 2][len: u16][payload: len] x count
```
Packs multiple session packets into one QUIC datagram. Maximum 10 entries or 1200 bytes, flushed every 5ms.
Packs multiple session packets into one QUIC datagram. Maximum 10 entries or PMTUD-discovered MTU (starts at 1200, grows to ~1452 on Ethernet), flushed every 5ms.
### QualityReport (4 bytes, optional trailer)
@@ -361,6 +394,40 @@ Byte 3: bitrate_cap_kbps (0-255 kbps)
Appended to a media packet when the Q flag is set in the MediaHeader.
## Path MTU Discovery
Quinn's PLPMTUD is enabled with:
- `initial_mtu`: 1200 bytes (QUIC minimum, always safe)
- `upper_bound`: 1452 bytes (Ethernet minus IP/UDP/QUIC headers)
- `interval`: 300s (re-probe every 5 minutes)
- `black_hole_cooldown`: 30s (faster retry on lossy links)
The discovered MTU is exposed via `QuinnPathSnapshot::current_mtu` and used by:
- `TrunkedForwarder`: refreshes `max_bytes` on every send to fill larger datagrams
- Future video framer: larger MTU = fewer application-layer fragments per frame
## Continuous DRED Tuning
Instead of locking DRED duration to 3 discrete quality tiers, the `DredTuner` (in `wzp-proto::dred_tuner`) maps live path quality to a continuous DRED duration:
| Input | Source | Update Rate |
|-------|--------|-------------|
| Loss % | `QuinnPathSnapshot::loss_pct` (from quinn ACK frames) | Every 25 packets (~500ms) |
| RTT ms | `QuinnPathSnapshot::rtt_ms` (quinn congestion controller) | Every 25 packets |
| Jitter ms | `PathMonitor::jitter_ms` (EWMA of RTT variance) | Every 25 packets |
### Mapping Logic
- **Baseline**: codec-tier default (Studio=100ms, Good=200ms, Degraded=500ms)
- **Ceiling**: codec-tier max (Studio=300ms, Good=500ms, Degraded=1040ms)
- **Continuous**: linear interpolation between baseline and ceiling based on loss (0%->baseline, 40%->ceiling)
- **RTT phantom loss**: high RTT (>200ms) adds phantom loss contribution to keep DRED generous
- **Jitter spike**: >30% EWMA spike pre-emptively boosts to ceiling for ~5s cooldown
### Output
`DredTuning { dred_frames: u8, expected_loss_pct: u8 }` -> fed to `CallEncoder::apply_dred_tuning()` -> `OpusEncoder::set_dred_duration()` + `set_expected_loss()`
## Signal Message Handshake Flow
```mermaid