feat(dred): continuous DRED tuning, PMTUD, extended Opus6k window
- DredTuner: maps live network metrics (loss/RTT/jitter) to continuous DRED duration every ~500ms instead of discrete tier-locked values. Includes jitter-spike detection for pre-emptive Starlink-style boost. - Opus6k DRED extended from 500ms to 1040ms (max libopus 1.5 supports) - PMTUD: quinn MtuDiscoveryConfig with upper_bound=1452, 300s interval - TrunkedForwarder respects discovered MTU (was hard-coded 1200) - QuinnPathSnapshot exposes quinn internal stats + discovered MTU - AudioEncoder trait: set_expected_loss() + set_dred_duration() methods - PathMonitor: sliding-window jitter variance for spike detection - Integrated into both Android and desktop send tasks in engine.rs - 14 new tests (10 tuner unit + 4 encoder integration) - Updated ARCHITECTURE.md, PROGRESS.md, PRD-dred-integration, PRD-mtu Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
This commit is contained in:
@@ -103,11 +103,13 @@ sequenceDiagram
|
||||
participant RNN as RNNoise<br/>(2 x 480)
|
||||
participant VAD as SilenceDetector
|
||||
participant Codec as Opus / Codec2
|
||||
participant DT as DredTuner<br/>(wzp-proto)
|
||||
participant FEC as RaptorQ FEC
|
||||
participant INT as Interleaver<br/>(depth=3)
|
||||
participant HDR as MediaHeader<br/>(12B or Mini 4B)
|
||||
participant Enc as ChaCha20-Poly1305
|
||||
participant QUIC as QUIC Datagram
|
||||
participant QPS as QuinnPathSnapshot
|
||||
|
||||
Mic->>Ring: f32 x 512 (macOS callback)
|
||||
Ring->>Ring: Accumulate to 960 samples
|
||||
@@ -118,10 +120,19 @@ sequenceDiagram
|
||||
else Silence (>100ms)
|
||||
VAD->>Codec: ComfortNoise (every 200ms)
|
||||
end
|
||||
Codec->>FEC: Compressed bytes (pad to 256B symbol)
|
||||
FEC->>FEC: Accumulate block (5-10 symbols)
|
||||
FEC->>INT: Source + repair symbols
|
||||
INT->>HDR: Interleaved packets
|
||||
|
||||
Note over QPS,DT: Every 25 frames (~500ms)
|
||||
QPS->>DT: loss_pct, rtt_ms, jitter_ms
|
||||
DT->>Codec: set_dred_duration() + set_expected_loss()
|
||||
|
||||
alt Opus tier (any bitrate)
|
||||
Codec->>HDR: Compressed bytes + DRED side-channel (no RaptorQ)
|
||||
else Codec2 tier
|
||||
Codec->>FEC: Compressed bytes (pad to 256B symbol)
|
||||
FEC->>FEC: Accumulate block (5-10 symbols)
|
||||
FEC->>INT: Source + repair symbols
|
||||
INT->>HDR: Interleaved packets
|
||||
end
|
||||
HDR->>Enc: Header as AAD
|
||||
Enc->>QUIC: Encrypted payload + 16B tag
|
||||
```
|
||||
@@ -134,6 +145,9 @@ sequenceDiagram
|
||||
- Silence detection uses VAD + 100ms hangover before switching to ComfortNoise
|
||||
- FEC symbols are padded to **256 bytes** with a 2-byte LE length prefix
|
||||
- MiniHeaders (4 bytes) replace full headers (12 bytes) for 49 of every 50 frames
|
||||
- DRED tuner polls quinn path stats every 25 frames (~500ms) and adjusts DRED lookback duration continuously
|
||||
- Opus tiers bypass RaptorQ entirely -- DRED handles loss recovery at the codec layer
|
||||
- Opus6k DRED window: 1040ms (maximum libopus allows)
|
||||
|
||||
## Audio Decode Pipeline
|
||||
|
||||
@@ -154,13 +168,30 @@ sequenceDiagram
|
||||
Dec->>AR: Decrypt (header = AAD)
|
||||
AR->>AR: Check seq window (reject replay)
|
||||
AR->>HDR: Verified packet
|
||||
HDR->>DEINT: MediaHeader + payload
|
||||
DEINT->>FEC: Reordered symbols by block
|
||||
FEC->>FEC: Attempt decode (need K of K+R)
|
||||
FEC->>JIT: Recovered audio frames
|
||||
|
||||
alt Opus packet
|
||||
HDR->>JIT: Direct to jitter buffer (no FEC/interleave)
|
||||
else Codec2 packet
|
||||
HDR->>DEINT: MediaHeader + payload
|
||||
DEINT->>FEC: Reordered symbols by block
|
||||
FEC->>FEC: Attempt decode (need K of K+R)
|
||||
FEC->>JIT: Recovered audio frames
|
||||
end
|
||||
|
||||
JIT->>JIT: BTreeMap ordered by seq
|
||||
JIT->>JIT: Wait until depth >= target
|
||||
JIT->>Codec: Pop lowest seq frame
|
||||
|
||||
alt Packet present
|
||||
JIT->>Codec: Pop lowest seq frame
|
||||
else Packet missing (Opus)
|
||||
JIT->>Codec: DRED reconstruction (neural)
|
||||
alt DRED fails or unavailable
|
||||
Codec->>Codec: Classical PLC fallback
|
||||
end
|
||||
else Packet missing (Codec2)
|
||||
Codec->>Codec: Classical PLC
|
||||
end
|
||||
|
||||
Codec->>Ring: PCM i16 x 960
|
||||
Ring->>SPK: Audio callback pulls samples
|
||||
```
|
||||
@@ -172,6 +203,8 @@ sequenceDiagram
|
||||
- Jitter buffer target: **10 packets (200ms)** for client, **50 packets (1s)** for relay
|
||||
- Desktop client uses **direct playout** (no jitter buffer) with lock-free ring
|
||||
- Codec2 frames at 8 kHz are resampled to 48 kHz transparently
|
||||
- DRED reconstruction: on packet loss, decoder tries neural DRED reconstruction before falling back to classical PLC
|
||||
- Jitter-spike detection pre-emptively boosts DRED to ceiling when jitter variance spikes >30%
|
||||
|
||||
## Relay SFU Forwarding
|
||||
|
||||
@@ -348,7 +381,7 @@ Used for 49 of every 50 frames (~1s cycle). Saves 8 bytes per packet (67% header
|
||||
[session_id: 2][len: u16][payload: len] x count
|
||||
```
|
||||
|
||||
Packs multiple session packets into one QUIC datagram. Maximum 10 entries or 1200 bytes, flushed every 5ms.
|
||||
Packs multiple session packets into one QUIC datagram. Maximum 10 entries or PMTUD-discovered MTU (starts at 1200, grows to ~1452 on Ethernet), flushed every 5ms.
|
||||
|
||||
### QualityReport (4 bytes, optional trailer)
|
||||
|
||||
@@ -361,6 +394,40 @@ Byte 3: bitrate_cap_kbps (0-255 kbps)
|
||||
|
||||
Appended to a media packet when the Q flag is set in the MediaHeader.
|
||||
|
||||
## Path MTU Discovery
|
||||
|
||||
Quinn's PLPMTUD is enabled with:
|
||||
- `initial_mtu`: 1200 bytes (QUIC minimum, always safe)
|
||||
- `upper_bound`: 1452 bytes (Ethernet minus IP/UDP/QUIC headers)
|
||||
- `interval`: 300s (re-probe every 5 minutes)
|
||||
- `black_hole_cooldown`: 30s (faster retry on lossy links)
|
||||
|
||||
The discovered MTU is exposed via `QuinnPathSnapshot::current_mtu` and used by:
|
||||
- `TrunkedForwarder`: refreshes `max_bytes` on every send to fill larger datagrams
|
||||
- Future video framer: larger MTU = fewer application-layer fragments per frame
|
||||
|
||||
## Continuous DRED Tuning
|
||||
|
||||
Instead of locking DRED duration to 3 discrete quality tiers, the `DredTuner` (in `wzp-proto::dred_tuner`) maps live path quality to a continuous DRED duration:
|
||||
|
||||
| Input | Source | Update Rate |
|
||||
|-------|--------|-------------|
|
||||
| Loss % | `QuinnPathSnapshot::loss_pct` (from quinn ACK frames) | Every 25 packets (~500ms) |
|
||||
| RTT ms | `QuinnPathSnapshot::rtt_ms` (quinn congestion controller) | Every 25 packets |
|
||||
| Jitter ms | `PathMonitor::jitter_ms` (EWMA of RTT variance) | Every 25 packets |
|
||||
|
||||
### Mapping Logic
|
||||
|
||||
- **Baseline**: codec-tier default (Studio=100ms, Good=200ms, Degraded=500ms)
|
||||
- **Ceiling**: codec-tier max (Studio=300ms, Good=500ms, Degraded=1040ms)
|
||||
- **Continuous**: linear interpolation between baseline and ceiling based on loss (0%->baseline, 40%->ceiling)
|
||||
- **RTT phantom loss**: high RTT (>200ms) adds phantom loss contribution to keep DRED generous
|
||||
- **Jitter spike**: >30% EWMA spike pre-emptively boosts to ceiling for ~5s cooldown
|
||||
|
||||
### Output
|
||||
|
||||
`DredTuning { dred_frames: u8, expected_loss_pct: u8 }` -> fed to `CallEncoder::apply_dred_tuning()` -> `OpusEncoder::set_dred_duration()` + `set_expected_loss()`
|
||||
|
||||
## Signal Message Handshake Flow
|
||||
|
||||
```mermaid
|
||||
|
||||
@@ -358,3 +358,31 @@ End-to-end testing, in order:
|
||||
- **OSCE enable**: opusic-c has an `osce` feature flag for Opus Speech Coding Enhancement, a separate libopus 1.5 neural post-processor. Out of scope for this PRD but should be the next audio-quality follow-up. Probably one-line enable once opusic-c is in.
|
||||
- **Upstream PR to opusic-c**: our own `dred_ffi.rs` wrapper should be proven in production first, then the fixes upstreamed to `opusic-c/src/dred.rs` (preserve `dred_end`, fix `dred_offset` double-pass, expose `DredPacket` externally). Follow-up task, not blocking this PRD.
|
||||
- **`feat/desktop-audio-rewrite` merge**: the vendored `audiopus_sys` patch on that branch becomes obsolete under this PRD. Coordinate removal with whoever owns that branch.
|
||||
|
||||
## Phase A: Continuous DRED Tuning (Implemented 2026-04-12)
|
||||
|
||||
Phase A extends the discrete tier-locked DRED durations from Phases 1-3 with continuous, network-driven tuning.
|
||||
|
||||
### What was built
|
||||
|
||||
- **`DredTuner`** (`crates/wzp-proto/src/dred_tuner.rs`): Maps `(loss_pct, rtt_ms, jitter_ms)` → `(dred_frames, expected_loss_pct)` continuously
|
||||
- **Quinn stats exposure** (`crates/wzp-transport/src/quic.rs`): `QuinnPathSnapshot` provides quinn's internal RTT, loss, congestion events — more accurate than sequence-gap heuristics
|
||||
- **Jitter variance window** (`crates/wzp-transport/src/path_monitor.rs`): 10-sample sliding window for RTT standard deviation, used for spike detection
|
||||
- **`AudioEncoder` trait extensions** (`crates/wzp-proto/src/traits.rs`): `set_expected_loss()` and `set_dred_duration()` with default no-op, overridden by `OpusEncoder` and `AdaptiveEncoder`
|
||||
- **Engine integration** (`desktop/src-tauri/src/engine.rs`): Both Android and desktop send tasks poll every 25 frames and apply tuning
|
||||
|
||||
### Opus6k DRED extended
|
||||
|
||||
`dred_duration_for(Opus6k)` changed from 50 (500ms) to 104 (1040ms) — the maximum libopus 1.5 supports. The RDO-VAE's quality-vs-offset curve makes this nearly free in bitrate terms while doubling burst resilience on the worst links.
|
||||
|
||||
### Jitter spike detection ("Sawtooth" prediction)
|
||||
|
||||
When instantaneous jitter exceeds the EWMA × 1.3 (asymmetric: fast-up α=0.3, slow-down α=0.05), the tuner enters spike-boost mode:
|
||||
- DRED immediately jumps to the codec tier's ceiling
|
||||
- Cooldown: 10 cycles (~5 seconds at 25 packets/cycle)
|
||||
- Designed for Starlink satellite handover sawtooth jitter pattern
|
||||
|
||||
### Test coverage
|
||||
|
||||
- 10 unit tests for tuner math (baseline, scaling, spike, cooldown, codec switch, Codec2 no-op)
|
||||
- 4 integration tests (encoder adjustment, spike boost, Codec2 no-op, profile switch with encode verification)
|
||||
|
||||
@@ -57,3 +57,28 @@ When the path MTU is small, the relay or client should:
|
||||
- MTU-based codec selection (future, needs adaptive quality)
|
||||
|
||||
## Effort: 1 day
|
||||
|
||||
## Implementation Status (2026-04-12)
|
||||
|
||||
Phase 1 is now implemented:
|
||||
|
||||
### What was built
|
||||
|
||||
- **Transport config** (`crates/wzp-transport/src/config.rs`):
|
||||
- `MtuDiscoveryConfig` with `upper_bound=1452`, `interval=300s`, `black_hole_cooldown=30s`
|
||||
- `initial_mtu=1200` (safe QUIC minimum)
|
||||
- Quinn's PLPMTUD binary-searches from 1200 up to 1452 automatically
|
||||
|
||||
- **`QuinnPathSnapshot::current_mtu`** (`crates/wzp-transport/src/quic.rs`):
|
||||
- Reads `connection.max_datagram_size()` which reflects the PMTUD-discovered value
|
||||
- Available to all callers via `transport.quinn_path_stats()`
|
||||
|
||||
- **Trunk batcher MTU-aware** (`crates/wzp-relay/src/room.rs`):
|
||||
- `TrunkedForwarder::new()` initializes `max_bytes` from discovered MTU
|
||||
- `send()` refreshes `max_bytes` on every call (cheap atomic read in quinn)
|
||||
- Federation trunk frames grow automatically as PMTUD discovers larger paths
|
||||
|
||||
### Phases 2-3 status
|
||||
|
||||
- Phase 2 (handle MTU failures): Already handled — `send_media()`/`send_trunk()` check `max_datagram_size()` and return `DatagramTooLarge` errors. These are logged and the packet is dropped gracefully.
|
||||
- Phase 3 (codec-aware MTU): Not yet implemented. Future video frames will need application-layer fragmentation when they exceed the discovered MTU.
|
||||
|
||||
@@ -120,7 +120,7 @@
|
||||
|
||||
- **Web audio drift**: The browser AudioWorklet playback buffer caps at 200ms, but clock drift between the WebSocket message arrival rate and the AudioContext output rate can cause occasional underruns or accumulation. The cap prevents unbounded growth but may cause glitches.
|
||||
|
||||
- **No adaptive loop integration**: The `PathMonitor` feeds and `AdaptiveQualityController` are implemented but not wired together in the client's main loop. Quality reports are consumed when present in packets, but the client does not currently generate periodic quality reports from transport metrics.
|
||||
- **No adaptive loop integration (partially resolved)**: PathMonitor and DredTuner are wired into the send loop — DRED duration adapts continuously. Full AdaptiveQualityController integration (codec tier switching from transport metrics) remains TODO.
|
||||
|
||||
- **Relay FEC pass-through**: In room mode, the relay forwards packets opaquely without FEC decode/re-encode. This means FEC protection is end-to-end only, not per-hop. In forward mode, the relay pipeline does perform FEC decode/re-encode.
|
||||
|
||||
@@ -128,18 +128,18 @@
|
||||
|
||||
## Test Coverage
|
||||
|
||||
119 tests across 7 crates (wzp-web has no Rust tests):
|
||||
307+ tests across 7 crates (wzp-web has no Rust tests):
|
||||
|
||||
| Crate | Test Files | Test Count |
|
||||
|-------|-----------|------------|
|
||||
| wzp-proto | 5 | 27 |
|
||||
| wzp-codec | 3 | 24 |
|
||||
| wzp-fec | 5 | 21 |
|
||||
| wzp-crypto | 5 | 21 |
|
||||
| wzp-transport | 3 | 12 |
|
||||
| wzp-relay | 4 | 10 |
|
||||
| wzp-client | 3 | 8 |
|
||||
| **Total** | **28** | **119** |
|
||||
| Crate | Test Count |
|
||||
|-------|------------|
|
||||
| wzp-proto | ~79 |
|
||||
| wzp-codec | ~69 |
|
||||
| wzp-fec | ~21 |
|
||||
| wzp-crypto | ~21 |
|
||||
| wzp-transport | ~11 |
|
||||
| wzp-relay | ~50 |
|
||||
| wzp-client | ~57 |
|
||||
| **Total** | **307+** |
|
||||
|
||||
Tests cover:
|
||||
- Wire format roundtrip (header, quality report, full packet)
|
||||
@@ -214,3 +214,28 @@ Run with `wzp-bench --all`. Representative results (Apple M-series, single core)
|
||||
- `build-tauri-android.sh --arch arm64|armv7|all`
|
||||
- Separate per-arch APKs (~25MB each vs ~50MB universal)
|
||||
- Release APKs signed with `wzp-release.jks` via `apksigner`
|
||||
|
||||
### Continuous DRED Tuning (Phase A: opus-DRED-v2)
|
||||
- `DredTuner` in `wzp-proto::dred_tuner` maps live network metrics to continuous DRED duration
|
||||
- Polls quinn path stats every 25 frames (~500ms): loss%, RTT, jitter
|
||||
- Linear interpolation between baseline and ceiling per codec tier (not discrete tier jumps)
|
||||
- Jitter-spike detection: >30% EWMA spike pre-emptively boosts DRED to ceiling for ~5s
|
||||
- RTT phantom loss: high RTT (>200ms) adds phantom contribution to keep DRED generous
|
||||
- `set_expected_loss()` and `set_dred_duration()` added to `AudioEncoder` trait
|
||||
- Integrated into both Android and desktop send tasks in engine.rs
|
||||
|
||||
### Extended DRED Window
|
||||
- Opus6k DRED duration increased from 500ms to 1040ms (max libopus 1.5 supports)
|
||||
- RDO-VAE naturally degrades quality at longer offsets — extra window costs ~1-2 kbps
|
||||
|
||||
### PMTUD (Path MTU Discovery)
|
||||
- Quinn's PLPMTUD explicitly configured: initial 1200, upper bound 1452, 300s interval
|
||||
- `QuinnPathSnapshot` exposes discovered MTU via `current_mtu` field
|
||||
- `TrunkedForwarder` refreshes `max_bytes` from PMTUD (was hard-coded 1200)
|
||||
- Federation trunk frames now fill the discovered path MTU automatically
|
||||
|
||||
### New Tests
|
||||
- 4 DRED tuner integration tests in wzp-client (encoder adjustment, spike boost, Codec2 no-op, profile switch)
|
||||
- 10 unit tests in wzp-proto for DredTuner mapping logic
|
||||
- Jitter variance window tests in wzp-transport PathMonitor
|
||||
- Pre-existing test fixes: added missing `build_version` fields to 7 SignalMessage constructors
|
||||
|
||||
Reference in New Issue
Block a user