PRD: DRED Integration & Opus-Tier FEC Simplification
Problem
WarzonePhone's audio loss-recovery stack is built around classical Opus + application-level RaptorQ FEC. It was the right answer when WZP was designed, but libopus 1.5 (March 2024) introduced Deep REDundancy (DRED), a neural speech-recovery feature that is strictly better than classical FEC for the loss patterns VoIP calls actually experience. We are paying real latency, bitrate, and complexity costs for protection that DRED now does better and cheaper.
Concretely, on every Opus call today we pay:
- ~40–100 ms of receiver-side latency waiting for RaptorQ block completion before decode
- 10–20% bitrate overhead from RaptorQ repair symbols (more on studio profiles)
- ~20–40% codec-internal overhead from Opus inband FEC (LBRR)
- Classical Opus PLC on loss bursts exceeding the RaptorQ block size — which sounds robotic and gap-ridden
…in exchange for bit-exact recovery of isolated single-frame losses, which is perceptually indistinguishable from classical Opus PLC for 20 ms of speech. The protection is misaligned with the failure modes.
DRED delivers:
- Zero added receive latency — reconstruction runs only on detected loss
- ~1 kbps flat bitrate overhead regardless of base bitrate
- Plausible reconstruction of bursts up to ~1 second — DRED's headline capability, exactly the regime RaptorQ can't touch
- Neural PLC that sounds like continuous speech, not a gap
We also have a second, unrelated problem blocking adoption: our FFI crate audiopus_sys 0.2.2 vendors libopus 1.3, predating DRED entirely. We cannot enable DRED without first swapping the FFI layer. The naïve choice (opus crate from SpaceManiac) is a trap — it depends on the same dead audiopus_sys. The real target is opusic-c 1.5.5 by DoumanAsh, which vendors libopus 1.5.2 with full DRED support and documents Android NDK cross-compile.
This PRD covers the FFI swap, DRED enablement, the decision to remove RaptorQ and Opus inband FEC from the Opus tiers entirely (keeping RaptorQ only for Codec2 where DRED is N/A), and the jitter buffer refactor that the DRED lookahead/backfill pattern requires.
Goals
- Replace audiopus 0.3.0-rc.0 + audiopus_sys 0.2.2 (dead upstream, libopus 1.3) with opusic-c 1.5.5 + opusic-sys 0.6.0 (active upstream, libopus 1.5.2)
- Enable DRED on every Opus profile with a tiered duration policy, lower at studio bitrates and higher at degraded bitrates
- Disable Opus inband FEC (LBRR) on all Opus profiles: opusic-c's own docs recommend this, and it overlaps DRED's job
- Remove wzp-fec (RaptorQ) from the Opus tiers entirely: the latency and bitrate savings are real, and DRED strictly dominates it on speech
- Keep RaptorQ + current FEC ratios on the Codec2 tiers unchanged: DRED is libopus-only, Codec2 has no neural equivalent
- Refactor wzp-transport::jitter to a lookahead/backfill pattern that lets DRED reconstruct loss windows when the next packet arrives, instead of the current "wait for block completion or fall through to classical PLC" policy
- Ship behind a runtime escape hatch (AUDIO_USE_LEGACY_FEC) for the first rollout window so we can revert to RaptorQ if DRED has surprises in real-world conditions
Non-goals
- Changing Codec2 at all. Codec2 1200 / 3200 are outside the DRED lineage and keep their current RaptorQ protection, block sizes, and PLC path.
- Adding new Opus bitrate tiers or changing the quality adaptation thresholds. This PRD is about the protection layer, not the bitrate ladder.
- Enabling OSCE (Opus Speech Coding Enhancement, a separate libopus 1.5 neural post-processor that opusic-c exposes via an osce feature flag). Valuable, complementary, and free once opusic-c is in, but out of scope here to keep the PRD focused. Track as follow-up.
- Video, audio-over-MoQ, or any protocol-layer changes discussed in prior conversations.
- Touching the wzp-web / browser client. Browser Opus is a separate codepath via WebAudio / WASM libopus and is not affected by the native FFI swap.
Background
How the three protection mechanisms actually differ
| | Opus inband FEC (LBRR) | RaptorQ (wzp-fec) | DRED |
|---|---|---|---|
| Layer | codec-internal | application, across Opus packets | codec-internal |
| What it sends | low-bitrate copy of the previous frame, embedded in every packet | fountain-code repair symbols across a block | neural-coded history of the recent past |
| Protection horizon | 1 packet back | block duration (currently 100 ms, proposed 40 ms) | configurable, 0–1040 ms |
| Recovery granularity | 1 frame (lower quality) | 1 frame (bit-exact) | 10 ms frames (plausible reconstruction) |
| Latency cost | 0 ms | block duration on receive | 0 ms |
| Bitrate cost | ~20–40% of base | fec_ratio × base (currently +20% GOOD, +50% DEGRADED) | ~1 kbps flat |
| Effective loss tolerance | ~single-packet losses | up to (repair symbols / block) losses, cliff beyond | bursts up to the configured duration |
| Content assumption | any Opus audio | any | speech (DRED model is speech-trained) |
Why DRED dominates on the Opus tiers
Loss-scenario walkthrough (verified against opusic-c and libopus 1.5 docs):
- 1-frame loss (20 ms): RaptorQ recovers bit-exactly, DRED wouldn't run (classical Opus PLC is perceptually indistinguishable for single 20 ms frames). RaptorQ "wins" on paper but not on ears.
- 2–3 frame burst (40–60 ms): RaptorQ at current ratio 0.2 hits its tolerance cliff. DRED handles this trivially — well within a 200 ms window.
- 5–10 frame burst (100–200 ms): RaptorQ completely overwhelmed at any reasonable ratio. DRED's sweet spot.
- 10+ frame burst (>200 ms): RaptorQ useless. DRED at 500–1000 ms still recovers.
The only scenario where RaptorQ strictly beats DRED is bit-exact recovery of isolated single-frame losses — which is perceptually irrelevant for speech. In every other scenario DRED either ties or wins.
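The walkthrough above can be checked with a little arithmetic. The sketch below is an idealized model, not the wzp-fec implementation: it assumes one source symbol per 20 ms frame, `ceil(source × ratio)` repair symbols per block, a burst that falls inside a single block, and a fountain code that recovers whenever losses do not exceed the repair count. All function names are hypothetical.

```rust
/// Frame duration assumed throughout this sketch (20 ms Opus frames).
const FRAME_MS: u32 = 20;

/// Repair symbols generated for one block, assuming one source symbol per
/// frame and ceil(source * ratio) repair symbols (an assumption, not the
/// documented wzp-fec policy). The ratio is given in percent.
fn repair_symbols(block_ms: u32, ratio_percent: u32) -> u32 {
    let source = block_ms / FRAME_MS;
    (source * ratio_percent + 99) / 100
}

/// Idealized check: a burst inside one block is recoverable as long as the
/// number of lost frames does not exceed the number of repair symbols.
fn raptorq_recovers(block_ms: u32, ratio_percent: u32, burst_frames: u32) -> bool {
    burst_frames <= repair_symbols(block_ms, ratio_percent)
}

/// True if the whole burst fits inside the configured DRED history window.
fn dred_covers(dred_window_ms: u32, burst_frames: u32) -> bool {
    burst_frames * FRAME_MS <= dred_window_ms
}
```

Under these assumptions a 100 ms block at ratio 0.2 carries a single repair symbol, so a 2-frame burst already exceeds what RaptorQ can rebuild, while a 200 ms DRED window still covers bursts of up to 10 frames.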
Why Codec2 keeps RaptorQ
DRED lives inside libopus — it does not help Codec2 at all. Codec2's classical PLC is a parametric-vocoder interpolation that produces noticeably robotic artifacts on loss. On the Codec2 tiers, RaptorQ is the only protection we have, and it should stay at current ratios (1.0 on CATASTROPHIC, 0.5 on the Codec2 3200 tier).
The opusic-c / opusic-sys situation
- opusic-sys 0.6.0: FFI crate, published 2026-03-17, vendors libopus 1.5.2 via its bundled feature (on by default), documents Android NDK cross-compile via ANDROID_NDK_HOME (which our wzp-android/build.rs already sets). Exposes raw bindings to opus_dred_parse, opus_decoder_dred_decode, and the OpusDRED state struct.
- opusic-c 1.5.5: high-level safe wrapper. Its encoder side is fine: exposes Encoder::set_dred_duration(value: u8) -> Result<(), ErrorCode> with range 0..=104 (each unit is 10 ms, so 0–1040 ms configurable). Also exposes set_bitrate, set_inband_fec, set_dtx, set_packet_loss, set_signal, set_complexity, set_bandwidth, set_application on the encoder.
- opusic-c's decoder-side DRED wrapper is NOT sufficient for our architecture. Confirmed by reading the source of opusic-c/src/dred.rs:
  - Dred::decode_to ignores the dred_end output of opus_dred_parse (it is bound to an underscore-prefixed _dred_end), so the caller cannot know how much DRED history a given packet actually carried.
  - In opus_decoder_dred_decode(decoder, dred, dred_offset, pcm, frame_size), the wrapper passes frame_size to BOTH the dred_offset and frame_size arguments. This looks like a bug: it means reconstruction always starts at offset frame_size into the DRED window, not at an arbitrary caller-chosen offset. Arbitrary-gap reconstruction (which we need for the lookahead/backfill pattern) requires proper offset control.
  - DredPacket is owned internally by a Dred instance; its internal buffer is overwritten on every decode_to call. We cannot hold a ring of parsed DredPackets from multiple recent arrivals, which is exactly what the lookahead/backfill jitter buffer pattern requires.
- Decision: use opusic-c for the encoder path (its wrapper is correct and saves work), and drop to opusic-sys raw FFI for the entire decoder path AND the DRED reconstruction path. Both use a single shared DecoderHandle so internal decoder state stays consistent. Verified at pre-flight: opusic_c::Decoder.inner is pub(crate), so there is no way to reach the raw *mut OpusDecoder from outside opusic-c. Running two parallel decoders (one from opusic-c for audio, one from opusic-sys for DRED) would cause state drift because the DRED-only decoder wouldn't see the normal decode calls. Single unified decoder via opusic-sys is the only correct architecture.
- Three FFI handles required per decode session: opusic_c::Encoder (encoder side, unchanged), our own DecoderHandle wrapping *mut OpusDecoder from opusic-sys (for normal decode AND for the OpusDecoder pointer passed to opus_decoder_dred_decode), and a new DredDecoderHandle wrapping *mut OpusDREDDecoder from opusic-sys (passed to opus_dred_parse). Note: OpusDREDDecoder is a separate struct from OpusDecoder in libopus 1.5 (verified from opus.h). Allocation via opus_dred_decoder_create() (confirm exact symbol name at Phase 3a start).
- The opus crate from SpaceManiac (0.3.1, published 2026-01-03) is a trap: it depends on audiopus_sys ^0.2.0, the same dead FFI crate we're trying to get away from. Do not use.
- Follow-up (out of scope for this PRD): upstream the fixes to opusic-c/src/dred.rs (preserve dred_end, fix the dred_offset double-pass, expose DredPacket externally). Worth a GitHub PR once our own implementation has proven correct. Would let us eventually delete our internal FFI wrapper.
Critical note from opusic-c docs
From the dred module documentation: "The documentation recommends disabling in-band FEC and using Application::Voip for optimal results." This applies to the codec-internal Opus inband FEC (LBRR), not our application-level RaptorQ. The two are independent layers. This PRD disables both on Opus tiers, but for different reasons — inband FEC per upstream recommendation, RaptorQ per the analysis above.
The libopus 1.5 loss-percentage gating quirk
In libopus 1.5, both inband FEC and DRED are gated on OPUS_SET_PACKET_LOSS_PERC being non-zero. If the encoder thinks loss is 0%, it will not emit DRED data even when set_dred_duration is configured. We must plumb a meaningful loss percentage into the encoder continuously, floored at a small non-zero value so DRED stays active even when the network is perfect. Planned floor: 5%, overridden upward by the real QualityReport loss value when it exceeds the floor.
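A minimal sketch of the loss floor described above. It assumes the QualityReport loss arrives as a fraction in 0.0..=1.0 (the source does not state the unit); `encoder_loss_pct` and `LOSS_FLOOR_PCT` are hypothetical names, and the result is what would be handed to the encoder's packet-loss setter.

```rust
/// Planned floor from the text: keep DRED active even on a clean network.
const LOSS_FLOOR_PCT: u8 = 5;

/// Convert a measured loss fraction (assumed 0.0..=1.0) into the percentage
/// handed to the encoder, never below the floor and never above 100.
fn encoder_loss_pct(measured_loss_fraction: f32) -> u8 {
    // NaN, zero, and negative inputs all fall back to the floor.
    if !(measured_loss_fraction > 0.0) {
        return LOSS_FLOOR_PCT;
    }
    let pct = (measured_loss_fraction * 100.0).round().min(100.0) as u8;
    pct.max(LOSS_FLOOR_PCT)
}
```

Clamping in one place keeps the "DRED is silently off because loss was reported as 0%" failure mode out of every call site.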
Solution
High-level architecture change
Before (per Opus frame encode path):
PCM → AdaptiveEncoder.encode (Opus)
→ inband FEC embedded in packet
→ wzp-fec FEC encoder (accumulate into block, generate repair symbols)
→ DATAGRAM out
Before (per Opus frame decode path):
DATAGRAM in → wzp-fec block assembly (wait for block, recover if possible)
→ AdaptiveDecoder.decode (Opus) / decode_lost (classical PLC)
→ PCM
After (Opus tiers):
PCM → OpusEncoder.encode (opusic-c, DRED enabled via set_dred_duration, inband FEC off)
→ DATAGRAM out directly (no RaptorQ block)
DATAGRAM in → jitter buffer (lookahead/backfill)
→ on frame arrival: OpusDecoder.decode
→ on detected gap: if next packet has DRED state → dred::Dred.reconstruct(gap)
else → OpusDecoder.decode_lost (classical PLC)
→ PCM
After (Codec2 tiers): unchanged. RaptorQ block encoding + classical Codec2 decode path stay exactly as they are today.
New per-profile protection matrix
| Profile | Codec | Inband FEC | RaptorQ ratio | DRED duration | Total overhead |
|---|---|---|---|---|---|
| STUDIO_64K | Opus 64k | off | none | 10 frames (100 ms) | +1 kbps |
| STUDIO_48K | Opus 48k | off | none | 10 frames (100 ms) | +1 kbps |
| STUDIO_32K | Opus 32k | off | none | 10 frames (100 ms) | +1 kbps |
| GOOD | Opus 24k | off | none | 20 frames (200 ms) | +1 kbps |
| NORMAL_16K | Opus 16k | off | none | 20 frames (200 ms) | +1 kbps |
| DEGRADED | Opus 6k | off | none | 50 frames (500 ms) | +1 kbps |
| CODEC2_3200 | Codec2 3200 | N/A | 0.5 (unchanged) | N/A | +50% |
| CATASTROPHIC | Codec2 1200 | N/A | 1.0 (unchanged) | N/A | +100% |
| COMFORT_NOISE | CN | N/A | N/A | N/A | N/A |
DRED duration rationale:
- Studio tiers (100 ms): loss is rare on the networks where users pick studio quality. Short DRED window keeps decode-side CPU modest. Still covers multi-frame bursts that classical PLC can't touch.
- Normal tiers (200 ms): balanced baseline. Handles the common VoIP loss pattern (20–150 ms bursts from wifi roam, transient congestion).
- Degraded tier (500 ms): users on Opus 6k are by definition on a bad link. Long DRED window buys maximum burst resilience where it matters most. Still well under the 1040 ms cap.
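The matrix and rationale above could be encoded as a single lookup. This is a sketch under assumptions: the `CodecId` enum here is a stand-in with made-up variant names, not the project's real type, and returning 0 for Codec2 is an illustrative choice meaning "DRED off".

```rust
/// Hypothetical stand-in for the project's codec identifier; the variant
/// names mirror the profile table but are assumptions.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum CodecId {
    Opus64k,
    Opus48k,
    Opus32k,
    Opus24k,
    Opus16k,
    Opus6k,
    Codec2At3200,
    Codec2At1200,
}

/// DRED duration in 10 ms units (the unit the encoder setter expects,
/// range 0..=104). Returns 0 for codecs outside libopus.
fn dred_duration_for(codec: CodecId) -> u8 {
    match codec {
        CodecId::Opus64k | CodecId::Opus48k | CodecId::Opus32k => 10,
        CodecId::Opus24k | CodecId::Opus16k => 20,
        CodecId::Opus6k => 50,
        CodecId::Codec2At3200 | CodecId::Codec2At1200 => 0,
    }
}

/// Same value expressed in milliseconds, for logs and debug reports.
fn dred_window_ms(codec: CodecId) -> u32 {
    u32::from(dred_duration_for(codec)) * 10
}
```

Keeping the table in one exhaustive match means a newly added codec variant fails to compile until someone decides its DRED duration.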
Runtime escape hatch
Ship with a single environment variable / settings flag: AUDIO_USE_LEGACY_FEC. When set, the entire Opus-tier path reverts to the pre-PRD behavior: RaptorQ re-enabled at the old ratios, Opus inband FEC re-enabled, DRED disabled (set_dred_duration(0)). This is the rollback safety valve for the first production window.
Escape hatch semantics:
- Read once at CallEncoder::new / CallDecoder::new time. Call-scoped, not re-read mid-call.
- Exposed via Android Settings UI as a hidden "Legacy FEC (debug)" toggle, and as a CLI flag --legacy-fec on the desktop client.
- Logged in DebugReporter so we can tell which mode a call was in when diagnosing.
- Removed entirely after 2 months of stable production with no regressions reported. Removal is a follow-up PR, not part of this PRD's scope.
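The "read once, call-scoped" semantics might look like the sketch below. Which string values count as "set" is an assumption (the text only says "when set"), and `ProtectionMode`, `mode_from_flag`, and `CallConfig` are hypothetical names rather than existing WZP types.

```rust
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum ProtectionMode {
    Dred,
    LegacyFec,
}

/// Interpret the raw value of AUDIO_USE_LEGACY_FEC. Treating "1" and "true"
/// as enabled is an assumption made for this sketch.
fn mode_from_flag(raw: Option<&str>) -> ProtectionMode {
    match raw.map(str::trim) {
        Some("1") | Some("true") | Some("TRUE") => ProtectionMode::LegacyFec,
        _ => ProtectionMode::Dred,
    }
}

/// Read the environment exactly once; callers keep the result for the call.
fn mode_from_env() -> ProtectionMode {
    mode_from_flag(std::env::var("AUDIO_USE_LEGACY_FEC").ok().as_deref())
}

/// Hypothetical call-scoped configuration captured at construction time.
struct CallConfig {
    mode: ProtectionMode,
}

impl CallConfig {
    fn new() -> Self {
        CallConfig { mode: mode_from_env() }
    }

    /// DRED duration to apply: legacy mode forces 0, which disables DRED.
    fn dred_duration(&self, normal: u8) -> u8 {
        match self.mode {
            ProtectionMode::Dred => normal,
            ProtectionMode::LegacyFec => 0,
        }
    }
}
```

Separating the pure parser from the environment read keeps the decision unit-testable without mutating process-wide state.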
Detailed design
Phase 0 — FFI crate swap (prerequisite, no behavior change)
Files touched:
- Cargo.toml (workspace root): replace audiopus = "0.3.0-rc.0" with opusic-c = { version = "1.5.5", features = ["bundled", "dred"] } and opusic-sys = { version = "0.6.0", features = ["bundled"] }. The opusic-sys direct dep is for the DRED decoder path below.
- crates/wzp-codec/Cargo.toml: update audiopus = { workspace = true } to opusic-c = { workspace = true }, add opusic-sys = { workspace = true }, add bytemuck = "1" for the i16↔u16 slice cast.
- crates/wzp-codec/src/opus_enc.rs: rewrite against opusic-c. API mapping:
  - audiopus::coder::Encoder::new(SampleRate::Hz48000, Channels::Mono, Application::Voip) → opusic_c::Encoder::new(Channels::Mono, SampleRate::Hz48000, Application::Voip) (argument order swapped)
  - set_bitrate(Bitrate::BitsPerSecond(bps)) → set_bitrate(Bitrate::Bits(bps)) or equivalent variant (verify at implementation time)
  - set_inband_fec(true/false) → set_inband_fec(InbandFec::On/Off) (now an enum)
  - set_packet_loss_perc(u8) → set_packet_loss(u8) (method renamed)
  - set_dtx(bool), set_signal(Signal::Voice), set_complexity(u8): names match
  - encode(&[i16], &mut [u8]) → encode_to_slice(&[u16], &mut [u8]) with bytemuck::cast_slice::<i16, u16>(pcm) at the call site
- crates/wzp-codec/src/opus_dec.rs: same-style rewrite for the Decoder path. Note that opusic-c's decoder methods take decode_fec: bool as a parameter directly (not a separate ctl).
- vendor/audiopus_sys/: delete the directory (only exists on feat/desktop-audio-rewrite, not on android-rewrite, so this is a no-op on the current branch but do remove the [patch.crates-io] block from Cargo.toml when merging back).
Acceptance criteria:
- cargo check --workspace passes on Linux x86_64, macOS, and Android NDK cross-compile.
- All existing codec unit tests in crates/wzp-codec/src/adaptive.rs pass unchanged. DRED is still disabled at this phase (default set_dred_duration(0)), so behavior is equivalent to pre-swap libopus 1.3 for call quality purposes.
- A short real-call smoke test produces audio identical to current behavior (no audible regression).
- opusic_c::version() at startup logs a libopus version containing 1.5.2: a hard signal that the swap landed correctly.
Phase 1 — DRED encoder enable on all Opus profiles
Files touched:
- crates/wzp-codec/src/opus_enc.rs:
  - Add fn dred_duration_for(codec: CodecId) -> u8 returning the per-profile value from the matrix above (10 / 20 / 50 frames).
  - In OpusEncoder::new, after the existing set_bitrate / set_signal / set_complexity block: call inner.set_inband_fec(InbandFec::Off), then inner.set_dred_duration(dred_duration_for(profile.codec)), then inner.set_packet_loss(5) as the default floor.
  - Add pub fn set_dred_duration(&mut self, frames: u8) to allow the adaptive ladder to update DRED duration on profile switch.
  - In the existing set_profile impl, call set_dred_duration(dred_duration_for(profile.codec)) after apply_bitrate.
- crates/wzp-codec/src/adaptive.rs: AdaptiveEncoder::set_profile already delegates to self.opus.set_profile, so no changes are needed. DRED update rides along.
- crates/wzp-client/src/call.rs (and equivalent on wzp-android/src/pipeline.rs):
  - In the QualityReport handler (wherever we currently call set_expected_loss / set_packet_loss_perc), also ensure the loss value is floored at 5% before passing to the Opus encoder. This is a 1-line change.
Acceptance criteria:
- Encoder produces DRED-enabled Opus packets. Verifiable via libopus's reference decoder in debug mode, or by wire capture + inspection: a DRED-bearing Opus packet has a larger opus_packet_get_nb_frames footprint than a non-DRED one of the same nominal bitrate.
- Total outgoing bitrate on Opus 24k is ~25 kbps (up from ~24 kbps), which confirms ~1 kbps DRED overhead.
- On a lossless path, decoder output is audibly identical to Phase 0.
- Escape hatch AUDIO_USE_LEGACY_FEC=1 cleanly reverts the DRED enable (calls set_dred_duration(0) and set_inband_fec(InbandFec::On) instead).
Phase 2 — RaptorQ removal on Opus tiers
Files touched:
- crates/wzp-client/src/call.rs:
  - In CallEncoder::encode_frame (or wherever wzp_fec::Encoder::add_source_symbol is called), gate the RaptorQ path on !profile.codec.is_opus(): Opus frames go straight to DATAGRAM emit, Codec2 frames continue through RaptorQ.
  - When a profile switch crosses the Opus↔Codec2 boundary, flush/reset the RaptorQ encoder state.
- crates/wzp-android/src/pipeline.rs:
  - Mirror the same gate in the Android encode path.
- crates/wzp-proto/src/packet.rs:
  - MediaHeader.fec_block and fec_symbol are still valid fields on the wire. For Opus packets we emit fec_block = 0, fec_symbol = 0, fec_ratio_encoded = 0. No wire format change; the receiver just sees all-zeros in the FEC fields for Opus packets and skips the FEC decoder path.
  - Bump protocol version to v1 → v2? No. The change is semantically backward compatible because existing RaptorQ decoders handle a zero ratio correctly (ratio 0.0 means "no repair symbols expected"). Old receivers can still decode new Opus packets; they just won't see any DRED benefit because their libopus is old. This is a property we want: the opposite (new receiver, old sender) is the more common mixed-version case during rollout and also Just Works.
- crates/wzp-client/src/call.rs, CallDecoder:
  - Symmetric change: Opus frames bypass the RaptorQ block assembly, go straight to the decoder. Only Codec2 frames (codec_id.is_codec2()) feed through wzp-fec block decoding.
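The Opus/Codec2 gate from the list above, sketched as pure routing logic. Everything here is illustrative: `CodecId` is a reduced stand-in, `FecFields` holds only the three header fields named in the text with assumed integer widths, and `route_outgoing` / `needs_fec_reset` are hypothetical helpers.

```rust
/// Reduced stand-in for the project's codec identifier (assumed names).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum CodecId {
    Opus24k,
    Opus6k,
    Codec2At3200,
    Codec2At1200,
}

impl CodecId {
    fn is_opus(self) -> bool {
        matches!(self, CodecId::Opus24k | CodecId::Opus6k)
    }
    fn is_codec2(self) -> bool {
        !self.is_opus()
    }
}

/// Only the FEC-related header fields named in the text; the integer
/// widths are assumptions. Default is all zeros.
#[derive(Debug, Default, PartialEq, Eq)]
struct FecFields {
    fec_block: u8,
    fec_symbol: u8,
    fec_ratio_encoded: u8,
}

#[derive(Debug, PartialEq, Eq)]
enum SendPath {
    /// Opus: emit the datagram directly with zeroed FEC fields.
    Direct(FecFields),
    /// Codec2: keep accumulating into a RaptorQ block as today.
    ThroughRaptorQ,
}

fn route_outgoing(codec: CodecId) -> SendPath {
    if codec.is_opus() {
        SendPath::Direct(FecFields::default())
    } else {
        SendPath::ThroughRaptorQ
    }
}

/// True when a profile switch crosses the Opus/Codec2 boundary, which is
/// when the RaptorQ encoder state should be flushed or reset.
fn needs_fec_reset(old: CodecId, new: CodecId) -> bool {
    old.is_opus() != new.is_opus()
}
```

The receive side would apply the mirror-image test with `is_codec2()` before feeding a packet into block assembly.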
Acceptance criteria:
- Outgoing Opus packets have fec_ratio_encoded == 0 (verifiable with the existing wire capture tooling in wzp-client/src/echo_test.rs).
- On a clean network, receiver latency (measured as encode-to-playout one-way delay) drops by ~40 ms versus Phase 1. This is the primary win and should be directly measurable with the existing telemetry.
- Codec2 calls show no latency change and no packet-format change. Regression-test Codec2 3200 and Codec2 1200 specifically.
- Total outgoing bitrate on Opus 24k drops from ~28.8 kbps (24k base + 0.2 RaptorQ ratio) to ~25 kbps (24k base + ~1 kbps DRED). Direct savings observable in network telemetry.
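The bitrate figures in the last criterion follow from simple arithmetic, shown here in integer bits per second to avoid floating-point rounding. The helper names are hypothetical, and the flat 1,000 bps DRED figure is the approximate overhead quoted in this PRD rather than a measured value.

```rust
/// Wire bitrate with RaptorQ: base plus ratio-proportional repair overhead.
/// The ratio is given in percent (0.2 -> 20).
fn bitrate_with_raptorq_bps(base_bps: u32, ratio_percent: u32) -> u32 {
    base_bps + base_bps * ratio_percent / 100
}

/// Wire bitrate with DRED: base plus the roughly flat ~1 kbps cited above.
fn bitrate_with_dred_bps(base_bps: u32) -> u32 {
    base_bps + 1_000
}
```

For the Opus 24k profile this gives 28,800 bps before and 25,000 bps after, a saving of about 3.8 kbps per direction.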
Phase 3 — DRED reconstruction wrapper + jitter buffer lookahead/backfill refactor
This phase is larger than originally estimated because opusic-c's decoder-side DRED wrapper is unusable for our architecture (see Background). We write our own safe wrapper over opusic-sys raw FFI first, then plumb it through the jitter buffer.
Step 3a — Safe DRED reconstruction wrapper in wzp-codec:
New file crates/wzp-codec/src/dred_ffi.rs. Wraps the raw libopus 1.5 DRED API:
- pub struct DredState: owns an OpusDRED buffer (allocated via opusic_sys::opus_dred_alloc or equivalent; size is fixed at 10,592 bytes per libopus 1.5). Clone is intentionally NOT implemented: the state is heap-owned and non-trivial to copy.
- pub fn parse_from_packet(&mut self, decoder: &opusic_c::Decoder, packet: &[u8], max_dred_samples: i32) -> Result<DredParseResult, DredError>: wraps opus_dred_parse, preserves the dred_end output (number of samples of history the packet carried), returns it in DredParseResult { samples_available: i32, frames_available: u8 }.
- pub fn reconstruct_into(&self, decoder: &mut opusic_c::Decoder, dred_offset_samples: i32, output: &mut [i16]) -> Result<usize, DredError>: wraps opus_decoder_dred_decode, takes the offset explicitly, decodes output.len() samples starting from that offset in the DRED window.
- All unsafe contained here, strict bounds checking on offsets, Rust-level panic safety. Unit tests use a reference encoder + known-good reference decoder to verify that reconstruction at specific offsets produces expected output.
- Depends on opusic-sys directly and on opusic-c::Decoder for the decoder handle. The Decoder handle must be reachable as a raw pointer; opusic-c exposes this via an unstable internal or we wrap the pointer ourselves. Verify at implementation time: if opusic-c doesn't expose the raw decoder pointer safely, we create our own thin Decoder wrapper in dred_ffi.rs using raw opusic-sys, losing the convenience of opusic-c's decoder but keeping its encoder. This is the smaller-risk fallback.
New pub trait DredReconstructor in wzp-codec/src/lib.rs:
pub trait DredReconstructor: Send {
/// Parse DRED state from an arriving Opus packet into `state`.
/// Returns number of 48 kHz samples of history available, or 0 if the packet has no DRED.
fn parse(&mut self, state: &mut DredState, packet: &[u8]) -> Result<i32, DredError>;
/// Reconstruct `output.len()` samples from `state`, starting at the given
/// sample offset (measured from the end of the DRED window going backward).
fn reconstruct(&mut self, state: &DredState, offset_samples: i32, output: &mut [i16]) -> Result<usize, DredError>;
}
Implement DredReconstructor over the dred_ffi::DredState + opusic-c Decoder combination. This is the clean boundary the jitter buffer will talk to.
Step 3b — Jitter buffer refactor in crates/wzp-transport/src/jitter.rs:
- Current behavior: buffer waits a fixed number of frames of jitter before emitting; on a missing slot, after a timeout it gives up and signals the decoder to run decode_lost() (classical Opus PLC or Codec2 PLC).
- New behavior on Opus tiers: when a frame arrives (in-order or late), first call DredReconstructor::parse on it to update a rolling ring of DredState instances tagged with their originating sequence number. When a gap is detected (missing sequence number between last-emitted and current arrival), and the ring contains a DredState from a nearby packet that covers the gap's sample offset, call DredReconstructor::reconstruct with the correct offset to synthesize the missing frames, splice them into playout, then continue normal decode.
- If no DRED state covers the gap (e.g., gap too far back, or every nearby packet was dropped), fall through to classical PLC exactly as today. The classical path stays intact as the ultimate fallback.
- Codec2 packets bypass the entire DRED ring. They are not inspected for DRED state and take the unchanged classical PLC path.
- Ring sizing: max_dred_duration_frames + jitter_depth_frames worth of DredState instances. At 500 ms DRED on degraded tier + 60 ms jitter depth, that's ~28 DredState instances × 10,592 bytes ≈ 300 KB. Acceptable. On studio tier with 100 ms DRED it's only ~80 KB.
- The jitter buffer takes a Box<dyn DredReconstructor> at construction, passed in by the call engine. wzp-transport does NOT take a direct dep on opusic-c or opusic-sys; it only knows about the trait defined in wzp-codec.
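The ring sizing and gap lookup described above can be sketched without any FFI. This models only the bookkeeping: `RingEntry` stores metadata where the real ring would also own a DredState, sequence-number wraparound is ignored, and the offset convention (samples before the start of the covering packet's own audio) is an assumption to confirm against opus.h. `DredRing`, `RingEntry`, and `find_cover` are hypothetical names.

```rust
use std::collections::VecDeque;

const SAMPLES_PER_FRAME: i32 = 960; // 20 ms at 48 kHz
const DRED_STATE_BYTES: usize = 10_592; // per-state size quoted in the text

/// One slot per 20 ms packet across the DRED window plus jitter depth.
fn ring_slots(dred_ms: u32, jitter_ms: u32) -> usize {
    ((dred_ms + jitter_ms) / 20) as usize
}

fn ring_bytes(dred_ms: u32, jitter_ms: u32) -> usize {
    ring_slots(dred_ms, jitter_ms) * DRED_STATE_BYTES
}

/// Metadata for one parsed packet (the real entry would also hold its state).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
struct RingEntry {
    seq: u32,
    samples_available: i32,
}

struct DredRing {
    entries: VecDeque<RingEntry>,
    capacity: usize, // assumed to be at least 1
}

impl DredRing {
    fn new(capacity: usize) -> Self {
        DredRing { entries: VecDeque::with_capacity(capacity), capacity }
    }

    /// Amortized O(1) insert on packet arrival; evicts the oldest entry when full.
    fn push(&mut self, entry: RingEntry) {
        if self.entries.len() == self.capacity {
            self.entries.pop_front();
        }
        self.entries.push_back(entry);
    }

    /// Find a later packet whose DRED history reaches back to `missing_seq`.
    /// Returns (covering seq, offset in samples), preferring the nearest packet.
    fn find_cover(&self, missing_seq: u32) -> Option<(u32, i32)> {
        self.entries
            .iter()
            .filter(|e| e.seq > missing_seq)
            .filter_map(|e| {
                let offset = (e.seq - missing_seq) as i32 * SAMPLES_PER_FRAME;
                (offset <= e.samples_available).then_some((e.seq, offset))
            })
            .min_by_key(|&(_, offset)| offset)
    }
}
```

When `find_cover` returns `None`, the jitter buffer would fall through to classical PLC; when it returns an offset, that offset is what would be passed to `DredReconstructor::reconstruct`.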
Files touched:
- crates/wzp-codec/src/dred_ffi.rs (new, ~150–300 lines)
- crates/wzp-codec/src/lib.rs: expose DredReconstructor, DredState, DredError types
- crates/wzp-codec/Cargo.toml: add opusic-sys = { workspace = true } as a direct dep (already done in Phase 0)
- crates/wzp-transport/src/jitter.rs: lookahead/backfill refactor, DRED ring
- crates/wzp-transport/Cargo.toml: add wzp-codec = { workspace = true } (likely already present) for the trait import
- crates/wzp-client/src/call.rs: construct a DredReconstructor and pass into CallDecoder's jitter buffer
- crates/wzp-android/src/pipeline.rs: same on Android
Acceptance criteria:
- Unit tests in dred_ffi.rs: round-trip a known speech waveform through an encoder with DRED enabled, parse the resulting packets, reconstruct at several different offsets, verify the reconstructed samples are within an energy/spectral threshold of the original. (Not bit-exact: DRED reconstruction is lossy by design.)
- Synthetic loss test on the full pipeline: inject 200 ms bursts at 10% rate into a looped call, verify the DRED reconstruction rate on receiver telemetry is ≥95% of all loss events whose gaps fall within the configured DRED duration window.
- Reconstructed audio is audibly continuous on 40–200 ms bursts: no gaps, no classical-PLC robot artifact. Verified on real voice samples (not just sine tones), and on at least two distinct speaker profiles (male, female) because DRED can have voice-dependent quality.
- End-to-end latency metric is unchanged versus Phase 2 (no regression from adding the lookahead path). The DRED ring insertion on packet arrival must be O(1) in practice.
- Existing echo_test.rs and drift_test.rs pass with the new jitter buffer.
- Codec2 path uses classical PLC exclusively (no DRED invocation) because Codec2 packets don't carry DRED state. Verify by injecting loss on a Codec2 call and confirming zero DRED reconstruction telemetry events during that call.
- wzp-transport has no direct dependency on opusic-sys or opusic-c in its Cargo.toml after the refactor, only on wzp-codec. Verify by grepping the Cargo.toml file.
Phase 4 — Telemetry and tooling updates
Files touched:
- crates/wzp-proto/src/packet.rs: QualityReport or equivalent telemetry message gains dred_reconstructions: u32 as a new counter (frames reconstructed via DRED this reporting window) and classical_plc_invocations: u32 (frames filled by Opus/Codec2 classical PLC). These are separate counters because they're different recovery mechanisms.
- crates/wzp-relay/src/*: relay telemetry pipeline surfaces both counters in Prometheus metrics: wzp_dred_reconstructions_total{call_id}, wzp_classical_plc_total{call_id}.
- docs/grafana-dashboard.json: new panel "Loss recovery breakdown", a stacked bar of DRED vs classical PLC vs clean decode, per call.
- android/app/src/main/java/com/wzp/debug/DebugReporter.kt: surfaces dredReconstructions and classicalPlc counts in the debug report; also logs active DRED duration and whether legacy-FEC mode is engaged.
Acceptance criteria:
- Grafana dashboard shows a clear visual distinction between DRED-recovered and classical-PLC-recovered frames across a test fleet of calls.
- Debug report includes the active protection mode ("DRED 200 ms" / "Legacy RaptorQ") and reconstruction counts, so incidents can be classified unambiguously.
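A sketch of how the two counters and the debug label might be kept on the receiver. The field names follow the ones proposed above; `FrameOutcome`, `RecoveryCounters`, `take_window`, and `protection_label` are hypothetical, and resetting per reporting window is an assumption drawn from the "this reporting window" wording.

```rust
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum FrameOutcome {
    CleanDecode,
    DredReconstructed,
    ClassicalPlc,
}

/// Per-window counters matching the proposed telemetry fields.
#[derive(Debug, Default, PartialEq, Eq)]
struct RecoveryCounters {
    dred_reconstructions: u32,
    classical_plc_invocations: u32,
}

impl RecoveryCounters {
    /// Count one played-out frame; clean decodes touch neither counter.
    fn record(&mut self, outcome: FrameOutcome) {
        match outcome {
            FrameOutcome::CleanDecode => {}
            FrameOutcome::DredReconstructed => {
                self.dred_reconstructions = self.dred_reconstructions.saturating_add(1);
            }
            FrameOutcome::ClassicalPlc => {
                self.classical_plc_invocations =
                    self.classical_plc_invocations.saturating_add(1);
            }
        }
    }

    /// Hand back this window's counts and reset for the next window.
    fn take_window(&mut self) -> RecoveryCounters {
        std::mem::take(self)
    }
}

/// Label for the debug report, using the two example strings from the text.
fn protection_label(legacy_fec: bool, dred_ms: u32) -> String {
    if legacy_fec {
        "Legacy RaptorQ".to_string()
    } else {
        format!("DRED {} ms", dred_ms)
    }
}
```

Keeping the counters separate at the source is what lets the Grafana panel stack DRED-recovered frames against classical-PLC frames without post-hoc inference.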
Phase 5 — Escape hatch removal (follow-up, ~2 months post-ship)
After 2 months of stable production with no rollbacks triggered:
- Delete AUDIO_USE_LEGACY_FEC handling in opus_enc.rs / call.rs / pipeline.rs
- Delete the Opus-tier paths of wzp-fec (the crate stays for Codec2)
- Delete the Android settings toggle and desktop CLI flag
- Remove the --legacy-fec path from smoke tests
Critical files to modify (summary)
- Cargo.toml (workspace): dep swap (audiopus → opusic-c + opusic-sys)
- crates/wzp-codec/Cargo.toml: dep swap + bytemuck for slice cast
- crates/wzp-codec/src/opus_enc.rs: opusic-c rewrite + DRED enable + inband FEC off
- crates/wzp-codec/src/opus_dec.rs: opusic-c rewrite
- crates/wzp-codec/src/dred_ffi.rs: new file, safe wrapper over opusic-sys raw DRED FFI
- crates/wzp-codec/src/lib.rs: expose DredReconstructor trait, DredState, DredError
- crates/wzp-codec/src/adaptive.rs: verify profile switch carries DRED duration
- crates/wzp-client/src/call.rs: Opus/Codec2 gate on RaptorQ path, loss floor, wire DredReconstructor into CallDecoder
- crates/wzp-android/src/pipeline.rs: same gate, same loss floor, wire DredReconstructor
- crates/wzp-transport/src/jitter.rs: lookahead/backfill refactor, DRED ring, reconstruction dispatch
- crates/wzp-transport/Cargo.toml: verify it depends only on wzp-codec, not directly on opusic-*
- crates/wzp-proto/src/packet.rs: new telemetry counters
- crates/wzp-relay/: Prometheus metric exposure
- android/app/src/main/java/com/wzp/debug/DebugReporter.kt: debug output
- docs/grafana-dashboard.json: loss-recovery panel
- (delete) vendor/audiopus_sys/ on feat/desktop-audio-rewrite when merging back
Existing utilities to reuse
- wzp_codec::resample::Downsampler48to8 / Upsampler8to48: unchanged, only the Codec2 path uses them
- wzp_codec::adaptive::AdaptiveEncoder / AdaptiveDecoder: existing profile-switching machinery, DRED duration changes ride along
- wzp_codec::silence::SilenceDetector / ComfortNoise: unchanged
- wzp_codec::agc::AutoGainControl: unchanged, runs before encode as today
- wzp_fec::RaptorQFecEncoder / decoder: unchanged, still used for Codec2 tiers
- wzp_client::call::QualityAdapter: unchanged; drives profile switching, which now also reconfigures DRED duration via the existing set_profile path
Verification
End-to-end testing, in order:
- Unit: cargo test -p wzp-codec. Opus encode/decode round-trip at every profile, DRED enabled. Verify version() reports libopus 1.5.2.
- Unit: cargo test -p wzp-transport. Jitter buffer lookahead/backfill behavior with injected loss patterns (0%, 5%, 15%, 30%, 50% loss; isolated losses, 40 ms bursts, 200 ms bursts, 500 ms bursts).
- Integration: crates/wzp-client/src/echo_test.rs. Existing echo test must pass on all Opus profiles with <5% perceived quality regression (measure via the time-window analysis already built into echo_test.rs).
- Integration: crates/wzp-client/src/drift_test.rs. Latency measurement. Must show ~40 ms reduction on Opus profiles versus pre-PRD baseline. Codec2 profiles unchanged.
- Manual: Android release build, real call over bad wifi (or a shaped network via tc netem on Linux). Burst losses of 200 ms should be perceptually continuous speech, not robotic gaps.
- Manual: Same call with AUDIO_USE_LEGACY_FEC=1. Verify behavior reverts to current production behavior. This is the pre-ship rollback rehearsal.
- Cross-compile: full build matrix. Android arm64-v8a + armeabi-v7a (via scripts/build-and-notify.sh), macOS universal, Linux x86_64 (via scripts/build-linux-docker.sh). Windows cross-compile via cargo-xwin should also pass: libopus 1.5 upstream fixed the clang-cl SIMD issue that required the vendor patch on feat/desktop-audio-rewrite.
- Telemetry smoke: deploy to staging relay, make 10 test calls, verify Grafana's new "Loss recovery breakdown" panel shows DRED reconstruction events firing on injected loss and classical-PLC on packet-loss beyond DRED's window.
Risks and mitigations
- Custom DRED FFI wrapper is WZP-maintained code with no second source. opusic-c's decoder-side DRED wrapper is insufficient (see Background), so we carry our own dred_ffi.rs that calls opus_dred_parse and opus_decoder_dred_decode directly via opusic-sys. Bugs in this wrapper (offset arithmetic off-by-ones, lifetime errors on OpusDRED buffers, UB from misuse of the C API) could manifest as silent audio corruption on loss bursts, hard to diagnose. Mitigation: extensive unit tests in dred_ffi.rs using a reference encoder + reference decoder round-trip with known offsets; strict bounds checking on every unsafe boundary; Miri run in CI if feasible; the legacy-FEC escape hatch disables the entire DRED code path including our custom wrapper, giving us a single flag to revert any wrapper bug in production. Long-term: upstream the fixes to opusic-c (follow-up task, not blocking).
- opusic-c's encoder-side API and internal Decoder pointer access. Step 3a depends on being able to call opusic-sys raw functions that take an *mut OpusDecoder pointer while still using opusic-c's Decoder for normal decode. If opusic-c doesn't expose the raw pointer cleanly, we fall back to a thin opusic-sys-direct Decoder wrapper inside dred_ffi.rs and lose some of opusic-c's convenience. Mitigation: verify at the start of Phase 3 (one afternoon of reading opusic-c source). If the clean path doesn't work, the fallback is not difficult: it's what we'd have built anyway if opusic-c didn't exist.
- DRED reconstruction quality varies by voice / content. The neural model is trained on speech; edge cases (shouting, whispering, heavy accents, music-on-hold, cough, laughter) may reconstruct less cleanly than continuous speech. Mitigation: escape hatch ships from day one. If production telemetry shows perceptible quality regression on specific voice patterns, flip legacy mode for affected users while tuning. Also: classical Opus PLC remains as the third-tier fallback when DRED state is unavailable.
- Removing RaptorQ removes bit-exact recovery. Isolated single-packet losses are now reconstructed plausibly instead of bit-exactly. Mitigation: as argued in Background, bit-exactness on a single 20 ms speech frame is perceptually meaningless. The assumption is "speech is the workload" — if we ever add non-speech features (music bot, ringtones over the call path, DTMF-over-audio) we revisit.
- libopus 1.5 DRED API stability. Verified at pre-flight: opus.h in the upstream xiph/opus repository has no "experimental" marker on the DRED API declarations. The earlier characterization was incorrect. DRED shipped as a first-class feature in libopus 1.5.0 (Dec 2023) and has been iterated in 1.5.1 and 1.5.2. Google Meet and Duo ship it at scale. Mitigation: pin
opusic-sysexactly (no^range) to ensure reproducible builds, follow upstream 1.5.x bugfixes as they land. No special stability concerns beyond normal dependency hygiene. - Jitter buffer refactor is the largest code change. Jitter bugs are notoriously subtle (off-by-one on sequence wraparound, clock drift interactions, playout starvation corner cases). Mitigation: keep the classical-PLC path intact as the DRED fallback, so jitter bugs degrade to "current behavior" rather than "broken audio". Write targeted unit tests for the buffer at each loss-pattern scenario before touching production paths. Consider shipping Phase 3 behind a sub-flag separate from the main escape hatch, so we can independently toggle "DRED enabled but classical jitter buffer" for bisection.
- Cross-compile surprises.
opusic-sysis actively maintained but our exact combination of Android NDK version / Docker builder environment / Windows cross-compile via cargo-xwin has not been tested by upstream. Mitigation: Phase 0 includes the full cross-compile matrix as an acceptance criterion. Any blockers surface before we touch loss-recovery behavior. - Wire-format compatibility during rollout. Mixed-version calls (new sender + old receiver, or vice versa) need to keep working. Verified at pre-flight: traced both live receive paths (
wzp-client/src/call.rs::CallDecoder::ingestandwzp-android/src/engine.rsthe JNI-driven engine path), and both degrade gracefully: new-sender Opus packets withfec_ratio_encoded=0/fec_block=0/fec_symbol=0flow through to the jitter buffer and decode normally on old receivers. The RaptorQ decoder either ignores zero-FEC packets entirely (Android pipeline.rs gates on non-zero fec_block/fec_symbol) or accumulates them harmlessly until the 2-second staleness eviction (desktop call.rs). Old-sender packets with populated RaptorQ fields are handled by new receivers via the unchanged Codec2 path (new receivers keep wzp-fec for Codec2 tiers and simply ignore RaptorQ fields on Opus packets). No wire format version bump required. - Pre-existing desktop RaptorQ gap (incidental finding, NOT caused by this PRD). The desktop
wzp-client/src/call.rs::CallDecoderfeeds packets intofec_dec.add_symbolbut never callsfec_dec.try_decode— RaptorQ recovery is effectively dead code on the desktop path today. Main decode reads from the jitter buffer directly, falling through to classical Opus PLC on missing packets. The Androidengine.rspath properly usestry_decodefor recovery. This PRD does not fix the desktop gap — it's unrelated — but is noted here so nobody is surprised that removing RaptorQ from Opus tiers on the desktop client causes no measurable recovery regression (there was nothing to lose). Recommend filing a follow-up task to either fix or remove the vestigial desktop RaptorQ wiring independently of this work. AUDIO_USE_LEGACY_FECitself becoming permanent tech debt. Escape hatches have a way of outliving their intended lifespan. Mitigation: put an explicit removal date in a// TODO(2026-06-15): remove legacy FEC pathcomment at the flag-handling site. Track in taskmaster.
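The wire-format compatibility argument in the list above rests on receivers treating all-zero FEC fields as "no RaptorQ data". A minimal sketch of that gate, assuming hypothetical field widths and a hypothetical route_packet function (the real pipeline.rs types are not shown in this PRD), and assuming the gate fires when either field is non-zero:

```rust
/// Hypothetical subset of the per-packet FEC header fields named in the PRD.
/// The field widths are assumptions; fec_ratio_encoded is omitted because
/// the gate described for the Android path does not consult it.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct FecFields {
    fec_block: u16,
    fec_symbol: u16,
}

/// Where a received packet is routed after header parsing.
#[derive(Debug, PartialEq, Eq)]
enum Route {
    /// Packet carries no RaptorQ data: decode straight from the jitter buffer.
    JitterBufferOnly,
    /// Packet belongs to a RaptorQ block: also feed it to the FEC decoder.
    JitterBufferAndFec,
}

/// Routes a packet based on whether its FEC fields are populated.
fn route_packet(fields: FecFields) -> Route {
    if fields.fec_block != 0 || fields.fec_symbol != 0 {
        Route::JitterBufferAndFec
    } else {
        Route::JitterBufferOnly
    }
}
```

Under this gate, a new-sender Opus packet with zeroed fields never reaches the RaptorQ decoder on an old receiver, while an old-sender packet with populated fields still does.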
Open questions
- Does opusic-c expose opusic_c::Decoder's raw inner pointer? Resolved at pre-flight: no, it's pub(crate). We build a unified DecoderHandle over raw opusic-sys in dred_ffi.rs and use it for both normal decode and DRED reconstruction. opusic-c is used only for the encoder side.
- Exact opusic-sys symbol name for DRED decoder allocation. opus.h documents the OpusDREDDecoder type and the opus_dred_parse / opus_decoder_dred_decode functions, but the allocation function name is not in the fetched snippet. Expected to be opus_dred_decoder_create / opus_dred_decoder_destroy per libopus naming convention, but confirm at the very start of Phase 3a by reading the actual opusic-sys bindings. If the function is not exported by opusic-sys, we file a PR upstream to opusic-sys (small fix, trivially mergeable) and temporarily vendor the function declaration locally.
- Should the 5% loss floor be configurable per profile? Currently specified as a constant. A future refinement might make it higher at degraded tiers and lower at studio tiers, but without real telemetry we don't know if the constant is wrong. Keep as a constant for now, revisit after 1 month of production data.
- OSCE enable: opusic-c has an osce feature flag for Opus Speech Coding Enhancement, a separate libopus 1.5 neural post-processor. Out of scope for this PRD but should be the next audio-quality follow-up. Probably a one-line enable once opusic-c is in.
- Upstream PR to opusic-c: our own dred_ffi.rs wrapper should be proven in production first, then the fixes upstreamed to opusic-c/src/dred.rs (preserve dred_end, fix dred_offset double-pass, expose DredPacket externally). Follow-up task, not blocking this PRD.
- feat/desktop-audio-rewrite merge: the vendored audiopus_sys patch on that branch becomes obsolete under this PRD. Coordinate removal with whoever owns that branch.
Phase A: Continuous DRED Tuning (Implemented 2026-04-12)
Phase A extends the discrete tier-locked DRED durations from Phases 1-3 with continuous, network-driven tuning.
What was built
- DredTuner (crates/wzp-proto/src/dred_tuner.rs): Maps (loss_pct, rtt_ms, jitter_ms) → (dred_frames, expected_loss_pct) continuously
- Quinn stats exposure (crates/wzp-transport/src/quic.rs): QuinnPathSnapshot provides quinn's internal RTT, loss, and congestion events, which are more accurate than sequence-gap heuristics
- Jitter variance window (crates/wzp-transport/src/path_monitor.rs): 10-sample sliding window for RTT standard deviation, used for spike detection
- AudioEncoder trait extensions (crates/wzp-proto/src/traits.rs): set_expected_loss() and set_dred_duration() with default no-op, overridden by OpusEncoder and AdaptiveEncoder
- Engine integration (desktop/src-tauri/src/engine.rs): Both Android and desktop send tasks poll every 25 frames and apply tuning
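The jitter variance window in the list above can be illustrated with a small ring of recent RTT samples. This is a sketch under assumptions, not the code in path_monitor.rs: RttWindow is a hypothetical type, and it computes the population standard deviation (the PRD does not say which estimator is used).

```rust
use std::collections::VecDeque;

/// Hypothetical fixed-size window over recent RTT samples, in milliseconds.
struct RttWindow {
    samples: VecDeque<f64>,
    capacity: usize,
}

impl RttWindow {
    /// Creates an empty window; the PRD describes a 10-sample window.
    fn new(capacity: usize) -> Self {
        assert!(capacity > 0, "window capacity must be non-zero");
        Self {
            samples: VecDeque::with_capacity(capacity),
            capacity,
        }
    }

    /// Adds a sample, evicting the oldest one once the window is full.
    fn push(&mut self, rtt_ms: f64) {
        if self.samples.len() == self.capacity {
            self.samples.pop_front();
        }
        self.samples.push_back(rtt_ms);
    }

    /// Population standard deviation of the window.
    /// Returns 0.0 while fewer than two samples are available.
    fn std_dev(&self) -> f64 {
        let n = self.samples.len();
        if n < 2 {
            return 0.0;
        }
        let mean = self.samples.iter().sum::<f64>() / n as f64;
        let variance = self
            .samples
            .iter()
            .map(|s| (s - mean).powi(2))
            .sum::<f64>()
            / n as f64;
        variance.sqrt()
    }
}
```

A caller would push one RTT sample per polling cycle and read std_dev() as the jitter estimate that feeds spike detection.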
Opus6k DRED extended
dred_duration_for(Opus6k) changed from 50 (500 ms) to 104 (1040 ms), the maximum libopus 1.5 supports; the values are in 10 ms units. The RDO-VAE's quality-vs-offset curve makes this nearly free in bitrate terms while doubling burst resilience on the worst links.
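The relationship between the 10 ms units, the 104-frame ceiling, and a continuous loss-driven setting can be sketched as below. The real DredTuner curve is not given in this PRD, so the linear interpolation, the 40% loss knee, and the use of the 5% floor on the reported expected loss are all illustrative assumptions; tune and frames_to_ms are hypothetical names.

```rust
/// Maximum DRED duration, in 10 ms units, that libopus 1.5 supports per the PRD.
const DRED_MAX_FRAMES: u32 = 104;
/// Constant loss floor mentioned in the PRD's open questions.
const LOSS_FLOOR_PCT: f32 = 5.0;
/// Assumed loss level at which the tier ceiling is reached (illustrative only).
const LOSS_KNEE_PCT: f32 = 40.0;

/// Hypothetical continuous mapping from measured loss to
/// (dred_frames, expected_loss_pct) for one codec tier.
fn tune(loss_pct: f32, baseline_frames: u32, ceiling_frames: u32) -> (u32, u8) {
    // Never exceed what libopus accepts, whatever the tier asks for.
    let ceiling = ceiling_frames.min(DRED_MAX_FRAMES);
    let baseline = baseline_frames.min(ceiling);
    // Linear interpolation from baseline (0% loss) to ceiling (knee and above).
    let t = (loss_pct / LOSS_KNEE_PCT).clamp(0.0, 1.0);
    let frames = baseline as f32 + t * (ceiling - baseline) as f32;
    // Report at least the floor so the encoder always plans for some loss.
    let expected = loss_pct.max(LOSS_FLOOR_PCT).min(100.0);
    (frames.round() as u32, expected.round() as u8)
}

/// Converts a DRED duration in 10 ms units to milliseconds.
fn frames_to_ms(frames: u32) -> u32 {
    frames * 10
}
```

For example, with an illustrative baseline of 50 and a ceiling of 104, tune(20.0, 50, 104) yields 77 frames (770 ms) and an expected loss of 20%.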
Jitter spike detection ("Sawtooth" prediction)
When instantaneous jitter exceeds 1.3× its EWMA (the EWMA is asymmetric: fast-up α=0.3 when jitter rises, slow-down α=0.05 when it falls), the tuner enters spike-boost mode:
- DRED immediately jumps to the codec tier's ceiling
- Cooldown: 10 cycles (~5 seconds at 25 packets/cycle)
- Designed for Starlink satellite handover sawtooth jitter pattern
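The spike rule above can be sketched as a small state machine. This is an illustration under assumptions, not the code in dred_tuner.rs: SpikeDetector is a hypothetical type, the comparison uses the EWMA from before the current sample is folded in, and the cooldown counts down once per tuning cycle without a new spike.

```rust
/// Hypothetical spike detector using the thresholds quoted in the PRD.
struct SpikeDetector {
    /// Smoothed jitter in milliseconds; None until the first sample arrives.
    ewma_ms: Option<f32>,
    /// Remaining tuning cycles during which the boost stays active.
    cooldown_left: u32,
}

impl SpikeDetector {
    const SPIKE_RATIO: f32 = 1.3;
    const ALPHA_UP: f32 = 0.3;
    const ALPHA_DOWN: f32 = 0.05;
    const COOLDOWN_CYCLES: u32 = 10;

    fn new() -> Self {
        Self { ewma_ms: None, cooldown_left: 0 }
    }

    /// Feeds one jitter sample per tuning cycle.
    /// Returns true while DRED should be held at the tier ceiling.
    fn update(&mut self, jitter_ms: f32) -> bool {
        let prev = match self.ewma_ms {
            Some(v) => v,
            None => {
                // The first sample only seeds the average.
                self.ewma_ms = Some(jitter_ms);
                return false;
            }
        };
        if jitter_ms > prev * Self::SPIKE_RATIO {
            // New spike: start (or restart) the cooldown.
            self.cooldown_left = Self::COOLDOWN_CYCLES;
        } else if self.cooldown_left > 0 {
            self.cooldown_left -= 1;
        }
        // Asymmetric smoothing: track increases quickly, decay slowly.
        let alpha = if jitter_ms > prev { Self::ALPHA_UP } else { Self::ALPHA_DOWN };
        self.ewma_ms = Some(prev + alpha * (jitter_ms - prev));
        self.cooldown_left > 0
    }
}
```

Because the average decays slowly, a single spike keeps the boost active for the full cooldown even if jitter returns to normal on the next cycle, which matches the intent of riding out a sawtooth pattern rather than reacting to each tooth.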
Test coverage
- 10 unit tests for tuner math (baseline, scaling, spike, cooldown, codec switch, Codec2 no-op)
- 4 integration tests (encoder adjustment, spike boost, Codec2 no-op, profile switch with encode verification)