Phase 1 of the DRED integration (docs/PRD-dred-integration.md). The Opus
encoder now emits DRED (Deep REDundancy) bytes in every packet, carrying
a neural-coded history of recent audio that the decoder can use to
reconstruct loss bursts up to the configured window. Opus inband FEC
(LBRR) is disabled because DRED does the same job better and running both
wastes bitrate on overlapping protection.
Tiered DRED duration policy per PRD:
Studio (Opus 32k/48k/64k): 10 frames = 100 ms
Normal (Opus 16k/24k): 20 frames = 200 ms
Degraded (Opus 6k): 50 frames = 500 ms
Each profile switch (via adaptive quality) updates the DRED duration to
match the new tier. A 5% packet_loss floor is applied whenever DRED is
active, because libopus 1.5 gates DRED emission on non-zero packet_loss.
Real loss measurements from the quality adapter override upward.
Escape hatch: AUDIO_USE_LEGACY_FEC=1 reverts the encoder to Phase 0
behavior (inband FEC Mode1, DRED off, no loss floor). Read once at
OpusEncoder::new; call-scoped, not re-read mid-call. Trait-level
set_inband_fec becomes a no-op in DRED mode to preserve the invariant
even if external callers forget.
Observations from the bitrate probe test (dred_mode_roundtrip_voice_pattern):
DRED mode: 3649 bytes/sec (~29.2 kbps) on Opus 24k + 300 Hz sine
Legacy mode: 2383 bytes/sec (~19.1 kbps)
Delta: +10.1 kbps
The delta is considerably larger than the "+1 kbps flat" figure I carried
into the PRD from hazy memory of published DRED benchmarks. Likely because
the input (300 Hz sine) is very compressible so the base Opus rate in
legacy mode is well below the 24 kbps target, making the delta look
disproportionate. Signal-dependent — real speech would probably show a
different ratio. If production telemetry shows the overhead is excessive,
we can cut DRED duration on the normal tier from 200 ms to 100 ms as a
first tuning lever. Not blocking Phase 1 since the test still passes
within the reasonable 2000–8000 bytes/sec bounds.
Test changes (+8 tests, total wzp-codec: 61 passing):
- dred_duration_for_studio_tiers_is_100ms (per-profile policy)
- dred_duration_for_normal_tiers_is_200ms
- dred_duration_for_degraded_tier_is_500ms
- dred_duration_for_codec2_is_zero
- default_mode_is_dred_not_legacy (sanity check on fresh construction)
- dred_mode_roundtrip_voice_pattern (observes DRED bitrate, asserts bounds)
- profile_switch_refreshes_dred_duration (verifies set_profile updates DRED)
- set_inband_fec_noop_in_dred_mode (trait-level inband FEC no-op)
Verification:
- cargo check --workspace: zero errors, no new warnings
- cargo test -p wzp-codec: 61/61 passing (53 pre-Phase-1 baseline + 8 new)
- Empirical DRED bitrate observed via `rtk proxy cargo test
dred_mode_roundtrip_voice_pattern -- --nocapture`
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Phase 0 of the DRED integration (docs/PRD-dred-integration.md). No behavior
change: inband FEC stays ON, no DRED, same bitrate, same quality. This
commit unblocks Phase 1+ by getting us onto libopus 1.5.2 where DRED lives.
Rationale for going straight to a custom DecoderHandle: opusic-c::Decoder's
inner *mut OpusDecoder pointer is pub(crate), so we cannot reach it for the
Phase 3 DRED reconstruction path. Running two parallel decoders (one for
audio, one for DRED) would drift because the DRED decoder wouldn't see
normal decode calls. Single unified DecoderHandle over raw opusic-sys is
the only correct architecture, so we build it in Phase 0 rather than
rewriting opus_dec.rs twice.
Changes:
- Cargo.toml (workspace + wzp-codec): remove audiopus 0.3.0-rc.0, add
opusic-c 1.5.5 (bundled + dred features), opusic-sys 0.6.0 (bundled),
bytemuck 1. Pinned exactly for reproducible libopus 1.5.2.
- opus_enc.rs: rewritten against opusic_c::Encoder. Argument order for
Encoder::new swapped (Channels first). set_inband_fec(bool) now maps
to InbandFec::Mode1 (the libopus 1.5 equivalent of 1.3's LBRR). encode
uses bytemuck::cast_slice<i16,u16> at the &[u16] boundary.
- dred_ffi.rs (new): DecoderHandle wrapping *mut OpusDecoder directly via
opusic-sys. Owns the allocation, frees on Drop. Exposes decode,
decode_lost, and a pub(crate) as_raw_ptr() for the future Phase 3 DRED
reconstruction. Send+Sync justified via &mut self access discipline.
- opus_dec.rs: rewritten as a thin AudioDecoder impl over DecoderHandle.
Behavior identical to pre-swap.
Verification (Phase 0 acceptance gates):
- cargo check --workspace: clean (30 pre-existing warnings in jni_bridge.rs
unrelated to this work; zero in changed files).
- cargo test -p wzp-codec: 53 tests pass (50 pre-swap + 6 new: 3 in
dred_ffi.rs for DecoderHandle lifecycle, 3 in opus_enc.rs for version
check and roundtrip).
- linked_libopus_is_1_5 test asserts opusic_c::version() contains "1.5" —
hard signal that the swap landed correctly.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Switch the webrtc-audio-processing dep from the 2.x git source (bundled
mode) back to crates.io 0.3, and link against Debian's apt package
libwebrtc-audio-processing-dev (0.3-1+b1 on Bookworm). The 2.x path
fails because both the crates.io tarball and the upstream git main
branch of webrtc-audio-processing-sys 2.0.3 have a build.rs bug where
\`meson setup --reconfigure\` is passed unconditionally, panicking on
first-run empty build dirs with "Directory does not contain a valid
build tree". The 0.x line sidesteps bundled mode entirely by linking
the apt-provided library.
Trade-off: we get AEC2 (the older generation) instead of AEC3, but
it's the same algorithm family and is what PulseAudio's
module-echo-cancel and PipeWire's filter-chain use on current
Debian-family distros. Fine for shipping — we can revisit AEC3 once
the 2.x bundled build is fixed upstream.
API changes:
- 0.3's Processor::process_capture_frame and process_render_frame
take &mut self, so wrap the module-level processor in a Mutex.
Capture and playback threads each lock briefly (sub-ms per 10 ms
frame); contention is minimal.
- Import NUM_SAMPLES_PER_FRAME from the crate directly instead of
hardcoding 480, so the code tracks whatever sample rate the
upstream C++ lib exposes (currently 48 kHz hardcoded -> 480).
- Helper fns drain_frames_through_apm / tee_render_samples / etc.
take &Mutex<Processor> instead of &Processor.
- Use explicit EchoCancellationSuppressionLevel and
NoiseSuppressionLevel imports rather than fully-qualified paths.
Dockerfile:
- Drop meson / ninja-build / python3 (only needed for bundled build).
- Add libwebrtc-audio-processing-dev for the system link path.
- Keep clang (may be needed by the bindgen step in some versions).
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
v2.0.3 bundled build hits 'Directory does not contain a valid build
tree' because the crate's build.rs uses `meson setup --reconfigure`
unconditionally, which fails on first run when the build dir doesn't
yet contain prior meson state. Try the main branch in case it's been
fixed post-release.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The crates.io tarball of webrtc-audio-processing-sys 2.0.3 is missing
the vendored C++ submodule — the bundled build fails with 'Directory
does not contain a valid build tree' when meson tries to configure
the ./webrtc-audio-processing subdirectory. Cargo clones git deps with
submodules auto-initialized since ~1.27, so pulling from the upstream
git repo (pinned to tag v2.0.3) gives us the full source tree.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Adds gold-standard Linux echo cancellation: in-app WebRTC AEC3 (Audio
Processing Module) via the webrtc-audio-processing crate, using the
same algorithm as Chrome WebRTC, Zoom, Teams, and Jitsi. Runs entirely
in-process, so it works identically on ALSA / PulseAudio / PipeWire
systems — no dependency on user-configured echo-cancel modules.
Architecture:
- New crates/wzp-client/src/audio_linux_aec.rs module (~470 lines).
Contains LinuxAecCapture and LinuxAecPlayback, both using CPAL
under the hood but routing samples through a shared
Arc<webrtc_audio_processing::Processor>. The playback path tees
each 20 ms frame into APM.process_render_frame as the echo
reference BEFORE handing the samples to CPAL's output callback.
The capture path runs APM.process_capture_frame on each mic frame
in place before pushing to the audio ring buffer. This is the
"tee the playback ring" approach that Zoom/Teams/Jitsi use.
- New `linux-aec` feature in wzp-client pulling in the
webrtc-audio-processing crate at v2.x with the `bundled`
sub-feature. Bundled means the vendored PulseAudio WebRTC C++
sources are statically compiled via meson+ninja at cargo build
time — no runtime .so dependency, avoids Debian Bookworm's stale
libwebrtc-audio-processing-dev 0.3 package (which predates AEC3).
Dep is target-gated to Linux, so enabling the feature on non-Linux
is a no-op.
- lib.rs re-exports LinuxAecCapture/LinuxAecPlayback as
AudioCapture/AudioPlayback when `linux-aec` is on, otherwise
falls back to the CPAL audio_io path. Shared public API
(start/ring/stop/Drop) means downstream code is unchanged.
- New `linux-aec` feature in wzp-desktop forwards to
wzp-client/linux-aec so `cargo tauri build -- --features
wzp-desktop/linux-aec` builds the AEC variant.
APM configuration:
- EchoCancellation: High suppression, delay-agnostic mode on,
extended filter on, stream_delay_ms=60 initial hint
- NoiseSuppression: High
- HighPassFilter: on
- AGC: off (can fight Opus encoder's own gain staging + adaptive
quality controller; add later if users report low mic level)
Frame size handling:
- Pipeline uses 20 ms frames (960 samples @ 48 kHz mono)
- APM requires strict 10 ms (480 samples) per call
- Each 20 ms frame is split into two 480-sample halves, APM called
twice, halves stitched back
- Same pattern for render and capture sides
- Carry-buffer logic handles the case where CPAL delivers samples in
arbitrary chunk sizes that don't divide 960
Build infrastructure:
- scripts/Dockerfile.linux-desktop-builder adds meson, ninja-build,
python3, clang for the webrtc-audio-processing bundled build
- scripts/build-linux-desktop-docker.sh takes a new --aec flag that
enables the linux-aec feature and renames the output artifacts
with an `-aec` suffix so noAEC and AEC variants can coexist on disk
Task #30.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- CreateEventW is gated behind Win32_Security in the windows crate
because its signature takes SECURITY_ATTRIBUTES; add to features.
- Remove unused HANDLE import.
- Wrap GetId() and PWSTR::to_string() in explicit unsafe { ... }
blocks for Rust 2024 edition's unsafe_op_in_unsafe_fn lint.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Adds a direct WASAPI microphone capture path for the Windows desktop
build that opens the default communications endpoint via
IMMDeviceEnumerator -> IAudioClient2 -> SetClientProperties with
AudioCategory_Communications, turning on Windows's communications
audio processing chain (AEC, noise suppression, automatic gain
control). The communications AEC operates at the OS level and uses
the system render mix as the reference signal, so echo from our
existing CPAL playback stream is cancelled automatically with no
per-process reference plumbing.
Architecture:
- New crates/wzp-client/src/audio_wasapi.rs module (~280 lines).
Event-driven capture loop on a dedicated thread; pushes PCM into
the same lock-free AudioRing used by the CPAL path. Same public
API as audio_io::AudioCapture so downstream code is unchanged.
- New `windows-aec` feature in wzp-client that pulls in the
`windows` crate (Microsoft's official Rust COM bindings) gated to
target_os = "windows" only. Enabling the feature on non-Windows
targets is a no-op since both the module and the dep are
cfg(target_os = "windows").
- lib.rs re-exports WasapiAudioCapture as AudioCapture when the
feature is on, otherwise falls back to the CPAL AudioCapture.
AudioPlayback is always the CPAL one — no reason to swap it.
- desktop/src-tauri/Cargo.toml Windows target enables the new
feature: `features = ["audio", "windows-aec"]`.
Implementation notes:
- Uses eCommunications role (not eConsole) for GetDefaultAudioEndpoint
— the user-configured "communications" device that Teams/Zoom
pick up, and the one Windows's AEC is tuned for.
- Requests 48 kHz mono i16 with AUDCLNT_STREAMFLAGS_AUTOCONVERTPCM +
SRC_DEFAULT_QUALITY so Windows handles any format conversion in
the audio engine instead of rejecting our format.
- Event-driven with SetEventHandle / WaitForSingleObject — no
polling, minimal CPU cost between packets.
- 200 ms wait timeout so the capture thread polls `running` often
enough for Drop to stop cleanly even if the audio engine stalls
(e.g. device unplug).
Task #24.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
First step of the Windows x86_64 desktop build: stop pulling
coreaudio-rs into the Windows dependency graph so the project can at
least run `cargo check --target x86_64-pc-windows-msvc`. Software AEC
is already disabled in engine.rs so there's nothing else to stub — the
macOS-specific VPIO path is skipped via #[cfg(target_os = "macos")] on
both sides and Windows falls through to the plain CPAL
AudioCapture/AudioPlayback branch that already existed.
crates/wzp-client/Cargo.toml
- coreaudio-rs optional dep moved under [target.'cfg(target_os = "macos")']
- `vpio` feature now uses `dep:coreaudio-rs` syntax and the gated dep
- Enabling `vpio` on Windows/Linux is a no-op at resolution time
crates/wzp-client/src/lib.rs
- `pub mod audio_vpio` is now #[cfg(all(feature = "vpio", target_os = "macos"))]
- Previously `vpio` alone was enough to try to compile the Core Audio
bindings, which would fail on non-Apple targets the moment the
feature flag was flipped on
desktop/src-tauri/Cargo.toml
- [target.'cfg(not(target_os = "android"))'] removed — was leaking
vpio into Windows/Linux via the catch-all.
- macOS: wzp-client with features = ["audio", "vpio"]
- Windows: wzp-client with features = ["audio"]
- Linux: wzp-client with features = ["audio"]
- Android: wzp-client with default-features = false (unchanged)
- Dropped the unused direct coreaudio-rs = "0.11" dep on macOS —
wzp-desktop's own sources never call Core Audio directly.
Verified via `cargo tree --target x86_64-pc-windows-msvc -p wzp-desktop`
that the Windows target now resolves wzp-client with cpal but without
coreaudio-rs. macOS target still resolves with coreaudio (direct via
vpio feature and transitively via cpal). macOS `cargo check` still
builds cleanly.
Cross-compile from macOS hit a cargo-xwin + llvm-lib setup issue in
ring's build.rs, so the actual `cargo check --target
x86_64-pc-windows-msvc` did not complete locally. Build verification
belongs on the user's Windows x86_64 host where MSVC is present
natively.
See tasks #23 (this one), #24 (Voice Capture DSP / WASAPI Communications
for OS-level AEC on Windows), and #25 (aarch64-pc-windows-msvc support).
With da106bd (Usage::Media + MODE_NORMAL) audio works but is always on
the loudspeaker — we want handset as the default with a user-driven
toggle for speaker (and later bluetooth). The right Oboe usage for a
VoIP app is VoiceCommunication, which honours
AudioManager.setSpeakerphoneOn / setBluetoothScoOn for routing.
Bisection across previous builds showed that setAudioApi(AAudio) +
Usage::VoiceCommunication made the playout callback stop draining the
ring after cb#0 (build 8c36fb5 logs). Letting Oboe pick the AudioApi
implicitly keeps the callback alive — 96be740's Media-usage callbacks
fired at steady 50Hz without any explicit setAudioApi. So: keep the
Usage change, DROP the explicit AAudio force.
- oboe_bridge.cpp: Usage::VoiceCommunication, no setAudioApi, no
ContentType override.
- MainActivity.kt: setMode(MODE_IN_COMMUNICATION) +
setSpeakerphoneOn(false) = handset default, plus max both
STREAM_VOICE_CALL and STREAM_MUSIC volumes for belt-and-braces.
Next build will add a JNI-based Tauri command to flip speakerphoneOn
at runtime so the user can toggle handset↔speaker during a call.
Build 8c36fb5 logs showed a new regression: Oboe playout cb#0 fires once
at startup then the callback STOPS DRAINING the ring entirely.
written_samples sticks at 7679 (= RING_CAPACITY - 1) across every recv
heartbeat in a 40-second test. Meanwhile the recv task decodes 1800+ real
audio frames (sample range up to [-27920..31907], rms 12065) which all
get dropped on the floor by audio_write_playout returning 0 because the
ring is full.
Bisection: 96be740 (Usage::Media, no setAudioApi, no ContentType, no
MainActivity audio mode change) DID drive the playout callback at the
expected 50Hz (playout heartbeat: calls=1100 total_played_real=1055040
over 22 seconds). User still heard nothing there because of OS routing,
but at least Oboe accepted the PCM.
8c36fb5 added three changes on top of 96be740:
1. Oboe Usage::Media → Usage::VoiceCommunication
2. Oboe setAudioApi(oboe::AudioApi::AAudio) explicit
3. Oboe setContentType(ContentType::Speech)
4. MainActivity setMode(MODE_IN_COMMUNICATION) + setSpeakerphoneOn(true)
Every one of those could have killed the callback; combined they did.
Revert to 96be740's exact Oboe config: Usage::Media, no setAudioApi, no
ContentType. Keep the PCM recorder, heartbeat logging, and stream-open
logging. Separately, MainActivity now maxes STREAM_MUSIC (the stream
Usage::Media routes to) but leaves audio mode in MODE_NORMAL — no more
speakerphone/call-mode combo that makes Oboe unhappy. In NORMAL mode a
STREAM_MUSIC stream plays through the loud speaker by default.
Proof that the Rust pipeline is perfect: decoded.pcm recorded in 8c36fb5
was pulled via `adb shell run-as com.wzp.desktop cat .wzp/decoded.pcm`,
converted with ffmpeg, and played back on the Mac — user confirmed
audible speech. So 100% of the remaining bug surface is Android audio
routing, not anything in the Rust/C++ decode path.
cc-rs build of oboe_bridge.cpp failed at cfa9ff6 because the Oboe
ResultWithValue<T> template returned by getXRunCount() does not have
a .value_or(T) method — only .value(). Replace with an explicit
bool-conversion + .value() guard that yields -1 on error.
Build 96be740 logs proved the entire software pipeline is healthy:
capture heartbeat: calls=1100 to_write=960 full_drops=0 total_written=1056000
recv heartbeat: decoded_frames=1035 last_written=960 decode_errs=0
recv decoded PCM: range=[-13564..9244] rms=8044 (real audio)
playout WRITE: in_len=960 written=960 rms=2318 (real audio into the ring)
playout heartbeat: calls=1100 nonempty=1099 total_played_real=1055040
1055040 samples / 48000 Hz = 22s — exactly matches wall-clock elapsed,
meaning Oboe IS calling our playout callback at the expected rate and
WE ARE handing it real PCM every 20ms. User still heard nothing. Ergo
Oboe accepted the PCM and routed it to a silent output. Two fixes:
1) MainActivity.kt: switch to MODE_IN_COMMUNICATION + speakerphone ON
right after permissions are granted, and crank STREAM_VOICE_CALL to
max. Without this, an Oboe Usage::VoiceCommunication stream gets
opened, the OS creates a real AAudio pipeline, the callback fires on
schedule — and audio goes to either the earpiece at muted volume or
a "call not active" dead end. Logs the audio mode + volume levels
before and after the switch so we can confirm the state change in
logcat next run.
2) oboe_bridge.cpp: revert Usage::Media → VoiceCommunication (the mode
that matches MODE_IN_COMMUNICATION), pin the audio API to AAudio
explicitly instead of letting Oboe fall back to OpenSLES (which has
its own silent-drop failure modes on some devices), and add getState
+ getXRunCount to the playout heartbeat so we'll see silent stream
disconnects instead of reading zeros forever.
3) engine.rs recv task: dump the first ~10s of post-AGC decoded PCM to
`<app_data_dir>/decoded.pcm` as raw i16 LE so we can adb pull it and
play it back locally:
adb shell run-as com.wzp.desktop cat .wzp/decoded.pcm > decoded.pcm
ffmpeg -f s16le -ar 48000 -ac 1 -i decoded.pcm decoded.wav
This divorces "is our decoder actually producing audible audio" from
"is Android's audio stack playing it". If the recorded WAV sounds
correct when played on a laptop, the decoder is fine and 100% of the
remaining bug surface is AudioManager / Oboe routing.
4) engine.rs: also log when spk_muted=true blocks the write. User
reported the Speaker button in the UI has inconsistent semantics
between desktop and android — adding this log rules out the accidental
"first click muted playback" theory for good.
User confirmed: mac hears android, android does not hear mac. So Oboe
capture works end-to-end but Oboe playout on Android silently drops
audio even though QUIC forwards the packets. Archaeology on the legacy
wzp-android crate also revealed that the "last known good" Android audio
path NEVER used Oboe in production — it used Kotlin AudioRecord +
AudioTrack via JNI, and cpp/oboe_bridge.cpp was dead code. So every time
we've "tested" Oboe end-to-end this week was the first production use,
and any of its config knobs could be the bug.
Instrumenting every stage of the pipeline so one smoke-test log dump can
isolate the layer at fault:
C++ (oboe_bridge.cpp)
- Log the ACTUAL stream parameters after openStream for both capture
and playout (sample rate, channels, format, framesPerBurst,
framesPerDataCallback, bufferCapacityInFrames, sharing, perf mode).
Oboe may silently override values we requested — e.g. if we ask for
48kHz mono but the device gives us 44.1kHz stereo our 960-sample
frames are the wrong duration and the pipeline drifts.
- Capture callback: on cb#0 log sample range+RMS of the first frame
to prove we get real mic data (not zeros). Every 50 callbacks
(~1s at 20ms burst) log calls, numFrames, ring available_write,
bytes actually written, ring_full_drops, total_written.
- Playout callback: on cb#0 log numFrames + ring state. On the FIRST
non-empty read log sample range+RMS so we can tell if the samples
coming out of the ring are real audio or zeros. Every 50 callbacks
log calls, nonempty count, numFrames, ring available_read,
underrun_frames, total_played_real.
Rust wzp-native (src/lib.rs)
- wzp_native_audio_write_playout now logs the first 3 writes and then
every 50th: in_len, written, sample range, RMS, ring write/read
cursors before, available_read and available_write after. Reveals
ring-overflow and whether the engine is actually handing us audio.
- Minimal android logcat shim via __android_log_write extern — no
new crate dependency.
- AudioBackend grows a `playout_write_log_count` AtomicU64 to gate
the write-side log throttle.
Rust engine.rs (android branch)
- Recv task: log sample range + RMS for the first 3 decoded PCM
frames and then every 100th. Reveals whether decoder.decode is
producing real audio or silent buffers.
- Recv task: if audio_write_playout returns fewer samples than we
handed it (partial write → ring nearly full) warn about it in the
first 10 frames.
- Recv heartbeat every 2s: recv_fr, decoded_frames, last_decode_n,
last_written, written_samples, decode_errs, codec.
Expected flow in a healthy log:
capture cb#0: numFrames=960 range=[-1200..900] rms=180 ← mic OK
capture stream opened: actualSR=48000 Ch=1 ... ← no override
playout stream opened: actualSR=48000 Ch=1 ...
CallEngine::start invoked ... → connected → audio started
recv: first media packet received ...
recv: decoded PCM sample range decoded_frames=1 range=[-300..250] rms=92
playout WRITE #0: in_len=960 written=960 range=[-300..250] rms=92
playout FIRST nonempty read: to_read=960 range=[-300..250] rms=92
playout heartbeat: calls=50 nonempty=50 underrun=0 ...
recv heartbeat: decoded_frames=100 last_written=960 ...
If any of those are missing/zero we know the exact stage to fix.
Three real bugs, one smoke-test session's worth of progress.
1. RELAY: wrong advertised addr in CallSetup
The direct-call CallSetup computed `relay_addr = addr.ip()` where
`addr = connection.remote_address()` — i.e. the CLIENT'S IP, not the
relay's. So the relay was telling both parties "the call room is at
the answerer's IP:4433", which meant each client dialed either the
other client (no server listening) or themselves. Both endpoint.connect
calls hung forever and the call never happened.
Fix: compute the relay's own advertised IP once at startup. If the
listen addr is 0.0.0.0, probe the primary outbound interface via the
classic UDP-bind-and-connect(8.8.8.8:80) trick to discover the LAN
IP the OS would use to reach external hosts. Thread the resulting
advertised_addr_str into the CallSetup sender for both parties.
2. RELAY: accept loop serialized QUIC handshakes
Previously the main accept loop called `wzp_transport::accept` which
did both `endpoint.accept().await` AND `incoming.await` (the server-
side QUIC handshake). A single slow handshake therefore blocked every
subsequent client from being accepted. Unroll the helper here and
move `incoming.await` into the per-connection spawned task, so every
handshake runs in parallel. Also log "accept queue: new Incoming",
"QUIC handshake complete", and "QUIC handshake failed" so we can tell
immediately whether a client's packets are reaching the relay at all.
3. ANDROID: playout was routed to the silent in-call stream
The Oboe playout stream was configured with Usage::VoiceCommunication,
which routes to the Android in-call earpiece stream. That stream is
silent unless the Activity has called AudioManager.setMode(
IN_COMMUNICATION) and, even then, only the earpiece/BT headset get
audio (not the loud speaker). Result: android→mac calls worked
because mac had a normal media output, but mac→android calls were
silent even though packets flowed through the relay just fine.
Switch to Usage::Media + ContentType::Speech so Oboe routes to the
loud speaker and uses the media volume slider. A later polish step
will wire setMode + setSpeakerphoneOn from MainActivity.kt so we can
go back to VoiceCommunication for AEC and proximity-sensor routing.
Plus: heartbeat tracing every 2s in the send/recv tasks — frames_sent,
last_rms, last_pkt_bytes, short_reads on the send side; decoded_frames,
last_decode_n, last_written, decode_errs on the recv side. Will make the
next "no sound" regression trivial to localize.
Direct-call accept hangs forever at the QUIC handshake on Android. Logs
from d7b37a5 showed:
CallEngine::start (android) invoked relay=172.16.81.172:4433 room=call-…
resolved relay addr
identity loaded
endpoint created, dialing relay ← reached
← nothing, 90s+, no error
The "connect failed" and "QUIC connection established" log lines never
fire, meaning endpoint.connect_with(…).await never makes progress.
Repro is 100%: SFU room join (one endpoint) works perfectly; direct call
(opens a SECOND quinn::Endpoint on top of the signal one) hangs in the
QUIC handshake. Creating two quinn::Endpoints on Android's AAudio-adjacent
UDP stack apparently causes the second one's datagrams to never reach the
relay (the server never sees the Initial packet). Rather than fight the
platform, quinn is happy to multiplex multiple Connections on a single
Endpoint — so we reuse the signal endpoint for the media connection.
- SignalState now stores the quinn::Endpoint alongside the QuinnTransport.
register_signal populates both at the same time.
- CallEngine::start (both android and desktop branches) takes an
Option<wzp_transport::Endpoint>. Some → reuse (direct-call path, after
register_signal). None → create fresh (SFU room join path).
- The connect tauri command reads state.signal.endpoint and threads it
through to CallEngine::start, so the direct-call auto-connect (fired by
the "setup" signal-event in main.ts) lands on the existing UDP socket.
- wzp_transport re-exports quinn::Endpoint so wzp-desktop doesn't need to
depend on quinn directly.
- Also wraps the android connect in tokio::time::timeout(10s) so future
hangs become deterministic "connect TIMED OUT" errors in logcat
instead of silent deadlock.
Same fix applies verbatim to the desktop client — the user suspects
direct call is broken there too and this was likely always the cause,
just never surfaced because desktop was only tested via SFU rooms.
PlayoutCallback::onAudioReady crashed with SIGSEGV(SEGV_ACCERR) on the
first AAudio callback because g_rings was a `const WzpOboeRings*` pointing
at the caller's stack frame. wzp_native_audio_start() constructs the
rings struct as a stack local in Rust, passes &rings to wzp_oboe_start
(which stored the raw pointer), and returns — at which point the stack
frame unwinds and g_rings becomes a dangling reference. The first audio
callback then read from freed memory and died.
- g_rings is now a static WzpOboeRings value (was `const WzpOboeRings*`).
The raw int16 buffer + atomic index pointers inside the struct still
point into the Rust-owned AudioBackend singleton, which is leaked for
the lifetime of the process, so deep-copying the struct by value is
safe and keeps the inner pointers valid forever.
- g_rings_valid atomic bool gates the audio-callback reads: set to true
after the value copy in wzp_oboe_start, cleared in wzp_oboe_stop BEFORE
the streams are torn down so any in-flight callback sees "no backend"
and returns Stop instead of racing on g_rings.
- All g_rings->x accesses in the capture + playout callbacks switched to
g_rings.x (member-of-value).
Reproduced on Pixel 6 / Android 15 with build 0105b0f:
F libc: Fatal signal 11 (SIGSEGV), code 2 (SEGV_ACCERR),
fault addr 0x71aa717eb0 in tid 11822 (AudioTrack)
#00 PlayoutCallback::onAudioReady(oboe::AudioStream*, void*, int)+120
#01 oboe::AudioStream::fireDataCallback(void*, int)+136
...
Now that Phase 1 proved the split-cdylib pipeline (build #37 launched
cleanly with 'wzp-native dlopen OK: version=42 msg=...' in logcat),
this commit brings the real audio code into wzp-native without ever
touching the Tauri crate:
- cpp/oboe_bridge.{h,cpp}, oboe_stub.cpp, getauxval_fix.c copied
verbatim from crates/wzp-android/cpp/ (same files that work in the
legacy wzp-android .so on this phone)
- build.rs near-identical to crates/wzp-android/build.rs: clones
google/oboe@1.8.1 into OUT_DIR, compiles oboe_bridge.cpp + all
oboe source files as a single static lib with c++_shared linkage,
emits -llog + -lOpenSLES. On non-android hosts it compiles just
oboe_stub.cpp so `cargo check` works locally without an NDK.
- Cargo.toml gets cc = "1" in [build-dependencies]. This is SAFE
because wzp-native is a single-cdylib crate — crate-type is only
["cdylib"], no staticlib, so rust-lang/rust#104707 does not apply.
- src/lib.rs extends the FFI surface with the real audio API:
wzp_native_audio_start() -> i32
wzp_native_audio_stop()
wzp_native_audio_read_capture(*mut i16, usize) -> usize
wzp_native_audio_write_playout(*const i16, usize) -> usize
wzp_native_audio_capture_latency_ms() -> f32
wzp_native_audio_playout_latency_ms() -> f32
wzp_native_audio_is_running() -> i32
Plus a static AudioBackend singleton holding the two SPSC ring
buffers (capture + playout) that are shared with the C++ Oboe
callbacks via AtomicI32 cursors. The wzp_native_version() and
wzp_native_hello() smoke tests from Phase 1 are preserved.
Compiles cleanly on macOS host with the stub oboe .cpp. Next build
will exercise the full cargo-ndk path inside docker to verify the
whole Oboe compile still works standalone.
Phase 3 (next commit): wzp-desktop engine.rs on Android calls
wzp-native's audio FFI via the already-wired libloading handle, and
the real CallEngine::start() is implemented for Android using the
same codec/handshake/send/recv pipeline as desktop but with Oboe
rings instead of CPAL rings.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Phase 1 of the big refactor. Escape the Tauri Android
__init_tcb+4 symbol leak (rust-lang/rust#104707) by making
wzp-desktop's Android .so pure Rust — ZERO cc::Build, no cpp/ files,
no C++ in the rustc link step. All future C++ (Oboe audio bridge)
lives in a new standalone cdylib crate `wzp-native` which is built
with cargo-ndk (the same path the legacy wzp-android crate uses
successfully on the same phone + same NDK), copied into Tauri's
gen/android/app/src/main/jniLibs at build time, and dlopened by
wzp-desktop at runtime via libloading.
Changes in this commit:
- NEW crate crates/wzp-native/ with crate-type = ["cdylib"] only
(no staticlib, no rlib — rust#104707 shows mixing staticlib with
cdylib leaks non-exported symbols, which is the original bug
source). Phase 1 scaffold has TWO extern "C" functions:
wzp_native_version() -> i32 (returns 42)
wzp_native_hello(buf, cap) -> usize (writes a string)
So we can verify dlopen + dlsym + cross-.so FFI end-to-end
before adding any real C++.
- desktop/src-tauri/cpp/ directory DELETED (7 files gone).
- desktop/src-tauri/build.rs reduced to just the git hash capture
+ tauri_build::build(). No more cc::Build of any kind.
- desktop/src-tauri/Cargo.toml: drop cc from build-dependencies,
add libloading = "0.8" as an Android-only runtime dep.
- desktop/src-tauri/src/lib.rs Builder::setup() now (on Android only)
dlopens libwzp_native.so, calls wzp_native_version() and
wzp_native_hello(), and logs the result:
"wzp-native dlopen OK: version=42 msg=\"hello from wzp-native\""
If this log appears in logcat when the app launches and the home
screen still renders, the split-cdylib pipeline is validated and
Phase 2 (port the Oboe bridge into wzp-native) can proceed.
- scripts/build-tauri-android.sh: insert a `cargo ndk -t arm64-v8a
build --release -p wzp-native` step before `cargo tauri android
build`, with `-o desktop/src-tauri/gen/android/app/src/main/jniLibs`
so the resulting libwzp_native.so lands in the place gradle will
package into the final APK.
- Workspace Cargo.toml: add crates/wzp-native to [workspace] members.
Phase 2 (separate commit, only if Phase 1 works):
- Copy cpp/oboe_bridge.{h,cpp} + getauxval_fix.c from the legacy
wzp-android crate into crates/wzp-native/cpp/.
- Add cc = "1" as a build-dependency on wzp-native (safe: it's a
single-cdylib crate with no staticlib, so no symbol leak).
- Add build.rs that compiles the Oboe C++ and the wzp-native Rust
FFI exposes the audio start/stop/read/write functions.
- wzp-desktop::engine.rs dlopens wzp-native at CallEngine::start,
uses its audio functions instead of CPAL on Android.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
When relay listens on 0.0.0.0, derive the actual IP from the client's
connection address for the CallSetup message.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Derive a 4-digit code from the shared DH secret via HKDF with label
"warzone-sas-code". Both peers compute the same code; a MITM relay
produces a different one. Users compare verbally during the call.
- CryptoSession::sas_code() -> Option<u32> on the trait
- ChaChaSession stores and returns the SAS
- HKDF derivation in WarzoneKeyExchange::derive_session()
- Tests: both peers match, MITM produces different code
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Call rooms (call-*) restricted to the two authorized participants only
- Room capacity enforced at 2 for call rooms
- Unauthorized clients get immediate connection close
- Unified fingerprint format: SHA-256(Ed25519 pub)[:16] as xxxx:xxxx:...
Used consistently in signal registration, handshake, and ACL checks
Tested: Alice+Bob authorized, attacker rejected with "not authorized"
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
New feature: call someone directly by fingerprint through the relay.
- Client connects with SNI "_signal" for persistent signaling
- RegisterPresence/RegisterPresenceAck for relay registration
- DirectCallOffer routed to target by fingerprint
- DirectCallAnswer with AcceptGeneric/AcceptTrusted/Reject modes
- Relay creates private room (call-{id}), sends CallSetup to both
- Both clients connect to private room for media (existing SFU path)
- Hangup forwarding + cleanup on disconnect
- Desktop CLI: --signal + --call <fingerprint> for testing
- CallRegistry tracks call state (Pending/Ringing/Active/Ended)
- SignalHub manages persistent signaling connections
Tested: Alice calls Bob by fingerprint, relay routes offer, Bob
auto-accepts, both join private room, media flows bidirectionally.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Time-based dedup (2s TTL) replaces fixed-window dedup — consecutive
senders with same seq numbers no longer collide
- Raw byte forwarding for federation local delivery (no re-serialization)
- Jitter buffer resets on large backward seq jumps (>100)
- recv_media skips malformed datagrams instead of returning connection-closed
- SIGTERM handler for clean QUIC shutdown on wzp-client
- JSONL event log infrastructure (--event-log flag) for protocol analysis
- FEC disabled on GOOD profile for federation debugging (fec_ratio=0.0)
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Federation media from different senders had conflicting seq numbers,
FEC block IDs, and Opus decoder state. The relay now assigns fresh
monotonic seq/fec_block/fec_symbol to all federation-delivered packets,
ensuring clients see a clean continuous stream regardless of sender changes.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
When propagating GlobalRoomActive to other peers, use tagged participants
(with relay_label set to the originating relay) instead of the raw
untagged participants. This shows "Relay C" instead of "Relay B" when
C's participants are forwarded through hub B to A.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
When a new sender reuses the same block_id values as a previous sender,
the FEC decoder was silently dropping all data because blocks were marked
as "already decoded". Now blocks older than 2 seconds are automatically
reset when new data arrives for them.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Dedup key now includes source peer fingerprint hash, preventing
packets from different senders with same room+seq from being dropped
as duplicates (was silently killing all multi-hop audio)
- Build scripts default to --pull (use --no-pull to skip)
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Deduplicate remote participants by fingerprint in all merge sites
(canonical == raw room name caused double-lookup, doubling every remote participant)
- GlobalRoomInactive now propagates updated participant list to other peers
(hub relay B was not informing A when C's participants left)
- Add 15-second stale presence sweeper that purges remote participants
from peers that stop sending data (safety net for QUIC timeout delays)
- Add @Synchronized to WzpEngine.getStats/stopCall/destroy to prevent
TOCTOU race between stats polling coroutine and engine teardown (SIGSEGV)
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Android default room changed from 'android' to 'general'
- Relay choose_profile capped at GOOD (Opus 24k) — studio tiers
(32k/48k/64k) cause high packet loss on federation paths due to
larger datagrams exceeding path MTU. Will re-enable after MTU
discovery is implemented.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
When a new federation link is established, announce not only LOCAL
global rooms but also rooms from OTHER peers (remote_participants).
This fixes multi-hop: when R2 connects to R3, R2 tells R3 about
R1's rooms that R2 learned about earlier.
Previously, only local rooms were announced on link setup. If R1
had a client but R2 had no clients, R2 wouldn't tell R3 about R1.
Also added diagnostic logging for room announcements on link setup.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Three fixes for 3-relay chain (R1→R2→R3):
1. Room lookup in handle_datagram: hub relay (R2) has no local
participants, so active_rooms() was empty and datagrams were
silently dropped. Now also checks global_rooms config directly,
allowing hub relays to forward without local clients.
2. Multi-hop forwarding: removed active_rooms filter — forward to
ALL connected peers except source. The receiving peer decides
whether to deliver or forward further.
3. Android relay_label: native RoomMember now includes relay_label
from RoomUpdate signal. Kotlin UI reads it for relay grouping.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
When a remote relay's room goes inactive (all participants left),
the receiving relay now:
1. Clears remote_participants for that peer+room
2. Broadcasts updated RoomUpdate to local clients with the remote
participant removed
3. Updates federation_active_rooms metric
Previously, remote participants lingered in the participant list
after disconnect, causing ghost entries and stale media forwarding.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Connects to a relay over QUIC with SNI "version", reads build hash
from a unidirectional stream, prints "<relay> <git-hash>" and exits.
Usage: wzp-client --version-check 172.16.81.175:4434
Output: 172.16.81.175:4434 8dbda3e
Relay side: detects "version" SNI, opens uni stream, writes
BUILD_GIT_HASH, waits 100ms for client to read, closes.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
wzp-relay --version prints "wzp-relay <short-git-hash>".
Build hash also logged on startup: version=abc1234.
Enables verifying deployed relay matches expected build.
Also fixed federation-test.sh: use kill -INT (not SIGTERM) so
clients save recordings before exit. Added save delay.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
RoomParticipant.relay_label identifies which relay a participant is
connected to. Local participants have None, federated participants
get tagged with the peer relay's label when storing remote_participants.
This enables clients to group participants by relay in the UI.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
RoomParticipant now has optional relay_label field. Desktop client
groups participants by relay: "This Relay" (green dot) for local,
peer label (blue dot) for federated. Shows all relays in the chain
including intermediate ones.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
1. CLI client now sends raw room names (no hash), matching Android
JNI and Desktop Tauri. All three clients are now consistent.
2. When a client joins a global room, the relay merges federated
remote participants into the initial RoomUpdate. Previously,
clients that joined after the GlobalRoomActive signal only saw
local participants. Now they see everyone immediately.
3. Added get_remote_participants() to FederationManager for querying
cached remote participants from all peer links.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Wires up the existing RelayMetrics federation fields:
- wzp_federation_peer_status{peer} — 1=connected, 0=disconnected
- wzp_federation_packets_forwarded_total{peer,direction} — in/out counts
- wzp_federation_active_rooms — number of active federated rooms
These are critical for monitoring federation health and will feed into
the adaptive codec selection system (PRD-coordinated-codec.md).
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
GlobalRoomActive signal now carries participant list from the
announcing relay. When received, the relay:
1. Stores remote participants per peer link
2. Broadcasts merged RoomUpdate to local clients (local + all remote)
This means clients on different relays can now SEE each other in the
participant list. Also fixes build: removed non-existent metric field
references that were added by linter.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Add Prometheus metrics for federation links (per-peer RTT, packet
counters, active rooms gauge, dedup/rate-limit drop counters).
Add dedup filter (4096-entry ring buffer) to drop duplicate packets
arriving via multiple federation paths. Add per-room token bucket
rate limiter (500 pps) to prevent amplification.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Different clients send different room names:
- Android: raw "general" as SNI
- Desktop: hash_room_name("general") = "f09ae11d..." as SNI
Federation datagrams are tagged with an 8-byte room hash. Previously,
each relay computed the hash from the client-provided room name,
causing mismatches between relays with different client types.
Fix: resolve_global_room() maps any room name (raw or hashed) to the
canonical [[global_rooms]] name. global_room_hash() always uses the
canonical name for federation hashing. handle_datagram uses both raw
and canonical hash matching to find the local room.
Also: run_participant now receives the pre-computed federation_room_hash
so the egress uses the canonical hash, not the client-specific name.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Wire AdaptiveQualityController into Android engine for auto codec
switching based on network quality reports. Add color-coded TX/RX
codec badges to the in-call screen showing active codecs and Auto mode.
- Recv task: ingest QualityReports, feed to controller, signal profile
changes via AtomicU8 to send task
- Send task: check for pending profile switch at frame boundaries,
update encoder/FEC/frame size
- Track peer codec from incoming packet headers
- Kotlin UI: codec badges (blue=studio, green=good, amber=degraded,
red=catastrophic) with Auto tag
- Add .taskmaster to .gitignore
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The bug: when a local client joins a global room and sends media, the
egress task checked peer_links.active_rooms to decide where to forward.
But active_rooms tracks what PEERS announced (their rooms), not what
WE announced. So our own GlobalRoomActive signal went out but our
peer_links had empty active_rooms — media was dropped.
Fix: for locally-originated media, send to ALL connected federation
peers unconditionally. The receiving relay decides whether to deliver
to local participants (if it has the room) or forward further. This
is correct because federation peers are explicitly configured — if
they're connected, they should receive global room media.
Multi-hop forwarding (handle_datagram) still filters by active_rooms
to prevent loops — only forwards to peers that announced the room.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
When --config points to a non-existent file, the relay now generates
a personalized example config that includes:
- listen_addr matching the --listen flag (not hardcoded 0.0.0.0:4433)
- Pre-filled [[peers]] section with this relay's detected IP, port,
and TLS fingerprint — ready to copy/paste into other relay configs
This makes setting up federation much easier: start each relay, it
generates its config with its own peering info commented out, you
just uncomment and copy between configs.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Enables running multiple relays on the same machine:
wzp-relay -c ~/.wzp1/config.toml -i ~/.wzp1/relay-identity --listen :4433
wzp-relay -c ~/.wzp2/config.toml -i ~/.wzp2/relay-identity --listen :4434
wzp-relay -c ~/.wzp3/config.toml -i ~/.wzp3/relay-identity --listen :4435
Config auto-creation: if the config file doesn't exist, writes an
example config with all fields documented and commented. The relay
starts with defaults but the file is ready to edit.
Identity auto-generation: if the identity file doesn't exist, generates
a new random seed (OsRng via wzp_crypto::Seed::generate) and saves it.
Subsequent starts load the same identity.
Short flags: -c for --config, -i for --identity.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Added logging to trace federation media flow:
- media_task logs first + every 250th received datagram (count, len)
- handle_datagram multi-hop forward logs errors (was silently dropped)
- forward_to_peers logs when no peer matches
2-relay (A→B): WORKING — full audio received, 300 packets forwarded
3-relay (A→B→C): B receives datagrams from A but only 1 arrives —
remaining packets not received, likely a QUIC read_datagram issue
when handle_datagram holds locks during processing. Needs further
investigation into async lock contention or datagram buffering.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>