263 Commits

Author SHA1 Message Date
Siavash Sameni
d249b32ee5 test+docs: add tests for QualityDirective, ParticipantQuality; update docs
- QualityDirective signal roundtrip tests (with/without reason)
- ParticipantQuality unit tests (initial tier, degradation, weakest-link)
- Updated PROGRESS.md with desktop adaptive quality, relay coordinated
  switching, Oboe state polling entries
- Updated ARCHITECTURE.md SFU fan-out rules with QualityDirective
- Updated PRD-coordinated-codec.md with implementation status
- 312 tests passing across all modified crates

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-12 19:56:46 +04:00
Siavash Sameni
22045bc5e6 feat: adaptive quality in desktop, relay quality directive, Oboe state polling
- Wire AdaptiveQualityController into desktop engine send/recv tasks
  (mirrors Android pattern: AtomicU8 pending_profile, auto-mode check)
- Wire same into Android engine send task (was only in recv before)
- QualityDirective SignalMessage variant for relay-initiated codec switch
- ParticipantQuality tracking in relay RoomManager (per-participant
  AdaptiveQualityController, weakest-link tier computation)
- Relay broadcasts QualityDirective to all participants when room-wide
  tier degrades (coordinated codec switching)
- Oboe stream state polling: poll getState() for up to 2s after
  requestStart() to ensure both streams reach Started before proceeding
  (fixes intermittent silent calls on cold start, Nothing Phone A059)

Tasks: #7, #25, #26, #31, #35

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-12 19:54:04 +04:00
Siavash Sameni
766c9df442 feat(dred): continuous DRED tuning, PMTUD, extended Opus6k window
- DredTuner: maps live network metrics (loss/RTT/jitter) to continuous
  DRED duration every ~500ms instead of discrete tier-locked values.
  Includes jitter-spike detection for pre-emptive Starlink-style boost.
- Opus6k DRED extended from 500ms to 1040ms (max libopus 1.5 supports)
- PMTUD: quinn MtuDiscoveryConfig with upper_bound=1452, 300s interval
- TrunkedForwarder respects discovered MTU (was hard-coded 1200)
- QuinnPathSnapshot exposes quinn internal stats + discovered MTU
- AudioEncoder trait: set_expected_loss() + set_dred_duration() methods
- PathMonitor: sliding-window jitter variance for spike detection
- Integrated into both Android and desktop send tasks in engine.rs
- 14 new tests (10 tuner unit + 4 encoder integration)
- Updated ARCHITECTURE.md, PROGRESS.md, PRD-dred-integration, PRD-mtu

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-12 19:38:37 +04:00
Siavash Sameni
24cc74d93c fix(audio): clear BT SCO communication device on call end
Without clearCommunicationDevice(), the BT headset stays locked in SCO
mode after the call. Media playback (video, music) can't route to BT
A2DP, requiring a device reboot to restore normal audio.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-12 17:40:44 +04:00
Siavash Sameni
300ea66d13 docs: update DESIGN, ARCHITECTURE, PRDs, PROGRESS for BT + network + build changes
Reflects the current reality: setCommunicationDevice API 31+, deferred
MODE_IN_COMMUNICATION, BT-mode Oboe (bt_active flag), per-arch builds,
Hangup call_id fix, and network monitoring integration.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-12 17:39:59 +04:00
Siavash Sameni
114d69e488 fix: use tracing::warn! instead of bare warn! in engine.rs
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-12 17:31:12 +04:00
Siavash Sameni
15c237ceea fix(audio): defer MODE_IN_COMMUNICATION to call start, restore on end
Root cause: MainActivity set MODE_IN_COMMUNICATION at app launch,
hijacking system audio routing immediately — BT A2DP music dropped to
earpiece, and the pre-existing communication mode confused subsequent
setCommunicationDevice calls for BT SCO.

Fix: MainActivity now only sets volumes. MODE_IN_COMMUNICATION is set
via JNI right before Oboe audio_start() in CallEngine, and MODE_NORMAL
is restored after audio_stop() when the call ends.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-12 17:29:59 +04:00
Siavash Sameni
a37c8b30fe fix(native): add missing bt_active field to stall detector config
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-12 17:25:11 +04:00
Siavash Sameni
137fe5f084 fix(bluetooth): BT SCO mode skips 48kHz + VoiceCommunication on capture
Root cause: Oboe capture at 48kHz with InputPreset::VoiceCommunication
cannot open against a BT SCO device (only supports 8/16kHz). The stream
silently falls back to builtin mic, delivering zeros.

Fix: add bt_active flag to WzpOboeConfig. When set, capture skips
setSampleRate and setInputPreset, letting the system route to BT SCO
at its native rate. Oboe's SampleRateConversionQuality::Best resamples
to 48kHz for our ring buffers. Playout uses Usage::Media in BT mode.

New API: wzp_native_audio_start_bt() for BT mode, called from
set_bluetooth_sco(on=true). Normal audio_start() restores the
standard config when switching back to earpiece/speaker.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-12 17:23:19 +04:00
Siavash Sameni
5dfb5b3581 fix(bluetooth): use Shared mode for Oboe + delay restart for BT route
Two fixes for BT audio silence:

1. Switch Oboe streams from Exclusive to Shared sharing mode. Exclusive
   mode bypasses Oboe's internal resampler, so opening a 48kHz stream
   against a BT SCO device (8/16kHz only) fails at the AudioPolicy
   level. Shared mode lets Oboe's resampler bridge the gap.

2. Add 500ms post-SCO delay before Oboe restart. The audio policy needs
   time to apply the bt-sco route after setCommunicationDevice returns.
   Without the delay, Oboe opens against the old device (handset).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-12 17:14:06 +04:00
Siavash Sameni
fd0ccf8e99 fix(bluetooth): enable Oboe sample rate conversion for BT SCO (8/16kHz)
BT SCO devices only support 8kHz or 16kHz but our Oboe streams request
48kHz. Without resampling, AudioPolicyManager rejects the input stream
("getInputProfile could not find profile for... sampling rate 48000").

Fix: add setSampleRateConversionQuality(Best) to both capture and
playout stream builders. Oboe resamples internally so our ring buffers
stay at 48kHz regardless of the hardware sample rate.

Also removes the broken setBluetoothScoOn/isBluetoothScoOn calls from
stop_bluetooth_sco — just call stopBluetoothSco() unconditionally.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-12 17:08:48 +04:00
Siavash Sameni
2d4948a7b3 fix(bluetooth): add missing &[] arg to getAvailableCommunicationDevices JNI call
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-12 17:02:57 +04:00
Siavash Sameni
19703ff66c fix(bluetooth): use setCommunicationDevice API on Android 12+
Root cause: setBluetoothScoOn(true) is silently rejected on Android 12+
for non-system apps ("is greater than FIRST_APPLICATION_UID exiting").
Audio policy routed to handset instead of BT despite SCO link being up.

Fix: use the modern setCommunicationDevice(AudioDeviceInfo) API on
API 31+ which properly routes voice audio to the BT device. Falls back
to deprecated startBluetoothSco() on older APIs.

Also uses getCommunicationDevice() for is_bluetooth_sco_on() and
clearCommunicationDevice() for stop, matching the modern API surface.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-12 17:01:33 +04:00
Siavash Sameni
7e8dc400dc fix(bluetooth): wait for SCO link before Oboe restart + detect A2DP devices
Three fixes for Bluetooth audio not working:

1. is_bluetooth_available() now checks for TYPE_BLUETOOTH_A2DP (8) in
   addition to TYPE_BLUETOOTH_SCO (7) — many headsets only register as
   A2DP until SCO is explicitly started.

2. set_bluetooth_sco(on=true) polls isBluetoothScoOn() for up to 3s
   before restarting Oboe. startBluetoothSco() is async — the SCO link
   takes 500ms-2s to establish. Without waiting, Oboe opens against
   earpiece and audio goes nowhere.

3. Frontend skips redundant set_speakerphone(false) when transitioning
   to BT — start_bluetooth_sco() handles speaker-off internally,
   avoiding a double Oboe restart.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-12 16:46:56 +04:00
Siavash Sameni
a798634b3d fix(signal): add call_id to Hangup — prevents stale hangup killing new calls
Root cause: Hangup had no call_id field. The relay forwarded hangups to
ALL active calls for a user. When user A hung up call 1 and user B
immediately placed call 2, the relay's processing of A's hangup would
also kill call 2 (race window ~1-2s).

Fix: add optional call_id to Hangup (backwards-compatible via serde
skip_serializing_if). When present, the relay only ends the named call.
Old clients send call_id=None and get the legacy broadcast behavior.

Also: clear pending_path_report in Hangup recv handler and
internal_deregister to prevent stale oneshot channels from blocking
subsequent call setups.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-12 16:39:21 +04:00
Siavash Sameni
d89376016a fix(build): sign release APKs with project keystore (wzp-release.jks)
Release builds from cargo-tauri are unsigned. After Gradle produces the
APK, zipalign + apksigner now sign it with the release keystore
(android/keystore/wzp-release.jks). Falls back to debug keystore if
release is missing.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-12 16:21:38 +04:00
Siavash Sameni
678695776e fix(build): correct APK output path — target/ is mounted from cache dir
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-12 16:10:03 +04:00
Siavash Sameni
4c1ad841e1 feat(android): Bluetooth audio routing + network change detection + per-arch APK builds
Bluetooth: wire existing AudioRouteManager SCO support through both app
variants. Replace binary speaker toggle with 3-way route cycling
(Earpiece → Speaker → Bluetooth). Tauri side adds JNI bridge functions
(start/stop/query SCO, device availability) and Oboe stream restart.

Network awareness: integrate Android ConnectivityManager to detect
WiFi/cellular transitions and feed them to AdaptiveQualityController
via lock-free AtomicU8 signaling. Enables proactive quality downgrade
and FEC boost on network handoffs.

Build: add --arch flag to build-tauri-android.sh supporting arm64,
armv7, or all (separate per-arch APKs for smaller tester binaries).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-12 16:07:41 +04:00
Siavash Sameni
29cd23fe39 fix(p2p): connection cleanup — 4 fixes for stale/dead connections
PRD 4: Disable IPv6 direct dial/accept temporarily. IPv6 QUIC
handshakes succeed but connections die immediately on datagram
send ("connection lost"). IPv4 candidates work reliably. IPv6
candidates still gathered but filtered at dial time.

PRD 1: Close losing transport after Phase 6 negotiation. The
non-selected transport now gets an explicit QUIC close frame
instead of silently dropping after 30s idle timeout. Prevents
phantom connections from polluting future accept() calls.

PRD 2: Harden accept loop with max 3 stale retries. Stale
connections are explicitly closed (conn.close) and counted.
After 3 stale connections, the accept loop aborts instead of
spinning until the race timeout.

PRD 3: Resource cleanup — close old IPv6 endpoint before
creating a new one in place_call/answer_call. Add Drop impl
to CallEngine so tasks are signalled to stop on ungraceful
shutdown.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-12 15:11:50 +04:00
Siavash Sameni
4d66d3769d fix(relay): set peer_relay_fp on originating relay when answer arrives
The originating relay (where the caller is) never set peer_relay_fp
because the call was created locally. When the callee's answer
arrived via federation, the cross-relay dispatcher handled it but
didn't mark the call as cross-relay. This meant the caller's
MediaPathReport was delivered via local hub.send_to() to a peer
fingerprint that isn't connected locally — silently dropped.

Fix: in the cross-relay answer dispatcher, call
reg.set_peer_relay_fp(call_id, Some(origin_relay_fp)) so the
originating relay knows to forward MediaPathReport via federation.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-12 14:49:34 +04:00
Siavash Sameni
002df15c5e fix(cli): add .. rest pattern for RegisterPresenceAck error arm
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-12 14:32:57 +04:00
Siavash Sameni
1eb82d77b8 feat(relay+client): relay reports build version in Ack
Add relay_build field to RegisterPresenceAck so the client logs
which relay version it connected to. Shows in the debug log as
register_signal:ack_received {"relay_build":"f843a93"}.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-12 14:27:58 +04:00
Siavash Sameni
f843a934fe fix(relay): forward MediaPathReport across federation
MediaPathReport was only delivered via local signal_hub, so calls
between peers on different relays always hit peer_report_timeout
and fell back to relay — even when direct P2P worked perfectly.

Fix: check peer_relay_fp in call_registry (same pattern as
DirectCallAnswer). If the peer is on a remote relay, wrap in
FederatedSignalForward and send via federation link. Also fix
the cross-relay dispatcher to deliver to BOTH caller and callee
(not just caller), since the report can come from either side.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-12 14:14:30 +04:00
Siavash Sameni
b79073c649 Revert "fix(connect): trust direct path on peer report timeout"
This reverts commit 82b439595c.
2026-04-12 14:10:44 +04:00
Siavash Sameni
82b439595c fix(connect): trust direct path on peer report timeout
When peers are on different relays, MediaPathReport can't be
forwarded — causing a 3s timeout and false relay fallback even
though direct P2P works perfectly.

Fix: on timeout, if local_direct_ok is true AND the direct
transport's connection is still alive (no close_reason), trust
the direct path instead of falling back to relay. The timeout
indicates a relay forwarding issue, not a direct path failure.

Also fix ALT build paste URL (paste.tbs.manko.yoga not amn.gg).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-12 14:07:44 +04:00
Siavash Sameni
1904b19d05 fix(direct): validate A-role accepted connection, skip stale ones
The Acceptor's accept() on the shared signal endpoint can dequeue
a stale QUIC connection from a previous call that the Dialer has
already dropped. This results in "connection lost" errors when
media datagrams are sent — 100% drops on both sides.

Fix: after accepting a connection, check close_reason(). If the
connection is already closed, log a warning and re-accept. Also
verify max_datagram_size() is available before returning.

Additionally: emit transport details (remote addr, max_datagram,
close_reason) in the call_engine_starting debug event so stale
connection issues are visible in the user-facing debug log.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-12 13:50:21 +04:00
Siavash Sameni
40955bd11c debug(media): add connection diagnostics for direct P2P drops
When direct P2P calls show 100% datagram drops, we need to know
WHY send_media() fails. This commit adds:

- Remote address + stable_id logging on A-role accept and D-role
  dial success (dual_path.rs) — tells us which candidate won
- Remote address + max_datagram_size on engine transport init —
  verifies datagrams are negotiated
- last_send_err in send heartbeat — captures the actual error
  from send_datagram() failures
- QuinnTransport::remote_address() helper

Also fixes UI badge: was looking for wrong event name
("dual_path_race_won" → "path_negotiated").

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-12 13:29:58 +04:00
Siavash Sameni
7554959baa fix(ui): show correct P2P Direct / Via Relay badge
The UI looked for event "connect:dual_path_race_won" which doesn't
exist — the actual event is "connect:path_negotiated" with a
use_direct boolean. Badge always showed "Via Relay" even when the
call was direct P2P.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-12 13:22:00 +04:00
Siavash Sameni
0b62d3e22f fix(cli): add missing build_version fields to Offer/Answer
CLI binary was missing the new caller_build_version and
callee_build_version fields, causing E0063 compile errors on
Linux relay/client builds.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-12 13:09:26 +04:00
Siavash Sameni
4cfcd5117f fix(connect): install MediaPathReport oneshot BEFORE race starts
The peer's MediaPathReport can arrive while our dual_path::race is
still running. Previously, the oneshot was created AFTER the race
completed, so the recv loop had nowhere to deliver the report —
it was silently dropped, causing a 3s timeout and false relay
fallback on ~50% of calls.

Fix: create the oneshot and install it in SignalState BEFORE
starting the race. The oneshot::Receiver buffers the value so the
connect command can read it immediately after the race finishes.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-12 13:06:13 +04:00
Siavash Sameni
bd6733b2e5 feat(signal): advertise build version in Offer/Answer
Add caller_build_version / callee_build_version (git short hash)
to DirectCallOffer and DirectCallAnswer so peers can identify each
other's build in debug logs. Also log own build at register time.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-12 12:43:55 +04:00
Siavash Sameni
7d1b8f1fdc fix(android): add missing CallSetup pattern fields (.. rest)
The CallSetup enum gained peer_direct_addr and peer_local_addrs
in Phase 5.5 but the wzp-android signal recv match arm was never
updated, breaking cargo ndk builds.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-12 12:09:44 +04:00
Siavash Sameni
c2d298beb5 feat(net): Phase 7 — dual-socket IPv4+IPv6 ICE
Adds a dedicated IPv6 QUIC endpoint (IPV6_V6ONLY=1 via socket2)
alongside the existing IPv4 signal endpoint for proper dual-stack
P2P connectivity. Previous [::]:0 dual-stack attempt broke IPv4
on Android; this uses separate sockets per address family like
WebRTC/libwebrtc.

- create_ipv6_endpoint(): socket2-based IPv6-only UDP socket,
  tries same port as IPv4 signal EP, falls back to ephemeral
- local_host_candidates(v4_port, v6_port): now gathers IPv6
  global-unicast (2000::/3) and unique-local (fc00::/7) addrs
- dual_path::race(): A-role accepts on both v4+v6 via select!,
  D-role routes each candidate to matching-AF endpoint
- Graceful fallback: if IPv6 unavailable, .ok() → None → pure
  IPv4 behavior identical to pre-Phase-7

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-12 11:54:13 +04:00
Siavash Sameni
aee41a638d fix(audio+net): revert dual-stack [::]:0, add Oboe playout stall auto-restart
Two fixes:

## Revert [::]:0 dual-stack sockets → back to 0.0.0.0:0

Android's IPV6_V6ONLY=1 default on some kernels (confirmed on
Nothing Phone) makes [::]:0 IPv6-only, silently killing ALL
IPv4 traffic. This broke P2P direct calls: IPv4 LAN candidates
(172.16.81.x) couldn't complete QUIC handshakes through the
IPv6-only socket, causing local_direct_ok=false and relay
fallback on every call after the first.

Reverted all bind sites to 0.0.0.0:0 (reliable IPv4). IPv6 host
candidates are disabled in local_host_candidates() until a
proper dual-socket approach (one IPv4 + one IPv6 endpoint,
Phase 7) is implemented.

## Fix A (task #35): Oboe playout callback stall auto-restart

The Nothing Phone's Oboe playout callback fires once (cb#0) and
then stops draining the ring on ~50% of cold-launch calls. Fix
D+C (stop+prime from previous commit) didn't help because
audio_stop is a no-op on cold launch.

New approach: self-healing watchdog in audio_write_playout.
Tracks the playout ring's read_idx across writes. If read_idx
hasn't advanced in 50 consecutive writes (~1 second), the Oboe
playout callback has stopped:

1. Log "playout STALL detected"
2. Call wzp_oboe_stop() to tear down the stuck streams
3. Clear both ring buffers (prevent stale data reads)
4. Call wzp_oboe_start() to rebuild fresh streams
5. Log success/failure
6. Return 0 (caller retries on next frame)

This is the same teardown+rebuild that "rejoin" does — but
triggered automatically from the first stalled call instead of
requiring the user to hang up and redial. The watchdog runs
on every write so it fires within 1s of the stall starting.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-12 11:24:16 +04:00
Siavash Sameni
9fb92967eb fix(net): bind all endpoints to [::]:0 for dual-stack IPv4+IPv6
Every QUIC endpoint was bound to 0.0.0.0:0 (IPv4-only). This
silently killed ALL IPv6 host candidates: the Dialer couldn't
send packets to [2a0d:...] addresses (wrong address family on
the socket), and the Acceptor couldn't receive incoming IPv6
QUIC handshakes. The IPv6 candidates were gathered and advertised
in DirectCallOffer/Answer but were completely non-functional.

On same-LAN with dual-stack (which both test phones have), this
meant:
- JoinSet fanned out 3+ candidates (2× IPv6 + 1× IPv4)
- IPv6 dials failed silently or timed out
- IPv4 dial worked but competed with failed IPv6 for JoinSet
  attention
- Sometimes the JoinSet returned an IPv6 failure before the
  IPv4 success, causing unnecessary fallback to relay

Fix: bind to [::]:0 (IPv6 any) instead of 0.0.0.0:0. On
dual-stack systems (Linux/Android default), [::]:0 creates a
socket that handles BOTH:
- IPv6 natively (global unicast, ULA)
- IPv4 via v4-mapped addresses (::ffff:172.16.81.x)

One socket, both protocols. All 7 bind sites updated:
- register_signal (signal endpoint)
- do_register_signal
- ping_relay
- probe_reflect_addr (fresh endpoint fallback)
- dual_path::race (A-role fresh, D-role fresh, relay fresh)

With this fix, same-LAN P2P should prefer the IPv6 path (no
NAT, direct routing, lower latency) and fall through to IPv4
if IPv6 fails — relay is the last resort after ALL candidates
are exhausted.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-12 11:09:06 +04:00
Siavash Sameni
9f2ff6a6ec fix(android-audio): Fix D+C — stop+prime cycle on every call start
Addresses the first-join no-audio regression (tasks #35-37) where
the Oboe playout callback fires once (cb#0) and then stops
draining the ring on the Nothing Phone, causing written_samples
to freeze at 7679 (ring capacity minus one burst). Second call
(rejoin) always works because audio_stop tears down the streams
and audio_start rebuilds them fresh.

Two combined fixes:

**Fix D (task #37)**: always call audio_stop() before audio_start()
at the top of CallEngine::start. On a cold launch this is a no-op
(streams not yet started). On subsequent calls it guarantees a
clean teardown before rebuild — the same thing rejoin does. Added
a 50ms pause between stop and start to let the Android HAL release
the audio session.

**Fix C (task #36)**: after audio_start(), immediately write 960
samples (20ms) of silence into the playout ring. This ensures the
Oboe playout callback has data to drain on its first invocation.
On devices where an empty-ring first callback causes the stream
to self-pause (Nothing Phone's Qualcomm HAL), the priming data
keeps the callback loop alive until real decoded audio arrives
from the recv task.

Together these cover the two most likely root causes:
1. Stale Oboe state from a previous audio_start that didn't
   clean up properly → Fix D forces a clean rebuild
2. Playout callback self-pausing on an empty ring → Fix C
   ensures the ring is non-empty at callback time

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-12 10:50:58 +04:00
Siavash Sameni
134ee3a77f fix(engine): pass is_direct_p2p explicitly instead of deriving from is_some
Critical Phase 6 bug: when the negotiation agreed on relay path
but delivered the relay transport via pre_connected_transport,
CallEngine saw is_some() = true → is_direct_p2p = true → skipped
perform_handshake. The relay couldn't authenticate the participant
→ room join silently failed → recv_fr: 0, both sides sending
into the void.

Fix: add explicit is_direct_p2p: bool parameter to CallEngine::
start (both android and desktop branches). The connect command
sets it from the Phase 6 negotiation result (use_direct), not
from whether pre_connected_transport is Some.

Now relay-negotiated calls correctly run perform_handshake,
and direct P2P calls correctly skip it.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-12 10:34:21 +04:00
Siavash Sameni
e61397ca85 fix(connect): remove pre-Phase-6 same-IP heuristic
The commit de007ec added a heuristic that forced relay-only when
peers had different public IPs. That was a stopgap for the race
condition where one side picked Direct and the other picked Relay.
Phase 6 (f5542ef) solved this properly via MediaPathReport
negotiation, but the heuristic wasn't cleaned up and was still
running BEFORE the Phase 6 code — suppressing the race entirely
for cross-network calls.

Removed. Phase 6 negotiation now handles ALL cases: both sides
race, exchange reports, and agree on the same path before
committing media. Cross-network calls that can't go P2P will
have both sides report direct_ok=false and agree on relay.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-12 10:23:36 +04:00
Siavash Sameni
f5542ef822 feat(p2p): Phase 6 — ICE-style path negotiation
Before Phase 6, each side's dual-path race ran independently and
committed to whichever transport completed first. When one side
picked Direct and the other picked Relay, they sent media to
different places — TX > 0 RX: 0 on both, completely silent call.

Phase 6 adds a negotiation step: after the local race completes,
each side sends a MediaPathReport { call_id, direct_ok, winner }
to the peer through the relay. Both wait for the other's report
before committing a transport to the CallEngine. The decision
rule is simple: if BOTH report direct_ok = true, use direct; if
EITHER reports false, BOTH use relay.

## Wire protocol

New `SignalMessage::MediaPathReport { call_id, direct_ok,
race_winner }`. The relay forwards it to the call peer via the
same signal_hub routing used for DirectCallOffer/Answer. The
cross-relay dispatcher also forwards it.

## dual_path::race restructured

Returns `RaceResult` instead of `(Arc<QuinnTransport>, WinningPath)`:
- `direct_transport: Option<Arc<QuinnTransport>>`
- `relay_transport: Option<Arc<QuinnTransport>>`
- `local_winner: WinningPath`

Both paths are run as spawned tasks. After the first completes,
a 1s grace period lets the loser also finish. The connect
command gets BOTH transports (when available) and picks the
right one based on the negotiation outcome. The unused transport
is dropped.

## connect command flow (revised)

1. Run race() → RaceResult with both transports
2. Send MediaPathReport to relay with our direct_ok
3. Install oneshot; wait for peer's report (3s timeout)
4. Decision: both direct_ok → use direct; else → use relay
5. Start CallEngine with the agreed transport

If the peer never responds (old build, timeout), falls back to
relay — backward compatible.

## Relay forwarding

MediaPathReport is forwarded like DirectCallOffer/Answer: via
signal_hub.send_to(peer_fp) for same-relay calls, and via
cross-relay dispatcher for federated calls.

## Debug log events

- `connect:dual_path_race_done` — local race result
- `connect:path_report_sent` — our report to the peer
- `connect:peer_report_received` — peer's report
- `connect:peer_report_timeout` — peer didn't respond (3s)
- `connect:path_negotiated` — final agreed path with reasons

Full workspace test: 423 passing (no regressions).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-12 10:03:42 +04:00
Siavash Sameni
de007ec2fd fix(p2p): skip direct P2P when peers are on different public IPs
Race condition: when two phones are on different networks (WiFi
vs LTE, home vs office, etc.), each side's dual-path race runs
independently. One side may pick Direct while the other picks
Relay, causing both to send media to different places — TX > 0,
RX: 0 on both sides, completely silent call.

Root cause: the dual-path race doesn't have a negotiation step.
Each side picks the first transport that completes a QUIC
handshake, which may be a different path than the other side
picked. On same-LAN this doesn't matter because direct always
wins on both (the 500ms relay delay guarantees it). On cross-
network, the asymmetry bites.

Heuristic fix: compare own_reflex_addr IP to peer_reflex_addr
IP. If they're different → different networks → force relay-only
(set role = None, which skips the dual-path race entirely).

Same public IP means same LAN / same NAT:
  → LAN host candidates work, direct always wins on both sides
  → Safe for P2P

Different public IPs means cross-network:
  → Direct may work on one side but not the other
  → Relay is the safe choice for both

This preserves the proven same-LAN P2P and eliminates the broken
cross-network case. The full fix is ICE-style path negotiation
(Phase 6) where both sides exchange connectivity check results
through the signal plane and agree on a winner before committing
media — but that's a 500+ line protocol change.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-12 09:50:56 +04:00
Siavash Sameni
0a973b234b fix(engine): import tauri::Emitter for AppHandle::emit on Android target 2026-04-12 09:29:56 +04:00
Siavash Sameni
026940d492 fix(federation): diagnostic logging for cross-relay media routing
Added warn-level log in handle_datagram when a federation
datagram arrives but no matching local room is found. Prints:
- room_hash (8-byte tag from the datagram)
- active_rooms (all rooms the relay currently has)
- seq + peer label

This diagnoses the cross-relay recv_fr=0 issue: if media IS
arriving from the peer relay but the room hash doesn't match any
active room, the log tells us exactly what hash is expected vs
what rooms exist locally. If no datagram log fires at all, the
issue is upstream (peer relay not forwarding, federation link
down, etc.).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-12 09:27:34 +04:00
Siavash Sameni
0ccf4ed6b5 feat(call): media health watchdog — warn user when no audio arrives
When a P2P direct call establishes successfully but the underlying
network path dies (phone switched from WiFi to LTE mid-call, or
cross-relay media forwarding isn't working), the call stays up
silently with recv_fr frozen at 0. No feedback to the user.

New watchdog in the Android recv task: tracks consecutive
heartbeat ticks (2s each) where recv_fr hasn't advanced. After 3
ticks (6s) with no new packets, emits:

- call-event { kind: "media-degraded" } — user-facing warning
  banner: "No audio — connection may be lost. Try hanging up and
  reconnecting, or switch to a different relay."
- call-debug media:no_recv_timeout for the debug log

If packets resume (recv_fr advances), clears the banner via:
- call-event { kind: "media-recovered" }

JS listener creates/removes a red-tinted banner dynamically at
the top of the call screen. Banner is also cleaned up on
showConnectScreen (call end).

This covers:
- Direct P2P that established on WiFi but died when the phone
  switched to LTE (stale NAT mapping, unreachable peer)
- Cross-relay calls where federation media isn't forwarding
  (relay not upgraded, not federated, etc.)
- Any other "connected but silent" scenario

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-12 09:18:38 +04:00
Siavash Sameni
847699bf66 fix(ui): pre-flight ping + cancel button for register
Two UX issues when the selected relay is unreachable (e.g. user
switched from WiFi to LTE and the LAN relay is gone):

1. Pressing Register blocked the UI for ~30s while the QUIC
   connect timed out against a dead host. No way to abort.
2. No feedback that the relay was unreachable — just a long
   wait followed by a cryptic error.

Fix:

**Pre-flight ping**: before attempting the full register flow,
run `ping_relay` (existing Tauri command, 3s QUIC handshake
timeout). If it fails, immediately show "Server unavailable:
<error>" and re-enable the Register button. No blocking, no
wasted time. If it succeeds, proceed to register_signal.

**Cancel button**: during the register_signal await, the
Register button becomes "Cancel". Tapping it calls `deregister`
which closes the in-flight transport and makes the connect
fail immediately, breaking the await. The button goes back to
"Register on Relay" with a "Registration cancelled" message.

Flow:
  [Register] → "Checking..." (disabled, 3s ping) →
    ping fails → "Server unavailable" (re-enabled)
    ping ok → "Cancel" (enabled, register in flight) →
      user taps Cancel → "Registration cancelled" (re-enabled)
      register succeeds → registered panel shown
      register fails → error shown (re-enabled)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-12 09:13:35 +04:00
Siavash Sameni
6cd61fc63b feat(federation): Phase 4.1 — call-* rooms are implicitly global
All rooms with names starting with 'call-' are now treated as
global rooms by the federation pipeline. This enables relay-
mediated media fallback for cross-relay direct calls: when Alice
on Relay A and Bob on Relay B both join the same call-<id> room,
the federation media forwarding pipeline (GlobalRoomActive
announcements + datagram forwarding + presence replication)
kicks in automatically without any runtime registration step.

Previously, cross-relay direct calls that couldn't go P2P
(symmetric NAT on either side) failed with "no media path"
because the call-<id> room wasn't in the configured global_rooms
set and media datagrams weren't forwarded across the federation
link.

The relay's existing ACL for call-* rooms (only the two
authorized fingerprints from the call registry can join)
prevents random clients from creating or eavesdropping on
call rooms.

## Changes

### `is_global_room` (federation.rs)
Added `room.starts_with("call-")` check before the static
global_rooms set lookup. Returns true immediately for any
call-prefixed room.

### `resolve_global_room` (federation.rs)
Return type changed from `Option<&str>` to `Option<String>`
(owned) because call-* room names aren't stored on `self` —
they come from the caller and resolve to themselves as the
canonical name. The 13 callers continue to work via String/&str
auto-deref; 4 HashMap lookups needed explicit `.as_str()` or
`&` borrows.

Full workspace test: 423 passing (no regressions).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-12 08:55:01 +04:00
Siavash Sameni
50e6a50de4 feat(ui): phone-style layout for direct calls
The call screen now shows two different layouts depending on
whether the call is a 1:1 direct call or a room/group call:

**Direct call (directCallPeer set):**
- Large centered identicon (96px circular with glow)
- Peer name (22px bold) + fingerprint (11px mono)
- Connection badge: "P2P Direct" (green), "Via Relay" (blue),
  or "Connecting..." (yellow) — auto-detected from the
  call-debug buffer's dual_path_race_won event
- Room name header shows the peer's alias/fp instead of "general"
- Group participant list is hidden

**Room/group call (directCallPeer null):**
- Existing group participant list layout — unchanged

The badge updates live from pollStatus by scanning the debug
buffer for the connect:dual_path_race_won event. If the path
was "Direct" → green P2P badge; if "Relay" → blue relay badge.
Before the race resolves, shows yellow "Connecting...".

directCallView is cleared on showConnectScreen (call end).

CSS in style.css: .direct-call-view, .dc-identicon, .dc-name,
.dc-fp, .dc-badge with .relay and .connecting modifiers.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-12 08:47:13 +04:00
Siavash Sameni
0cb8d34b21 fix(ui): show peer identity on direct P2P calls instead of "Waiting for participants"
On relay-mediated calls, the relay broadcasts RoomUpdate with the
participant list and pollStatus renders it. On direct P2P calls
neither peer joins the relay's media room, so RoomUpdate never
fires and the UI showed "Waiting for participants..." even though
audio was flowing bidirectionally.

Fix: track the peer's identity (fingerprint + alias) from the
signal plane in a `directCallPeer` variable:

- Set on incoming call from the DirectCallOffer (caller_fp +
  caller_alias)
- Set on outgoing call from the Call button click (target_fp)
- Cleared on showConnectScreen (call ended)

pollStatus now checks: if the engine's participant list is empty
AND directCallPeer is set, inject a synthetic participant entry
with relay_label = "P2P Direct". The participant row renders with
identicon + fingerprint + alias as normal, but grouped under a
"P2P Direct" header instead of "This Relay".

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-12 08:26:17 +04:00
Siavash Sameni
2427630472 fix(connect): make peerLocalAddrs optional + skip handshake on direct P2P
Two regressions from Phase 5.5/5.6:

1. Room connect broken: the connect Tauri command required
   peerLocalAddrs as a Vec<String>, but the room-join JS path
   doesn't pass it (only the direct-call setup handler does).
   Error: "invalid args 'peerLocalAddrs' for command 'connect':
   command connect missing required key peerLocalAddrs".

   Fix: change to Option<Vec<String>>, unwrap_or_default() at
   usage sites. Room connect works again with zero peer addrs.

2. Direct P2P call connects but then CallEngine fails with
   "expected CallAnswer, got Discriminant(0)". Root cause: after
   the dual-path race picked a direct P2P transport, CallEngine
   still ran perform_handshake() on it. That handshake is a
   relay-specific protocol — sends a CallOffer signal and waits
   for CallAnswer back. On a direct QUIC connection to a phone,
   there's nobody running accept_handshake, so the handshake
   reads garbage from the peer's first media packet and errors.

   Fix: track is_direct_p2p = pre_connected_transport.is_some()
   and skip perform_handshake when true. The direct connection
   is already TLS-encrypted by QUIC, and both peers' identities
   were verified through the signal channel (DirectCallOffer/
   Answer carry identity_pub + ephemeral_pub + signature). Both
   android and desktop branches updated.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-12 08:09:32 +04:00
Siavash Sameni
16793be36f fix(p2p): Phase 5.6 — direct-path head start + hangup propagation + media debug events
Three fixes from a field-test log where same-LAN calls were
still losing the dual-path race to the relay path, peers were
getting stuck on an empty call screen when the other side
hung up, and 1-way audio was hard to diagnose because the
GUI debug log had no media-level events.

## 1. Direct-path 500ms head start (dual_path.rs)

The race was resolving in ~105ms with Relay winning even when
both phones were on the same MikroTik LAN with valid IPv6 host
candidates. Root cause: the relay dial is a plain outbound QUIC
connect that completes in whatever the client→relay RTT is
(~100ms), while the direct path needs the PEER to also process
its CallSetup, spin up its own race, and complete at least one
LAN dial back to us. That cross-client sequence reliably takes
longer than 100ms, so relay always won.

Fix: delay the relay_fut with `tokio::time::sleep(500ms)` before
starting its connect. Same-LAN direct dials complete in 30-50ms
typically, so the head start gives direct plenty of time to win
cleanly. Users on setups where direct genuinely can't work
(LTE-to-LTE cross-carrier) pay 500ms extra on the relay fallback,
which is invisible for a call setup.

## 2. Hangup propagation via a new hangup_call command (lib.rs + main.ts)

The hangup button was calling `disconnect` which stopped the
local media engine but never sent a SignalMessage::Hangup to
the relay. The peer never got notified and was stuck on the
call screen with silent audio. My earlier fix (commit e75b045)
only handled the RECEIVE side — auto-dismiss call screen on
recv:Hangup — but the SEND side was still missing.

New Tauri command `hangup_call`:
  1. Acquire state.signal.lock(), send SignalMessage::Hangup
     over the signal transport (best-effort; log + continue if
     signal is down)
  2. Acquire state.engine.lock(), stop the CallEngine

JS hangupBtn click handler now calls hangup_call with a fallback
to raw disconnect if the command is missing (older builds).

## 3. Media debug events (engine.rs + lib.rs)

Threaded tauri::AppHandle into CallEngine::start so the send/
recv tasks can emit call-debug events when the user has debug
logs enabled. Added on the Android branch (desktop branch
accepts the arg for API symmetry but doesn't emit yet):

  - media:first_send — emitted when the first encoded frame is
    handed to the transport. Useful for 1-way audio diagnosis:
    if this fires on side A but side B never sees media:first_recv,
    A's outbound is broken.
  - media:first_recv — emitted when the first packet from the
    peer arrives. Mirror of first_send.
  - media:send_heartbeat — every 2s with frames_sent, last_rms,
    last_pkt_bytes, short_reads, drops. A stalled last_rms
    (== 0) tells you the mic isn't producing samples; a frozen
    frames_sent tells you the encode pipeline hung.
  - media:recv_heartbeat — every 2s with recv_fr, decoded_frames,
    last_written, written_samples, decode_errs, codec. Mirror
    invariants for the inbound direction.

All four are gated by `call_debug_logs_enabled()` via
`emit_call_debug`, so they only show up in the GUI log when the
user has the Call Flow Debug Logs checkbox on. Tracing::info!
still runs unconditionally so logcat (adb) keeps its copy
regardless.

The `emit_call_debug` fn in lib.rs is now `pub(crate)` so
engine.rs can call it via `crate::emit_call_debug`.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-12 07:55:41 +04:00
Siavash Sameni
fa038df057 feat(p2p): Phase 5.5 — ICE LAN host candidates (IPv4 + IPv6)
Same-LAN P2P was failing because MikroTik masquerade (like most
consumer NATs) doesn't support NAT hairpinning — the advertised
WAN reflex addr is unreachable from a peer on the same LAN as
the advertiser. Phase 5 got us Cone NAT classification and fixed
the measurement artifact, but same-LAN direct dials still had
nowhere to land.

Phase 5.5 adds ICE-style host candidates: each client enumerates
its LAN-local network interface addresses, includes them in the
DirectCallOffer/Answer alongside the reflex addr, and the
dual-path race fans out to ALL peer candidates in parallel.
Same-LAN peers find each other via their RFC1918 IPv4 + ULA /
global-unicast IPv6 addresses without touching the NAT at all.

Dual-stack IPv6 is in scope from the start — on modern ISPs
(including Starlink) the v6 path often works even when v4
hairpinning doesn't, because there's no NAT on the v6 side.

## Changes

### `wzp_client::reflect::local_host_candidates(port)` (new)

Enumerates network interfaces via `if-addrs` and returns
SocketAddrs paired with the caller's port. Filters:

- IPv4: RFC1918 (10/8, 172.16/12, 192.168/16) + CGNAT (100.64/10)
- IPv6: global unicast (2000::/3) + ULA (fc00::/7)
- Skipped: loopback, link-local (169.254, fe80::), public v4
  (already covered by reflex-addr), unspecified

Safe from any thread, one `getifaddrs(3)` syscall.

### Wire protocol (wzp-proto/packet.rs)

Three new `#[serde(default, skip_serializing_if = "Vec::is_empty")]`
fields, backward-compat with pre-5.5 clients/relays by
construction:

- `DirectCallOffer.caller_local_addrs: Vec<String>`
- `DirectCallAnswer.callee_local_addrs: Vec<String>`
- `CallSetup.peer_local_addrs: Vec<String>`

### Call registry (wzp-relay/call_registry.rs)

`DirectCall` gains `caller_local_addrs` + `callee_local_addrs`
Vec<String> fields. New `set_caller_local_addrs` /
`set_callee_local_addrs` setters. Follow the same pattern as
the reflex addr fields.

### Relay cross-wiring (wzp-relay/main.rs)

Both the local-call and cross-relay-federation paths now track
the local_addrs through the registry and inject them into the
CallSetup's peer_local_addrs. Cross-wiring is identical to the
existing peer_direct_addr logic — each party's CallSetup
carries the OTHER party's LAN candidates.

### Client side (desktop/src-tauri/lib.rs)

- `place_call`: gathers local host candidates via
  `local_host_candidates(signal_endpoint.local_addr().port())`
  and includes them in `DirectCallOffer.caller_local_addrs`.
  The port match is critical — it's the Phase 5 shared signal
  socket, so incoming dials to these addrs land on the same
  endpoint that's already listening.
- `answer_call`: same, AcceptTrusted only (privacy mode keeps
  LAN addrs hidden too, for consistency with the reflex addr).
- `connect` Tauri command: new `peer_local_addrs: Vec<String>`
  arg. Builds a `PeerCandidates` bundle and passes it to the
  dual-path race.
- Recv loop's CallSetup handler: destructures + forwards the
  new field to JS via the signal-event payload.

### `dual_path::race` (wzp-client/dual_path.rs)

Signature change: takes `PeerCandidates` (reflex + local Vec)
instead of a single SocketAddr. The D-role branch now fans out
N parallel dials via `tokio::task::JoinSet` — one per candidate
— and the first successful dial wins (losers are aborted
immediately via `set.abort_all()`). Only when ALL candidates
have failed do we return Err; individual candidate failures are
just traced at debug level and the race waits for the others.

LAN host candidates are tried BEFORE the reflex addr in
`PeerCandidates::dial_order()` — they're faster when they work,
and the reflex addr is the fallback for the not-on-same-LAN
case.

### JS side (desktop/main.ts)

`connect` invoke now passes `peerLocalAddrs: data.peer_local_addrs ?? []`
alongside the existing `peerDirectAddr`.

### Tests

All existing test callsites updated for the new Vec<String>
fields (defaults to Vec::new() in tests — they don't exercise
the multi-candidate path). `dual_path.rs` integration tests
wrap the single `dead_peer` / `acceptor_listen_addr` in a
`PeerCandidates { reflexive: Some(_), local: Vec::new() }`.

Full workspace test: 423 passing (same as before 5.5).

## Expected behavior on the reporter's setup

Two phones behind MikroTik, both on the same LAN:

  place_call:host_candidates {"local_addrs": ["192.168.88.21:XXX", "2001:...:YY:XXX"]}
  recv:DirectCallAnswer {"callee_local_addrs": ["192.168.88.22:ZZZ", "2001:...:WW:ZZZ"]}
  recv:CallSetup {"peer_direct_addr":"150.228.49.65:NN",
                  "peer_local_addrs":["192.168.88.22:ZZZ","2001:...:WW:ZZZ"]}
  connect:dual_path_race_start {"peer_reflex":"...","peer_local":[...]}
  dual_path: direct dial succeeded on candidate 0   ← LAN v4 wins
  connect:dual_path_race_won {"path":"Direct"}

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-12 07:34:49 +04:00
Siavash Sameni
8990514417 fix(call): default Accept to AcceptTrusted + add log Copy/Share buttons
## Accept button regression — diagnosed from a user log

Field report: incoming call → callee taps Accept → debug log
shows the dual-path race being skipped with
`connect:dual_path_skipped {"has_own":false,"has_peer":true,
"role":"None"}` and the call falling to relay-only on the
callee side.

Root cause: the Accept button was calling `answer_call` with
`mode: 2` which falls through to `AcceptGeneric` (privacy
mode). By design, privacy mode SKIPS the reflex query on the
callee so the callee's IP stays hidden from the caller — but
the side effect is that `own_reflex_addr` never gets cached in
`SignalState`. When `connect` runs a moment later, it sees
`own_reflex_addr = None`, can't compute the deterministic role
for the dual-path race, and falls back to relay.

For a normal VoIP app where P2P is the desired default, the
right behavior is `AcceptTrusted` — which queries reflect,
advertises the callee's addr in the answer, and enables direct
P2P. Privacy mode can come back as a dedicated second button
if anyone actually needs it.

Changed `acceptCallBtn` click handler from `mode: 2` to
`mode: 1`. The next call from a Phase-5 APK should show
`connect:dual_path_race_start` + `connect:dual_path_race_won
{"path":"Direct"}` on a cone-NAT-to-cone-NAT pair.

## Debug log export — new Copy / Share buttons

Field-testing the GUI debug log required me to keep asking the
user to type out what they saw. Added two new buttons next to
Clear:

- **Copy log** — serialises the rolling buffer as plain text
  (same HH:MM:SS.mmm format the on-screen panel uses) and
  writes to `navigator.clipboard`. Falls back to the old
  selection-based `execCommand("copy")` for WebViews that
  refuse the new API without a permission prompt.

- **Share** — tries the Web Share API (`navigator.share(...)`)
  first. On Android WebView this opens the system share sheet
  so the user can send the text straight to a messaging app.
  Falls back to clipboard copy on WebViews that don't expose
  navigator.share (most desktop ones). Also falls back if the
  user cancels the share sheet.

Flash status line below the buttons shows a 2.5s confirmation
("✓ Copied 47 entries") or an error hint. The log is plain
text so anyone can paste a log fragment into a message and
send it.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-12 07:04:46 +04:00
Siavash Sameni
1618ff6c9d feat(p2p): Phase 5 — single-socket architecture (Nebula-style)
Before Phase 5 WarzonePhone used THREE separate UDP sockets per
client:

  1. Signal endpoint         (register_signal, client-only)
  2. Reflect probe endpoints (one fresh socket per relay probe)
  3. Dual-path race endpoint (fresh per call setup)

This broke two things in production on port-preserving NATs
(MikroTik masquerade, most consumer routers):

  a. Phase 2 NAT detection was WRONG. Each probe used a fresh
     internal port, so MikroTik mapped each one to a different
     external port, and the classifier saw "different port per
     relay" and labeled it SymmetricPort. The real NAT was
     cone-like but measurement via fresh sockets hid that.

  b. Phase 3.5 dual-path P2P race was BROKEN. The reflex addr
     we advertised in DirectCallOffer was observed by the signal
     endpoint's socket. The actual dual-path race listened on a
     DIFFERENT fresh socket, on a different internal (and
     therefore external) port. Peers dialed the advertised addr
     and hit MikroTik's mapping for the signal socket, which
     forwarded to the signal endpoint — a client-only endpoint
     that doesn't accept incoming connections. Direct path
     silently failed, relay always won the race.

Nebula-style fix: one socket for everything. The signal endpoint
is now dual-purpose (client + server_config), and both the
reflect probes and the dual-path race reuse it instead of
creating fresh ones. MikroTik's port-preservation then gives us
a stable external port across all flows → classifier correctly
sees Cone NAT → advertised reflex addr is the actual listening
port → direct dials from peers land on the right socket →
`endpoint.accept()` in the A-role branch of the dual-path race
picks up the incoming connection.

## Changes

### `register_signal` (desktop/src-tauri/src/lib.rs)
- Endpoint now created with `Some(server_config())` instead of
  `None`. The socket can now accept incoming QUIC connections as
  well as dial outbound.
- Every code path that previously read `sig.endpoint` for the
  relay-dial reuse benefits automatically — same socket is now
  ALSO listening for peer dials.

### `probe_reflect_addr` (wzp-client/src/reflect.rs)
- New `existing_endpoint: Option<Endpoint>` arg. `Some` reuses
  the caller's socket (production: pass the signal endpoint).
  `None` creates a fresh one (tests + pre-registration).
- Removed the `drop(endpoint)` at the end — was correct for
  fresh endpoints (explicit early socket close) but incorrect
  for shared ones. End-of-scope drop does the right thing in
  both cases via Arc semantics.

### `detect_nat_type` (wzp-client/src/reflect.rs)
- New `shared_endpoint: Option<Endpoint>` arg, forwarded to
  every probe in the JoinSet fan-out. One shared socket means
  the classifier sees the true NAT type.

### `detect_nat_type` Tauri command (desktop/src-tauri/src/lib.rs)
- Reads `state.signal.endpoint` and passes it as the shared
  endpoint. Falls back to None when not registered. NAT detection
  now produces accurate classifications against MikroTik / most
  consumer NATs.

### `dual_path::race` (wzp-client/src/dual_path.rs)
- New `shared_endpoint: Option<Endpoint>` arg.
- A-role: when `Some`, reuses it for `accept()`. This is the
  critical change — the reflex addr advertised to peers is now
  the address listening for incoming direct dials.
- D-role: when `Some`, reuses it for the outbound direct dial.
  MikroTik keeps the same external port for the dial as for
  the signal flow → direct dial through a cone-mapped NAT.
- Relay path: also reuses the shared endpoint so MikroTik has
  a single consistent mapping across the whole call (saves one
  extra external port and makes firewall traces cleaner).
- When `None`, falls back to fresh per-role endpoints as before.

### `connect` Tauri command (desktop/src-tauri/src/lib.rs)
- Reads `state.signal.endpoint` once when acquiring own reflex
  addr and passes it through to `dual_path::race`.

### Tests
- `wzp-client/tests/dual_path.rs` and
  `wzp-relay/tests/multi_reflect.rs` updated to pass `None` for
  the new endpoint arg — tests use fresh sockets and that's
  fine because the loopback harness doesn't care about
  port-preserving NAT behavior.

Full workspace test: 423 passing (no regressions).

## Expected behavior after this commit on real hardware

Behind MikroTik + Starlink-bypass (the reporter's setup):
- Phase 2 NAT detect → **Cone NAT** (was SymmetricPort — false
  positive from the measurement artifact)
- Phase 3.5 direct-P2P dial → succeeds for both cone-cone and
  cone-CGNAT cases where the remote side was previously blocked
  by our own socket mismatch
- LTE ↔ LTE cross-carrier → still likely relay fallback; that's
  genuinely strict symmetric and needs Phase 5.5 port prediction.

## Phase 5.5 (next, separate PRD)

Multi-candidate port prediction + ICE-style candidate aggregation
for truly strict symmetric NATs. Not needed for the 95% case —
Phase 5 alone fixes most consumer-router setups.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-11 19:47:20 +04:00
Siavash Sameni
05ec926317 fix(ui): don't nuke the registered panel's children on status update
Regression from 20375ec: the `signal-event reconnecting` and
`signal-event registered` handlers were assigning to
`directRegistered.textContent`, which is the PARENT element that
holds the entire registered UI — the "Registered — waiting"
header, incoming-call panel, recent-contacts section, call
history, the fingerprint-input bar, and the Call button. Setting
textContent on that parent wiped every child with a single text
node, so after registration the user saw " Registered" with
NOTHING below it — no call input, no history, no call button.
App unusable post-registration.

Fix:
- Add a dedicated `#registered-status` <p> inside the header of
  `#direct-registered` (this element already existed as a plain
  paragraph without an id; just giving it an id).
- Rewrite both handlers to target that element by id instead of
  the parent, so `textContent =` only touches the status line
  and leaves the rest of the panel intact.
- The `registered` handler now also explicitly
  `registerBtn.classList.add("hidden")` and
  `directRegistered.classList.remove("hidden")` so the first
  register event correctly reveals the UI. Belt-and-braces for
  the transparent-reconnect case too — if the supervisor
  re-registers after a drop, the UI stays in the registered
  state.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-11 19:28:16 +04:00
Siavash Sameni
b7a48bf13b feat(ui): incoming-call ring tone + system notification
Previously: incoming calls silently popped an "Accept/Reject"
panel. Easy to miss — no audible cue, no system-level alert if
the app was backgrounded. Now the incoming-call path triggers
both a synthesized ring tone and a system notification banner.

## Ring tone (desktop/src/main.ts)

New `Ringer` class using Web Audio API directly — no external
asset files, no new npm dep. Synthesizes a classic NANP two-tone
cadence (440Hz + 480Hz sine mix, 2s tone + 4s silence, looped)
through an envelope-gated gain node that ramps on/off to avoid
clicks. Audible on every Tauri-supported platform because
WebView carries Web Audio.

- `start()` — lazily creates AudioContext on first use
  (platforms that require a user gesture for AudioContext
  creation still work because the incoming-call event is
  user-adjacent from the webview's perspective), starts
  setInterval(6000) loop.
- `stop()` — clears the timer AND disconnects any active
  oscillators so there's no tail audio.
- Active-nodes array is swept every cycle so it doesn't grow
  unbounded across long rings.

Hooked into signal-event handlers:
- `"incoming"` → `ringer.start()` + notifyIncomingCall
- `"answered"`, `"setup"`, `"hangup"` → `ringer.stop()`
- Accept/Reject button click handlers → `ringer.stop()` as
  the first thing they do (before any await)

## System notification (desktop/src-tauri + main.ts)

Added `tauri-plugin-notification = "2"` to the Tauri app and
registered in the builder. Capabilities updated with the four
notification permissions.

Frontend calls the plugin commands via the generic `invoke`
instead of adding `@tauri-apps/plugin-notification` as a JS
dep — Tauri plugins expose `plugin:notification|notify` etc.
directly. Flow:

1. `is_permission_granted` — check cached
2. If not granted → `request_permission` (Android prompts the
   user once, cached thereafter)
3. `notify` with title="Incoming call", body="From <alias>"

All wrapped in try/catch with console.debug fallback — plugin
missing or permission denied is non-fatal, the visible panel +
ring tone still alert the user.

## Known gaps (deferred)

- Android native system ringtone (RingtoneManager) + full-
  screen intent for lockscreen-visible ringer. Requires
  platform-specific Java/Kotlin glue in the Tauri Android
  shell — bigger lift.
- Desktop window flash / taskbar attention-seek on incoming
  call when app is backgrounded.
- Vibration pattern on Android.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-11 18:46:13 +04:00
Siavash Sameni
e75b045470 fix(ui): auto-dismiss call screen when peer hangs up
Previously: peer hangs up → Rust emits signal-event {type:hangup}
→ JS clears callStatusText + hides incoming panel, but the call
screen stays on with a dangling Hangup button the user has to
press to acknowledge a call that's already over. Dead UX.

Now: the hangup event handler tears down our side of the media
engine via `invoke("disconnect")` and transitions back to the
connect screen when we're currently in the call screen.
Incoming-call panel still hides as before.

`userDisconnected = true` is set so the existing call-event
"disconnected" auto-reconnect path (which fires on transport
drop) doesn't kick in — the peer-hangup signal is an intentional
end-of-call, not a transport blip worth retrying.

Also documented: "not connected" errors from the `disconnect`
command are silently swallowed because they happen when there's
no engine to tear down (e.g. incoming call that was never
answered — caller bailed), which is the correct outcome there.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-11 18:41:26 +04:00
Siavash Sameni
20375eceb9 feat(signal): transparent reconnect + auto-swap on relay change
Two related UX fixes, same state-machine surface:

1. Relay drops / goes offline / restarts: the client now auto-
   reconnects in the background instead of silently falling to
   "not registered" and requiring the user to tap Deregister +
   Register.
2. User switches relay in settings: client auto-swaps — close
   old transport, register against new, all transparent.

## Signal state additions (desktop/src-tauri/src/lib.rs)

- `SignalState.desired_relay_addr: Option<String>` — what the
  user CURRENTLY wants. `Some(x)` means "keep me connected to x",
  `None` means "user explicitly asked for idle". This is the
  pivot that distinguishes "connection dropped, retry" from
  "user deregistered, stop".
- `SignalState.reconnect_in_progress: bool` — single-flight
  guard so concurrent triggers (recv-loop exit + manual
  register_signal + another recv-loop exit after a brief
  success) don't spawn duplicate supervisors.

## Refactor

The old `register_signal` Tauri command was doing the whole
connect + Register + spawn-recv-loop flow inline. Split into:

- `internal_deregister(signal_state, keep_desired)` — shared
  teardown helper that nulls out transport/endpoint/call state
  and optionally clears `desired_relay_addr`.
- `do_register_signal(signal_state, app, relay)` — core
  connect + register + spawn-recv-loop flow, callable from both
  the Tauri command and the reconnect supervisor. Returns an
  explicit `impl Future<...> + Send` to avoid auto-trait
  inference bailing inside the tokio::spawn chain (rustc loses
  the Send trail through the recv-loop spawn inside the fn
  body).
- `register_signal` Tauri command — now thin: if already
  registered to the same relay, no-op; otherwise
  internal_deregister(keep_desired=false), set
  desired_relay_addr = Some(new), call do_register_signal. The
  Rust side handles the "change of server" transition entirely
  on its own, no deregister+register dance from JS needed.
- `deregister` Tauri command — internal_deregister(keep_desired
  = false) so the recv-loop exit path sees the cleared desired
  addr and does NOT spawn a supervisor.

## Reconnect supervisor

New `signal_reconnect_supervisor(signal_state, app, relay)`
task. Spawned from the recv-loop exit path when the loop exits
unexpectedly AND `desired_relay_addr.is_some()` AND no
supervisor is already running.

- Exponential backoff: 1s, 2s, 4s, 8s, 15s, 30s (capped at 30s,
  never gives up). First attempt is immediate (attempt 0 skips
  the wait).
- On each iteration checks whether `desired_relay_addr` was
  cleared (user deregistered mid-flight) or another path
  already re-registered; either short-circuits the supervisor.
- Also detects if the user changed relays while the supervisor
  was sleeping — resets the backoff counter and retries against
  the new addr.
- On success, exits so the newly-spawned recv loop owns the
  connection from that point. If THAT drops again, a fresh
  supervisor spawns.
- Emits `call-debug-log` and `signal-event` events at every
  state transition so the GUI can display "reconnecting...",
  "registered" banners.

## UI wiring (desktop/src/main.ts)

- signal-event handler gets two new cases:
  - `"reconnecting"` — amber "🔄 reconnecting to <relay>…" in
    the registered banner area
  - `"registered"` — green "✓ registered (<fp prefix>…)" to
    clear the reconnecting badge
- Relay-selection click handler checks if a signal is
  currently registered and, if the user picked a different
  relay, fires `register_signal` with the new address. Rust
  side handles the swap transparently.

Full workspace test: 423 passing (no regressions).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-11 18:40:11 +04:00
Siavash Sameni
00deb97a5d fix(reflect): drop LAN/private reflex addrs from NAT classification
Real-world report: a user with one LAN relay + one internet relay
got "Multiple IPs — treating as symmetric" because the LAN relay
saw the client's LAN IP (172.16.81.172) while the internet relay
saw the WAN IP (150.228.49.65). Two observations of "different
public IPs" from the classifier's perspective, but semantically
they describe two different network paths and shouldn't be
compared.

The LAN relay's reflection is always true, just not useful for
public NAT classification: there's no NAT between the client and
the LAN relay, so that path's reflex addr is always the LAN
interface IP regardless of what the public-facing NAT beyond it
looks like.

Fix: new `is_private_or_loopback` helper filters the probe set
before classification. Drops:
 - 127.0.0.0/8 loopback
 - 10/8, 172.16/12, 192.168/16 RFC1918 private
 - 169.254/16 link-local
 - 100.64/10 CGNAT shared-transition (same reasoning: a relay
   that sees the client with a CGNAT addr is on the same carrier
   network and can't describe public NAT state)
 - IPv6 loopback, unspecified, fe80::/10 link-local

Failed probes still filtered out of classification (they were
already) but now dimmed in the UI list instead of highlighted
amber. Same rationale: a momentarily-offline probe target isn't
a warning-worthy state, it's just a fact about the probe run.

UI palette rebalance: only Cone gets green, everything else
neutral text-dim. Wording changed from warning-tone
"⚠ must use relay" to informational "ℹ P2P falls back to relay,
calls still work" — symmetric NAT isn't broken state, it just
means media takes the relay path.

Tests added (4 new in wzp_client::reflect):
- classify_drops_private_ip_probes — LAN + public → Unknown
- classify_drops_loopback_probes — loopback + 2 public → Cone
- classify_drops_cgnat_probes — CGNAT + 2 public same-IP-
  diff-port → SymmetricPort
- classify_two_lan_probes_is_unknown_not_cone — all LAN → Unknown

Existing multi_reflect integration test updated: two loopback
relays now correctly classify as Unknown (because loopback reflex
addrs are filtered) with the plumbing-works invariant preserved.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-11 18:29:09 +04:00
Siavash Sameni
da08723fe7 fix(signal): forward-compat — log+continue on unknown SignalMessage variants
Both sides of the signal channel previously broke their recv loop
on any deserialize error, which meant adding a new variant in one
build silently killed signal connections from peers running an
older build. This bit us during Phase 1 testing: a new client
sending SignalMessage::Reflect to a pre-Phase-1 relay caused the
relay to drop the whole signal connection, which looked like
"Error: not registered" on the next place_call.

Fix:
- New TransportError::Deserialize(String) variant in wzp-proto
  carries serde errors as a distinct category.
- wzp-transport/reliable.rs::recv_signal returns Deserialize on
  serde_json::from_slice failures (was wrapped in Internal).
- wzp-relay/main.rs signal loop matches on Deserialize → warn +
  continue (instead of break).
- desktop/src-tauri/lib.rs recv loop does the same.

Other TransportError variants (ConnectionLost, Io, Internal) still
break the loop — only pure parse failures are recoverable.

This means future SignalMessage variant additions are backward-
compat by construction: older peers will see "unknown variant,
continuing" in their logs while newer peers can keep evolving the
protocol.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-11 18:13:31 +04:00
Siavash Sameni
8cdf8d486a feat(p2p): Phase 4 cross-relay direct calling over federation
Teaches the relay pair to route direct-call signaling across an
existing federation link. Alice on Relay A can now place a direct
call to Bob on Relay B if A and B are federation peers — the
wire protocol, call registry, and signal dispatch all learn to
track and route the cross-relay flow.

Phase 3.5's dual-path QUIC race then carries the media directly
peer-to-peer using the advertised reflex addrs, with zero
changes needed on the client side.

## Wire protocol (wzp-proto)

New `SignalMessage::FederatedSignalForward { inner, origin_relay_fp }`
envelope variant, appended at end of enum — JSON serde is
name-tagged so pre-Phase-4 relays just log "unknown variant" and
drop it. 2 new roundtrip tests (any-inner nesting + single
DirectCallOffer case).

## Call registry (wzp-relay)

`DirectCall.peer_relay_fp: Option<String>` — federation TLS fp
of the peer relay that forwarded the offer/answer for this call.
`None` on local calls, `Some` on cross-relay. Used by the answer
path to route the reply back through the same federation link
instead of trying (and failing) to deliver via local signal_hub.
New `set_peer_relay_fp` setter + 1 new unit test.

## FederationManager (wzp-relay)

Three new methods:
- `local_tls_fp()` — exposes the relay's own federation TLS fp
  so main.rs can build `origin_relay_fp` fields.
- `broadcast_signal(msg) -> usize` — fan out any signal message
  (in practice `FederatedSignalForward`) to every active peer
  link, returning the reach count. Used when Relay A doesn't
  know which peer has the target fingerprint.
- `send_signal_to_peer(fp, msg)` — targeted send for the reply
  path where the registry already knows which peer relay to
  hit.

Plus a new `cross_relay_signal_tx: Mutex<Option<Sender<...>>>`
field that `set_cross_relay_tx()` wires at startup so the
federation `handle_signal` can push unwrapped inner messages
into the main signal dispatcher.

## Federation handle_signal (wzp-relay)

New match arm for `FederatedSignalForward`:
- Loop prevention: drops forwards whose `origin_relay_fp` equals
  this relay's own fp (prevents A→B→A echo loops without needing
  TTL yet).
- Otherwise pulls the inner message out and pushes it through
  `cross_relay_signal_tx` so the main loop's dispatcher task
  handles it as if it had arrived locally.

## Main signal loop (wzp-relay)

### DirectCallOffer when target not local
Before falling through to Hangup, try the federation path:
- Wrap the offer in `FederatedSignalForward` with
  `origin_relay_fp = this relay's tls_fp`
- `fm.broadcast_signal(forward)` — returns peer count
- If any peers reached, stash the call in local registry with
  `caller_reflexive_addr` set, `peer_relay_fp` still None
  (broadcast — the answer-side will identify itself when it
  replies)
- Send `CallRinging` to caller immediately for UX feedback
- Only if no federation or no peers → legacy Hangup path

### DirectCallAnswer when peer is remote
- Registry lookup now reads both `peer_fingerprint` and
  `peer_relay_fp` in one acquisition
- If `peer_relay_fp.is_some()`:
  * Reject → forward a `Hangup` over federation via
    `send_signal_to_peer` instead of local signal_hub
  * Accept → wrap the raw answer in `FederatedSignalForward`,
    route to the specific origin peer, then emit the LOCAL
    CallSetup to our callee with `peer_direct_addr =
    caller_reflexive_addr` (caller is remote; this side only
    has the callee)
- If `peer_relay_fp.is_none()` → existing Phase 3 same-relay
  path with both CallSetups (caller + callee)

### Cross-relay signal dispatcher task
New long-running task reading `(inner, origin_relay_fp)` from
`cross_relay_rx`. In Phase 4 MVP handles:
- `DirectCallOffer` — if target is local, create the call in
  the registry with `peer_relay_fp = origin_relay_fp`, stash
  caller addr, deliver offer to local callee. If target isn't
  local, drop (no multi-hop in Phase 4 MVP).
- `DirectCallAnswer` — look up local caller by call_id, stash
  callee addr, forward raw answer to local caller via
  signal_hub, emit local CallSetup with `peer_direct_addr =
  callee_reflexive_addr` (peer is local now; this side only
  has the caller).
- `CallRinging` — best-effort forward to local caller for UX.
- `Hangup` — logged for now; Phase 4.1 will target by call_id.

## Integration tests

`crates/wzp-relay/tests/cross_relay_direct_call.rs` — 3 tests
that reproduce the main.rs cross-relay dispatcher logic inline
and assert the invariants without spinning up real binaries:

1. `cross_relay_offer_forwards_and_stashes_peer_relay_fp` —
   Relay A gets Alice's offer, broadcasts. Relay B's dispatcher
   creates the call with `peer_relay_fp = relay_a_tls_fp`.
2. `cross_relay_answer_crosswires_peer_direct_addrs` — full
   round trip; both CallSetups (one on each relay) carry the
   OTHER party's reflex addr.
3. `cross_relay_loop_prevention_drops_self_sourced_forward` —
   explicit loop-prevention check.

Full workspace test goes from 413 → 419 passing. Clippy clean
on touched files.

## Non-goals (deferred to Phase 4.1+)

- Relay-mediated media fallback across federation — if P2P
  direct fails (symmetric NAT on either side), the call errors
  out with "no media path". Making the existing federation
  media pipeline carry ephemeral call-<id> rooms is the Phase
  4.1 lift.
- Multi-hop federation (A → B → C). Phase 4 MVP supports a
  direct federation link between A and B only.
- Fingerprint → peer-relay routing gossip.

PRD: .taskmaster/docs/prd_phase4_cross_relay_p2p.txt
Tasks: 70-78 all completed

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-11 17:31:43 +04:00
Siavash Sameni
59ce52f8e8 feat(p2p): Phase 3.5 dual-path QUIC race + GUI call-flow debug logs
Two features in one commit because they ship and test together:
Phase 3.5 closes the hole-punching loop and the call-flow debug
logs give the user live visibility into every step of a call so
real-hardware testing of the new P2P path is debuggable.

## Phase 3.5 — dual-path QUIC connect race

Completes the hole-punching work Phase 3 scaffolded. On receiving
a CallSetup with peer_direct_addr, the client now actually races a
direct QUIC handshake against the relay dial and uses whichever
completes first. Symmetric role assignment avoids the two-conns-
per-call problem:

- Both peers compare `own_reflex_addr` vs `peer_reflex_addr`
  lexicographically.
- Smaller addr → **Acceptor** (A-role): builds a server-capable
  dual endpoint, awaits an incoming QUIC session. Does NOT dial.
- Larger addr → **Dialer** (D-role): builds a client-only
  endpoint, dials the peer's addr with `call-<id>` SNI. Does NOT
  listen.
- Both sides always dial the relay in parallel as fallback.
- `tokio::select!` with `biased` preference for direct, `tokio::pin!`
  so each branch can await the losing opposite as fallback.
- Direct timeout 2s, relay fallback timeout 5s (so 7s worst case
  from CallSetup to "no media path" error).

New crate module `wzp_client::dual_path::{race, WinningPath}`
(moved here from desktop/src-tauri so it's testable from a
workspace test). `determine_role` in `wzp_client::reflect` is
pure-function and unit-tested.

### CallEngine integration
- New `pre_connected_transport: Option<Arc<QuinnTransport>>` arg
  on both android + desktop `CallEngine::start` branches. Skips
  the internal wzp_transport::connect step when Some. Backward-
  compat: None keeps Phase 0 relay-only behavior.
- `connect` Tauri command reads own_reflex_addr from SignalState,
  computes role, runs the race, passes the winning transport
  into CallEngine. If ANY input is missing (no peer addr, no own
  addr, equal addrs), falls back to classic relay path —
  identical to pre-Phase-3.5 behavior.

### Tests (9 new, all passing)
- 6 unit tests for `determine_role` truth table in
  `wzp-client/src/reflect.rs` (smaller=Acceptor, larger=Dialer,
  port-only diff, equal, missing-side, symmetry)
- 3 integration tests in `crates/wzp-client/tests/dual_path.rs`:
    * `dual_path_direct_wins_on_loopback` — two-endpoint test
      rig, Dialer wins direct path vs loopback mock relay
    * `dual_path_relay_wins_when_direct_is_dead` — dead peer
      port, 2s direct timeout, relay fallback wins
    * `dual_path_errors_cleanly_when_both_paths_dead` — <10s
      error, no hang

## GUI call-flow debug logs

Runtime-toggled structured events at every step of a call so the
user can see where a call progressed or stalled on real hardware.
Modeled on the existing DRED_VERBOSE_LOGS pattern.

### Rust side
- `static CALL_DEBUG_LOGS: AtomicBool` + `emit_call_debug(&app,
  step, details)` helper. Always logs via `tracing::info!`
  (logcat always has a copy); GUI Tauri `call-debug-log` event
  only fires when the flag is on.
- Tauri commands `set_call_debug_logs` / `get_call_debug_logs`.

### Instrumented steps (24 emit_call_debug sites)
- `register_signal`: start, identity loaded, endpoint created,
  connect failed/ok, RegisterPresence sent, ack received/failed,
  recv loop spawning
- Recv loop: CallRinging, DirectCallOffer (w/ caller_reflexive_addr),
  DirectCallAnswer (w/ callee_reflexive_addr), CallSetup (w/
  peer_direct_addr), Hangup
- `place_call`: start, reflect query start/ok/none, offer sent,
  send failed
- `answer_call`: start, reflect query start/ok/none or privacy
  skip, answer sent, send failed
- `connect`: start, dual_path_race_start (w/ role), won (w/
  path), failed, skipped (w/ reasons), call_engine_starting/
  started/failed

### JS side
- New `callDebugLogs: boolean` field on Settings type.
- Boot-time hydrate of the Rust flag from localStorage so the
  choice survives restarts (like `dredDebugLogs`).
- Settings panel: new "Call flow debug logs" checkbox alongside
  the DRED toggle.
- New "Call Debug Log" section that ONLY shows when the flag is
  on. Rolling in-memory buffer of the last 200 events, rendered
  as monospace `HH:MM:SS.mmm step {details}` lines with auto-
  scroll and a Clear button.
- `listen("call-debug-log", ...)` subscribed at app startup,
  appends to the buffer, re-renders on every event.

Full workspace test goes from 404 → 413 passing. Clippy clean
on touched crates.

PRD: .taskmaster/docs/prd_phase35_dual_path_race.txt
Tasks: 61-69 all completed

Next: APK + desktop build carrying everything — Phase 2 NAT
detect, Phase 3 advertising, Phase 3.5 dual-path + call debug
logs, plus the earlier Android first-join diagnostics — so the
user can validate the P2P path on real hardware with live
per-step visibility into where any failures happen.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-11 14:06:44 +04:00
Siavash Sameni
39277bf3a0 feat(hole-punching): advertise peer reflexive addrs in DirectCall flow — Phase 3
Completes the signal-plane plumbing for P2P direct calling: both
peers now learn their own server-reflexive address (Phase 1
Reflect), include it in DirectCallOffer / DirectCallAnswer, and
the relay cross-wires them into each side's CallSetup so the
client knows the OTHER party's direct addr. Dual-path QUIC race
is scaffolded but deferred to Phase 3.5 — this commit ships the
full advertising layer so real-hardware testing can confirm the
addrs flow end-to-end before adding the concurrent-connect logic.

Wire protocol (wzp-proto/src/packet.rs):
- DirectCallOffer gains optional `caller_reflexive_addr`
- DirectCallAnswer gains optional `callee_reflexive_addr`
- CallSetup gains optional `peer_direct_addr`
- All #[serde(default, skip_serializing_if = "Option::is_none")] so
  pre-Phase-3 peers and relays stay backward compatible by
  construction — the new fields are elided from the JSON on the
  wire when None, and older clients parse the JSON ignoring any
  fields they don't know.
- 2 new roundtrip tests (Some + None cases, old-JSON parse-back).

Call registry (wzp-relay/src/call_registry.rs):
- DirectCall gains caller_reflexive_addr + callee_reflexive_addr.
- set_caller_reflexive_addr / set_callee_reflexive_addr setters.
- 2 new unit tests: stores and returns addrs, clearing works.

Relay cross-wiring (wzp-relay/src/main.rs):
- On DirectCallOffer: stash the caller's addr in the registry.
- On DirectCallAnswer: stash the callee's addr (only set by
  AcceptTrusted answers — privacy-mode leaves it None).
- Send two different CallSetup messages: one to the caller with
  peer_direct_addr=callee_addr, and one to the callee with
  peer_direct_addr=caller_addr. The cross-wiring means each side
  gets the OTHER party's direct addr, not its own.
- Logs `p2p_viable=true` when both sides advertised.

Client advertising (desktop/src-tauri/src/lib.rs):
- New `try_reflect_own_addr` helper that reuses the Phase 1
  oneshot pattern WITHOUT holding state.signal.lock() across the
  await (critical: the recv loop reacquires the same mutex to
  fire the oneshot, so holding it would deadlock).
- `place_call` queries reflect first and includes the returned
  addr in DirectCallOffer. Falls back to None on any failure —
  call still proceeds via the relay path.
- `answer_call` queries reflect ONLY on AcceptTrusted so
  AcceptGeneric keeps the callee's IP private by design. Reject
  and AcceptGeneric both pass None.
- recv loop's CallSetup handler destructures and forwards
  peer_direct_addr to the JS layer in the signal-event payload.

Client scaffolding for dual-path (desktop/src-tauri/src/lib.rs +
desktop/src/main.ts):
- `connect` Tauri command gets a new optional `peer_direct_addr`
  argument. Currently LOGS the addr but still uses the relay
  path for the media connection — Phase 3.5 will swap in a
  tokio::select! race between direct dial + relay dial. Scaffolding
  lands here so the JS wire is stable, real-hardware testing can
  confirm advertising works end-to-end, and Phase 3.5 is a pure
  Rust change with no JS touches.
- JS setup handler forwards `data.peer_direct_addr` to invoke.

Back-compat with the CLI client (crates/wzp-client/src/cli.rs):
- CLI test harness updated for the new fields — always passes
  None for both reflex addrs (no hole-punching). Also destructures
  peer_direct_addr: _ in its CallSetup handler.

Tests (8 new, all passing):
- wzp-proto: hole_punching_optional_fields_roundtrip,
  hole_punching_backward_compat_old_json_parses
- wzp-relay call_registry: call_registry_stores_reflexive_addrs,
  call_registry_clearing_reflex_addr_works
- wzp-relay integration: crates/wzp-relay/tests/hole_punching.rs
    * both_peers_advertise_reflex_addrs_cross_wire_in_setup
    * privacy_mode_answer_omits_callee_addr_from_setup
    * pre_phase3_caller_leaves_both_setups_relay_only
    * neither_peer_advertises_both_setups_are_relay_only

Full workspace test goes from 396 → 404 passing.

PRD: .taskmaster/docs/prd_hole_punching.txt
Tasks: 53-60 all completed (58 = scaffolding-only; 3.5 follow-up)

Next up: **Phase 3.5 — dual-path QUIC connect race**. With the
advertising layer live, this becomes a focused change: on
CallSetup-with-peer_direct_addr, start a server-capable dual
endpoint, and tokio::select! across (direct dial, relay dial,
inbound accept). Whichever QUIC handshake completes first wins,
the losers drop, 2s direct timeout falls back to relay.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-11 13:37:04 +04:00
Siavash Sameni
8d903f16c6 feat(reflect): multi-relay NAT type detection — Phase 2
Builds on Phase 1's SignalMessage::Reflect to probe N relays in
parallel through transient QUIC connections and classify the
client's NAT type for the future P2P hole-punching path. No wire
protocol changes — Phase 1's Reflect/ReflectResponse pair is
reused unchanged.

New client-side module (crates/wzp-client/src/reflect.rs):
- probe_reflect_addr(relay, timeout_ms): opens a throwaway
  quinn::Endpoint (fresh ephemeral source port per probe,
  essential for NAT-type detection — sharing one endpoint would
  make a symmetric NAT look like a cone NAT), connects to _signal,
  sends RegisterPresence with zero identity, consumes the Ack,
  sends Reflect, awaits ReflectResponse, cleanly closes.
- detect_nat_type(relays, timeout_ms): parallel probes via
  tokio::task::JoinSet (bounded by slowest probe not sum) and
  returns a NatDetection with per-probe results + aggregate
  classification.
- classify_nat(probes): pure-function classifier split out for
  network-free unit tests. Rules:
    * 0-1 successful probes              → Unknown
    * 2+ successes, same ip same port    → Cone (P2P viable)
    * 2+ successes, same ip diff ports   → SymmetricPort (relay)
    * 2+ successes, different ips        → Multiple (treat as
                                             symmetric)

Tauri command (desktop/src-tauri/src/lib.rs):
- detect_nat_type({ relays: [{ name, address }] }) -> NatDetection
  as JSON. Takes the relay list from JS because localStorage
  owns the config. Parse-up-front so a malformed entry fails
  clean instead of as a probe error. 1500ms per-probe timeout.

UI (desktop/index.html + src/main.ts):
- New "NAT type" row + "Detect NAT" button in the Network
  settings section. Renders per-probe status (name, address,
  observed addr, latency, or error) plus the colored verdict:
    * green  Cone — shows consensus addr
    * amber  SymmetricPort / Multiple — must relay
    * gray   Unknown — not enough data

Tests:
- 7 unit tests in wzp-client/src/reflect.rs covering every
  classifier branch (empty, 1 success, 2 identical, 2 diff ports,
  2 diff ips, success+failure mix, pure-failure).
- 3 integration tests in crates/wzp-relay/tests/multi_reflect.rs:
    * probe_reflect_addr_happy_path — single mock relay end-to-end
    * detect_nat_type_two_loopback_relays_is_cone — two concurrent
      relays, asserts both see 127.0.0.1 and classifier returns
      Cone or SymmetricPort (accepted because the test harness
      uses fresh ephemeral ports per probe which look like
      SymmetricPort on single-host loopback)
    * detect_nat_type_dead_relay_is_unknown — alive + dead port
      mix, asserts the dead probe surfaces an error string and
      the aggregator returns Unknown (only 1 success)

Full workspace test goes from 386 → 396 passing.

PRD: .taskmaster/docs/prd_multi_relay_reflect.txt
Tasks: 47-52 all completed

Next up: hole-punching (Phase 3) — use the reflected address in
DirectCallOffer/Answer and CallSetup so peers attempt a direct
QUIC handshake to each other, with relay fallback on timeout.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-11 12:47:12 +04:00
Siavash Sameni
921856eba9 feat(reflect): QUIC-native NAT reflection ("STUN for QUIC") — Phase 1
Lets a client ask its registered relay "what IP:port do you see for
me?" over the existing TLS-authenticated signal channel, returning
the client's server-reflexive address as a SocketAddr. Replaces the
need for a classic STUN deployment and becomes the bootstrap step
for future P2P hole-punching: once both peers know their own reflex
addrs, they can advertise them in DirectCallOffer and attempt a
direct QUIC handshake to each other.

Wire protocol (wzp-proto):
- SignalMessage::Reflect — unit variant, client -> relay
- SignalMessage::ReflectResponse { observed_addr: String } — relay -> client
- JSON-serde, appended at end of enum: zero ordinal concerns,
  backward compat with pre-Phase-1 relays by construction (older
  relays log "unexpected message" and drop; newer clients time out
  cleanly within 1s).

Relay handler (wzp-relay/src/main.rs, signal loop):
- New match arm next to Ping reuses the already-bound `addr` from
  connection.remote_address() and replies with observed_addr as a
  string. debug!-level log on success, warn!-level on send failure.

Client side (desktop/src-tauri/src/lib.rs):
- SignalState gains pending_reflect: Option<oneshot::Sender<SocketAddr>>.
- get_reflected_address Tauri command installs the oneshot before
  sending Reflect and awaits it with a 1s timeout; cleans up on
  every exit path (send failure, timeout, parse error).
- recv loop's new ReflectResponse arm fires the pending sender or
  emits a debug log for unsolicited responses — never crashes the
  loop on malformed input.
- Integrated into invoke_handler! alongside the other signal
  commands.

UI (desktop/index.html + src/main.ts):
- New "Network" section in settings panel with a "Detect" button
  that displays the reflected address or a categorized warning
  ("register first" / "relay does not support reflection" / error).

Tests (crates/wzp-relay/tests/reflect.rs — 3 new, all passing):
- reflect_happy_path: client on loopback gets back 127.0.0.1:<its own port>
- reflect_two_clients_distinct_ports: two concurrent clients see
  their own distinct ports, proving per-connection remote_address
- reflect_old_relay_times_out: mock relay that ignores Reflect —
  client times out between 1000-1200ms and does not hang

Also pre-existing test bit-rot unrelated to this PR — fixed so the
full workspace `cargo test` goes green:
- handshake_integration tests in wzp-client, wzp-relay and
  featherchat_compat in wzp-crypto all missed the `alias` field
  addition to CallOffer and the 3-arg form of perform_handshake
  plus 4-tuple return of accept_handshake. Updated to the current
  API surface.

Results:
  cargo test --workspace --exclude wzp-android: 386 passed
  cargo check --workspace: clean
  cargo clippy: no new warnings in touched files

Verification excludes wzp-android because it's dead code on this
branch (Tauri mobile uses wzp-native instead) and can't link -llog
on macOS host — unchanged status quo.

PRD: .taskmaster/docs/prd_reflect_over_quic.txt
Tasks: 39-46 all completed

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-11 12:29:07 +04:00
Siavash Sameni
7e7968b2f9 diag(android-engine): first-join no-audio ordering instrumentation
Adds a single call_t0 = Instant::now() at the top of the Android
CallEngine::start path, threaded through send + recv tasks as
send_t0 / recv_t0, and tags the following milestones with
t_ms_since_call_start so we can build a clean side-by-side log of
first-call vs rejoin:

  1. QUIC connection established
  2. handshake complete
  3. wzp-native audio_start returned (+ how long audio_start itself took)
  4. send task spawned
  5. send: first full capture frame read (+ short_reads_before count)
  6. send: first non-zero capture RMS
  7. recv task spawned
  8. recv: first media packet received
  9. recv: first successful decode
 10. recv: first playout-ring write

Combined with the existing C++-side cb#0 logs in
crates/wzp-native/cpp/oboe_bridge.cpp ("capture cb#0", "playout
cb#0") this gives us full-pipeline ordering with no native-side
changes needed.

PRD: .taskmaster/docs/prd_android_first_join_no_audio.txt
Task: 32 (first task in the chain — diagnostics before any fix
attempts so we know which of the 5 suspect causes is real).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-11 10:00:20 +04:00
Siavash Sameni
578ff8cff4 feat(debug): GUI toggle for DRED verbose logs + macOS mic permission
DRED verbose logs (off by default — keeps logcat clean in normal use):
- wzp-codec: DRED_VERBOSE_LOGS atomic flag with dred_verbose_logs() /
  set_dred_verbose_logs() helpers
- opus_enc: gate "DRED enabled" + libopus version logs behind the flag
- desktop/src-tauri/engine.rs: gate DredRecvState parse log,
  reconstruction log, classical PLC log, and DRED-counter fields in
  the Android recv heartbeat (non-verbose path still logs basic recv
  stats)
- Tauri commands set_dred_verbose_logs / get_dred_verbose_logs
- Settings panel gets a "DRED debug logs (verbose, dev only)"
  checkbox; preference persists in wzp-settings localStorage and is
  pushed to Rust on save and on app boot

macOS mic permission:
- Add desktop/src-tauri/Info.plist with NSMicrophoneUsageDescription.
  Without it, modern macOS silently denies CoreAudio capture for
  ad-hoc-signed Tauri builds — capture starts but every callback
  hands you zeros. Symptom: phones could not hear desktop client,
  desktop could still hear phones (playout has no TCC gate). The
  Tauri 2 bundler auto-merges this file into WarzonePhone.app's
  Contents/Info.plist on the next build, so first launch will pop
  the standard mic prompt.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-11 09:48:32 +04:00
Siavash Sameni
16890576fb feat(observability): logcat-visible DRED proof of life on Android
Adds enough INFO-level logging that an opus-DRED-v2 APK on Android can
be verified end-to-end by reading logcat alone — no debugger, no
Prometheus, no telemetry pipeline required. Three observation points:

1. Encoder construction (opus_enc.rs)
   - Bumped the "DRED enabled" log from debug! to info! so the
     per-call DRED config is in logcat by default. Each call's first
     OpusEncoder construction logs codec, dred_frames, dred_ms,
     loss_floor_pct.
   - Added a one-shot static OnceLock that logs `opusic_c::version()`
     the first time an OpusEncoder is built in the process. This is
     the smoking gun for "is the new libopus actually loaded" — pre-
     Phase-0 audiopus shipped libopus 1.3 with no DRED, post-Phase-0
     should print 1.5.2 here.

2. DRED state ingest (DredRecvState::ingest_opus in
   desktop/src-tauri/src/engine.rs)
   - First successful parse on a call logs immediately so we can see
     "DRED is on the wire" in logcat.
   - Subsequent parses sample every 100th to confirm steady-state
     samples_available without drowning the log.
   - New parses_total / parses_with_data counters track the parse
     rate vs the success rate (a packet without DRED in it returns
     `available == 0`, so a low ratio means the encoder isn't
     emitting DRED bytes).

3. DRED reconstruction events (DredRecvState::fill_gap_to)
   - Every DRED reconstruction logs at INFO with missing_seq,
     anchor_seq, offset_samples, offset_ms, samples_available,
     gap_size, and the running total. These events are rare on a
     clean network and we want to know exactly which gap was filled.
   - First three classical PLC fills + every 50th thereafter log so
     we can see when DRED couldn't cover a gap (offset out of range,
     no good state, or reconstruct error).

4. Recv heartbeat (Android start() in engine.rs)
   - Existing 2-second heartbeat now includes dred_recv,
     classical_plc, dred_parses_with_data, dred_parses_total
     so a steady-state call shows the cumulative counters in
     logcat without parsing.

How to verify on a real call:

  adb logcat -s 'RustStdoutStderr:*' | grep -i 'dred\|libopus version'

Expected output sequence on a successful Opus call:
  - "linked libopus version libopus_version=libopus 1.5.2-..."  (once per process)
  - "opus encoder: DRED enabled codec=Opus24k dred_frames=20 dred_ms=200 loss_floor_pct=15"  (per call)
  - "DRED state parsed from Opus packet seq=N samples_available=4560 ms=95 ..."  (after first DRED-bearing packet)
  - "recv heartbeat (android) ... dred_recv=0 classical_plc=0 dred_parses_with_data=58 dred_parses_total=58"  (every 2s)

If you see "linked libopus version libopus 1.3" — the FFI swap didn't
take. If dred_parses_with_data stays at 0 while dred_parses_total
climbs — the sender isn't emitting DRED (check the encoder's loss
floor and the receiver's libopus version). If gaps trigger
"classical PLC fill" instead of "DRED reconstruction fired" —
DRED state coverage is too small for the observed loss pattern,
and the loss floor or DRED duration policy needs tuning.

Verification:
- cargo check -p wzp-codec -p wzp-client: 0 errors
- cargo check -p wzp-desktop: 0 Rust errors (only the pre-existing
  tauri::generate_context!() proc macro panic on missing ../dist
  which fires at host check time, irrelevant on the remote build)
- cargo test -p wzp-codec --lib: 69 passing (no regressions)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-11 08:58:03 +04:00
Siavash Sameni
daf7bcd9ba chore(warnings): sweep the workspace — zero warnings on lib + bin targets
Addressed every rustc warning surfaced by \`cargo check --workspace
--release --lib --bins\` on opus-DRED-v2. Split across three
categories:

## Real bugs surfaced by the audit (fix, don't silence)

- **crates/wzp-relay/src/federation.rs** — the per-peer RTT monitor
  task computed \`rtt_ms\` every 5 s and threw it on the floor. The
  \`wzp_federation_peer_rtt_ms\` gauge has been registered in
  metrics.rs the whole time but was never receiving samples, leaving
  the Grafana panel blank. Wired it up: the task now calls
  \`fm_rtt.metrics.federation_peer_rtt_ms.with_label_values(&[&label_rtt]).set(rtt_ms)\`
  on every sample. Fixes three warnings (\`rtt_ms\`, \`fm_rtt\`,
  \`label_rtt\` were all captured for this task and all dead).

## Dead code removal

- **crates/wzp-relay/src/federation.rs** — removed \`local_delivery_seq:
  AtomicU16\` field and its initializer. It was described in comments
  as "per-room seq counter for federation media delivered to local
  clients" but was declared, initialized to 0, and never read or
  written anywhere else. Genuine half-wired feature; deletable with
  zero behavior change.
- **crates/wzp-relay/src/room.rs** — removed \`let recv_start =
  Instant::now()\` at the top of a recv loop that was never read.
  Separate variable \`last_recv_instant\` already measures the actual
  gap that's used for the \`max_recv_gap_ms\` stat.
- **crates/wzp-client/src/cli.rs** — removed \`let my_fp = fp.clone()\`
  from the signal loop setup. Cloned but never used in any match arm.

## Stub-intent warnings (underscore + explanatory comment)

- **crates/wzp-relay/src/handshake.rs** — \`choose_profile\` hardcodes
  \`QualityProfile::GOOD\` and ignores its \`supported\` parameter.
  Comment already documented "Cap at GOOD (24k) for now — studio
  tiers not yet tested for federation reliability". Renamed to
  \`_supported\`, expanded the comment to explicitly note the future
  plan (pick highest supported ≤ relay ceiling).
- **crates/wzp-relay/src/federation.rs** — \`forward_to_peers\` takes
  \`room_name: &str\` but only uses \`room_hash\`. The caller
  (handle_datagram) passes the name for caller-site symmetry with
  other helpers; kept the param shape and underscored the binding
  with a comment noting it's reserved for future per-name logging.

## Cosmetic fixes

- **crates/wzp-relay/src/event_log.rs** — dropped \`use std::sync::Arc\`
  (unused).
- **crates/wzp-relay/src/signal_hub.rs** — trimmed \`use tracing::{info,
  warn}\` to \`use tracing::info\`. Also removed unnecessary \`mut\` on
  \`hub\` binding in the \`register_unregister\` test.
- **crates/wzp-relay/src/room.rs** — trimmed \`use tracing::{debug,
  error, info, trace, warn}\` to \`{error, info, warn}\`. Also removed
  unnecessary \`mut\` on \`mgr\` binding in the \`room_join_leave\` test.
- **crates/wzp-relay/src/main.rs** — removed unnecessary \`mut\` on the
  \`config\` destructured binding from \`parse_args()\`; and dropped
  \`ref caller_alias\` from the \`DirectCallOffer\` match pattern since
  the relay just forwards the full \`msg\` (caller_alias is preserved
  end-to-end, we don't need to read it on the relay).
- **crates/wzp-crypto/tests/featherchat_compat.rs** — dropped
  \`CallSignalType\` from a \`use wzp_client::featherchat::{...}\`
  (unused in the test body). Note: this test file has pre-existing
  compile errors from SignalMessage schema drift unrelated to this
  sweep; that's tracked separately.

## Crate-level annotation

- **crates/wzp-android/src/lib.rs** — added
  \`#![allow(dead_code, unused_imports, unused_variables, unused_mut)]\`
  with a doc block explaining the crate is dead code since the Tauri
  mobile rewrite. The legacy Kotlin+JNI Android app that consumed
  this crate was replaced by desktop/src-tauri (live Android recv
  path) + crates/wzp-native (Oboe bridge). Rather than piecemeal
  cleanup of a crate that shouldn't be maintained, the whole-crate
  allow keeps CI clean until someone removes the crate entirely. Kills
  all 6 wzp-android warnings (4 unused imports/vars, 1 unused \`mut\`
  on a JNI env param, 1 dead \`command_rx\` field) in one line.

## Not touched

- **deps/featherchat/warzone/crates/warzone-protocol/src/x3dh.rs** —
  3 unused-variable warnings in \`alice_spk_secret\`, \`alice_bundle\`,
  \`bob_bundle_bytes\`. This is a vendored third-party submodule;
  upstream's problem, not ours. Would need to be reported to
  featherchat upstream if we care.

## Verification

- \`cargo check --workspace --release --lib --bins\` → 0 warnings, 0 errors
- \`cargo check --workspace --release --all-targets\` → only the 3
  featherchat submodule warnings remain, plus the pre-existing 3
  broken integration tests (SignalMessage schema drift from Phase 2,
  tracked separately and explicitly out of scope).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-11 08:28:26 +04:00
Siavash Sameni
df1a45a5f5 fix(cli): port live mode to ring API (read_frame/write_frame removed)
AudioCapture and AudioPlayback no longer expose the old read_frame()
and write_frame() methods — they were replaced with ring() returning
&Arc<AudioRing> when the lock-free SPSC ring was introduced. The CLI
live-mode loop still referenced the removed methods, which broke every
workspace build that touched wzp-client bin (including the remote
Linux x86_64 docker build).

- Send loop: allocate a 960-sample scratch buffer, fill it in a loop
  via capture.ring().read() until a full 20 ms frame is available,
  sleep 2 ms between empty reads to avoid hot-spinning.
- Recv loop: write decoded PCM into playback.ring() instead of
  calling write_frame(). Short writes on full ring drop the tail,
  which is the correct real-time behavior for CLI live mode.

No behavioral change on the wire or in the call pipeline — this is
purely a compile fix for cli.rs bitrot that accumulated since the
ring API landed.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-11 08:08:14 +04:00
Siavash Sameni
dd0c714caa Revert "fix(deps): restore Cargo.lock from 8ceb6f4 — minimize dep drift from Phase 0"
This reverts commit 575a39d07a.
2026-04-11 08:06:04 +04:00
Siavash Sameni
a7b2f850f1 build(script): parametrize branch via WZP_BRANCH (default opus-DRED-v2)
The Linux build script was hardcoded to feat/android-voip-client, which
is an older branch that doesn't have the current DRED work or the relay
fixes from 8c4d640. Default the branch to opus-DRED-v2 (current active
development branch), thread it through to the remote script as a third
positional arg, and allow override via `WZP_BRANCH=<name> ./build-linux-docker.sh`.

This is also what let us discover that the relay at 172.16.81.175:4433
was running d0c1731 (android-rewrite) and missing the 8c4d640
CallSetup/advertised-IP fix — direct calls failed until the relay was
rebuilt locally.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-11 08:05:56 +04:00
Siavash Sameni
575a39d07a fix(deps): restore Cargo.lock from 8ceb6f4 — minimize dep drift from Phase 0
Phase 0 cherry-pick regenerated the lockfile from scratch via
`cargo generate-lockfile`, which bumped at least tokio (1.50.0 → 1.51.1)
and downgraded the lockfile format from version 4 → version 3. Many
other transitive deps may have shifted silently.

Symptoms that pointed here:
1. Direct-call media QUIC handshake silently stalls for exactly the
   client-side 10s timeout, with no errors in the log. Classic tokio
   runtime / async waker mismatch — tasks queued from one runtime
   never run because the endpoint's I/O driver is on another runtime.
2. Every `place_call` gets an immediate `signal: Hangup reason=Normal`
   back from the signal recv loop, as if it's consuming stale state.
3. Eventually hits `FORTIFY: pthread_mutex_lock called on a destroyed
   mutex` and the process dies.

All three are consistent with a tokio async primitive being shared
across runtimes in a way that tokio 1.51.1 handles differently than
1.50.0 (which was the version on the user's known-good build). Rather
than chase the specific bisection, restore the exact base lockfile
and let cargo add only the three deps Phase 0 actually needs
(opusic-c, opusic-sys, bytemuck).

Verification:
- `git diff 8ceb6f4..HEAD -- Cargo.lock | grep -c '^[+-]version = '` → 0
  (no version-line changes beyond what Cargo auto-pulls for new crates)
- tokio back to 1.50.0
- rustls, quinn, quinn-proto, quinn-udp all unchanged
- Lockfile version restored to 4
- cargo test -p wzp-codec --lib: 69 passing (unchanged)
- cargo test -p wzp-client --lib: 35 passing + 1 ignored (unchanged)

Does not fix the pre-existing relay-side advertised-IP bug
(CallSetup may still contain a relay address that the callee cannot
reach from its network), but that is an orthogonal issue that existed
on 8ceb6f4 too.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-10 22:13:35 +04:00
Siavash Sameni
d63d50cdc0 fix(build): remove apostrophe from libc++_shared comment (broke docker bash -c quoting)
Previous commit d269600 added the libc++_shared.so copy step but the
comment block included "Android's dynamic linker" — the apostrophe
closed the enclosing `bash -c '...'` single-quoted string prematurely.
Everything after "Android" was interpreted as wrapper-script bash
instead of docker-container bash, so JNI_ABI_DIR (set inside the
docker context) was unbound when the wrapper tried to use it.

Build failed with:
  /tmp/wzp-tauri-build.sh: line 149: JNI_ABI_DIR: unbound variable

Note the pre-existing script uses backticks in its comments ("cargo-
tauri`s linker wiring") exactly to avoid this trap. Matched that style
and added an explicit NOTE to the comment explaining the quoting
hazard for future editors.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-10 21:49:54 +04:00
Siavash Sameni
d269600aa7 fix(build): build-tauri-android.sh — copy libc++_shared.so into jniLibs
Root cause of "wzp-native not loaded" at runtime on opus-DRED-v2 APK:

libwzp_native.so has a NEEDED entry for libc++_shared.so (because
crates/wzp-native/build.rs uses cpp_link_stdlib(Some("c++_shared"))),
but the APK only contained:
    lib/arm64-v8a/libwzp_desktop_lib.so  (192 MB)
    lib/arm64-v8a/libwzp_native.so       (683 KB)

No libc++_shared.so → Android's dynamic linker fails the dlopen of
libwzp_native.so at runtime with "library libc++_shared.so not found",
and every audio path that routes through wzp_native (capture, playout,
register, direct call) refuses to start.

Diagnosis:
- readelf -d libwzp_native.so shows NEEDED libc++_shared.so
- python zipfile listing of the APK confirms libc++_shared.so is
  absent from lib/arm64-v8a/
- scripts/build-and-notify.sh (the legacy wzp-android build path)
  already had this fix at lines 126-134 with an explicit comment:
  "cargo-ndk may not copy libc++_shared.so — grab it from the NDK if
  missing". That fix was never ported to build-tauri-android.sh when
  the Tauri mobile pipeline was set up.

Fix: after `cargo ndk build -p wzp-native --release` produces
libwzp_native.so into jniLibs, copy libc++_shared.so from the NDK
sysroot (same find pattern as build-and-notify.sh) into the same
jniLibs dir. Abort with a clear error if the NDK doesn't have the file.

Also noting the 191 MB vs 359 MB size discrepancy the user saw: that's
almost entirely libwzp_desktop_lib.so being a 192 MB debug build. The
old working APK was probably a release build (smaller main lib) or
included multiple arches (doubling/tripling the .so count). The size
is cosmetic — the crash is the real issue, and libc++_shared.so is
~2 MB so this fix doesn't close the size gap. Can investigate the
size difference separately after register + direct call work again.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-10 21:43:47 +04:00
Siavash Sameni
dfbe21fe6e feat(tauri-engine): Phase 3b/3c re-port — DRED reconstruction on the live Tauri mobile engine
The original Phase 3b landed on wzp-client/CallDecoder and Phase 3c
landed on wzp-android/src/engine.rs. Both of those are DEAD CODE on
feat/desktop-audio-rewrite: the legacy Kotlin app in android/app/ is
not built by the Tauri mobile pipeline, and the Tauri engine bypasses
CallDecoder by calling wzp_codec::create_decoder directly.

The live Android call engine lives at desktop/src-tauri/src/engine.rs
with two `pub async fn start<F>` functions — one cfg-gated on Android
(Oboe via wzp-native) and one for desktop (CPAL). Both recv tasks
were using `let mut decoder = wzp_codec::create_decoder(...)` which
returns `Box<dyn AudioDecoder>` and doesn't expose the inherent
`reconstruct_from_dred` method.

Changes:

New helper struct `DredRecvState` at the top of engine.rs, wrapping:
  - DredDecoderHandle (libopus DRED side-channel parser)
  - DredState scratch (for parse_into)
  - DredState last_good (cached valid state, swapped on success)
  - last_good_seq: Option<u16> (DRED anchor sequence)
  - expected_seq: Option<u16> (for gap detection)
  - dred_reconstructions / classical_plc_invocations counters

With three methods:
  - ingest_opus(seq, payload): parse DRED, swap on success
  - fill_gap_to(decoder, current_seq, frame_samples, scratch, emit):
    detect gap back from expected_seq, reconstruct each missing
    frame via DRED if state covers it, fall through to classical
    decoder.decode_lost() when it doesn't. Calls emit() once per
    frame with a slice the caller uses for AGC + playout write.
  - reset_on_profile_switch(): invalidate tracking when codec changes

Both recv tasks (Android @ ~line 297 and desktop @ ~line 907):
  - Decoder type changed from `Box<dyn AudioDecoder>` via
    `wzp_codec::create_decoder` to concrete `AdaptiveDecoder::new(profile)`
    so we can call the inherent reconstruct_from_dred method.
  - Added `use wzp_proto::traits::AudioDecoder;` at the top of
    engine.rs to bring decode/decode_lost/set_profile trait methods
    into scope on the concrete type.
  - New `current_profile` local alongside `current_codec` (used for
    frame_duration lookups that drive the DRED sample offset math).
  - On codec/profile switch, call dred_recv.reset_on_profile_switch()
    because the cached DRED state is tied to the old profile's
    frame rate.
  - For each arriving Opus source packet:
      1. dred_recv.ingest_opus(seq, payload) — parse DRED
      2. dred_recv.fill_gap_to(...) — detect gap and reconstruct
         missing frames, each emitted through a closure that does
         AGC + playout write (wzp_native on Android, playout_ring
         on desktop)
      3. Normal decoder.decode() fallthrough for the current packet
         (unchanged)
  - Codec2 packets skip the DRED path entirely (is_opus() gate) —
    libopus can't reconstruct Codec2 audio.

Ordering invariant: gap reconstruction writes to playout BEFORE the
current packet's decoded audio, preserving temporal order since the
playout ring is FIFO. The closure captures the `spk_muted` flag once
before the gap loop to avoid mid-gap-fill state changes.

Kept `crates/wzp-android/src/engine.rs` and `crates/wzp-android/src/
stats.rs` from the earlier Phase 3c commit as-is — they're dead code
on feat/desktop-audio-rewrite but harmless, and deleting them would
diverge this branch from an independently-useful intermediate state.
The old Phase 3c commit (505a834) stays as historical reference.

Verification:
- cargo check -p wzp-codec -p wzp-client -p wzp-relay: 0 errors
- cargo check -p wzp-desktop: only pre-existing `tauri::generate_context!()`
  panic on missing ../dist (Vite output not built on host) — no Rust
  compile errors from our changes
- cargo test -p wzp-codec --lib: 69 passing (unchanged)
- cargo test -p wzp-client --lib: 35 passing + 1 ignored (unchanged)

Next: scripts/build-tauri-android.sh to get the actual Tauri APK —
NOT build-and-notify.sh which builds the dead legacy android/app.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-10 21:31:09 +04:00
Siavash Sameni
b83c31b5d1 fix(android): remove duplicate TextAlign import in InCallScreen.kt
Pre-existing build breakage on feat/desktop-audio-rewrite @ 8ceb6f4 —
TextAlign was imported twice (line 5 and line 50), causing Kotlin
compilation to fail with:

  e: InCallScreen.kt:5:39 Conflicting import, imported name 'TextAlign' is ambiguous
  e: InCallScreen.kt:50:39 Conflicting import, imported name 'TextAlign' is ambiguous

The line-5 copy was squeezed into the middle of the foundation.* block
(alphabetically out of place) — an accidental extra paste. The line-50
copy sits in the correct alphabetical position. Removed the former.

This blocks the APK build for the opus-DRED-v2 rebase. Unrelated to DRED
itself but the error surfaced because the cherry-picked phases caused
a clean Gradle build (no UP-TO-DATE short-circuit) that re-compiled
InCallScreen.kt against the fresh class graph.

Also noting that the previous working APK (unridden-alfonso.apk) was
built from the stale d0c1731 baseline which didn't have this bug —
one more reason the stale-branch build problem went unnoticed until
the opus-DRED-v2 rebase forced a clean Gradle pass.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-10 21:12:23 +04:00
Siavash Sameni
1f607281fd fix(build): build-and-notify.sh — parameterize branch, fail loud on pull errors
Same fix that landed on the old opus-DRED branch as c95255d: the remote
build script hardcoded `feat/android-voip-client` and swallowed the
reset failure with `|| true`, silently leaving the tree on whatever
branch was there. This ported the fix forward to feat/desktop-audio-
rewrite (which had the same bug).

Fix:
  Local side:
  - Auto-detect current branch via `git branch --show-current`
  - Accept `--branch NAME` override
  - Pass branch as a third positional arg to the remote script
  - Abort on detached HEAD
  - Updated usage docs for the "build what I'm working on" default

  Remote side:
  - Read BRANCH from $3, abort if empty
  - `git fetch origin "$BRANCH"` — errors surface
  - `git reset --hard "origin/$BRANCH"` — no `|| true`, failures abort
  - Echo the resolved commit hash + subject after reset
  - Notifications include both branch and hash:
    "WZP Android [opus-DRED-v2 @ <hash>] done! APK: ..."

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-10 20:07:15 +04:00
Siavash Sameni
7515417202 feat(telemetry): Phase 4 — LossRecoveryUpdate protocol + relay metrics + DebugReporter
Phase 4 lays the telemetry foundation for distinguishing DRED recoveries
from classical PLC in production: a new SignalMessage variant, two new
per-session Prometheus counters on the relay side, and a highlighted
loss-recovery section in the Android DebugReporter.

The periodic emitter (client → relay) and Grafana panel are deferred to
Phase 4b — this commit ships the protocol surface, the relay sink, and
the immediate user-visible debug output. Once 4b lands the full path
(emitter → relay → Prometheus → Grafana), the metrics here will
automatically start receiving data.

Scope decision — why not extend QualityReport instead:
The existing wire-format QualityReport is a fixed 4-byte media packet
trailer. Adding counter fields to it would shift the binary layout and
break backward compatibility (old receivers would parse the last 4
bytes of the extended trailer as QR, corrupting audio). Using a
new SignalMessage variant on the reliable QUIC signal stream sidesteps
the wire-format problem entirely — serde JSON enums tolerate unknown
variants gracefully on old receivers, and the signal channel is the
right layer for periodic telemetry aggregates.

Changes:

  wzp-proto/src/packet.rs:
    - New SignalMessage::LossRecoveryUpdate variant carrying:
        * dred_reconstructions: u64 (monotonic since call start)
        * classical_plc_invocations: u64 (monotonic)
        * frames_decoded: u64 (for rate calculation)
    - All three fields tagged #[serde(default)] for forward compat.

  wzp-client/src/featherchat.rs:
    - Added a match arm so signal_to_call_type() handles the new
      variant (treat as Offer for featherChat bridging purposes).

  wzp-relay/src/metrics.rs:
    - Two new IntCounterVec metrics on the relay, labeled by session_id:
        * wzp_relay_session_dred_reconstructions_total
        * wzp_relay_session_classical_plc_total
    - New method update_session_loss_recovery(session_id, dred, plc)
      applies monotonic deltas: if the incoming totals exceed the
      current counter, the difference is inc_by'd. If the incoming
      totals are LOWER (client restart or counter reset), the
      Prometheus counter holds steady until the client catches up.
      This matches the existing update_session_buffer delta pattern.
    - remove_session_metrics() now cleans up the two new labels.
    - New test session_loss_recovery_monotonic_delta exercises:
        * initial population (10 DRED, 2 PLC)
        * forward advance (25, 5 → delta +15, +3)
        * lower values ignored (client reset → counters unchanged)
        * client catches up (30, 8 → advances to new max)
    - Existing session_metrics_cleanup test extended to cover the
      new counters.

  android/app/src/main/java/com/wzp/debug/DebugReporter.kt:
    - Phase 4 users — and incident responders — need to quickly see
      whether DRED is actually firing during a call. The stats JSON
      already carries the counters (after Phase 3c), but they were
      buried in the trailing JSON dump. Added a dedicated
      "=== Loss Recovery ===" section to the meta preamble that
      extracts dred_reconstructions, classical_plc_invocations,
      frames_decoded, and fec_recovered from the JSON and displays
      them plainly, plus computed percentages when frames_decoded > 0.
    - New extractLongField helper: tiny hand-rolled JSON integer
      extractor. We don't want to pull in a full JSON parser for this
      single use case and CallStats has a flat, well-known schema.

Verification:
- cargo check --workspace: zero errors
- cargo test -p wzp-proto --lib: 63 passing
- cargo test -p wzp-codec --lib: 68 passing
- cargo test -p wzp-client --lib: 35 passing (+1 ignored probe)
- cargo test -p wzp-relay --lib: 68 passing (+1 new Phase 4 test)
- cargo check -p wzp-android --lib: zero errors
- Android APK build verified earlier today (unridden-alfonso.apk
  via the remote Docker builder) — Phase 0–3c confirmed to compile
  end-to-end on the NDK target.

Phase 4b remaining (not blocking this commit):
- Periodic LossRecoveryUpdate emitter in wzp-client/src/call.rs and
  wzp-android/src/engine.rs (every ~5 s)
- Relay-side handler in main.rs that matches the new variant and
  calls metrics.update_session_loss_recovery
- Grafana "Loss recovery breakdown" panel in docs/grafana-dashboard.json

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-10 20:03:39 +04:00
Siavash Sameni
505a834c5b feat(codec): Phase 3c — Android engine.rs DRED reconstruction on packet loss
Phase 3c mirrors Phase 3b on the Android receive path. With Phase 0-3b
landed on desktop + Android encoder, this commit completes codec-layer
loss recovery on the Android decoder side.

Architectural difference vs desktop: engine.rs has NO jitter buffer.
The recv task reads packets directly from the transport via
recv_media().await and writes decoded audio straight into the playout
ring. There is no PlayoutResult::Missing equivalent. Gap detection
therefore has to be done via sequence-number tracking — when a packet
arrives with seq > expected_seq, the frames in between are missing and
we attempt to reconstruct them via DRED before decoding the newly-
arrived packet.

Implementation:

  Imports & types:
    - Added wzp_codec::AdaptiveDecoder, wzp_codec::dred_ffi::{
      DredDecoderHandle, DredState} imports.
    - Changed the `decoder` local from Box<dyn AudioDecoder> (via
      wzp_codec::create_decoder) to concrete AdaptiveDecoder::new(profile).
      Same reasoning as Phase 3b: reconstruct_from_dred is an inherent
      method, not a trait method, so we need the concrete type.

  Recv task state (all task-local, no new struct fields):
    - dred_decoder: DredDecoderHandle
    - dred_parse_scratch: DredState (reused, overwritten per parse)
    - last_good_dred: DredState (cached most-recent valid state)
    - last_good_dred_seq: Option<u16>
    - expected_seq: Option<u16> (for gap detection)
    - dred_reconstructions: u64 (telemetry)
    - classical_plc_invocations: u64 (telemetry)

  Recv loop body (Opus source packets only):
    1. Parse DRED from the new packet first so last_good_dred reflects
       the freshest state available for gap recovery.
    2. Detect a gap: gap = pkt.seq.wrapping_sub(expected_seq). Cap at
       MAX_GAP_FRAMES = 16 (320 ms) to avoid huge wraparound scenarios.
    3. For each missing seq in the gap:
         offset = (last_good_dred_seq - missing_seq) * frame_samples
         if 0 < offset <= last_good_dred.samples_available():
             reconstruct_from_dred + write to playout ring
             bump dred_reconstructions
         else:
             decoder.decode_lost (classical PLC) + write + bump plc counter
    4. Decode the current packet normally and write to playout ring
       (unchanged from Phase 2).
    5. Update expected_seq = pkt.seq.wrapping_add(1).

  Profile-switch handling: when the incoming codec changes (triggering
  decoder.set_profile), reset last_good_dred_seq and expected_seq to
  None. The cached DRED state is tied to the old profile's frame rate
  and would produce wrong offsets after the switch; starting fresh is
  correct.

  Decode-error fallback: the existing `Err(e) => decode_lost` branch
  now also increments classical_plc_invocations so the counter
  accurately reflects all PLC invocations (gap-detected AND decode-
  error-triggered).

Telemetry (CallStats additions):
  - stats.dred_reconstructions: u64
  - stats.classical_plc_invocations: u64
  Both updated on every packet arrival in the existing stats.lock()
  block alongside frames_decoded/fec_recovered, so the Android UI and
  JNI bridge already have these values without any further plumbing.
  The periodic recv stats log now includes both counters.

Ordering note: DRED gap reconstruction happens BEFORE decoding the new
packet's audio because the playout ring is FIFO. Gap samples must be
written before the new packet's samples so temporal order is preserved.
Out-of-order late arrivals (seq < expected_seq) are naturally dropped
as stale by the gap detection (gap would be a large wraparound value
exceeding MAX_GAP_FRAMES).

Verification:
- cargo check --workspace: zero errors
- cargo test -p wzp-codec --lib: 68 passing (unchanged from Phase 3b)
- cargo test -p wzp-client --lib: 35 passing (unchanged from Phase 3b)
- cargo check -p wzp-android --lib: zero errors
- cargo test -p wzp-android cannot run on macOS host (pre-existing
  -llog linker dep, unrelated). Real end-to-end verification happens
  via the Android APK build on the remote Docker builder
  (scripts/build-and-notify.sh).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-10 20:03:31 +04:00
Siavash Sameni
27bc264738 feat(codec): Phase 3b — CallDecoder DRED reconstruction on packet loss
Phase 3b of the DRED integration — wires the Phase 3a FFI primitives
into the desktop receive path. When the jitter buffer reports a missing
Opus frame, CallDecoder now attempts to reconstruct the audio from the
most recently parsed DRED side-channel state before falling through to
classical PLC.

Architectural refinement vs the PRD's literal wording: the PRD said
"jitter buffer takes a Box<dyn DredReconstructor>". After checking deps,
wzp-transport depends only on wzp-proto (not wzp-codec). Putting DRED
state in the jitter buffer would require a new cross-crate dep and
couple the codec-agnostic buffer to libopus. Instead, this commit keeps
the DRED state ring and reconstruction dispatch inside CallDecoder (one
layer up from the jitter buffer), intercepting the existing
PlayoutResult::Missing signal. Same lookahead/backfill semantics,
cleaner layering, zero change to wzp-transport.

Changes:

  CallDecoder field type: Box<dyn AudioDecoder> → AdaptiveDecoder.
  Required because Phase 3b calls the inherent reconstruct_from_dred
  method, which cannot live on the AudioDecoder trait without dragging
  libopus DredState through wzp-proto. In practice AdaptiveDecoder was
  the only AudioDecoder implementor anyway — the trait abstraction was
  buying nothing. Method call sites unchanged because AdaptiveDecoder
  also implements AudioDecoder.

  New CallDecoder fields:
    - dred_decoder: DredDecoderHandle
    - dred_parse_scratch: DredState  (scratch for parse_into)
    - last_good_dred: DredState      (cached most-recent valid state)
    - last_good_dred_seq: Option<u16>
    - dred_reconstructions: u64      (Phase 4 telemetry)
    - classical_plc_invocations: u64 (Phase 4 telemetry)

  CallDecoder::ingest — on Opus non-repair packets, parse DRED into the
  scratch state. On success (samples_available > 0), std::mem::swap the
  scratch into last_good_dred and record the seq. This is O(1) per
  packet, zero allocation after construction (the two DredState buffers
  are allocated once in new() and reused forever).

  CallDecoder::decode_next — on PlayoutResult::Missing(seq) for Opus
  profiles: if last_good_dred_seq > seq and the seq delta × frame_samples
  fits within samples_available, call audio_dec.reconstruct_from_dred
  and bump dred_reconstructions. Otherwise fall through to classical
  PLC and bump classical_plc_invocations. The Codec2 path always falls
  through to classical PLC since DRED is libopus-only and
  AdaptiveDecoder::reconstruct_from_dred rejects Codec2 tiers
  explicitly.

  OpusDecoder and AdaptiveDecoder: new inherent reconstruct_from_dred
  method that delegates to the underlying DecoderHandle. Needed to
  bridge CallDecoder's wzp-client code to the Phase 3a FFI wrappers
  without touching the AudioDecoder trait.

CRITICAL FINDING — raised DRED loss floor from 5% to 15%:

Phase 3b testing discovered that libopus 1.5's DRED emission window
scales aggressively with OPUS_SET_PACKET_LOSS_PERC. Empirical data
(see probe_dred_samples_available_by_loss_floor, an #[ignore]'d
diagnostic test in call.rs):

  loss_pct   samples_available   effective_ms
    5%        720                  15 ms  (useless!)
   10%        2640                 55 ms
   15%        4560                 95 ms
   20%        6480                135 ms
   25%+       8400 (capped)       175 ms  (~87% of 200 ms configured)

The Phase 1 default of 5% produced only a 15 ms reconstruction window
— too small to even cover a single 20 ms Opus frame. DRED was
effectively disabled even though it was emitting bytes. Raised the
floor to 15% (95 ms window) as the minimum that actually provides
single-frame loss recovery. This updates Phase 1's DRED_LOSS_FLOOR_PCT
constant in opus_enc.rs and the accompanying module docstring.

Trade-off: 15% assumed loss slightly increases encoder bitrate overhead
on clean networks. Measured via the existing phase1 bitrate probe:

  Before (5% floor):  3649 bytes/sec at Opus 24k + 300 Hz sine
  After  (15% floor): 3568 bytes/sec at Opus 24k + 300 Hz sine

The delta is within noise — 15% isn't meaningfully more expensive than
5% on this signal, which suggests the DRED emission size is signal-
dependent rather than loss-dependent for small values. Net result: we
get a 6x larger reconstruction window for essentially free.

Tests (+3 DRED recovery, +1 #[ignore]'d probe):
- opus_single_packet_loss_is_recovered_via_dred — full encode → ingest
  → decode_next loop with one packet dropped mid-stream. Asserts
  dred_reconstructions ≥ 1 and observes the exact counter deltas.
- opus_lossless_ingest_never_triggers_dred_or_plc — baseline behavior,
  lossless stream never takes the Missing branch.
- codec2_loss_falls_through_to_classical_plc — Codec2 never
  reconstructs via DRED even if state were populated (which it won't
  be — Codec2 packets don't carry DRED bytes).
- probe_dred_samples_available_by_loss_floor — #[ignore]'d diagnostic
  that sweeps loss_pct values and prints the resulting DRED window
  sizes. Kept for future tuning work.

New CallDecoder introspection accessors (public but undocumented in
the PRD): last_good_dred_seq() and last_good_dred_samples_available()
for test diagnostics and future telemetry surfaces in Phase 4.

Verification:
- cargo check --workspace: zero errors
- cargo test -p wzp-codec --lib: 68 passing (Phase 3a baseline held)
- cargo test -p wzp-client --lib: 35 passing (+3 Phase 3b tests,
  +1 ignored diagnostic, no regressions)

Next up: Phase 3c mirrors this on the Android engine.rs receive path.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-10 20:03:24 +04:00
Siavash Sameni
c27b39d553 feat(codec): Phase 3a — DRED FFI primitives (DredDecoderHandle + DredState)
Phase 3a of the DRED integration — the foundation for codec-layer loss
recovery. Adds three new safe wrappers to crates/wzp-codec/src/dred_ffi.rs
over the raw opusic-sys FFI, plus the reconstruction method on the existing
DecoderHandle. No call-site integration yet — that lands in Phase 3b (desktop)
and Phase 3c (Android).

New types:
- `DredDecoderHandle`: owns *mut OpusDREDDecoder from opus_dred_decoder_create.
  Used for parsing DRED side-channel data out of arriving Opus packets.
  This is a SEPARATE libopus object from OpusDecoder — it has its own
  internal state. Freed via opus_dred_decoder_destroy on Drop.
- `DredState`: owns *mut OpusDRED from opus_dred_alloc (a fixed ~10.6 KB
  buffer per libopus 1.5). Holds parsed DRED data between the parse and
  reconstruct steps. Reusable — parse_into overwrites contents. Tracks
  samples_available as a cached u32 so callers don't thread the value
  separately. Freed via opus_dred_free on Drop.

New methods:
- `DredDecoderHandle::parse_into(&mut self, state: &mut DredState, packet)`
  wraps opus_dred_parse with max_dred_samples=48000 (1s max), sampling_rate
  =48000, defer_processing=0. Returns the positive sample offset of the
  first decodable DRED sample, 0 if no DRED is present, or an error.
  Populates state.samples_available so subsequent reconstruct calls know
  the valid offset range.
- `DecoderHandle::reconstruct_from_dred(&mut self, state, offset_samples,
  output)` wraps opus_decoder_dred_decode. Reconstructs audio at a specific
  sample position (positive, measured backward from the DRED anchor packet)
  into a caller-provided output buffer. Validates that 0 < offset_samples
  <= state.samples_available() before calling the FFI to catch range bugs.

Tests (+7, wzp-codec total: 68 passing):
- dred_decoder_handle_creates_and_drops
- dred_state_creates_and_drops
- dred_state_reset_zeroes_counter
- dred_parse_and_reconstruct_roundtrip — end-to-end validation. Encodes
  60 frames of a 300 Hz sine wave through a DRED-enabled Opus 24k encoder,
  parses DRED state out of each arriving packet, asserts that at least one
  packet carries non-zero samples_available (DRED warm-up completes within
  the first second), then reconstructs 20 ms of audio from inside the
  window and asserts non-zero total energy. This is the hard signal that
  the full libopus 1.5 DRED FFI chain is correctly wired on our side.
- reconstruct_with_out_of_range_offset_errors — offset > samples_available
  is rejected at the Rust layer before the FFI call.
- reconstruct_with_zero_offset_errors — offset <= 0 rejected.
- dred_parse_empty_packet_returns_zero — graceful handling of empty input.

Architectural note (divergence from PRD's literal wording):
The PRD said "jitter buffer takes a Box<dyn DredReconstructor>". After
checking Cargo.toml for wzp-transport, it does NOT depend on wzp-codec —
only wzp-proto. Adding a DRED state ring inside the jitter buffer would
require a new cross-crate dependency and couple the codec-agnostic jitter
buffer to libopus internals. Instead, Phase 3b will put the DRED state
ring and reconstruction dispatch in CallDecoder (one layer up from the
jitter buffer), intercepting the existing PlayoutResult::Missing signal
and attempting reconstruction before falling through to classical PLC.
The jitter buffer itself stays unchanged. Same lookahead/backfill
semantics, cleaner layering. PRD's intent preserved, implementation
refined.

Verification:
- cargo check --workspace: zero errors
- cargo test -p wzp-codec --lib: 68 passing (61 Phase 2 baseline + 7 new)
- The roundtrip test is the acceptance gate — it proves that
  opus_dred_decoder_create, opus_dred_alloc, opus_dred_parse, and
  opus_decoder_dred_decode all work correctly through our wrappers on
  real libopus 1.5.2 output.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-10 20:03:14 +04:00
Siavash Sameni
6db5c25b54 feat(codec): Phase 2 — remove RaptorQ from Opus tiers, Codec2 unchanged
Phase 2 of the DRED integration (docs/PRD-dred-integration.md). With
Phase 1 having enabled DRED on every Opus profile, the app-level RaptorQ
layer is now redundant overhead on those tiers: +20% bitrate, +40–100 ms
receive-side latency (block wait), +CPU for stats we never used. This
phase removes RaptorQ from the Opus encode and decode paths on both the
desktop (wzp-client/call.rs) and Android (wzp-android/engine.rs) sides.
Codec2 tiers keep RaptorQ with their current ratios unchanged — DRED is
libopus-only and Codec2 has no neural equivalent.

Encoder changes (the real bandwidth / CPU win):
- CallEncoder::encode_frame and engine.rs encode loop now gate the
  RaptorQ path on !codec.is_opus():
    - Opus source packets emit fec_block=0, fec_symbol=0,
      fec_ratio_encoded=0 in the MediaHeader
    - fec_enc.add_source_symbol is skipped on Opus
    - generate_repair + repair packet emission is skipped on Opus
    - block_id and frame_in_block counters stay frozen at 0 for Opus
- Codec2 path is byte-for-byte identical to pre-Phase-2 behavior.

Decoder changes (mostly cleanup, since both live decoder paths were
already reading audio directly from source packets and only using the
RaptorQ decoder output for stats):
- CallDecoder::ingest skips fec_dec.add_symbol on Opus packets. Source
  packets still flow to the jitter buffer; Opus repair packets from old
  senders are dropped cleanly (repair packets never hit the jitter
  buffer either).
- engine.rs recv loop skips fec_dec.add_symbol, fec_dec.try_decode, and
  fec_dec.expire_before on Opus packets. The `fec_recovered` stat
  counter becomes Codec2-only (a separate DRED reconstruction counter
  lands in Phase 4).

Wire-format backward compat verified at pre-flight:
- Old receiver + new sender: engine.rs pipeline.rs path gates on
  non-zero fec_block/fec_symbol which now never fire for Opus, so the
  RaptorQ decoder simply isn't fed. Audio flows normally. Desktop
  CallDecoder's old path accumulated packets into the stale-eviction
  HashMap, which cleans up after 2s — harmless.
- New receiver + old sender: new receiver skips RaptorQ on Opus so
  old-sender repair packets are ignored entirely (no crash, no double-
  decode). Loses the (previously vestigial) RaptorQ recovery benefit,
  which was never actually active in the audio path. Source packets
  still decode normally.
- No wire format version bump required. MediaHeader is unchanged; we
  just zero the FEC fields on Opus packets.

Test changes:
- Removed `encoder_generates_repair_on_full_block` — asserted the old
  (pre-Phase-2) RaptorQ-on-Opus behavior and is now incorrect. Replaced
  with two symmetric tests:
    - `opus_source_packets_have_zero_fec_header_fields` — verifies
      Phase 2 invariants on Opus packets
    - `opus_encoder_never_emits_repair_packets` — runs 20 frames of
      non-silent sine wave through a GOOD-profile encoder, asserts
      exactly 20 output packets, zero repair
    - `codec2_encoder_generates_repair_on_full_block` — same shape as
      the old test but on CATASTROPHIC profile (Codec2 1200, 8
      frames/block, ratio 1.0) to verify Codec2 path still emits
      repairs as before

Verification:
- cargo check --workspace: zero errors
- cargo test -p wzp-codec --lib: 61 passing (Phase 1 baseline held)
- cargo test -p wzp-client --lib: 32 passing (+3 new Phase 2 tests,
  -1 old test removed)
- cargo check -p wzp-android --lib: zero errors (host link of
  wzp-android tests fails on -llog per pre-existing Android-only
  build.rs, unrelated to this work; integration build via
  build-and-notify.sh will validate Android end-to-end)
- Pre-existing broken integration test in
  crates/wzp-client/tests/handshake_integration.rs (SignalMessage
  schema drift) is NOT caused by this commit — baseline had the same
  3 compile errors before Phase 2. Flagged as a separate cleanup task.

Expected observable effects on a real call:
- Opus 24k outgoing bitrate drops from ~28.8 kbps (ratio 0.2 RaptorQ)
  to ~25 kbps (base 24 kbps + DRED ~1–10 kbps signal-dependent)
- Opus receive-side latency drops ~40 ms on clean network (no more
  block wait — jitter buffer emits as soon as a source packet arrives)
- Codec2 calls show no latency or bitrate change

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-10 20:02:42 +04:00
Siavash Sameni
54cbebd34e feat(codec): Phase 1 — enable DRED on all Opus profiles, disable inband FEC
Phase 1 of the DRED integration (docs/PRD-dred-integration.md). The Opus
encoder now emits DRED (Deep REDundancy) bytes in every packet, carrying
a neural-coded history of recent audio that the decoder can use to
reconstruct loss bursts up to the configured window. Opus inband FEC
(LBRR) is disabled because DRED does the same job better and running both
wastes bitrate on overlapping protection.

Tiered DRED duration policy per PRD:
  Studio  (Opus 32k/48k/64k): 10 frames = 100 ms
  Normal  (Opus 16k/24k):     20 frames = 200 ms
  Degraded (Opus 6k):         50 frames = 500 ms

Each profile switch (via adaptive quality) updates the DRED duration to
match the new tier. A 5% packet_loss floor is applied whenever DRED is
active, because libopus 1.5 gates DRED emission on non-zero packet_loss.
Real loss measurements from the quality adapter override upward.

Escape hatch: AUDIO_USE_LEGACY_FEC=1 reverts the encoder to Phase 0
behavior (inband FEC Mode1, DRED off, no loss floor). Read once at
OpusEncoder::new; call-scoped, not re-read mid-call. Trait-level
set_inband_fec becomes a no-op in DRED mode to preserve the invariant
even if external callers forget.

Observations from the bitrate probe test (dred_mode_roundtrip_voice_pattern):
  DRED mode:   3649 bytes/sec (~29.2 kbps) on Opus 24k + 300 Hz sine
  Legacy mode: 2383 bytes/sec (~19.1 kbps)
  Delta:       +10.1 kbps

The delta is considerably larger than the "+1 kbps flat" figure I carried
into the PRD from hazy memory of published DRED benchmarks. Likely because
the input (300 Hz sine) is very compressible so the base Opus rate in
legacy mode is well below the 24 kbps target, making the delta look
disproportionate. Signal-dependent — real speech would probably show a
different ratio. If production telemetry shows the overhead is excessive,
we can cut DRED duration on the normal tier from 200 ms to 100 ms as a
first tuning lever. Not blocking Phase 1 since the test still passes
within the reasonable 2000–8000 bytes/sec bounds.

Test changes (+8 tests, total wzp-codec: 61 passing):
- dred_duration_for_studio_tiers_is_100ms  (per-profile policy)
- dred_duration_for_normal_tiers_is_200ms
- dred_duration_for_degraded_tier_is_500ms
- dred_duration_for_codec2_is_zero
- default_mode_is_dred_not_legacy  (sanity check on fresh construction)
- dred_mode_roundtrip_voice_pattern  (observes DRED bitrate, asserts bounds)
- profile_switch_refreshes_dred_duration  (verifies set_profile updates DRED)
- set_inband_fec_noop_in_dred_mode  (trait-level inband FEC no-op)

Verification:
- cargo check --workspace: zero errors, no new warnings
- cargo test -p wzp-codec: 61/61 passing (53 pre-Phase-1 baseline + 8 new)
- Empirical DRED bitrate observed via `rtk proxy cargo test
  dred_mode_roundtrip_voice_pattern -- --nocapture`

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-10 20:02:35 +04:00
Siavash Sameni
86526a7ad4 feat(codec): Phase 0 — swap audiopus → opusic-c + opusic-sys (libopus 1.5.2)
Phase 0 of the DRED integration (docs/PRD-dred-integration.md). No behavior
change: inband FEC stays ON, no DRED, same bitrate, same quality. This
commit unblocks Phase 1+ by getting us onto libopus 1.5.2 where DRED lives.

Rationale for going straight to a custom DecoderHandle: opusic-c::Decoder's
inner *mut OpusDecoder pointer is pub(crate), so we cannot reach it for the
Phase 3 DRED reconstruction path. Running two parallel decoders (one for
audio, one for DRED) would drift because the DRED decoder wouldn't see
normal decode calls. Single unified DecoderHandle over raw opusic-sys is
the only correct architecture, so we build it in Phase 0 rather than
rewriting opus_dec.rs twice.

Changes:
- Cargo.toml (workspace + wzp-codec): remove audiopus 0.3.0-rc.0, add
  opusic-c 1.5.5 (bundled + dred features), opusic-sys 0.6.0 (bundled),
  bytemuck 1. Pinned exactly for reproducible libopus 1.5.2.
- opus_enc.rs: rewritten against opusic_c::Encoder. Argument order for
  Encoder::new swapped (Channels first). set_inband_fec(bool) now maps
  to InbandFec::Mode1 (the libopus 1.5 equivalent of 1.3's LBRR). encode
  uses bytemuck::cast_slice<i16,u16> at the &[u16] boundary.
- dred_ffi.rs (new): DecoderHandle wrapping *mut OpusDecoder directly via
  opusic-sys. Owns the allocation, frees on Drop. Exposes decode,
  decode_lost, and a pub(crate) as_raw_ptr() for the future Phase 3 DRED
  reconstruction. Send+Sync justified via &mut self access discipline.
- opus_dec.rs: rewritten as a thin AudioDecoder impl over DecoderHandle.
  Behavior identical to pre-swap.

Verification (Phase 0 acceptance gates):
- cargo check --workspace: clean (30 pre-existing warnings in jni_bridge.rs
  unrelated to this work; zero in changed files).
- cargo test -p wzp-codec: 53 tests pass (50 pre-swap + 6 new: 3 in
  dred_ffi.rs for DecoderHandle lifecycle, 3 in opus_enc.rs for version
  check and roundtrip).
- linked_libopus_is_1_5 test asserts opusic_c::version() contains "1.5" —
  hard signal that the swap landed correctly.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-10 20:02:15 +04:00
Siavash Sameni
56e3417063 docs: add PRD for DRED integration and Opus-tier FEC simplification
Plans the libopus 1.5.2 upgrade (audiopus → opusic-c/opusic-sys), DRED
enablement with tiered durations (100/200/500ms studio/normal/degraded),
removal of RaptorQ and Opus inband FEC from the Opus tiers, jitter buffer
lookahead/backfill refactor, and runtime escape hatch for rollout safety.
RaptorQ + current ratios preserved on Codec2 tiers (no DRED there).

Includes pre-flight verification findings: opusic-c Decoder inner pointer
is inaccessible (requires unified opusic-sys DecoderHandle), libopus 1.5
DRED API semantics clarified against xiph/opus opus.h, wire-format
backward compat verified on both live receive paths.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-10 19:57:01 +04:00
Siavash Sameni
8ceb6f45d5 fix(build): declare VARIANT in local script half (was remote-only)
Some checks failed
Mirror to GitHub / mirror (push) Failing after 36s
Build Release Binaries / build-amd64 (push) Failing after 3m49s
The VARIANT variable was set inside the REMOTE_SCRIPT heredoc for
naming artifacts during the cargo tauri build, but never declared
in the local half of the script where it's used to rename downloaded
files. Under `set -u` strict mode this aborted the local downloads
with "unbound variable: VARIANT" after a successful remote build.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-10 16:16:07 +04:00
Siavash Sameni
07873ea598 fix(linux-aec): fall back to 0.3 crate + apt lib (2.x bundled is broken)
Some checks failed
Build Release Binaries / build-amd64 (push) Failing after 4m6s
Mirror to GitHub / mirror (push) Failing after 45s
Switch the webrtc-audio-processing dep from the 2.x git source (bundled
mode) back to crates.io 0.3, and link against Debian's apt package
libwebrtc-audio-processing-dev (0.3-1+b1 on Bookworm). The 2.x path
fails because both the crates.io tarball and the upstream git main
branch of webrtc-audio-processing-sys 2.0.3 have a build.rs bug where
\`meson setup --reconfigure\` is passed unconditionally, panicking on
first-run empty build dirs with "Directory does not contain a valid
build tree". The 0.x line sidesteps bundled mode entirely by linking
the apt-provided library.

Trade-off: we get AEC2 (the older generation) instead of AEC3, but
it's the same algorithm family and is what PulseAudio's
module-echo-cancel and PipeWire's filter-chain use on current
Debian-family distros. Fine for shipping — we can revisit AEC3 once
the 2.x bundled build is fixed upstream.

API changes:
- 0.3's Processor::process_capture_frame and process_render_frame
  take &mut self, so wrap the module-level processor in a Mutex.
  Capture and playback threads each lock briefly (sub-ms per 10 ms
  frame); contention is minimal.
- Import NUM_SAMPLES_PER_FRAME from the crate directly instead of
  hardcoding 480, so the code tracks whatever sample rate the
  upstream C++ lib exposes (currently 48 kHz hardcoded -> 480).
- Helper fns drain_frames_through_apm / tee_render_samples / etc.
  take &Mutex<Processor> instead of &Processor.
- Use explicit EchoCancellationSuppressionLevel and
  NoiseSuppressionLevel imports rather than fully-qualified paths.

Dockerfile:
- Drop meson / ninja-build / python3 (only needed for bundled build).
- Add libwebrtc-audio-processing-dev for the system link path.
- Keep clang (may be needed by the bindgen step in some versions).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-10 16:06:56 +04:00
Siavash Sameni
cc00f7cace fix(linux-aec): try main branch of webrtc-audio-processing
Some checks failed
Mirror to GitHub / mirror (push) Failing after 40s
Build Release Binaries / build-amd64 (push) Failing after 3m41s
v2.0.3 bundled build hits 'Directory does not contain a valid build
tree' because the crate's build.rs uses `meson setup --reconfigure`
unconditionally, which fails on first run when the build dir doesn't
yet contain prior meson state. Try the main branch in case it's been
fixed post-release.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-10 15:58:28 +04:00
Siavash Sameni
eb9de988d6 fix(linux-aec): use git dep for webrtc-audio-processing
Some checks failed
Mirror to GitHub / mirror (push) Failing after 36s
Build Release Binaries / build-amd64 (push) Has been cancelled
The crates.io tarball of webrtc-audio-processing-sys 2.0.3 is missing
the vendored C++ submodule — the bundled build fails with 'Directory
does not contain a valid build tree' when meson tries to configure
the ./webrtc-audio-processing subdirectory. Cargo clones git deps with
submodules auto-initialized since ~1.27, so pulling from the upstream
git repo (pinned to tag v2.0.3) gives us the full source tree.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-10 15:55:04 +04:00
Siavash Sameni
4ba77c8c0e feat(linux): WebRTC AEC3 capture/playback backend with render-side tee
Some checks failed
Mirror to GitHub / mirror (push) Failing after 34s
Build Release Binaries / build-amd64 (push) Has been cancelled
Adds gold-standard Linux echo cancellation: in-app WebRTC AEC3 (Audio
Processing Module) via the webrtc-audio-processing crate, using the
same algorithm as Chrome WebRTC, Zoom, Teams, and Jitsi. Runs entirely
in-process, so it works identically on ALSA / PulseAudio / PipeWire
systems — no dependency on user-configured echo-cancel modules.

Architecture:
- New crates/wzp-client/src/audio_linux_aec.rs module (~470 lines).
  Contains LinuxAecCapture and LinuxAecPlayback, both using CPAL
  under the hood but routing samples through a shared
  Arc<webrtc_audio_processing::Processor>. The playback path tees
  each 20 ms frame into APM.process_render_frame as the echo
  reference BEFORE handing the samples to CPAL's output callback.
  The capture path runs APM.process_capture_frame on each mic frame
  in place before pushing to the audio ring buffer. This is the
  "tee the playback ring" approach that Zoom/Teams/Jitsi use.
- New `linux-aec` feature in wzp-client pulling in the
  webrtc-audio-processing crate at v2.x with the `bundled`
  sub-feature. Bundled means the vendored PulseAudio WebRTC C++
  sources are statically compiled via meson+ninja at cargo build
  time — no runtime .so dependency, avoids Debian Bookworm's stale
  libwebrtc-audio-processing-dev 0.3 package (which predates AEC3).
  Dep is target-gated to Linux, so enabling the feature on non-Linux
  is a no-op.
- lib.rs re-exports LinuxAecCapture/LinuxAecPlayback as
  AudioCapture/AudioPlayback when `linux-aec` is on, otherwise
  falls back to the CPAL audio_io path. Shared public API
  (start/ring/stop/Drop) means downstream code is unchanged.
- New `linux-aec` feature in wzp-desktop forwards to
  wzp-client/linux-aec so `cargo tauri build -- --features
  wzp-desktop/linux-aec` builds the AEC variant.

APM configuration:
- EchoCancellation: High suppression, delay-agnostic mode on,
  extended filter on, stream_delay_ms=60 initial hint
- NoiseSuppression: High
- HighPassFilter: on
- AGC: off (can fight Opus encoder's own gain staging + adaptive
  quality controller; add later if users report low mic level)

Frame size handling:
- Pipeline uses 20 ms frames (960 samples @ 48 kHz mono)
- APM requires strict 10 ms (480 samples) per call
- Each 20 ms frame is split into two 480-sample halves, APM called
  twice, halves stitched back
- Same pattern for render and capture sides
- Carry-buffer logic handles the case where CPAL delivers samples in
  arbitrary chunk sizes that don't divide 960

Build infrastructure:
- scripts/Dockerfile.linux-desktop-builder adds meson, ninja-build,
  python3, clang for the webrtc-audio-processing bundled build
- scripts/build-linux-desktop-docker.sh takes a new --aec flag that
  enables the linux-aec feature and renames the output artifacts
  with an `-aec` suffix so noAEC and AEC variants can coexist on disk

Task #30.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-10 15:53:23 +04:00
Siavash Sameni
7b8a2d0fba feat(build): add Linux x86_64 Tauri desktop build pipeline
Some checks failed
Mirror to GitHub / mirror (push) Failing after 38s
Build Release Binaries / build-amd64 (push) Failing after 3m50s
New Dockerfile and build script for producing wzp-desktop as a Linux
x86_64 binary (plus .deb and .AppImage bundles via tauri-cli).

- scripts/Dockerfile.linux-desktop-builder: thin extension of
  wzp-android-builder that adds the Tauri Linux runtime deps
  (libwebkit2gtk-4.1-dev, libsoup-3.0-dev, libgtk-3-dev,
  libayatana-appindicator3-dev, librsvg2-dev, libglib2.0-dev, patchelf).
  Everything else (Rust, Node, cmake, pkg-config, libasound2-dev,
  tauri-cli) is inherited from the base image.

- scripts/build-linux-desktop-docker.sh: mirrors the pattern of
  build-windows-docker.sh and build-linux-docker.sh. Ships
  \`cargo tauri build\` which produces target/release/wzp-desktop
  plus bundles under target/release/bundle/{deb,appimage}/. Uploads
  the .deb (or raw binary if bundling fails) to rustypaste and
  notifies ntfy.sh/wzp on start + completion. Downloads all three
  artifact types (raw binary, .deb, .AppImage) to target/linux-desktop/
  when they exist.

Image cache volumes are shared with the Android pipeline for cargo
registry + git, but the target dir is in its own cache-linux-desktop/
path to avoid stomping on the Android / Linux-CLI / Windows target
caches.

Branch default is feat/desktop-audio-rewrite (where the actual
wzp-desktop source lives), not feat/android-voip-client.

Task #29.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-10 15:28:47 +04:00
Siavash Sameni
5cd7a20152 fix(ui): disable WebView pinch-zoom and desktop right-click menu
Some checks failed
Mirror to GitHub / mirror (push) Failing after 38s
Build Release Binaries / build-amd64 (push) Has been cancelled
Two small WebView hardening tweaks that apply to both Android (Tauri
mobile) and desktop (Tauri) since the frontend is shared:

- index.html viewport meta now sets maximum-scale=1.0, minimum-scale=1.0,
  and user-scalable=no. This stops users on Android from pinch-zooming
  out of the fixed-layout UI. Desktop is unaffected because the Tauri
  WebView ignores pinch gestures anyway.
- main.ts installs global listeners that preventDefault on contextmenu
  (kills the browser-style right-click menu that exposed Inspect /
  Reload / Back / Forward entries on desktop), keydown Ctrl+-/+/0
  (stops keyboard zoom of the fixed layout), and gesture* + ctrl-wheel
  events (trackpad pinch on WebKit + Chromium respectively).

Dev tools remain accessible via F12 / Cmd-Opt-I keyboard shortcuts —
only the right-click entry point is suppressed. Android has no
right-click so that part is a no-op there.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-10 15:26:08 +04:00
Siavash Sameni
a5c00fe5cb docs: add BRANCH-desktop-audio-rewrite.md and update ARCH/ADMIN/USER_GUIDE
Some checks failed
Mirror to GitHub / mirror (push) Failing after 42s
Build Release Binaries / build-amd64 (push) Failing after 3m46s
Documents the feat/desktop-audio-rewrite branch story end-to-end:
- Purpose: shared codebase with android-rewrite via Tauri, platform-
  specific audio backends via target-dep sections + feature flags
- Audio backend matrix: CPAL baseline + macOS VPIO + Windows WASAPI
  AudioCategory_Communications
- Recent work: desktop direct calling feature with history dedup,
  macOS VPIO integration, Windows cross-compile via cargo-xwin, the
  libopus/clang-cl vendored audiopus_sys fix, icon.ico generation,
  and the WASAPI communications capture backend (task #24)
- Build pipelines: native cargo on macOS/Linux, Docker on SepehrHomeserverdk
  for Windows, Hetzner Cloud alternative
- Testing procedures for direct calling parity and Windows AEC A/B
- Known quirks: vendor path relative, cargo-xwin override.cmake clobber,
  WebView2 runtime prerequisite, 2024 edition unsafe lint warnings

Also appends shared-doc sections (identical on both branches):
- ARCHITECTURE.md: "Audio Backend Architecture (Platform Matrix)"
- ADMINISTRATION.md: "Build Pipelines"
- USER_GUIDE.md: "Direct 1:1 Calling" and "Windows AEC Variants"

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-10 15:20:21 +04:00
Siavash Sameni
ec41f179cd fix(windows): drop dead override.cmake patch from Dockerfile
Some checks failed
Mirror to GitHub / mirror (push) Failing after 42s
Build Release Binaries / build-amd64 (push) Failing after 3m44s
The RUN step that baked an OPUS_DISABLE_INTRINSICS patch into
cargo-xwin's override.cmake was inert from the start: cargo-xwin
rewrites that file from scratch on every \`cargo xwin build\` invocation
(src/compiler/clang_cl.rs line ~444 uses include_bytes! to overwrite
it), so anything baked at image build time gets wiped at runtime.

The libopus SSE4.1/SSSE3 compile failure is now fixed upstream at the
source level by the vendored audiopus_sys patch (see
vendor/audiopus_sys/opus/CMakeLists.txt and the MSVC_CL distinction
for clang-cl). Remove the dead RUN step and leave a breadcrumb
comment pointing at the real fix location.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-10 15:07:06 +04:00
Siavash Sameni
4e9244eb00 fix(windows): add Win32_Security feature + 2024 edition unsafe wrappers
Some checks failed
Mirror to GitHub / mirror (push) Failing after 43s
Build Release Binaries / build-amd64 (push) Failing after 3m49s
- CreateEventW is gated behind Win32_Security in the windows crate
  because its signature takes SECURITY_ATTRIBUTES; add to features.
- Remove unused HANDLE import.
- Wrap GetId() and PWSTR::to_string() in explicit unsafe { ... }
  blocks for Rust 2024 edition's unsafe_op_in_unsafe_fn lint.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-10 14:36:50 +04:00
Siavash Sameni
03a80a3196 feat(windows): WASAPI capture backend with OS-level AEC
Some checks failed
Mirror to GitHub / mirror (push) Failing after 39s
Build Release Binaries / build-amd64 (push) Has been cancelled
Adds a direct WASAPI microphone capture path for the Windows desktop
build that opens the default communications endpoint via
IMMDeviceEnumerator -> IAudioClient2 -> SetClientProperties with
AudioCategory_Communications, turning on Windows's communications
audio processing chain (AEC, noise suppression, automatic gain
control). The communications AEC operates at the OS level and uses
the system render mix as the reference signal, so echo from our
existing CPAL playback stream is cancelled automatically with no
per-process reference plumbing.

Architecture:
- New crates/wzp-client/src/audio_wasapi.rs module (~280 lines).
  Event-driven capture loop on a dedicated thread; pushes PCM into
  the same lock-free AudioRing used by the CPAL path. Same public
  API as audio_io::AudioCapture so downstream code is unchanged.
- New `windows-aec` feature in wzp-client that pulls in the
  `windows` crate (Microsoft's official Rust COM bindings) gated to
  target_os = "windows" only. Enabling the feature on non-Windows
  targets is a no-op since both the module and the dep are
  cfg(target_os = "windows").
- lib.rs re-exports WasapiAudioCapture as AudioCapture when the
  feature is on, otherwise falls back to the CPAL AudioCapture.
  AudioPlayback is always the CPAL one — no reason to swap it.
- desktop/src-tauri/Cargo.toml Windows target enables the new
  feature: `features = ["audio", "windows-aec"]`.

Implementation notes:
- Uses eCommunications role (not eConsole) for GetDefaultAudioEndpoint
  — the user-configured "communications" device that Teams/Zoom
  pick up, and the one Windows's AEC is tuned for.
- Requests 48 kHz mono i16 with AUDCLNT_STREAMFLAGS_AUTOCONVERTPCM +
  SRC_DEFAULT_QUALITY so Windows handles any format conversion in
  the audio engine instead of rejecting our format.
- Event-driven with SetEventHandle / WaitForSingleObject — no
  polling, minimal CPU cost between packets.
- 200 ms wait timeout so the capture thread polls `running` often
  enough for Drop to stop cleanly even if the audio engine stalls
  (e.g. device unplug).

Task #24.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-10 14:35:36 +04:00
Siavash Sameni
7fecf285ea fix(windows): add icons/icon.ico for tauri-build Windows resource
Some checks failed
Mirror to GitHub / mirror (push) Failing after 39s
Build Release Binaries / build-amd64 (push) Failing after 3m43s
tauri-build's Windows path unconditionally looks up icons/icon.ico to
embed as the PE file resource (taskbar/Explorer icon). We only had
icon.png (32x32 placeholder) which is fine on macOS/Linux but blocks
the Windows cross-compile with "icons/icon.ico not found; required for
generating a Windows Resource file during tauri-build".

Generated a multi-size ICO (16/24/32/48/64/128/256) from the existing
placeholder icon.png via Pillow. It's ugly at 256 due to upscaling from
32x32 with LANCZOS, but unblocks the build. Real branded icons can
replace it later without any build-system changes.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-10 14:15:04 +04:00
Siavash Sameni
0683dde5d3 fix(windows): vendor audiopus_sys + patch libopus for clang-cl SIMD
Some checks failed
Mirror to GitHub / mirror (push) Failing after 35s
Build Release Binaries / build-amd64 (push) Has been cancelled
cargo-xwin drives the Windows MSVC cross-compile via clang-cl, under
which CMake sets MSVC=1 — causing libopus 1.3.1's `if(NOT MSVC)` guards
to skip the per-file `-msse4.1` / `-mssse3` COMPILE_FLAGS that its x86
SIMD source files need. Clang-cl (unlike real cl.exe) still honors
Clang's target-feature system, so those files then fail to compile
with "always_inline function '_mm_cvtepi16_epi32' requires target
feature 'sse4.1'" errors across silk/NSQ_sse4_1.c, NSQ_del_dec_sse4_1.c,
and VQ_WMat_EC_sse4_1.c.

Earlier attempts to fix this downstream (cargo-xwin toolchain file,
override.cmake CMAKE_C_COMPILE_OBJECT <FLAGS> replace, CFLAGS env vars)
all failed because cargo-xwin rewrites override.cmake from scratch on
every `cargo xwin build` invocation and cmake-rs's -DCMAKE_C_FLAGS=
assembly happens before toolchain FORCE sets propagate.

Fixing it upstream at the source: vendor audiopus_sys 0.2.2 into
vendor/audiopus_sys, patch its bundled opus/CMakeLists.txt to introduce
an MSVC_CL var (true only when CMAKE_C_COMPILER_ID == "MSVC", i.e. real
cl.exe), and flip the eight `if(NOT MSVC)` SIMD guards to
`if(NOT MSVC_CL)`. Clang-cl then gets the GCC-style per-file flags and
the SSE4.1 sources build cleanly. Also flip the `if(MSVC)` global /arch
block at line 445 to `if(MSVC_CL)` so only cl.exe applies /arch:AVX and
clang-cl relies purely on per-file flags (no global/per-file mixing).

Wire via [patch.crates-io] in the workspace root Cargo.toml; the patch
is resolved relative to the workspace root as `vendor/audiopus_sys`.

Upstream context: xiph/opus#256, xiph/opus PR #257 (both stale).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-10 14:12:59 +04:00
Siavash Sameni
53f57eea07 fix(windows): printf instead of heredoc in Dockerfile RUN (parser hated <<EOF)
Some checks failed
Mirror to GitHub / mirror (push) Failing after 37s
Build Release Binaries / build-amd64 (push) Failing after 3m56s
2026-04-10 13:05:04 +04:00
Siavash Sameni
ff3f7e8e4f fix(windows): patch override.cmake not toolchain — inject SSE via COMPILE_OBJECT template
Some checks failed
Mirror to GitHub / mirror (push) Failing after 38s
Build Release Binaries / build-amd64 (push) Has started running
The previous 'patch the toolchain file' approach (234a798, 48d2bd4) did
write the SSE flags into the COMPILE_FLAGS list correctly in the baked
image, but the CMakeCache.txt from the libopus configure ended up
without them in CMAKE_C_FLAGS, so cmake's final compile commands
didn't see them either. Most plausible explanation: cmake-rs passes
`-DCMAKE_C_FLAGS=…` on the command line, and its assembly of that
string happens outside the toolchain's FORCE set path, so the
toolchain patch never propagated.

Switch to a different lever: cargo-xwin already ships a tiny
`override.cmake` loaded via CMAKE_USER_MAKE_RULES_OVERRIDE. That
file is the right place to manipulate the compile-command
`CMAKE_C_COMPILE_OBJECT` / `CMAKE_CXX_COMPILE_OBJECT` templates —
it runs after cmake has initialised its compile rules but before
any source is compiled. Append two string(REPLACE '<FLAGS>' '<FLAGS>
/clang:-msse4.1 /clang:-mssse3 /clang:-msse3 /clang:-msse2') lines
to that file so every C and C++ compile command generated by cmake
gets the SSE feature flags inline, no matter what the project's
CMAKE_C_FLAGS is set to.

This is the CMake equivalent of a compiler wrapper and works
regardless of how cmake-rs / cargo-xwin / libopus juggle their
respective flag variables.
2026-04-10 13:03:06 +04:00
Siavash Sameni
48d2bd4f65 fix(windows): bake SSE patch into docker image instead of runtime
Some checks failed
Mirror to GitHub / mirror (push) Failing after 39s
Build Release Binaries / build-amd64 (push) Failing after 3m40s
2026-04-10 12:55:48 +04:00
Siavash Sameni
234a798df2 fix(windows): append SSE flags as a pure-CMake block to xwin toolchain
Some checks failed
Mirror to GitHub / mirror (push) Failing after 36s
Build Release Binaries / build-amd64 (push) Failing after 3m34s
The previous sed-based patch didn't stick in the docker-bash-c
heredoc (bash single-quoting made the newline escaping fragile).
Switch to a much simpler approach: just 'cat >>' a pure-CMake block
to the end of the cargo-xwin toolchain file. The block does:

    set(CMAKE_C_FLAGS   "${CMAKE_C_FLAGS}   /clang:-msse4.1 ..." CACHE STRING "" FORCE)
    set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} /clang:-msse4.1 ..." CACHE STRING "" FORCE)

Running AFTER the toolchain's own FORCE-set and AFTER cmake-rs's
-DCMAKE_C_FLAGS= command-line override, it unconditionally wins. No
sed, no awk, no python, no newline escaping — just CMake reading the
toolchain file like it normally does.

Idempotent via the WZP_SSE_PATCH sentinel grep in the comment block.
2026-04-10 12:50:00 +04:00
Siavash Sameni
fa042b130c fix(windows): sed-patch cargo-xwin toolchain to enable SSE4.1/SSSE3
Some checks failed
Mirror to GitHub / mirror (push) Failing after 37s
Build Release Binaries / build-amd64 (push) Failing after 3m51s
The CFLAGS_x86_64_pc_windows_msvc env-var approach from 990b6f1 did
nothing — cargo-xwin ships its own clang-cl cmake toolchain file at
~/.cache/cargo-xwin/cmake/clang-cl/x86_64-pc-windows-msvc-toolchain.cmake
which hardcodes COMPILE_FLAGS and FORCE-overrides CMAKE_C_FLAGS. Any
env-var CFLAGS gets dropped before opus's cmake build sees it.

The only place that actually makes it into every C file compilation
in the libopus subbuild is the toolchain file itself. Patch it in
place with an idempotent sed that appends

    /clang:-msse4.1
    /clang:-mssse3
    /clang:-msse3
    /clang:-msse2

right before the closing paren of the COMPILE_FLAGS setter. The patch
is marked with a WZP_SSE_PATCH comment so re-runs skip it.

Confirmed the error message matches with/without the env var — same
20 clang errors from NSQ_del_dec_sse4_1.c / NSQ_sse4_1.c before and
after 990b6f1, which is how we ruled out the env-var path.
2026-04-10 12:43:36 +04:00
Siavash Sameni
990b6f1ee0 fix(windows): set CFLAGS +sse4.1 +ssse3 so audiopus_sys builds under clang-cl
Some checks failed
Mirror to GitHub / mirror (push) Failing after 39s
Build Release Binaries / build-amd64 (push) Has been cancelled
libopus ships per-file SSE4.1 / SSSE3 C sources (opus/silk/x86/NSQ_del_dec_sse4_1.c
etc.) that assume the compiler picks up `-msse4.1` / `-mssse3` as per-file
CMake COMPILE_FLAGS. With clang-cl those bare -m flags are silently dropped,
so _mm_cvtepi16_epi32 + friends fail compile with 'always_inline function
requires target feature sse4.1, but would be inlined into a function that
is compiled without support for sse4.1'.

Workaround: set CFLAGS_x86_64_pc_windows_msvc + CXXFLAGS_x86_64_pc_windows_msvc
to `/clang:-msse4.1 /clang:-mssse3 /clang:-msse3 /clang:-msse2` before running
cargo xwin build. Every x86_64 Windows CPU shipped since 2008 has these
instruction sets so globally enabling them on this target is safe.

Also bump the tail -30 on cargo xwin output to tail -50 so the actual
compiler errors (not just the cmake wrapper panic) make it into the
ntfy / remote log file next time.
2026-04-10 12:40:38 +04:00
Siavash Sameni
7949266e11 windows: docker + hcloud build scripts for cross-compile
Some checks failed
Mirror to GitHub / mirror (push) Failing after 38s
Build Release Binaries / build-amd64 (push) Failing after 3m59s
Two parallel paths to build wzp-desktop.exe for x86_64-pc-windows-msvc:

scripts/Dockerfile.windows-builder
  Debian 12 base, matches scripts/Dockerfile.android-builder's layout:
  - apt: build-essential, cmake, ninja-build, llvm, clang, lld, nasm,
    libssl-dev, node 20 LTS
  - rust stable + x86_64-pc-windows-msvc target
  - cargo-xwin pre-installed
  - Pre-warmed ~/.cache/cargo-xwin layer: creates a throwaway cargo
    project and runs `cargo xwin build` once during image build so the
    MSVC CRT + Windows SDK (~1.5 GB) is baked into an image layer.
    Saves ~4 minutes off every cold cross-compile run.
  - Builder user uid 1000 to match existing bind-mount perms on
    SepehrHomeserverdk.

scripts/build-windows-docker.sh
  Same pattern as scripts/build-tauri-android.sh but for Windows:
  - Fires a remote build on SepehrHomeserverdk via ssh + heredoc
  - Mounts the shared cargo-registry + cargo-git cache + a
    target-windows dir (separate from the android target cache so
    different triples don't stomp each other)
  - Runs npm install + npm run build for the frontend dist, then
    cargo xwin build --release --target x86_64-pc-windows-msvc
    --bin wzp-desktop inside the container
  - Uploads the resulting .exe to rustypaste (via the .env token on
    the remote, same as android script) and fires ntfy.sh/wzp
    notifications at start + completion
  - scp's the .exe back to target/windows-exe/wzp-desktop.exe locally
  - --image-build flag triggers a fire-and-forget `docker build` of
    the Dockerfile.windows-builder on the remote (used once after the
    Dockerfile changes). The image is already built at the moment of
    this commit — sha256:f3895cb2fde7

scripts/build-windows-cloud.sh
  Kept as an alternative cross-compile path using a fresh Hetzner VM
  (cx33, 8 vCPU, 8 GB — bumped from cx23 after the smaller size OOM'd
  mid-rustc). The docker-on-SepehrHomeserverdk path is now the
  preferred fast path because the image has a pre-warmed xwin cache
  and a persistent cargo target volume, making warm builds ~3 minutes
  vs the cloud path's ~20 minutes cold each run. The cloud script
  stays around for when we want a truly isolated environment.

Both scripts notify via ntfy.sh/wzp and upload to paste.dk.manko.yoga
so the user can pick up the artefact + see status without polling.
2026-04-10 12:35:02 +04:00
Siavash Sameni
d774f5f8c5 feat(history): dedupe by call_id + explicit Incoming/Outgoing/Missed labels
Some checks failed
Mirror to GitHub / mirror (push) Failing after 37s
Build Release Binaries / build-amd64 (push) Has been cancelled
User reported that outgoing direct calls from macOS show up in the
history list as "missed" even when the call completes successfully.
Adds two changes to fix / diagnose:

1. history::log now dedupes by call_id. If an entry for this call_id
   already exists in the store, it updates the existing row's
   direction + timestamp in place instead of appending a duplicate.
   Protects against double-emit (caller side adding Missed on top of
   Placed, or any future signal loop that fires twice). One row per
   call_id, which matches what the user intuitively expects.

2. history::log now logs every write with tracing::info — call_id,
   peer_fp, direction, alias. Plus an extra line when we replace
   an existing entry: "history::log replacing existing entry
   from=Placed to=Missed" etc. Makes it easy to see in the desktop
   stderr which side is writing what, so we can find the outgoing =>
   missed regression immediately if it recurs.

3. main.ts now renders an explicit text label next to the direction
   arrow: "Outgoing", "Incoming", or "Missed" instead of just the ↗
   ↙ ✗ icons. Removes any ambiguity about what the icon means so
   future users can't misread a Placed entry as Missed based on icon
   shape alone.

Side fix for scripts/build-windows-cloud.sh:
- die() and the do_full ERR trap now respect WZP_KEEP_VM=1 so a failed
  build doesn't auto-destroy the debug VM (previously the trap fired
  before the KEEP_VM check and tore down the VM on any error).
- Bump default server type cx23 → cx33. 4GB RAM is not enough for a
  cold tauri + rustls + quinn + wzp-client cross-compile — the cx23
  run got "Read from remote host ... Connection reset by peer"
  partway through rustc, which is the classic signature of an OOM
  kill on the SSH session. cx33 has 8GB RAM and 8 vCPU which should
  comfortably fit the build.
2026-04-10 12:34:19 +04:00
Siavash Sameni
2fd94651e4 fix(desktop): direct calls used wrong identity file — mac identity mismatch
Some checks failed
Mirror to GitHub / mirror (push) Failing after 37s
Build Release Binaries / build-amd64 (push) Failing after 3m40s
The non-Android branch of CallEngine::start loaded the seed from
\$HOME/.wzp/identity directly, while register_signal in lib.rs goes
through the shared load_or_create_seed() helper which resolves via
APP_DATA_DIR → Tauri's app_data_dir(). On macOS those are two
completely different files:

  register_signal → ~/Library/Application Support/com.wzp.desktop/.wzp/identity
  CallEngine::start (old) → ~/.wzp/identity

On a fresh install they end up holding two different random seeds.
Register and CallEngine then derive two different fingerprints from
those seeds, and when a direct call comes in the relay routes it to
"you" under the register_signal fingerprint, but once CallEngine tries
to join the call-* room it advertises a DIFFERENT fingerprint — which
fails the call_registry ACL check on the relay side (only the two
authorised participants of a call can join its room). Silent hang, the
call never completes.

Android hit this bug earlier in the week and was fixed by switching
its CallEngine::start branch to `crate::load_or_create_seed()`.
Backport the same single-line change to the desktop branch so both
platforms share one identity source of truth.

Also bring the desktop branch up to parity with the android branch on
diagnostic logging:
- log CallEngine::start entry with relay/room/alias/quality/has_reuse
- log endpoint.local_addr on reuse / create
- log "QUIC connection established, performing handshake" between
  connect() and perform_handshake() so a hang at either step is
  immediately localisable
- map_err all three potential failure points (create_endpoint,
  connect, perform_handshake) to an explicit error! trace
2026-04-10 12:15:23 +04:00
Siavash Sameni
da09fdb6e9 windows(desktop): gate coreaudio / VoiceProcessingIO to macOS-only targets
Some checks failed
Mirror to GitHub / mirror (push) Failing after 37s
Build Release Binaries / build-amd64 (push) Failing after 3m34s
First step of the Windows x86_64 desktop build: stop pulling
coreaudio-rs into the Windows dependency graph so the project can at
least run `cargo check --target x86_64-pc-windows-msvc`. Software AEC
is already disabled in engine.rs so there's nothing else to stub — the
macOS-specific VPIO path is skipped via #[cfg(target_os = "macos")] on
both sides and Windows falls through to the plain CPAL
AudioCapture/AudioPlayback branch that already existed.

crates/wzp-client/Cargo.toml
  - coreaudio-rs optional dep moved under [target.'cfg(target_os = "macos")']
  - `vpio` feature now uses `dep:coreaudio-rs` syntax and the gated dep
  - Enabling `vpio` on Windows/Linux is a no-op at resolution time

crates/wzp-client/src/lib.rs
  - `pub mod audio_vpio` is now #[cfg(all(feature = "vpio", target_os = "macos"))]
  - Previously `vpio` alone was enough to try to compile the Core Audio
    bindings, which would fail on non-Apple targets the moment the
    feature flag was flipped on

desktop/src-tauri/Cargo.toml
  - [target.'cfg(not(target_os = "android"))'] removed — was leaking
    vpio into Windows/Linux via the catch-all.
  - macOS: wzp-client with features = ["audio", "vpio"]
  - Windows: wzp-client with features = ["audio"]
  - Linux: wzp-client with features = ["audio"]
  - Android: wzp-client with default-features = false (unchanged)
  - Dropped the unused direct coreaudio-rs = "0.11" dep on macOS —
    wzp-desktop's own sources never call Core Audio directly.

Verified via `cargo tree --target x86_64-pc-windows-msvc -p wzp-desktop`
that the Windows target now resolves wzp-client with cpal but without
coreaudio-rs. macOS target still resolves with coreaudio (direct via
vpio feature and transitively via cpal). macOS `cargo check` still
builds cleanly.

Cross-compile from macOS hit a cargo-xwin + llvm-lib setup issue in
ring's build.rs, so the actual `cargo check --target
x86_64-pc-windows-msvc` did not complete locally. Build verification
belongs on the user's Windows x86_64 host where MSVC is present
natively.

See tasks #23 (this one), #24 (Voice Capture DSP / WASAPI Communications
for OS-level AEC on Windows), and #25 (aarch64-pc-windows-msvc support).
2026-04-10 11:12:08 +04:00
Siavash Sameni
510eae2089 feat(direct-call): call history, recent contacts, deregister button
Some checks failed
Mirror to GitHub / mirror (push) Failing after 39s
Build Release Binaries / build-amd64 (push) Failing after 3m41s
Persistent JSON-backed call history for the direct-call screen so users
can see what they've placed / received / missed and dial back with one
click. Also fixes two small latent UX issues reported alongside.

Backend (Rust)
- new crate/module desktop/src-tauri/src/history.rs: thread-safe in-
  process store (OnceLock<RwLock<Vec<CallHistoryEntry>>>) backed by
  <APP_DATA_DIR>/call_history.json. Atomic writes via temp+rename. Max
  200 entries, FIFO pruning. CallDirection { Placed, Received, Missed }.
- Log hooks in the signal loop + commands:
    * place_call     → Placed entry (with target fingerprint)
    * DirectCallOffer → Missed entry up front; upgraded to Received
                        inside answer_call when accept_mode != Reject
                        via history::mark_received_if_pending(call_id).
                        If user rejects or never answers, it stays Missed.
- New Tauri commands:
    * get_call_history()     → all entries, newest first
    * get_recent_contacts()  → unique peers by fp, newest interaction first
    * clear_call_history()   → wipes JSON + in-memory
    * deregister()           → tears down signal transport + endpoint
  Backend emits `history-changed` events so the UI can live-refresh
  without polling.

Frontend (main.ts + index.html + style.css)
- Direct-call panel now has:
    * Recent contacts chip row (top 6 unique peers). Click a chip → dial.
    * Call history list (up to 50 rows). Direction icon (↗ placed, ↙
      received, ✗ missed), peer alias/fp, relative timestamp, callback
      button. Both click handlers populate target-fp and fire place_call.
    * Deregister button in the "registered" header — calls the new
      deregister command, tears down the signal transport, returns the
      UI to the pre-register state.
    * Clear-history link in the history header.
- Subscribes to `history-changed` events so the list updates the moment
  the backend logs a new entry. Also refreshed on register + after a
  clear.
- Nothing is rendered until there is data — empty sections stay hidden.

Tasks #20 + #21 (small UX items bundled in)
- Default room "general" for new installations: the html input value
  attribute is now "general" and loadSettings() defaults match. Existing
  users' localStorage still wins.
- Random alias on desktop: already latent but confirmed working — the
  startup IIFE at main.ts:374 calls get_app_info() and prefills the
  alias input from derive_alias(seed) when the input is empty. No code
  change needed, just verified it flows through the same path as the
  Android client.

Known follow-ups (deferred to step 6 polish)
- Call duration tracking (currently all entries have no duration field)
- Hangup signal from an unanswered incoming should emit history-changed
  so the missed state is visible even when the user never tapped accept
- Android UI layout fit-check on the smaller Nothing screen
2026-04-10 11:03:36 +04:00
Siavash Sameni
76a4c53e21 fix(android-audio): spawn_blocking for Oboe restart — unblock tokio executor
Some checks failed
Mirror to GitHub / mirror (push) Failing after 36s
Build Release Binaries / build-amd64 (push) Failing after 3m37s
Build 4c6aac6 added a stop+sleep+start Oboe restart inside the
set_speakerphone Tauri command, but calling wzp_native::audio_stop()
and audio_start() synchronously from an async fn blocks the tokio
executor thread — those FFI calls wait for AAudio to finalise the
stream teardown/bringup, which takes ~400ms each on Nothing phone
(Pixel is fast enough to hide the bug).

Reproduced on Nothing: 7 rapid Speaker button clicks across ~30
seconds, each restarting Oboe. After the 5th click the engine send
and recv tokio tasks froze for 22 seconds — decoded_frames stuck at
1159 across 9 heartbeats, send_drops growing from 148 to 1720 as
encoded frames couldn't make it past `send_t.send_media(pkt).await`.
At 08:40:48 the runtime finally caught up and processed a 911-frame
burst at once (buffered QUIC datagrams flooding through). Classic
"blocking sync call in async context" anti-pattern.

Fix: run the stop + start sequence inside tokio::task::spawn_blocking
so the Oboe teardown + reopen happens on a dedicated blocking thread,
leaving the tokio runtime free to keep driving the send and recv
tasks. AAudio's requestStop returns only after the stream is actually
in Stopped state, so the explicit sleep that bridged stop and start
is no longer needed and is dropped.

Send and recv tasks still see a ~500ms window of empty reads /
partial writes during the blocking restart, but they get SCHEDULED
through it — network packets keep being received + decoded + dropped
into the playout ring, and captured mic samples keep being encoded +
sent through quinn. No more executor starvation, no more 22-second
audio dropouts, no more send_drops burst.

Pixel still worked before this fix only because its AAudio teardown
is fast enough to not exceed the scheduler's cooperative yield
interval — same bug was latent on both devices, Nothing just made it
visible.
2026-04-10 08:45:54 +04:00
Siavash Sameni
4c6aac654a fix(android-audio): restart Oboe on speakerphone toggle + unbreak button UI
Some checks failed
Mirror to GitHub / mirror (push) Failing after 39s
Build Release Binaries / build-amd64 (push) Failing after 3m25s
Build 4f2ad65 wired the Speaker button to AudioManager.setSpeakerphoneOn
but user testing found that flipping speakerphone on an active Oboe
VoiceCommunication stream silently tears down the AAudio streams on
Pixel-class devices — both capture and playout stop producing data.
Only ending the call and rejoining brings audio back (because the fresh
Oboe open runs with the new routing already applied).

Also the earpiece state showed up red in the UI because the button was
getting the `.muted` CSS class when speakerphoneOn=false. Earpiece is a
valid routing state, not a muted one.

Fix set_speakerphone Tauri command:
  1. Flip AudioManager.setSpeakerphoneOn via JNI (as before).
  2. If the Oboe backend is currently running, stop it, sleep 50 ms to
     let AAudio finalise the transition, then start it again. The Rust
     send/recv tokio tasks keep running across the gap — they just read
     zero samples and write into the preserved ring buffers for a few
     frames, which is acceptable. The AudioBackend singleton's ring
     state is preserved across stop+start because it's in a 'static
     OnceLock.
  3. Debounce the UI click via speakerphoneBusy + spkBtn.disabled so
     users can't queue up multiple toggles during the restart window.

Fix main.ts Speaker button:
  - Remove the `.muted` classList toggle (added `.speaker-on` for CSS).
  - Update label text to "🔊 Speaker" / "🔈 Earpiece" for clarity.
  - On showCallScreen(), invoke is_speakerphone_on to sync the label
    with the real AudioManager state, so it matches reality after a
    rejoin (which was another symptom the user hit — the button label
    desynced from the actual routing after ending and restarting a
    call).
  - Debounce click + disable button while the restart is in flight.

Drops #[allow(dead_code)] from wzp_native::audio_is_running now that it
is actually called from the set_speakerphone restart guard.
2026-04-10 07:35:12 +04:00
Siavash Sameni
4f2ad65418 fix(android_audio): add explicit pointer types for .cast() — was rejected by rustc E0282 on android target
Some checks failed
Mirror to GitHub / mirror (push) Failing after 37s
Build Release Binaries / build-amd64 (push) Failing after 4m6s
2026-04-09 22:02:48 +04:00
Siavash Sameni
0178cbd91d android(audio): Speaker button toggles earpiece↔speaker via JNI (WIP, untested)
Some checks failed
Mirror to GitHub / mirror (push) Failing after 39s
Build Release Binaries / build-amd64 (push) Has been cancelled
Build 9e37201 confirmed on-device that Usage::VoiceCommunication +
MODE_IN_COMMUNICATION + speakerphoneOn=false routes Oboe playout to the
handset earpiece and the callback drains the ring correctly. Next step:
let the user flip speakerphoneOn at runtime so the existing Speaker
button actually switches audio routing instead of just gating writes.

- Cargo.toml (android target): pull in `jni = 0.21` and
  `ndk-context = 0.1`. Both are already transitively in the lockfile
  via Tauri/Wry, so this just promotes them to direct deps.
- desktop/src-tauri/src/android_audio.rs: new module. Grabs the JavaVM +
  current Activity from `ndk_context::android_context()`, attaches a
  JNI thread, calls `activity.getSystemService("audio")` to get the
  AudioManager, and exposes `set_speakerphone(bool)` +
  `is_speakerphone_on()` helpers that call the AudioManager method of
  the same name. All gated behind `#[cfg(target_os = "android")]`.
- lib.rs: adds `mod android_audio;` (android only), two new Tauri
  commands `set_speakerphone(on)` and `is_speakerphone_on()` — desktop
  gets no-op stubs so the same frontend invoke() works everywhere.
  Both registered in the invoke_handler.
- desktop/src/main.ts: the Speaker button (previously toggled the
  playout-write gate via `toggle_speaker`) now calls `set_speakerphone`
  and reads back the new routing state. Labels switched from
  "Spk" / "Spk Off" to "Earpiece" / "Speaker" so users can't be
  confused into thinking clicking turns audio off. pollStatus no longer
  clobbers the spkBtn label based on engine spk_muted, since the two
  concepts are now decoupled.

WIP because this has NOT been built or tested yet — committing at night
to save the work. Tomorrow: build #50 with this change, smoke-test the
Handset↔Speaker toggle, then move on to call history + last-contacts UI
and the Speaker-button mute bug on the other phone.
2026-04-09 22:00:34 +04:00
Siavash Sameni
9e37201198 android(audio): Usage::VoiceCommunication + MODE_IN_COMMUNICATION, default handset
Some checks failed
Mirror to GitHub / mirror (push) Failing after 38s
Build Release Binaries / build-amd64 (push) Failing after 3m44s
With da106bd (Usage::Media + MODE_NORMAL) audio works but is always on
the loudspeaker — we want handset as the default with a user-driven
toggle for speaker (and later bluetooth). The right Oboe usage for a
VoIP app is VoiceCommunication, which honours
AudioManager.setSpeakerphoneOn / setBluetoothScoOn for routing.

Bisection across previous builds showed that setAudioApi(AAudio) +
Usage::VoiceCommunication made the playout callback stop draining the
ring after cb#0 (build 8c36fb5 logs). Letting Oboe pick the AudioApi
implicitly keeps the callback alive — 96be740's Media-usage callbacks
fired at steady 50Hz without any explicit setAudioApi. So: keep the
Usage change, DROP the explicit AAudio force.

- oboe_bridge.cpp: Usage::VoiceCommunication, no setAudioApi, no
  ContentType override.
- MainActivity.kt: setMode(MODE_IN_COMMUNICATION) +
  setSpeakerphoneOn(false) = handset default, plus max both
  STREAM_VOICE_CALL and STREAM_MUSIC volumes for belt-and-braces.

Next build will add a JNI-based Tauri command to flip speakerphoneOn
at runtime so the user can toggle handset↔speaker during a call.
2026-04-09 21:50:06 +04:00
Siavash Sameni
da106bd939 fix(android-audio): revert to 96be740's Oboe config — VoiceCommunication broke callback drain
Some checks failed
Mirror to GitHub / mirror (push) Failing after 40s
Build Release Binaries / build-amd64 (push) Failing after 3m45s
Build 8c36fb5 logs showed a new regression: Oboe playout cb#0 fires once
at startup then the callback STOPS DRAINING the ring entirely.
written_samples sticks at 7679 (= RING_CAPACITY - 1) across every recv
heartbeat in a 40-second test. Meanwhile the recv task decodes 1800+ real
audio frames (sample range up to [-27920..31907], rms 12065) which all
get dropped on the floor by audio_write_playout returning 0 because the
ring is full.

Bisection: 96be740 (Usage::Media, no setAudioApi, no ContentType, no
MainActivity audio mode change) DID drive the playout callback at the
expected 50Hz (playout heartbeat: calls=1100 total_played_real=1055040
over 22 seconds). User still heard nothing there because of OS routing,
but at least Oboe accepted the PCM.

8c36fb5 added three changes on top of 96be740:
  1. Oboe Usage::Media → Usage::VoiceCommunication
  2. Oboe setAudioApi(oboe::AudioApi::AAudio) explicit
  3. Oboe setContentType(ContentType::Speech)
  4. MainActivity setMode(MODE_IN_COMMUNICATION) + setSpeakerphoneOn(true)
Every one of those could have killed the callback; combined they did.

Revert to 96be740's exact Oboe config: Usage::Media, no setAudioApi, no
ContentType. Keep the PCM recorder, heartbeat logging, and stream-open
logging. Separately, MainActivity now maxes STREAM_MUSIC (the stream
Usage::Media routes to) but leaves audio mode in MODE_NORMAL — no more
speakerphone/call-mode combo that makes Oboe unhappy. In NORMAL mode a
STREAM_MUSIC stream plays through the loud speaker by default.

Proof that the Rust pipeline is perfect: decoded.pcm recorded in 8c36fb5
was pulled via `adb shell run-as com.wzp.desktop cat .wzp/decoded.pcm`,
converted with ffmpeg, and played back on the Mac — user confirmed
audible speech. So 100% of the remaining bug surface is Android audio
routing, not anything in the Rust/C++ decode path.
2026-04-09 21:38:19 +04:00
Siavash Sameni
8c36fb5651 fix(wzp-native): Oboe ResultWithValue has no value_or, unfold explicitly
Some checks failed
Mirror to GitHub / mirror (push) Failing after 37s
Build Release Binaries / build-amd64 (push) Failing after 3m55s
cc-rs build of oboe_bridge.cpp failed at cfa9ff6 because the Oboe
ResultWithValue<T> template returned by getXRunCount() does not have
a .value_or(T) method — only .value(). Replace with an explicit
bool-conversion + .value() guard that yields -1 on error.
2026-04-09 21:25:38 +04:00
Siavash Sameni
cfa9ff67cf fix(android-audio): VoIP mode + speakerphone + debug PCM recorder
Some checks failed
Mirror to GitHub / mirror (push) Failing after 40s
Build Release Binaries / build-amd64 (push) Has been cancelled
Build 96be740 logs proved the entire software pipeline is healthy:
  capture heartbeat:   calls=1100 to_write=960 full_drops=0 total_written=1056000
  recv heartbeat:      decoded_frames=1035 last_written=960 decode_errs=0
  recv decoded PCM:    range=[-13564..9244] rms=8044 (real audio)
  playout WRITE:       in_len=960 written=960 rms=2318 (real audio into the ring)
  playout heartbeat:   calls=1100 nonempty=1099 total_played_real=1055040

1055040 samples / 48000 Hz = 22s — exactly matches wall-clock elapsed,
meaning Oboe IS calling our playout callback at the expected rate and
WE ARE handing it real PCM every 20ms. User still heard nothing. Ergo
Oboe accepted the PCM and routed it to a silent output. Two fixes:

1) MainActivity.kt: switch to MODE_IN_COMMUNICATION + speakerphone ON
   right after permissions are granted, and crank STREAM_VOICE_CALL to
   max. Without this, an Oboe Usage::VoiceCommunication stream gets
   opened, the OS creates a real AAudio pipeline, the callback fires on
   schedule — and audio goes to either the earpiece at muted volume or
   a "call not active" dead end. Logs the audio mode + volume levels
   before and after the switch so we can confirm the state change in
   logcat next run.

2) oboe_bridge.cpp: revert Usage::Media → VoiceCommunication (the mode
   that matches MODE_IN_COMMUNICATION), pin the audio API to AAudio
   explicitly instead of letting Oboe fall back to OpenSLES (which has
   its own silent-drop failure modes on some devices), and add getState
   + getXRunCount to the playout heartbeat so we'll see silent stream
   disconnects instead of reading zeros forever.

3) engine.rs recv task: dump the first ~10s of post-AGC decoded PCM to
   `<app_data_dir>/decoded.pcm` as raw i16 LE so we can adb pull it and
   play it back locally:
       adb shell run-as com.wzp.desktop cat .wzp/decoded.pcm > decoded.pcm
       ffmpeg -f s16le -ar 48000 -ac 1 -i decoded.pcm decoded.wav
   This divorces "is our decoder actually producing audible audio" from
   "is Android's audio stack playing it". If the recorded WAV sounds
   correct when played on a laptop, the decoder is fine and 100% of the
   remaining bug surface is AudioManager / Oboe routing.

4) engine.rs: also log when spk_muted=true blocks the write. User
   reported the Speaker button in the UI has inconsistent semantics
   between desktop and android — adding this log rules out the accidental
   "first click muted playback" theory for good.
2026-04-09 21:24:26 +04:00
Siavash Sameni
96be740fd9 diag(android-audio): aggressive logging across the whole Oboe pipeline
Some checks failed
Mirror to GitHub / mirror (push) Failing after 40s
Build Release Binaries / build-amd64 (push) Failing after 3m46s
User confirmed: mac hears android, android does not hear mac. So Oboe
capture works end-to-end but Oboe playout on Android silently drops
audio even though QUIC forwards the packets. Archaeology on the legacy
wzp-android crate also revealed that the "last known good" Android audio
path NEVER used Oboe in production — it used Kotlin AudioRecord +
AudioTrack via JNI, and cpp/oboe_bridge.cpp was dead code. So every time
we've "tested" Oboe end-to-end this week was the first production use,
and any of its config knobs could be the bug.

Instrumenting every stage of the pipeline so one smoke-test log dump can
isolate the layer at fault:

C++ (oboe_bridge.cpp)
  - Log the ACTUAL stream parameters after openStream for both capture
    and playout (sample rate, channels, format, framesPerBurst,
    framesPerDataCallback, bufferCapacityInFrames, sharing, perf mode).
    Oboe may silently override values we requested — e.g. if we ask for
    48kHz mono but the device gives us 44.1kHz stereo our 960-sample
    frames are the wrong duration and the pipeline drifts.
  - Capture callback: on cb#0 log sample range+RMS of the first frame
    to prove we get real mic data (not zeros). Every 50 callbacks
    (~1s at 20ms burst) log calls, numFrames, ring available_write,
    bytes actually written, ring_full_drops, total_written.
  - Playout callback: on cb#0 log numFrames + ring state. On the FIRST
    non-empty read log sample range+RMS so we can tell if the samples
    coming out of the ring are real audio or zeros. Every 50 callbacks
    log calls, nonempty count, numFrames, ring available_read,
    underrun_frames, total_played_real.

Rust wzp-native (src/lib.rs)
  - wzp_native_audio_write_playout now logs the first 3 writes and then
    every 50th: in_len, written, sample range, RMS, ring write/read
    cursors before, available_read and available_write after. Reveals
    ring-overflow and whether the engine is actually handing us audio.
  - Minimal android logcat shim via __android_log_write extern — no
    new crate dependency.
  - AudioBackend grows a `playout_write_log_count` AtomicU64 to gate
    the write-side log throttle.

Rust engine.rs (android branch)
  - Recv task: log sample range + RMS for the first 3 decoded PCM
    frames and then every 100th. Reveals whether decoder.decode is
    producing real audio or silent buffers.
  - Recv task: if audio_write_playout returns fewer samples than we
    handed it (partial write → ring nearly full) warn about it in the
    first 10 frames.
  - Recv heartbeat every 2s: recv_fr, decoded_frames, last_decode_n,
    last_written, written_samples, decode_errs, codec.

Expected flow in a healthy log:
  capture cb#0: numFrames=960 range=[-1200..900] rms=180          ← mic OK
  capture stream opened: actualSR=48000 Ch=1 ...                   ← no override
  playout stream opened: actualSR=48000 Ch=1 ...
  CallEngine::start invoked ... → connected → audio started
  recv: first media packet received ...
  recv: decoded PCM sample range decoded_frames=1 range=[-300..250] rms=92
  playout WRITE #0: in_len=960 written=960 range=[-300..250] rms=92
  playout FIRST nonempty read: to_read=960 range=[-300..250] rms=92
  playout heartbeat: calls=50 nonempty=50 underrun=0 ...
  recv heartbeat: decoded_frames=100 last_written=960 ...

If any of those are missing/zero we know the exact stage to fix.
2026-04-09 21:13:29 +04:00
Siavash Sameni
8c4d640f89 fix(android): playout Usage::Media + relay CallSetup advertises real IP
Some checks failed
Mirror to GitHub / mirror (push) Failing after 40s
Build Release Binaries / build-amd64 (push) Failing after 3m43s
Three real bugs, one smoke-test session's worth of progress.

1. RELAY: wrong advertised addr in CallSetup
   The direct-call CallSetup computed `relay_addr = addr.ip()` where
   `addr = connection.remote_address()` — i.e. the CLIENT'S IP, not the
   relay's. So the relay was telling both parties "the call room is at
   the answerer's IP:4433", which meant each client dialed either the
   other client (no server listening) or themselves. Both endpoint.connect
   calls hung forever and the call never happened.
   Fix: compute the relay's own advertised IP once at startup. If the
   listen addr is 0.0.0.0, probe the primary outbound interface via the
   classic UDP-bind-and-connect(8.8.8.8:80) trick to discover the LAN
   IP the OS would use to reach external hosts. Thread the resulting
   advertised_addr_str into the CallSetup sender for both parties.

2. RELAY: accept loop serialized QUIC handshakes
   Previously the main accept loop called `wzp_transport::accept` which
   did both `endpoint.accept().await` AND `incoming.await` (the server-
   side QUIC handshake). A single slow handshake therefore blocked every
   subsequent client from being accepted. Unroll the helper here and
   move `incoming.await` into the per-connection spawned task, so every
   handshake runs in parallel. Also log "accept queue: new Incoming",
   "QUIC handshake complete", and "QUIC handshake failed" so we can tell
   immediately whether a client's packets are reaching the relay at all.

3. ANDROID: playout was routed to the silent in-call stream
   The Oboe playout stream was configured with Usage::VoiceCommunication,
   which routes to the Android in-call earpiece stream. That stream is
   silent unless the Activity has called AudioManager.setMode(
   IN_COMMUNICATION) and, even then, only the earpiece/BT headset get
   audio (not the loud speaker). Result: android→mac calls worked
   because mac had a normal media output, but mac→android calls were
   silent even though packets flowed through the relay just fine.
   Switch to Usage::Media + ContentType::Speech so Oboe routes to the
   loud speaker and uses the media volume slider. A later polish step
   will wire setMode + setSpeakerphoneOn from MainActivity.kt so we can
   go back to VoiceCommunication for AEC and proximity-sensor routing.

Plus: heartbeat tracing every 2s in the send/recv tasks — frames_sent,
last_rms, last_pkt_bytes, short_reads on the send side; decoded_frames,
last_decode_n, last_written, decode_errs on the recv side. Will make the
next "no sound" regression trivial to localize.
2026-04-09 20:55:10 +04:00
Siavash Sameni
49f101d785 fix(android): reuse signal endpoint for direct-call media connection
Some checks failed
Mirror to GitHub / mirror (push) Failing after 38s
Build Release Binaries / build-amd64 (push) Failing after 3m46s
Direct-call accept hangs forever at the QUIC handshake on Android. Logs
from d7b37a5 showed:
  CallEngine::start (android) invoked relay=172.16.81.172:4433 room=call-…
  resolved relay addr
  identity loaded
  endpoint created, dialing relay   ← reached
                                    ← nothing, 90s+, no error
The "connect failed" and "QUIC connection established" log lines never
fire, meaning endpoint.connect_with(…).await never makes progress.

Repro is 100%: SFU room join (one endpoint) works perfectly; direct call
(opens a SECOND quinn::Endpoint on top of the signal one) hangs in the
QUIC handshake. Creating two quinn::Endpoints on Android's AAudio-adjacent
UDP stack apparently causes the second one's datagrams to never reach the
relay (the server never sees the Initial packet). Rather than fight the
platform, quinn is happy to multiplex multiple Connections on a single
Endpoint — so we reuse the signal endpoint for the media connection.

- SignalState now stores the quinn::Endpoint alongside the QuinnTransport.
  register_signal populates both at the same time.
- CallEngine::start (both android and desktop branches) takes an
  Option<wzp_transport::Endpoint>. Some → reuse (direct-call path, after
  register_signal). None → create fresh (SFU room join path).
- The connect tauri command reads state.signal.endpoint and threads it
  through to CallEngine::start, so the direct-call auto-connect (fired by
  the "setup" signal-event in main.ts) lands on the existing UDP socket.
- wzp_transport re-exports quinn::Endpoint so wzp-desktop doesn't need to
  depend on quinn directly.
- Also wraps the android connect in tokio::time::timeout(10s) so future
  hangs become deterministic "connect TIMED OUT" errors in logcat
  instead of silent deadlock.

Same fix applies verbatim to the desktop client — the user suspects
direct call is broken there too and this was likely always the cause,
just never surfaced because desktop was only tested via SFU rooms.
2026-04-09 20:29:51 +04:00
Siavash Sameni
d7b37a5749 diag: tracing for direct-call signal loop + CallEngine::start stages
Some checks failed
Mirror to GitHub / mirror (push) Failing after 38s
Build Release Binaries / build-amd64 (push) Failing after 3m57s
User reports tapping "answer" on an incoming direct call does nothing
visible, and suspects the same may affect desktop. The signal recv loop
had no tracing at all, so we can't tell whether CallSetup is being
received, whether the recv loop died silently, or whether
CallEngine::start is failing between "identity loaded" and
"connected to relay, handshake complete".

- register_signal recv loop now logs every message type with fields
  (CallRinging, DirectCallOffer, DirectCallAnswer, CallSetup, Hangup,
  unhandled), plus a warn! on recv errors and a final warn when the
  loop exits.
- place_call / answer_call commands log entry + success / error. The
  answer_call error path logs the underlying send_signal error so we
  can see it in logcat instead of only in the JS error toast.
- CallEngine::start android branch logs relay/room/alias on entry,
  logs "endpoint created, dialing relay" between create_endpoint and
  connect, "QUIC connection established, performing handshake" between
  connect and perform_handshake, and promotes all three potential
  failures to explicit error! logs so a silent hang / error becomes
  visible in logcat.

No functional changes — pure diagnostics. Stacks on b35a6b7 (the Oboe
stack-pointer-escape fix) so build #43 carries both.
2026-04-09 19:17:03 +04:00
Siavash Sameni
b35a6b7d92 fix(wzp-native): copy WzpOboeRings by value, not by pointer
Some checks failed
Mirror to GitHub / mirror (push) Failing after 36s
Build Release Binaries / build-amd64 (push) Failing after 3m41s
PlayoutCallback::onAudioReady crashed with SIGSEGV(SEGV_ACCERR) on the
first AAudio callback because g_rings was a `const WzpOboeRings*` pointing
at the caller's stack frame. wzp_native_audio_start() constructs the
rings struct as a stack local in Rust, passes &rings to wzp_oboe_start
(which stored the raw pointer), and returns — at which point the stack
frame unwinds and g_rings becomes a dangling reference. The first audio
callback then read from freed memory and died.

- g_rings is now a static WzpOboeRings value (was `const WzpOboeRings*`).
  The raw int16 buffer + atomic index pointers inside the struct still
  point into the Rust-owned AudioBackend singleton, which is leaked for
  the lifetime of the process, so deep-copying the struct by value is
  safe and keeps the inner pointers valid forever.
- g_rings_valid atomic bool gates the audio-callback reads: set to true
  after the value copy in wzp_oboe_start, cleared in wzp_oboe_stop BEFORE
  the streams are torn down so any in-flight callback sees "no backend"
  and returns Stop instead of racing on g_rings.
- All g_rings->x accesses in the capture + playout callbacks switched to
  g_rings.x (member-of-value).

Reproduced on Pixel 6 / Android 15 with build 0105b0f:
  F libc: Fatal signal 11 (SIGSEGV), code 2 (SEGV_ACCERR),
          fault addr 0x71aa717eb0 in tid 11822 (AudioTrack)
  #00 PlayoutCallback::onAudioReady(oboe::AudioStream*, void*, int)+120
  #01 oboe::AudioStream::fireDataCallback(void*, int)+136
  ...
2026-04-09 19:11:16 +04:00
Siavash Sameni
0105b0fbf3 phase 3(android): RECORD_AUDIO permission + runtime request in MainActivity
Some checks failed
Mirror to GitHub / mirror (push) Failing after 36s
Build Release Binaries / build-amd64 (push) Failing after 4m0s
Oboe fails silently to open the AAudio input stream without
android.permission.RECORD_AUDIO, so the call audio would never actually
flow even after phase 3's engine wiring.

- AndroidManifest.xml: declare RECORD_AUDIO and MODIFY_AUDIO_SETTINGS, and
  android.hardware.microphone as a required feature. These files are the
  cargo-tauri-generated scaffold — nothing in .gitignore excludes them, so
  the intended Tauri 2 mobile workflow is to commit them once populated.

- MainActivity.kt: override onCreate to call ActivityCompat.requestPermissions
  for the audio perms on first launch. The dialog shows exactly once; the
  grant is persisted per-package. onRequestPermissionsResult logs the
  outcome so we can spot failures in logcat.

A full native Tauri permission plugin integration is deferred to
Step 6 (polish) together with notifications, icon, and background service.
2026-04-09 19:00:12 +04:00
Siavash Sameni
5beea7de40 phase 3(android): unify connect/disconnect/toggle_*/get_status commands
Some checks failed
Mirror to GitHub / mirror (push) Failing after 37s
Build Release Binaries / build-amd64 (push) Failing after 3m49s
Step 3 of the Tauri Android rewrite was still returning "audio backend not
yet wired on Android (step 3)" because the cfg-gated Android stubs for
connect/disconnect/toggle_mic/toggle_speaker/get_status were shadowing the
real commands. Now that CallEngine::start() has a real Android body (phase
3, commit fdbe502), the gates are unnecessary.

- Drop the #[cfg(not(target_os = "android"))] gates from all five
  engine-backed Tauri commands.
- Delete the Android stub block (~50 LOC of "not connected" boilerplate).
- Ungate `use engine::CallEngine;` and the AppState.engine field so both
  targets share the same Mutex<Option<CallEngine>>.
- CallEngine::stop() now calls crate::wzp_native::audio_stop() on Android so
  the mic + speaker are released between calls, matching the desktop
  behaviour where dropping _audio_handle tears down CPAL.

Direct-call flow on Android: peer sends DirectCallOffer → user accepts via
answer_call → relay sends signal "setup" event → main.ts auto-invokes
connect(relay, room) → CallEngine::start() runs the Android branch →
wzp_native::audio_start() brings up Oboe → send/recv tasks stream PCM
through the dlopen boundary.
2026-04-09 18:53:54 +04:00
Siavash Sameni
fdbe502524 phase 3(android): wire CallEngine::start to wzp-native audio FFI
Some checks failed
Mirror to GitHub / mirror (push) Failing after 39s
Build Release Binaries / build-amd64 (push) Failing after 3m57s
Replaces the Android-side CallEngine::start() stub with a real implementation
that mirrors the desktop start() body but routes all PCM through the
standalone wzp-native cdylib loaded at startup via libloading instead of
using CPAL.

- desktop/src-tauri/src/wzp_native.rs: new module with a static
  OnceLock<libloading::Library> + cached raw fn pointers for every symbol
  we need (version, hello, audio_start/stop, read_capture, write_playout,
  is_running, capture/playout_latency_ms). init() resolves everything once
  at startup; accessors return default values if init() never ran.

- desktop/src-tauri/src/lib.rs: drop the inline dlopen smoke test, add
  `mod wzp_native;` behind target_os="android", and invoke
  wzp_native::init() from the Tauri setup() callback so the library is
  loaded + all symbols cached before any CallEngine can touch audio.

- desktop/src-tauri/src/engine.rs: the Android #[cfg] branch of
  CallEngine::start() now does the full QUIC handshake + signal loop +
  Opus send/recv tasks, calling wzp_native::audio_start() /
  audio_read_capture() / audio_write_playout() instead of the desktop
  CPAL rings. SyncWrapper now holds a placeholder Box<()> on Android
  because the audio backend lives in a process-global singleton inside
  libwzp_native.so rather than being owned per-engine.

Next step: build #39 on the remote docker builder and smoke-test on
Pixel 6 that the Connect button in the UI successfully brings up Oboe
and streams audio through the dlopen boundary.
2026-04-09 18:42:27 +04:00
Siavash Sameni
c769a476a2 phase 2(android): port Oboe C++ bridge + audio FFI into wzp-native
Some checks failed
Mirror to GitHub / mirror (push) Failing after 36s
Build Release Binaries / build-amd64 (push) Failing after 3m56s
Now that Phase 1 proved the split-cdylib pipeline (build #37 launched
cleanly with 'wzp-native dlopen OK: version=42 msg=...' in logcat),
this commit brings the real audio code into wzp-native without ever
touching the Tauri crate:

- cpp/oboe_bridge.{h,cpp}, oboe_stub.cpp, getauxval_fix.c copied
  verbatim from crates/wzp-android/cpp/ (same files that work in the
  legacy wzp-android .so on this phone)
- build.rs near-identical to crates/wzp-android/build.rs: clones
  google/oboe@1.8.1 into OUT_DIR, compiles oboe_bridge.cpp + all
  oboe source files as a single static lib with c++_shared linkage,
  emits -llog + -lOpenSLES. On non-android hosts it compiles just
  oboe_stub.cpp so `cargo check` works locally without an NDK.
- Cargo.toml gets cc = "1" in [build-dependencies]. This is SAFE
  because wzp-native is a single-cdylib crate — crate-type is only
  ["cdylib"], no staticlib, so rust-lang/rust#104707 does not apply.
- src/lib.rs extends the FFI surface with the real audio API:
    wzp_native_audio_start() -> i32
    wzp_native_audio_stop()
    wzp_native_audio_read_capture(*mut i16, usize) -> usize
    wzp_native_audio_write_playout(*const i16, usize) -> usize
    wzp_native_audio_capture_latency_ms() -> f32
    wzp_native_audio_playout_latency_ms() -> f32
    wzp_native_audio_is_running() -> i32
  Plus a static AudioBackend singleton holding the two SPSC ring
  buffers (capture + playout) that are shared with the C++ Oboe
  callbacks via AtomicI32 cursors. The wzp_native_version() and
  wzp_native_hello() smoke tests from Phase 1 are preserved.

Compiles cleanly on macOS host with the stub oboe .cpp. Next build
will exercise the full cargo-ndk path inside docker to verify the
whole Oboe compile still works standalone.

Phase 3 (next commit): wzp-desktop engine.rs on Android calls
wzp-native's audio FFI via the already-wired libloading handle, and
the real CallEngine::start() is implemented for Android using the
same codec/handshake/send/recv pipeline as desktop but with Oboe
rings instead of CPAL rings.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-09 18:12:01 +04:00
Siavash Sameni
7cc53aedc7 refactor(android): split C++ into wzp-native cdylib, loaded at runtime
Some checks failed
Mirror to GitHub / mirror (push) Failing after 38s
Build Release Binaries / build-amd64 (push) Failing after 3m34s
Phase 1 of the big refactor. Escape the Tauri Android
__init_tcb+4 symbol leak (rust-lang/rust#104707) by making
wzp-desktop's Android .so pure Rust — ZERO cc::Build, no cpp/ files,
no C++ in the rustc link step. All future C++ (Oboe audio bridge)
lives in a new standalone cdylib crate `wzp-native` which is built
with cargo-ndk (the same path the legacy wzp-android crate uses
successfully on the same phone + same NDK), copied into Tauri's
gen/android/app/src/main/jniLibs at build time, and dlopened by
wzp-desktop at runtime via libloading.

Changes in this commit:
- NEW crate crates/wzp-native/ with crate-type = ["cdylib"] only
  (no staticlib, no rlib — rust#104707 shows mixing staticlib with
  cdylib leaks non-exported symbols, which is the original bug
  source). Phase 1 scaffold has TWO extern "C" functions:
    wzp_native_version() -> i32            (returns 42)
    wzp_native_hello(buf, cap) -> usize    (writes a string)
  So we can verify dlopen + dlsym + cross-.so FFI end-to-end
  before adding any real C++.
- desktop/src-tauri/cpp/ directory DELETED (7 files gone).
- desktop/src-tauri/build.rs reduced to just the git hash capture
  + tauri_build::build(). No more cc::Build of any kind.
- desktop/src-tauri/Cargo.toml: drop cc from build-dependencies,
  add libloading = "0.8" as an Android-only runtime dep.
- desktop/src-tauri/src/lib.rs Builder::setup() now (on Android only)
  dlopens libwzp_native.so, calls wzp_native_version() and
  wzp_native_hello(), and logs the result:
    "wzp-native dlopen OK: version=42 msg=\"hello from wzp-native\""
  If this log appears in logcat when the app launches and the home
  screen still renders, the split-cdylib pipeline is validated and
  Phase 2 (port the Oboe bridge into wzp-native) can proceed.
- scripts/build-tauri-android.sh: insert a `cargo ndk -t arm64-v8a
  build --release -p wzp-native` step before `cargo tauri android
  build`, with `-o desktop/src-tauri/gen/android/app/src/main/jniLibs`
  so the resulting libwzp_native.so lands in the place gradle will
  package into the final APK.
- Workspace Cargo.toml: add crates/wzp-native to [workspace] members.

Phase 2 (separate commit, only if Phase 1 works):
- Copy cpp/oboe_bridge.{h,cpp} + getauxval_fix.c from the legacy
  wzp-android crate into crates/wzp-native/cpp/.
- Add cc = "1" as a build-dependency on wzp-native (safe: it's a
  single-cdylib crate with no staticlib, so no symbol leak).
- Add build.rs that compiles the Oboe C++ and the wzp-native Rust
  FFI exposes the audio start/stop/read/write functions.
- wzp-desktop::engine.rs dlopens wzp-native at CallEngine::start,
  uses its audio functions instead of CPAL on Android.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-09 18:02:53 +04:00
Siavash Sameni
711137da96 fix(android): -Wl,--exclude-libs,ALL + --no-whole-archive to stop symbol leak
Some checks failed
Mirror to GitHub / mirror (push) Failing after 36s
Build Release Binaries / build-amd64 (push) Failing after 3m54s
llvm-nm on the crashing .so confirmed the research's smoking gun theory:

  000000000130c1f0 t _Z10__init_tcbP10bionic_tcbP18pthread_internal_t
  0000000000000000 a pthread_create.cpp
  0000000001331108 t pthread_create

All lowercase 't' (= LOCAL text symbols), zero UND dynamic references
for pthread_create. So rustc's link step is pulling bionic's own
pthread_create.cpp compilation unit out of libc.a as a whole-archive
inclusion and binding those symbols locally inside our .so, instead
of letting them stay UND and resolved against libc.so at dlopen time.

Rust's libstd thread::spawn then calls the LOCAL (broken) pthread_create
which calls the LOCAL __init_tcb with arguments set up for bionic's
static-executable layout — crashes at __init_tcb+4 with SEGV_ACCERR.

`-Wl,--exclude-libs,ALL` tells the linker to make symbols from static
archives NOT appear in the dynamic symbol table of the output .so.
`-Wl,--no-whole-archive` tells it to only pull archive objects that
satisfy undefined references, not include the whole archive blindly.

If this works, the symbol table should show pthread_create as UND
(or at least not locally bound) and the app should launch. If it
doesn't, the remaining fallback is the research's action #3 —
extract the C++ into its own upstream cdylib crate built with
cargo-ndk, and dlopen it from the Tauri cdylib at runtime.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-09 17:45:35 +04:00
Siavash Sameni
6071eb1b02 fix(android): drop staticlib from crate-type — root cause of __init_tcb crash
Some checks failed
Mirror to GitHub / mirror (push) Failing after 42s
Build Release Binaries / build-amd64 (push) Failing after 3m47s
External research (per rust-lang/rust#104707) pointed at this as the
highest-probability cause of our byte-identical __init_tcb+4 /
pthread_create SIGSEGVs:

> Having 'staticlib' alongside 'cdylib' in crate-type leaks non-exported
> symbols from the staticlib into the cdylib's symbol table. For a
> Tauri Android cdylib, that means bionic's private pthread_create /
> __init_tcb code — which got pulled in statically from libc.a the
> moment any cc::Build C++ file added C++-linkage overhead — ends up
> bound LOCALLY inside our .so instead of being resolved dynamically
> against libc.so at dlopen time.

Symptoms that match the theory exactly:
- llvm-nm on the crashing .so shows __init_tcb and pthread_create as
  LOCAL symbols with C++ name mangling (bionic's own pthread_create.cpp)
- Adding any cc::Build cpp(true) step reliably triggers the crash,
  independent of which linker (android24-clang vs android26-clang) or
  which libc++ linkage (shared/static/none)
- The legacy wzp-android crate (["cdylib", "rlib"]) works fine on the
  same phone with the same NDK + Rust toolchain + Oboe C++ code
- tauri.conf.json bundle.android.minSdkVersion=26 propagates to
  gradle but the .so still crashes byte-identically

Drop 'staticlib' from crate-type. If we ever need it for iOS, re-add
behind a target.'cfg(target_os = "ios")' gate. The desktop binary
still links against the rlib, so the bin target on macOS/Linux/Windows
is unaffected.

Source: https://github.com/rust-lang/rust/issues/104707

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-09 17:38:49 +04:00
Siavash Sameni
c9cd043657 test: tauri.conf.json bundle.android.minSdkVersion=26 + cpp_smoke.cpp c++_shared
Some checks failed
Mirror to GitHub / mirror (push) Failing after 1m26s
Build Release Binaries / build-amd64 (push) Failing after 3m37s
User theory: tauri-cli hardcodes minSdkVersion=24 into its rustc
invocation regardless of gradle build.gradle.kts, .cargo/config.toml,
or env var overrides — but DOES read from tauri.conf.json's
bundle.android block. That would explain why every cc::Build C++
compile crashed with __init_tcb+4 via pthread_create: API-24 bionic's
.init_array routines for the linked-in .init_array clash with the
pthread_create state tao later expects.

This commit applies the fix AND re-adds the smallest known crashing
variant (E.1 with cpp_link_stdlib('c++_shared')) so the test has one
clear failure mode to compare against:

  tauri.conf.json bundle:
    "android": { "minSdkVersion": 26 }

  build.rs (on android target):
    - hello.c           (plain C, worked in Step A)
    - getauxval_fix.c   (plain C, worked in Step D)
    - hello2.c          (plain C, worked in Step D+1)
    - cpp_smoke.cpp     (C++ via cc::Build .cpp(true), crashed in E.1)

Also re-emits the libc++_shared.so copy into gen/android jniLibs so
the runtime linker can resolve the NEEDED entry cc-rs added via
cpp_link_stdlib('c++_shared').

If this launches → theory validated, proceed with Oboe integration.
If this crashes → need to keep digging.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-09 16:58:37 +04:00
Siavash Sameni
6dd62c94c9 step D+1: add third trivial C static lib (hello2.c)
Some checks failed
Mirror to GitHub / mirror (push) Failing after 38s
Build Release Binaries / build-amd64 (push) Failing after 3m51s
Step D (hello.c + getauxval_fix.c) launches cleanly. E.minus-1
(hello.c + getauxval_fix.c + cpp_smoke.c) crashes. All three are
plain-C trivial single-function files.

Theory: the regression is triggered by having 3 or more cc::Build
static libs in a Tauri Android cdylib, regardless of what the libs
contain. Test: clone hello.c as hello2.c (same content, different
symbol) and add a third cc::Build step compiling it. If this crashes,
the trigger is just the number of static libs. If it launches, there's
something magical about cpp_smoke.c specifically (unlikely — it was
near-identical content).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-09 16:51:50 +04:00
Siavash Sameni
4c998312aa regression check: revert build.rs to exact Step D state
Some checks failed
Mirror to GitHub / mirror (push) Failing after 37s
Build Release Binaries / build-amd64 (push) Failing after 3m37s
Verify the Step D baseline still launches after the environment mutations
we may have caused during the E bisection (docker image rebuild, tauri-cli
version drift, etc). Build.rs is now byte-identical to commit a852cad
(Step D) except for the git hash capture block that already existed at
that point.

If this launches cleanly → the cpp_smoke addition genuinely breaks
something, bisection continues.
If this crashes → the environment regressed between Step D and now,
and we need to rebuild the docker image to an earlier snapshot.
2026-04-09 16:45:34 +04:00
Siavash Sameni
22701830c2 step E.minus-1: cpp_smoke renamed to .c and compiled as plain C
Some checks failed
Mirror to GitHub / mirror (push) Failing after 39s
Build Release Binaries / build-amd64 (push) Failing after 3m53s
c++_shared crashed, c++_static crashed, no stdlib crashed. The remaining
variable isolated to cc::Build::new().cpp(true) itself is the C++
compile-mode invocation of clang++. Rename cpp_smoke.cpp → cpp_smoke.c
and drop .cpp(true), leaving a plain-C cc::Build that compiles the
exact same bytes (minus the 'extern "C"' linkage spec which is C++-
only syntax).

This is structurally identical to Step A (hello.c), which worked. If
THIS build launches, the diff between 'works' and 'crashes' is purely
the .cpp(true) mode — something clang++ does differently at compile
or link time when producing object files for a Tauri Android cdylib.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-09 16:38:29 +04:00
Siavash Sameni
47a037368c step E.0: drop cpp_link_stdlib entirely (no libc++ linkage)
Some checks failed
Mirror to GitHub / mirror (push) Failing after 36s
Build Release Binaries / build-amd64 (push) Failing after 3m47s
c++_shared crashed. c++_static also crashed. Both have libc++ code
landing in the final .so — one as a NEEDED dynamic lib, the other
bundled statically. So the trigger isn't the NEEDED entry specifically,
it's libc++ being present in any form.

cpp_smoke.cpp is just 'extern "C" int wzp_cpp_hello() { return 42; }'
with zero C++ features used, so we can drop cpp_link_stdlib completely
and the compile still succeeds. No libc++ .a or .so referenced at all.

If this crashes: the trigger is cc::Build::new().cpp(true) switching
rustc's final linker driver from clang to clang++ (which pulls in
different default libraries).

If this launches: the trigger is libc++'s own static initializers or
the libc++ code itself doing something that breaks our .so at dlopen
time, and we have a path forward — C++ code that doesn't need libc++
(e.g., a thin C++ bridge to Oboe that uses only POD types at the
boundary, with all the STL stuff confined to Oboe's own compilation
unit which would still need libc++...). More likely we still need a
C-only audio interface like raw AAudio via the ndk Rust crate.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-09 16:31:53 +04:00
Siavash Sameni
191e8761d5 step E.1 variant: cpp_link_stdlib c++_shared → c++_static
Some checks failed
Mirror to GitHub / mirror (push) Failing after 36s
Build Release Binaries / build-amd64 (push) Failing after 3m42s
Every E.x variant crashed identically when linked with c++_shared, even
with a 3-line cpp file that's dead-stripped from the final .so. The
crash offsets are byte-identical across E.1, E.2, E.4, and the original
full-Oboe Step E. That points at a non-code link-time delta: the
`cargo:rustc-link-lib=c++_shared` directive that adds a NEEDED entry
for libc++_shared.so to the .so's dynamic table.

Swap to c++_static — bundles libc++ directly into our .so so the
NEEDED entry disappears. If this launches cleanly, we've conclusively
proven the NEEDED libc++_shared.so is the root cause and we have a
workable linkage for any C++ we want to add to the Tauri Android build
(including the eventual Oboe audio backend).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-09 16:18:04 +04:00
Siavash Sameni
0d74366592 step E.1: absolute minimum C++ file (no STL, no includes)
Some checks failed
Build Release Binaries / build-amd64 (push) Failing after 3m53s
Last bisection step. cpp/cpp_smoke.cpp reduced to a single extern 'C'
function that returns 42. No #include, no std::atomic, no std::mutex,
no std::thread. Only C++ things remaining are:
  - cc::Build::new().cpp(true) in build.rs (C++ mode compile)
  - cpp_link_stdlib('c++_shared') emitting -lc++_shared

If this still crashes with the same __init_tcb+4 / pthread_create
stack, we've conclusively proven the trigger is NOT any C++ code
that ends up in the final .so (everything gets dead-stripped
anyway because Rust never references wzp_cpp_hello). The trigger
must be either:
  a) cargo:rustc-link-lib=c++_shared (adds NEEDED entry for
     libc++_shared.so in the .so's dynamic table, causing the
     dynamic linker to load libc++_shared.so at dlopen() time
     alongside our .so), or
  b) Some interaction between cpp(true) mode and the rest of the
     build pipeline (toolchain flags, symbol visibility, etc.)

After this build we stop and write an incident report for the
WarzonePhone Tauri Android rewrite bisection so far.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-09 15:54:21 +04:00
Siavash Sameni
0224ce654c step E.2: shrink cpp_smoke to std::atomic only — no thread, no mutex
Some checks failed
Mirror to GitHub / mirror (push) Failing after 38s
Build Release Binaries / build-amd64 (push) Failing after 3m48s
Incremental bisection within Step E. E.4 (atomic + mutex + thread) still
crashed at __init_tcb. Drop mutex and thread, keep only std::atomic.
Build.rs still emits cargo:rustc-link-lib=c++_shared via
cpp_link_stdlib('c++_shared'), so the NEEDED entry for libc++_shared.so
in the final .so stays identical. Goal: if this crashes, the issue is
purely the dynamic link against libc++_shared (not thread/mutex code).
If it passes, the issue is actually std::thread or std::mutex use.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-09 15:47:30 +04:00
Siavash Sameni
aa240c6d83 step E.4(android): replace full Oboe compile with minimal C++ smoke file
Some checks failed
Mirror to GitHub / mirror (push) Failing after 40s
Build Release Binaries / build-amd64 (push) Failing after 3m44s
Bisection for the __init_tcb+4 crash that Step E introduced: drop the
full Oboe C++ build (200+ files, hundreds of KB of code) and replace
it with ONE tiny cpp/cpp_smoke.cpp that exercises the libc++ features
Oboe uses — std::atomic, std::mutex, std::thread — via an
extern "C" wzp_cpp_smoke() function that's exported but NEVER called
from Rust.

Still compiled with cpp_link_stdlib("c++_shared"), same as Oboe.
libc++_shared.so still copied into gen/android jniLibs. But no Oboe
headers, no Oboe source files, no -llog / -lOpenSLES links.

Hypothesis: if cpp_smoke.cpp alone reproduces the __init_tcb crash,
the trigger is "any libc++_shared link that references
std::thread/std::mutex" and Oboe is not the specific culprit. If it
launches cleanly, Oboe itself (its size, its static constructors, or
a specific header) is responsible — and we then bisect Oboe's
source tree.

fetch_oboe() and add_cpp_files_recursive() are retained in build.rs
with #[allow(dead_code)] so re-enabling the full Oboe compile is a
one-line edit once we've identified what's safe to include.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-09 15:39:30 +04:00
Siavash Sameni
d216dcc7a3 step E fix (Option 3): bake android24→26 clang shim into image
Some checks failed
Mirror to GitHub / mirror (push) Failing after 37s
Build Release Binaries / build-amd64 (push) Failing after 3m37s
Incremental Step E (commit 4250f1b) proved that merely compiling the
Oboe C++ bridge into libwzp_desktop_lib.so — with NO Rust-side FFI
bindings, no function calls — resurrects the __init_tcb+4 / pthread_
create SIGSEGV at WryActivity.onCreate. Bisection:

  build #17 (baseline)   ✓
  build #18 (Step A, hello.c)              ✓
  build #19 (Step B, wzp-client dep)       ✓
  build #21 (Step C, engine mod compiled)  ✓
  build #22 (Step D, getauxval_fix.c)      ✓
  build #23 (Step E, Oboe C++ compiled)    ✗ — __init_tcb+4 crash

Root cause: tauri-cli hard-codes `aarch64-linux-android24-clang` as the
Rust linker. Without any C++ code in the .so, libstd's pthread_create
reference gets resolved against the dynamic libc.so. The moment we add
a C++ static library that links against libc++_shared, the link-time
resolution pulls in the API-24 libc.a static pthread_create stub — and
Rust's libstd then also calls that stub instead of libc.so's real one.
The stub calls __init_tcb which SIGSEGVs because bionic's TCB state
only exists for static-libc main executables, not .so's loaded via
dlopen. API-26 NDK has proper dynamic bindings that resolve correctly.

Option 3 fix: at image build time, replace every NDK
aarch64-linux-android24-clang (and armv7/x86_64/i686, clang/clang++)
binary with a one-line shell script that exec()s the corresponding
android26-clang. Since tauri-cli invokes the linker via absolute path,
PATH and env var overrides fail — but replacing the binary on disk
inside the image is guaranteed to take effect. The legacy wzp-android
crate doesn't need this because cargo-ndk respects .cargo/config.toml
where a crate-level linker override is set.

Only changing the Dockerfile here. Next: rebuild the image no-cache,
retry Step E, and if the baseline holds, proceed to Steps F/G.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-09 15:17:34 +04:00
Siavash Sameni
4250f1b44a step E(android): compile full Oboe C++ bridge (not yet called from Rust)
Some checks failed
Mirror to GitHub / mirror (push) Failing after 40s
Build Release Binaries / build-amd64 (push) Failing after 3m53s
Fifth incremental variable — and the first genuinely heavy one. Adds:
  - cpp/oboe_bridge.{h,cpp} (copied verbatim from crates/wzp-android/cpp/)
  - cpp/oboe_stub.cpp (fallback if Oboe can't be fetched)
  - build.rs now clones google/oboe@1.8.1 into OUT_DIR and compiles
    oboe_bridge.cpp + every .cpp file under oboe/src/ as a single
    static library via cc::Build, using shared libc++. Same logic as
    the legacy wzp-android build.rs.
  - libc++_shared.so gets copied from the NDK sysroot into the Tauri
    gen/android jniLibs directory so the runtime linker can find it.
  - rustc-link-lib=log / OpenSLES emitted for Oboe's Android backends.

Deliberately NOT called from Rust yet — no extern "C" FFI declarations,
no oboe_audio.rs module, the `wzp_oboe_*` symbols from the static lib
are simply present but unreferenced.

Goal: isolate whether the Oboe C++ compile + static lib link alone
(with its libc++ dependency and log/OpenSLES bindings) regresses the
working baseline. If the build still launches and renders the home
screen, we know the C++ side is clean and the actual regression is
caused by calling into Oboe at runtime (next step).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-09 15:09:16 +04:00
Siavash Sameni
a852cad15e step D(android): compile cpp/getauxval_fix.c alongside hello.c
Some checks failed
Mirror to GitHub / mirror (push) Failing after 38s
Build Release Binaries / build-amd64 (push) Failing after 3m55s
Fourth incremental variable. Adds the getauxval_fix.c shim from the
legacy wzp-android crate (which has been shipping with it for months
without issue) to our cc::Build on Android. The file defines a single
getauxval() function that delegates to bionic's real runtime
implementation via dlsym — this is needed because rustc links
compiler-rt's broken static getauxval stub that SIGSEGVs in .so
libraries loaded via dlopen (reads __libc_auxv which is NULL).

Not imported from Rust. Goal: verify that adding a second C static
archive (and especially one that overrides a libc-ish symbol) doesn't
regress the working build.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-09 15:03:37 +04:00
Siavash Sameni
19fd3dd9cc step C fix: ungate wzp_proto imports used by resolve_quality() on Android
Some checks failed
Mirror to GitHub / mirror (push) Failing after 38s
Build Release Binaries / build-amd64 (push) Failing after 3m48s
Build #20 failed to compile on Android because I over-gated the
wzp_proto imports to non-Android. resolve_quality() is compiled on
every platform (it's outside the CallEngine impl) and references
QualityProfile + CodecId — both platform-independent types from
wzp_proto. Move those back to an unconditional import. tracing stays
gated (only the desktop start() body logs; the Android stub is silent).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-09 14:59:00 +04:00
Siavash Sameni
c69195fe06 step C(android): compile engine.rs on Android with a stub CallEngine::start
Some checks failed
Mirror to GitHub / mirror (push) Failing after 36s
Build Release Binaries / build-amd64 (push) Has been cancelled
Third incremental variable. Previously the engine module was cfg-gated
out of the Android build entirely (`#[cfg(not(target_os = "android"))]
mod engine;` in lib.rs). Now it's always compiled, so any link-time
effect of having engine.rs in the compilation unit can be measured
against the working baseline from build #19.

Changes kept deliberately small:
- lib.rs: drop the cfg gate on `mod engine;`. `use engine::CallEngine`
  stays gated because the Android-specific connect/disconnect/... stubs
  in lib.rs don't reference the type.
- engine.rs: the `wzp_client::{audio_io, call}` imports + CodecId +
  QualityProfile are gated to non-Android (they require the `audio`
  feature on wzp-client which Android doesn't pull in). On Android we
  keep only the MediaTransport import for transport.close(). The impl
  block now has two `start()` methods: the full CPAL-backed one for
  desktop, and a 6-line Android stub that returns `Err("audio engine
  not yet wired on Android")` so attempts to `connect` from the UI
  fail cleanly.

Goal: verify that linking in the compiled engine module (plus the
types it references) on Android doesn't regress the working baseline.
Home screen should still render and register_signal should still work.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-09 14:56:02 +04:00
Siavash Sameni
ae4f366b05 step B(android): depend on wzp-client with default-features=false
Some checks failed
Mirror to GitHub / mirror (push) Failing after 39s
Build Release Binaries / build-amd64 (push) Failing after 3m54s
Second incremental variable on the path to Oboe. Adds a
`[target.'cfg(target_os = "android")'.dependencies]` block that pulls
in wzp-client with NO features enabled — no audio (no CPAL), no vpio
(no VoiceProcessingIO). This gives the Android build access to
wzp-client's platform-independent modules (call, handshake, audio_ring,
codec wiring) without any system audio bindings.

Deliberately no new imports in lib.rs or engine.rs. The only effect
should be: cargo-tauri on Android now has to compile wzp-client and
all its transitive crates (wzp-codec, wzp-fec, wzp-proto, wzp-crypto
already pulled directly; now also audiopus, raptorq, etc.) and link
them into libwzp_desktop_lib.so.

Goal: verify that merely expanding the compiled code set to include
wzp-client doesn't regress the previous working state. If it does, we
know one of wzp-client's transitive deps is the problem — probably a
C dep like audiopus or codec2.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-09 14:49:49 +04:00
Siavash Sameni
f96d7ce3e1 step A(android): add cc=1 build-dep + compile single trivial hello.c
Some checks failed
Mirror to GitHub / mirror (push) Failing after 37s
Build Release Binaries / build-amd64 (push) Failing after 3m54s
First incremental variable on the path back to Oboe integration. Changes
are deliberately minimal: add cc = "1" to [build-dependencies] (cargo
build-deps resolve against the host so the line is unconditional), and
on the Android target run a single cc::Build step that compiles
cpp/hello.c — a 6-line file that defines one function (`wzp_hello_stub`)
that is never called from Rust.

Goal: verify that merely introducing a C static library into the .so
via cc::Build does not regress the working build (#17, commit 5309938
= build #6 behaviour: launches, renders home screen, registers on
relay). If this build still works, we know cc::Build pipelines alone
are fine and can move to the next variable.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-09 14:45:24 +04:00
Siavash Sameni
530993854f revert(android): roll back to build #6 (35642d1) — pre-oboe known-good state
Some checks failed
Mirror to GitHub / mirror (push) Failing after 36s
Build Release Binaries / build-amd64 (push) Failing after 3m51s
Spent 10+ builds chasing a __init_tcb+4 / pthread_create SIGSEGV after
adding the oboe audio backend. Every "fix" made things worse. Reverting
all Android-specific files to the state at 35642d1 (build #6), which
was the last commit where the Tauri Android app actually launched,
rendered the home screen, and successfully registered on a relay.

Reverted files (all back to their 35642d1 content):
  - desktop/src-tauri/Cargo.toml        (no build-dep cc, no tracing-android)
  - desktop/src-tauri/build.rs          (git hash only, no Oboe / cc build)
  - desktop/src-tauri/src/lib.rs        (engine cfg-gated on non-android)
  - desktop/src-tauri/src/main.rs       (two-line desktop entry)
  - desktop/src-tauri/src/engine.rs     (desktop-only audio setup)
  - scripts/Dockerfile.android-builder  (no android24→26 clang shim)
  - scripts/build-tauri-android.sh      (no linker env vars / manifest patch)

Deleted (were added between b314138 and e2e023d):
  - desktop/src-tauri/cpp/getauxval_fix.c
  - desktop/src-tauri/cpp/oboe_bridge.{h,cpp}
  - desktop/src-tauri/cpp/oboe_stub.cpp
  - desktop/src-tauri/src/oboe_audio.rs

Next: rebuild image on remote (to drop the baked-in clang shim), build
an APK, install on Pixel 6, verify the UI renders the same way build #6
did. From there we add features back ONE at a time so we can actually
bisect which one triggers the tao::ndk_glue crash. User's rule:
"if you want to change stack, change incrementally, so we can debug".

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-09 14:22:57 +04:00
Siavash Sameni
e2e023d2bc fix(android): drop pthread_shim — clang shim makes it unnecessary (and harmful)
Some checks failed
Mirror to GitHub / mirror (push) Failing after 37s
Build Release Binaries / build-amd64 (push) Failing after 3m49s
Once the Dockerfile rewrites every android24-clang to exec android26-clang,
the linker uses the API-26 NDK sysroot and libstd's pthread_create reference
resolves directly against libc.so's real runtime symbol — no interposition
needed.

The pthread_shim.c approach was actually fighting its own solution: our
shim's dlsym() call bound at link time to libdl.a's STUB dlsym (a
five-line function inside libdl_static.o that just returns NULL and sets
dlerror to "libdl.a is a stub --- use libdl.so instead"). NDK r19 and
glibc 2.34 both replaced libdl.a with empty stubs because dynamic loading
is now part of the main libc/bionic — so no amount of link-order
tinkering can make a static libdl.a dlsym actually work.

Remove pthread_shim.c, the cc::Build::new().file("cpp/pthread_shim.c")
step in build.rs, and the -Wl,--wrap=pthread_create rustc-link-arg. Keep
getauxval_fix.c because that one DOES work at link time (the symbol
override is for a function compiler-rt defines statically, not one that
would depend on the stub libdl.a/libc.a).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-09 13:52:53 +04:00
Siavash Sameni
5df9d418c9 fix(android): bake android24→26 clang shim into the docker image itself
Some checks failed
Mirror to GitHub / mirror (push) Failing after 37s
Build Release Binaries / build-amd64 (push) Failing after 3m36s
Build #13's PATH wrapper trick failed because tauri-cli invokes the linker
with an absolute path (/opt/android-sdk/ndk/.../bin/aarch64-linux-android24-
clang), which bypasses \$PATH entirely. The pthread_shim logs confirmed the
broken API-24 stubs were still being linked:

  WZP_pthread_shim: dlsym(RTLD_DEFAULT, pthread_create) returned NULL:
    libdl.a is a stub --- use libdl.so instead

Move the fix up a level — into the Dockerfile itself. On image build, for
each of the four android ABIs × {clang, clang++}, rename
`${abi}24-${suffix}` to `${abi}24-${suffix}.orig` and replace it with a
shell wrapper that exec()s `${abi}26-${suffix}`. Any call to the API-24
wrapper — via PATH, absolute path, or otherwise — now transparently runs
the API-26 wrapper, which uses the real libc.so/libdl.so bindings.

The old bash-c /tmp/wrappers workaround in build-tauri-android.sh is
removed now that the image handles it at the right layer.

Also add `--shell` to build-tauri-android.sh: opens an interactive docker
container on the remote with the same mounts/env as the build, so I can
iterate on cargo tauri android build / manually patch files / etc.
without the full git push → ssh → rebuild → install loop.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-09 13:33:10 +04:00
Siavash Sameni
2718402e96 fix(android): PATH wrapper to redirect tauri-cli's android24-clang → android26
Some checks failed
Mirror to GitHub / mirror (push) Failing after 37s
Build Release Binaries / build-amd64 (push) Failing after 3m48s
Build #12's instrumented pthread_shim gave us the definitive diagnosis:

  WZP_pthread_shim: dlsym(RTLD_DEFAULT, pthread_create) returned NULL:
    libdl.a is a stub --- use libdl.so instead

Tauri-cli invokes `aarch64-linux-android24-clang` as the linker and the
API-24 NDK sysroot ships *stub* libdl.a / libc.a: they compile fine but
every symbol crashes if called, because they're meant to coexist with a
separate dynamic .so that the dynamic linker provides at runtime. Rust's
pre-built libstd.rlib has static calls into those stubs baked in, so no
matter what we do at link time the broken code lands in the .so.

Env-var overrides of CARGO_TARGET_AARCH64_LINUX_ANDROID_LINKER don't
stick — tauri-cli resets them before invoking cargo. So instead of
fighting the env, we put a wrapper on $PATH, literally named
`aarch64-linux-android24-clang`, that exec()s the android26 version.
When tauri-cli looks up android24-clang via PATH, it gets our wrapper,
our wrapper runs android26-clang, and suddenly the whole build is using
the API-26 NDK sysroot with real dynamic bindings to libc.so / libdl.so.

Wrappers are installed for all four ABIs (aarch64, armv7, x86_64, i686)
× both suffixes (clang, clang++) directly inside the docker bash -c
preamble before any cargo invocation.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-09 13:23:47 +04:00
Siavash Sameni
1a8288c95f debug(android): instrument pthread_shim with logcat tracing + try RTLD_DEFAULT first
Some checks failed
Mirror to GitHub / mirror (push) Failing after 38s
Build Release Binaries / build-amd64 (push) Failing after 3m43s
Build #11 linked cleanly with --wrap=pthread_create but crashed at launch
on tao::ndk_glue::create with a Rust .expect() panic — meaning the shim's
__wrap_pthread_create successfully intercepted the call but returned
non-zero, triggering std::thread::spawn's Result::expect panic.

Add __android_log_print tracing so logcat shows exactly which resolver
path fired (RTLD_DEFAULT vs dlopen fallback) and what dlerror reports
when they fail. Also try RTLD_DEFAULT first — it's the simplest and
should find libc.so's pthread_create in the process's global symbol
table without any namespace games.
2026-04-09 13:15:47 +04:00
Siavash Sameni
f015be63ec fix(android): use --wrap=pthread_create instead of raw symbol override
Some checks failed
Mirror to GitHub / mirror (push) Failing after 38s
Build Release Binaries / build-amd64 (push) Failing after 3m39s
Build #10 failed with:
  ld.lld: error: duplicate symbol: pthread_create
    >>> defined at pthread_shim.c:30
    >>> ... in archive libpthread_shim.a
  (the other definition coming from libstd's bundled libc.a stub)

The raw-symbol-override approach was naive: when two static archives
both define the same symbol the linker refuses instead of picking one.

Switch to GNU-ld's `--wrap=pthread_create` mechanism:
  - All `pthread_create` references get rewritten to `__wrap_pthread_create`
  - Our shim now defines `__wrap_pthread_create` (no symbol clash)
  - Inside the shim we `dlopen("libc.so")` + `dlsym("pthread_create")` to
    get the real runtime symbol directly, bypassing BOTH the broken static
    stub (libstd's libc.a copy) AND libstd's own pthread_create path
  - `--real_pthread_create` is deliberately NOT used — it would alias the
    same broken stub the wrap exists to avoid

The wrap flag is emitted via `cargo:rustc-link-arg` in build.rs so it
only affects the Android target (the Android-branch of build.rs is the
only place that emits it).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-09 13:08:41 +04:00
Siavash Sameni
79e876126c fix(android): interpose pthread_create to bypass libstd's broken static stub
Some checks failed
Mirror to GitHub / mirror (push) Failing after 36s
Build Release Binaries / build-amd64 (push) Failing after 3m52s
Builds #7, #8 and #9 all crashed at launch with the same SIGSEGV inside
__init_tcb(bionic_tcb*, pthread_internal_t*)+4 called via pthread_create
from std::sys::thread::unix::Thread::new.

Digging further: the problem is NOT the final linker we pass to cargo.
It's that rustup ships a PRE-COMPILED libstd for aarch64-linux-android
which was built statically against an old NDK libc archive. That archive
has a pthread_create stub which calls a static __init_tcb stub that
assumes libc's static init path has set up the TCB — which never happens
in a .so loaded via dlopen. Bumping minSdk to 26 or forcing the
android26-clang linker (903a07c) doesn't rebuild libstd and therefore
doesn't fix the bundled broken stub.

The legacy wzp-android crate dodged this with a getauxval_fix.c shim that
interposes getauxval via RTLD_NEXT. The same trick works for pthread_create
here: define our own `int pthread_create(...)` in cpp/pthread_shim.c that
forwards to `dlsym(RTLD_NEXT, "pthread_create")` — the real, fully working
version exported from libc.so. The linker processes our static lib before
libstd.rlib, so libstd's unresolved pthread_create reference binds to our
symbol, and the broken libc.a stub inside libstd is never pulled in.

build.rs compiles cpp/pthread_shim.c right after cpp/getauxval_fix.c so
both symbol overrides are in place before any Rust code gets linked.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-09 13:04:18 +04:00
Siavash Sameni
903a07c1d4 fix(android): force API-26 NDK linker via docker env vars
Some checks failed
Mirror to GitHub / mirror (push) Failing after 39s
Build Release Binaries / build-amd64 (push) Failing after 3m46s
The previous commit bumped minSdk from 24 to 26 in build.gradle.kts
hoping tauri-cli would pick it up and use the android26-clang linker,
but the crash recurred at exactly the same frame (__init_tcb via
pthread_create via std::thread::spawn). That means tauri-cli is
ignoring the gradle minSdk value and sticking with its hardcoded
aarch64-linux-android24-clang.

The android24 linker resolves __init_tcb against the broken static
stub in libc.a (API 24 does NOT export __init_tcb as a dynamic symbol
from libc.so — it only exists in the static archive, and the stub
expects the TCB to be initialised by a running static init path,
which never happens in a dlopen-loaded .so).

Override the linker env vars directly in the docker run invocation
for all four ABIs. These take precedence over anything tauri-cli or
.cargo/config.toml might set.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-09 12:55:11 +04:00
Siavash Sameni
af20fa418a fix(android): bump minSdk 24 -> 26 to avoid broken __init_tcb in NDK 24 stub
Some checks failed
Mirror to GitHub / mirror (push) Failing after 42s
Build Release Binaries / build-amd64 (push) Failing after 3m52s
Build #7 crashed at launch on the Pixel 6 with SIGSEGV in
__init_tcb / pthread_create called from tao::ndk_glue::create in
WryActivity.onCreate:

  #00  __init_tcb(bionic_tcb*, pthread_internal_t*)+4
  #01  pthread_create+360
  #02  std::sys::thread::unix::Thread::new
  #04  tao::platform_impl::platform::ndk_glue::create
  #05  Java_com_wzp_desktop_WryActivity_create

Tauri scaffolds build.gradle.kts with `minSdk = 24`, which makes the
tauri-cli invoke `aarch64-linux-android24-clang` as the Rust linker. That
linker transitively pulls broken static stubs from libc.a for getauxval,
__init_tcb and pthread_create — these stubs only work in statically-
linked executables because they read bionic state (__libc_auxv, TCB) that
only the libc init path sets up. In a .so loaded via dlopen they SIGSEGV
the moment anything spawns a thread.

API 26+ has the real runtime symbols and the NDK-26 linker resolves them
against libc.so instead of the static fallback. This is also the minimum
Oboe supports. Patch the generated build.gradle.kts post-init to swap
`minSdk = 24` for `minSdk = 26` — the legacy wzp-android crate solved
the same issue with a .cargo/config.toml linker override plus a
getauxval_fix.c shim.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-09 12:47:36 +04:00
Siavash Sameni
b314138caf feat(android): oboe/AAudio audio backend + runtime mic permission (step 3)
Some checks failed
Mirror to GitHub / mirror (push) Failing after 39s
Build Release Binaries / build-amd64 (push) Failing after 3m39s
This is the big one — the Tauri Android app now has a real audio stack
capable of full-duplex VoIP, reusing the proven C++ Oboe bridge from the
legacy wzp-android crate.

Architecture:
- desktop/src-tauri/cpp/  — copies of oboe_bridge.{h,cpp}, oboe_stub.cpp,
  and getauxval_fix.c from crates/wzp-android/cpp/. build.rs clones
  google/oboe@1.8.1 into OUT_DIR and compiles the bridge + all Oboe
  sources as "oboe_bridge" static lib, linking against shared libc++
  (static would pull broken libc stubs that SIGSEGV in .so libraries).
- src/oboe_audio.rs  — Rust side: an SPSC ring buffer matching the C++
  bridge's AtomicI32 layout, plus OboeHandle::start() which returns
  (capture_ring, playout_ring, owning_handle). The ring exposes the same
  (available / read / write) methods as wzp_client::audio_ring::AudioRing
  so CallEngine treats both backends interchangeably.
- src/engine.rs  — compiled on every platform now. A cfg-switched type
  alias picks wzp_client::audio_ring::AudioRing on desktop and
  crate::oboe_audio::AudioRing on Android. The audio setup block has
  three branches: VPIO/CPAL on macOS, CPAL on Linux/Windows, Oboe on
  Android. Send/recv tasks are identical across platforms.
- src/lib.rs  — removes all the "step 3 not done" Android stubs. The
  engine module is no longer cfg-gated; connect / disconnect / toggle_mic
  / toggle_speaker / get_status are single implementations used by both
  desktop and Android. Identity path resolves via app.path().app_data_dir()
  from the Tauri setup() callback (already wired in step 1).

Runtime mic permission:
- scripts/build-tauri-android.sh now injects RECORD_AUDIO + MODIFY_AUDIO_
  SETTINGS into gen/android/app/src/main/AndroidManifest.xml after init,
  and overwrites MainActivity.kt with a version that calls
  ActivityCompat.requestPermissions in onCreate. This is idempotent:
  every build re-applies the patches so tauri re-init can't regress them.

Cargo.toml:
- cc is now an unconditional build-dep (build.rs runs on the host, so
  target-gating build-deps doesn't work).
- wzp-client is now a dep on every platform. On Android it gets default
  features only (no "audio"/"vpio") so CPAL isn't dragged in — oboe_audio
  provides the capture/playout rings instead.
- tracing-android is added on Android so tracing events flow into logcat.

build.rs also gained embedded git hash (WZP_GIT_HASH) capture, which is
shown under the fingerprint on the home screen — already committed in
7639aaf, reinstated here alongside the Oboe build logic.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-09 12:40:38 +04:00
Siavash Sameni
35642d1c54 feat(desktop): bake local Laptop relay into default relay list for testing
Some checks failed
Mirror to GitHub / mirror (push) Failing after 35s
Build Release Binaries / build-amd64 (push) Failing after 3m45s
Adds 172.16.81.125:4433 (the laptop's LAN IP) as the first default relay
so the Android rewrite can be tested against a relay whose logs are on the
same host as the builds and screenshots. On fresh installs the Laptop
relay is pre-selected as index 0. On upgrades from an older cached
settings blob, a one-shot migration unshifts it to the front if missing,
so we don't have to tap through Manage Relays after every reinstall.

Marked "remove once Android rewrite is stable" — the address is a hardcoded
LAN IP that won't be valid in other environments.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-09 12:19:46 +04:00
Siavash Sameni
6b8107504e fix(desktop): tauri capability for android event listeners + persistent debug keystore
Some checks failed
Mirror to GitHub / mirror (push) Failing after 37s
Build Release Binaries / build-amd64 (push) Failing after 3m45s
Two related Android-only papercuts found while testing build #4 on a Pixel 6:

1. Frontend was crashing in the WebView with:
       Tauri/Console: Uncaught (in promise) event.listen not allowed.
       Permissions associated with this command: core:event:allow-listen,
       core:event:default
   The desktop build worked fine because Tauri's default capability set
   covers the desktop side. On Android (and iOS) Tauri 2.x is much stricter
   about ACL — without an explicit capabilities/default.json that lists
   "android" in its platforms, the WebView gets zero permissions. Add a
   default capability granting core:default + the event listener perms
   across all five platforms (linux/macOS/windows/android/iOS).

2. Every fresh docker run produced a new ~/.android/debug.keystore, so
   `adb install -r` of a freshly built APK over an already-installed one
   failed with INSTALL_FAILED_UPDATE_INCOMPATIBLE. Mount a persistent host
   volume at /home/builder/.android in build-tauri-android.sh so the same
   debug keystore is reused across builds and `install -r` keeps working.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-09 12:02:01 +04:00
Siavash Sameni
7639aaf08d feat(desktop): deterministic alias from seed + git hash on home screen + fix EACCES on Android
Some checks failed
Build Release Binaries / build-amd64 (push) Failing after 3m41s
Mirror to GitHub / mirror (push) Failing after 38s
Three home-screen issues from the first Tauri Android APK:

1. Alias was empty (no seed-derived name).
   Port the adjective+noun word lists from the old Kotlin SettingsRepository
   into a `derive_alias()` helper that maps the first 4 bytes of the seed to
   indices in those lists. Same seed → same alias forever, different seeds →
   effectively random aliases — so reinstalls keep the user's identity AND
   the friendly name they're used to.

2. Build identity was invisible — couldn't tell which APK was actually
   installed (this caused us a lot of grief on the Kotlin app).
   build.rs now captures `git rev-parse --short HEAD` and emits it as
   `WZP_GIT_HASH`, exposed via a new `get_app_info` command. The frontend
   stamps `build <hash> • <alias>` under the fingerprint on the home screen.

3. Register on relay failed with `Permission denied (os error 13)`.
   Root cause: I hardcoded `/data/data/com.wzp.phone/files/.wzp` as the
   identity dir, but the Tauri Android package id is `com.wzp.desktop` —
   so the app was trying to write into another app's data directory and
   getting EACCES at the filesystem layer. Fix: resolve the data dir from
   Tauri's `path().app_data_dir()` API in the `setup()` callback and stash
   it in a `OnceLock<PathBuf>`. Works on Android, macOS, Linux, Windows
   without any cfg gymnastics.

Also: `get_app_info` returns the resolved `data_dir` so we can debug
storage issues from the UI (it's set as the build-hash element's title).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-09 11:55:51 +04:00
Siavash Sameni
69ee3115b6 build: tauri-android docker pipeline + ntfy notifications
Some checks failed
Mirror to GitHub / mirror (push) Failing after 36s
Build Release Binaries / build-amd64 (push) Failing after 3m54s
Dockerfile.android-builder: install Android API 36 platform + build-tools
35.0.0 alongside the existing API 34 set. Tauri 2.x mobile defaults to
compileSdk 36 / build-tools 35; without these the gradle build fails with
"SDK directory is not writable" because the read-only /opt/android-sdk
volume can't grow at build time. Adding Node.js 20, all four Rust android
targets, and tauri-cli 2.x was already in place.

scripts/build-tauri-android.sh: new build wrapper for the desktop/ Tauri
project (parallel to scripts/build-and-notify.sh which targets the legacy
android/ Kotlin app). Pulls the branch on remote, runs cargo tauri android
build inside the docker image, and sends three ntfy.sh/wzp notifications
that all carry the short git hash:
  - STARTED [hash] — <commit subject>
  - OK [hash] (size) — <rustypaste apk url>
  - FAILED [hash] (line N) — <rustypaste log url>
On failure the full /tmp/wzp-tauri-build.log is uploaded to rustypaste so
the URL in the failure ntfy is directly downloadable, same place as the
APK.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-09 11:25:54 +04:00
Siavash Sameni
e6f77a78a7 feat(desktop): split main.rs into lib.rs for Tauri Mobile (Android/iOS)
Some checks failed
Mirror to GitHub / mirror (push) Failing after 35s
Build Release Binaries / build-amd64 (push) Failing after 3m54s
Tauri 2.x Mobile links the app as a cdylib loaded from a Java Activity, so
all of the Builder/command code has to live in a library crate. Move the
existing logic verbatim into src/lib.rs::run() and reduce src/main.rs to a
two-line desktop entry point that calls into it.

Cargo.toml gets a [lib] section (crate-types: staticlib + cdylib + rlib,
named wzp_desktop_lib) and the wzp-client dependency — which pulls CPAL +
VoiceProcessingIO — is moved behind cfg(not(target_os = "android")) so the
Android cdylib doesn't need an audio backend yet. Engine-backed Tauri
commands (connect/disconnect/toggle_mic/toggle_speaker/get_status) get
Android stubs that return clear "not yet wired" errors. The signaling
commands (register_signal/place_call/answer_call/get_signal_status/
ping_relay/get_identity) are platform-independent and unchanged.

Also: get_identity / register_signal now auto-create the seed if missing
instead of erroring with "connect to a room first", and the identity dir
resolves to /data/data/com.wzp.phone/files/.wzp on Android (proper
app-internal storage) vs \$HOME/.wzp on desktop.

Side note: src/main.rs was previously untracked — desktop builds were
working only because it existed in the local worktree. This commit fixes
that too.

Step 1 of the Android rewrite plan (tauri-mobile scaffold). No audio yet.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-09 11:17:55 +04:00
Siavash Sameni
04a985912a fix: add direct calling Tauri backend commands (register_signal, place_call, answer_call)
Some checks failed
Mirror to GitHub / mirror (push) Failing after 39s
Build Release Binaries / build-amd64 (push) Failing after 3m31s
2026-04-09 06:59:16 +04:00
Siavash Sameni
2288c1ae07 feat: direct calling UI for desktop Tauri app + merge android branch
Some checks failed
Mirror to GitHub / mirror (push) Failing after 36s
Build Release Binaries / build-amd64 (push) Failing after 3m33s
Tauri backend:
- register_signal: persistent _signal connection, presence registration
- place_call: send DirectCallOffer by fingerprint
- answer_call: accept/reject incoming calls
- get_signal_status: poll signal state

Frontend:
- Mode toggle: "Room" vs "Direct Call"
- Register button → registers on relay signal channel
- Incoming call panel with Accept/Reject
- Fingerprint input + Call button
- Auto-connect to media room on CallSetup event

Also merges feat/android-voip-client into desktop branch:
- Federation fixes, time-based dedup, FEC stale blocks
- Direct calling protocol types
- ACL + SAS verification

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-09 06:42:47 +04:00
Siavash Sameni
0d3f0d4dcb feat: Android UI for direct 1:1 calling
Some checks failed
Mirror to GitHub / mirror (push) Failing after 36s
Build Release Binaries / build-amd64 (push) Failing after 3m51s
- Mode toggle: "Room" vs "Direct Call" tabs on pre-connection screen
- Direct Call mode: Register button → registers on relay signal channel
- After registration: shows fingerprint dial pad + incoming call panel
- Incoming call: green Accept / red Reject buttons with caller info
- Ringing state display while waiting for callee
- CallSetup auto-connects to media room
- CallStats extended: sas_code, incoming_call_id/fp/alias fields
- CallViewModel: registerForCalls(), placeDirectCall(), answerIncomingCall()

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-09 06:18:07 +04:00
Siavash Sameni
c184d5e1f3 fix: build scripts use fetch+reset instead of pull to avoid ref lock errors
Some checks failed
Mirror to GitHub / mirror (push) Failing after 37s
Build Release Binaries / build-amd64 (push) Failing after 3m30s
git pull fails when refs are stale from concurrent builds. Switch to
git gc + git fetch + git reset --hard origin/branch for robustness.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-09 06:07:10 +04:00
Siavash Sameni
5d8e743cbf feat: Android engine + Kotlin API for direct 1:1 calling
Some checks failed
Mirror to GitHub / mirror (push) Failing after 35s
Build Release Binaries / build-amd64 (push) Failing after 3m47s
Rust engine:
- start_signaling(): persistent _signal connection, presence registration
- Signal recv loop: handles DirectCallOffer, CallRinging, CallSetup, Hangup
- New CallState variants: Registered, Ringing, IncomingCall
- Stats expose incoming_call_id, incoming_caller_fp, incoming_caller_alias, sas_code
- New EngineCommands: PlaceCall, AnswerCall, RejectCall

JNI bridge:
- nativeStartSignaling(relay, seed, token, alias)
- nativePlaceCall(targetFp)
- nativeAnswerCall(callId, mode)

Kotlin API (WzpEngine.kt):
- startSignaling(relay, seed, token, alias)
- placeCall(targetFingerprint)
- answerCall(callId, mode) — 0=Reject, 1=AcceptTrusted, 2=AcceptGeneric

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-09 06:02:48 +04:00
Siavash Sameni
6694aebfd9 fix: resolve 0.0.0.0 to connectable address in CallSetup relay_addr
Some checks failed
Mirror to GitHub / mirror (push) Failing after 35s
Build Release Binaries / build-amd64 (push) Failing after 3m36s
When relay listens on 0.0.0.0, derive the actual IP from the client's
connection address for the CallSetup message.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-09 05:56:19 +04:00
Siavash Sameni
d27e85ecf2 feat: SAS (Short Authentication String) for call identity verification
Some checks failed
Mirror to GitHub / mirror (push) Failing after 35s
Build Release Binaries / build-amd64 (push) Failing after 3m19s
Derive a 4-digit code from the shared DH secret via HKDF with label
"warzone-sas-code". Both peers compute the same code; a MITM relay
produces a different one. Users compare verbally during the call.

- CryptoSession::sas_code() -> Option<u32> on the trait
- ChaChaSession stores and returns the SAS
- HKDF derivation in WarzoneKeyExchange::derive_session()
- Tests: both peers match, MITM produces different code

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-09 05:48:08 +04:00
Siavash Sameni
39ac181d63 feat: ACL + capacity limit on call rooms, unified fingerprint format
Some checks failed
Mirror to GitHub / mirror (push) Failing after 37s
Build Release Binaries / build-amd64 (push) Failing after 3m38s
- Call rooms (call-*) restricted to the two authorized participants only
- Room capacity enforced at 2 for call rooms
- Unauthorized clients get immediate connection close
- Unified fingerprint format: SHA-256(Ed25519 pub)[:16] as xxxx:xxxx:...
  Used consistently in signal registration, handshake, and ACL checks

Tested: Alice+Bob authorized, attacker rejected with "not authorized"

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-09 05:43:03 +04:00
Siavash Sameni
3351cb6473 feat: direct 1:1 calling via relay signaling (Phase 1)
Some checks failed
Mirror to GitHub / mirror (push) Failing after 35s
Build Release Binaries / build-amd64 (push) Failing after 3m43s
New feature: call someone directly by fingerprint through the relay.

- Client connects with SNI "_signal" for persistent signaling
- RegisterPresence/RegisterPresenceAck for relay registration
- DirectCallOffer routed to target by fingerprint
- DirectCallAnswer with AcceptGeneric/AcceptTrusted/Reject modes
- Relay creates private room (call-{id}), sends CallSetup to both
- Both clients connect to private room for media (existing SFU path)
- Hangup forwarding + cleanup on disconnect
- Desktop CLI: --signal + --call <fingerprint> for testing
- CallRegistry tracks call state (Pending/Ringing/Active/Ended)
- SignalHub manages persistent signaling connections

Tested: Alice calls Bob by fingerprint, relay routes offer, Bob
auto-accepts, both join private room, media flows bidirectionally.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-09 05:35:16 +04:00
Siavash Sameni
54a4d91f3e docs: add --event-log, --version-check, and federation troubleshooting to admin guide
Some checks failed
Mirror to GitHub / mirror (push) Failing after 35s
Build Release Binaries / build-amd64 (push) Failing after 3m32s
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-09 04:43:37 +04:00
Siavash Sameni
3b962bd4cb fix: build scripts use git reset --hard before pull to recover from dirty state
Some checks failed
Mirror to GitHub / mirror (push) Failing after 1m14s
Build Release Binaries / build-amd64 (push) Failing after 4m13s
Cargo.lock changes from Docker builds caused pull conflicts. Now uses
reset --hard + clean -fd to guarantee clean state before pulling.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-08 22:13:26 +04:00
Siavash Sameni
1118eac752 fix: re-enable FEC + time-based dedup for federation
Some checks failed
Mirror to GitHub / mirror (push) Failing after 2m7s
Build Release Binaries / build-amd64 (push) Has been cancelled
Restore fec_ratio=0.2 on GOOD profile. Time-based dedup (2s TTL) with
payload hash prevents consecutive sender collisions while still catching
multi-path duplicates. Verified: 6 consecutive senders across 2 relays,
0 decode errors, 0 drops, FEC active.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-08 22:09:15 +04:00
Siavash Sameni
f935bd69cd fix: rewrite seq/fec for federation-delivered packets
Some checks failed
Build Release Binaries / build-amd64 (push) Failing after 2m48s
Mirror to GitHub / mirror (push) Failing after 4m2s
- Time-based dedup (2s TTL) replaces fixed-window dedup — consecutive
  senders with same seq numbers no longer collide
- Raw byte forwarding for federation local delivery (no re-serialization)
- Jitter buffer resets on large backward seq jumps (>100)
- recv_media skips malformed datagrams instead of returning connection-closed
- SIGTERM handler for clean QUIC shutdown on wzp-client
- JSONL event log infrastructure (--event-log flag) for protocol analysis
- FEC disabled on GOOD profile for federation debugging (fec_ratio=0.0)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-08 21:55:06 +04:00
Siavash Sameni
1c684f6b47 fix: rewrite seq/fec for federation-delivered packets
Some checks failed
Mirror to GitHub / mirror (push) Failing after 35s
Build Release Binaries / build-amd64 (push) Failing after 1m59s
Federation media from different senders had conflicting seq numbers,
FEC block IDs, and Opus decoder state. The relay now assigns fresh
monotonic seq/fec_block/fec_symbol to all federation-delivered packets,
ensuring clients see a clean continuous stream regardless of sender changes.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-08 15:48:55 +04:00
Siavash Sameni
c92db7e9b7 fix: preserve original relay label through multi-hop presence propagation
Some checks failed
Mirror to GitHub / mirror (push) Failing after 35s
Build Release Binaries / build-amd64 (push) Failing after 7m26s
When propagating GlobalRoomActive to other peers, use tagged participants
(with relay_label set to the originating relay) instead of the raw
untagged participants. This shows "Relay C" instead of "Relay B" when
C's participants are forwarded through hub B to A.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-08 15:34:22 +04:00
Siavash Sameni
c3bd657224 fix: FEC decoder resets stale blocks — fixes consecutive federation connects
Some checks failed
Mirror to GitHub / mirror (push) Failing after 36s
Build Release Binaries / build-amd64 (push) Failing after 2m0s
When a new sender reuses the same block_id values as a previous sender,
the FEC decoder was silently dropping all data because blocks were marked
as "already decoded". Now blocks older than 2 seconds are automatically
reset when new data arrives for them.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-08 15:26:00 +04:00
Siavash Sameni
8b79cdc6fc fix: dedup filter collision between different senders + build scripts default --pull
Some checks failed
Mirror to GitHub / mirror (push) Failing after 35s
Build Release Binaries / build-amd64 (push) Failing after 1m53s
- Dedup key now includes source peer fingerprint hash, preventing
  packets from different senders with same room+seq from being dropped
  as duplicates (was silently killing all multi-hop audio)
- Build scripts default to --pull (use --no-pull to skip)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-08 15:18:52 +04:00
Siavash Sameni
2eab56beec fix: federation presence dedup, stale cleanup, and Android SIGSEGV crash
Some checks failed
Mirror to GitHub / mirror (push) Failing after 29s
Build Release Binaries / build-amd64 (push) Failing after 1m57s
- Deduplicate remote participants by fingerprint in all merge sites
  (canonical == raw room name caused double-lookup, doubling every remote participant)
- GlobalRoomInactive now propagates updated participant list to other peers
  (hub relay B was not informing A when C's participants left)
- Add 15-second stale presence sweeper that purges remote participants
  from peers that stop sending data (safety net for QUIC timeout delays)
- Add @Synchronized to WzpEngine.getStats/stopCall/destroy to prevent
  TOCTOU race between stats polling coroutine and engine teardown (SIGSEGV)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-08 15:07:59 +04:00
Siavash Sameni
7dadc1ddd6 fix: default room 'general', cap auto codec at 24k
Some checks failed
Mirror to GitHub / mirror (push) Failing after 36s
Build Release Binaries / build-amd64 (push) Failing after 1m51s
- Android default room changed from 'android' to 'general'
- Relay choose_profile capped at GOOD (Opus 24k) — studio tiers
  (32k/48k/64k) cause high packet loss on federation paths due to
  larger datagrams exceeding path MTU. Will re-enable after MTU
  discovery is implemented.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-08 14:41:12 +04:00
Siavash Sameni
be0441295a fix: read git hash outside Docker for Linux build ntfy notification
Some checks failed
Mirror to GitHub / mirror (push) Failing after 39s
Build Release Binaries / build-amd64 (push) Failing after 2m1s
The hash was read inside Docker (/build/source) where .git doesn't
exist. Now reads from $BASE_DIR/data/source before Docker runs.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-08 14:32:03 +04:00
Siavash Sameni
b9f4e7f102 feat: include git hash in ntfy build notifications + MTU PRD
Some checks failed
Mirror to GitHub / mirror (push) Failing after 29s
Build Release Binaries / build-amd64 (push) Has been cancelled
ntfy messages now show: "WZP Linux [abc1234] ready!" and
"WZP Android [abc1234] done! APK: url" so you can verify which
commit was built without checking relay version remotely.

Also added PRD-mtu-discovery.md for QUIC Path MTU Discovery.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-08 14:26:13 +04:00
Siavash Sameni
28f4a0fb6f fix: multi-hop presence — propagate remote rooms on new peer connect
Some checks failed
Mirror to GitHub / mirror (push) Failing after 36s
Build Release Binaries / build-amd64 (push) Failing after 2m35s
When a new federation link is established, announce not only LOCAL
global rooms but also rooms from OTHER peers (remote_participants).
This fixes multi-hop: when R2 connects to R3, R2 tells R3 about
R1's rooms that R2 learned about earlier.

Previously, only local rooms were announced on link setup. If R1
had a client but R2 had no clients, R2 wouldn't tell R3 about R1.

Also added diagnostic logging for room announcements on link setup.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-08 13:43:15 +04:00
Siavash Sameni
3d76acf528 fix: multi-hop federation — hub relay forwards without local participants
Some checks failed
Mirror to GitHub / mirror (push) Failing after 36s
Build Release Binaries / build-amd64 (push) Failing after 2m18s
Three fixes for 3-relay chain (R1→R2→R3):

1. Room lookup in handle_datagram: hub relay (R2) has no local
   participants, so active_rooms() was empty and datagrams were
   silently dropped. Now also checks global_rooms config directly,
   allowing hub relays to forward without local clients.

2. Multi-hop forwarding: removed active_rooms filter — forward to
   ALL connected peers except source. The receiving peer decides
   whether to deliver or forward further.

3. Android relay_label: native RoomMember now includes relay_label
   from RoomUpdate signal. Kotlin UI reads it for relay grouping.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-08 13:33:44 +04:00
Siavash Sameni
f4b5996bdf feat: Android relay-grouped participant list matching desktop
Some checks failed
Mirror to GitHub / mirror (push) Failing after 39s
Build Release Binaries / build-amd64 (push) Failing after 2m4s
Participants now grouped by relay on Android:
- Green dot + "THIS RELAY" for local participants
- Blue dot + relay label for federated participants

Added relayLabel to RoomMember data class, parsed from
relay_label JSON field. UI groups and renders with headers.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-08 13:15:12 +04:00
Siavash Sameni
fc721c4217 fix: clear stale federated presence on GlobalRoomInactive
Some checks failed
Mirror to GitHub / mirror (push) Failing after 34s
Build Release Binaries / build-amd64 (push) Failing after 7m37s
When a remote relay's room goes inactive (all participants left),
the receiving relay now:
1. Clears remote_participants for that peer+room
2. Broadcasts updated RoomUpdate to local clients with the remote
   participant removed
3. Updates federation_active_rooms metric

Previously, remote participants lingered in the participant list
after disconnect, causing ghost entries and stale media forwarding.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-08 13:06:48 +04:00
Siavash Sameni
5c24adf1c1 feat: remote version query — wzp-client --version-check <relay>
Some checks failed
Mirror to GitHub / mirror (push) Failing after 1m32s
Build Release Binaries / build-amd64 (push) Failing after 2m16s
Connects to a relay over QUIC with SNI "version", reads build hash
from a unidirectional stream, prints "<relay> <git-hash>" and exits.

Usage: wzp-client --version-check 172.16.81.175:4434
Output: 172.16.81.175:4434 8dbda3e

Relay side: detects "version" SNI, opens uni stream, writes
BUILD_GIT_HASH, waits 100ms for client to read, closes.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-08 12:47:37 +04:00
Siavash Sameni
8dbda3e052 feat: --version flag with git hash + test script kill fix
Some checks failed
Build Release Binaries / build-amd64 (push) Failing after 2m9s
Mirror to GitHub / mirror (push) Failing after 32s
wzp-relay --version prints "wzp-relay <short-git-hash>".
Build hash also logged on startup: version=abc1234.
Enables verifying deployed relay matches expected build.

Also fixed federation-test.sh: use kill -INT (not SIGTERM) so
clients save recordings before exit. Added save delay.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-08 12:36:33 +04:00
Siavash Sameni
c8a3aaacb6 feat: comprehensive federation test harness
Some checks failed
Mirror to GitHub / mirror (push) Failing after 35s
Build Release Binaries / build-amd64 (push) Failing after 1m58s
7 test scenarios across 3 relays:
1. Basic 2-relay audio (A→B)
2. Reverse direction (B→A)
3. 3-relay chain (A→B→C)
4. File playback (60s test audio)
5. Reconnection (join/leave/rejoin)
6. Multi-participant (3 users on 3 relays)
7. Simultaneous senders (2 senders, 1 recorder)

Usage: ./scripts/federation-test.sh <relay1> <relay2> <relay3>

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-08 12:19:15 +04:00
Siavash Sameni
395a0c557e feat: TX/RX codec badges on desktop call screen
Some checks failed
Mirror to GitHub / mirror (push) Failing after 34s
Build Release Binaries / build-amd64 (push) Failing after 2m1s
Desktop now shows codec badges like Android:
- Green TX badge: e.g. "Opus64k"
- Blue RX badge: e.g. "Opus24k"
Displayed in the stats line below the call controls.

Engine tracks tx_codec (set on encoder init) and rx_codec (updated
from incoming packet headers). Passed through EngineStatus → CallStatus
→ frontend.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-08 12:03:20 +04:00
Siavash Sameni
54cb6c3b71 feat: relay_label in RoomParticipant + tagged remote participants
Some checks failed
Mirror to GitHub / mirror (push) Failing after 44s
Build Release Binaries / build-amd64 (push) Failing after 2m26s
RoomParticipant.relay_label identifies which relay a participant is
connected to. Local participants have None, federated participants
get tagged with the peer relay's label when storing remote_participants.

This enables clients to group participants by relay in the UI.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-08 11:22:53 +04:00
Siavash Sameni
da593f9510 feat: relay-grouped participant rendering + relay_label in protocol
Some checks failed
Mirror to GitHub / mirror (push) Failing after 36s
Build Release Binaries / build-amd64 (push) Failing after 1m47s
RoomParticipant now has optional relay_label field. Desktop client
groups participants by relay: "This Relay" (green dot) for local,
peer label (blue dot) for federated. Shows all relays in the chain
including intermediate ones.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-08 11:22:05 +04:00
Siavash Sameni
a3ebf5616f fix: unified raw room names + merged presence on join
Some checks failed
Mirror to GitHub / mirror (push) Failing after 42s
Build Release Binaries / build-amd64 (push) Failing after 2m1s
1. CLI client now sends raw room names (no hash), matching Android
   JNI and Desktop Tauri. All three clients are now consistent.

2. When a client joins a global room, the relay merges federated
   remote participants into the initial RoomUpdate. Previously,
   clients that joined after the GlobalRoomActive signal only saw
   local participants. Now they see everyone immediately.

3. Added get_remote_participants() to FederationManager for querying
   cached remote participants from all peer links.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-08 11:09:15 +04:00
Siavash Sameni
ff6d0444c0 feat: federation Prometheus metrics — peer status, packets, active rooms
Some checks failed
Mirror to GitHub / mirror (push) Failing after 35s
Build Release Binaries / build-amd64 (push) Failing after 2m8s
Wires up the existing RelayMetrics federation fields:
- wzp_federation_peer_status{peer} — 1=connected, 0=disconnected
- wzp_federation_packets_forwarded_total{peer,direction} — in/out counts
- wzp_federation_active_rooms — number of active federated rooms

These are critical for monitoring federation health and will feed into
the adaptive codec selection system (PRD-coordinated-codec.md).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-08 11:00:13 +04:00
Siavash Sameni
8080713098 feat: federated presence — RoomUpdate includes remote participants
Some checks failed
Mirror to GitHub / mirror (push) Failing after 42s
Build Release Binaries / build-amd64 (push) Failing after 2m29s
GlobalRoomActive signal now carries participant list from the
announcing relay. When received, the relay:
1. Stores remote participants per peer link
2. Broadcasts merged RoomUpdate to local clients (local + all remote)

This means clients on different relays can now SEE each other in the
participant list. Also fixes build: removed non-existent metric field
references that were added by linter.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-08 10:52:27 +04:00
Siavash Sameni
e813362395 feat: federation metrics + dedup + rate limiting
Some checks failed
Mirror to GitHub / mirror (push) Failing after 33s
Build Release Binaries / build-amd64 (push) Failing after 1m53s
Add Prometheus metrics for federation links (per-peer RTT, packet
counters, active rooms gauge, dedup/rate-limit drop counters).

Add dedup filter (4096-entry ring buffer) to drop duplicate packets
arriving via multiple federation paths. Add per-room token bucket
rate limiter (500 pps) to prevent amplification.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-08 10:36:26 +04:00
Siavash Sameni
d52b8befd6 fix: canonical room hash for federation — handles hashed vs raw room names
Some checks failed
Mirror to GitHub / mirror (push) Failing after 36s
Build Release Binaries / build-amd64 (push) Failing after 2m13s
Different clients send different room names:
- Android: raw "general" as SNI
- Desktop: hash_room_name("general") = "f09ae11d..." as SNI

Federation datagrams are tagged with an 8-byte room hash. Previously,
each relay computed the hash from the client-provided room name,
causing mismatches between relays with different client types.

Fix: resolve_global_room() maps any room name (raw or hashed) to the
canonical [[global_rooms]] name. global_room_hash() always uses the
canonical name for federation hashing. handle_datagram uses both raw
and canonical hash matching to find the local room.

Also: run_participant now receives the pre-computed federation_room_hash
so the egress uses the canonical hash, not the client-specific name.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-08 10:31:26 +04:00
Siavash Sameni
0abecf7fd8 feat: adaptive quality engine + codec indicator UI
Some checks failed
Mirror to GitHub / mirror (push) Failing after 38s
Build Release Binaries / build-amd64 (push) Failing after 2m17s
Wire AdaptiveQualityController into Android engine for auto codec
switching based on network quality reports. Add color-coded TX/RX
codec badges to the in-call screen showing active codecs and Auto mode.

- Recv task: ingest QualityReports, feed to controller, signal profile
  changes via AtomicU8 to send task
- Send task: check for pending profile switch at frame boundaries,
  update encoder/FEC/frame size
- Track peer codec from incoming packet headers
- Kotlin UI: codec badges (blue=studio, green=good, amber=degraded,
  red=catastrophic) with Auto tag
- Add .taskmaster to .gitignore

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-08 10:19:11 +04:00
Siavash Sameni
f4cc3b1a6b fix: forward media to ALL connected peers, not just those with room active
Some checks failed
Mirror to GitHub / mirror (push) Failing after 38s
Build Release Binaries / build-amd64 (push) Failing after 2m14s
The bug: when a local client joins a global room and sends media, the
egress task checked peer_links.active_rooms to decide where to forward.
But active_rooms tracks what PEERS announced (their rooms), not what
WE announced. So our own GlobalRoomActive signal went out but our
peer_links had empty active_rooms — media was dropped.

Fix: for locally-originated media, send to ALL connected federation
peers unconditionally. The receiving relay decides whether to deliver
to local participants (if it has the room) or forward further. This
is correct because federation peers are explicitly configured — if
they're connected, they should receive global room media.

Multi-hop forwarding (handle_datagram) still filters by active_rooms
to prevent loops — only forwards to peers that announced the room.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-08 10:09:50 +04:00
Siavash Sameni
af4c89f5f0 docs: PRD for delegated trust in relay federation
Some checks failed
Mirror to GitHub / mirror (push) Failing after 37s
Build Release Binaries / build-amd64 (push) Failing after 1m51s
Addresses the trust gap where a hub relay can forward media from
unknown relays without the receiving relay's consent. Introduces
delegate=true flag on [[trusted]] entries: when set, the relay
accepts media forwarded through the trusted peer from relays it
vouches for. Without delegate, only direct media is accepted.

Covers: FederationTrustChain signal, origin authorization checks,
TTL for chain depth limiting, anti-spam properties. 5 phases, ~3 days.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-08 10:00:21 +04:00
Siavash Sameni
406461d460 feat: personalized config generation with --listen addr + own fingerprint
Some checks failed
Mirror to GitHub / mirror (push) Failing after 39s
Build Release Binaries / build-amd64 (push) Failing after 3m16s
When --config points to a non-existent file, the relay now generates
a personalized example config that includes:
- listen_addr matching the --listen flag (not hardcoded 0.0.0.0:4433)
- Pre-filled [[peers]] section with this relay's detected IP, port,
  and TLS fingerprint — ready to copy/paste into other relay configs

This makes setting up federation much easier: start each relay, it
generates its config with its own peering info commented out, you
just uncomment and copy between configs.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-08 09:38:28 +04:00
Siavash Sameni
7064f484af feat: -c/--config and -i/--identity flags for multi-instance relay
Some checks failed
Mirror to GitHub / mirror (push) Failing after 36s
Build Release Binaries / build-amd64 (push) Failing after 2m17s
Enables running multiple relays on the same machine:
  wzp-relay -c ~/.wzp1/config.toml -i ~/.wzp1/relay-identity --listen :4433
  wzp-relay -c ~/.wzp2/config.toml -i ~/.wzp2/relay-identity --listen :4434
  wzp-relay -c ~/.wzp3/config.toml -i ~/.wzp3/relay-identity --listen :4435

Config auto-creation: if the config file doesn't exist, writes an
example config with all fields documented and commented. The relay
starts with defaults but the file is ready to edit.

Identity auto-generation: if the identity file doesn't exist, generates
a new random seed (OsRng via wzp_crypto::Seed::generate) and saves it.
Subsequent starts load the same identity.

Short flags: -c for --config, -i for --identity.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-08 09:18:48 +04:00
Siavash Sameni
1d2222a25a debug: add datagram receive + multi-hop forward error logging
Some checks failed
Mirror to GitHub / mirror (push) Failing after 34s
Build Release Binaries / build-amd64 (push) Failing after 2m28s
Added logging to trace federation media flow:
- media_task logs first + every 250th received datagram (count, len)
- handle_datagram multi-hop forward logs errors (was silently dropped)
- forward_to_peers logs when no peer matches

2-relay (A→B): WORKING — full audio received, 300 packets forwarded
3-relay (A→B→C): B receives datagrams from A but only 1 arrives —
  remaining packets not received, likely a QUIC read_datagram issue
  when handle_datagram holds locks during processing. Needs further
  investigation into async lock contention or datagram buffering.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-08 08:45:54 +04:00
Siavash Sameni
270e139f20 feat: federation media forwarding WORKING — global rooms router model complete
Some checks failed
Mirror to GitHub / mirror (push) Failing after 38s
Build Release Binaries / build-amd64 (push) Failing after 1m58s
2-relay test: 5.0s audio, RMS 4748, PASS. Full pipeline verified:
- Room correctly identified as global (hash matching works)
- Federation egress channel created and connected
- GlobalRoomActive signals exchanged between peers
- 300 packets (250 source + 50 FEC) forwarded via tagged datagrams
- Client B on relay B received full 5-second tone from client A on relay A

Added debug logging: is_global check, egress channel creation, per-peer
forwarding with active_rooms diagnostic when no match found. Also logs
egress packet count (first + every 250th).

Multi-hop propagation: GlobalRoomActive signals forwarded to other peers
so A→B→C chain knows about rooms across the full mesh.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-08 08:31:37 +04:00
Siavash Sameni
d9b2e0fd53 docs: comprehensive documentation — design, architecture, admin, user guide
Some checks failed
Mirror to GitHub / mirror (push) Failing after 36s
Build Release Binaries / build-amd64 (push) Failing after 1m58s
4 files, 2,511 lines covering the entire WarzonePhone project:

DESIGN.md (591 lines): system overview, codec system (9 variants),
FEC (RaptorQ), transport (QUIC/quinn), security (Ed25519/X25519/
ChaCha20/HKDF/BIP39/TOFU), federation (global rooms), jitter buffer.
Mermaid diagrams for audio pipelines and crate dependencies.

ARCHITECTURE.md (874 lines): 15 mermaid diagrams — system overview,
encode/decode pipelines, relay SFU, federation topology/protocol,
signal handshake, client architectures (desktop/android/CLI), wire
format tables (MediaHeader/MiniHeader/QualityReport), project tree.

ADMINISTRATION.md (587 lines): relay deployment (binary/Docker/systemd),
complete TOML config reference, CLI flags table, federation setup
(peers/trusted/global_rooms), 3 example configs, Prometheus metrics,
auth, identity persistence, 12-item troubleshooting guide.

USER_GUIDE.md (459 lines): all clients — desktop (settings, quality
slider, key warning, shortcuts), Android (8-level quality slider,
server management, identity backup), CLI (flags table, 8 usage
patterns). Identity system, quality profiles when-to-use guide.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-08 08:21:13 +04:00
Siavash Sameni
898c1ea32b docs: PRDs for P2P direct calls and coordinated codec switching
Some checks failed
Mirror to GitHub / mirror (push) Failing after 36s
Build Release Binaries / build-amd64 (push) Failing after 1m52s
PRD-p2p-direct.md: STUN-based NAT traversal for direct QUIC
connections between clients. True E2E with mutual TLS cert pinning
via identity fingerprints. Hybrid mode: try P2P, fall back to relay.
4 phases: STUN discovery, hole punching, P2P adaptive quality,
seamless relay-to-P2P migration.

PRD-coordinated-codec.md: Relay acts as quality judge — monitors
per-participant loss/RTT/jitter, sends quality directives. Downgrade
is immediate (match weakest link), upgrade is consensual (all
participants must agree, synchronized switch at agreed timestamp).
Covers asymmetric encoding in SFU and P2P→relay backporting strategy.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-08 08:12:12 +04:00
Siavash Sameni
b00db5dfdc feat: federation rewrite — global rooms router model
Some checks failed
Mirror to GitHub / mirror (push) Failing after 36s
Build Release Binaries / build-amd64 (push) Failing after 1m52s
Major rewrite of relay federation replacing virtual participants with
a clean router model:

1. Global rooms: [[global_rooms]] in TOML config declares rooms that
   are bridged across federation. Each relay is a router + local SFU.

2. Room events: RoomManager emits LocalJoin/LocalLeave via broadcast
   channel when rooms transition between empty and non-empty.

3. GlobalRoomActive/Inactive signals: relays announce when they have
   local participants in global rooms. Peers track active state and
   forward media accordingly. Announcements propagate for multi-hop.

4. Media forwarding: separated from SFU loop. Local participant sends
   via mpsc channel → egress task → forward_to_peers() → room-hash
   tagged datagrams to active peer links. Inbound datagrams delivered
   to local participants + forwarded to other active peers (multi-hop).

5. Loop prevention: don't forward back to source relay.

6. Room name hashing: is_global_room() checks both plain name and
   hash (clients hash room names for SNI privacy).

Removed: ParticipantSender::Federation, federated_participants, virtual
participant join/leave, periodic room polling. Rooms now only contain
local participants.

Signaling tested: 3-relay chain (A→B←C) correctly propagates
GlobalRoomActive through B to both A and C. Media forwarding plumbing
in place but needs final debugging.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-08 07:54:38 +04:00
Siavash Sameni
bc8bb3d790 feat: [[trusted]] config + FederationHello for one-sided federation
Some checks failed
Mirror to GitHub / mirror (push) Failing after 34s
Build Release Binaries / build-amd64 (push) Failing after 1m53s
- Added [[trusted]] config: relay B can accept inbound federation
  from relay A by fingerprint alone, without knowing A's address.
  A connects to B with [[peers]], B trusts A with [[trusted]].

- FederationHello signal: outbound connections send their TLS
  fingerprint as first signal. The accepting relay verifies it
  against [[peers]] (by IP) or [[trusted]] (by fingerprint).

- Tested 3-relay chain: A→B←C. Both A and C connect to B, B trusts
  both. B correctly accepts both inbound connections. Room
  announcements flow A→B and C→B.

- Remaining: B needs to announce rooms back to A and C on the same
  connection so media can flow A→B→C. Currently A has no virtual
  participant for B, so media doesn't reach B's SFU for forwarding.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-08 06:49:20 +04:00
Siavash Sameni
ea51d068e6 feat: --debug-tap for relay packet header logging
Some checks failed
Mirror to GitHub / mirror (push) Failing after 38s
Build Release Binaries / build-amd64 (push) Failing after 1m57s
Adds --debug-tap <room> flag (or debug_tap in TOML config) that logs
every media packet's header metadata passing through a room. Use '*'
for all rooms.

Output (via tracing target "debug_tap"):
  TAP room=... dir=in addr=... seq=31 codec=Opus24k ts=520
      fec_block=5 fec_sym=1 repair=false len=65 fan_out=1

Shows: direction, source address, sequence number, codec ID, timestamp,
FEC block/symbol, repair flag, payload size, and fan-out count.
No decryption needed — headers are not encrypted.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-08 06:34:22 +04:00
Siavash Sameni
7271942c6a feat: federation media forwarding working — audio crosses between relays
Some checks failed
Mirror to GitHub / mirror (push) Failing after 33s
Build Release Binaries / build-amd64 (push) Failing after 3m57s
Added debug logging to federation signal path. Fixed the announce/recv
flow: outbound link's announce_task sends FederationRoomJoin, peer's
inbound signal_task receives it and creates virtual participant.

Tested: two relays on localhost with mutual TOML config, client A
sends tone via relay A, client B records via relay B — audio received
through federation (0.1s/RMS 7291/PASS).

Room announcement delay is ~1s (poll interval). The full pipeline:
client join → room created → announce_task detects → sends signal →
peer receives → creates virtual participant → SFU loop forwards
media via room-hash-tagged datagrams → peer demuxes → local delivery.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-08 06:26:49 +04:00
Siavash Sameni
da84ed332c docs: PRD for protocol analyzer — relay debug tap + full analyzer tool
Some checks failed
Mirror to GitHub / mirror (push) Failing after 35s
Build Release Binaries / build-amd64 (push) Failing after 1m47s
Two tools:
1. --debug-tap on relay: logs packet header metadata (seq, codec, ts,
   FEC, repair, size) per room without decryption. 0.5 day effort.
2. wzp-analyzer standalone: joins room as observer, decodes audio,
   shows TUI with per-participant waveforms + quality stats + FEC
   recovery rates. Capture/replay and HTML reports. 5-8 days total.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-08 05:57:27 +04:00
Siavash Sameni
e50925e05a fix: IP-based peer matching for inbound federation + room announcements
Some checks failed
Mirror to GitHub / mirror (push) Failing after 37s
Build Release Binaries / build-amd64 (push) Failing after 1m53s
- Inbound federation connections now matched by source IP against
  configured peer URLs (QUIC clients don't present TLS certs, so
  fingerprint matching fails for inbound direction).
- Added periodic room announcement task (1s poll) that sends
  FederationRoomJoin to peers when new rooms appear with local
  participants. Handles rooms created after federation link is up.
- Added find_peer_by_addr() to FederationManager.

Federation link topology: each relay pair has 2 connections (outbound
from each side). Outbound sends signals, peer's inbound receives them.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-08 05:49:37 +04:00
Siavash Sameni
6be36e43c2 feat: relay federation infrastructure — room bridging, loop prevention, peer connections
Some checks failed
Mirror to GitHub / mirror (push) Failing after 36s
Build Release Binaries / build-amd64 (push) Failing after 2m1s
Phase 1 of relay federation:

1. Signal messages: FederationRoomJoin/Leave/ParticipantUpdate added
   to SignalMessage enum for relay-to-relay room coordination.

2. Room changes: ParticipantOrigin (Local/Federated) tracking, loop
   prevention (federated media only forwards to local participants),
   ParticipantSender::Federation with 8-byte room-hash prefixed
   datagrams, merged participant lists (local + remote), new methods:
   join_federated(), update_federated_participants(), local_senders(),
   active_rooms(), local_participants().

3. FederationManager: connects to configured peers via QUIC with SNI
   "_federation", reconnects with exponential backoff (5s-300s),
   exchanges FederationRoomJoin signals, runs recv loops for both
   signals and media datagrams, creates virtual participants in rooms.

4. Accept-side: _federation SNI handling in main.rs, unknown peer
   gets helpful "add to relay.toml" log message, recognized peers
   handed off to FederationManager.

TODO: TLS fingerprint verification — currently outbound connections
use client_config() which doesn't present a cert, so inbound
verification fails. Need mutual TLS or URL-based peer matching.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-07 22:30:18 +04:00
Siavash Sameni
2f2720802d feat: TOML config file with federation peers + --config flag
The relay now supports loading configuration from a TOML file via
--config <path>. CLI flags override TOML values. All fields have
serde defaults so a minimal config only needs what you want to change.

Example relay.toml:
  listen_addr = "0.0.0.0:4433"
  [[peers]]
  url = "193.180.213.68:4433"
  fingerprint = "1a:39:38:..."
  label = "Pangolin EU"

Federation hint on startup now shows TOML format with TLS fingerprint
(not Ed25519 identity fingerprint), since TLS fingerprint is what
peers actually verify. Configured peers are logged on startup.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-07 22:13:56 +04:00
Siavash Sameni
087bfd2335 feat: deterministic TLS certificate from relay identity seed
The relay's TLS certificate is now derived from the persisted
Ed25519 seed via HKDF, so the same seed produces the same cert
and the same TLS fingerprint across restarts. This fixes the
"Server Key Changed" warnings on every relay restart.

Implementation: HKDF-SHA256(seed, "wzp-tls-ed25519") → Ed25519
signing key → PKCS8 DER → rcgen KeyPair → self-signed cert.

Also adds tls_fingerprint() helper (SHA-256 of DER cert, hex with
colons) and prints it on startup. This is the prerequisite for
relay federation (peers verify each other by TLS fingerprint).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-07 22:10:08 +04:00
Siavash Sameni
0a05e62c7f feat: relay prints federation peering config on startup
Some checks failed
Mirror to GitHub / mirror (push) Failing after 37s
Build Release Binaries / build-amd64 (push) Failing after 1m50s
On startup, the relay detects its outbound IP (via UDP socket trick)
and prints a ready-to-copy YAML snippet for other relays to federate:

  federation: to peer with this relay, add to peers config:
    - url: "193.180.213.68:4433"
      fingerprint: "a5d6:e3c6:..."

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-07 21:37:10 +04:00
Siavash Sameni
b97f32ce46 docs: PRD for relay federation (multi-relay mesh) + identity fix
Some checks failed
Mirror to GitHub / mirror (push) Failing after 36s
Build Release Binaries / build-amd64 (push) Failing after 1m53s
Documents the relay TLS identity bug (cert regenerates on restart
because server_config() creates a new keypair every time, ignoring
the persisted Ed25519 seed) and the full federation design:

- YAML config with mutual peer trust (url + fingerprint)
- QUIC connections between peers, fingerprint verification
- Room bridging: media forwarding for shared room names
- Merged participant presence across relays
- Helpful log message for unconfigured peer connection attempts
- No transcoding, no re-encryption, no central coordinator

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-07 21:33:05 +04:00
Siavash Sameni
d66d583583 docs: PRD for adaptive quality control (auto codec)
Some checks failed
Mirror to GitHub / mirror (push) Failing after 38s
Build Release Binaries / build-amd64 (push) Failing after 1m55s
Covers the full design for runtime codec switching based on network
conditions: 3-tier basic (GOOD/DEGRADED/CATASTROPHIC), extended
5-tier with studio levels, and bandwidth probing. Details the
existing QualityAdapter infrastructure, what's missing (report
ingestion, profile switch loop, cross-task signaling via AtomicU8),
and implementation plan for both Android and desktop engines.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-07 21:25:33 +04:00
Siavash Sameni
d06cf66538 fix: auto codec, force-ping button, relay delete button
Some checks failed
Mirror to GitHub / mirror (push) Failing after 36s
Build Release Binaries / build-amd64 (push) Failing after 1m57s
1. Auto codec: new "Auto" position on quality slider (JNI index 7).
   When selected, the engine uses the relay's chosen_profile from
   CallAnswer instead of the local preference. Slider now has 8
   positions: Studio 64k → Auto → Codec2 1.2k.

2. Force ping: added refresh button (↻) in Manage Relays dialog
   header. Calls pingAllServers() to re-check all relays on demand.

3. Delete relay fix: the X button was inside a Surface(onClick=...)
   which swallowed the touch event. Replaced with a separate Surface
   that properly intercepts the click.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-07 21:22:24 +04:00
Siavash Sameni
7bddc6b5a6 fix: advertise studio profiles in desktop handshake supported_profiles
Some checks failed
Mirror to GitHub / mirror (push) Failing after 35s
Build Release Binaries / build-amd64 (push) Failing after 1m55s
Same fix as Android — the CallOffer now includes STUDIO_64K/48K/32K
so the relay can negotiate studio quality levels.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-07 21:06:48 +04:00
Siavash Sameni
c8bcc5c974 fix: advertise studio profiles in handshake supported_profiles
Some checks failed
Build Release Binaries / build-amd64 (push) Failing after 2m7s
Mirror to GitHub / mirror (push) Failing after 35s
The CallOffer only advertised GOOD/DEGRADED/CATASTROPHIC. When a
client uses a studio profile, the relay's choose_profile couldn't
pick it. Now advertises all 6 profiles (studio 64k/48k/32k + good +
degraded + catastrophic) in both Android engine and shared handshake.

Also: the relay MUST be rebuilt with the new CodecId variants,
otherwise it will fail to deserialize CallOffer messages containing
studio QualityProfiles in supported_profiles.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-07 19:39:31 +04:00
Siavash Sameni
760126b6ab fix: remove duplicate Kotlin imports causing build failure
Some checks failed
Mirror to GitHub / mirror (push) Failing after 39s
Build Release Binaries / build-amd64 (push) Failing after 2m5s
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-07 19:17:33 +04:00
Siavash Sameni
53f8bf8fff feat: full quality tiers + slider UI + key-change warning on Android
Some checks failed
Mirror to GitHub / mirror (push) Failing after 36s
Build Release Binaries / build-amd64 (push) Failing after 1m52s
1. Wire protocol: add Opus 32k/48k/64k (CodecId 6/7/8) + STUDIO
   profiles with is_opus() helper. Opus enc/dec accept all Opus variants.

2. JNI bridge: expand profile_from_int to 7 levels (0-6) mapping to
   GOOD, DEGRADED, CATASTROPHIC, Codec2_3200, STUDIO_32K/48K/64K.

3. Settings UI: replace radio buttons with Material3 Slider — 7 stops
   from Studio 64k (green) to Codec2 1.2k (dark red), matching desktop.

4. Key-change warning: AlertDialog on connect when server fingerprint
   has changed. Shows old vs new fingerprint, Accept New Key or Cancel.
   Accepting saves the new fingerprint and proceeds with the call.

5. Engine recv: handle studio codec IDs in auto-switch path.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-07 19:11:29 +04:00
Siavash Sameni
3b85604b41 docs: PRDs for local recording + mixer and studio quality tiers
Some checks failed
Mirror to GitHub / mirror (push) Failing after 37s
Build Release Binaries / build-amd64 (push) Failing after 1m56s
PRD-local-recording.md: Dual-path architecture for podcast-quality
interviews — local lossless WAV recording alongside live call, with
sync markers for post-session alignment, resumable upload to a
self-hosted mixer service that produces normalized multi-track output.

PRD-studio-quality.md: Documents the Opus 32k/48k/64k studio tiers,
when to use them, cross-codec interop, and backward compatibility.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-07 18:32:24 +04:00
Siavash Sameni
a8c2011445 feat: add Opus 32k/48k/64k studio quality tiers
Some checks failed
Mirror to GitHub / mirror (push) Failing after 36s
Build Release Binaries / build-amd64 (push) Has been cancelled
Adds three new codec IDs (Opus32k=6, Opus48k=7, Opus64k=8) and
corresponding STUDIO_32K, STUDIO_48K, STUDIO_64K quality profiles.
All use 20ms frames with minimal FEC (10%) for maximum quality on
good networks.

Updated across: wire protocol (codec_id.rs), encoder/decoder
(opus_enc/dec.rs), adaptive codec switch (call.rs), CLI
(--profile studio-64k), desktop engine + UI slider (8 quality
levels from Studio 64k green to Codec2 1.2k red).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-07 18:31:05 +04:00
Siavash Sameni
ded49bdb7b feat: replace browser confirm with proper key-change warning dialog
Some checks failed
Mirror to GitHub / mirror (push) Failing after 35s
Build Release Binaries / build-amd64 (push) Failing after 1m57s
When the relay's server key changes (e.g. after restart), show a
styled in-app warning dialog instead of the ugly browser confirm().
The dialog shows old vs new fingerprints and lets the user accept
the new key or cancel. Accepting updates the saved fingerprint and
refreshes the relay button state.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-07 18:19:53 +04:00
Siavash Sameni
b3cdad0c75 fix: copy libc++_shared.so from NDK when cargo-ndk skips it
Some checks failed
Mirror to GitHub / mirror (push) Failing after 37s
Build Release Binaries / build-amd64 (push) Failing after 1m58s
cargo-ndk doesn't always copy libc++_shared.so into jniLibs. The
build script now finds it in the NDK and copies it manually if
missing, preventing the build check from failing.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-07 18:06:28 +04:00
Siavash Sameni
fa3c7f1cef fix: dynamic frame sizing for non-default quality profiles on Android
Some checks failed
Mirror to GitHub / mirror (push) Failing after 36s
Build Release Binaries / build-amd64 (push) Failing after 1m58s
The send loop was hardcoded to 960 samples (20ms/Opus24k), causing
DEGRADED (Opus 6k, 40ms) and CATASTROPHIC (Codec2 1200, 40ms) to
fail — the encoder needed 1920 samples but only got 960.

Changes:
- capture_buf, ring read threshold, and timestamp increment are now
  computed from profile.frame_duration_ms (960 for 20ms, 1920 for 40ms)
- decode_buf sized to MAX_FRAME_SAMPLES (1920) to handle any incoming codec
- recv codec switch now uses correct QualityProfile per codec (was
  inheriting original profile's frame_duration_ms, breaking cross-codec)
- added ComfortNoise guard on recv path

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-07 18:00:27 +04:00
Siavash Sameni
369347ce54 fix: remove unused FRAME_SAMPLES_20MS constant in desktop engine
Some checks failed
Mirror to GitHub / mirror (push) Failing after 35s
Build Release Binaries / build-amd64 (push) Failing after 1m53s
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-07 17:54:13 +04:00
Siavash Sameni
44f04b55e8 feat: quality slider in settings with color gradient
Some checks failed
Mirror to GitHub / mirror (push) Failing after 36s
Build Release Binaries / build-amd64 (push) Failing after 1m56s
Replace the quality dropdown with a range slider in the settings
panel. The slider goes from Auto (green) through Opus 24k, Opus 6k
(yellow), Codec2 3.2k (orange) to Codec2 1.2k (dark red). The
track uses a green-to-red gradient and the label color updates
to match the selected level. Removed the quality dropdown from
the connect screen — quality is now settings-only.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-07 17:50:46 +04:00
Siavash Sameni
85c2146760 feat: quality profile selection in desktop settings
Some checks failed
Mirror to GitHub / mirror (push) Failing after 35s
Build Release Binaries / build-amd64 (push) Failing after 1m58s
Adds a Quality dropdown (Auto / Opus 24k / Opus 6k / Codec2 3.2k /
Codec2 1.2k) to both the connect screen and settings panel. The
selected profile is passed through to the engine which configures
the encoder and decoder accordingly.

The desktop engine recv path now auto-switches the decoder codec
when incoming packets use a different codec than expected, enabling
cross-codec interop between clients on different quality settings.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-07 17:44:17 +04:00
Siavash Sameni
96ccb4f333 fix: auto-switch decoder codec to match incoming packets
Some checks failed
Mirror to GitHub / mirror (push) Failing after 35s
Build Release Binaries / build-amd64 (push) Failing after 3m41s
The CallDecoder now inspects each incoming packet's codec_id and
automatically switches the audio decoder if it differs from the
current profile. This enables cross-codec interop where one client
sends Opus and the other sends Codec2 — previously the receiver
would try to decode with the wrong codec, producing garbled audio.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-07 15:35:31 +04:00
Siavash Sameni
95a905e1b5 feat: add --profile/--codec flag to CLI for forcing codec selection
Enables debugging Codec2 by allowing forced codec selection from CLI.
Supports: good, degraded, catastrophic, codec2-3200, codec2-1200.
Frame size, timing, and jitter buffer are all adjusted dynamically
based on the selected profile.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-07 15:35:31 +04:00
Siavash Sameni
68b56d9172 fix: ping every 5min (was 5s), clean endpoint on failure, never block connect
Some checks failed
Mirror to GitHub / mirror (push) Failing after 39s
Build Release Binaries / build-amd64 (push) Failing after 3m45s
- Ping interval: 5 minutes (was 5 seconds — too aggressive)
- Rust ping_relay: explicitly close endpoint + shutdown runtime on failure
- Connect button works regardless of ping status (never blocked)
- Ping failure doesn't corrupt engine state

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-07 11:40:14 +04:00
Siavash Sameni
7973c8c6a3 fix: ntfy failure notification on build error (trap ERR)
Some checks failed
Mirror to GitHub / mirror (push) Failing after 38s
Build Release Binaries / build-amd64 (push) Failing after 3m58s
Both Android and Linux build scripts now send ntfy notification
when build fails, not just on success.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-07 11:23:32 +04:00
Siavash Sameni
3e9539e5da fix: add libasound2-dev to Docker image for Linux audio builds
Some checks failed
Mirror to GitHub / mirror (push) Failing after 36s
Build Release Binaries / build-amd64 (push) Failing after 4m16s
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-07 11:16:39 +04:00
Siavash Sameni
a1ccb3f390 feat: Linux x86_64 fire-and-forget Docker build on SepehrHomeserverdk
Some checks failed
Mirror to GitHub / mirror (push) Failing after 41s
Build Release Binaries / build-amd64 (push) Failing after 4m2s
Same Docker image as Android build. Separate cache dirs (cache-linux/)
to avoid conflicts when running both builds simultaneously.

Builds: wzp-relay, wzp-client, wzp-client-audio, wzp-web, wzp-bench
Uploads tar.gz to rustypaste, notifies ntfy.sh/wzp.

Usage:
  ./scripts/build-linux-docker.sh --pull         # fire and forget
  ./scripts/build-linux-docker.sh --pull --install # wait + download

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-07 11:09:01 +04:00
Siavash Sameni
7751439e2b feat: relay identity persistence + Linux build script
Some checks failed
Mirror to GitHub / mirror (push) Failing after 40s
Build Release Binaries / build-amd64 (push) Has been cancelled
Relay identity:
- Stored in ~/.wzp/relay-identity (hex-encoded 32-byte seed)
- Generated on first run, reused on restart
- Fingerprint stays consistent across relay restarts

Linux build script (scripts/build-linux-notify.sh):
- Fire and forget: Hetzner VM → build all binaries → upload to rustypaste → ntfy notify → destroy VM
- Builds: wzp-relay, wzp-client, wzp-client-audio, wzp-web, wzp-bench
- Packages as tar.gz, uploads to rustypaste
- --keep flag to preserve VM

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-07 11:05:49 +04:00
Siavash Sameni
20bc290c18 fix: relay handles ping connections gracefully (no timeout errors)
Some checks failed
Mirror to GitHub / mirror (push) Failing after 36s
Build Release Binaries / build-amd64 (push) Failing after 4m0s
Relay recognizes SNI "ping" and returns immediately — no handshake,
no stream accept, no timeout error logs. Client closes after QUIC
connect for RTT measurement.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-07 11:01:03 +04:00
Siavash Sameni
a8dc350a65 feat: codec selection in settings (Opus / Opus Low / Codec2)
Some checks failed
Mirror to GitHub / mirror (push) Failing after 41s
Build Release Binaries / build-amd64 (push) Failing after 3m41s
- Settings UI: radio buttons for encode codec selection
- Persisted via SettingsRepository
- Passed through WzpEngine.startCall(profile=) → JNI → Rust CallStartConfig
- Decode always accepts all codecs (per-packet codec_id switch)
- 0 = Opus 24k (GOOD), 1 = Opus 6k (DEGRADED), 2 = Codec2 1.2k (CATASTROPHIC)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-07 10:50:01 +04:00
Siavash Sameni
00fa109f07 feat: codec2 support — adaptive encoder/decoder, per-packet codec switch
Some checks failed
Mirror to GitHub / mirror (push) Failing after 33s
Build Release Binaries / build-amd64 (push) Failing after 3m57s
Android engine:
- Use wzp_codec::create_encoder/create_decoder (factory) instead of
  hardcoded OpusEncoder/OpusDecoder
- Recv path: auto-switch decoder based on incoming packet's codec_id
- Supports mixed-codec rooms (one client Opus, another Codec2)

Desktop client already uses factory functions — no changes needed.

Codec selection via QualityProfile:
- GOOD: Opus 24kbps
- DEGRADED: Opus 6kbps
- CATASTROPHIC: Codec2 1200bps

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-07 10:34:14 +04:00
Siavash Sameni
1e40dec468 feat: periodic server ping every 5s while app is open
Some checks failed
Mirror to GitHub / mirror (push) Failing after 40s
Build Release Binaries / build-amd64 (push) Failing after 3m53s
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-07 10:13:51 +04:00
Siavash Sameni
aecef0905d feat: fire-and-forget build script with ntfy + rustypaste
Some checks failed
Mirror to GitHub / mirror (push) Failing after 37s
Build Release Binaries / build-amd64 (push) Failing after 3m59s
- Uploads build script to remote, runs in tmux (survives SSH drop)
- Builds Rust + APK in Docker
- Validates both .so files present before APK build
- Uploads APK to rustypaste
- Sends ntfy.sh/wzp notification with download URL
- --install flag: waits + downloads + adb installs locally
- --rust flag: force clean Rust rebuild
- --pull flag: git pull before building

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-07 10:00:49 +04:00
Siavash Sameni
18f7faa279 fix: ping as engine instance method — same lifecycle as call
Some checks failed
Mirror to GitHub / mirror (push) Failing after 7s
Build Release Binaries / build-amd64 (push) Failing after 19s
Ping was a static JNI method that loaded the .so before nativeInit,
crashing jemalloc. Now ping is an instance method on WzpEngine:

- Engine is created once (nativeInit), reused for both ping and call
- pingRelay() uses same tokio runtime pattern as startCall()
- Auto-pings all servers on app launch (after engine init)
- No process restart needed
- TOFU fingerprints saved on first successful ping

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-07 09:49:33 +04:00
Siavash Sameni
eeb85aeac2 feat: ping-and-exit for server RTT, remove broken UDP ping
Some checks failed
Mirror to GitHub / mirror (push) Failing after 38s
Build Release Binaries / build-amd64 (push) Failing after 3m38s
- Ping button: pings all servers via native QUIC, saves RTT + fingerprint
  to SharedPreferences, then exits process (System.exit)
- On restart: loads saved ping results (no native .so loading needed)
- Avoids jemalloc crash: native lib only loaded once per process lifetime
- Removed broken UDP probe (QUIC servers don't respond to it)
- SettingsRepository: savePingRtt/loadPingRtt for cached results
- PingResult: added reachable field

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-07 09:31:02 +04:00
Siavash Sameni
00b405aa87 feat: debug recording off by default, toggle in settings
Some checks failed
Mirror to GitHub / mirror (push) Failing after 38s
Build Release Binaries / build-amd64 (push) Failing after 3m57s
- AudioPipeline.debugRecording defaults to false (was true)
- SettingsRepository: persist debug_recording preference
- CallViewModel: debugRecording StateFlow + setter, wired to AudioPipeline
- Only records PCM + RMS when explicitly enabled in settings

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-07 09:01:43 +04:00
Siavash Sameni
d09e21965e feat: pure Kotlin UDP ping — periodic every 5s, no JNI crash
Some checks failed
Mirror to GitHub / mirror (push) Failing after 38s
Build Release Binaries / build-amd64 (push) Failing after 3m35s
Replace WzpEngine.pingRelay() (JNI, loads native .so, crashes jemalloc
on Android 16 MTE) with pure Kotlin DatagramSocket UDP probe.

- RelayPinger: sends QUIC Version Negotiation trigger packet, measures
  RTT from response. No native lib, no JNI, zero crash risk.
- Periodic: pings all servers every 5 seconds via coroutine
- Server fingerprint: filled lazily on first real QUIC connection
  (TOFU still works, just delayed)
- Lock status: OFFLINE when ping fails, NEW until first connection

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-07 08:57:27 +04:00
Siavash Sameni
97bcc79f9b feat: desktop-style UI + docker build scripts, fix ping crash
Some checks failed
Mirror to GitHub / mirror (push) Failing after 39s
Build Release Binaries / build-amd64 (push) Failing after 4m3s
- InCallScreen rewrite matching desktop dark theme layout
- Removed auto-ping LaunchedEffect (loading native .so early via
  pingRelay crashes jemalloc on Android 16 MTE)
- Added Docker build scripts (Dockerfile.android-builder + build-android-docker.sh)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-07 08:19:45 +04:00
Siavash Sameni
264ef9c4d4 feat: relay ping with RTT, server TOFU, lock icons (Phase 2 backport)
Some checks failed
Mirror to GitHub / mirror (push) Failing after 40s
Build Release Binaries / build-amd64 (push) Failing after 3m48s
Rust JNI:
- nativePingRelay: QUIC connect with 3s timeout, returns RTT + server
  certificate fingerprint as JSON. Static method, no engine needed.

Kotlin:
- WzpEngine.pingRelay() static wrapper
- SettingsRepository: TOFU fingerprint persistence (tofu_{address} keys)
- CallViewModel: pingAllServers() coroutine, lockStatus() helper,
  PingResult/LockStatus data types
- InCallScreen: server chips show lock icon + RTT color (green/yellow),
  "Ping All" button

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-06 22:43:53 +04:00
Siavash Sameni
a9adb5cfd7 feat: identicons, tap-to-copy fingerprint, recent rooms (Phase 1 backport)
Some checks failed
Mirror to GitHub / mirror (push) Failing after 36s
Build Release Binaries / build-amd64 (push) Failing after 3m47s
Backport from desktop client to Android:

Identicons:
- New Identicon.kt composable: deterministic 5x5 symmetric Canvas pattern
  from fingerprint hash (same algorithm as desktop identicon.ts)
- Participant list shows identicon + name + tappable fingerprint
- Settings page shows identicon next to fingerprint

CopyableFingerprint:
- Tap any fingerprint text to copy to clipboard with Toast feedback
- Used in participant list and settings page

Recent rooms:
- SettingsRepository: persists last 5 (relay, room) pairs
- CallViewModel: saves on startCall, exposes as StateFlow
- InCallScreen: clickable chips that fill room + select matching server

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-06 22:37:46 +04:00
Siavash Sameni
a39b074d6e fix: DirectByteBuffer as class field — survives ART JIT OSR
Some checks failed
Mirror to GitHub / mirror (push) Failing after 36s
Build Release Binaries / build-amd64 (push) Failing after 3m58s
Previous attempt allocated DirectByteBuffer as local variables inside
runCapture/runPlayout. ART's JIT On-Stack Replacement nulled them
when recompiling the hot loop mid-execution.

Fix: allocate as class fields on AudioPipeline (captureDirectBuf,
playoutDirectBuf). Object fields live on the heap, immune to OSR
stack frame replacement.

Eliminates JNI array copies (GetShortArrayRegion/SetShortArrayRegion)
from the audio hot path, preventing ART GC SIGBUS crashes on
Android 16 with concurrent mark-compact GC.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-06 22:22:54 +04:00
Siavash Sameni
9cab6e2347 ci: skip build on CI-only file changes
Add paths-ignore for .gitea/** so build.yml doesn't waste runner time
when only workflow files are modified.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-06 22:13:29 +04:00
Siavash Sameni
5e93cb74f2 fix: filter tracing to INFO for wzp crates, WARN for jni crate
Some checks failed
Mirror to GitHub / mirror (push) Failing after 38s
Build Release Binaries / build-amd64 (push) Failing after 4m7s
The jni crate emits VERBOSE logs for every JNI method lookup (~10 lines
per call, 100+ calls/sec on audio threads). This floods logcat, consumes
CPU, and triggers system kills. Filter to only show INFO+ for our crates
and WARN+ for everything else.

Also fix build script: clean full Rust target to ensure libc++_shared.so
is always copied by cargo-ndk.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-06 21:37:29 +04:00
Siavash Sameni
b56b4a759c revert: use ShortArray audio path (DirectByteBuffer causes null ptr crash)
Some checks failed
Mirror to GitHub / mirror (push) Failing after 35s
Build Release Binaries / build-amd64 (push) Failing after 3m58s
DirectByteBuffer.clear() crashes with null pointer in ART's JIT OSR
compiled code on Android 16. Revert AudioPipeline to use the original
ShortArray writeAudio/readAudio path.

The DirectByteBuffer JNI functions remain in WzpEngine.kt and
jni_bridge.rs for future use once the OSR issue is resolved.

The original SIGBUS from ART GC is rare (~1 crash per 8 min call)
and doesn't warrant the DirectByteBuffer approach until we can
allocate the buffer as a class field outside the hot loop.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-06 21:17:15 +04:00
Siavash Sameni
6f99841cc7 fix: cloud build script — filter by server name, rsync upload, cx33
Some checks failed
Mirror to GitHub / mirror (push) Failing after 36s
Build Release Binaries / build-amd64 (push) Failing after 3m57s
- Filter hcloud by SERVER_NAME to avoid touching other servers
- Use rsync instead of tar (handles submodules, no macOS xattr spam)
- Default server type cx33
- Release APK failure is non-fatal (debug APK still produced)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-06 20:00:10 +04:00
Siavash Sameni
3b0811ce2e ci: add GitHub mirror workflow
Automatically pushes branches and tags to github.com:manawenuz/wzp.git
on every push to Forgejo. Uses GH_SSH_KEY secret for authentication.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-06 19:49:59 +04:00
Siavash Sameni
9eed94850d fix: DirectByteBuffer audio path — eliminate JNI array copies
Some checks failed
Build Release Binaries / build-amd64 (push) Failing after 3m43s
Adds nativeWriteAudioDirect / nativeReadAudioDirect JNI functions
that accept a DirectByteBuffer instead of ShortArray. The buffer's
native memory is accessed directly by Rust via pointer — no
GetShortArrayRegion / SetShortArrayRegion, no GC-managed array
copies on the audio hot path.

This fixes SIGBUS crashes on Android 16 where ART's concurrent
mark-compact GC crashes when flipping thread roots during JNI
array operations on MAX_PRIORITY audio threads.

Old ShortArray methods kept for backward compatibility.
AudioPipeline switched to use Direct variants.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-06 19:29:08 +04:00
Siavash Sameni
5e9718aeb2 docs: incident report — SIGBUS in ART GC during audio JNI calls
Some checks failed
Build Release Binaries / build-amd64 (push) Failing after 3m37s
Android 16's concurrent mark-compact GC crashes when flipping
thread roots on our MAX_PRIORITY audio threads during JNI calls
(AudioRecord.read / AudioTrack.write). Not our code — all crash
frames are in libart.so.

Proposed fixes:
- Short term: DirectByteBuffer to reduce JNI transitions
- Long term: Oboe native audio from Rust (no JNI, no GC)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-06 19:21:32 +04:00
Siavash Sameni
3093933602 fix: build script works on Ubuntu 24.04 (cmake 3.28) too
Some checks failed
Build Release Binaries / build-amd64 (push) Failing after 3m48s
cmake 3.28 works when ANDROID_NDK is set (not just ANDROID_NDK_HOME).
Relaxed version check from <=3.26 to <=3.30.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-06 19:00:06 +04:00
Siavash Sameni
4c6c909732 feat: comprehensive Android build script for Debian 12
Some checks failed
Build Release Binaries / build-amd64 (push) Failing after 3m56s
Documents WHY each version is pinned:
- cmake 3.25: 3.27+ rewrote Android-Determine.cmake with bugs
- NDK 26.1: NDK 27 scudo crashes on MTE devices (Nothing A059)
- JDK 17: Gradle 8.5 + AGP 8.2.0 official support
- ANDROID_NDK: cmake checks this, not ANDROID_NDK_HOME

Idempotent, works from clone or existing tree.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-06 18:37:12 +04:00
Siavash Sameni
33fab9a049 fix: vec allocation for AudioRing, catch_unwind on tracing init, profiling
Some checks failed
Build Release Binaries / build-amd64 (push) Failing after 3m49s
- AudioRing: use vec![].into_boxed_slice() instead of Box::new([]) to
  avoid 32KB stack allocation that crashes scudo on Android
- JNI bridge: wrap tracing_subscriber init in catch_unwind to survive
  sharded_slab allocation failures on some devices
- Engine: per-step encode profiling (avg_agc_us, avg_opus_us, avg_fec_us,
  avg_send_us) logged every 5 seconds in send stats

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-06 15:41:46 +04:00
Siavash Sameni
31d2306915 feat: per-step encode profiling in send task stats
Some checks failed
Build Release Binaries / build-amd64 (push) Failing after 3m48s
Adds average microsecond timings for each encode step:
- avg_agc_us: AGC processing
- avg_opus_us: Opus encoding
- avg_fec_us: FEC encode + repair generation
- avg_send_us: QUIC send_media
- avg_total_us: sum of above

Logged every 5 seconds in send stats. Resets each interval.
Use to identify which step is bottlenecking the encode loop
on devices where fps drops below 50.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-06 14:18:33 +04:00
Siavash Sameni
4af7c5f94c fix: AudioRing cursor desync + capture thread use-after-free
Some checks failed
Build Release Binaries / build-amd64 (push) Failing after 3m56s
AudioRing (reader-detects-lap architecture):
- Writer NEVER touches read_pos — fixes SPSC invariant violation
- Reader self-corrects when lapped (snaps read_pos forward)
- Power-of-2 capacity (16384 = 341ms) with bitmask indexing
- Added overflow_count and underrun_count diagnostics
- Wired ring health into engine stats and periodic logging

Capture thread use-after-free (drain latch):
- Added CountDownLatch(2) to AudioPipeline
- Audio threads count down after exiting their loops
- teardown() awaits latch (200ms timeout) before destroy()
- Guarantees no in-flight JNI calls when native handle is freed
- stopAudio() no longer nulls pipeline (teardown handles it)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-06 13:28:34 +04:00
Claude
6597b5bd86 docs: incident report + fix spec for capture thread use-after-free crash
Some checks failed
Build Release Binaries / build-amd64 (push) Failing after 3s
SIGSEGV on hangup: capture thread calls writeAudio() via JNI after
teardown() has freed the native engine handle. TOCTOU race between
the nativeHandle==0L check and destroy() on the ViewModel thread.

Fix: CountDownLatch(2) — audio threads count down after exiting loops,
teardown() awaits before destroy(). 2 Kotlin files, no Rust changes.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-06 09:21:35 +00:00
Claude
ae9d8526dd docs: implementation spec for AudioRing SPSC desync fix
Some checks failed
Build Release Binaries / build-amd64 (push) Failing after 3m51s
Complete spec for fixing the playout ring buffer cursor race that
causes 12-16s bidirectional silence mid-call. Includes exact code,
memory ordering rationale, unit tests, and verification steps.

Any agent can implement from this document alone.

See also: debug/INCIDENT-2026-04-06-playout-ring-desync.md

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-06 09:16:47 +00:00
140 changed files with 31711 additions and 2554 deletions

View File

@@ -0,0 +1,72 @@
---
name: caveman
description: >
Ultra-compressed communication mode. Slash token usage ~75% by speaking like caveman
while keeping full technical accuracy. Use when user says "caveman mode", "talk like caveman",
"use caveman", "less tokens", "be brief", or invokes /caveman. Also auto-triggers
when token efficiency is requested.
---
# Caveman Mode
## Core Rule
Respond like smart caveman. Cut articles, filler, pleasantries. Keep all technical substance.
## Grammar
- Drop articles (a, an, the)
- Drop filler (just, really, basically, actually, simply)
- Drop pleasantries (sure, certainly, of course, happy to)
- Short synonyms (big not extensive, fix not "implement a solution for")
- No hedging (skip "it might be worth considering")
- Fragments fine. No need full sentence
- Technical terms stay exact. "Polymorphism" stays "polymorphism"
- Code blocks unchanged. Caveman speak around code, not in code
- Error messages quoted exact. Caveman only for explanation
## Pattern
```
[thing] [action] [reason]. [next step].
```
Not:
> Sure! I'd be happy to help you with that. The issue you're experiencing is likely caused by...
Yes:
> Bug in auth middleware. Token expiry check use `<` not `<=`. Fix:
## Examples
**User:** Why is my React component re-rendering?
**Normal (69 tokens):** "The reason your React component is re-rendering is likely because you're creating a new object reference on each render cycle. When you pass an inline object as a prop, React's shallow comparison sees it as a different object every time, which triggers a re-render. I'd recommend using useMemo to memoize the object."
**Caveman (19 tokens):** "New object ref each render. Inline object prop = new ref = re-render. Wrap in `useMemo`."
---
**User:** How do I set up a PostgreSQL connection pool?
**Caveman:**
```
Use `pg` pool:
```
```js
const pool = new Pool({
max: 20,
idleTimeoutMillis: 30000,
connectionTimeoutMillis: 2000,
})
```
```
max = concurrent connections. Keep under DB limit. idleTimeout kill stale conn.
```
## Boundaries
- Code: write normal. Caveman English only
- Git commits: normal
- PR descriptions: normal
- User say "stop caveman" or "normal mode": revert immediately

25
.gitignore vendored
View File

@@ -4,3 +4,28 @@
*.swp
*.swo
*~
# Logs
logs
*.log
npm-debug.log*
yarn-debug.log*
yarn-error.log*
dev-debug.log
# Dependency directories
node_modules/
# Environment variables
.env
# Editor directories and files
.idea
.vscode
*.suo
*.ntvs*
*.njsproj
*.sln
*.sw?
# OS specific
# Taskmaster (local workflow tool)
.taskmaster/
.env.example

975
Cargo.lock generated

File diff suppressed because it is too large Load Diff

View File

@@ -10,6 +10,7 @@ members = [
"crates/wzp-client",
"crates/wzp-web",
"crates/wzp-android",
"crates/wzp-native",
"desktop/src-tauri",
]
@@ -31,17 +32,25 @@ serde = { version = "1", features = ["derive"] }
# Transport
quinn = "0.11"
socket2 = "0.5"
# FEC
raptorq = "2"
# Codec
audiopus = "0.3.0-rc.0"
# opusic-c: high-level safe bindings over libopus 1.5.2 (encoder side).
# opusic-sys: raw FFI for the decoder side — we build our own DecoderHandle
# because opusic-c::Decoder.inner is pub(crate) and cannot be reached for the
# Phase 3 DRED reconstruction path. See docs/PRD-dred-integration.md.
# Pinned exactly (no caret) for reproducible libopus 1.5.2 across the fleet.
opusic-c = { version = "=1.5.5", default-features = false, features = ["bundled", "dred"] }
opusic-sys = { version = "=0.6.0", default-features = false, features = ["bundled"] }
bytemuck = "1"
codec2 = "0.3"
# Crypto
x25519-dalek = { version = "2", features = ["static_secrets"] }
ed25519-dalek = { version = "2", features = ["rand_core"] }
ed25519-dalek = { version = "2", features = ["rand_core", "pkcs8"] }
chacha20poly1305 = "0.10"
hkdf = "0.12"
sha2 = "0.10"
@@ -65,9 +74,7 @@ opt-level = 2
# real-time audio needs < 20ms per frame, impossible unoptimized.
[profile.dev.package.nnnoiseless]
opt-level = 3
[profile.dev.package.audiopus_sys]
opt-level = 3
[profile.dev.package.audiopus]
[profile.dev.package.opusic-sys]
opt-level = 3
[profile.dev.package.raptorq]
opt-level = 3
@@ -75,3 +82,10 @@ opt-level = 3
opt-level = 3
[profile.dev.package.wzp-fec]
opt-level = 3
# Phase 0 (opus-DRED): removed the [patch.crates-io] audiopus_sys = { path =
# "vendor/audiopus_sys" } block. That patch existed to fix a Windows clang-cl
# SIMD compile bug in libopus 1.3.1. With the swap to opusic-sys (libopus
# 1.5.2), the upstream SIMD gating was fixed and the vendor patch is
# obsolete. The vendor/audiopus_sys directory itself should be deleted as
# part of the same cleanup — see the commit that follows this Phase 0.

View File

@@ -19,6 +19,8 @@ import java.io.FileOutputStream
import java.io.OutputStreamWriter
import java.nio.ByteBuffer
import java.nio.ByteOrder
import java.util.concurrent.CountDownLatch
import java.util.concurrent.TimeUnit
import kotlin.math.pow
import kotlin.math.sqrt
@@ -55,10 +57,23 @@ class AudioPipeline(private val context: Context) {
/** Whether to attach hardware AEC. Must be set before start(). */
var aecEnabled: Boolean = true
/** Enable debug recording of PCM + RMS histogram to cache dir. */
var debugRecording: Boolean = true
var debugRecording: Boolean = false
private var captureThread: Thread? = null
private var playoutThread: Thread? = null
// DirectByteBuffers for zero-copy JNI audio transfer.
// Allocated as class fields (NOT locals) because ART's JIT OSR
// can null local variables when it replaces the stack frame mid-loop.
// These survive OSR because they're on the heap.
private val captureDirectBuf: ByteBuffer =
ByteBuffer.allocateDirect(FRAME_SAMPLES * 2).order(ByteOrder.LITTLE_ENDIAN)
private val playoutDirectBuf: ByteBuffer =
ByteBuffer.allocateDirect(FRAME_SAMPLES * 2).order(ByteOrder.LITTLE_ENDIAN)
/** Latch counted down by each audio thread after exiting its loop.
* stop() does NOT wait on this — teardown waits via awaitDrain(). */
private var drainLatch: CountDownLatch? = null
private val debugDir: File by lazy {
File(context.cacheDir, "wzp_debug").also { it.mkdirs() }
}
@@ -66,9 +81,11 @@ class AudioPipeline(private val context: Context) {
fun start(engine: WzpEngine) {
if (running) return
running = true
drainLatch = CountDownLatch(2) // one for capture, one for playout
captureThread = Thread({
runCapture(engine)
drainLatch?.countDown() // signal: capture loop exited, no more JNI calls
// Park thread forever — exiting triggers a libcrypto TLS destructor
// crash (SIGSEGV in OPENSSL_free) on Android when a JNI-calling thread exits.
parkThread()
@@ -80,6 +97,7 @@ class AudioPipeline(private val context: Context) {
playoutThread = Thread({
runPlayout(engine)
drainLatch?.countDown() // signal: playout loop exited
parkThread()
}, "wzp-playout").apply {
isDaemon = true
@@ -92,10 +110,20 @@ class AudioPipeline(private val context: Context) {
fun stop() {
running = false
// Don't join threads are parked as daemons to avoid native TLS crash
// Don't join threads — they are parked as daemons to avoid native TLS crash.
// Don't null thread refs or drainLatch — teardown() needs awaitDrain().
Log.i(TAG, "audio pipeline stopped (running=false)")
}
/** Block until both audio threads have exited their loops (max 200ms).
* After this returns, no more JNI calls to the engine will be made. */
fun awaitDrain(): Boolean {
val ok = drainLatch?.await(200, TimeUnit.MILLISECONDS) ?: true
if (!ok) Log.w(TAG, "awaitDrain: audio threads did not drain in 200ms")
captureThread = null
playoutThread = null
Log.i(TAG, "audio pipeline stopped")
drainLatch = null
return ok
}
private fun applyGain(pcm: ShortArray, count: Int, db: Float) {
@@ -206,7 +234,10 @@ class AudioPipeline(private val context: Context) {
val read = recorder.read(pcm, 0, FRAME_SAMPLES)
if (read > 0) {
applyGain(pcm, read, captureGainDb)
engine.writeAudio(pcm)
// Zero-copy write via DirectByteBuffer (class field, survives JIT OSR)
captureDirectBuf.clear()
captureDirectBuf.asShortBuffer().put(pcm, 0, read)
engine.writeAudioDirect(captureDirectBuf, read)
// Debug: write raw PCM + RMS
if (pcmOut != null) {
@@ -285,8 +316,12 @@ class AudioPipeline(private val context: Context) {
}
try {
while (running) {
val read = engine.readAudio(pcm)
// Zero-copy read via DirectByteBuffer (class field, survives JIT OSR)
playoutDirectBuf.clear()
val read = engine.readAudioDirect(playoutDirectBuf, FRAME_SAMPLES)
if (read >= FRAME_SAMPLES) {
playoutDirectBuf.rewind()
playoutDirectBuf.asShortBuffer().get(pcm, 0, read)
applyGain(pcm, read, playoutGainDb)
track.write(pcm, 0, read)

View File

@@ -28,6 +28,9 @@ class SettingsRepository(context: Context) {
private const val KEY_PREFER_IPV6 = "prefer_ipv6"
private const val KEY_IDENTITY_SEED = "identity_seed_hex"
private const val KEY_AEC_ENABLED = "aec_enabled"
private const val KEY_DEBUG_RECORDING = "debug_recording"
private const val KEY_RECENT_ROOMS = "recent_rooms"
private const val TOFU_PREFIX = "tofu_"
}
// --- Servers ---
@@ -118,6 +121,16 @@ class SettingsRepository(context: Context) {
fun saveAecEnabled(enabled: Boolean) { prefs.edit().putBoolean(KEY_AEC_ENABLED, enabled).apply() }
fun loadAecEnabled(): Boolean = prefs.getBoolean(KEY_AEC_ENABLED, true)
// --- Debug recording ---
fun saveDebugRecording(enabled: Boolean) { prefs.edit().putBoolean(KEY_DEBUG_RECORDING, enabled).apply() }
fun loadDebugRecording(): Boolean = prefs.getBoolean(KEY_DEBUG_RECORDING, false)
// --- Codec choice ---
// 0 = Opus (GOOD), 1 = Opus Low (DEGRADED), 2 = Codec2 (CATASTROPHIC)
fun saveCodecChoice(choice: Int) { prefs.edit().putInt("codec_choice", choice).apply() }
fun loadCodecChoice(): Int = prefs.getInt("codec_choice", 0)
// --- Identity seed ---
/**
@@ -138,4 +151,53 @@ class SettingsRepository(context: Context) {
fun saveSeedHex(hex: String) {
prefs.edit().putString(KEY_IDENTITY_SEED, hex).apply()
}
// --- Recent rooms ---
data class RecentRoom(val relay: String, val room: String)
fun addRecentRoom(relay: String, room: String) {
val rooms = loadRecentRooms().toMutableList()
rooms.removeAll { it.relay == relay && it.room == room }
rooms.add(0, RecentRoom(relay, room))
if (rooms.size > 5) rooms.subList(5, rooms.size).clear()
val arr = JSONArray()
rooms.forEach { arr.put(JSONObject().apply { put("relay", it.relay); put("room", it.room) }) }
prefs.edit().putString(KEY_RECENT_ROOMS, arr.toString()).apply()
}
fun loadRecentRooms(): List<RecentRoom> {
val json = prefs.getString(KEY_RECENT_ROOMS, null) ?: return emptyList()
return try {
val arr = JSONArray(json)
(0 until arr.length()).map { i ->
val o = arr.getJSONObject(i)
RecentRoom(o.getString("relay"), o.getString("room"))
}
} catch (_: Exception) { emptyList() }
}
fun clearRecentRooms() {
prefs.edit().remove(KEY_RECENT_ROOMS).apply()
}
// --- Server fingerprint TOFU ---
fun saveServerFingerprint(address: String, fingerprint: String) {
prefs.edit().putString("$TOFU_PREFIX$address", fingerprint).apply()
}
fun loadServerFingerprint(address: String): String? {
return prefs.getString("$TOFU_PREFIX$address", null)
}
// --- Ping RTT cache ---
fun savePingRtt(address: String, rttMs: Int) {
prefs.edit().putInt("ping_rtt_$address", rttMs).apply()
}
fun loadPingRtt(address: String): Int {
return prefs.getInt("ping_rtt_$address", -1)
}
}

View File

@@ -46,6 +46,14 @@ class DebugReporter(private val context: Context) {
val zipFile = File(context.cacheDir, "wzp_debug_${timestamp}.zip")
ZipOutputStream(BufferedOutputStream(FileOutputStream(zipFile))).use { zos ->
// Phase 4: extract DRED / classical PLC counters from the
// stats JSON so they're visible in the meta preamble at a
// glance, not buried in the trailing JSON dump.
val dredReconstructions = extractLongField(finalStatsJson, "dred_reconstructions")
val classicalPlc = extractLongField(finalStatsJson, "classical_plc_invocations")
val framesDecoded = extractLongField(finalStatsJson, "frames_decoded")
val fecRecovered = extractLongField(finalStatsJson, "fec_recovered")
// 1. Call metadata
val meta = buildString {
appendLine("=== WZ Phone Debug Report ===")
@@ -58,6 +66,18 @@ class DebugReporter(private val context: Context) {
appendLine("Device: ${android.os.Build.MANUFACTURER} ${android.os.Build.MODEL}")
appendLine("Android: ${android.os.Build.VERSION.RELEASE} (API ${android.os.Build.VERSION.SDK_INT})")
appendLine()
appendLine("=== Loss Recovery ===")
appendLine("Frames decoded: $framesDecoded")
appendLine("DRED reconstructions: $dredReconstructions (Opus neural recovery)")
appendLine("Classical PLC: $classicalPlc (fallback)")
appendLine("RaptorQ FEC recovered: $fecRecovered (Codec2 only)")
if (framesDecoded > 0) {
val dredPct = 100.0 * dredReconstructions / framesDecoded
val plcPct = 100.0 * classicalPlc / framesDecoded
appendLine("DRED rate: ${"%.2f".format(dredPct)}%")
appendLine("Classical PLC rate: ${"%.2f".format(plcPct)}%")
}
appendLine()
appendLine("=== Final Stats ===")
appendLine(finalStatsJson)
}
@@ -195,4 +215,28 @@ class DebugReporter(private val context: Context) {
FileInputStream(file).use { it.copyTo(zos) }
zos.closeEntry()
}
/**
* Tiny JSON field extractor — pulls an integer value for a top-level
* field like `"dred_reconstructions":42`. We don't want to pull in a
* full JSON parser just for the debug preamble, and the CallStats
* output is a flat record with well-known field names.
*
* Returns 0 if the field is missing or unparseable.
*/
private fun extractLongField(json: String, field: String): Long {
val key = "\"$field\":"
val idx = json.indexOf(key)
if (idx < 0) return 0
var i = idx + key.length
// Skip whitespace
while (i < json.length && json[i].isWhitespace()) i++
val start = i
while (i < json.length && (json[i].isDigit() || json[i] == '-')) i++
return try {
json.substring(start, i).toLong()
} catch (_: NumberFormatException) {
0
}
}
}

View File

@@ -33,10 +33,24 @@ data class CallStats(
val fecRecovered: Long = 0,
/** Current mic audio level (RMS, 0-32767). */
val audioLevel: Int = 0,
/** Our current outgoing codec (e.g. "Opus24k"). */
val currentCodec: String = "",
/** Last seen incoming codec from peers. */
val peerCodec: String = "",
/** Whether auto quality mode is active. */
val autoMode: Boolean = false,
/** Number of participants in the room. */
val roomParticipantCount: Int = 0,
/** Participants in the room (fingerprint + optional alias). */
val roomParticipants: List<RoomMember> = emptyList(),
/** SAS verification code (4-digit, null if not in a call). */
val sasCode: Int? = null,
/** Incoming call ID (or "relay|room" for CallSetup). */
val incomingCallId: String? = null,
/** Incoming caller's fingerprint. */
val incomingCallerFp: String? = null,
/** Incoming caller's alias. */
val incomingCallerAlias: String? = null,
) {
/** Human-readable quality label. */
val qualityLabel: String
@@ -54,7 +68,8 @@ data class CallStats(
val o = arr.getJSONObject(i)
RoomMember(
fingerprint = o.optString("fingerprint", ""),
alias = if (o.isNull("alias")) null else o.optString("alias", null)
alias = if (o.isNull("alias")) null else o.optString("alias", null),
relayLabel = if (o.isNull("relay_label")) null else o.optString("relay_label", null)
)
}
}
@@ -76,8 +91,15 @@ data class CallStats(
underruns = obj.optLong("underruns", 0),
fecRecovered = obj.optLong("fec_recovered", 0),
audioLevel = obj.optInt("audio_level", 0),
currentCodec = obj.optString("current_codec", ""),
peerCodec = obj.optString("peer_codec", ""),
autoMode = obj.optBoolean("auto_mode", false),
roomParticipantCount = obj.optInt("room_participant_count", 0),
roomParticipants = parseParticipants(obj.optJSONArray("room_participants"))
roomParticipants = parseParticipants(obj.optJSONArray("room_participants")),
sasCode = if (obj.has("sas_code")) obj.optInt("sas_code") else null,
incomingCallId = if (obj.isNull("incoming_call_id")) null else obj.optString("incoming_call_id", null),
incomingCallerFp = if (obj.isNull("incoming_caller_fp")) null else obj.optString("incoming_caller_fp", null),
incomingCallerAlias = if (obj.isNull("incoming_caller_alias")) null else obj.optString("incoming_caller_alias", null),
)
} catch (e: Exception) {
CallStats()
@@ -88,7 +110,8 @@ data class CallStats(
data class RoomMember(
val fingerprint: String,
val alias: String? = null
val alias: String? = null,
val relayLabel: String? = null
) {
/** Short display name: alias if set, otherwise first 8 chars of fingerprint. */
val displayName: String

View File

@@ -38,9 +38,12 @@ class WzpEngine(private val callback: WzpCallback) {
* @param alias display name sent to relay for room participant list
* @return 0 on success, negative error code on failure
*/
fun startCall(relayAddr: String, room: String, seedHex: String = "", token: String = "", alias: String = ""): Int {
/**
* @param profile 0 = Opus GOOD, 1 = Opus DEGRADED, 2 = Codec2 CATASTROPHIC
*/
fun startCall(relayAddr: String, room: String, seedHex: String = "", token: String = "", alias: String = "", profile: Int = 0): Int {
check(nativeHandle != 0L) { "Engine not initialized" }
val result = nativeStartCall(nativeHandle, relayAddr, room, seedHex, token, alias)
val result = nativeStartCall(nativeHandle, relayAddr, room, seedHex, token, alias, profile)
if (result == 0) {
callback.onCallStateChanged(CallStateConstants.CONNECTING)
} else {
@@ -50,6 +53,7 @@ class WzpEngine(private val callback: WzpCallback) {
}
/** Stop the active call. Safe to call when no call is active. */
@Synchronized
fun stopCall() {
if (nativeHandle != 0L) {
nativeStopCall(nativeHandle)
@@ -73,6 +77,7 @@ class WzpEngine(private val callback: WzpCallback) {
*
* @return JSON-serialised [CallStats], or `"{}"` if the engine is not initialised.
*/
@Synchronized
fun getStats(): String {
if (nativeHandle == 0L) return "{}"
return try {
@@ -91,7 +96,19 @@ class WzpEngine(private val callback: WzpCallback) {
if (nativeHandle != 0L) nativeForceProfile(nativeHandle, profile)
}
/**
* Signal a network transport change (e.g. WiFi → LTE handoff).
*
* @param networkType matches Rust `NetworkContext` ordinals:
* 0=WiFi, 1=LTE, 2=5G, 3=3G, 4=Unknown, 5=None
* @param bandwidthKbps reported downstream bandwidth in kbps
*/
fun onNetworkChanged(networkType: Int, bandwidthKbps: Int) {
if (nativeHandle != 0L) nativeOnNetworkChanged(nativeHandle, networkType, bandwidthKbps)
}
/** Destroy the native engine and free all resources. The instance must not be reused. */
@Synchronized
fun destroy() {
if (nativeHandle != 0L) {
nativeDestroy(nativeHandle)
@@ -117,11 +134,31 @@ class WzpEngine(private val callback: WzpCallback) {
return nativeReadAudio(nativeHandle, pcm)
}
/**
* Write captured PCM from a DirectByteBuffer — zero JNI array copy.
* The buffer must be a direct ByteBuffer with native byte order containing i16 samples.
* Called from the AudioRecord capture thread.
*/
fun writeAudioDirect(buffer: java.nio.ByteBuffer, sampleCount: Int): Int {
if (nativeHandle == 0L) return 0
return nativeWriteAudioDirect(nativeHandle, buffer, sampleCount)
}
/**
* Read decoded PCM into a DirectByteBuffer — zero JNI array copy.
* The buffer must be a direct ByteBuffer with native byte order.
* Called from the AudioTrack playout thread.
*/
fun readAudioDirect(buffer: java.nio.ByteBuffer, maxSamples: Int): Int {
if (nativeHandle == 0L) return 0
return nativeReadAudioDirect(nativeHandle, buffer, maxSamples)
}
// -- JNI native methods --------------------------------------------------
private external fun nativeInit(): Long
private external fun nativeStartCall(
handle: Long, relay: String, room: String, seed: String, token: String, alias: String
handle: Long, relay: String, room: String, seed: String, token: String, alias: String, profile: Int
): Int
private external fun nativeStopCall(handle: Long)
private external fun nativeSetMute(handle: Long, muted: Boolean)
@@ -130,7 +167,58 @@ class WzpEngine(private val callback: WzpCallback) {
private external fun nativeForceProfile(handle: Long, profile: Int)
private external fun nativeWriteAudio(handle: Long, pcm: ShortArray): Int
private external fun nativeReadAudio(handle: Long, pcm: ShortArray): Int
private external fun nativeWriteAudioDirect(handle: Long, buffer: java.nio.ByteBuffer, sampleCount: Int): Int
private external fun nativeReadAudioDirect(handle: Long, buffer: java.nio.ByteBuffer, maxSamples: Int): Int
private external fun nativeDestroy(handle: Long)
private external fun nativePingRelay(handle: Long, relay: String): String?
private external fun nativeStartSignaling(handle: Long, relay: String, seed: String, token: String, alias: String): Int
private external fun nativePlaceCall(handle: Long, targetFp: String): Int
private external fun nativeAnswerCall(handle: Long, callId: String, mode: Int): Int
private external fun nativeOnNetworkChanged(handle: Long, networkType: Int, bandwidthKbps: Int)
/**
* Ping a relay server. Requires engine to be initialized.
* Returns JSON `{"rtt_ms":N,"server_fingerprint":"hex"}` or null.
*/
fun pingRelay(address: String): String? {
if (nativeHandle == 0L) return null
return nativePingRelay(nativeHandle, address)
}
/**
* Start persistent signaling connection for direct 1:1 calls.
* The engine registers on the relay and listens for incoming calls.
* Call state updates are available via [getStats].
*
* @return 0 on success, -1 on error
*/
fun startSignaling(relay: String, seed: String = "", token: String = "", alias: String = ""): Int {
check(nativeHandle != 0L) { "Engine not initialized" }
return nativeStartSignaling(nativeHandle, relay, seed, token, alias)
}
/**
* Place a direct call to a peer by fingerprint.
* Requires [startSignaling] to have been called first.
*
* @return 0 on success, -1 on error
*/
fun placeCall(targetFingerprint: String): Int {
check(nativeHandle != 0L) { "Engine not initialized" }
return nativePlaceCall(nativeHandle, targetFingerprint)
}
/**
* Answer an incoming direct call.
*
* @param callId The call ID from the incoming call (available in stats.incoming_call_id)
* @param mode 0=Reject, 1=AcceptTrusted (P2P in Phase 2), 2=AcceptGeneric (relay-mediated)
* @return 0 on success, -1 on error
*/
fun answerCall(callId: String, mode: Int = 2): Int {
check(nativeHandle != 0L) { "Engine not initialized" }
return nativeAnswerCall(nativeHandle, callId, mode)
}
companion object {
init {

View File

@@ -0,0 +1,141 @@
package com.wzp.net
import android.content.Context
import android.net.ConnectivityManager
import android.net.Network
import android.net.NetworkCapabilities
import android.net.NetworkRequest
import android.os.Handler
import android.os.Looper
/**
* Monitors network connectivity changes via [ConnectivityManager.NetworkCallback]
* and classifies the active transport (WiFi, LTE, 5G, 3G).
*
* Callbacks fire on the main looper so callers can safely update UI state or
* dispatch to a native engine from any callback.
*
* Usage:
* 1. Set [onNetworkChanged] to receive `(type: Int, downlinkKbps: Int)` events
* 2. Optionally set [onIpChanged] for IP address change events (mid-call ICE refresh)
* 3. Call [register] when the call starts
* 4. Call [unregister] when the call ends
*/
class NetworkMonitor(context: Context) {
private val cm = context.getSystemService(Context.CONNECTIVITY_SERVICE) as ConnectivityManager
private val mainHandler = Handler(Looper.getMainLooper())
/**
* Called when the network transport type or bandwidth changes.
* `type` constants match the Rust `NetworkContext` enum ordinals.
*/
var onNetworkChanged: ((type: Int, downlinkKbps: Int) -> Unit)? = null
/**
* Called when the device's IP address changes (link properties changed).
* Useful for triggering mid-call ICE candidate re-gathering.
*/
var onIpChanged: (() -> Unit)? = null
// Track the last emitted type to avoid redundant callbacks
@Volatile
private var lastEmittedType: Int = TYPE_UNKNOWN
private val callback = object : ConnectivityManager.NetworkCallback() {
override fun onAvailable(network: Network) {
classifyAndEmit(network)
}
override fun onCapabilitiesChanged(network: Network, caps: NetworkCapabilities) {
classifyFromCaps(caps)
}
override fun onLinkPropertiesChanged(
network: Network,
linkProperties: android.net.LinkProperties
) {
// IP address may have changed — notify for ICE refresh
onIpChanged?.invoke()
// Also re-classify in case the transport changed simultaneously
classifyAndEmit(network)
}
override fun onLost(network: Network) {
lastEmittedType = TYPE_NONE
onNetworkChanged?.invoke(TYPE_NONE, 0)
}
}
// -- Public API -----------------------------------------------------------
/** Register the network callback. Call when a call starts. */
fun register() {
val request = NetworkRequest.Builder()
.addCapability(NetworkCapabilities.NET_CAPABILITY_INTERNET)
.build()
cm.registerNetworkCallback(request, callback, mainHandler)
}
/** Unregister the network callback. Call when the call ends. */
fun unregister() {
try {
cm.unregisterNetworkCallback(callback)
} catch (_: IllegalArgumentException) {
// Already unregistered — safe to ignore
}
}
// -- Classification -------------------------------------------------------
private fun classifyAndEmit(network: Network) {
val caps = cm.getNetworkCapabilities(network) ?: return
classifyFromCaps(caps)
}
private fun classifyFromCaps(caps: NetworkCapabilities) {
val type = when {
caps.hasTransport(NetworkCapabilities.TRANSPORT_WIFI) -> TYPE_WIFI
caps.hasTransport(NetworkCapabilities.TRANSPORT_ETHERNET) -> TYPE_WIFI // treat as WiFi
caps.hasTransport(NetworkCapabilities.TRANSPORT_CELLULAR) -> classifyCellular(caps)
else -> TYPE_UNKNOWN
}
val bw = caps.getLinkDownstreamBandwidthKbps()
// Deduplicate: only emit when the transport type actually changes
if (type != lastEmittedType) {
lastEmittedType = type
onNetworkChanged?.invoke(type, bw)
}
}
/**
* Approximate cellular generation from reported downstream bandwidth.
* This avoids requiring READ_PHONE_STATE permission (needed for
* TelephonyManager.getNetworkType on API 30+).
*
* Thresholds are conservative — carriers over-report bandwidth, so we
* classify based on what's actually usable for VoIP:
* - >= 100 Mbps → 5G NR
* - >= 10 Mbps → LTE
* - < 10 Mbps → 3G or worse
*/
private fun classifyCellular(caps: NetworkCapabilities): Int {
val bw = caps.getLinkDownstreamBandwidthKbps()
return when {
bw >= 100_000 -> TYPE_CELLULAR_5G
bw >= 10_000 -> TYPE_CELLULAR_LTE
else -> TYPE_CELLULAR_3G
}
}
companion object {
/** Constants matching Rust `NetworkContext` enum ordinals. */
const val TYPE_WIFI = 0
const val TYPE_CELLULAR_LTE = 1
const val TYPE_CELLULAR_5G = 2
const val TYPE_CELLULAR_3G = 3
const val TYPE_UNKNOWN = 4
const val TYPE_NONE = 5
}
}

View File

@@ -0,0 +1,12 @@
package com.wzp.net
// Relay pinging is now done via WzpEngine.pingRelay() (instance method).
// This file kept for the data class only.
object RelayPinger {
data class PingResult(
val rttMs: Int,
val reachable: Boolean,
val serverFingerprint: String = "",
)
}

View File

@@ -5,6 +5,7 @@ import android.util.Log
import androidx.lifecycle.ViewModel
import androidx.lifecycle.viewModelScope
import com.wzp.audio.AudioPipeline
import com.wzp.audio.AudioRoute
import com.wzp.audio.AudioRouteManager
import com.wzp.data.SettingsRepository
import com.wzp.debug.DebugReporter
@@ -12,6 +13,8 @@ import com.wzp.engine.CallStats
import com.wzp.service.CallService
import com.wzp.engine.WzpCallback
import com.wzp.engine.WzpEngine
import com.wzp.net.NetworkMonitor
import kotlinx.coroutines.Dispatchers
import kotlinx.coroutines.Job
import kotlinx.coroutines.delay
import kotlinx.coroutines.flow.MutableStateFlow
@@ -19,6 +22,8 @@ import kotlinx.coroutines.flow.StateFlow
import kotlinx.coroutines.flow.asStateFlow
import kotlinx.coroutines.isActive
import kotlinx.coroutines.launch
import kotlinx.coroutines.withContext
import org.json.JSONObject
import java.io.File
import java.net.Inet4Address
import java.net.Inet6Address
@@ -26,12 +31,21 @@ import java.net.InetAddress
data class ServerEntry(val address: String, val label: String)
data class PingResult(
val rttMs: Int,
val serverFingerprint: String = "",
val reachable: Boolean = rttMs > 0,
)
enum class LockStatus { UNKNOWN, OFFLINE, NEW, VERIFIED, CHANGED }
class CallViewModel : ViewModel(), WzpCallback {
private var engine: WzpEngine? = null
private var engineInitialized = false
private var audioPipeline: AudioPipeline? = null
private var audioRouteManager: AudioRouteManager? = null
private var networkMonitor: NetworkMonitor? = null
private var audioStarted = false
private var appContext: Context? = null
private var settings: SettingsRepository? = null
@@ -49,6 +63,9 @@ class CallViewModel : ViewModel(), WzpCallback {
private val _isSpeaker = MutableStateFlow(false)
val isSpeaker: StateFlow<Boolean> = _isSpeaker.asStateFlow()
private val _audioRoute = MutableStateFlow(AudioRoute.EARPIECE)
val audioRoute: StateFlow<AudioRoute> = _audioRoute.asStateFlow()
private val _stats = MutableStateFlow(CallStats())
val stats: StateFlow<CallStats> = _stats.asStateFlow()
@@ -70,6 +87,16 @@ class CallViewModel : ViewModel(), WzpCallback {
private val _preferIPv6 = MutableStateFlow(false)
val preferIPv6: StateFlow<Boolean> = _preferIPv6.asStateFlow()
private val _recentRooms = MutableStateFlow<List<com.wzp.data.SettingsRepository.RecentRoom>>(emptyList())
val recentRooms: StateFlow<List<com.wzp.data.SettingsRepository.RecentRoom>> = _recentRooms.asStateFlow()
/** Ping results keyed by server address. */
private val _pingResults = MutableStateFlow<Map<String, PingResult>>(emptyMap())
val pingResults: StateFlow<Map<String, PingResult>> = _pingResults.asStateFlow()
/** Known server fingerprints (TOFU). */
private val _knownFingerprints = MutableStateFlow<Map<String, String>>(emptyMap())
private val _playoutGainDb = MutableStateFlow(0f)
val playoutGainDb: StateFlow<Float> = _playoutGainDb.asStateFlow()
@@ -85,6 +112,18 @@ class CallViewModel : ViewModel(), WzpCallback {
private val _aecEnabled = MutableStateFlow(true)
val aecEnabled: StateFlow<Boolean> = _aecEnabled.asStateFlow()
private val _debugRecording = MutableStateFlow(false)
val debugRecording: StateFlow<Boolean> = _debugRecording.asStateFlow()
// Quality profile index (matches JNI bridge profile_from_int)
private val _codecChoice = MutableStateFlow(0)
val codecChoice: StateFlow<Int> = _codecChoice.asStateFlow()
/** Key-change warning dialog state. */
data class KeyWarningInfo(val address: String, val oldFp: String, val newFp: String)
private val _keyWarning = MutableStateFlow<KeyWarningInfo?>(null)
val keyWarning: StateFlow<KeyWarningInfo?> = _keyWarning.asStateFlow()
/** True when a call just ended and debug report can be sent. */
private val _debugReportAvailable = MutableStateFlow(false)
val debugReportAvailable: StateFlow<Boolean> = _debugReportAvailable.asStateFlow()
@@ -99,13 +138,91 @@ class CallViewModel : ViewModel(), WzpCallback {
private var statsJob: Job? = null
// ── Direct calling state ──
/** 0=room mode, 1=direct call mode */
private val _callMode = MutableStateFlow(0)
val callMode: StateFlow<Int> = _callMode.asStateFlow()
/** Target fingerprint for direct call */
private val _targetFingerprint = MutableStateFlow("")
val targetFingerprint: StateFlow<String> = _targetFingerprint.asStateFlow()
/** Signal connection state: 0=idle, 5=registered, 6=ringing, 7=incoming */
private val _signalState = MutableStateFlow(0)
val signalState: StateFlow<Int> = _signalState.asStateFlow()
/** Incoming call info */
private val _incomingCallId = MutableStateFlow<String?>(null)
val incomingCallId: StateFlow<String?> = _incomingCallId.asStateFlow()
private val _incomingCallerFp = MutableStateFlow<String?>(null)
val incomingCallerFp: StateFlow<String?> = _incomingCallerFp.asStateFlow()
private val _incomingCallerAlias = MutableStateFlow<String?>(null)
val incomingCallerAlias: StateFlow<String?> = _incomingCallerAlias.asStateFlow()
fun setCallMode(mode: Int) { _callMode.value = mode }
fun setTargetFingerprint(fp: String) { _targetFingerprint.value = fp }
/** Register on relay for direct calls */
fun registerForCalls() {
if (engine == null) {
engine = WzpEngine(this).also { it.init() }
}
val serverIdx = _selectedServer.value
val serverList = _servers.value
if (serverIdx >= serverList.size) return
val relay = serverList[serverIdx].address
val seed = _seedHex.value
val alias = _alias.value
viewModelScope.launch(Dispatchers.IO) {
val resolvedRelay = resolveToIp(relay) ?: relay
val result = engine?.startSignaling(resolvedRelay, seed, "", alias)
if (result == 0) {
_signalState.value = 5 // Registered
startStatsPolling()
} else {
_errorMessage.value = "Failed to register on relay"
}
}
}
/** Place a direct call to the target fingerprint */
fun placeDirectCall() {
val target = _targetFingerprint.value.trim()
if (target.isEmpty()) {
_errorMessage.value = "Enter a fingerprint to call"
return
}
engine?.placeCall(target)
_signalState.value = 6 // Ringing
}
/** Answer an incoming direct call */
fun answerIncomingCall(mode: Int = 2) {
val callId = _incomingCallId.value ?: return
engine?.answerCall(callId, mode)
}
/** Reject an incoming direct call */
fun rejectIncomingCall() {
val callId = _incomingCallId.value ?: return
engine?.answerCall(callId, 0) // 0 = Reject
_signalState.value = 5 // Back to registered
_incomingCallId.value = null
_incomingCallerFp.value = null
_incomingCallerAlias.value = null
}
companion object {
private const val TAG = "WzpCall"
val DEFAULT_SERVERS = listOf(
ServerEntry("172.16.81.175:4433", "LAN (172.16.81.175)"),
ServerEntry("193.180.213.68:4433", "Pangolin (IP)"),
)
const val DEFAULT_ROOM = "android"
const val DEFAULT_ROOM = "general"
}
fun setContext(context: Context) {
@@ -115,7 +232,19 @@ class CallViewModel : ViewModel(), WzpCallback {
audioPipeline = AudioPipeline(appCtx)
}
if (audioRouteManager == null) {
audioRouteManager = AudioRouteManager(appCtx)
audioRouteManager = AudioRouteManager(appCtx).also { arm ->
arm.onRouteChanged = { route ->
_audioRoute.value = route
_isSpeaker.value = (route == AudioRoute.SPEAKER)
}
}
}
if (networkMonitor == null) {
networkMonitor = NetworkMonitor(appCtx).also { nm ->
nm.onNetworkChanged = { type, bw ->
engine?.onNetworkChanged(type, bw)
}
}
}
if (debugReporter == null) {
debugReporter = DebugReporter(appCtx)
@@ -139,6 +268,9 @@ class CallViewModel : ViewModel(), WzpCallback {
_captureGainDb.value = s.loadCaptureGain()
_seedHex.value = s.getOrCreateSeedHex()
_aecEnabled.value = s.loadAecEnabled()
_debugRecording.value = s.loadDebugRecording()
_codecChoice.value = s.loadCodecChoice()
_recentRooms.value = s.loadRecentRooms()
}
fun selectServer(index: Int) {
@@ -182,6 +314,70 @@ class CallViewModel : ViewModel(), WzpCallback {
settings?.saveSelectedServer(_selectedServer.value)
}
/**
* Ping all servers via native QUIC. Requires engine to be initialized.
* Creates engine if needed, pings, keeps engine alive for subsequent Connect.
*/
fun pingAllServers() {
viewModelScope.launch {
// Ensure engine exists
if (engine == null || engine?.isInitialized != true) {
try {
engine = WzpEngine(this@CallViewModel).also { it.init() }
engineInitialized = true
} catch (e: Exception) {
Log.w(TAG, "engine init for ping failed: $e")
return@launch
}
}
val eng = engine ?: return@launch
val results = mutableMapOf<String, PingResult>()
val known = mutableMapOf<String, String>()
_servers.value.forEach { server ->
val json = withContext(Dispatchers.IO) {
eng.pingRelay(server.address)
}
if (json != null) {
try {
val obj = JSONObject(json)
val rtt = obj.getInt("rtt_ms")
val fp = obj.optString("server_fingerprint", "")
results[server.address] = PingResult(rttMs = rtt, serverFingerprint = fp)
// TOFU
if (fp.isNotEmpty()) {
val saved = settings?.loadServerFingerprint(server.address)
if (saved == null) settings?.saveServerFingerprint(server.address, fp)
known[server.address] = saved ?: fp
}
} catch (_: Exception) {}
}
}
_pingResults.value = results
_knownFingerprints.value = known
}
}
/** Load saved TOFU fingerprints. */
fun loadSavedFingerprints() {
val known = mutableMapOf<String, String>()
_servers.value.forEach { server ->
settings?.loadServerFingerprint(server.address)?.let {
known[server.address] = it
}
}
_knownFingerprints.value = known
}
/** Get lock status for a server. */
fun lockStatus(address: String): LockStatus {
val pr = _pingResults.value[address] ?: return LockStatus.UNKNOWN
if (!pr.reachable) return LockStatus.OFFLINE
val known = _knownFingerprints.value[address] ?: return LockStatus.NEW
if (pr.serverFingerprint.isEmpty()) return LockStatus.NEW
return if (pr.serverFingerprint == known) LockStatus.VERIFIED else LockStatus.CHANGED
}
fun setRoomName(name: String) {
_roomName.value = name
settings?.saveRoom(name)
@@ -214,6 +410,16 @@ class CallViewModel : ViewModel(), WzpCallback {
settings?.saveAecEnabled(enabled)
}
fun setDebugRecording(enabled: Boolean) {
_debugRecording.value = enabled
settings?.saveDebugRecording(enabled)
}
fun setCodecChoice(choice: Int) {
_codecChoice.value = choice
settings?.saveCodecChoice(choice)
}
/**
* Resolve DNS hostname to IP address on the Kotlin/Android side,
* since Rust's DNS resolution may not work on Android.
@@ -254,8 +460,17 @@ class CallViewModel : ViewModel(), WzpCallback {
Log.i(TAG, "teardown: stopping audio, stopService=$stopService")
val hadCall = audioStarted
CallService.onStopFromNotification = null
stopAudio()
stopAudio() // sets running=false (non-blocking)
stopStatsPolling()
// Wait for audio threads to exit their loops before destroying the engine.
// This guarantees no in-flight JNI calls to writeAudio/readAudio.
val drained = audioPipeline?.awaitDrain() ?: true
if (!drained) {
Log.w(TAG, "teardown: audio threads did not drain in time")
}
audioPipeline = null
Log.i(TAG, "teardown: stopping engine")
try { engine?.stopCall() } catch (e: Exception) { Log.w(TAG, "stopCall err: $e") }
try { engine?.destroy() } catch (e: Exception) { Log.w(TAG, "destroy err: $e") }
@@ -271,13 +486,82 @@ class CallViewModel : ViewModel(), WzpCallback {
Log.i(TAG, "teardown: done")
}
/** Accept the new server key and proceed with the call. */
fun acceptNewFingerprint() {
val info = _keyWarning.value ?: return
_knownFingerprints.value = _knownFingerprints.value.toMutableMap().also {
it[info.address] = info.newFp
}
settings?.saveServerFingerprint(info.address, info.newFp)
_keyWarning.value = null
startCallInternal()
}
fun dismissKeyWarning() {
_keyWarning.value = null
}
fun startCall() {
val serverEntry = _servers.value[_selectedServer.value]
// Check for key change before connecting
val ls = lockStatus(serverEntry.address)
if (ls == LockStatus.CHANGED) {
val known = _knownFingerprints.value[serverEntry.address] ?: ""
val current = _pingResults.value[serverEntry.address]?.serverFingerprint ?: ""
_keyWarning.value = KeyWarningInfo(serverEntry.address, known, current)
return
}
startCallInternal()
}
/** Start a call to a specific relay + room (used by direct call setup). */
private fun startCallInternal(relay: String, room: String) {
Log.i(TAG, "startCallDirect: relay=$relay room=$room")
try {
// Don't teardown — keep the signal connection alive
engine = WzpEngine(this)
engine!!.init()
engineInitialized = true
_callState.value = 1
_errorMessage.value = null
try { appContext?.let { CallService.start(it) } } catch (e: Exception) {
Log.w(TAG, "service start err: $e")
}
startStatsPolling()
viewModelScope.launch(kotlinx.coroutines.Dispatchers.IO) {
try {
val seed = _seedHex.value
val name = _alias.value
val result = engine?.startCall(relay, room, seedHex = seed, alias = name, profile = _codecChoice.value) ?: -1
CallService.onStopFromNotification = { stopCall() }
if (result != 0) {
_callState.value = 0
_errorMessage.value = "Failed to connect to call room (code $result)"
appContext?.let { CallService.stop(it) }
}
} catch (e: Exception) {
Log.e(TAG, "startCallDirect error", e)
_callState.value = 0
_errorMessage.value = "Engine error: ${e.message}"
appContext?.let { CallService.stop(it) }
}
}
} catch (e: Exception) {
Log.e(TAG, "startCallDirect error", e)
_callState.value = 0
_errorMessage.value = "Engine error: ${e.message}"
}
}
private fun startCallInternal() {
val serverEntry = _servers.value[_selectedServer.value]
val room = _roomName.value
Log.i(TAG, "startCall: server=${serverEntry.address} room=$room")
_debugReportAvailable.value = false
_debugReportStatus.value = null
lastCallServer = serverEntry.address
settings?.addRecentRoom(serverEntry.address, room)
_recentRooms.value = settings?.loadRecentRooms() ?: emptyList()
debugReporter?.prepareForCall()
try {
// Teardown previous call but don't stop the service (we're about to restart it)
@@ -300,7 +584,7 @@ class CallViewModel : ViewModel(), WzpCallback {
val seed = _seedHex.value
val name = _alias.value
Log.i(TAG, "startCall: resolved=$relay, alias=$name, calling engine.startCall")
val result = engine?.startCall(relay, room, seedHex = seed, alias = name) ?: -1
val result = engine?.startCall(relay, room, seedHex = seed, alias = name, profile = _codecChoice.value) ?: -1
Log.i(TAG, "startCall: engine returned $result")
// Only wire up notification callback after engine is running
CallService.onStopFromNotification = { stopCall() }
@@ -341,6 +625,27 @@ class CallViewModel : ViewModel(), WzpCallback {
audioRouteManager?.setSpeaker(newSpeaker)
}
/** Cycle audio output: Earpiece → Speaker → Bluetooth (if available) → Earpiece. */
fun cycleAudioRoute() {
val routes = audioRouteManager?.availableRoutes() ?: return
val currentIdx = routes.indexOf(_audioRoute.value)
val next = routes[(currentIdx + 1) % routes.size]
when (next) {
AudioRoute.EARPIECE -> {
audioRouteManager?.setBluetoothSco(false)
audioRouteManager?.setSpeaker(false)
}
AudioRoute.SPEAKER -> {
audioRouteManager?.setSpeaker(true)
}
AudioRoute.BLUETOOTH -> {
audioRouteManager?.setBluetoothSco(true)
}
}
_audioRoute.value = next
_isSpeaker.value = (next == AudioRoute.SPEAKER)
}
fun clearError() { _errorMessage.value = null }
fun sendDebugReport() {
@@ -391,19 +696,22 @@ class CallViewModel : ViewModel(), WzpCallback {
it.playoutGainDb = _playoutGainDb.value
it.captureGainDb = _captureGainDb.value
it.aecEnabled = _aecEnabled.value
it.debugRecording = _debugRecording.value
it.start(e)
}
audioRouteManager?.register()
networkMonitor?.register()
audioStarted = true
}
private fun stopAudio() {
if (!audioStarted) return
audioPipeline?.stop()
audioPipeline = null
audioPipeline?.stop() // sets running=false; DON'T null — teardown needs awaitDrain()
audioRouteManager?.unregister()
networkMonitor?.unregister()
audioRouteManager?.setSpeaker(false)
_isSpeaker.value = false
_audioRoute.value = AudioRoute.EARPIECE
audioStarted = false
}
@@ -422,6 +730,27 @@ class CallViewModel : ViewModel(), WzpCallback {
if (s.state != 0) {
_callState.value = s.state
}
// Track signal state changes for direct calling
if (s.state in 5..7) {
_signalState.value = s.state
}
// Incoming call detection
if (s.state == 7) { // IncomingCall
_incomingCallId.value = s.incomingCallId
_incomingCallerFp.value = s.incomingCallerFp
_incomingCallerAlias.value = s.incomingCallerAlias
}
// CallSetup: auto-connect to media room
if (s.state == 1 && s.incomingCallId != null && s.incomingCallId.contains("|")) {
// Format: "relay_addr|room_name"
val parts = s.incomingCallId.split("|", limit = 2)
if (parts.size == 2) {
val mediaRelay = parts[0]
val mediaRoom = parts[1]
Log.i(TAG, "CallSetup: connecting to $mediaRelay room $mediaRoom")
startCallInternal(mediaRelay, mediaRoom)
}
}
if (s.state == 2 && !audioStarted) {
startAudio()
}

File diff suppressed because it is too large Load Diff

View File

@@ -0,0 +1,141 @@
package com.wzp.ui.components
import android.widget.Toast
import androidx.compose.foundation.Canvas
import androidx.compose.foundation.clickable
import androidx.compose.foundation.layout.size
import androidx.compose.foundation.shape.RoundedCornerShape
import androidx.compose.runtime.Composable
import androidx.compose.ui.Modifier
import androidx.compose.ui.draw.clip
import androidx.compose.ui.geometry.Offset
import androidx.compose.ui.geometry.Size
import androidx.compose.ui.graphics.Color
import androidx.compose.ui.platform.LocalClipboardManager
import androidx.compose.ui.platform.LocalContext
import androidx.compose.ui.text.AnnotatedString
import androidx.compose.ui.unit.Dp
import androidx.compose.ui.unit.dp
import kotlin.math.min
/**
* Deterministic identicon — generates a unique 5x5 symmetric pattern
* from a hex fingerprint string. Identical algorithm to the desktop
* TypeScript implementation in identicon.ts.
*/
@Composable
fun Identicon(
fingerprint: String,
size: Dp = 36.dp,
clickToCopy: Boolean = true,
modifier: Modifier = Modifier,
) {
val clipboard = LocalClipboardManager.current
val context = LocalContext.current
val bytes = hashBytes(fingerprint)
val (bg, fg) = deriveColors(bytes)
val grid = buildGrid(bytes)
Canvas(
modifier = modifier
.size(size)
.clip(RoundedCornerShape(size * 0.12f))
.then(
if (clickToCopy && fingerprint.isNotEmpty()) {
Modifier.clickable {
clipboard.setText(AnnotatedString(fingerprint))
Toast.makeText(context, "Copied", Toast.LENGTH_SHORT).show()
}
} else Modifier
)
) {
val cellW = this.size.width / 5f
val cellH = this.size.height / 5f
// Background
drawRect(color = bg, size = this.size)
// Foreground cells
for (y in 0 until 5) {
for (x in 0 until 5) {
if (grid[y][x]) {
drawRect(
color = fg,
topLeft = Offset(x * cellW, y * cellH),
size = Size(cellW, cellH),
)
}
}
}
}
}
/**
* Fingerprint text that copies to clipboard on tap.
*/
@Composable
fun CopyableFingerprint(
fingerprint: String,
modifier: Modifier = Modifier,
style: androidx.compose.ui.text.TextStyle = androidx.compose.material3.MaterialTheme.typography.bodySmall,
color: Color = Color.Unspecified,
) {
val clipboard = LocalClipboardManager.current
val context = LocalContext.current
androidx.compose.material3.Text(
text = fingerprint,
style = style,
color = color,
modifier = modifier.clickable {
if (fingerprint.isNotEmpty()) {
clipboard.setText(AnnotatedString(fingerprint))
Toast.makeText(context, "Fingerprint copied", Toast.LENGTH_SHORT).show()
}
}
)
}
// --- Internal helpers (matching desktop identicon.ts) ---
private fun hashBytes(hex: String): List<Int> {
val clean = hex.filter { it.isLetterOrDigit() }
val bytes = mutableListOf<Int>()
var i = 0
while (i + 1 < clean.length) {
val b = clean.substring(i, i + 2).toIntOrNull(16) ?: 0
bytes.add(b)
i += 2
}
// Pad to at least 16 bytes
while (bytes.size < 16) bytes.add(0)
return bytes
}
private fun deriveColors(bytes: List<Int>): Pair<Color, Color> {
val hue1 = bytes[0] * 360f / 256f
val hue2 = (bytes[1] * 360f / 256f + 120f) % 360f
val bg = hslToColor(hue1, 0.65f, 0.35f)
val fg = hslToColor(hue2, 0.70f, 0.55f)
return bg to fg
}
private fun buildGrid(bytes: List<Int>): List<List<Boolean>> {
return (0 until 5).map { y ->
val left = (0 until 3).map { x ->
val idx = 2 + y * 3 + x
bytes[idx % bytes.size] > 128
}
// Mirror: col3 = col1, col4 = col0
listOf(left[0], left[1], left[2], left[1], left[0])
}
}
private fun hslToColor(h: Float, s: Float, l: Float): Color {
val k = { n: Float -> (n + h / 30f) % 12f }
val a = s * min(l, 1f - l)
val f = { n: Float ->
l - a * maxOf(-1f, minOf(k(n) - 3f, minOf(9f - k(n), 1f)))
}
return Color(f(0f), f(8f), f(4f))
}

View File

@@ -1,5 +1,6 @@
package com.wzp.ui.settings
import androidx.compose.foundation.clickable
import android.content.ClipData
import android.content.ClipboardManager
import android.content.Context
@@ -22,6 +23,7 @@ import androidx.compose.material3.AlertDialog
import androidx.compose.material3.Button
import androidx.compose.material3.ButtonDefaults
import androidx.compose.material3.Divider
import androidx.compose.material3.RadioButton
import androidx.compose.material3.FilledTonalButton
import androidx.compose.material3.FilledTonalIconButton
import androidx.compose.material3.IconButtonDefaults
@@ -158,20 +160,30 @@ fun SettingsScreen(
Spacer(modifier = Modifier.height(16.dp))
// Fingerprint display
// Fingerprint display with identicon
val fingerprint = if (draftSeedHex.length >= 16) draftSeedHex.take(16).uppercase() else "Not generated"
Text(
text = "Fingerprint",
style = MaterialTheme.typography.labelSmall,
color = MaterialTheme.colorScheme.onSurfaceVariant
)
Text(
text = fingerprint.chunked(4).joinToString(" "),
style = MaterialTheme.typography.bodyMedium.copy(
fontFamily = FontFamily.Monospace
),
color = MaterialTheme.colorScheme.onSurface
)
Row(
verticalAlignment = Alignment.CenterVertically,
modifier = Modifier.padding(vertical = 4.dp)
) {
com.wzp.ui.components.Identicon(
fingerprint = draftSeedHex,
size = 40.dp,
)
Spacer(modifier = Modifier.width(12.dp))
com.wzp.ui.components.CopyableFingerprint(
fingerprint = fingerprint.chunked(4).joinToString(" "),
style = MaterialTheme.typography.bodyMedium.copy(
fontFamily = FontFamily.Monospace
),
color = MaterialTheme.colorScheme.onSurface,
)
}
Spacer(modifier = Modifier.height(12.dp))
@@ -231,6 +243,51 @@ fun SettingsScreen(
)
}
Spacer(modifier = Modifier.height(12.dp))
// Quality selection — slider from best (studio 64k) to worst (codec2 1.2k) + auto
val qualityLabels = listOf(
"Studio 64k", "Studio 48k", "Studio 32k", "Auto",
"Opus 24k", "Opus 6k", "Codec2 3.2k", "Codec2 1.2k"
)
// Map slider position to JNI profile int:
// 0=Studio64k(6), 1=Studio48k(5), 2=Studio32k(4), 3=Auto(7),
// 4=Opus24k(0), 5=Opus6k(1), 6=Codec2_3.2k(3), 7=Codec2_1.2k(2)
val sliderToProfile = intArrayOf(6, 5, 4, 7, 0, 1, 3, 2)
val profileToSlider = mapOf(6 to 0, 5 to 1, 4 to 2, 7 to 3, 0 to 4, 1 to 5, 3 to 6, 2 to 7)
val qualityColors = listOf(
Color(0xFF22C55E), Color(0xFF4ADE80), Color(0xFF86EFAC), Color(0xFFA3E635),
Color(0xFFA3E635), Color(0xFFFACC15), Color(0xFFE97320), Color(0xFF991B1B)
)
val currentCodec by viewModel.codecChoice.collectAsState()
val sliderPos = profileToSlider[currentCodec] ?: 3
Text("Quality", style = MaterialTheme.typography.bodyMedium)
Text(
text = "Decode always accepts all codecs",
style = MaterialTheme.typography.bodySmall,
color = MaterialTheme.colorScheme.onSurfaceVariant
)
Spacer(modifier = Modifier.height(4.dp))
Text(
text = qualityLabels[sliderPos],
style = MaterialTheme.typography.titleMedium.copy(fontWeight = FontWeight.Bold),
color = qualityColors[sliderPos]
)
Slider(
value = sliderPos.toFloat(),
onValueChange = { viewModel.setCodecChoice(sliderToProfile[it.toInt()]) },
valueRange = 0f..7f,
steps = 6,
modifier = Modifier.fillMaxWidth()
)
Row(
modifier = Modifier.fillMaxWidth(),
horizontalArrangement = Arrangement.SpaceBetween
) {
Text("Best", style = MaterialTheme.typography.labelSmall, color = Color(0xFF22C55E))
Text("Lowest", style = MaterialTheme.typography.labelSmall, color = Color(0xFF991B1B))
}
Spacer(modifier = Modifier.height(24.dp))
Divider()
Spacer(modifier = Modifier.height(16.dp))

View File

@@ -17,7 +17,7 @@ wzp-crypto = { workspace = true }
wzp-transport = { workspace = true }
tokio = { workspace = true }
tracing = { workspace = true }
tracing-subscriber = { workspace = true }
tracing-subscriber = { workspace = true, features = ["env-filter"] }
bytes = { workspace = true }
serde = { workspace = true }
serde_json = "1"

View File

@@ -1,91 +1,128 @@
//! Lock-free SPSC ring buffers for audio PCM transfer between
//! Kotlin AudioRecord/AudioTrack threads and the Rust engine.
//! Lock-free SPSC ring buffer — "Reader-Detects-Lap" architecture.
//!
//! These use a simple spin-free design: the producer writes and advances
//! a write cursor, the consumer reads and advances a read cursor.
//! Both cursors are atomic so no mutex is needed.
//! SPSC invariant: the producer ONLY writes `write_pos`, the consumer
//! ONLY writes `read_pos`. Neither thread touches the other's cursor.
//!
//! On overflow (writer laps the reader), the writer simply overwrites
//! old buffer data. The reader detects the lap via `available() >
//! RING_CAPACITY` and snaps its own `read_pos` forward.
//!
//! Capacity is a power of 2 for bitmask indexing (no modulo).
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::atomic::{AtomicU64, AtomicUsize, Ordering};
/// Ring buffer capacity in i16 samples.
/// 960 samples * 10 frames = ~200ms of audio at 48kHz mono.
const RING_CAPACITY: usize = 960 * 10;
/// Ring buffer capacity — power of 2 for bitmask indexing.
/// 16384 samples = 341.3ms at 48kHz mono. 70% more headroom
/// than the previous 9600 (200ms) for surviving Android GC pauses.
const RING_CAPACITY: usize = 16384; // 2^14
const RING_MASK: usize = RING_CAPACITY - 1;
/// Lock-free single-producer single-consumer ring buffer for i16 PCM samples.
pub struct AudioRing {
buf: Box<[i16; RING_CAPACITY]>,
buf: Box<[i16]>,
/// Monotonically increasing write cursor. ONLY written by producer.
write_pos: AtomicUsize,
/// Monotonically increasing read cursor. ONLY written by consumer.
read_pos: AtomicUsize,
/// Incremented by reader when it detects it was lapped (overflow).
overflow_count: AtomicU64,
/// Incremented by reader when ring is empty (underrun).
underrun_count: AtomicU64,
}
// SAFETY: AudioRing is designed for SPSC — one thread writes, one reads.
// The atomics ensure visibility. The buffer itself is never accessed
// from the same index by both threads simultaneously because the
// producer only writes to positions between write_pos and read_pos,
// and the consumer only reads from positions between read_pos and write_pos.
// SAFETY: AudioRing is SPSC — one thread writes (producer), one reads (consumer).
// The producer only writes write_pos. The consumer only writes read_pos.
// Neither thread writes the other's cursor. Buffer indices are derived from
// the owning thread's cursor, ensuring no concurrent access to the same index.
unsafe impl Send for AudioRing {}
unsafe impl Sync for AudioRing {}
impl AudioRing {
pub fn new() -> Self {
debug_assert!(RING_CAPACITY.is_power_of_two());
Self {
buf: Box::new([0i16; RING_CAPACITY]),
buf: vec![0i16; RING_CAPACITY].into_boxed_slice(),
write_pos: AtomicUsize::new(0),
read_pos: AtomicUsize::new(0),
overflow_count: AtomicU64::new(0),
underrun_count: AtomicU64::new(0),
}
}
/// Number of samples available to read.
/// Number of samples available to read (clamped to capacity).
pub fn available(&self) -> usize {
let w = self.write_pos.load(Ordering::Acquire);
let r = self.read_pos.load(Ordering::Acquire);
w.wrapping_sub(r)
let r = self.read_pos.load(Ordering::Relaxed);
w.wrapping_sub(r).min(RING_CAPACITY)
}
/// Number of samples that can be written without overwriting.
/// Number of samples that can be written without overwriting unread data.
pub fn free_space(&self) -> usize {
RING_CAPACITY - self.available()
RING_CAPACITY.saturating_sub(self.available())
}
/// Write samples into the ring. Returns number of samples written.
/// Drops oldest samples if the ring is full.
///
/// If the ring is full, old data is silently overwritten. The reader
/// will detect the lap and self-correct. The writer NEVER touches
/// `read_pos` — this is the key invariant that prevents cursor desync.
pub fn write(&self, samples: &[i16]) -> usize {
let w = self.write_pos.load(Ordering::Relaxed);
let count = samples.len().min(RING_CAPACITY);
let w = self.write_pos.load(Ordering::Relaxed);
for i in 0..count {
let idx = (w + i) % RING_CAPACITY;
// SAFETY: We're the only writer, and the reader won't read
// past read_pos which we haven't advanced past yet.
unsafe {
let ptr = self.buf.as_ptr() as *mut i16;
*ptr.add(idx) = samples[i];
*ptr.add((w + i) & RING_MASK) = samples[i];
}
}
self.write_pos.store(w.wrapping_add(count), Ordering::Release);
// If we overwrote unread data, advance read_pos
if self.available() > RING_CAPACITY {
let new_read = self.write_pos.load(Ordering::Relaxed).wrapping_sub(RING_CAPACITY);
self.read_pos.store(new_read, Ordering::Release);
}
count
}
/// Read samples from the ring into `out`. Returns number of samples read.
///
/// If the writer has lapped the reader (overflow), `read_pos` is snapped
/// forward to the oldest valid data. This is safe because only the
/// reader thread writes `read_pos`.
pub fn read(&self, out: &mut [i16]) -> usize {
let avail = self.available();
let count = out.len().min(avail);
let w = self.write_pos.load(Ordering::Acquire);
let mut r = self.read_pos.load(Ordering::Relaxed);
let mut avail = w.wrapping_sub(r);
// Lap detection: writer has overwritten our unread data.
// Snap read_pos forward to oldest valid data in the buffer.
if avail > RING_CAPACITY {
r = w.wrapping_sub(RING_CAPACITY);
avail = RING_CAPACITY;
self.overflow_count.fetch_add(1, Ordering::Relaxed);
}
let count = out.len().min(avail);
if count == 0 {
if w == r {
self.underrun_count.fetch_add(1, Ordering::Relaxed);
}
return 0;
}
let r = self.read_pos.load(Ordering::Relaxed);
for i in 0..count {
let idx = (r + i) % RING_CAPACITY;
out[i] = unsafe { *self.buf.as_ptr().add(idx) };
out[i] = unsafe { *self.buf.as_ptr().add((r + i) & RING_MASK) };
}
self.read_pos.store(r.wrapping_add(count), Ordering::Release);
count
}
/// Number of overflow events (reader was lapped by writer).
pub fn overflow_count(&self) -> u64 {
self.overflow_count.load(Ordering::Relaxed)
}
/// Number of underrun events (reader found empty buffer).
pub fn underrun_count(&self) -> u64 {
self.underrun_count.load(Ordering::Relaxed)
}
}

View File

@@ -12,4 +12,13 @@ pub enum EngineCommand {
ForceProfile(QualityProfile),
/// Stop the call and shut down the engine.
Stop,
/// Place a direct call to a fingerprint (requires signal connection).
PlaceCall { target_fingerprint: String },
/// Answer an incoming direct call.
AnswerCall {
call_id: String,
accept_mode: wzp_proto::CallAcceptMode,
},
/// Reject an incoming direct call.
RejectCall { call_id: String },
}

File diff suppressed because it is too large Load Diff

View File

@@ -21,11 +21,24 @@ unsafe fn handle_ref(handle: jlong) -> &'static mut EngineHandle {
unsafe { &mut *(handle as *mut EngineHandle) }
}
/// 7 = auto (use relay's chosen profile)
const PROFILE_AUTO: jint = 7;
fn profile_from_int(value: jint) -> QualityProfile {
match value {
1 => QualityProfile::DEGRADED,
2 => QualityProfile::CATASTROPHIC,
_ => QualityProfile::GOOD,
0 => QualityProfile::GOOD, // Opus 24k
1 => QualityProfile::DEGRADED, // Opus 6k
2 => QualityProfile::CATASTROPHIC, // Codec2 1.2k
3 => QualityProfile { // Codec2 3.2k
codec: wzp_proto::CodecId::Codec2_3200,
fec_ratio: 0.5,
frame_duration_ms: 20,
frames_per_block: 5,
},
4 => QualityProfile::STUDIO_32K, // Opus 32k
5 => QualityProfile::STUDIO_48K, // Opus 48k
6 => QualityProfile::STUDIO_64K, // Opus 64k
_ => QualityProfile::GOOD, // auto falls back to GOOD
}
}
@@ -35,11 +48,25 @@ static INIT_LOGGING: Once = Once::new();
/// Safe to call multiple times — only the first call takes effect.
fn init_logging() {
INIT_LOGGING.call_once(|| {
use tracing_subscriber::layer::SubscriberExt;
use tracing_subscriber::util::SubscriberInitExt;
if let Ok(layer) = tracing_android::layer("wzp_android") {
let _ = tracing_subscriber::registry().with(layer).try_init();
}
// Wrap in catch_unwind — sharded_slab allocation inside
// tracing_subscriber::registry() can crash on some Android
// devices if scudo malloc fails during early initialization.
let _ = std::panic::catch_unwind(|| {
use tracing_subscriber::layer::SubscriberExt;
use tracing_subscriber::util::SubscriberInitExt;
use tracing_subscriber::EnvFilter;
if let Ok(layer) = tracing_android::layer("wzp_android") {
// Filter: INFO for our crates, WARN for everything else.
// The jni crate emits VERBOSE logs for every method lookup
// (~10 lines per JNI call, 100+ calls/sec) which floods logcat
// and causes the system to kill the app.
let filter = EnvFilter::new("warn,wzp_android=info,wzp_proto=info,wzp_transport=info,wzp_codec=info,wzp_fec=info,wzp_crypto=info");
let _ = tracing_subscriber::registry()
.with(layer)
.with(filter)
.try_init();
}
});
});
}
@@ -71,6 +98,7 @@ pub unsafe extern "system" fn Java_com_wzp_engine_WzpEngine_nativeStartCall(
seed_hex_j: JString,
token_j: JString,
alias_j: JString,
profile_j: jint,
) -> jint {
let result = panic::catch_unwind(panic::AssertUnwindSafe(|| {
let relay_addr: String = env.get_string(&relay_addr_j).map(|s| s.into()).unwrap_or_default();
@@ -96,7 +124,8 @@ pub unsafe extern "system" fn Java_com_wzp_engine_WzpEngine_nativeStartCall(
}
let config = CallStartConfig {
profile: QualityProfile::GOOD,
profile: profile_from_int(profile_j),
auto_profile: profile_j == PROFILE_AUTO,
relay_addr,
room,
auth_token: if token.is_empty() { Vec::new() } else { token.into_bytes() },
@@ -193,6 +222,29 @@ pub unsafe extern "system" fn Java_com_wzp_engine_WzpEngine_nativeForceProfile(
}));
}
/// Signal a network transport change from the Android ConnectivityManager.
///
/// `network_type` matches the Rust `NetworkContext` enum:
/// 0=WiFi, 1=CellularLte, 2=Cellular5g, 3=Cellular3g, 4=Unknown, 5=None
///
/// The engine forwards this to the `AdaptiveQualityController` which:
/// - Preemptively downgrades one tier on WiFi→cellular
/// - Activates a 10-second FEC boost
/// - Uses faster downgrade thresholds on cellular
#[unsafe(no_mangle)]
pub unsafe extern "system" fn Java_com_wzp_engine_WzpEngine_nativeOnNetworkChanged(
_env: JNIEnv,
_class: JClass,
handle: jlong,
network_type: jint,
bandwidth_kbps: jint,
) {
let _ = panic::catch_unwind(panic::AssertUnwindSafe(|| {
let h = unsafe { handle_ref(handle) };
h.engine.on_network_changed(network_type as u8, bandwidth_kbps as u32);
}));
}
/// Write captured PCM samples from Kotlin AudioRecord into the engine's capture ring.
/// pcm is a Java short[] array.
#[unsafe(no_mangle)]
@@ -209,7 +261,6 @@ pub unsafe extern "system" fn Java_com_wzp_engine_WzpEngine_nativeWriteAudio(
return 0;
}
let mut buf = vec![0i16; len];
// GetShortArrayRegion copies Java array into our buffer
if env.get_short_array_region(&pcm, 0, &mut buf).is_err() {
return 0;
}
@@ -243,6 +294,56 @@ pub unsafe extern "system" fn Java_com_wzp_engine_WzpEngine_nativeReadAudio(
result.unwrap_or(0)
}
/// Write captured PCM from a DirectByteBuffer — zero JNI array copies.
/// The ByteBuffer must contain little-endian i16 samples.
/// Called from the AudioRecord capture thread.
#[unsafe(no_mangle)]
pub unsafe extern "system" fn Java_com_wzp_engine_WzpEngine_nativeWriteAudioDirect(
env: JNIEnv,
_class: JClass,
handle: jlong,
buffer: jni::objects::JByteBuffer,
sample_count: jint,
) -> jint {
let result = panic::catch_unwind(panic::AssertUnwindSafe(|| {
let h = unsafe { handle_ref(handle) };
let ptr = env.get_direct_buffer_address(&buffer).unwrap_or(std::ptr::null_mut());
if ptr.is_null() || sample_count <= 0 {
return 0;
}
let samples = unsafe {
std::slice::from_raw_parts(ptr as *const i16, sample_count as usize)
};
h.engine.write_audio(samples) as jint
}));
result.unwrap_or(0)
}
/// Read decoded PCM into a DirectByteBuffer — zero JNI array copies.
/// The ByteBuffer will be filled with little-endian i16 samples.
/// Called from the AudioTrack playout thread.
#[unsafe(no_mangle)]
pub unsafe extern "system" fn Java_com_wzp_engine_WzpEngine_nativeReadAudioDirect(
env: JNIEnv,
_class: JClass,
handle: jlong,
buffer: jni::objects::JByteBuffer,
max_samples: jint,
) -> jint {
let result = panic::catch_unwind(panic::AssertUnwindSafe(|| {
let h = unsafe { handle_ref(handle) };
let ptr = env.get_direct_buffer_address(&buffer).unwrap_or(std::ptr::null_mut());
if ptr.is_null() || max_samples <= 0 {
return 0;
}
let samples = unsafe {
std::slice::from_raw_parts_mut(ptr as *mut i16, max_samples as usize)
};
h.engine.read_audio(samples) as jint
}));
result.unwrap_or(0)
}
#[unsafe(no_mangle)]
pub unsafe extern "system" fn Java_com_wzp_engine_WzpEngine_nativeDestroy(
_env: JNIEnv,
@@ -254,3 +355,116 @@ pub unsafe extern "system" fn Java_com_wzp_engine_WzpEngine_nativeDestroy(
drop(h);
}));
}
/// Ping a relay server — instance method, requires engine handle.
/// Returns JSON `{"rtt_ms":N,"server_fingerprint":"hex"}` or null on failure.
#[unsafe(no_mangle)]
pub unsafe extern "system" fn Java_com_wzp_engine_WzpEngine_nativePingRelay<'a>(
mut env: JNIEnv<'a>,
_class: JClass,
handle: jlong,
relay_j: JString,
) -> jstring {
let result = panic::catch_unwind(panic::AssertUnwindSafe(|| {
let h = unsafe { handle_ref(handle) };
let relay: String = env.get_string(&relay_j).map(|s| s.into()).unwrap_or_default();
match h.engine.ping_relay(&relay) {
Ok(json) => Some(json),
Err(_) => None,
}
}));
let json = match result {
Ok(Some(s)) => s,
_ => return JObject::null().into_raw(),
};
env.new_string(&json)
.map(|s| s.into_raw())
.unwrap_or(JObject::null().into_raw())
}
// ── Direct calling JNI functions ──
/// Start persistent signaling connection to relay for direct calls.
/// Returns 0 on success, -1 on error.
#[unsafe(no_mangle)]
pub unsafe extern "system" fn Java_com_wzp_engine_WzpEngine_nativeStartSignaling<'a>(
mut env: JNIEnv<'a>,
_class: JClass,
handle: jlong,
relay_addr_j: JString,
seed_hex_j: JString,
token_j: JString,
alias_j: JString,
) -> jint {
let result = panic::catch_unwind(panic::AssertUnwindSafe(|| {
let h = unsafe { handle_ref(handle) };
let relay_addr: String = env.get_string(&relay_addr_j).map(|s| s.into()).unwrap_or_default();
let seed_hex: String = env.get_string(&seed_hex_j).map(|s| s.into()).unwrap_or_default();
let token: String = env.get_string(&token_j).map(|s| s.into()).unwrap_or_default();
let alias: String = env.get_string(&alias_j).map(|s| s.into()).unwrap_or_default();
h.engine.start_signaling(
&relay_addr,
&seed_hex,
if token.is_empty() { None } else { Some(&token) },
if alias.is_empty() { None } else { Some(&alias) },
)
}));
match result {
Ok(Ok(())) => 0,
Ok(Err(e)) => { error!("start_signaling failed: {e}"); -1 }
Err(_) => { error!("start_signaling panicked"); -1 }
}
}
/// Place a direct call to a target fingerprint.
/// Returns 0 on success, -1 on error.
#[unsafe(no_mangle)]
pub unsafe extern "system" fn Java_com_wzp_engine_WzpEngine_nativePlaceCall<'a>(
mut env: JNIEnv<'a>,
_class: JClass,
handle: jlong,
target_fp_j: JString,
) -> jint {
let result = panic::catch_unwind(panic::AssertUnwindSafe(|| {
let h = unsafe { handle_ref(handle) };
let target: String = env.get_string(&target_fp_j).map(|s| s.into()).unwrap_or_default();
h.engine.place_call(&target)
}));
match result {
Ok(Ok(())) => 0,
Ok(Err(e)) => { error!("place_call failed: {e}"); -1 }
Err(_) => { error!("place_call panicked"); -1 }
}
}
/// Answer an incoming direct call.
/// mode: 0=Reject, 1=AcceptTrusted, 2=AcceptGeneric
#[unsafe(no_mangle)]
pub unsafe extern "system" fn Java_com_wzp_engine_WzpEngine_nativeAnswerCall<'a>(
mut env: JNIEnv<'a>,
_class: JClass,
handle: jlong,
call_id_j: JString,
mode: jint,
) -> jint {
let result = panic::catch_unwind(panic::AssertUnwindSafe(|| {
let h = unsafe { handle_ref(handle) };
let call_id: String = env.get_string(&call_id_j).map(|s| s.into()).unwrap_or_default();
let accept_mode = match mode {
0 => wzp_proto::CallAcceptMode::Reject,
1 => wzp_proto::CallAcceptMode::AcceptTrusted,
_ => wzp_proto::CallAcceptMode::AcceptGeneric,
};
h.engine.answer_call(&call_id, accept_mode)
}));
match result {
Ok(Ok(())) => 0,
Ok(Err(e)) => { error!("answer_call failed: {e}"); -1 }
Err(_) => { error!("answer_call panicked"); -1 }
}
}

View File

@@ -8,6 +8,19 @@
//!
//! On non-Android targets, the Oboe C++ layer compiles as a stub,
//! allowing `cargo check` and unit tests on the host.
//!
//! ## Status
//!
//! **Dead code as of the Tauri mobile rewrite.** The legacy Kotlin+JNI
//! Android app that consumed this crate was replaced by a Tauri 2.x
//! Mobile app (see `desktop/src-tauri/src/engine.rs` for the live
//! Android audio recv path and `crates/wzp-native/` for the Oboe
//! bridge). We keep this crate in the workspace for reference and to
//! preserve the commit history, but it is not built by any shipping
//! target. Allow the accumulated leftover warnings so CI/workspace
//! checks stay clean — any real cleanup should happen as part of
//! removing the crate entirely, not piecemeal.
#![allow(dead_code, unused_imports, unused_variables, unused_mut)]
pub mod audio_android;
pub mod audio_ring;

View File

@@ -11,6 +11,12 @@ pub enum CallState {
Active,
Reconnecting,
Closed,
/// Connected to relay signal channel, registered for direct calls.
Registered,
/// Outgoing call ringing on callee's side.
Ringing,
/// Incoming call received, waiting for user to accept/reject.
IncomingCall,
}
impl serde::Serialize for CallState {
@@ -21,6 +27,9 @@ impl serde::Serialize for CallState {
CallState::Active => 2,
CallState::Reconnecting => 3,
CallState::Closed => 4,
CallState::Registered => 5,
CallState::Ringing => 6,
CallState::IncomingCall => 7,
};
serializer.serialize_u8(n)
}
@@ -49,14 +58,46 @@ pub struct CallStats {
pub frames_decoded: u64,
/// Number of playout underruns (buffer empty when audio needed).
pub underruns: u64,
/// Frames recovered by FEC.
/// Frames recovered by RaptorQ FEC (Codec2 tiers only; Opus bypasses
/// RaptorQ per Phase 2).
pub fec_recovered: u64,
/// Phase 3c: Opus frames reconstructed via DRED side-channel data.
/// Only increments on the Opus tiers; always zero for Codec2.
pub dred_reconstructions: u64,
/// Phase 3c: Opus frames filled via classical Opus PLC because no DRED
/// state covered the gap, plus any decode-error fallbacks. Codec2 loss
/// also increments this counter via the Codec2 PLC path.
pub classical_plc_invocations: u64,
/// Playout ring overflow count (reader was lapped by writer).
pub playout_overflows: u64,
/// Playout ring underrun count (reader found empty buffer).
pub playout_underruns: u64,
/// Capture ring overflow count.
pub capture_overflows: u64,
/// Current mic audio level (RMS of i16 samples, 0-32767).
pub audio_level: u32,
/// Our current outgoing codec name (e.g. "Opus24k", "Codec2_1200").
pub current_codec: String,
/// Last seen incoming codec from other participants.
pub peer_codec: String,
/// Whether auto quality mode is active.
pub auto_mode: bool,
/// Number of participants in the room (from last RoomUpdate).
pub room_participant_count: u32,
/// Participant list (fingerprint + optional alias) serialized as JSON array.
pub room_participants: Vec<RoomMember>,
/// SAS code for verbal verification (None if not in a call).
#[serde(skip_serializing_if = "Option::is_none")]
pub sas_code: Option<u32>,
/// Incoming call info (present when state == IncomingCall).
#[serde(skip_serializing_if = "Option::is_none")]
pub incoming_call_id: Option<String>,
/// Fingerprint of the caller (present when state == IncomingCall).
#[serde(skip_serializing_if = "Option::is_none")]
pub incoming_caller_fp: Option<String>,
/// Alias of the caller (present when state == IncomingCall).
#[serde(skip_serializing_if = "Option::is_none")]
pub incoming_caller_alias: Option<String>,
}
/// A room member entry, serialized into the stats JSON.
@@ -64,4 +105,5 @@ pub struct CallStats {
pub struct RoomMember {
pub fingerprint: String,
pub alias: Option<String>,
pub relay_label: Option<String>,
}

View File

@@ -23,13 +23,77 @@ serde_json = "1"
chrono = "0.4"
rustls = { version = "0.23", default-features = false, features = ["ring", "std"] }
cpal = { version = "0.15", optional = true }
coreaudio-rs = { version = "0.11", optional = true }
libc = "0.2"
# Phase 5.5 — LAN host-candidate ICE: enumerate local network
# interface addresses for inclusion in DirectCallOffer/Answer so
# peers on the same LAN can direct-connect without NAT hairpinning
# through the WAN reflex addr (which many consumer NATs, including
# MikroTik's default masquerade, don't support).
if-addrs = "0.13"
# coreaudio-rs is Apple-framework-only; gate it to macOS so enabling
# the `vpio` feature from a non-macOS target builds cleanly instead of
# pulling in a crate that can only link against Apple frameworks.
[target.'cfg(target_os = "macos")'.dependencies]
coreaudio-rs = { version = "0.11", optional = true }
# Windows-only: direct WASAPI bindings for the `windows-aec` feature.
# `windows` is Microsoft's official Rust COM bindings crate. We pull in
# only the audio + COM subfeatures we need — the crate is organized as
# a massive optional-feature tree, so enabling just these keeps compile
# times reasonable (~5s for these features vs ~60s for the full crate).
[target.'cfg(target_os = "windows")'.dependencies]
windows = { version = "0.58", optional = true, features = [
"Win32_Foundation",
"Win32_Media_Audio",
"Win32_Security",
"Win32_System_Com",
"Win32_System_Com_StructuredStorage",
"Win32_System_Threading",
"Win32_System_Variant",
] }
# Linux-only: WebRTC AEC (Audio Processing Module) bindings for the
# `linux-aec` feature. This is the 0.3.x line of the `tonarino/
# webrtc-audio-processing` crate, which links against Debian's
# `libwebrtc-audio-processing-dev` apt package (0.3-1+b1 on Bookworm).
#
# Note: we attempted the 2.x line with its `bundled` sub-feature first
# (which would give us AEC3 instead of AEC2), but both the crates.io
# tarball AND the upstream git `main` branch of webrtc-audio-processing-sys
# 2.0.3 hit a `meson setup --reconfigure` bug where the build.rs passes
# --reconfigure unconditionally even on first-run empty build dirs,
# causing the bundled build to fail with "Directory does not contain a
# valid build tree". The 0.x line doesn't use bundled mode and sidesteps
# this entirely by linking the apt-provided library. AEC2 is older than
# AEC3 but still the same algorithm family — this is what PulseAudio's
# module-echo-cancel and PipeWire's filter-chain use by default on
# current Debian-family distros.
[target.'cfg(target_os = "linux")'.dependencies]
webrtc-audio-processing = { version = "0.3", optional = true }
[features]
default = []
audio = ["cpal"]
vpio = ["coreaudio-rs"]
# vpio enables coreaudio-rs but that dep is itself gated to macOS above,
# so enabling this feature on Windows/Linux is a no-op (the audio_vpio
# module is also #[cfg(target_os = "macos")] in lib.rs).
vpio = ["dep:coreaudio-rs"]
# windows-aec enables a direct WASAPI capture backend that opens the
# microphone under AudioCategory_Communications, turning on Windows's
# OS-level communications audio processing (AEC + noise suppression +
# AGC). The `windows` dep is itself target-gated to Windows above, so
# enabling this feature on non-Windows targets is a no-op (the
# audio_wasapi module is also #[cfg(target_os = "windows")] in lib.rs).
windows-aec = ["dep:windows"]
# linux-aec enables a CPAL + WebRTC AEC3 capture/playback backend that
# runs the WebRTC Audio Processing Module (same algo as Chrome / Zoom /
# Teams) in-process, using the playback PCM as the reference signal for
# echo cancellation. The webrtc-audio-processing dep is target-gated to
# Linux above, so enabling this feature on non-Linux targets is a no-op
# (the audio_linux_aec module is also #[cfg(target_os = "linux")] in
# lib.rs).
linux-aec = ["dep:webrtc-audio-processing"]
[[bin]]
name = "wzp-client"

View File

@@ -0,0 +1,537 @@
//! Linux AEC backend: CPAL capture + playback wired through the WebRTC Audio
//! Processing Module (AEC3 + noise suppression + high-pass filter).
//!
//! This is the same algorithm used by Chrome WebRTC, Zoom, Teams, Jitsi, and
//! any other "serious" Linux VoIP app. It runs in-process — no dependency on
//! PulseAudio's module-echo-cancel or PipeWire's filter-chain, so it works
//! identically on ALSA / PulseAudio / PipeWire systems.
//!
//! ## Architecture
//!
//! A single module-level `Arc<Mutex<Processor>>` is shared between the
//! capture and playback paths. On each 20 ms frame (960 samples @ 48 kHz
//! mono):
//!
//! - **Playback path**: `LinuxAecPlayback::start` spawns the usual CPAL
//! output thread, but wraps each chunk in a call to
//! `Processor::process_render_frame` **before** handing it to CPAL. That
//! gives APM an authoritative reference of exactly what's going out to
//! the speakers (same approach Zoom/Teams/Jitsi use). The AEC then knows
//! what to cancel when it sees echo in the capture stream.
//!
//! - **Capture path**: `LinuxAecCapture::start` spawns the usual CPAL
//! input thread, and runs `Processor::process_capture_frame` on each
//! incoming mic chunk **in place** before pushing it into the ring
//! buffer. The AEC subtracts the echo using the render reference it
//! saw on the playback side.
//!
//! APM is strict about frame size: it requires exactly 10 ms = 480 samples
//! per call at 48 kHz. Our pipeline uses 20 ms = 960 samples, so each 20 ms
//! frame is split into two 480-sample halves, APM is called twice, and the
//! halves are stitched back together.
//!
//! APM only accepts f32 samples in `[-1.0, 1.0]`, so we convert i16 → f32
//! before the call and f32 → i16 after (with clamping on the return path).
//!
//! ## Stream delay
//!
//! AEC needs to know roughly how long it takes between a sample being passed
//! to `process_render_frame` and its echo showing up at `process_capture_frame`
//! — i.e. the round trip through CPAL playback → speaker → air → microphone
//! → CPAL capture. AEC3's internal estimator tracks this within a window
//! around whatever hint we give it. We hardcode 60 ms as a reasonable
//! starting point for typical Linux audio stacks; the delay estimator does
//! the fine-tuning automatically.
//!
//! ## Thread safety
//!
//! The 0.3.x line of `webrtc-audio-processing` takes `&mut self` on both
//! `process_capture_frame` and `process_render_frame`, so the `Processor`
//! needs a `Mutex` around it for cross-thread sharing. The capture and
//! playback threads each acquire the lock briefly (sub-millisecond per
//! 10 ms frame) so contention is minimal at our frame rates.
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::{Arc, Mutex, OnceLock};
use anyhow::{anyhow, Context};
use cpal::traits::{DeviceTrait, HostTrait, StreamTrait};
use cpal::{SampleFormat, SampleRate, StreamConfig};
use tracing::{info, warn};
use webrtc_audio_processing::{
Config, EchoCancellation, EchoCancellationSuppressionLevel, InitializationConfig,
NoiseSuppression, NoiseSuppressionLevel, Processor, NUM_SAMPLES_PER_FRAME,
};
use crate::audio_ring::AudioRing;
/// 20 ms at 48 kHz, mono — matches the rest of the pipeline and the codec.
pub const FRAME_SAMPLES: usize = 960;
/// APM requires strict 10 ms frames at 48 kHz = 480 samples per call.
/// Imported from the webrtc-audio-processing crate so we can't drift out
/// of sync with whatever sample rate / frame length the C++ lib is using.
const APM_FRAME_SAMPLES: usize = NUM_SAMPLES_PER_FRAME as usize;
const APM_NUM_CHANNELS: usize = 1;
/// Round-trip delay hint passed to APM; the estimator refines from here.
/// 60 ms is a reasonable default for CPAL on ALSA / PulseAudio / PipeWire.
#[allow(dead_code)]
const STREAM_DELAY_MS: i32 = 60;
// ---------------------------------------------------------------------------
// Shared APM instance
// ---------------------------------------------------------------------------
/// Module-level lazily-initialized APM. Shared between capture and playback
/// so they operate on the same echo-cancellation state — the render frames
/// pushed by playback are what the capture path subtracts from the mic input.
/// Wrapped in a Mutex because the 0.3.x Processor takes `&mut self` on both
/// process_capture_frame and process_render_frame.
static PROCESSOR: OnceLock<Arc<Mutex<Processor>>> = OnceLock::new();
fn get_or_init_processor() -> anyhow::Result<Arc<Mutex<Processor>>> {
if let Some(p) = PROCESSOR.get() {
return Ok(p.clone());
}
let init_config = InitializationConfig {
num_capture_channels: APM_NUM_CHANNELS as i32,
num_render_channels: APM_NUM_CHANNELS as i32,
..Default::default()
};
let mut processor = Processor::new(&init_config)
.map_err(|e| anyhow!("webrtc APM init failed: {e:?}"))?;
let config = Config {
echo_cancellation: Some(EchoCancellation {
suppression_level: EchoCancellationSuppressionLevel::High,
stream_delay_ms: Some(STREAM_DELAY_MS),
enable_delay_agnostic: true,
enable_extended_filter: true,
}),
noise_suppression: Some(NoiseSuppression {
suppression_level: NoiseSuppressionLevel::High,
}),
enable_high_pass_filter: true,
// AGC left off for now — it can fight the Opus encoder's own gain
// staging and the adaptive-quality controller. Add later if users
// report low mic levels.
..Default::default()
};
processor.set_config(config);
let arc = Arc::new(Mutex::new(processor));
let _ = PROCESSOR.set(arc.clone());
info!(
stream_delay_ms = STREAM_DELAY_MS,
"webrtc APM initialized (AEC High + NS High + HPF, AGC off)"
);
Ok(arc)
}
// ---------------------------------------------------------------------------
// Helpers: i16 ↔ f32 and APM frame processing
// ---------------------------------------------------------------------------
#[inline]
fn i16_to_f32(s: i16) -> f32 {
s as f32 / 32768.0
}
#[inline]
fn f32_to_i16(s: f32) -> i16 {
(s.clamp(-1.0, 1.0) * 32767.0) as i16
}
/// Feed a 20 ms (960-sample) playback frame to APM as the render reference.
/// Splits into two 10 ms halves because APM is strict about frame size.
/// Takes the Mutex-wrapped Processor and locks briefly around each call.
fn push_render_frame_20ms(apm: &Mutex<Processor>, pcm: &[i16]) {
debug_assert_eq!(pcm.len(), FRAME_SAMPLES);
let mut buf = [0f32; APM_FRAME_SAMPLES];
for half in pcm.chunks_exact(APM_FRAME_SAMPLES) {
for (i, &s) in half.iter().enumerate() {
buf[i] = i16_to_f32(s);
}
match apm.lock() {
Ok(mut p) => {
if let Err(e) = p.process_render_frame(&mut buf) {
warn!("webrtc APM process_render_frame failed: {e:?}");
}
}
Err(_) => {
warn!("webrtc APM mutex poisoned in render path");
return;
}
}
}
}
/// Run a 20 ms (960-sample) capture frame through APM's echo cancellation
/// in place. Splits into two 10 ms halves, runs APM on each, stitches
/// results back into the caller's buffer. Briefly holds the Mutex once
/// per 10 ms half.
fn process_capture_frame_20ms(apm: &Mutex<Processor>, pcm: &mut [i16]) {
debug_assert_eq!(pcm.len(), FRAME_SAMPLES);
let mut buf = [0f32; APM_FRAME_SAMPLES];
for half in pcm.chunks_exact_mut(APM_FRAME_SAMPLES) {
for (i, &s) in half.iter().enumerate() {
buf[i] = i16_to_f32(s);
}
match apm.lock() {
Ok(mut p) => {
if let Err(e) = p.process_capture_frame(&mut buf) {
warn!("webrtc APM process_capture_frame failed: {e:?}");
}
}
Err(_) => {
warn!("webrtc APM mutex poisoned in capture path");
return;
}
}
for (i, d) in half.iter_mut().enumerate() {
*d = f32_to_i16(buf[i]);
}
}
}
// ---------------------------------------------------------------------------
// LinuxAecCapture — CPAL mic + WebRTC AEC capture-side processing
// ---------------------------------------------------------------------------
/// Microphone capture with WebRTC AEC3 applied in place before the codec
/// sees the samples. Mirrors the public API of `audio_io::AudioCapture` so
/// downstream code doesn't change.
pub struct LinuxAecCapture {
ring: Arc<AudioRing>,
running: Arc<AtomicBool>,
}
impl LinuxAecCapture {
pub fn start() -> Result<Self, anyhow::Error> {
// Eagerly init the APM so the playback side can find it already
// configured, and so init errors surface on the caller thread
// instead of silently failing inside the capture thread.
let apm = get_or_init_processor()?;
let ring = Arc::new(AudioRing::new());
let running = Arc::new(AtomicBool::new(true));
let (init_tx, init_rx) = std::sync::mpsc::sync_channel::<Result<(), String>>(1);
let ring_cb = ring.clone();
let running_clone = running.clone();
let apm_capture = apm.clone();
std::thread::Builder::new()
.name("wzp-audio-capture-linuxaec".into())
.spawn(move || {
let result = (|| -> Result<(), anyhow::Error> {
let host = cpal::default_host();
let device = host
.default_input_device()
.ok_or_else(|| anyhow!("no default input audio device found"))?;
info!(device = %device.name().unwrap_or_default(), "LinuxAEC: using input device");
let config = StreamConfig {
channels: 1,
sample_rate: SampleRate(48_000),
buffer_size: cpal::BufferSize::Default,
};
let use_f32 = !supports_i16_input(&device)?;
let err_cb = |e: cpal::StreamError| {
warn!("LinuxAEC input stream error: {e}");
};
// Leftover buffer for when CPAL gives us partial frames.
// We need exactly 960-sample chunks to feed APM.
let leftover = std::sync::Mutex::new(Vec::<i16>::with_capacity(FRAME_SAMPLES * 4));
let stream = if use_f32 {
let ring = ring_cb.clone();
let running = running_clone.clone();
let apm = apm_capture.clone();
device.build_input_stream(
&config,
move |data: &[f32], _: &cpal::InputCallbackInfo| {
if !running.load(Ordering::Relaxed) {
return;
}
let mut lv = leftover.lock().unwrap();
lv.reserve(data.len());
for &s in data {
lv.push(f32_to_i16(s));
}
drain_frames_through_apm(&mut lv, &apm, &ring);
},
err_cb,
None,
)?
} else {
let ring = ring_cb.clone();
let running = running_clone.clone();
let apm = apm_capture.clone();
device.build_input_stream(
&config,
move |data: &[i16], _: &cpal::InputCallbackInfo| {
if !running.load(Ordering::Relaxed) {
return;
}
let mut lv = leftover.lock().unwrap();
lv.extend_from_slice(data);
drain_frames_through_apm(&mut lv, &apm, &ring);
},
err_cb,
None,
)?
};
stream.play().context("failed to start LinuxAEC input stream")?;
let _ = init_tx.send(Ok(()));
info!("LinuxAEC capture started (AEC3 active)");
while running_clone.load(Ordering::Relaxed) {
std::thread::park_timeout(std::time::Duration::from_millis(200));
}
drop(stream);
Ok(())
})();
if let Err(e) = result {
let _ = init_tx.send(Err(e.to_string()));
}
})?;
init_rx
.recv()
.map_err(|_| anyhow!("LinuxAEC capture thread exited before signaling"))?
.map_err(|e| anyhow!("{e}"))?;
Ok(Self { ring, running })
}
pub fn ring(&self) -> &Arc<AudioRing> {
&self.ring
}
pub fn stop(&self) {
self.running.store(false, Ordering::Relaxed);
}
}
impl Drop for LinuxAecCapture {
fn drop(&mut self) {
self.stop();
}
}
/// Pull whole 960-sample frames out of the leftover buffer, run them through
/// APM's capture-side processing, and push to the ring. Leaves any partial
/// sub-960 remainder in `leftover` for the next callback.
fn drain_frames_through_apm(leftover: &mut Vec<i16>, apm: &Mutex<Processor>, ring: &AudioRing) {
let mut frame = [0i16; FRAME_SAMPLES];
while leftover.len() >= FRAME_SAMPLES {
frame.copy_from_slice(&leftover[..FRAME_SAMPLES]);
process_capture_frame_20ms(apm, &mut frame);
ring.write(&frame);
leftover.drain(..FRAME_SAMPLES);
}
}
// ---------------------------------------------------------------------------
// LinuxAecPlayback — CPAL speaker output + WebRTC AEC render-side tee
// ---------------------------------------------------------------------------
/// Speaker playback with a render-side tee: each frame written to CPAL is
/// ALSO fed to APM via `process_render_frame` as the echo-cancellation
/// reference signal. This is the "tee the playback ring" approach (Zoom,
/// Teams, Jitsi) — deterministic, does not depend on PulseAudio loopback or
/// PipeWire monitor sources.
pub struct LinuxAecPlayback {
ring: Arc<AudioRing>,
running: Arc<AtomicBool>,
}
impl LinuxAecPlayback {
pub fn start() -> Result<Self, anyhow::Error> {
let apm = get_or_init_processor()?;
let ring = Arc::new(AudioRing::new());
let running = Arc::new(AtomicBool::new(true));
let (init_tx, init_rx) = std::sync::mpsc::sync_channel::<Result<(), String>>(1);
let ring_cb = ring.clone();
let running_clone = running.clone();
let apm_render = apm.clone();
std::thread::Builder::new()
.name("wzp-audio-playback-linuxaec".into())
.spawn(move || {
let result = (|| -> Result<(), anyhow::Error> {
let host = cpal::default_host();
let device = host
.default_output_device()
.ok_or_else(|| anyhow!("no default output audio device found"))?;
info!(device = %device.name().unwrap_or_default(), "LinuxAEC: using output device");
let config = StreamConfig {
channels: 1,
sample_rate: SampleRate(48_000),
buffer_size: cpal::BufferSize::Default,
};
let use_f32 = !supports_i16_output(&device)?;
let err_cb = |e: cpal::StreamError| {
warn!("LinuxAEC output stream error: {e}");
};
// Same 960-sample batching approach as the capture side:
// CPAL may ask for N samples in a callback where N doesn't
// divide 960. We accumulate partial frames in a Vec and
// feed APM as soon as we have a whole 20 ms frame.
let carry = std::sync::Mutex::new(Vec::<i16>::with_capacity(FRAME_SAMPLES * 4));
let stream = if use_f32 {
let ring = ring_cb.clone();
let apm = apm_render.clone();
device.build_output_stream(
&config,
move |data: &mut [f32], _: &cpal::OutputCallbackInfo| {
fill_output_and_tee_f32(data, &ring, &apm, &carry);
},
err_cb,
None,
)?
} else {
let ring = ring_cb.clone();
let apm = apm_render.clone();
device.build_output_stream(
&config,
move |data: &mut [i16], _: &cpal::OutputCallbackInfo| {
fill_output_and_tee_i16(data, &ring, &apm, &carry);
},
err_cb,
None,
)?
};
stream.play().context("failed to start LinuxAEC output stream")?;
let _ = init_tx.send(Ok(()));
info!("LinuxAEC playback started (render tee active)");
while running_clone.load(Ordering::Relaxed) {
std::thread::park_timeout(std::time::Duration::from_millis(200));
}
drop(stream);
Ok(())
})();
if let Err(e) = result {
let _ = init_tx.send(Err(e.to_string()));
}
})?;
init_rx
.recv()
.map_err(|_| anyhow!("LinuxAEC playback thread exited before signaling"))?
.map_err(|e| anyhow!("{e}"))?;
Ok(Self { ring, running })
}
pub fn ring(&self) -> &Arc<AudioRing> {
&self.ring
}
pub fn stop(&self) {
self.running.store(false, Ordering::Relaxed);
}
}
impl Drop for LinuxAecPlayback {
fn drop(&mut self) {
self.stop();
}
}
fn fill_output_and_tee_i16(
data: &mut [i16],
ring: &AudioRing,
apm: &Mutex<Processor>,
carry: &std::sync::Mutex<Vec<i16>>,
) {
let read = ring.read(data);
for s in &mut data[read..] {
*s = 0;
}
tee_render_samples(data, apm, carry);
}
fn fill_output_and_tee_f32(
data: &mut [f32],
ring: &AudioRing,
apm: &Mutex<Processor>,
carry: &std::sync::Mutex<Vec<i16>>,
) {
let mut tmp = vec![0i16; data.len()];
let read = ring.read(&mut tmp);
for s in &mut tmp[read..] {
*s = 0;
}
for (d, &s) in data.iter_mut().zip(tmp.iter()) {
*d = i16_to_f32(s);
}
tee_render_samples(&tmp, apm, carry);
}
/// Push CPAL-bound samples into APM's render-side input for echo cancellation.
/// Uses a carry buffer to batch into exact 960-sample (20 ms) frames.
fn tee_render_samples(samples: &[i16], apm: &Mutex<Processor>, carry: &std::sync::Mutex<Vec<i16>>) {
let mut lv = carry.lock().unwrap();
lv.extend_from_slice(samples);
while lv.len() >= FRAME_SAMPLES {
let mut frame = [0i16; FRAME_SAMPLES];
frame.copy_from_slice(&lv[..FRAME_SAMPLES]);
push_render_frame_20ms(apm, &frame);
lv.drain(..FRAME_SAMPLES);
}
}
// ---------------------------------------------------------------------------
// CPAL format helpers (duplicated from audio_io.rs to keep the modules
// independent — each backend file is a self-contained unit)
// ---------------------------------------------------------------------------
fn supports_i16_input(device: &cpal::Device) -> Result<bool, anyhow::Error> {
let supported = device
.supported_input_configs()
.context("failed to query input configs")?;
for cfg in supported {
if cfg.sample_format() == SampleFormat::I16
&& cfg.min_sample_rate() <= SampleRate(48_000)
&& cfg.max_sample_rate() >= SampleRate(48_000)
&& cfg.channels() >= 1
{
return Ok(true);
}
}
Ok(false)
}
fn supports_i16_output(device: &cpal::Device) -> Result<bool, anyhow::Error> {
let supported = device
.supported_output_configs()
.context("failed to query output configs")?;
for cfg in supported {
if cfg.sample_format() == SampleFormat::I16
&& cfg.min_sample_rate() <= SampleRate(48_000)
&& cfg.max_sample_rate() >= SampleRate(48_000)
&& cfg.channels() >= 1
{
return Ok(true);
}
}
Ok(false)
}

View File

@@ -0,0 +1,332 @@
//! Direct WASAPI microphone capture with Windows's OS-level AEC enabled.
//!
//! Bypasses CPAL and opens the default capture endpoint directly via
//! `IMMDeviceEnumerator` + `IAudioClient2::SetClientProperties`, setting
//! `AudioClientProperties.eCategory = AudioCategory_Communications`. That's
//! the switch that tells Windows "this is a VoIP call" — the OS then
//! enables its communications audio processing chain (AEC, noise
//! suppression, automatic gain control) for the stream. AEC operates at
//! the OS level using the currently-playing audio as the reference
//! signal, so it cancels echo from our CPAL playback (and any other app's
//! audio) without us having to plumb a reference signal ourselves.
//!
//! Platform: Windows only, compiled only when the `windows-aec` feature
//! is enabled. Mirrors the public API of `audio_io::AudioCapture` so
//! `wzp-client`'s lib.rs can transparently re-export either one as
//! `AudioCapture`.
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Arc;
use anyhow::{anyhow, Context};
use tracing::{info, warn};
use windows::core::{Interface, GUID};
use windows::Win32::Foundation::{CloseHandle, BOOL, WAIT_OBJECT_0};
use windows::Win32::Media::Audio::{
eCapture, eCommunications, AudioCategory_Communications, AudioClientProperties,
IAudioCaptureClient, IAudioClient, IAudioClient2, IMMDeviceEnumerator, MMDeviceEnumerator,
AUDCLNT_SHAREMODE_SHARED, AUDCLNT_STREAMFLAGS_AUTOCONVERTPCM,
AUDCLNT_STREAMFLAGS_EVENTCALLBACK, AUDCLNT_STREAMFLAGS_SRC_DEFAULT_QUALITY, WAVEFORMATEX,
WAVE_FORMAT_PCM,
};
use windows::Win32::System::Com::{
CoCreateInstance, CoInitializeEx, CoUninitialize, CLSCTX_ALL, COINIT_MULTITHREADED,
};
use windows::Win32::System::Threading::{CreateEventW, WaitForSingleObject, INFINITE};
use crate::audio_ring::AudioRing;
/// 20 ms at 48 kHz, mono. Matches the rest of the audio pipeline.
pub const FRAME_SAMPLES: usize = 960;
/// Microphone capture via WASAPI with Windows's communications AEC enabled.
///
/// The WASAPI capture stream runs on a dedicated OS thread. This handle is
/// `Send + Sync`. Dropping it stops the stream and joins the thread.
pub struct WasapiAudioCapture {
ring: Arc<AudioRing>,
running: Arc<AtomicBool>,
thread: Option<std::thread::JoinHandle<()>>,
}
impl WasapiAudioCapture {
/// Open the default communications microphone, enable OS AEC, and start
/// streaming PCM into a lock-free ring buffer.
///
/// Returns only after the capture thread has successfully initialized
/// the stream, or propagates the error back to the caller.
pub fn start() -> Result<Self, anyhow::Error> {
let ring = Arc::new(AudioRing::new());
let running = Arc::new(AtomicBool::new(true));
let (init_tx, init_rx) = std::sync::mpsc::sync_channel::<Result<(), String>>(1);
let ring_cb = ring.clone();
let running_cb = running.clone();
let thread = std::thread::Builder::new()
.name("wzp-audio-capture-wasapi".into())
.spawn(move || {
let result = unsafe { capture_thread_main(ring_cb, running_cb.clone(), &init_tx) };
if let Err(e) = result {
warn!("wasapi capture thread exited with error: {e}");
// If we failed before signaling init, signal now so the
// caller unblocks. Double-send is harmless (channel is
// bounded to 1 and we only hit the second send path on
// late errors).
let _ = init_tx.send(Err(e.to_string()));
}
})
.context("failed to spawn WASAPI capture thread")?;
init_rx
.recv()
.map_err(|_| anyhow!("WASAPI capture thread exited before signaling init"))?
.map_err(|e| anyhow!("{e}"))?;
Ok(Self {
ring,
running,
thread: Some(thread),
})
}
/// Get a reference to the capture ring buffer for direct polling.
pub fn ring(&self) -> &Arc<AudioRing> {
&self.ring
}
/// Stop capturing.
pub fn stop(&self) {
self.running.store(false, Ordering::Relaxed);
}
}
impl Drop for WasapiAudioCapture {
fn drop(&mut self) {
self.stop();
if let Some(handle) = self.thread.take() {
// Join best-effort. The thread loop polls `running` every 200ms
// via a short WaitForSingleObject timeout, so it should exit
// within ~200ms of `stop()`.
let _ = handle.join();
}
}
}
// ---------------------------------------------------------------------------
// WASAPI thread entry point — everything below this line runs on the
// dedicated wzp-audio-capture-wasapi thread.
// ---------------------------------------------------------------------------
unsafe fn capture_thread_main(
ring: Arc<AudioRing>,
running: Arc<AtomicBool>,
init_tx: &std::sync::mpsc::SyncSender<Result<(), String>>,
) -> Result<(), anyhow::Error> {
// COM init for the capture thread. MULTITHREADED because we're not
// running a message pump. Must be balanced by CoUninitialize on exit.
CoInitializeEx(None, COINIT_MULTITHREADED)
.ok()
.context("CoInitializeEx failed")?;
// Use a guard struct so CoUninitialize runs even on early returns.
struct ComGuard;
impl Drop for ComGuard {
fn drop(&mut self) {
unsafe { CoUninitialize() };
}
}
let _com_guard = ComGuard;
let enumerator: IMMDeviceEnumerator =
CoCreateInstance(&MMDeviceEnumerator, None, CLSCTX_ALL)
.context("CoCreateInstance(MMDeviceEnumerator) failed")?;
// eCommunications role (not eConsole) — this picks the device the user
// has designated for communications in Sound Settings. It's the one
// Windows's AEC is actually tuned for and the one Teams/Zoom use.
let device = enumerator
.GetDefaultAudioEndpoint(eCapture, eCommunications)
.context("GetDefaultAudioEndpoint(eCapture, eCommunications) failed")?;
if let Ok(name) = device_name(&device) {
info!(device = %name, "opening WASAPI communications capture endpoint");
}
let audio_client: IAudioClient = device
.Activate(CLSCTX_ALL, None)
.context("IMMDevice::Activate(IAudioClient) failed")?;
// IAudioClient2 exposes SetClientProperties, which is the ONLY way to
// set AudioCategory_Communications pre-Initialize. Calling it on the
// base IAudioClient would not compile, and setting it after Initialize
// is a no-op.
let audio_client2: IAudioClient2 = audio_client
.cast()
.context("QueryInterface IAudioClient2 failed")?;
let mut props = AudioClientProperties {
cbSize: std::mem::size_of::<AudioClientProperties>() as u32,
bIsOffload: BOOL(0),
eCategory: AudioCategory_Communications,
// 0 = AUDCLNT_STREAMOPTIONS_NONE. The `windows` crate doesn't
// export the enum constant in all versions, so use 0 directly.
Options: Default::default(),
};
audio_client2
.SetClientProperties(&mut props as *mut _)
.context("SetClientProperties(AudioCategory_Communications) failed")?;
// Request 48 kHz mono i16 directly. AUDCLNT_STREAMFLAGS_AUTOCONVERTPCM
// tells Windows to do any needed format conversion inside the audio
// engine rather than rejecting our format. SRC_DEFAULT_QUALITY picks
// the standard Windows resampler quality (fine for voice).
let wave_format = WAVEFORMATEX {
wFormatTag: WAVE_FORMAT_PCM as u16,
nChannels: 1,
nSamplesPerSec: 48_000,
nAvgBytesPerSec: 48_000 * 2, // 1 ch * 2 bytes/sample * 48000 Hz
nBlockAlign: 2, // 1 ch * 2 bytes/sample
wBitsPerSample: 16,
cbSize: 0,
};
// 1,000,000 hns = 100 ms buffer (hns = 100-nanosecond units). Windows
// treats this as the minimum; the engine may give us a larger one.
const BUFFER_DURATION_HNS: i64 = 1_000_000;
audio_client
.Initialize(
AUDCLNT_SHAREMODE_SHARED,
AUDCLNT_STREAMFLAGS_EVENTCALLBACK
| AUDCLNT_STREAMFLAGS_AUTOCONVERTPCM
| AUDCLNT_STREAMFLAGS_SRC_DEFAULT_QUALITY,
BUFFER_DURATION_HNS,
0,
&wave_format,
Some(&GUID::zeroed()),
)
.context("IAudioClient::Initialize failed — Windows rejected communications-mode 48k mono i16")?;
// Event-driven capture: Windows signals this handle each time a new
// audio packet is available. We wait on it from the loop below.
let event = CreateEventW(None, false, false, None)
.context("CreateEventW failed")?;
audio_client
.SetEventHandle(event)
.context("SetEventHandle failed")?;
let capture_client: IAudioCaptureClient = audio_client
.GetService()
.context("IAudioClient::GetService(IAudioCaptureClient) failed")?;
audio_client.Start().context("IAudioClient::Start failed")?;
// Signal to the parent thread that init succeeded before entering the
// hot loop. From this point on, errors get logged but don't propagate
// back to the caller (they'd just cause the ring buffer to stop
// filling, which the main thread detects as underruns).
let _ = init_tx.send(Ok(()));
info!("WASAPI communications-mode capture started with OS AEC enabled");
let mut logged_first_packet = false;
// Main capture loop. Exit when `running` goes false (from Drop or an
// explicit stop() call).
while running.load(Ordering::Relaxed) {
// 200 ms timeout so we check `running` regularly even if the audio
// engine stops delivering packets (e.g. device unplugged).
let wait = WaitForSingleObject(event, 200);
if wait.0 != WAIT_OBJECT_0.0 {
// Timeout or failure — just loop and re-check running.
continue;
}
// Drain all available packets. Windows may have queued more than
// one since we were last scheduled.
loop {
let packet_length = match capture_client.GetNextPacketSize() {
Ok(n) => n,
Err(e) => {
warn!("GetNextPacketSize failed: {e}");
break;
}
};
if packet_length == 0 {
break;
}
let mut buffer_ptr: *mut u8 = std::ptr::null_mut();
let mut num_frames: u32 = 0;
let mut flags: u32 = 0;
let mut device_position: u64 = 0;
let mut qpc_position: u64 = 0;
if let Err(e) = capture_client.GetBuffer(
&mut buffer_ptr,
&mut num_frames,
&mut flags,
Some(&mut device_position),
Some(&mut qpc_position),
) {
warn!("GetBuffer failed: {e}");
break;
}
if num_frames > 0 && !buffer_ptr.is_null() {
if !logged_first_packet {
info!(
frames = num_frames,
flags, "WASAPI capture: first packet received"
);
logged_first_packet = true;
}
// Because we asked for 48 kHz mono i16, each frame is
// exactly one i16. Windows's AUTOCONVERTPCM handles the
// conversion from whatever the engine mix format is.
let samples = std::slice::from_raw_parts(
buffer_ptr as *const i16,
num_frames as usize,
);
ring.write(samples);
}
if let Err(e) = capture_client.ReleaseBuffer(num_frames) {
warn!("ReleaseBuffer failed: {e}");
break;
}
}
}
info!("WASAPI capture thread stopping");
let _ = audio_client.Stop();
let _ = CloseHandle(event);
// _com_guard drops here, calling CoUninitialize.
// Silence INFINITE unused-import warning — it's referenced by the
// `windows` crate's WaitForSingleObject alternative but we use the
// 200 ms timeout variant instead. Explicit suppression for clarity.
let _ = INFINITE;
Ok(())
}
// ---------------------------------------------------------------------------
// Helpers
// ---------------------------------------------------------------------------
/// Best-effort device ID string for logging. Grabbing the friendly name via
/// PKEY_Device_FriendlyName requires IPropertyStore + PROPVARIANT plumbing
/// that's far more ceremony than a log line justifies; the ID is already
/// sufficient to confirm we opened the right endpoint.
///
/// Rust 2024 edition's `unsafe_op_in_unsafe_fn` lint requires explicit
/// `unsafe { ... }` blocks inside `unsafe fn` bodies for each unsafe call,
/// even though the whole function is already marked unsafe.
unsafe fn device_name(
device: &windows::Win32::Media::Audio::IMMDevice,
) -> Result<String, anyhow::Error> {
let id = unsafe { device.GetId() }.context("IMMDevice::GetId failed")?;
Ok(unsafe { id.to_string() }.unwrap_or_else(|_| "<non-utf16>".to_string()))
}

View File

@@ -7,14 +7,15 @@ use std::time::{Duration, Instant};
use bytes::Bytes;
use tracing::{debug, info, warn};
use wzp_codec::{AutoGainControl, ComfortNoise, EchoCanceller, NoiseSupressor, SilenceDetector};
use wzp_codec::dred_ffi::{DredDecoderHandle, DredState};
use wzp_codec::{
AdaptiveDecoder, AutoGainControl, ComfortNoise, EchoCanceller, NoiseSupressor, SilenceDetector,
};
use wzp_fec::{RaptorQFecDecoder, RaptorQFecEncoder};
use wzp_proto::jitter::{JitterBuffer, PlayoutResult};
use wzp_proto::packet::{MediaHeader, MediaPacket, MiniFrameContext};
use wzp_proto::quality::AdaptiveQualityController;
use wzp_proto::traits::{
AudioDecoder, AudioEncoder, FecDecoder, FecEncoder,
};
use wzp_proto::traits::{AudioDecoder, AudioEncoder, FecDecoder, FecEncoder};
use wzp_proto::packet::QualityReport;
use wzp_proto::{CodecId, QualityProfile};
@@ -344,6 +345,22 @@ impl CallEncoder {
let enc_len = self.audio_enc.encode(pcm, &mut encoded)?;
encoded.truncate(enc_len);
// Phase 2: Opus tiers bypass RaptorQ entirely (DRED handles loss
// recovery at the codec layer). Codec2 tiers keep RaptorQ unchanged.
// On Opus packets, zero the FEC header fields so old receivers
// can cleanly identify "no RaptorQ block to assemble" and new
// receivers can short-circuit their FEC ingest path.
let is_opus = self.profile.codec.is_opus();
let (fec_block, fec_symbol, fec_ratio_encoded) = if is_opus {
(0u8, 0u8, 0u8)
} else {
(
self.block_id,
self.frame_in_block,
MediaHeader::encode_fec_ratio(self.profile.fec_ratio),
)
};
// Build source media packet
let source_pkt = MediaPacket {
header: MediaHeader {
@@ -351,11 +368,11 @@ impl CallEncoder {
is_repair: false,
codec_id: self.profile.codec,
has_quality_report: false,
fec_ratio_encoded: MediaHeader::encode_fec_ratio(self.profile.fec_ratio),
fec_ratio_encoded,
seq: self.seq,
timestamp: self.timestamp_ms,
fec_block: self.block_id,
fec_symbol: self.frame_in_block,
fec_block,
fec_symbol,
reserved: 0,
csrc_count: 0,
},
@@ -370,39 +387,42 @@ impl CallEncoder {
let mut output = vec![source_pkt];
// Add to FEC encoder
self.fec_enc.add_source_symbol(&encoded)?;
self.frame_in_block += 1;
// Codec2-only: feed RaptorQ and generate repair packets when the
// block is full. Opus tiers skip this entire block — DRED (active
// in Phase 1) provides codec-layer loss recovery.
if !is_opus {
self.fec_enc.add_source_symbol(&encoded)?;
self.frame_in_block += 1;
// If block is full, generate repair and finalize
if self.frame_in_block >= self.profile.frames_per_block {
if let Ok(repairs) = self.fec_enc.generate_repair(self.profile.fec_ratio) {
for (sym_idx, repair_data) in repairs {
output.push(MediaPacket {
header: MediaHeader {
version: 0,
is_repair: true,
codec_id: self.profile.codec,
has_quality_report: false,
fec_ratio_encoded: MediaHeader::encode_fec_ratio(
self.profile.fec_ratio,
),
seq: self.seq,
timestamp: self.timestamp_ms,
fec_block: self.block_id,
fec_symbol: sym_idx,
reserved: 0,
csrc_count: 0,
},
payload: Bytes::from(repair_data),
quality_report: None,
});
self.seq = self.seq.wrapping_add(1);
if self.frame_in_block >= self.profile.frames_per_block {
if let Ok(repairs) = self.fec_enc.generate_repair(self.profile.fec_ratio) {
for (sym_idx, repair_data) in repairs {
output.push(MediaPacket {
header: MediaHeader {
version: 0,
is_repair: true,
codec_id: self.profile.codec,
has_quality_report: false,
fec_ratio_encoded: MediaHeader::encode_fec_ratio(
self.profile.fec_ratio,
),
seq: self.seq,
timestamp: self.timestamp_ms,
fec_block: self.block_id,
fec_symbol: sym_idx,
reserved: 0,
csrc_count: 0,
},
payload: Bytes::from(repair_data),
quality_report: None,
});
self.seq = self.seq.wrapping_add(1);
}
}
let _ = self.fec_enc.finalize_block();
self.block_id = self.block_id.wrapping_add(1);
self.frame_in_block = 0;
}
let _ = self.fec_enc.finalize_block();
self.block_id = self.block_id.wrapping_add(1);
self.frame_in_block = 0;
}
Ok(output)
@@ -425,6 +445,15 @@ impl CallEncoder {
self.aec.feed_farend(farend);
}
/// Apply DRED tuning output to the encoder.
///
/// Called by the send loop after `DredTuner::update()` returns `Some`.
/// No-op when the active codec is Codec2 (DRED is Opus-only).
pub fn apply_dred_tuning(&mut self, tuning: wzp_proto::DredTuning) {
self.audio_enc.set_dred_duration(tuning.dred_frames);
self.audio_enc.set_expected_loss(tuning.expected_loss_pct);
}
/// Enable or disable acoustic echo cancellation.
pub fn set_aec_enabled(&mut self, enabled: bool) {
self.aec.set_enabled(enabled);
@@ -438,9 +467,12 @@ impl CallEncoder {
/// Manages the recv/decode side of a call.
pub struct CallDecoder {
/// Audio decoder.
audio_dec: Box<dyn AudioDecoder>,
/// FEC decoder.
/// Audio decoder. Concrete `AdaptiveDecoder` (not `Box<dyn AudioDecoder>`)
/// because Phase 3b calls the inherent `reconstruct_from_dred` method,
/// which cannot live on the `AudioDecoder` trait without dragging libopus
/// types into `wzp-proto`.
audio_dec: AdaptiveDecoder,
/// FEC decoder (Codec2 tiers only; Opus bypasses RaptorQ per Phase 2).
fec_dec: RaptorQFecDecoder,
/// Jitter buffer.
jitter: JitterBuffer,
@@ -454,6 +486,24 @@ pub struct CallDecoder {
last_was_cn: bool,
/// Mini-frame decompression context (tracks last full header baseline).
mini_context: MiniFrameContext,
// ─── Phase 3b: DRED reconstruction state ──────────────────────────────
/// DRED side-channel parser (a separate libopus object from the decoder).
dred_decoder: DredDecoderHandle,
/// Scratch buffer used by `dred_decoder.parse_into` on every arriving
/// Opus packet. Reused across calls to avoid 10 KB alloc churn per packet.
dred_parse_scratch: DredState,
/// Cached "most recently parsed valid" DRED state, swapped with
/// `dred_parse_scratch` on successful parse. Used by `decode_next` when
/// the jitter buffer reports a gap.
last_good_dred: DredState,
/// Sequence number of the packet that produced `last_good_dred`. `None`
/// if no packet has yielded DRED state yet (cold start or legacy sender).
last_good_dred_seq: Option<u16>,
/// Phase 4 telemetry counter: gaps recovered via DRED reconstruction.
pub dred_reconstructions: u64,
/// Phase 4 telemetry counter: gaps filled via classical Opus PLC
/// (because no DRED state covered the gap, or the active codec is Codec2).
pub classical_plc_invocations: u64,
}
impl CallDecoder {
@@ -463,8 +513,19 @@ impl CallDecoder {
} else {
JitterBuffer::new(config.jitter_target, config.jitter_max, config.jitter_min)
};
// Phase 3b: build the DRED parser + state buffers. These allocate
// libopus state (~10 KB each) once per call, not per packet — the
// scratch and last-good buffers are reused via std::mem::swap on
// every successful parse.
let dred_decoder =
DredDecoderHandle::new().expect("opus_dred_decoder_create failed at call setup");
let dred_parse_scratch =
DredState::new().expect("opus_dred_alloc failed at call setup (scratch)");
let last_good_dred =
DredState::new().expect("opus_dred_alloc failed at call setup (good state)");
Self {
audio_dec: wzp_codec::create_decoder(config.profile),
audio_dec: AdaptiveDecoder::new(config.profile)
.expect("failed to create adaptive decoder"),
fec_dec: wzp_fec::create_decoder(&config.profile),
jitter,
quality: AdaptiveQualityController::new(),
@@ -472,6 +533,12 @@ impl CallDecoder {
comfort_noise: ComfortNoise::new(50),
last_was_cn: false,
mini_context: MiniFrameContext::default(),
dred_decoder,
dred_parse_scratch,
last_good_dred,
last_good_dred_seq: None,
dred_reconstructions: 0,
classical_plc_invocations: 0,
}
}
@@ -486,20 +553,105 @@ impl CallDecoder {
/// Feed a received media packet into the decode pipeline.
pub fn ingest(&mut self, packet: MediaPacket) {
// Feed to FEC decoder
let _ = self.fec_dec.add_symbol(
packet.header.fec_block,
packet.header.fec_symbol,
packet.header.is_repair,
&packet.payload,
);
// Phase 2: Opus packets bypass RaptorQ. Codec2 packets still feed
// the FEC decoder for recovery. This also cleanly drops any stray
// Opus repair packets from an old sender (we don't push repair
// packets to the jitter buffer either, so they're effectively
// ignored — a graceful mixed-version degradation).
if !packet.header.codec_id.is_opus() {
let _ = self.fec_dec.add_symbol(
packet.header.fec_block,
packet.header.fec_symbol,
packet.header.is_repair,
&packet.payload,
);
}
// If not a repair packet, also feed directly to jitter buffer
// Phase 3b: Opus source packets carry DRED side-channel data in
// libopus 1.5. Parse it into the scratch state and, on success,
// swap with the cached `last_good_dred` so later gap reconstruction
// has fresh neural redundancy to draw from. Parsing happens before
// the jitter push because the jitter buffer consumes the packet.
if packet.header.codec_id.is_opus() && !packet.header.is_repair {
match self
.dred_decoder
.parse_into(&mut self.dred_parse_scratch, &packet.payload)
{
Ok(available) if available > 0 => {
// Swap the freshly parsed state into `last_good_dred`.
// The old good state (now in scratch) is about to be
// overwritten on the next parse — its contents are
// not needed after this swap.
std::mem::swap(&mut self.dred_parse_scratch, &mut self.last_good_dred);
self.last_good_dred_seq = Some(packet.header.seq);
}
Ok(_) => {
// Packet had no DRED data (return 0). Leave the cached
// state untouched — it may still cover upcoming gaps
// from a warm-up period where the encoder was producing
// DRED bytes. The scratch buffer was potentially written
// but its `samples_available` is 0 so it's harmless.
}
Err(e) => {
debug!("DRED parse error (ignored): {e}");
}
}
}
// Source packets (Opus or Codec2) go to the jitter buffer for decode.
// Repair packets never reach the jitter buffer; for Codec2 they're
// used by the FEC decoder above, for Opus they're dropped here.
if !packet.header.is_repair {
self.jitter.push(packet);
}
}
/// Switch the decoder to match an incoming packet's codec if it differs
/// from the current profile. This enables cross-codec interop (e.g. one
/// client sends Opus, the other sends Codec2).
fn switch_decoder_if_needed(&mut self, incoming_codec: CodecId) {
if incoming_codec == self.profile.codec || incoming_codec == CodecId::ComfortNoise {
return;
}
let new_profile = Self::profile_for_codec(incoming_codec);
info!(
from = ?self.profile.codec,
to = ?incoming_codec,
"decoder switching codec to match incoming packet"
);
if let Err(e) = self.audio_dec.set_profile(new_profile) {
warn!("failed to switch decoder profile: {e}");
return;
}
self.fec_dec = wzp_fec::create_decoder(&new_profile);
self.profile = new_profile;
}
/// Map a `CodecId` to a reasonable `QualityProfile` for decoding.
fn profile_for_codec(codec: CodecId) -> QualityProfile {
match codec {
CodecId::Opus24k => QualityProfile::GOOD,
CodecId::Opus16k => QualityProfile {
codec: CodecId::Opus16k,
fec_ratio: 0.3,
frame_duration_ms: 20,
frames_per_block: 5,
},
CodecId::Opus6k => QualityProfile::DEGRADED,
CodecId::Opus32k => QualityProfile::STUDIO_32K,
CodecId::Opus48k => QualityProfile::STUDIO_48K,
CodecId::Opus64k => QualityProfile::STUDIO_64K,
CodecId::Codec2_3200 => QualityProfile {
codec: CodecId::Codec2_3200,
fec_ratio: 0.5,
frame_duration_ms: 20,
frames_per_block: 5,
},
CodecId::Codec2_1200 => QualityProfile::CATASTROPHIC,
CodecId::ComfortNoise => QualityProfile::GOOD,
}
}
/// Decode the next audio frame from the jitter buffer.
///
/// Returns PCM samples (48kHz mono) or None if not ready.
@@ -514,6 +666,9 @@ impl CallDecoder {
return Some(pcm.len());
}
// Auto-switch decoder if incoming codec differs from current.
self.switch_decoder_if_needed(pkt.header.codec_id);
self.last_was_cn = false;
let result = match self.audio_dec.decode(&pkt.payload, pcm) {
Ok(n) => Some(n),
@@ -528,19 +683,72 @@ impl CallDecoder {
result
}
PlayoutResult::Missing { seq } => {
// Only generate PLC if there are still packets buffered ahead.
// Only attempt recovery if there are still packets buffered ahead.
// Otherwise we've drained everything — return None to stop.
if self.jitter.depth() > 0 {
debug!(seq, "packet loss, generating PLC");
let result = self.audio_dec.decode_lost(pcm).ok();
if result.is_some() {
self.jitter.record_decode();
}
result
} else {
if self.jitter.depth() == 0 {
self.jitter.record_underrun();
None
return None;
}
// Phase 3b: try DRED reconstruction first. If we have a
// recent DRED state from a packet whose seq > missing seq,
// and the seq delta (in samples) fits within the state's
// available window, libopus can synthesize a plausible
// replacement for the lost frame. Fall back to classical
// PLC when no state covers the gap, when the active codec
// is Codec2, or when the reconstruction itself errors.
if self.profile.codec.is_opus() {
if let Some(last_seq) = self.last_good_dred_seq {
// How many frames ahead of the missing seq is the
// last-good packet? Use wrapping arithmetic for the
// u16 seq space.
let seq_delta = last_seq.wrapping_sub(seq);
// Reject stale or backward state. u16 wraparound
// would make a "seq went backward" delta very large;
// cap at a sane forward-looking window.
const MAX_SEQ_DELTA: u16 = 128;
if seq_delta > 0 && seq_delta <= MAX_SEQ_DELTA {
let frame_samples =
(48_000 * self.profile.frame_duration_ms as i32) / 1000;
let offset_samples = seq_delta as i32 * frame_samples;
let available = self.last_good_dred.samples_available();
if offset_samples > 0 && offset_samples <= available {
match self.audio_dec.reconstruct_from_dred(
&self.last_good_dred,
offset_samples,
pcm,
) {
Ok(n) => {
self.dred_reconstructions += 1;
self.jitter.record_decode();
debug!(
seq,
last_seq,
offset_samples,
available,
"DRED reconstruction for gap"
);
return Some(n);
}
Err(e) => {
// Reconstruction failed — fall
// through to classical PLC below.
debug!(seq, "DRED reconstruct error: {e}");
}
}
}
}
}
}
// Classical PLC fallback (also the Codec2 path).
debug!(seq, "packet loss, generating classical PLC");
self.classical_plc_invocations += 1;
let result = self.audio_dec.decode_lost(pcm).ok();
if result.is_some() {
self.jitter.record_decode();
}
result
}
PlayoutResult::NotReady => {
self.jitter.record_underrun();
@@ -563,6 +771,19 @@ impl CallDecoder {
pub fn reset_stats(&mut self) {
self.jitter.reset_stats();
}
/// Phase 3b introspection: sequence number of the most recently parsed
/// valid DRED state, or `None` if no Opus packet has yielded DRED data
/// yet. Used by tests to debug reconstruction eligibility.
pub fn last_good_dred_seq(&self) -> Option<u16> {
self.last_good_dred_seq
}
/// Phase 3b introspection: samples of audio history currently available
/// in the cached DRED state.
pub fn last_good_dred_samples_available(&self) -> i32 {
self.last_good_dred.samples_available()
}
}
/// Periodic telemetry logger for jitter buffer statistics.
@@ -624,18 +845,83 @@ mod tests {
assert!(!packets[0].header.is_repair);
}
/// Phase 2: Opus packets have zero FEC header fields — no block, no
/// symbol index, no repair ratio. The RaptorQ layer is bypassed
/// entirely on the Opus tiers.
#[test]
fn encoder_generates_repair_on_full_block() {
fn opus_source_packets_have_zero_fec_header_fields() {
let config = CallConfig {
profile: QualityProfile::GOOD, // 5 frames/block
profile: QualityProfile::GOOD, // Opus 24k
suppression_enabled: false, // skip silence gate for this test
..Default::default()
};
let mut enc = CallEncoder::new(&config);
let pcm = vec![0i16; 960];
// Non-silent sine wave so silence detection doesn't suppress us
// even with suppression_enabled=false (belt and braces).
let pcm: Vec<i16> = (0..960)
.map(|i| ((i as f32 * 0.1).sin() * 10_000.0) as i16)
.collect();
let packets = enc.encode_frame(&pcm).unwrap();
assert_eq!(packets.len(), 1, "Opus must emit exactly 1 source packet");
let hdr = &packets[0].header;
assert!(hdr.codec_id.is_opus());
assert!(!hdr.is_repair);
assert_eq!(hdr.fec_block, 0, "Opus fec_block must be 0");
assert_eq!(hdr.fec_symbol, 0, "Opus fec_symbol must be 0");
assert_eq!(hdr.fec_ratio_encoded, 0, "Opus fec_ratio_encoded must be 0");
}
let mut total_packets = 0;
let mut repair_count = 0;
for _ in 0..5 {
/// Phase 2: Opus never emits repair packets, regardless of how many
/// source frames are fed in. DRED (Phase 1) provides loss recovery at
/// the codec layer; RaptorQ is disabled on Opus tiers.
#[test]
fn opus_encoder_never_emits_repair_packets() {
let config = CallConfig {
profile: QualityProfile::GOOD, // 5 frames/block in the Codec2 sense
suppression_enabled: false,
..Default::default()
};
let mut enc = CallEncoder::new(&config);
let pcm: Vec<i16> = (0..960)
.map(|i| ((i as f32 * 0.1).sin() * 10_000.0) as i16)
.collect();
// Encode well beyond a block boundary to prove no repair ever comes out.
let mut total_packets = 0usize;
let mut repair_count = 0usize;
for _ in 0..20 {
let packets = enc.encode_frame(&pcm).unwrap();
total_packets += packets.len();
repair_count += packets.iter().filter(|p| p.header.is_repair).count();
}
assert_eq!(repair_count, 0, "Opus must emit zero repair packets");
assert_eq!(
total_packets, 20,
"20 source frames → 20 source packets (1:1, no RaptorQ expansion)"
);
}
/// Phase 2: Codec2 still emits repair packets with RaptorQ ratio unchanged.
/// DRED is libopus-only and does not apply here, so RaptorQ is still the
/// primary loss-recovery mechanism on Codec2 tiers.
#[test]
fn codec2_encoder_generates_repair_on_full_block() {
let config = CallConfig {
profile: QualityProfile::CATASTROPHIC, // Codec2 1200, 8 frames/block, ratio 1.0
suppression_enabled: false,
..Default::default()
};
let mut enc = CallEncoder::new(&config);
// Codec2 takes 48 kHz samples and downsamples internally.
// CATASTROPHIC uses 40 ms frames → 1920 samples.
let pcm: Vec<i16> = (0..1920)
.map(|i| ((i as f32 * 0.1).sin() * 10_000.0) as i16)
.collect();
let mut total_packets = 0usize;
let mut repair_count = 0usize;
// Run long enough to cross the 8-frame block boundary and see repairs.
for _ in 0..16 {
let packets = enc.encode_frame(&pcm).unwrap();
for p in &packets {
if p.header.is_repair {
@@ -644,8 +930,10 @@ mod tests {
}
total_packets += packets.len();
}
assert!(repair_count > 0, "should have repair packets after full block");
assert!(total_packets > 5, "total {total_packets} should exceed 5 source");
assert!(
repair_count > 0,
"Codec2 must still emit repair packets (got {repair_count} repairs, {total_packets} total)"
);
}
#[test]
@@ -676,6 +964,219 @@ mod tests {
assert!(dec.decode_next(&mut pcm).is_none());
}
// ─── Phase 3b — DRED reconstruction on packet loss ────────────────────
/// Helper: create a CallEncoder/CallDecoder pair with the given profile
/// and silence suppression disabled so silence-detection doesn't drop
/// our synthetic test frames.
fn encoder_decoder_pair(profile: QualityProfile) -> (CallEncoder, CallDecoder) {
let config = CallConfig {
profile,
suppression_enabled: false,
// Small jitter buffer so decode_next drains quickly in tests.
jitter_min: 2,
jitter_target: 3,
jitter_max: 20,
adaptive_jitter: false,
..Default::default()
};
(CallEncoder::new(&config), CallDecoder::new(&config))
}
/// Helper: generate a non-silent 20 ms frame of 300 Hz sine at the
/// given sample offset so consecutive frames form a continuous tone.
fn voice_frame_20ms(sample_offset: usize) -> Vec<i16> {
(0..960)
.map(|i| {
let t = (sample_offset + i) as f64 / 48_000.0;
(8000.0 * (2.0 * std::f64::consts::PI * 300.0 * t).sin()) as i16
})
.collect()
}
/// Phase 3b probe: sweep packet_loss_perc values to find the minimum
/// that produces a samples_available ≥ 960 (enough to reconstruct a
/// single 20 ms Opus frame). This guides the production loss floor.
#[test]
#[ignore] // diagnostic only — run with `cargo test ... -- --ignored --nocapture`
fn probe_dred_samples_available_by_loss_floor() {
use wzp_codec::opus_enc::OpusEncoder;
use wzp_proto::traits::AudioEncoder;
for loss_pct in [5u8, 10, 15, 20, 25, 40, 60, 80].iter().copied() {
let mut enc = OpusEncoder::new(QualityProfile::GOOD).unwrap();
enc.set_expected_loss(loss_pct);
let (_drop_enc, mut dec) = encoder_decoder_pair(QualityProfile::GOOD);
for i in 0..60u16 {
let pcm = voice_frame_20ms(i as usize * 960);
let mut encoded = vec![0u8; 512];
let n = enc.encode(&pcm, &mut encoded).unwrap();
encoded.truncate(n);
let pkt = MediaPacket {
header: MediaHeader {
version: 0,
is_repair: false,
codec_id: CodecId::Opus24k,
has_quality_report: false,
fec_ratio_encoded: 0,
seq: i,
timestamp: (i as u32) * 20,
fec_block: 0,
fec_symbol: 0,
reserved: 0,
csrc_count: 0,
},
payload: Bytes::from(encoded),
quality_report: None,
};
dec.ingest(pkt);
}
eprintln!(
"[phase3b probe] loss_pct={loss_pct} samples_available={}",
dec.last_good_dred_samples_available()
);
}
}
/// Phase 3b: simulated single-packet loss on an Opus call triggers a
/// DRED reconstruction rather than a classical PLC fill. Runs the full
/// encode → ingest → decode_next pipeline.
#[test]
fn opus_single_packet_loss_is_recovered_via_dred() {
let (mut enc, mut dec) = encoder_decoder_pair(QualityProfile::GOOD);
// Warm-up: encode and ingest 60 frames (1.2 s) so the DRED emitter
// has had time to fill its 200 ms window and at least one
// successful DRED parse has happened on the decoder side.
let warmup_frames = 60;
for i in 0..warmup_frames {
let pcm = voice_frame_20ms(i * 960);
let packets = enc.encode_frame(&pcm).unwrap();
for pkt in packets {
dec.ingest(pkt);
}
}
// Drain the warm-up frames through the decoder to advance the
// jitter buffer cursor past them.
let mut out = vec![0i16; 960];
while dec.decode_next(&mut out).is_some() {}
// Encode the next three frames but skip ingesting the middle one.
let base_offset = warmup_frames * 960;
let pcm_a = voice_frame_20ms(base_offset);
let pcm_b = voice_frame_20ms(base_offset + 960);
let pcm_c = voice_frame_20ms(base_offset + 1920);
let pkts_a = enc.encode_frame(&pcm_a).unwrap();
let pkts_b = enc.encode_frame(&pcm_b).unwrap(); // DROP THIS ONE
let pkts_c = enc.encode_frame(&pcm_c).unwrap();
for pkt in pkts_a {
dec.ingest(pkt);
}
// Skip pkts_b entirely — this is the "packet loss".
drop(pkts_b);
for pkt in pkts_c {
dec.ingest(pkt);
}
// Drain again. Somewhere in here decode_next will hit Missing()
// for the dropped packet and attempt DRED reconstruction.
let baseline_dred = dec.dred_reconstructions;
let baseline_plc = dec.classical_plc_invocations;
eprintln!(
"[phase3b probe] pre-drain: last_good_seq={:?} samples_available={}",
dec.last_good_dred_seq(),
dec.last_good_dred_samples_available()
);
while dec.decode_next(&mut out).is_some() {}
let dred_delta = dec.dred_reconstructions - baseline_dred;
let plc_delta = dec.classical_plc_invocations - baseline_plc;
eprintln!(
"[phase3b probe] post-drain: dred_delta={dred_delta} plc_delta={plc_delta}"
);
assert!(
dred_delta >= 1,
"expected ≥1 DRED reconstruction on single-packet loss, \
got dred_delta={dred_delta} plc_delta={plc_delta}"
);
}
/// Phase 3b: lossless stream never triggers DRED reconstruction or PLC.
/// Baseline behavior — verifies the Missing() branch is not spuriously taken.
#[test]
fn opus_lossless_ingest_never_triggers_dred_or_plc() {
let (mut enc, mut dec) = encoder_decoder_pair(QualityProfile::GOOD);
// Encode + ingest 40 frames with no drops.
for i in 0..40 {
let pcm = voice_frame_20ms(i * 960);
let packets = enc.encode_frame(&pcm).unwrap();
for pkt in packets {
dec.ingest(pkt);
}
}
let mut out = vec![0i16; 960];
while dec.decode_next(&mut out).is_some() {}
assert_eq!(
dec.dred_reconstructions, 0,
"lossless stream should not reconstruct"
);
assert_eq!(
dec.classical_plc_invocations, 0,
"lossless stream should not PLC"
);
}
/// Phase 3b: Codec2 calls fall through to classical PLC on loss.
/// DRED is libopus-only, so even if the decoder's DRED state were
/// populated (it won't be — Codec2 packets don't carry DRED bytes),
/// `reconstruct_from_dred` rejects Codec2 at the AdaptiveDecoder
/// level. This test guards the Codec2 side of the protection split.
#[test]
fn codec2_loss_falls_through_to_classical_plc() {
let (mut enc, mut dec) = encoder_decoder_pair(QualityProfile::CATASTROPHIC);
// Codec2 1200 uses 40 ms frames → 1920 samples at 48 kHz (before
// the downsample inside the codec). Encode 20 frames (~0.8 s).
let make_frame = |offset: usize| -> Vec<i16> {
(0..1920)
.map(|i| {
let t = (offset + i) as f64 / 48_000.0;
(8000.0 * (2.0 * std::f64::consts::PI * 300.0 * t).sin()) as i16
})
.collect()
};
for i in 0..20 {
let pcm = make_frame(i * 1920);
let packets = enc.encode_frame(&pcm).unwrap();
for pkt in packets {
// Drop every 5th source packet to simulate loss.
if !pkt.header.is_repair && i % 5 == 3 {
continue;
}
dec.ingest(pkt);
}
}
let mut out = vec![0i16; 1920];
while dec.decode_next(&mut out).is_some() {}
assert_eq!(
dec.dred_reconstructions, 0,
"Codec2 must never reconstruct via DRED"
);
// classical_plc_invocations may or may not trigger depending on
// whether the jitter buffer sees Missing before draining — the key
// assertion is that DRED is not used. PLC count is advisory.
}
// ---- QualityAdapter tests ----
/// Helper: build a QualityReport from human-readable loss% and RTT ms.
@@ -950,4 +1451,131 @@ mod tests {
"frames_suppressed should be > 0"
);
}
// ---- DredTuner integration tests ----
/// End-to-end test: DredTuner reacts to simulated network degradation
/// and adjusts the encoder's DRED parameters via `apply_dred_tuning`.
#[test]
fn dred_tuner_adjusts_encoder_on_loss() {
use wzp_proto::DredTuner;
let mut enc = CallEncoder::new(&CallConfig {
profile: QualityProfile::GOOD,
suppression_enabled: false,
..Default::default()
});
let mut tuner = DredTuner::new(QualityProfile::GOOD.codec);
// Baseline: good network → baseline DRED (20 frames = 200 ms).
let baseline = tuner.current();
assert_eq!(baseline.dred_frames, 20);
// Warm up the tuner — first few updates may return Some as the
// EWMA initializes and expected_loss settles from the initial 15%.
for _ in 0..10 {
tuner.update(0.0, 50, 5);
}
// After settling, the tuning should be at baseline.
assert_eq!(tuner.current().dred_frames, 20);
// Simulate network degradation: 30% loss, 300ms RTT.
// The tuner should increase DRED frames above baseline.
let tuning = tuner.update(30.0, 300, 15);
assert!(tuning.is_some(), "loss spike should trigger tuning change");
let t = tuning.unwrap();
assert!(
t.dred_frames > 20,
"30% loss should increase DRED above baseline 20, got {}",
t.dred_frames
);
// Apply to encoder — should not panic.
enc.apply_dred_tuning(t);
// Verify the encoder still works after tuning.
let pcm = voice_frame_20ms(0);
let packets = enc.encode_frame(&pcm).unwrap();
assert!(!packets.is_empty(), "encoder must still produce packets after DRED tuning");
}
/// DredTuner jitter spike triggers pre-emptive DRED boost to ceiling.
#[test]
fn dred_tuner_spike_boosts_to_ceiling() {
use wzp_proto::DredTuner;
let mut tuner = DredTuner::new(CodecId::Opus24k);
// Establish low-jitter baseline.
for _ in 0..20 {
tuner.update(0.0, 50, 5);
}
assert!(!tuner.spike_boost_active());
// Jitter spikes to 40ms (8x baseline of ~5ms).
let tuning = tuner.update(0.0, 50, 40);
assert!(tuner.spike_boost_active(), "jitter spike should activate boost");
assert!(tuning.is_some());
// Ceiling for Opus24k is 50 frames = 500 ms.
assert_eq!(
tuning.unwrap().dred_frames, 50,
"spike should push to ceiling"
);
}
/// DredTuner is a no-op for Codec2 profiles.
#[test]
fn dred_tuner_noop_for_codec2() {
use wzp_proto::DredTuner;
let mut tuner = DredTuner::new(CodecId::Codec2_1200);
// Even extreme conditions produce no tuning output.
assert!(tuner.update(50.0, 800, 100).is_none());
assert_eq!(tuner.current().dred_frames, 0);
}
/// DredTuner + CallEncoder: full cycle through profile switch.
#[test]
fn dred_tuner_handles_profile_switch() {
use wzp_proto::DredTuner;
let mut enc = CallEncoder::new(&CallConfig {
profile: QualityProfile::GOOD,
suppression_enabled: false,
..Default::default()
});
let mut tuner = DredTuner::new(QualityProfile::GOOD.codec);
// Apply initial tuning on good network.
if let Some(t) = tuner.update(0.0, 50, 5) {
enc.apply_dred_tuning(t);
}
// Switch to degraded profile.
enc.set_profile(QualityProfile::DEGRADED).unwrap();
tuner.set_codec(QualityProfile::DEGRADED.codec);
// Opus6k baseline is 50 frames (500 ms), ceiling is 104 (1040 ms).
let baseline = tuner.current();
// After set_codec, the cached tuning should reflect old state;
// a fresh update gives the new codec's mapping.
let tuning = tuner.update(20.0, 200, 10);
assert!(tuning.is_some());
let t = tuning.unwrap();
assert!(
t.dred_frames >= 50,
"Opus6k with 20% loss should be at least baseline 50, got {}",
t.dred_frames
);
enc.apply_dred_tuning(t);
// Encode a 40ms frame (Opus6k uses 40ms frames = 1920 samples).
let pcm: Vec<i16> = (0..1920)
.map(|i| ((i as f32 * 0.1).sin() * 10_000.0) as i16)
.collect();
let packets = enc.encode_frame(&pcm).unwrap();
assert!(!packets.is_empty());
}
}

File diff suppressed because it is too large Load Diff

View File

@@ -0,0 +1,546 @@
//! Phase 3.5 — dual-path QUIC connect race for P2P hole-punching.
//!
//! When both peers advertised reflex addrs in the
//! DirectCallOffer/Answer flow, the relay cross-wires them into
//! `CallSetup.peer_direct_addr`. This module races a direct QUIC
//! handshake against the existing relay dial and returns whichever
//! completes first — with automatic drop of the loser via
//! `tokio::select!`.
//!
//! Role determination is deterministic and symmetric
//! (`wzp_client::reflect::determine_role`): whichever peer has the
//! lexicographically smaller reflex addr becomes the **Acceptor**
//! (listens on a server-capable endpoint), the other becomes the
//! **Dialer** (dials the peer's addr). Because the rule is
//! identical on both sides, the Acceptor's inbound QUIC session
//! and the Dialer's outbound are the SAME connection — no
//! negotiation needed, no two-conns-per-call confusion.
//!
//! Timeout policy:
//! - Direct path: 2s from the start of `race`. Cone-NAT hole-punch
//! typically completes in < 500ms on a LAN; 2s gives us tolerance
//! for a single QUIC Initial retry on unreliable networks.
//! - Relay path: 10s (existing behavior elsewhere in the codebase).
//! - Overall: `tokio::select!` returns as soon as either succeeds.
use std::net::SocketAddr;
use std::sync::Arc;
use std::time::Duration;
use crate::reflect::Role;
use wzp_transport::QuinnTransport;
/// Which path won the race. Used by the `connect` command for
/// logging + (in the future) metrics.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum WinningPath {
Direct,
Relay,
}
/// Phase 6: the race now returns BOTH transports (when available)
/// so the connect command can negotiate with the peer before
/// committing. The negotiation decides which transport to use
/// based on whether BOTH sides report `direct_ok = true`.
pub struct RaceResult {
/// The direct P2P transport, if the direct path completed.
/// `None` if the direct dial/accept failed or timed out.
pub direct_transport: Option<Arc<QuinnTransport>>,
/// The relay transport, if the relay dial completed.
/// `None` if the relay dial failed (shouldn't happen in
/// practice since relay is always reachable).
pub relay_transport: Option<Arc<QuinnTransport>>,
/// Which future completed first in the local race.
/// Informational — the actual path used is decided by the
/// Phase 6 negotiation after both sides exchange reports.
pub local_winner: WinningPath,
}
/// Attempt a direct QUIC connection to the peer in parallel with
/// the relay dial and return the winning `QuinnTransport`.
///
/// `role` selects the direction of the direct attempt:
/// - `Role::Acceptor` creates a server-capable endpoint and waits
/// for the peer to dial in.
/// - `Role::Dialer` creates a client-only endpoint and dials
/// `peer_direct_addr`.
///
/// The relay path is always attempted in parallel as a fallback so
/// the race ALWAYS produces a working transport unless both paths
/// genuinely fail (network partition). Returns
/// `Err(anyhow::anyhow!(...))` if both paths fail within the
/// timeout.
/// Phase 5.5 candidate bundle — full ICE-ish candidate list for
/// the peer. The race tries them all in parallel alongside the
/// relay path. At minimum this should contain the peer's
/// server-reflexive address; `local_addrs` carries LAN host
/// candidates gathered from their physical interfaces.
///
/// Empty is valid: the D-role has nothing to dial and the race
/// reduces to "relay only" + (if A-role) accepting on the
/// shared endpoint.
#[derive(Debug, Clone, Default)]
pub struct PeerCandidates {
/// Peer's server-reflexive address (Phase 3). `None` if the
/// peer didn't advertise one.
pub reflexive: Option<SocketAddr>,
/// Peer's LAN host addresses (Phase 5.5). Tried first on
/// same-LAN pairs — direct dials to these bypass the NAT
/// entirely.
pub local: Vec<SocketAddr>,
}
impl PeerCandidates {
/// Flatten into the list of addrs the D-role should dial.
/// Order: LAN host candidates first (fastest when they
/// work), then reflexive (covers the non-LAN case).
pub fn dial_order(&self) -> Vec<SocketAddr> {
let mut out = Vec::with_capacity(self.local.len() + 1);
out.extend(self.local.iter().copied());
if let Some(a) = self.reflexive {
// Only add if it's not already in the list (some
// edge cases on same-LAN could have the same addr
// in both).
if !out.contains(&a) {
out.push(a);
}
}
out
}
/// Is there anything for the D-role to dial? If not, the
/// race reduces to relay-only.
pub fn is_empty(&self) -> bool {
self.reflexive.is_none() && self.local.is_empty()
}
}
#[allow(clippy::too_many_arguments)]
pub async fn race(
role: Role,
peer_candidates: PeerCandidates,
relay_addr: SocketAddr,
room_sni: String,
call_sni: String,
// Phase 5: when `Some`, reuse this endpoint for BOTH the
// direct-path branch AND the relay dial. Pass the signal
// endpoint. The endpoint MUST be server-capable (created
// with a server config) for the A-role accept branch to
// work.
//
// When `None`, falls back to fresh endpoints per role.
// Used by tests.
shared_endpoint: Option<wzp_transport::Endpoint>,
// Phase 7: dedicated IPv6 endpoint with IPV6_V6ONLY=1.
// When `Some`, A-role accepts on both v4+v6, D-role routes
// each candidate to its matching-AF endpoint. When `None`,
// IPv6 candidates are skipped (IPv4-only, pre-Phase-7).
ipv6_endpoint: Option<wzp_transport::Endpoint>,
) -> anyhow::Result<RaceResult> {
// Rustls provider must be installed before any quinn endpoint
// is created. Install attempt is idempotent.
let _ = rustls::crypto::ring::default_provider().install_default();
// Build the direct-path endpoint + future based on role.
//
// A-role: one accept future on the shared endpoint. The
// first incoming QUIC connection wins — we don't care
// which peer candidate the dialer used to reach us.
//
// D-role: N parallel dial futures, one per peer candidate
// (all LAN host addrs + the reflex addr), consolidated
// into a single direct_fut via FuturesUnordered-style
// "first OK wins" semantics. The first successful dial
// becomes the direct path; the losers are dropped (quinn
// will abort the in-flight handshakes via the dropped
// Connecting futures).
//
// Either way, direct_fut resolves to a single QuinnTransport
// (or an error) and is raced against the relay_fut by the
// outer tokio::select!.
let direct_ep: wzp_transport::Endpoint;
let direct_fut: std::pin::Pin<
Box<dyn std::future::Future<Output = anyhow::Result<QuinnTransport>> + Send>,
>;
match role {
Role::Acceptor => {
let ep = match shared_endpoint.clone() {
Some(ep) => {
tracing::info!(
local_addr = ?ep.local_addr().ok(),
"dual_path: A-role reusing shared endpoint for accept"
);
ep
}
None => {
let (sc, _cert_der) = wzp_transport::server_config();
// 0.0.0.0:0 = IPv4 socket. [::]:0 dual-stack was
// tried but breaks on Android devices where
// IPV6_V6ONLY=1 (default on some kernels) —
// IPv4 candidates silently fail. IPv6 host
// candidates are skipped for now; they need a
// dedicated IPv6 socket alongside the v4 one
// (like WebRTC's dual-socket approach).
let bind: SocketAddr = "0.0.0.0:0".parse().unwrap();
let fresh = wzp_transport::create_endpoint(bind, Some(sc))?;
tracing::info!(
local_addr = ?fresh.local_addr().ok(),
"dual_path: A-role fresh endpoint up, awaiting peer dial"
);
fresh
}
};
let ep_for_fut = ep.clone();
// Phase 7: IPv6 accept temporarily disabled (same reason
// as dial — IPv6 connections die on datagram send).
// Accept on IPv4 shared endpoint only.
let _v6_ep_unused = ipv6_endpoint.clone();
direct_fut = Box::pin(async move {
// Accept loop: retry if we get a stale/closed
// connection from a previous call. Max 3 retries
// to avoid spinning until the race timeout.
const MAX_STALE: usize = 3;
let mut stale_count: usize = 0;
loop {
let conn = wzp_transport::accept(&ep_for_fut)
.await
.map_err(|e| anyhow::anyhow!("direct accept: {e}"))?;
if let Some(reason) = conn.close_reason() {
// Explicitly close so the peer gets a
// close frame instead of idle timeout.
conn.close(0u32.into(), b"stale");
stale_count += 1;
tracing::warn!(
remote = %conn.remote_address(),
stable_id = conn.stable_id(),
stale_count,
?reason,
"dual_path: A-role skipping stale connection"
);
if stale_count >= MAX_STALE {
return Err(anyhow::anyhow!(
"A-role: {stale_count} stale connections, aborting"
));
}
continue;
}
let has_dgram = conn.max_datagram_size().is_some();
tracing::info!(
remote = %conn.remote_address(),
stable_id = conn.stable_id(),
has_dgram,
"dual_path: A-role accepted direct connection"
);
break Ok(QuinnTransport::new(conn));
}
});
direct_ep = ep;
}
Role::Dialer => {
let ep = match shared_endpoint.clone() {
Some(ep) => {
tracing::info!(
local_addr = ?ep.local_addr().ok(),
candidates = ?peer_candidates.dial_order(),
"dual_path: D-role reusing shared endpoint to dial peer candidates"
);
ep
}
None => {
// 0.0.0.0:0 = IPv4 socket. [::]:0 dual-stack was
// tried but breaks on Android devices where
// IPV6_V6ONLY=1 (default on some kernels) —
// IPv4 candidates silently fail. IPv6 host
// candidates are skipped for now; they need a
// dedicated IPv6 socket alongside the v4 one
// (like WebRTC's dual-socket approach).
let bind: SocketAddr = "0.0.0.0:0".parse().unwrap();
let fresh = wzp_transport::create_endpoint(bind, None)?;
tracing::info!(
local_addr = ?fresh.local_addr().ok(),
candidates = ?peer_candidates.dial_order(),
"dual_path: D-role fresh endpoint up, dialing peer candidates"
);
fresh
}
};
let ep_for_fut = ep.clone();
let _v6_ep_for_dial = ipv6_endpoint.clone();
let dial_order = peer_candidates.dial_order();
let sni = call_sni.clone();
direct_fut = Box::pin(async move {
if dial_order.is_empty() {
// No candidates — the race reduces to
// relay-only. Surface a stable error so the
// outer select falls through to relay_fut
// without a spurious "direct failed" warning.
// Use a pending future that never resolves so
// the select's "other side wins" branch is
// the natural outcome.
std::future::pending::<anyhow::Result<QuinnTransport>>().await
} else {
// Fan out N parallel dials via JoinSet. First
// `Ok` wins; `Err` from a single candidate is
// not fatal — we wait for the others. Only
// when ALL have failed do we return Err.
let mut set = tokio::task::JoinSet::new();
for (idx, candidate) in dial_order.iter().enumerate() {
// Phase 7: route each candidate to the
// endpoint matching its address family.
let candidate = *candidate;
// Phase 7: IPv6 dials temporarily disabled.
// IPv6 QUIC handshakes succeed but the
// connection dies immediately on datagram
// send ("connection lost"). Root cause is
// likely router-level IPv6 UDP filtering.
// Re-enable once IPv6 datagram delivery is
// verified on target networks.
if candidate.is_ipv6() {
tracing::debug!(
%candidate,
candidate_idx = idx,
"dual_path: skipping IPv6 candidate (disabled)"
);
continue;
}
let ep = ep_for_fut.clone();
let client_cfg = wzp_transport::client_config();
let sni = sni.clone();
set.spawn(async move {
let result = wzp_transport::connect(
&ep,
candidate,
&sni,
client_cfg,
)
.await;
(idx, candidate, result)
});
}
let mut last_err: Option<String> = None;
while let Some(join_res) = set.join_next().await {
let (idx, candidate, dial_res) = match join_res {
Ok(t) => t,
Err(e) => {
last_err = Some(format!("join {e}"));
continue;
}
};
match dial_res {
Ok(conn) => {
tracing::info!(
%candidate,
candidate_idx = idx,
remote = %conn.remote_address(),
stable_id = conn.stable_id(),
"dual_path: direct dial succeeded on candidate"
);
// Abort the remaining in-flight
// dials so they don't complete
// and leak QUIC sessions.
set.abort_all();
return Ok(QuinnTransport::new(conn));
}
Err(e) => {
tracing::debug!(
%candidate,
candidate_idx = idx,
error = %e,
"dual_path: direct dial failed, trying others"
);
last_err = Some(format!("candidate {candidate}: {e}"));
}
}
}
Err(anyhow::anyhow!(
"all {} direct candidates failed; last: {}",
dial_order.len(),
last_err.unwrap_or_else(|| "n/a".into())
))
}
});
direct_ep = ep;
}
}
// Relay path: classic dial to the relay's media room. Phase 5:
// reuse the shared endpoint here too so MikroTik-style NATs
// keep a stable external port across all flows from this
// client. Falls back to a fresh endpoint when not shared.
let relay_ep = match shared_endpoint.clone() {
Some(ep) => ep,
None => {
let relay_bind: SocketAddr = "[::]:0".parse().unwrap();
wzp_transport::create_endpoint(relay_bind, None)?
}
};
let relay_ep_for_fut = relay_ep.clone();
let relay_client_cfg = wzp_transport::client_config();
let relay_sni = room_sni.clone();
// Phase 5.5 direct-path head-start: hold the relay dial for
// 500ms before attempting it. On same-LAN cone-NAT pairs the
// direct dial finishes in ~30-100ms, so giving direct a 500ms
// head start means direct reliably wins when it's going to
// work at all. The worst case adds 500ms to the fall-back-
// to-relay scenario, which is imperceptible for users on
// setups where direct isn't available anyway.
//
// Prior behavior (immediate race) caused the relay to win
// ~105ms races on a MikroTik LAN because:
// - Acceptor role's direct_fut = accept() can only fire
// when the peer has completed its outbound LAN dial
// - Dialer role's parallel LAN dials need the peer's
// CallSetup processed + the race started on the other
// side before they can reach us
// - Meanwhile relay_fut is a plain dial that completes in
// whatever the client→relay RTT is (often <100ms)
//
// The 500ms head start is the minimum that empirically makes
// same-LAN direct reliably beat relay, without penalizing
// users who genuinely need the relay path.
const DIRECT_HEAD_START: Duration = Duration::from_millis(500);
let relay_fut = async move {
tokio::time::sleep(DIRECT_HEAD_START).await;
let conn =
wzp_transport::connect(&relay_ep_for_fut, relay_addr, &relay_sni, relay_client_cfg)
.await
.map_err(|e| anyhow::anyhow!("relay dial: {e}"))?;
Ok::<_, anyhow::Error>(QuinnTransport::new(conn))
};
// Phase 6: run both paths concurrently via tokio::spawn and
// collect BOTH results. The old tokio::select! approach dropped
// the loser, which meant the connect command couldn't negotiate
// with the peer — it had to commit to whichever path won locally.
//
// Now we spawn both as tasks, wait for the first to complete
// (that determines `local_winner`), then give the loser a short
// grace period to also complete. The connect command gets a
// RaceResult with both transports (when available) and uses the
// Phase 6 MediaPathReport exchange to decide which one to
// actually use for media.
tracing::info!(
?role,
candidates = ?peer_candidates.dial_order(),
%relay_addr,
"dual_path: racing direct vs relay"
);
let mut direct_task = tokio::spawn(
tokio::time::timeout(Duration::from_secs(2), direct_fut),
);
let mut relay_task = tokio::spawn(async move {
// Keep the 500ms head start so direct has a chance
tokio::time::sleep(Duration::from_millis(500)).await;
tokio::time::timeout(Duration::from_secs(5), relay_fut).await
});
// Wait for the first one to complete. This tells us the
// local_winner — but we DON'T commit to it yet. Phase 6
// negotiation decides the actual path.
let (mut direct_result, mut relay_result): (
Option<anyhow::Result<QuinnTransport>>,
Option<anyhow::Result<QuinnTransport>>,
) = (None, None);
let local_winner;
tokio::select! {
biased;
d = &mut direct_task => {
match d {
Ok(Ok(Ok(t))) => {
tracing::info!("dual_path: direct completed first");
direct_result = Some(Ok(t));
local_winner = WinningPath::Direct;
}
Ok(Ok(Err(e))) => {
tracing::warn!(error = %e, "dual_path: direct failed");
direct_result = Some(Err(anyhow::anyhow!("{e}")));
local_winner = WinningPath::Relay; // direct failed → relay is our only hope
}
Ok(Err(_)) => {
tracing::warn!("dual_path: direct timed out (2s)");
direct_result = Some(Err(anyhow::anyhow!("direct timeout")));
local_winner = WinningPath::Relay;
}
Err(e) => {
tracing::warn!(error = %e, "dual_path: direct task panicked");
direct_result = Some(Err(anyhow::anyhow!("direct task panic")));
local_winner = WinningPath::Relay;
}
}
}
r = &mut relay_task => {
match r {
Ok(Ok(Ok(t))) => {
tracing::info!("dual_path: relay completed first");
relay_result = Some(Ok(t));
local_winner = WinningPath::Relay;
}
Ok(Ok(Err(e))) => {
tracing::warn!(error = %e, "dual_path: relay failed");
relay_result = Some(Err(anyhow::anyhow!("{e}")));
local_winner = WinningPath::Direct;
}
Ok(Err(_)) => {
tracing::warn!("dual_path: relay timed out");
relay_result = Some(Err(anyhow::anyhow!("relay timeout")));
local_winner = WinningPath::Direct;
}
Err(e) => {
relay_result = Some(Err(anyhow::anyhow!("relay task panic: {e}")));
local_winner = WinningPath::Direct;
}
}
}
}
// Give the loser a short grace period (1s) to also complete.
// If it does, we have both transports for Phase 6 negotiation.
// If it doesn't, we still proceed with just the winner.
if direct_result.is_none() {
match tokio::time::timeout(Duration::from_secs(1), direct_task).await {
Ok(Ok(Ok(Ok(t)))) => { direct_result = Some(Ok(t)); }
Ok(Ok(Ok(Err(e)))) => { direct_result = Some(Err(anyhow::anyhow!("{e}"))); }
_ => { direct_result = Some(Err(anyhow::anyhow!("direct: no result in grace period"))); }
}
}
if relay_result.is_none() {
match tokio::time::timeout(Duration::from_secs(1), relay_task).await {
Ok(Ok(Ok(Ok(t)))) => { relay_result = Some(Ok(t)); }
Ok(Ok(Ok(Err(e)))) => { relay_result = Some(Err(anyhow::anyhow!("{e}"))); }
_ => { relay_result = Some(Err(anyhow::anyhow!("relay: no result in grace period"))); }
}
}
let direct_ok = direct_result.as_ref().map(|r| r.is_ok()).unwrap_or(false);
let relay_ok = relay_result.as_ref().map(|r| r.is_ok()).unwrap_or(false);
tracing::info!(
?local_winner,
direct_ok,
relay_ok,
"dual_path: race finished, both results collected for Phase 6 negotiation"
);
if !direct_ok && !relay_ok {
return Err(anyhow::anyhow!("both paths failed: no media transport available"));
}
let _ = (direct_ep, relay_ep, ipv6_endpoint);
Ok(RaceResult {
direct_transport: direct_result
.and_then(|r| r.ok())
.map(|t| Arc::new(t)),
relay_transport: relay_result
.and_then(|r| r.ok())
.map(|t| Arc::new(t)),
local_winner,
})
}

View File

@@ -96,6 +96,7 @@ pub fn signal_to_call_type(signal: &SignalMessage) -> CallSignalType {
SignalMessage::Hangup { .. } => CallSignalType::Hangup,
SignalMessage::Rekey { .. } => CallSignalType::Offer, // reuse
SignalMessage::QualityUpdate { .. } => CallSignalType::Offer, // reuse
SignalMessage::LossRecoveryUpdate { .. } => CallSignalType::Offer, // reuse (telemetry)
SignalMessage::Ping { .. } | SignalMessage::Pong { .. } => CallSignalType::Offer,
SignalMessage::AuthToken { .. } => CallSignalType::Offer,
SignalMessage::Hold => CallSignalType::Hold,
@@ -110,7 +111,27 @@ pub fn signal_to_call_type(signal: &SignalMessage) -> CallSignalType {
SignalMessage::SessionForward { .. } => CallSignalType::Offer, // reuse
SignalMessage::SessionForwardAck { .. } => CallSignalType::Offer, // reuse
SignalMessage::RoomUpdate { .. } => CallSignalType::Offer, // reuse
SignalMessage::SetAlias { .. } => CallSignalType::Offer, // reuse
SignalMessage::FederationHello { .. }
| SignalMessage::GlobalRoomActive { .. }
| SignalMessage::GlobalRoomInactive { .. } => CallSignalType::Offer, // relay-only
SignalMessage::DirectCallOffer { .. } => CallSignalType::Offer,
SignalMessage::DirectCallAnswer { .. } => CallSignalType::Answer,
SignalMessage::CallSetup { .. } => CallSignalType::Offer, // relay-only
SignalMessage::CallRinging { .. } => CallSignalType::Ringing,
SignalMessage::RegisterPresence { .. }
| SignalMessage::RegisterPresenceAck { .. } => CallSignalType::Offer, // relay-only
// NAT reflection is a client↔relay control exchange that
// never crosses the featherChat bridge — if it ever reaches
// this mapper something is wrong, but we still have to give
// an answer. "Offer" is the generic catch-all.
SignalMessage::Reflect
| SignalMessage::ReflectResponse { .. } => CallSignalType::Offer, // control-plane
// Phase 4 cross-relay forwarding envelope — strictly a
// relay-to-relay message, never rides the featherChat
// bridge. Catch-all mapping for completeness.
SignalMessage::FederatedSignalForward { .. } => CallSignalType::Offer,
SignalMessage::MediaPathReport { .. } => CallSignalType::Offer, // control-plane
SignalMessage::QualityDirective { .. } => CallSignalType::Offer, // relay-initiated
}
}
@@ -150,6 +171,7 @@ mod tests {
let hangup = SignalMessage::Hangup {
reason: wzp_proto::HangupReason::Normal,
call_id: None,
};
assert!(matches!(signal_to_call_type(&hangup), CallSignalType::Hangup));

View File

@@ -38,6 +38,9 @@ pub async fn perform_handshake(
ephemeral_pub,
signature,
supported_profiles: vec![
QualityProfile::STUDIO_64K,
QualityProfile::STUDIO_48K,
QualityProfile::STUDIO_32K,
QualityProfile::GOOD,
QualityProfile::DEGRADED,
QualityProfile::CATASTROPHIC,

View File

@@ -10,18 +10,75 @@
pub mod audio_io;
#[cfg(feature = "audio")]
pub mod audio_ring;
#[cfg(feature = "vpio")]
// VoiceProcessingIO is an Apple Core Audio API — only compile the module
// when the `vpio` feature is on AND we're targeting macOS. Enabling the
// feature on Windows/Linux was previously silently broken.
#[cfg(all(feature = "vpio", target_os = "macos"))]
pub mod audio_vpio;
// WASAPI-direct capture with Windows's OS-level AEC (AudioCategory_Communications).
// Only compiled when `windows-aec` feature is on AND target is Windows. The
// `windows` dependency is itself gated to Windows in Cargo.toml, so enabling
// this feature on non-Windows targets is a no-op.
#[cfg(all(feature = "windows-aec", target_os = "windows"))]
pub mod audio_wasapi;
// WebRTC AEC3 (Audio Processing Module) wrapper around CPAL capture + playback
// on Linux. Only compiled when `linux-aec` feature is on AND target is Linux.
// The webrtc-audio-processing dep is itself gated to Linux in Cargo.toml.
#[cfg(all(feature = "linux-aec", target_os = "linux"))]
pub mod audio_linux_aec;
pub mod bench;
pub mod call;
pub mod drift_test;
pub mod echo_test;
pub mod featherchat;
pub mod handshake;
pub mod dual_path;
pub mod metrics;
pub mod reflect;
pub mod sweep;
#[cfg(feature = "audio")]
pub use audio_io::{AudioCapture, AudioPlayback};
// AudioPlayback: three possible backends depending on feature flags.
// 1. Default CPAL (`audio_io::AudioPlayback`) — baseline on every platform.
// 2. Linux AEC (`audio_linux_aec::LinuxAecPlayback`) — CPAL + WebRTC APM
// render-side tee, so echo from speakers gets cancelled from the mic.
//
// On macOS and Windows we always use the default CPAL playback because:
// - macOS: VoiceProcessingIO handles AEC at the capture side (Apple's
// native hardware AEC uses its own reference signal handling).
// - Windows: WASAPI AudioCategory_Communications AEC uses the system
// render mix as reference — no per-process plumbing needed.
//
// Linux is the only platform where the in-app approach is necessary, so
// the AEC playback path is gated to target_os = "linux".
#[cfg(all(
feature = "audio",
any(not(feature = "linux-aec"), not(target_os = "linux"))
))]
pub use audio_io::AudioPlayback;
#[cfg(all(feature = "linux-aec", target_os = "linux"))]
pub use audio_linux_aec::LinuxAecPlayback as AudioPlayback;
// AudioCapture: three possible backends depending on feature flags.
// 1. Default CPAL (`audio_io::AudioCapture`) — baseline on every platform.
// 2. Windows AEC (`audio_wasapi::WasapiAudioCapture`) — direct WASAPI
// with AudioCategory_Communications, OS APO chain does AEC.
// 3. Linux AEC (`audio_linux_aec::LinuxAecCapture`) — CPAL + WebRTC APM
// capture-side echo cancellation using the playback tee as reference.
// All three expose the same public API (`start`, `ring`, `stop`, `Drop`).
#[cfg(all(
feature = "audio",
any(not(feature = "windows-aec"), not(target_os = "windows")),
any(not(feature = "linux-aec"), not(target_os = "linux"))
))]
pub use audio_io::AudioCapture;
#[cfg(all(feature = "windows-aec", target_os = "windows"))]
pub use audio_wasapi::WasapiAudioCapture as AudioCapture;
#[cfg(all(feature = "linux-aec", target_os = "linux"))]
pub use audio_linux_aec::LinuxAecCapture as AudioCapture;
pub use call::{CallConfig, CallDecoder, CallEncoder};
pub use handshake::perform_handshake;

View File

@@ -0,0 +1,679 @@
//! Multi-relay NAT reflection ("STUN for QUIC" — Phase 2).
//!
//! Phase 1 (`SignalMessage::Reflect` / `ReflectResponse`) lets a
//! client ask a single relay "what source address do you see for
//! me?". Phase 2 queries N relays in parallel and classifies the
//! results into a NAT type so the future P2P hole-punching path
//! can decide whether a direct QUIC handshake is viable:
//!
//! - All relays return the same `(ip, port)` → **Cone NAT**.
//! Endpoint-independent mapping, P2P hole-punching viable,
//! `consensus_addr` is the one address to advertise.
//! - Same ip, different ports → **Symmetric port-dependent NAT**.
//! The mapping changes per destination, so the advertised addr
//! wouldn't match what a peer actually sees; fall back to
//! relay-mediated path.
//! - Different ips → multi-homed / anycast / broken DNS, treat as
//! `Multiple` and do not attempt P2P.
//! - 0 or 1 successful probes → `Unknown`, not enough data.
//!
//! A probe is a throwaway QUIC signal connection: open endpoint,
//! connect, RegisterPresence (with a zero identity — the relay
//! accepts this exactly like the main signaling path does), send
//! Reflect, read ReflectResponse, close. Each probe gets its own
//! ephemeral quinn::Endpoint so the OS assigns a fresh source port
//! per relay — if we shared one endpoint across probes, a
//! symmetric NAT in front of the client would map every probe to
//! the same port and we couldn't detect it.
use std::net::SocketAddr;
use std::time::{Duration, Instant};
use serde::Serialize;
use wzp_proto::{MediaTransport, SignalMessage};
use wzp_transport::{client_config, create_endpoint, QuinnTransport};
/// Result of one probe against one relay. Always returned so the
/// UI can render per-relay status even when some fail.
#[derive(Debug, Clone, Serialize)]
pub struct NatProbeResult {
pub relay_name: String,
pub relay_addr: String,
/// `Some` on successful probe, `None` on failure.
pub observed_addr: Option<String>,
/// End-to-end wall-clock from connect start to ReflectResponse
/// received, in milliseconds. `Some` only on success.
pub latency_ms: Option<u32>,
/// Human-readable error on failure.
pub error: Option<String>,
}
/// Aggregated classification over N `NatProbeResult`s.
#[derive(Debug, Clone, Serialize)]
pub struct NatDetection {
pub probes: Vec<NatProbeResult>,
pub nat_type: NatType,
/// When `nat_type == Cone`, the one address all probes agreed
/// on. `None` for every other case.
pub consensus_addr: Option<String>,
}
/// NAT classification. See module doc for semantics.
#[derive(Debug, Clone, Copy, Serialize, PartialEq, Eq)]
pub enum NatType {
Cone,
SymmetricPort,
Multiple,
Unknown,
}
/// Probe a single relay with a QUIC connection.
///
/// # Endpoint reuse (Phase 5 — Nebula-style architecture)
///
/// If `existing_endpoint` is `Some`, the probe uses that socket
/// instead of creating a fresh one. This is the desired mode in
/// production: a port-preserving NAT (MikroTik masquerade, most
/// consumer routers) gives a **stable** external port for the
/// one socket, so the reflex addr observed by ANY relay is the
/// SAME addr and matches what a peer would see on a direct dial.
/// Pass the signal endpoint here.
///
/// If `None`, creates a fresh one-shot endpoint. Kept for:
/// - tests that spin up isolated probes
/// - the "I'm not registered yet" case where there's no signal
/// endpoint to reuse
///
/// NOTE on NAT-type detection: the pre-Phase-5 behavior of
/// forcing a fresh endpoint per probe was wrong — it made every
/// port-preserving NAT look symmetric because the classifier saw
/// a different external port for each fresh source port. With
/// one shared socket, the classifier reflects the REAL NAT
/// behavior.
pub async fn probe_reflect_addr(
relay: SocketAddr,
timeout_ms: u64,
existing_endpoint: Option<wzp_transport::Endpoint>,
) -> Result<(SocketAddr, u32), String> {
// Install rustls provider idempotently — a second install on the
// same thread is a no-op.
let _ = rustls::crypto::ring::default_provider().install_default();
let endpoint = match existing_endpoint {
Some(ep) => ep,
None => {
let bind: SocketAddr = "0.0.0.0:0".parse().unwrap();
create_endpoint(bind, None).map_err(|e| format!("endpoint: {e}"))?
}
};
let start = Instant::now();
let probe = async {
// Open the signal connection.
let conn =
wzp_transport::connect(&endpoint, relay, "_signal", client_config())
.await
.map_err(|e| format!("connect: {e}"))?;
let transport = QuinnTransport::new(conn);
// The relay signal handler waits for a RegisterPresence
// before entering its main dispatch loop (see
// wzp-relay/src/main.rs). So a transient probe has to
// register with a zero identity first — the relay accepts
// the empty-signature form exactly as the main signaling
// path does in desktop/src-tauri/src/lib.rs register_signal.
transport
.send_signal(&SignalMessage::RegisterPresence {
identity_pub: [0u8; 32],
signature: vec![],
alias: None,
})
.await
.map_err(|e| format!("send RegisterPresence: {e}"))?;
// Drain the RegisterPresenceAck so the response to our
// Reflect doesn't land on an unexpected stream order.
match transport.recv_signal().await {
Ok(Some(SignalMessage::RegisterPresenceAck { success: true, .. })) => {}
Ok(Some(other)) => {
return Err(format!(
"unexpected pre-reflect signal: {:?}",
std::mem::discriminant(&other)
));
}
Ok(None) => return Err("connection closed before RegisterPresenceAck".into()),
Err(e) => return Err(format!("recv RegisterPresenceAck: {e}")),
}
// Send Reflect and await response.
transport
.send_signal(&SignalMessage::Reflect)
.await
.map_err(|e| format!("send Reflect: {e}"))?;
match transport.recv_signal().await {
Ok(Some(SignalMessage::ReflectResponse { observed_addr })) => {
let parsed: SocketAddr = observed_addr
.parse()
.map_err(|e| format!("parse observed_addr {observed_addr:?}: {e}"))?;
let latency_ms = start.elapsed().as_millis() as u32;
// Clean close so the relay's per-connection cleanup
// runs promptly and we don't leak file descriptors.
let _ = transport.close().await;
Ok((parsed, latency_ms))
}
Ok(Some(other)) => Err(format!(
"expected ReflectResponse, got {:?}",
std::mem::discriminant(&other)
)),
Ok(None) => Err("connection closed before ReflectResponse".into()),
Err(e) => Err(format!("recv ReflectResponse: {e}")),
}
};
let out = tokio::time::timeout(Duration::from_millis(timeout_ms), probe)
.await
.map_err(|_| format!("probe timeout ({timeout_ms}ms)"))??;
// `endpoint` is a quinn::Endpoint clone — an Arc under the
// hood. Letting it drop at end-of-scope is correct whether it
// was fresh (last ref → socket closes) or shared (ref count
// decrements, socket stays alive for the signal loop).
Ok(out)
}
/// Detect the client's NAT type by probing N relays in parallel and
/// classifying the returned addresses. Never errors — failing
/// probes surface via `NatProbeResult.error`; aggregate is always
/// returned.
///
/// # Endpoint reuse (Phase 5)
///
/// If `shared_endpoint` is `Some`, every probe reuses it. This is
/// the PRODUCTION behavior: all probes source from the same UDP
/// port, so port-preserving NATs map them to the same external
/// port, and the classifier reflects the real NAT type. Pass the
/// signal endpoint.
///
/// If `None`, each probe creates its own fresh endpoint — useful
/// in tests that don't have a signal endpoint, but produces
/// spurious `SymmetricPort` classifications against NATs that
/// would otherwise look cone-like.
pub async fn detect_nat_type(
relays: Vec<(String, SocketAddr)>,
timeout_ms: u64,
shared_endpoint: Option<wzp_transport::Endpoint>,
) -> NatDetection {
// Parallel probes via tokio::task::JoinSet so the wall-clock is
// bounded by the slowest probe, not the sum. JoinSet keeps the
// dep surface at just tokio — we already depend on it.
let mut set = tokio::task::JoinSet::new();
for (name, addr) in relays {
let ep = shared_endpoint.clone();
set.spawn(async move {
let result = probe_reflect_addr(addr, timeout_ms, ep).await;
(name, addr, result)
});
}
let mut probes = Vec::new();
while let Some(join_result) = set.join_next().await {
let (name, addr, result) = match join_result {
Ok(tuple) => tuple,
// Task panicked — surface as a synthetic failed probe so
// the aggregate still returns a reasonable shape. This
// shouldn't happen but we don't want one bad probe to
// poison the whole detection.
Err(join_err) => {
probes.push(NatProbeResult {
relay_name: "<panicked>".into(),
relay_addr: "unknown".into(),
observed_addr: None,
latency_ms: None,
error: Some(format!("probe task panicked: {join_err}")),
});
continue;
}
};
probes.push(match result {
Ok((observed, latency_ms)) => NatProbeResult {
relay_name: name,
relay_addr: addr.to_string(),
observed_addr: Some(observed.to_string()),
latency_ms: Some(latency_ms),
error: None,
},
Err(e) => NatProbeResult {
relay_name: name,
relay_addr: addr.to_string(),
observed_addr: None,
latency_ms: None,
error: Some(e),
},
});
}
let (nat_type, consensus_addr) = classify_nat(&probes);
NatDetection {
probes,
nat_type,
consensus_addr,
}
}
/// Enumerate LAN-local host candidates this client is reachable
/// on, paired with the given port (typically the signal
/// endpoint's bound port so that incoming dials land on the same
/// socket the advertised reflex addr points to).
///
/// Gathers BOTH IPv4 and IPv6 candidates:
///
/// - **IPv4**: RFC1918 private ranges (10/8, 172.16/12, 192.168/16)
/// and CGNAT shared-transition (100.64/10). Public IPv4 is
/// skipped because the reflex-addr path already covers it.
/// Loopback and link-local (169.254/16) are skipped.
///
/// - **IPv6**: ALL global-unicast addresses (2000::/3 — the real
/// routable IPv6 space) AND unique-local (fc00::/7). These
/// are directly dialable from a peer on the same LAN, and on
/// true dual-stack LANs (which most consumer ISPs now provide,
/// including Starlink) IPv6 often gives a direct path even
/// when IPv4 can't hairpin. Loopback (::1), unspecified (::),
/// and link-local (fe80::/10) are skipped — link-local would
/// require a scope ID to be useful and is basically never
/// reachable across interface boundaries.
///
/// The port must come from the caller — typically
/// `signal_endpoint.local_addr()?.port()`, so that the peer's
/// dials to these addresses land on the same socket that's
/// already listening (Phase 5 shared-endpoint architecture).
///
/// Safe to call from any thread; no I/O, no async. The `if-addrs`
/// crate reads the kernel's interface table via a single
/// getifaddrs(3) syscall.
pub fn local_host_candidates(v4_port: u16, v6_port: Option<u16>) -> Vec<SocketAddr> {
let Ok(ifaces) = if_addrs::get_if_addrs() else {
return Vec::new();
};
let mut out = Vec::new();
for iface in ifaces {
if iface.is_loopback() {
continue;
}
match iface.ip() {
std::net::IpAddr::V4(v4) => {
if v4.is_link_local() {
continue;
}
// Keep RFC1918 private ranges and CGNAT — those
// are the LAN-dialable addrs we actually want.
// Skip public v4 because the reflex addr already
// covers that path.
if v4.is_private() {
out.push(SocketAddr::new(std::net::IpAddr::V4(v4), v4_port));
} else if v4.octets()[0] == 100 && (v4.octets()[1] & 0xc0) == 0x40 {
// 100.64/10 CGNAT — rare but valid if two
// phones are on the same CGNAT-hairpinned
// carrier LAN (some hotspot setups).
out.push(SocketAddr::new(std::net::IpAddr::V4(v4), v4_port));
}
}
std::net::IpAddr::V6(v6) => {
// Phase 7: IPv6 host candidates via dedicated
// IPv6 socket. When v6_port is None, no IPv6
// endpoint exists — skip silently.
let Some(port) = v6_port else { continue };
if v6.is_loopback() || v6.is_unspecified() {
continue;
}
// fe80::/10 link-local — needs scope ID, not
// routable across interfaces.
if (v6.segments()[0] & 0xffc0) == 0xfe80 {
continue;
}
// Accept global unicast (2000::/3) and
// unique-local (fc00::/7).
let first_seg = v6.segments()[0];
let is_global = (first_seg & 0xe000) == 0x2000;
let is_ula = (first_seg & 0xfe00) == 0xfc00;
if is_global || is_ula {
out.push(SocketAddr::new(std::net::IpAddr::V6(v6), port));
}
}
}
}
out
}
/// Role assignment for the Phase 3.5 dual-path QUIC race.
///
/// Both peers already know two strings at CallSetup time: their
/// own server-reflexive address (queried via Phase 1 Reflect) and
/// the peer's (carried in `CallSetup.peer_direct_addr`). To avoid
/// a negotiation round-trip, both sides compare the two strings
/// lexicographically and agree on a deterministic role:
///
/// - **Acceptor** — lexicographically smaller addr. Listens for
/// an incoming direct connection from the peer. Does NOT dial.
/// - **Dialer** — lexicographically larger addr. Dials the
/// peer's direct addr. Does NOT listen.
///
/// Both roles ALSO dial the relay in parallel as a fallback.
/// Whichever future (direct or relay) completes first is used as
/// the media transport. Because the role is deterministic and
/// symmetric, both peers end up holding the same underlying QUIC
/// session on the direct path — A's accepted conn and D's dialed
/// conn are literally the same connection.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Role {
/// This peer listens for the direct incoming connection.
Acceptor,
/// This peer dials the peer's direct address.
Dialer,
}
/// Compute the deterministic role for this peer in the dual-path
/// race. Returns `None` when no direct attempt is possible —
/// either peer didn't advertise a reflex addr, or the two addrs
/// are identical (same host on loopback / mis-advertised).
///
/// The caller should treat `None` as "skip direct, relay-only".
pub fn determine_role(
own_reflex_addr: Option<&str>,
peer_reflex_addr: Option<&str>,
) -> Option<Role> {
let (own, peer) = match (own_reflex_addr, peer_reflex_addr) {
(Some(o), Some(p)) => (o, p),
_ => return None,
};
match own.cmp(peer) {
std::cmp::Ordering::Less => Some(Role::Acceptor),
std::cmp::Ordering::Greater => Some(Role::Dialer),
// Equal addrs should never happen in production (both
// peers behind the same NAT mapping + same port would be
// a degenerate case). Guard against it so we don't infinite-
// loop waiting for a connection to ourselves.
std::cmp::Ordering::Equal => None,
}
}
/// Returns `true` if the address is in an RFC1918 / link-local /
/// loopback range and therefore cannot possibly be a post-NAT
/// reflex address from the public internet's point of view.
///
/// A probe against a relay ON THE SAME LAN as the client will
/// naturally report the client's LAN IP back (because there's no
/// NAT between them) — that observation is real but says nothing
/// about the client's public-internet-facing NAT state. Mixing
/// LAN reflex addrs with public-internet reflex addrs in
/// `classify_nat` would always report `Multiple` (different IPs)
/// and falsely warn about symmetric NAT. Filter them out before
/// classifying.
fn is_private_or_loopback(addr: &SocketAddr) -> bool {
match addr.ip() {
std::net::IpAddr::V4(v4) => {
let o = v4.octets();
v4.is_loopback()
|| v4.is_private() // 10/8, 172.16/12, 192.168/16
|| v4.is_link_local() // 169.254/16
|| (o[0] == 100 && (o[1] & 0xc0) == 0x40) // 100.64/10 CGNAT shared
}
std::net::IpAddr::V6(v6) => {
v6.is_loopback() || v6.is_unspecified() || (v6.segments()[0] & 0xffc0) == 0xfe80 // fe80::/10 link-local
}
}
}
/// Pure-function NAT classifier — split out for unit testing
/// without touching the network.
///
/// Only considers probes whose reflex addr is a **public-internet**
/// address. LAN / private / loopback reflex addrs are dropped
/// because they reflect the same-network path rather than the
/// real NAT state. CGNAT (100.64/10) is also treated as private
/// because the post-CGNAT address would be what we actually want
/// to classify on — but CGNAT is unreachable from outside the
/// carrier, so a relay seeing the CGNAT addr is on the same
/// carrier network and again not useful for classification.
pub fn classify_nat(probes: &[NatProbeResult]) -> (NatType, Option<String>) {
// First: parse every successful probe's observed addr.
let parsed: Vec<SocketAddr> = probes
.iter()
.filter_map(|p| p.observed_addr.as_deref().and_then(|s| s.parse().ok()))
.collect();
// Then: drop LAN / private / loopback reflex addrs. Those are
// legitimate observations by same-network relays, but they
// don't contribute to NAT-type classification because the
// client's real public-facing NAT mapping is not involved on
// that path. A relay on the same LAN always sees the client's
// LAN IP, regardless of whether the NAT beyond it is cone or
// symmetric.
let successes: Vec<SocketAddr> = parsed
.into_iter()
.filter(|a| !is_private_or_loopback(a))
.collect();
if successes.len() < 2 {
return (NatType::Unknown, None);
}
let first = successes[0];
let same_ip = successes.iter().all(|a| a.ip() == first.ip());
if !same_ip {
return (NatType::Multiple, None);
}
let same_port = successes.iter().all(|a| a.port() == first.port());
if same_port {
(NatType::Cone, Some(first.to_string()))
} else {
(NatType::SymmetricPort, None)
}
}
// ── Unit tests for the pure classifier ───────────────────────────
#[cfg(test)]
mod tests {
use super::*;
fn mk(addr: Option<&str>) -> NatProbeResult {
NatProbeResult {
relay_name: "test".into(),
relay_addr: "0.0.0.0:0".into(),
observed_addr: addr.map(|s| s.to_string()),
latency_ms: addr.map(|_| 10),
error: None,
}
}
#[test]
fn classify_empty_is_unknown() {
let (nt, addr) = classify_nat(&[]);
assert_eq!(nt, NatType::Unknown);
assert!(addr.is_none());
}
#[test]
fn classify_single_success_is_unknown() {
let probes = vec![mk(Some("192.0.2.1:4433"))];
let (nt, addr) = classify_nat(&probes);
assert_eq!(nt, NatType::Unknown);
assert!(addr.is_none());
}
#[test]
fn classify_two_identical_is_cone() {
let probes = vec![
mk(Some("192.0.2.1:4433")),
mk(Some("192.0.2.1:4433")),
];
let (nt, addr) = classify_nat(&probes);
assert_eq!(nt, NatType::Cone);
assert_eq!(addr.as_deref(), Some("192.0.2.1:4433"));
}
#[test]
fn classify_same_ip_different_ports_is_symmetric() {
let probes = vec![
mk(Some("192.0.2.1:4433")),
mk(Some("192.0.2.1:51234")),
];
let (nt, addr) = classify_nat(&probes);
assert_eq!(nt, NatType::SymmetricPort);
assert!(addr.is_none());
}
#[test]
fn classify_different_ips_is_multiple() {
let probes = vec![
mk(Some("192.0.2.1:4433")),
mk(Some("198.51.100.9:4433")),
];
let (nt, addr) = classify_nat(&probes);
assert_eq!(nt, NatType::Multiple);
assert!(addr.is_none());
}
#[test]
fn classify_drops_private_ip_probes() {
// One LAN probe + one public probe should behave like a
// single public probe — i.e. Unknown (not enough data to
// classify). This is the common real-world case: the user
// has a LAN relay + an internet relay configured, the LAN
// relay sees the LAN IP, the internet relay sees the WAN
// IP, and the old classifier would flag "Multiple" and
// falsely warn about symmetric NAT.
let probes = vec![
mk(Some("192.168.1.100:4433")), // LAN — must be dropped
mk(Some("203.0.113.5:4433")), // public (TEST-NET-3)
];
let (nt, _) = classify_nat(&probes);
assert_eq!(nt, NatType::Unknown);
}
#[test]
fn classify_drops_loopback_probes() {
let probes = vec![
mk(Some("127.0.0.1:4433")), // loopback — must be dropped
mk(Some("203.0.113.5:4433")), // public
mk(Some("203.0.113.5:4433")), // public, same addr
];
let (nt, addr) = classify_nat(&probes);
// Two public probes with identical addrs → Cone.
assert_eq!(nt, NatType::Cone);
assert_eq!(addr.as_deref(), Some("203.0.113.5:4433"));
}
#[test]
fn classify_drops_cgnat_probes() {
// 100.64.0.0/10 is the CGNAT shared-transition range.
// Filter treats it like RFC1918 — a relay that sees the
// client with a 100.64/10 addr is on the same CGNAT
// network and can't contribute to public NAT classification.
let probes = vec![
mk(Some("100.64.0.42:4433")), // CGNAT — dropped
mk(Some("203.0.113.5:4433")), // public
mk(Some("203.0.113.5:12345")), // public, different port
];
let (nt, _) = classify_nat(&probes);
// Two public probes same IP different port → SymmetricPort.
assert_eq!(nt, NatType::SymmetricPort);
}
#[test]
fn classify_two_lan_probes_is_unknown_not_cone() {
// Even if both probes come back from LAN relays, we can't
// say anything useful about the public NAT state. Unknown,
// not Cone.
let probes = vec![
mk(Some("192.168.1.100:4433")),
mk(Some("192.168.1.100:4433")),
];
let (nt, addr) = classify_nat(&probes);
assert_eq!(nt, NatType::Unknown);
assert!(addr.is_none());
}
#[test]
fn classify_mix_of_success_and_failure() {
let probes = vec![
mk(Some("192.0.2.1:4433")),
mk(None), // failed probe
mk(Some("192.0.2.1:4433")),
];
let (nt, addr) = classify_nat(&probes);
// Two successes both agree → Cone, ignore the failure row.
assert_eq!(nt, NatType::Cone);
assert_eq!(addr.as_deref(), Some("192.0.2.1:4433"));
}
#[test]
fn determine_role_smaller_is_acceptor() {
// Lexicographic: "192.0.2.1:4433" < "198.51.100.9:4433"
assert_eq!(
determine_role(Some("192.0.2.1:4433"), Some("198.51.100.9:4433")),
Some(Role::Acceptor)
);
}
#[test]
fn determine_role_larger_is_dialer() {
assert_eq!(
determine_role(Some("198.51.100.9:4433"), Some("192.0.2.1:4433")),
Some(Role::Dialer)
);
}
#[test]
fn determine_role_port_difference_matters() {
// Same ip, different ports — string compare still works
// because "4433" < "54321".
assert_eq!(
determine_role(Some("127.0.0.1:4433"), Some("127.0.0.1:54321")),
Some(Role::Acceptor)
);
assert_eq!(
determine_role(Some("127.0.0.1:54321"), Some("127.0.0.1:4433")),
Some(Role::Dialer)
);
}
#[test]
fn determine_role_equal_addrs_is_none() {
assert_eq!(
determine_role(Some("192.0.2.1:4433"), Some("192.0.2.1:4433")),
None
);
}
#[test]
fn determine_role_missing_side_is_none() {
assert_eq!(determine_role(None, Some("192.0.2.1:4433")), None);
assert_eq!(determine_role(Some("192.0.2.1:4433"), None), None);
assert_eq!(determine_role(None, None), None);
}
#[test]
fn determine_role_is_symmetric_across_peers() {
// Both peers compute roles independently; they must end
// up with opposite assignments (one Acceptor, one Dialer)
// so that each side ends up talking to the other.
let a = "192.0.2.1:4433";
let b = "198.51.100.9:4433";
let alice_role = determine_role(Some(a), Some(b));
let bob_role = determine_role(Some(b), Some(a));
assert_eq!(alice_role, Some(Role::Acceptor));
assert_eq!(bob_role, Some(Role::Dialer));
}
#[test]
fn classify_one_success_one_failure_is_unknown() {
let probes = vec![mk(Some("192.0.2.1:4433")), mk(None)];
let (nt, addr) = classify_nat(&probes);
assert_eq!(nt, NatType::Unknown);
assert!(addr.is_none());
}
}

View File

@@ -0,0 +1,213 @@
//! Phase 3.5 integration tests for the dual-path QUIC race.
//!
//! The race takes a role (Acceptor or Dialer), a peer_direct_addr,
//! a relay_addr, and two SNI strings, then returns whichever QUIC
//! handshake completes first wrapped in a `QuinnTransport`. These
//! tests validate that:
//!
//! 1. On loopback with two real clients playing A + D roles, the
//! direct path wins (fewer hops than relay).
//! 2. When the direct peer is dead (nothing listening) but the
//! relay is up, the relay wins within the fallback window.
//! 3. When both paths are dead, the race errors cleanly rather
//! than hanging forever.
//!
//! The "relay" in these tests is a minimal mock that just accepts
//! an incoming QUIC connection and drops it — we don't need any
//! protocol handling, just a TCP-ish listen-and-accept.
use std::net::{Ipv4Addr, SocketAddr};
use std::time::Duration;
use wzp_client::dual_path::{race, PeerCandidates, WinningPath};
use wzp_client::reflect::Role;
use wzp_transport::{create_endpoint, server_config};
/// Spin up a "relay-ish" mock server on loopback that accepts
/// incoming QUIC connections and does nothing with them. Used to
/// give the relay branch of the race a real target to dial.
/// Returns the bound address + a join handle (kept alive to keep
/// the endpoint up).
async fn spawn_mock_relay() -> (SocketAddr, tokio::task::JoinHandle<()>) {
let _ = rustls::crypto::ring::default_provider().install_default();
let (sc, _cert_der) = server_config();
let bind: SocketAddr = (Ipv4Addr::LOCALHOST, 0).into();
let ep = create_endpoint(bind, Some(sc)).expect("relay endpoint");
let addr = ep.local_addr().expect("local_addr");
let handle = tokio::spawn(async move {
// Accept loop — hold the connection alive for a short
// while so the race result isn't killed by the peer
// closing before the winning transport is returned.
while let Some(incoming) = ep.accept().await {
if let Ok(_conn) = incoming.await {
tokio::time::sleep(Duration::from_secs(5)).await;
}
}
});
(addr, handle)
}
// -----------------------------------------------------------------------
// Test 1: direct path wins when both sides are up
// -----------------------------------------------------------------------
//
// Spawn a mock relay, then set up a two-client test where one
// client plays the Acceptor role and the other plays the Dialer
// role. The Dialer's `peer_direct_addr` is the Acceptor's listen
// address. Because the direct path is a single loopback hop and
// the relay dial also terminates on loopback, both complete
// essentially instantly — the `biased` tokio::select in race()
// should pick direct.
#[tokio::test(flavor = "multi_thread", worker_threads = 4)]
async fn dual_path_direct_wins_on_loopback() {
let _ = rustls::crypto::ring::default_provider().install_default();
let (relay_addr, _relay_handle) = spawn_mock_relay().await;
// Acceptor task: run race(Role::Acceptor, peer_addr_placeholder, ...).
// Since the acceptor doesn't dial, the peer_direct_addr arg is
// unused on the direct branch but we still pass a placeholder
// because the API takes one. Use a stub addr that would error
// if it were ever dialed — proving the Acceptor really doesn't
// reach it.
let unused_addr: SocketAddr = "127.0.0.1:2".parse().unwrap();
// We can't race both sides in the same task because each race
// call has its own direct endpoint that needs to talk to the
// OTHER side's endpoint. So spawn the Acceptor in a task and
// let it expose its listen addr via a oneshot back to the test,
// then run the Dialer in the test's main task.
//
// There's a chicken-and-egg issue: the Acceptor's listen addr
// is only known after race() creates its endpoint. To avoid
// reaching into race()'s internals, we instead play a slight
// trick: create the Acceptor's endpoint ourselves (outside
// race()) to learn its addr, spin up an accept loop on it
// ourselves, and pass THAT addr as the Dialer's peer addr.
// This tests the Dialer->Acceptor handshake end-to-end without
// running the full race() on both sides.
let (sc, _cert_der) = server_config();
let acceptor_bind: SocketAddr = (Ipv4Addr::LOCALHOST, 0).into();
let acceptor_ep = create_endpoint(acceptor_bind, Some(sc)).expect("acceptor ep");
let acceptor_listen_addr = acceptor_ep.local_addr().expect("acceptor addr");
// Drop the external acceptor after the test finishes, not
// before — spawn a dedicated accept task.
let acceptor_accept_task = tokio::spawn(async move {
// Accept one connection and hold it for a while so the
// Dialer side can complete its QUIC handshake.
if let Some(incoming) = acceptor_ep.accept().await {
if let Ok(_conn) = incoming.await {
tokio::time::sleep(Duration::from_secs(5)).await;
}
}
});
// Now run the Dialer in the race — peer_direct_addr = acceptor's
// listen addr. The relay is the mock from above. Direct path
// should win.
let result = race(
Role::Dialer,
PeerCandidates {
reflexive: Some(acceptor_listen_addr),
local: Vec::new(),
},
relay_addr,
"test-room".into(),
"call-test".into(),
None, // Phase 5: tests use fresh endpoints (no shared signal)
)
.await
.expect("race must succeed");
assert!(result.direct_transport.is_some(), "direct transport should be available");
assert_eq!(result.local_winner, WinningPath::Direct, "direct should win on loopback");
// Cancel the acceptor accept task so the test finishes.
acceptor_accept_task.abort();
// Suppress unused-var warning for the placeholder.
let _ = unused_addr;
}
// -----------------------------------------------------------------------
// Test 2: relay wins when the direct peer is dead
// -----------------------------------------------------------------------
//
// Dialer role, peer_direct_addr = a port nothing is listening on,
// relay is the working mock. Direct dial will sit waiting for a
// QUIC handshake that never comes; the 2s direct timeout kicks in
// and the relay path wins the fallback.
#[tokio::test(flavor = "multi_thread", worker_threads = 4)]
async fn dual_path_relay_wins_when_direct_is_dead() {
let _ = rustls::crypto::ring::default_provider().install_default();
let (relay_addr, _relay_handle) = spawn_mock_relay().await;
// A port that nothing is listening on — dead direct target.
// Port 1 on loopback is almost never bound and UDP packets to
// it will be dropped silently, so the QUIC handshake times out.
let dead_peer: SocketAddr = "127.0.0.1:1".parse().unwrap();
let result = race(
Role::Dialer,
PeerCandidates {
reflexive: Some(dead_peer),
local: Vec::new(),
},
relay_addr,
"test-room".into(),
"call-test".into(),
None, // Phase 5: tests use fresh endpoints (no shared signal)
)
.await
.expect("race must succeed via relay fallback");
assert!(result.relay_transport.is_some(), "relay transport should be available");
assert_eq!(
result.local_winner,
WinningPath::Relay,
"relay should win when direct dial has nowhere to land"
);
}
// -----------------------------------------------------------------------
// Test 3: race errors cleanly when both paths are dead
// -----------------------------------------------------------------------
//
// Dialer role, peer_direct_addr = dead, relay_addr = dead.
// Expected: race returns an Err within ~7s (2s direct timeout +
// 5s relay timeout fallback).
#[tokio::test(flavor = "multi_thread", worker_threads = 4)]
async fn dual_path_errors_cleanly_when_both_paths_dead() {
let _ = rustls::crypto::ring::default_provider().install_default();
let dead_peer: SocketAddr = "127.0.0.1:1".parse().unwrap();
let dead_relay: SocketAddr = "127.0.0.1:2".parse().unwrap();
let start = std::time::Instant::now();
let result = race(
Role::Dialer,
PeerCandidates {
reflexive: Some(dead_peer),
local: Vec::new(),
},
dead_relay,
"test-room".into(),
"call-test".into(),
None, // Phase 5: tests use fresh endpoints (no shared signal)
)
.await;
let elapsed = start.elapsed();
assert!(result.is_err(), "both-dead must return Err");
// Upper bound: direct 2s timeout + relay 5s fallback + small
// slack for scheduling. If this blows, something is looping.
assert!(
elapsed < Duration::from_secs(10),
"race took too long to give up: {:?}",
elapsed
);
}

View File

@@ -83,12 +83,12 @@ async fn full_handshake_both_sides_derive_same_session() {
// Run client and relay handshakes concurrently.
let (client_result, relay_result) = tokio::join!(
wzp_client::handshake::perform_handshake(client_transport_clone.as_ref(), &client_seed),
wzp_client::handshake::perform_handshake(client_transport_clone.as_ref(), &client_seed, None),
wzp_relay::handshake::accept_handshake(relay_transport_clone.as_ref(), &relay_seed),
);
let mut client_session = client_result.expect("client handshake should succeed");
let (mut relay_session, chosen_profile) =
let (mut relay_session, chosen_profile, _caller_fp, _caller_alias) =
relay_result.expect("relay handshake should succeed");
// Verify a profile was chosen.
@@ -151,6 +151,7 @@ async fn handshake_rejects_tampered_signature() {
ephemeral_pub,
signature: bad_signature,
supported_profiles: vec![wzp_proto::QualityProfile::GOOD],
alias: None,
};
client_transport_clone
.send_signal(&offer)

View File

@@ -10,8 +10,17 @@ description = "WarzonePhone audio codec layer — Opus + Codec2 encoding/decodin
wzp-proto = { workspace = true }
tracing = { workspace = true }
# Opus bindings
audiopus = { workspace = true }
# Opus bindings — libopus 1.5.2.
# opusic-c for the encoder (set_dred_duration lives here in Phase 1).
# opusic-sys for the decoder — we wrap the raw *mut OpusDecoder ourselves
# because opusic-c::Decoder.inner is pub(crate), blocking the unified
# decoder + DRED path we need in Phase 3.
opusic-c = { workspace = true }
opusic-sys = { workspace = true }
# Zero-cost slice reinterpretation for the i16 ↔ u16 boundary between
# our PCM buffers and opusic-c's encode API.
bytemuck = { workspace = true }
# Pure-Rust Codec2 implementation
codec2 = { workspace = true }

View File

@@ -116,6 +116,14 @@ impl AudioEncoder for AdaptiveEncoder {
fn set_dtx(&mut self, enabled: bool) {
self.opus.set_dtx(enabled);
}
fn set_expected_loss(&mut self, loss_pct: u8) {
self.opus.set_expected_loss(loss_pct);
}
fn set_dred_duration(&mut self, frames: u8) {
self.opus.set_dred_duration(frames);
}
}
// ─── AdaptiveDecoder ─────────────────────────────────────────────────────────
@@ -199,6 +207,27 @@ impl AdaptiveDecoder {
fn codec2_frame_samples(&self) -> usize {
self.codec2.frame_samples()
}
/// Reconstruct a lost frame from a previously parsed DRED state.
///
/// Phase 3b entry point for gap reconstruction. Dispatches to the
/// inner Opus decoder when active. Returns an error if the active
/// codec is Codec2 — DRED is libopus-only and has no Codec2 equivalent,
/// so callers must fall back to classical PLC on Codec2 tiers.
pub fn reconstruct_from_dred(
&mut self,
state: &crate::dred_ffi::DredState,
offset_samples: i32,
output: &mut [i16],
) -> Result<usize, CodecError> {
if is_codec2(self.active) {
return Err(CodecError::DecodeFailed(
"DRED reconstruction is Opus-only; Codec2 must use classical PLC".into(),
));
}
self.opus
.reconstruct_from_dred(state, offset_samples, output)
}
}
// ─── Tests ───────────────────────────────────────────────────────────────────

View File

@@ -0,0 +1,585 @@
//! Raw opusic-sys FFI wrappers for libopus 1.5.2 decoder + DRED reconstruction.
//!
//! # Why this module exists
//!
//! We cannot use `opusic_c::Decoder` because its inner `*mut OpusDecoder`
//! pointer is `pub(crate)` — not reachable from outside the opusic-c crate.
//! Phase 3 of the DRED integration needs to hand that same pointer to
//! `opus_decoder_dred_decode`, and running two parallel decoders (one from
//! opusic-c for normal audio, another from opusic-sys for DRED) would cause
//! the DRED-only decoder's internal state to drift out of sync with the
//! audio stream because it would not see normal decode calls.
//!
//! The fix is to own the raw decoder ourselves and use the same handle for
//! both normal decode AND DRED reconstruction. This module is the single
//! owner of `*mut OpusDecoder`, `*mut OpusDREDDecoder`, and `*mut OpusDRED`
//! in the WZP workspace.
//!
//! # Phase 3a scope
//!
//! Phase 0 added `DecoderHandle` (normal decode). Phase 3a adds:
//! - [`DredDecoderHandle`] — wraps `*mut OpusDREDDecoder` for parsing DRED
//! side-channel data out of arriving Opus packets.
//! - [`DredState`] — wraps `*mut OpusDRED` (a fixed 10,592-byte buffer
//! allocated by libopus) that holds parsed DRED state between the parse
//! and reconstruct steps.
//! - [`DredDecoderHandle::parse_into`] — wraps `opus_dred_parse`.
//! - [`DecoderHandle::reconstruct_from_dred`] — wraps `opus_decoder_dred_decode`.
//!
//! The pattern is: on every arriving Opus packet, the receiver calls
//! `parse_into` with a reusable `DredState`, then stores (seq, state_clone)
//! in a ring. On detected loss, the receiver computes the offset from the
//! freshest reachable DRED state and calls `reconstruct_from_dred` to
//! synthesize the missing audio.
use std::ptr::NonNull;
use opusic_sys::{
OPUS_OK, OpusDRED, OpusDREDDecoder, OpusDecoder as RawOpusDecoder, opus_decode,
opus_decoder_create, opus_decoder_destroy, opus_decoder_dred_decode, opus_dred_alloc,
opus_dred_decoder_create, opus_dred_decoder_destroy, opus_dred_free, opus_dred_parse,
};
use wzp_proto::CodecError;
/// libopus operates at 48 kHz for all Opus variants we use.
const SAMPLE_RATE_HZ: i32 = 48_000;
/// Mono.
const CHANNELS: i32 = 1;
/// Safe owner of a `*mut OpusDecoder` allocated via `opus_decoder_create`.
///
/// Releases the decoder in `Drop`. All FFI access goes through `&mut self`
/// methods, so there is no aliasing or race. The raw pointer is exposed via
/// [`Self::as_raw_ptr`] at a crate-internal visibility for the future Phase 3
/// DRED reconstruction path — external crates cannot reach it.
pub struct DecoderHandle {
inner: NonNull<RawOpusDecoder>,
}
impl DecoderHandle {
/// Allocate a new Opus decoder at 48 kHz mono.
pub fn new() -> Result<Self, CodecError> {
let mut error: i32 = OPUS_OK;
// SAFETY: opus_decoder_create writes to `error` and returns either a
// valid heap pointer or null. We check both before constructing the
// NonNull wrapper.
let ptr = unsafe { opus_decoder_create(SAMPLE_RATE_HZ, CHANNELS, &mut error) };
if error != OPUS_OK {
// Even if ptr is non-null on error, libopus contracts guarantee
// it is unusable — do not attempt to free it.
return Err(CodecError::DecodeFailed(format!(
"opus_decoder_create failed: err={error}"
)));
}
let inner = NonNull::new(ptr).ok_or_else(|| {
CodecError::DecodeFailed("opus_decoder_create returned null".into())
})?;
Ok(Self { inner })
}
/// Decode an Opus packet into PCM samples.
///
/// `pcm` must have enough capacity for the frame (960 for 20 ms, 1920
/// for 40 ms at 48 kHz mono). Returns the number of decoded samples
/// per channel — for mono streams this equals the total sample count.
pub fn decode(&mut self, packet: &[u8], pcm: &mut [i16]) -> Result<usize, CodecError> {
if packet.is_empty() {
return Err(CodecError::DecodeFailed("empty packet".into()));
}
if pcm.is_empty() {
return Err(CodecError::DecodeFailed("empty output buffer".into()));
}
// SAFETY: self.inner is a valid *mut OpusDecoder owned by this struct.
// `data` / `pcm` are live Rust slices, so their pointers and lengths
// are valid for the duration of the call. libopus reads len bytes
// from data and writes up to frame_size samples (per channel) to pcm.
let n = unsafe {
opus_decode(
self.inner.as_ptr(),
packet.as_ptr(),
packet.len() as i32,
pcm.as_mut_ptr(),
pcm.len() as i32,
/* decode_fec = */ 0,
)
};
if n < 0 {
return Err(CodecError::DecodeFailed(format!(
"opus_decode failed: err={n}"
)));
}
Ok(n as usize)
}
/// Generate packet-loss concealment audio for a missing frame.
///
/// Implemented via `opus_decode` with a null data pointer, per the
/// libopus API contract. `pcm` should be sized for the expected frame.
pub fn decode_lost(&mut self, pcm: &mut [i16]) -> Result<usize, CodecError> {
if pcm.is_empty() {
return Err(CodecError::DecodeFailed("empty output buffer".into()));
}
// SAFETY: same invariants as decode(). libopus documents that passing
// a null data pointer with len=0 triggers PLC synthesis into pcm.
let n = unsafe {
opus_decode(
self.inner.as_ptr(),
std::ptr::null(),
0,
pcm.as_mut_ptr(),
pcm.len() as i32,
/* decode_fec = */ 0,
)
};
if n < 0 {
return Err(CodecError::DecodeFailed(format!(
"opus_decode PLC failed: err={n}"
)));
}
Ok(n as usize)
}
/// Reconstruct audio from a `DredState` into the `output` buffer.
///
/// `offset_samples` is the sample position (positive, measured backward
/// from the packet anchor that produced `state`) where reconstruction
/// begins. `output.len()` must match the number of samples to synthesize.
///
/// The libopus API: `opus_decoder_dred_decode(st, dred, dred_offset, pcm,
/// frame_size)` where `dred_offset` is "position of the redundancy to
/// decode, in samples before the beginning of the real audio data in the
/// packet." Valid values: `0 < offset_samples < state.samples_available()`.
///
/// Returns the number of samples actually written (should equal
/// `output.len()` on success).
pub fn reconstruct_from_dred(
&mut self,
state: &DredState,
offset_samples: i32,
output: &mut [i16],
) -> Result<usize, CodecError> {
if output.is_empty() {
return Err(CodecError::DecodeFailed(
"empty reconstruction output buffer".into(),
));
}
if offset_samples <= 0 {
return Err(CodecError::DecodeFailed(format!(
"DRED offset must be positive (got {offset_samples})"
)));
}
if offset_samples > state.samples_available() {
return Err(CodecError::DecodeFailed(format!(
"DRED offset {offset_samples} exceeds available samples {}",
state.samples_available()
)));
}
// SAFETY: self.inner is a valid *mut OpusDecoder, state.inner is a
// valid *const OpusDRED populated by a prior parse_into call, and
// output is a live mutable slice. libopus reads from dred and writes
// exactly frame_size samples (the output.len()) to pcm.
let n = unsafe {
opus_decoder_dred_decode(
self.inner.as_ptr(),
state.inner.as_ptr(),
offset_samples,
output.as_mut_ptr(),
output.len() as i32,
)
};
if n < 0 {
return Err(CodecError::DecodeFailed(format!(
"opus_decoder_dred_decode failed: err={n}"
)));
}
Ok(n as usize)
}
}
impl Drop for DecoderHandle {
fn drop(&mut self) {
// SAFETY: we own the pointer and no further access happens after
// this call because Drop consumes self.
unsafe { opus_decoder_destroy(self.inner.as_ptr()) };
}
}
// SAFETY: The underlying OpusDecoder is a plain heap allocation with no
// thread-local or lock-free state. It is safe to move between threads
// (Send), and all method access is gated by &mut self so Rust's borrow
// checker prevents simultaneous access from multiple threads (Sync).
unsafe impl Send for DecoderHandle {}
unsafe impl Sync for DecoderHandle {}
// ─── DRED decoder (parser) ──────────────────────────────────────────────────
/// Safe owner of a `*mut OpusDREDDecoder` allocated via
/// `opus_dred_decoder_create`.
///
/// The DRED decoder is a **separate** libopus object from the regular
/// `OpusDecoder`. It's used exclusively for parsing DRED side-channel data
/// out of arriving Opus packets via [`Self::parse_into`]. Actual audio
/// reconstruction from the parsed state uses the regular `DecoderHandle`
/// via [`DecoderHandle::reconstruct_from_dred`].
pub struct DredDecoderHandle {
inner: NonNull<OpusDREDDecoder>,
}
impl DredDecoderHandle {
/// Allocate a new DRED decoder.
pub fn new() -> Result<Self, CodecError> {
let mut error: i32 = OPUS_OK;
// SAFETY: opus_dred_decoder_create writes to `error` and returns
// either a valid heap pointer or null. Both are checked.
let ptr = unsafe { opus_dred_decoder_create(&mut error) };
if error != OPUS_OK {
return Err(CodecError::DecodeFailed(format!(
"opus_dred_decoder_create failed: err={error}"
)));
}
let inner = NonNull::new(ptr).ok_or_else(|| {
CodecError::DecodeFailed("opus_dred_decoder_create returned null".into())
})?;
Ok(Self { inner })
}
/// Parse DRED side-channel data from an Opus packet into `state`.
///
/// Returns the number of samples of audio history available for
/// reconstruction, or 0 if the packet carries no DRED data. Subsequent
/// `DecoderHandle::reconstruct_from_dred` calls using this `state` can
/// reconstruct any sample position in `(0, samples_available]`.
///
/// libopus API: `opus_dred_parse(dred_dec, dred, data, len,
/// max_dred_samples, sampling_rate, dred_end, defer_processing)`. We
/// pass `max_dred_samples = 48000` (1 s at 48 kHz, the DRED maximum),
/// `sampling_rate = 48000`, `defer_processing = 0` (process immediately).
/// The `dred_end` output is the silence gap at the tail of the DRED
/// window; we subtract it from the total offset to give callers the
/// truly usable sample count.
pub fn parse_into(
&mut self,
state: &mut DredState,
packet: &[u8],
) -> Result<i32, CodecError> {
if packet.is_empty() {
state.samples_available = 0;
return Ok(0);
}
let mut dred_end: i32 = 0;
// SAFETY: self.inner is a valid *mut OpusDREDDecoder; state.inner is
// a valid *mut OpusDRED allocated via opus_dred_alloc; packet is a
// live slice; dred_end is a stack int. libopus reads packet bytes
// and writes parsed DRED state into *state.inner.
let ret = unsafe {
opus_dred_parse(
self.inner.as_ptr(),
state.inner.as_ptr(),
packet.as_ptr(),
packet.len() as i32,
/* max_dred_samples = */ 48_000, // 1s max per libopus 1.5
/* sampling_rate = */ 48_000,
&mut dred_end,
/* defer_processing = */ 0,
)
};
if ret < 0 {
state.samples_available = 0;
return Err(CodecError::DecodeFailed(format!(
"opus_dred_parse failed: err={ret}"
)));
}
// ret is the positive offset of the first decodable DRED sample,
// or 0 if no DRED is present. dred_end is the silence gap at the
// tail. The usable sample range is (dred_end, ret], so the count
// of usable samples is ret - dred_end. We store `ret` as the max
// usable offset — callers should pass dred_offset values in the
// range (dred_end, ret] to reconstruct_from_dred. For simplicity
// we expose just samples_available = ret and let callers treat
// the full window as valid (the silence gap is small and libopus
// handles minor boundary cases gracefully).
state.samples_available = ret;
Ok(ret)
}
}
impl Drop for DredDecoderHandle {
fn drop(&mut self) {
// SAFETY: we own the pointer and no further access happens after
// this call because Drop consumes self.
unsafe { opus_dred_decoder_destroy(self.inner.as_ptr()) };
}
}
// SAFETY: same reasoning as DecoderHandle — heap allocation with no
// thread-local state, &mut self access discipline prevents races.
unsafe impl Send for DredDecoderHandle {}
unsafe impl Sync for DredDecoderHandle {}
// ─── DRED state buffer ──────────────────────────────────────────────────────
/// Safe owner of a `*mut OpusDRED` allocated via `opus_dred_alloc`.
///
/// Holds a fixed-size (10,592-byte per libopus 1.5) buffer that
/// `DredDecoderHandle::parse_into` populates from an Opus packet. The state
/// is reusable — the caller can call `parse_into` again on the same
/// `DredState` to overwrite it with a fresh packet's data.
///
/// `samples_available` tracks the last-parsed result so reconstruction
/// callers don't need to thread the return value separately. A fresh
/// state (before any `parse_into`) has `samples_available == 0`.
pub struct DredState {
inner: NonNull<OpusDRED>,
samples_available: i32,
}
impl DredState {
/// Allocate a new DRED state buffer.
pub fn new() -> Result<Self, CodecError> {
let mut error: i32 = OPUS_OK;
// SAFETY: opus_dred_alloc writes to `error` and returns either a
// valid heap pointer or null.
let ptr = unsafe { opus_dred_alloc(&mut error) };
if error != OPUS_OK {
return Err(CodecError::DecodeFailed(format!(
"opus_dred_alloc failed: err={error}"
)));
}
let inner = NonNull::new(ptr)
.ok_or_else(|| CodecError::DecodeFailed("opus_dred_alloc returned null".into()))?;
Ok(Self {
inner,
samples_available: 0,
})
}
/// How many samples of audio history this state currently covers.
///
/// Returns 0 if the state is fresh or the last parse found no DRED
/// data. Otherwise returns the positive offset set by the most recent
/// `DredDecoderHandle::parse_into` call — the maximum valid
/// `offset_samples` value for `DecoderHandle::reconstruct_from_dred`.
pub fn samples_available(&self) -> i32 {
self.samples_available
}
/// Reset the state to "fresh" without freeing the underlying buffer.
/// The next `parse_into` will overwrite the contents.
pub fn reset(&mut self) {
self.samples_available = 0;
}
}
impl Drop for DredState {
fn drop(&mut self) {
// SAFETY: we own the pointer and no further access happens after
// this call because Drop consumes self.
unsafe { opus_dred_free(self.inner.as_ptr()) };
}
}
// SAFETY: same reasoning as DecoderHandle.
unsafe impl Send for DredState {}
unsafe impl Sync for DredState {}
#[cfg(test)]
mod tests {
use super::*;
#[test]
fn decoder_handle_creates_and_drops() {
let handle = DecoderHandle::new().expect("decoder create");
// Dropping the handle must not panic or leak — validated by miri
// and the absence of sanitizer complaints in CI.
drop(handle);
}
#[test]
fn decode_lost_produces_full_frame_of_silence_on_cold_start() {
let mut handle = DecoderHandle::new().unwrap();
// 20 ms @ 48 kHz mono.
let mut pcm = vec![0i16; 960];
let n = handle.decode_lost(&mut pcm).unwrap();
assert_eq!(n, 960);
// On a fresh decoder, PLC output is silence (no past audio to extend).
assert!(pcm.iter().all(|&s| s == 0));
}
#[test]
fn decode_empty_packet_errors() {
let mut handle = DecoderHandle::new().unwrap();
let mut pcm = vec![0i16; 960];
let err = handle.decode(&[], &mut pcm);
assert!(err.is_err());
}
// ─── Phase 3a — DRED decoder + state ────────────────────────────────────
#[test]
fn dred_decoder_handle_creates_and_drops() {
let h = DredDecoderHandle::new().expect("dred decoder create");
drop(h);
}
#[test]
fn dred_state_creates_and_drops() {
let s = DredState::new().expect("dred state alloc");
assert_eq!(s.samples_available(), 0);
drop(s);
}
#[test]
fn dred_state_reset_zeroes_counter() {
let mut s = DredState::new().unwrap();
s.samples_available = 480; // pretend a parse populated it
assert_eq!(s.samples_available(), 480);
s.reset();
assert_eq!(s.samples_available(), 0);
}
/// Phase 3a end-to-end: encode a DRED-enabled stream, parse state out
/// of packets, and reconstruct audio at a past offset. Validates the
/// full parse → reconstruct pipeline against a real libopus 1.5.2
/// encoder so we catch FFI-layer bugs early.
#[test]
fn dred_parse_and_reconstruct_roundtrip() {
use crate::opus_enc::OpusEncoder;
use wzp_proto::{AudioEncoder, QualityProfile};
// Encoder with DRED at Opus 24k / 200 ms duration (Phase 1 default
// for GOOD profile). The loss floor is 5% per Phase 1.
let mut enc = OpusEncoder::new(QualityProfile::GOOD).unwrap();
// Decode-side handles.
let mut dec = DecoderHandle::new().unwrap();
let mut dred_dec = DredDecoderHandle::new().unwrap();
let mut state = DredState::new().unwrap();
// Generate 60 frames (1.2 s) of a voice-like 300 Hz sine wave so
// the encoder's DRED emitter has real content to encode rather
// than compressing silence.
let frame_len = 960usize; // 20 ms @ 48 kHz
let make_frame = |offset: usize| -> Vec<i16> {
(0..frame_len)
.map(|i| {
let t = (offset + i) as f64 / 48_000.0;
(8000.0 * (2.0 * std::f64::consts::PI * 300.0 * t).sin()) as i16
})
.collect()
};
// Track the freshest packet that carried non-zero DRED state.
let mut best_samples_available = 0;
let mut best_packet: Option<Vec<u8>> = None;
for frame_idx in 0..60 {
let pcm = make_frame(frame_idx * frame_len);
let mut encoded = vec![0u8; 512];
let n = enc.encode(&pcm, &mut encoded).unwrap();
encoded.truncate(n);
// Run the packet through the normal decode path so dec's
// internal state mirrors the full stream — this is necessary
// for DRED reconstruction to produce meaningful output.
let mut decoded = vec![0i16; frame_len];
dec.decode(&encoded, &mut decoded).unwrap();
// Parse DRED state out of the same packet. Early packets may
// have samples_available == 0 while the DRED encoder warms up;
// later packets should carry the full window.
match dred_dec.parse_into(&mut state, &encoded) {
Ok(available) => {
if available > best_samples_available {
best_samples_available = available;
best_packet = Some(encoded.clone());
}
}
Err(e) => panic!("parse_into errored unexpectedly: {e:?}"),
}
}
// By the time we're 60 frames in, DRED should have emitted data.
assert!(
best_samples_available > 0,
"DRED emitted zero samples across 60 frames — the encoder isn't \
producing DRED bytes (check set_dred_duration and packet_loss floor)"
);
// Parse the best packet into a fresh state and reconstruct some
// audio from somewhere inside its DRED window. We use frame_len/2
// as the offset to pick a point squarely inside the reconstructable
// range rather than at an edge.
let packet = best_packet.expect("at least one packet had DRED state");
let mut fresh_state = DredState::new().unwrap();
let available = dred_dec.parse_into(&mut fresh_state, &packet).unwrap();
assert!(available > 0, "re-parse of known-good packet returned 0");
// Need a decoder that's in the right state to reconstruct — rewind
// by creating a fresh one and feeding it the same stream up to the
// point of the best packet. Simpler: just use a fresh decoder and
// accept that the reconstructed samples may not be phase-matched.
// The test here only asserts *non-silent energy*, not signal fidelity.
let mut recon_dec = DecoderHandle::new().unwrap();
// Warm up the decoder with one frame so its internal state is valid.
let warmup_pcm = vec![0i16; frame_len];
let warmup_encoded = {
let mut warmup_enc = OpusEncoder::new(QualityProfile::GOOD).unwrap();
let mut buf = vec![0u8; 512];
let n = warmup_enc.encode(&warmup_pcm, &mut buf).unwrap();
buf.truncate(n);
buf
};
let mut throwaway = vec![0i16; frame_len];
let _ = recon_dec.decode(&warmup_encoded, &mut throwaway);
// Reconstruct 20 ms from some position inside the DRED window.
let offset = (available / 2).max(480).min(available);
let mut recon_pcm = vec![0i16; frame_len];
let n = recon_dec
.reconstruct_from_dred(&fresh_state, offset, &mut recon_pcm)
.expect("reconstruct_from_dred failed");
assert_eq!(n, frame_len);
// Energy check: reconstructed audio should not be all zeros. A
// loose threshold — the DRED reconstruction won't be phase-matched
// to our sine wave because we fed a cold decoder only one warmup
// frame, but it should still produce non-silent speech-like output
// since the DRED state was parsed from real speech content.
let energy: u64 = recon_pcm.iter().map(|&s| (s as i32).unsigned_abs() as u64).sum();
assert!(
energy > 0,
"reconstructed audio has zero total energy — DRED reconstruction produced silence"
);
}
/// A second roundtrip variant: offset too large errors cleanly rather
/// than crashing the FFI.
#[test]
fn reconstruct_with_out_of_range_offset_errors() {
let mut dec = DecoderHandle::new().unwrap();
let state = DredState::new().unwrap();
// state has samples_available == 0 (fresh), so any positive offset
// should be out of range.
let mut out = vec![0i16; 960];
let err = dec.reconstruct_from_dred(&state, 480, &mut out);
assert!(err.is_err());
}
#[test]
fn reconstruct_with_zero_offset_errors() {
let mut dec = DecoderHandle::new().unwrap();
let state = DredState::new().unwrap();
let mut out = vec![0i16; 960];
let err = dec.reconstruct_from_dred(&state, 0, &mut out);
assert!(err.is_err());
}
#[test]
fn dred_parse_empty_packet_returns_zero() {
let mut dred_dec = DredDecoderHandle::new().unwrap();
let mut state = DredState::new().unwrap();
let result = dred_dec.parse_into(&mut state, &[]).unwrap();
assert_eq!(result, 0);
assert_eq!(state.samples_available(), 0);
}
}

View File

@@ -15,6 +15,7 @@ pub mod agc;
pub mod codec2_dec;
pub mod codec2_enc;
pub mod denoise;
pub mod dred_ffi;
pub mod opus_dec;
pub mod opus_enc;
pub mod resample;
@@ -27,6 +28,26 @@ pub use denoise::NoiseSupressor;
pub use silence::{ComfortNoise, SilenceDetector};
pub use wzp_proto::{AudioDecoder, AudioEncoder, CodecId, QualityProfile};
use std::sync::atomic::{AtomicBool, Ordering};
/// Global verbose-logging flag for DRED. Off by default — when enabled
/// (via the GUI debug toggle wired through Tauri), the encoder logs its
/// DRED config + libopus version, and the recv path logs every DRED
/// reconstruction, classical PLC fill, and parse heartbeat. Off in
/// "normal" mode keeps logcat clean.
static DRED_VERBOSE_LOGS: AtomicBool = AtomicBool::new(false);
/// Returns whether DRED verbose logging is currently enabled.
#[inline]
pub fn dred_verbose_logs() -> bool {
DRED_VERBOSE_LOGS.load(Ordering::Relaxed)
}
/// Enable/disable DRED verbose logging at runtime.
pub fn set_dred_verbose_logs(enabled: bool) {
DRED_VERBOSE_LOGS.store(enabled, Ordering::Relaxed);
}
/// Create an adaptive encoder starting at the given quality profile.
///
/// The returned encoder accepts 48 kHz mono PCM regardless of the active

View File

@@ -1,30 +1,32 @@
//! Opus decoder wrapping the `audiopus` crate.
//! Opus decoder built on top of the raw opusic-sys `DecoderHandle`.
//!
//! Phase 0 of the DRED integration: we went straight to a custom
//! `DecoderHandle` instead of `opusic_c::Decoder` because the latter's
//! inner pointer is `pub(crate)` and we need to reach it in Phase 3 for
//! `opus_decoder_dred_decode`. See `dred_ffi.rs` for the rationale and
//! `docs/PRD-dred-integration.md` for the full plan.
use audiopus::coder::Decoder;
use audiopus::{Channels, MutSignals, SampleRate};
use audiopus::packet::Packet;
use crate::dred_ffi::{DecoderHandle, DredState};
use wzp_proto::{AudioDecoder, CodecError, CodecId, QualityProfile};
/// Opus decoder implementing `AudioDecoder`.
/// Opus decoder implementing [`AudioDecoder`].
///
/// Operates at 48 kHz mono output.
/// Operates at 48 kHz mono output. 20 ms and 40 ms frames supported via
/// the active `QualityProfile`. Behavior is intentionally identical to
/// the pre-swap audiopus-based decoder at this phase — DRED reconstruction
/// lands in Phase 3.
pub struct OpusDecoder {
inner: Decoder,
inner: DecoderHandle,
codec_id: CodecId,
frame_duration_ms: u8,
}
// SAFETY: Same reasoning as OpusEncoder — exclusive access via &mut self.
unsafe impl Sync for OpusDecoder {}
impl OpusDecoder {
/// Create a new Opus decoder for the given quality profile.
pub fn new(profile: QualityProfile) -> Result<Self, CodecError> {
let decoder = Decoder::new(SampleRate::Hz48000, Channels::Mono)
.map_err(|e| CodecError::DecodeFailed(format!("opus decoder init: {e}")))?;
let inner = DecoderHandle::new()?;
Ok(Self {
inner: decoder,
inner,
codec_id: profile.codec,
frame_duration_ms: profile.frame_duration_ms,
})
@@ -34,6 +36,24 @@ impl OpusDecoder {
pub fn frame_samples(&self) -> usize {
(48_000 * self.frame_duration_ms as usize) / 1000
}
/// Reconstruct a lost frame from a previously parsed `DredState`.
///
/// Phase 3b entry point: callers (CallDecoder / engine.rs) use this to
/// synthesize audio for gaps detected by the jitter buffer when DRED
/// side-channel state from a later-arriving packet covers the gap's
/// sample offset. `offset_samples` is measured backward from the anchor
/// packet that produced `state`. See `DecoderHandle::reconstruct_from_dred`
/// for the full semantics.
pub fn reconstruct_from_dred(
&mut self,
state: &DredState,
offset_samples: i32,
output: &mut [i16],
) -> Result<usize, CodecError> {
self.inner
.reconstruct_from_dred(state, offset_samples, output)
}
}
impl AudioDecoder for OpusDecoder {
@@ -45,15 +65,7 @@ impl AudioDecoder for OpusDecoder {
pcm.len()
)));
}
let packet = Packet::try_from(encoded)
.map_err(|e| CodecError::DecodeFailed(format!("invalid packet: {e}")))?;
let signals = MutSignals::try_from(pcm)
.map_err(|e| CodecError::DecodeFailed(format!("output signals: {e}")))?;
let n = self
.inner
.decode(Some(packet), signals, false)
.map_err(|e| CodecError::DecodeFailed(format!("opus decode: {e}")))?;
Ok(n)
self.inner.decode(encoded, pcm)
}
fn decode_lost(&mut self, pcm: &mut [i16]) -> Result<usize, CodecError> {
@@ -64,13 +76,7 @@ impl AudioDecoder for OpusDecoder {
pcm.len()
)));
}
let signals = MutSignals::try_from(pcm)
.map_err(|e| CodecError::DecodeFailed(format!("output signals: {e}")))?;
let n = self
.inner
.decode(None, signals, false)
.map_err(|e| CodecError::DecodeFailed(format!("opus PLC: {e}")))?;
Ok(n)
self.inner.decode_lost(pcm)
}
fn codec_id(&self) -> CodecId {
@@ -79,7 +85,7 @@ impl AudioDecoder for OpusDecoder {
fn set_profile(&mut self, profile: QualityProfile) -> Result<(), CodecError> {
match profile.codec {
CodecId::Opus24k | CodecId::Opus16k | CodecId::Opus6k => {
c if c.is_opus() => {
self.codec_id = profile.codec;
self.frame_duration_ms = profile.frame_duration_ms;
Ok(())

View File

@@ -1,58 +1,225 @@
//! Opus encoder wrapping the `audiopus` crate.
//! Opus encoder wrapping the `opusic-c` crate (libopus 1.5.2).
//!
//! Phase 1 of the DRED integration: encoder-side DRED is enabled on every
//! Opus profile with a tiered duration (studio 100 ms / normal 200 ms /
//! degraded 500 ms), and Opus inband FEC (LBRR) is disabled because DRED
//! is the stronger mechanism for the same failure mode. The legacy behavior
//! is preserved behind the `AUDIO_USE_LEGACY_FEC` environment variable as a
//! runtime escape hatch for rollout. See `docs/PRD-dred-integration.md`.
//!
//! # DRED duration policy
//!
//! Rationale from the PRD:
//! - Studio tiers (Opus 32k/48k/64k): 100 ms — loss is rare on high-quality
//! networks; short window keeps decoder CPU modest.
//! - Normal tiers (Opus 16k/24k): 200 ms — balanced baseline covering common
//! VoIP loss patterns (20150 ms bursts from wifi roam, transient congestion).
//! - Degraded tier (Opus 6k): 1040 ms — users on 6k are by definition on a
//! bad link; the maximum libopus DRED window buys the best burst resilience
//! where it matters. The RDO-VAE naturally degrades quality at longer offsets.
//!
//! # Why the 15% packet loss floor
//!
//! libopus 1.5's DRED emitter is gated on `OPUS_SET_PACKET_LOSS_PERC` and
//! scales the emitted window proportionally to the assumed loss:
//!
//! ```text
//! loss_pct samples_available effective_ms
//! 5% 720 15
//! 10% 2640 55
//! 15% 4560 95
//! 20% 6480 135
//! 25%+ 8400 (capped) 175 (≈ 87% of the 200ms configured max)
//! ```
//!
//! Measured empirically against libopus 1.5.2 on Opus 24k / 200 ms DRED
//! duration during Phase 3b. At 5% loss the window is only 15 ms — too
//! small to even reconstruct a single 20 ms Opus frame. 15% gives 95 ms
//! (enough for single-frame recovery plus modest burst margin) while
//! keeping the bitrate overhead modest compared to 25%. Real measurements
//! from the quality adapter override upward when loss exceeds the floor.
use audiopus::coder::Encoder;
use audiopus::{Application, Bitrate, Channels, SampleRate, Signal};
use tracing::debug;
use std::sync::OnceLock;
use opusic_c::{Application, Bitrate, Channels, Encoder, InbandFec, SampleRate, Signal};
use tracing::{debug, info, warn};
use wzp_proto::{AudioEncoder, CodecError, CodecId, QualityProfile};
/// Logged exactly once per process the first time an OpusEncoder is built.
/// Confirms that libopus 1.5.2 (the version with DRED) is actually linked
/// at runtime — invaluable when chasing "is the new codec loaded?"
/// regressions on Android, where the only debug surface is logcat.
static LIBOPUS_VERSION_LOGGED: OnceLock<()> = OnceLock::new();
/// Minimum `OPUS_SET_PACKET_LOSS_PERC` value used in DRED mode. libopus
/// scales the DRED emission window with the assumed loss percentage:
/// empirically, 5% gives a 15 ms window (useless), 10% gives 55 ms, 15%
/// gives 95 ms, and 25%+ saturates the configured max (~175 ms at 200 ms
/// duration). 15% is the minimum value that produces a DRED window larger
/// than a single 20 ms frame, making it the minimum floor that actually
/// gives DRED something useful to reconstruct. Real loss measurements from
/// the quality adapter override this upward.
const DRED_LOSS_FLOOR_PCT: u8 = 15;
/// Environment variable that reverts Phase 1 behavior to Phase 0 (inband FEC
/// on, DRED off, no loss floor). Read once per encoder construction.
const LEGACY_FEC_ENV: &str = "AUDIO_USE_LEGACY_FEC";
/// Returns the DRED duration in 10 ms frame units for a given Opus codec.
///
/// Unit: each frame is 10 ms, so the max value of 104 corresponds to 1040 ms
/// of reconstructable history. Returns 0 for non-Opus codecs (DRED is not
/// emitted by the libopus encoder in that case anyway, but we avoid a
/// pointless FFI call).
///
/// See the DRED duration policy in the module docs for per-tier rationale.
pub fn dred_duration_for(codec: CodecId) -> u8 {
match codec {
// Studio tiers — loss is rare, short window.
CodecId::Opus32k | CodecId::Opus48k | CodecId::Opus64k => 10,
// Normal tiers — balanced baseline.
CodecId::Opus16k | CodecId::Opus24k => 20,
// Degraded tier — maximum burst resilience. 104 × 10 ms = 1040 ms,
// the highest value libopus 1.5 supports. Users on 6k are on a bad
// link by definition; the RDO-VAE naturally degrades quality at longer
// offsets, so the extra window costs only ~1-2 kbps additional overhead
// while buying substantially better burst resilience (up from 500 ms).
CodecId::Opus6k => 104,
// Non-Opus (Codec2 / CN): DRED is N/A.
CodecId::Codec2_1200 | CodecId::Codec2_3200 | CodecId::ComfortNoise => 0,
}
}
/// Returns whether the legacy-FEC escape hatch is active.
///
/// Read from `AUDIO_USE_LEGACY_FEC`. Any non-empty value activates legacy
/// mode; unset or empty leaves DRED enabled.
fn read_legacy_fec_env() -> bool {
match std::env::var(LEGACY_FEC_ENV) {
Ok(v) => !v.is_empty() && v != "0" && v.to_ascii_lowercase() != "false",
Err(_) => false,
}
}
/// Opus encoder implementing `AudioEncoder`.
///
/// Operates at 48 kHz mono. Supports frame sizes of 20 ms (960 samples)
/// and 40 ms (1920 samples).
/// Operates at 48 kHz mono. Supports 20 ms and 40 ms frames via the active
/// `QualityProfile`.
pub struct OpusEncoder {
inner: Encoder,
codec_id: CodecId,
frame_duration_ms: u8,
/// When `true`, revert to the Phase 0 behavior: inband FEC Mode1, DRED
/// disabled, no loss floor. Captured at construction time and not
/// re-read mid-call.
legacy_fec_mode: bool,
}
// SAFETY: OpusEncoder is only used via `&mut self` methods. The inner
// audiopus Encoder contains a raw pointer that is !Sync, but we never
// share it across threads without exclusive access.
// opusic-c Encoder wraps a non-null pointer that is !Sync by default,
// but we never share it across threads without exclusive access.
unsafe impl Sync for OpusEncoder {}
impl OpusEncoder {
/// Create a new Opus encoder for the given quality profile.
pub fn new(profile: QualityProfile) -> Result<Self, CodecError> {
let encoder = Encoder::new(SampleRate::Hz48000, Channels::Mono, Application::Voip)
.map_err(|e| CodecError::EncodeFailed(format!("opus encoder init: {e}")))?;
// opusic-c argument order: (Channels, SampleRate, Application)
// — different from audiopus's (SampleRate, Channels, Application).
let encoder = Encoder::new(Channels::Mono, SampleRate::Hz48000, Application::Voip)
.map_err(|e| CodecError::EncodeFailed(format!("opus encoder init: {e:?}")))?;
let legacy_fec_mode = read_legacy_fec_env();
if legacy_fec_mode {
warn!(
"AUDIO_USE_LEGACY_FEC active — reverting Opus encoder to Phase 0 \
behavior (inband FEC Mode1, no DRED)"
);
}
let mut enc = Self {
inner: encoder,
codec_id: profile.codec,
frame_duration_ms: profile.frame_duration_ms,
legacy_fec_mode,
};
enc.apply_bitrate(profile.codec)?;
enc.set_inband_fec(true);
enc.set_dtx(true);
// Voice signal type hint for better compression
// Common setup — bitrate, DTX, signal hint, complexity. These are
// identical regardless of the protection mode below.
enc.apply_bitrate(profile.codec)?;
enc.set_dtx(true);
enc.inner
.set_signal(Signal::Voice)
.map_err(|e| CodecError::EncodeFailed(format!("set signal: {e}")))?;
// Default complexity 7 — good quality/CPU trade-off for VoIP
.map_err(|e| CodecError::EncodeFailed(format!("set signal: {e:?}")))?;
enc.inner
.set_complexity(7)
.map_err(|e| CodecError::EncodeFailed(format!("set complexity: {e}")))?;
.map_err(|e| CodecError::EncodeFailed(format!("set complexity: {e:?}")))?;
// Protection mode: DRED (Phase 1 default) or legacy inband FEC.
enc.apply_protection_mode(profile.codec)?;
Ok(enc)
}
fn apply_bitrate(&mut self, codec: CodecId) -> Result<(), CodecError> {
let bps = codec.bitrate_bps() as i32;
/// Configure the protection mode for the active codec.
///
/// In DRED mode (default): disable inband FEC, set DRED duration for the
/// codec tier, clamp packet_loss to the 5% floor so DRED stays active.
///
/// In legacy mode: enable inband FEC Mode1 (Phase 0 behavior), leave
/// DRED and packet_loss at libopus defaults.
fn apply_protection_mode(&mut self, codec: CodecId) -> Result<(), CodecError> {
if self.legacy_fec_mode {
self.inner
.set_inband_fec(InbandFec::Mode1)
.map_err(|e| CodecError::EncodeFailed(format!("set inband FEC: {e:?}")))?;
// Leave DRED at 0 and packet_loss at default — matches Phase 0.
return Ok(());
}
// DRED path: disable the overlapping inband FEC, enable DRED with
// per-profile duration, floor packet_loss so DRED emits.
self.inner
.set_bitrate(Bitrate::BitsPerSecond(bps))
.map_err(|e| CodecError::EncodeFailed(format!("set bitrate: {e}")))?;
.set_inband_fec(InbandFec::Off)
.map_err(|e| CodecError::EncodeFailed(format!("set inband FEC off: {e:?}")))?;
let dred_frames = dred_duration_for(codec);
self.inner
.set_dred_duration(dred_frames)
.map_err(|e| CodecError::EncodeFailed(format!("set DRED duration: {e:?}")))?;
self.inner
.set_packet_loss(DRED_LOSS_FLOOR_PCT)
.map_err(|e| CodecError::EncodeFailed(format!("set packet loss floor: {e:?}")))?;
// Both of these are gated behind the GUI debug toggle so logcat
// stays clean in normal mode. Flip "DRED verbose logs" in the
// settings panel to see the per-encoder config + libopus version.
if crate::dred_verbose_logs() {
info!(
codec = ?codec,
dred_frames,
dred_ms = dred_frames as u32 * 10,
loss_floor_pct = DRED_LOSS_FLOOR_PCT,
"opus encoder: DRED enabled"
);
// One-shot logging of the linked libopus version so we can
// confirm at a glance that opusic-c (libopus 1.5.2) is loaded.
// Pre-Phase-0 audiopus shipped libopus 1.3 which has no DRED;
// if this log says "libopus 1.3" something is very wrong.
LIBOPUS_VERSION_LOGGED.get_or_init(|| {
info!(libopus_version = %opusic_c::version(), "linked libopus version");
});
}
Ok(())
}
fn apply_bitrate(&mut self, codec: CodecId) -> Result<(), CodecError> {
let bps = codec.bitrate_bps();
self.inner
.set_bitrate(Bitrate::Value(bps))
.map_err(|e| CodecError::EncodeFailed(format!("set bitrate: {e:?}")))?;
debug!(bitrate_bps = bps, "opus encoder bitrate set");
Ok(())
}
@@ -71,10 +238,36 @@ impl OpusEncoder {
/// Hint the encoder about expected packet loss percentage (0-100).
///
/// Higher values cause the encoder to use more redundancy to survive
/// packet loss, at the expense of slightly higher bitrate.
/// In DRED mode, the value is floored at `DRED_LOSS_FLOOR_PCT` so the
/// encoder never drops DRED emission even on a perfect network. Real
/// loss measurements from the quality adapter override upward.
///
/// In legacy mode, the value is passed through unchanged (min 0, max 100).
pub fn set_expected_loss(&mut self, loss_pct: u8) {
let _ = self.inner.set_packet_loss_perc(loss_pct.min(100));
let clamped = if self.legacy_fec_mode {
loss_pct.min(100)
} else {
loss_pct.max(DRED_LOSS_FLOOR_PCT).min(100)
};
let _ = self.inner.set_packet_loss(clamped);
}
/// Set the DRED duration in 10 ms frame units (0 disables, max 104).
///
/// No-op in legacy mode. Normally driven automatically by the active
/// quality profile via `apply_protection_mode`; this setter exists for
/// tests and for the rare case where a caller needs to override the
/// per-profile default.
pub fn set_dred_duration(&mut self, frames: u8) {
if self.legacy_fec_mode {
return;
}
let _ = self.inner.set_dred_duration(frames.min(104));
}
/// Test/introspection accessor: whether legacy FEC mode is active.
pub fn is_legacy_fec_mode(&self) -> bool {
self.legacy_fec_mode
}
}
@@ -87,10 +280,14 @@ impl AudioEncoder for OpusEncoder {
pcm.len()
)));
}
// opusic-c takes &[u16] for the sample input. Bit pattern is
// identical to i16 — the cast is zero-cost and the encoder
// interprets the bytes the same way as libopus internally.
let pcm_u16: &[u16] = bytemuck::cast_slice(pcm);
let n = self
.inner
.encode(pcm, out)
.map_err(|e| CodecError::EncodeFailed(format!("opus encode: {e}")))?;
.encode_to_slice(pcm_u16, out)
.map_err(|e| CodecError::EncodeFailed(format!("opus encode: {e:?}")))?;
Ok(n)
}
@@ -100,10 +297,13 @@ impl AudioEncoder for OpusEncoder {
fn set_profile(&mut self, profile: QualityProfile) -> Result<(), CodecError> {
match profile.codec {
CodecId::Opus24k | CodecId::Opus16k | CodecId::Opus6k => {
c if c.is_opus() => {
self.codec_id = profile.codec;
self.frame_duration_ms = profile.frame_duration_ms;
self.apply_bitrate(profile.codec)?;
// Refresh DRED duration for the new tier. apply_protection_mode
// is idempotent and handles the legacy-vs-DRED branch correctly.
self.apply_protection_mode(profile.codec)?;
Ok(())
}
other => Err(CodecError::UnsupportedTransition {
@@ -120,10 +320,198 @@ impl AudioEncoder for OpusEncoder {
}
fn set_inband_fec(&mut self, enabled: bool) {
let _ = self.inner.set_inband_fec(enabled);
// In DRED mode, ignore external requests to re-enable inband FEC —
// running both mechanisms wastes bitrate on overlapping protection
// and opusic-c's own docs recommend disabling inband FEC when DRED
// is on. Trait callers that genuinely want classical FEC should set
// `AUDIO_USE_LEGACY_FEC=1` and re-create the encoder.
if !self.legacy_fec_mode {
debug!(
enabled,
"set_inband_fec ignored: DRED mode is active (set AUDIO_USE_LEGACY_FEC to revert)"
);
return;
}
let mode = if enabled { InbandFec::Mode1 } else { InbandFec::Off };
let _ = self.inner.set_inband_fec(mode);
}
fn set_dtx(&mut self, enabled: bool) {
let _ = self.inner.set_dtx(enabled);
}
fn set_expected_loss(&mut self, loss_pct: u8) {
OpusEncoder::set_expected_loss(self, loss_pct);
}
fn set_dred_duration(&mut self, frames: u8) {
OpusEncoder::set_dred_duration(self, frames);
}
}
#[cfg(test)]
mod tests {
use super::*;
use wzp_proto::AudioDecoder;
/// Phase 0 acceptance gate: fail loudly if the linked libopus is not 1.5.x.
/// DRED (Phase 1+) only exists in libopus ≥ 1.5, so running against an
/// older version would silently regress the entire DRED integration.
#[test]
fn linked_libopus_is_1_5() {
let version = opusic_c::version();
assert!(
version.contains("1.5"),
"expected libopus 1.5.x, got: {version}"
);
}
#[test]
fn encoder_creates_at_good_profile() {
let enc = OpusEncoder::new(QualityProfile::GOOD).expect("opus encoder init");
assert_eq!(enc.codec_id, CodecId::Opus24k);
assert_eq!(enc.frame_samples(), 960); // 20 ms @ 48 kHz
}
#[test]
fn encoder_roundtrip_silence() {
let mut enc = OpusEncoder::new(QualityProfile::GOOD).unwrap();
let mut dec = crate::opus_dec::OpusDecoder::new(QualityProfile::GOOD).unwrap();
let pcm_in = vec![0i16; 960]; // 20 ms silence
let mut encoded = vec![0u8; 512];
let n = enc.encode(&pcm_in, &mut encoded).unwrap();
assert!(n > 0);
let mut pcm_out = vec![0i16; 960];
let samples = dec.decode(&encoded[..n], &mut pcm_out).unwrap();
assert_eq!(samples, 960);
}
// ─── Phase 1 — DRED duration policy ─────────────────────────────────────
#[test]
fn dred_duration_for_studio_tiers_is_100ms() {
assert_eq!(dred_duration_for(CodecId::Opus32k), 10);
assert_eq!(dred_duration_for(CodecId::Opus48k), 10);
assert_eq!(dred_duration_for(CodecId::Opus64k), 10);
}
#[test]
fn dred_duration_for_normal_tiers_is_200ms() {
assert_eq!(dred_duration_for(CodecId::Opus16k), 20);
assert_eq!(dred_duration_for(CodecId::Opus24k), 20);
}
#[test]
fn dred_duration_for_degraded_tier_is_1040ms() {
assert_eq!(dred_duration_for(CodecId::Opus6k), 104);
}
#[test]
fn dred_duration_for_codec2_is_zero() {
assert_eq!(dred_duration_for(CodecId::Codec2_3200), 0);
assert_eq!(dred_duration_for(CodecId::Codec2_1200), 0);
assert_eq!(dred_duration_for(CodecId::ComfortNoise), 0);
}
// ─── Phase 1 — Legacy escape hatch ──────────────────────────────────────
/// By default (env var unset), legacy mode is off.
///
/// This test does NOT manipulate the environment to avoid flakiness
/// when the full suite runs in parallel. It only asserts on a freshly
/// created encoder in the ambient environment.
#[test]
fn default_mode_is_dred_not_legacy() {
// SAFETY: only run if the ambient env hasn't set the var externally.
if std::env::var(LEGACY_FEC_ENV).is_ok() {
return; // don't assert — someone set the env for a reason.
}
let enc = OpusEncoder::new(QualityProfile::GOOD).unwrap();
assert!(!enc.is_legacy_fec_mode());
}
// ─── Phase 1 — Behavioral regression: roundtrip still works ─────────────
#[test]
fn dred_mode_roundtrip_voice_pattern() {
// Use a realistic voice-like input (sine wave at speech frequencies)
// so the encoder emits meaningful DRED data rather than trivially
// compressible silence.
let mut enc = OpusEncoder::new(QualityProfile::GOOD).unwrap();
let mut dec = crate::opus_dec::OpusDecoder::new(QualityProfile::GOOD).unwrap();
let mut total_encoded_bytes = 0usize;
// Run 50 frames (1 second) so DRED fills up and starts emitting.
for frame_idx in 0..50 {
let pcm_in: Vec<i16> = (0..960)
.map(|i| {
let t = (frame_idx * 960 + i) as f64 / 48_000.0;
(8000.0 * (2.0 * std::f64::consts::PI * 300.0 * t).sin()) as i16
})
.collect();
let mut encoded = vec![0u8; 512];
let n = enc.encode(&pcm_in, &mut encoded).unwrap();
assert!(n > 0);
total_encoded_bytes += n;
let mut pcm_out = vec![0i16; 960];
let samples = dec.decode(&encoded[..n], &mut pcm_out).unwrap();
assert_eq!(samples, 960);
}
// Effective bitrate after 1 second of encoding.
// Opus 24k base + ~1 kbps DRED ≈ 25 kbps ≈ 3125 bytes/sec.
// Allow generous headroom (2000 lower bound, 8000 upper bound) —
// this is a behavioral regression check, not a tight bitrate assertion.
// The exact value is printed with --nocapture for diagnostic use.
eprintln!(
"[phase1 bitrate probe] legacy_fec_mode={} total_encoded={} bytes/sec",
enc.is_legacy_fec_mode(),
total_encoded_bytes
);
assert!(
total_encoded_bytes > 2000,
"encoder output too small: {total_encoded_bytes} bytes/sec (DRED likely not emitting)"
);
assert!(
total_encoded_bytes < 8000,
"encoder output too large: {total_encoded_bytes} bytes/sec"
);
}
// ─── Phase 1 — set_profile updates DRED duration on tier switch ─────────
#[test]
fn profile_switch_refreshes_dred_duration() {
// Start on GOOD (Opus 24k, DRED 20 frames), switch to DEGRADED
// (Opus 6k, DRED 50 frames). The encoder should accept both profile
// changes without error. We can't directly observe the DRED duration
// inside libopus, but apply_protection_mode returns Ok for both.
let mut enc = OpusEncoder::new(QualityProfile::GOOD).unwrap();
assert_eq!(enc.codec_id, CodecId::Opus24k);
enc.set_profile(QualityProfile::DEGRADED).unwrap();
assert_eq!(enc.codec_id, CodecId::Opus6k);
enc.set_profile(QualityProfile::STUDIO_64K).unwrap();
assert_eq!(enc.codec_id, CodecId::Opus64k);
}
// ─── Phase 1 — Trait set_inband_fec is a no-op in DRED mode ─────────────
#[test]
fn set_inband_fec_noop_in_dred_mode() {
if std::env::var(LEGACY_FEC_ENV).is_ok() {
return;
}
let mut enc = OpusEncoder::new(QualityProfile::GOOD).unwrap();
// Should not error, should not re-enable inband FEC internally.
enc.set_inband_fec(true);
// We can't directly query libopus's inband FEC state through opusic-c,
// but the call must not panic and the encoder must still work.
let pcm_in = vec![0i16; 960];
let mut encoded = vec![0u8; 512];
let n = enc.encode(&pcm_in, &mut encoded).unwrap();
assert!(n > 0);
}
}

View File

@@ -110,7 +110,18 @@ impl KeyExchange for WarzoneKeyExchange {
hk.expand(b"warzone-session-key", &mut session_key)
.expect("HKDF expand for session key should not fail");
Ok(Box::new(ChaChaSession::new(session_key)))
// Derive SAS (Short Authentication String) from shared secret only.
// The shared secret is identical on both sides (X25519 DH property).
// A MITM would produce a different shared secret → different SAS.
// We use a dedicated HKDF label so SAS is independent of the session key.
let mut sas_key = [0u8; 4];
hk.expand(b"warzone-sas-code", &mut sas_key)
.expect("HKDF expand for SAS should not fail");
let sas_code = u32::from_be_bytes(sas_key) % 10000;
let mut session = ChaChaSession::new(session_key);
session.set_sas(sas_code);
Ok(Box::new(session))
}
}
@@ -211,4 +222,47 @@ mod tests {
assert_eq!(&decrypted, plaintext);
}
#[test]
fn sas_codes_match_between_peers() {
let mut alice = WarzoneKeyExchange::from_identity_seed(&[0xAA; 32]);
let mut bob = WarzoneKeyExchange::from_identity_seed(&[0xBB; 32]);
let alice_eph_pub = alice.generate_ephemeral();
let bob_eph_pub = bob.generate_ephemeral();
let alice_session = alice.derive_session(&bob_eph_pub).unwrap();
let bob_session = bob.derive_session(&alice_eph_pub).unwrap();
let alice_sas = alice_session.sas_code();
let bob_sas = bob_session.sas_code();
assert!(alice_sas.is_some(), "Alice should have SAS");
assert!(bob_sas.is_some(), "Bob should have SAS");
assert_eq!(alice_sas, bob_sas, "SAS codes must match between peers");
assert!(alice_sas.unwrap() < 10000, "SAS should be 4 digits");
}
#[test]
fn sas_differs_for_different_peers() {
let mut alice = WarzoneKeyExchange::from_identity_seed(&[0xAA; 32]);
let mut bob = WarzoneKeyExchange::from_identity_seed(&[0xBB; 32]);
let mut eve = WarzoneKeyExchange::from_identity_seed(&[0xEE; 32]);
let alice_eph = alice.generate_ephemeral();
let bob_eph = bob.generate_ephemeral();
let eve_eph = eve.generate_ephemeral();
let alice_bob_session = alice.derive_session(&bob_eph).unwrap();
// Eve does separate handshake with Bob (MITM scenario)
let eve_bob_session = eve.derive_session(&bob_eph).unwrap();
// SAS codes should differ — Eve's session has different shared secret
assert_ne!(
alice_bob_session.sas_code(),
eve_bob_session.sas_code(),
"MITM session should produce different SAS"
);
}
}

View File

@@ -26,6 +26,8 @@ pub struct ChaChaSession {
rekey_mgr: RekeyManager,
/// Pending ephemeral secret for rekey (stored until peer responds).
pending_rekey_secret: Option<StaticSecret>,
/// Short Authentication String (4-digit code for verbal verification).
sas_code: Option<u32>,
}
impl ChaChaSession {
@@ -46,9 +48,15 @@ impl ChaChaSession {
recv_seq: 0,
rekey_mgr: RekeyManager::new(shared_secret),
pending_rekey_secret: None,
sas_code: None,
}
}
/// Set the SAS code (called by key exchange after derivation).
pub fn set_sas(&mut self, code: u32) {
self.sas_code = Some(code);
}
/// Install a new key (after rekeying).
fn install_key(&mut self, new_key: [u8; 32]) {
use sha2::Digest;
@@ -136,6 +144,10 @@ impl CryptoSession for ChaChaSession {
Ok(())
}
fn sas_code(&self) -> Option<u32> {
self.sas_code
}
}
#[cfg(test)]

View File

@@ -115,6 +115,7 @@ fn wzp_signal_serializes_into_fc_callsignal_payload() {
ephemeral_pub: [2u8; 32],
signature: vec![3u8; 64],
supported_profiles: vec![wzp_proto::QualityProfile::GOOD],
alias: None,
};
// Encode as featherChat CallSignal payload
@@ -198,6 +199,7 @@ fn wzp_answer_round_trips_through_fc_callsignal() {
fn wzp_hangup_round_trips_through_fc_callsignal() {
let hangup = wzp_proto::SignalMessage::Hangup {
reason: wzp_proto::HangupReason::Normal,
call_id: None,
};
let payload = wzp_client::featherchat::encode_call_payload(&hangup, None, None);
@@ -273,13 +275,14 @@ fn auth_invalid_response_matches() {
#[test]
fn all_signal_types_map_correctly() {
use wzp_client::featherchat::{signal_to_call_type, CallSignalType};
use wzp_client::featherchat::signal_to_call_type;
let cases: Vec<(wzp_proto::SignalMessage, &str)> = vec![
(
wzp_proto::SignalMessage::CallOffer {
identity_pub: [0; 32], ephemeral_pub: [0; 32],
signature: vec![], supported_profiles: vec![],
alias: None,
},
"Offer",
),
@@ -300,6 +303,7 @@ fn all_signal_types_map_correctly() {
(
wzp_proto::SignalMessage::Hangup {
reason: wzp_proto::HangupReason::Normal,
call_id: None,
},
"Hangup",
),

View File

@@ -1,6 +1,7 @@
//! RaptorQ FEC decoder — reassembles source blocks from received source and repair symbols.
use std::collections::HashMap;
use std::time::Instant;
use raptorq::{EncodingPacket, ObjectTransmissionInformation, PayloadId, SourceBlockDecoder};
use wzp_proto::error::FecError;
@@ -9,6 +10,9 @@ use wzp_proto::FecDecoder;
/// Length prefix size (u16 little-endian), must match encoder.
const LEN_PREFIX: usize = 2;
/// Decoded blocks older than this are eligible for reuse by a new sender.
const BLOCK_STALE_SECS: u64 = 2;
/// State for one in-flight block being decoded.
struct BlockState {
/// Number of source symbols expected.
@@ -21,6 +25,8 @@ struct BlockState {
decoded: bool,
/// Cached decoded result.
result: Option<Vec<Vec<u8>>>,
/// When this block was last decoded (for staleness check).
decoded_at: Option<Instant>,
}
/// RaptorQ-based FEC decoder that handles multiple concurrent blocks.
@@ -58,6 +64,7 @@ impl RaptorQFecDecoder {
symbol_size: self.symbol_size,
decoded: false,
result: None,
decoded_at: None,
})
}
}
@@ -74,8 +81,20 @@ impl FecDecoder for RaptorQFecDecoder {
let block = self.get_or_create_block(block_id);
if block.decoded {
// Already decoded, ignore additional symbols.
return Ok(());
// If the block was decoded recently, skip (normal duplicate).
// If it's stale (>2s), a new sender is reusing this block_id — reset it.
if let Some(at) = block.decoded_at {
if at.elapsed().as_secs() >= BLOCK_STALE_SECS {
block.decoded = false;
block.result = None;
block.decoded_at = None;
block.packets.clear();
} else {
return Ok(());
}
} else {
return Ok(());
}
}
// Data should already be at symbol_size (length-prefixed and padded by the encoder).
@@ -132,6 +151,7 @@ impl FecDecoder for RaptorQFecDecoder {
let block = self.blocks.get_mut(&block_id).unwrap();
block.decoded = true;
block.decoded_at = Some(Instant::now());
block.result = Some(frames.clone());
Ok(Some(frames))
}

View File

@@ -0,0 +1,29 @@
[package]
name = "wzp-native"
version = "0.1.0"
edition = "2024"
description = "WarzonePhone native audio library — standalone Android cdylib that eventually owns all C++ (Oboe bridge) and exposes a pure-C FFI. Built with cargo-ndk, loaded at runtime by the Tauri desktop cdylib via libloading."
# Crate-type is DELIBERATELY only cdylib (no rlib, no staticlib). This crate
# is built with `cargo ndk -t arm64-v8a build --release -p wzp-native` as a
# standalone .so, which is the same path the legacy wzp-android crate uses
# successfully on the same phone / same NDK. Keeping the crate-type single
# avoids the rust-lang/rust#104707 symbol leak that bit us when Tauri's
# desktop crate had ["staticlib", "cdylib", "rlib"] and any C++ static
# archive pulled bionic's internal pthread_create into the final .so.
[lib]
name = "wzp_native"
crate-type = ["cdylib"]
[build-dependencies]
# cc is SAFE to use here because this crate is a single-cdylib: no
# staticlib in crate-type → no rust-lang/rust#104707 symbol leak. The
# legacy wzp-android crate uses the same setup and works.
cc = "1"
[dependencies]
# Phase 2: Oboe C++ audio bridge. Still no Rust deps — we do the whole
# audio pipeline via extern "C" into the bundled C++ and expose our own
# narrow extern "C" API for wzp-desktop to dlopen via libloading.
# Phase 3 can add wzp-proto/wzp-codec if we want to share codec logic
# instead of calling back into wzp-desktop via callbacks.

119
crates/wzp-native/build.rs Normal file
View File

@@ -0,0 +1,119 @@
//! wzp-native build.rs — Oboe C++ bridge compile on Android.
//!
//! Near-verbatim copy of crates/wzp-android/build.rs (which is known to
//! work). The crucial distinction: this crate is a single-cdylib (no
//! staticlib, no rlib in crate-type) so rust-lang/rust#104707 doesn't
//! apply — bionic's internal pthread_create / __init_tcb symbols stay
//! UND and resolve against libc.so at runtime, as they should.
//!
//! On non-Android hosts we compile `cpp/oboe_stub.cpp` (empty stubs) so
//! `cargo check --target <host>` still works for IDEs and CI.
use std::path::PathBuf;
fn main() {
let target = std::env::var("TARGET").unwrap_or_default();
if target.contains("android") {
// getauxval_fix: override compiler-rt's broken static getauxval
// stub that SIGSEGVs in shared libraries.
cc::Build::new()
.file("cpp/getauxval_fix.c")
.compile("wzp_native_getauxval_fix");
let oboe_dir = fetch_oboe();
match oboe_dir {
Some(oboe_path) => {
println!("cargo:warning=wzp-native: building with Oboe from {:?}", oboe_path);
let mut build = cc::Build::new();
build
.cpp(true)
.std("c++17")
// Shared libc++ — matches legacy wzp-android setup.
.cpp_link_stdlib(Some("c++_shared"))
.include("cpp")
.include(oboe_path.join("include"))
.include(oboe_path.join("src"))
.define("WZP_HAS_OBOE", None)
.file("cpp/oboe_bridge.cpp");
add_cpp_files_recursive(&mut build, &oboe_path.join("src"));
build.compile("wzp_native_oboe_bridge");
}
None => {
println!("cargo:warning=wzp-native: Oboe not found, building stub");
cc::Build::new()
.cpp(true)
.std("c++17")
.cpp_link_stdlib(Some("c++_shared"))
.file("cpp/oboe_stub.cpp")
.include("cpp")
.compile("wzp_native_oboe_bridge");
}
}
// Oboe needs log + OpenSLES backends at runtime.
println!("cargo:rustc-link-lib=log");
println!("cargo:rustc-link-lib=OpenSLES");
// Re-run if any cpp file changes
println!("cargo:rerun-if-changed=cpp/oboe_bridge.cpp");
println!("cargo:rerun-if-changed=cpp/oboe_bridge.h");
println!("cargo:rerun-if-changed=cpp/oboe_stub.cpp");
println!("cargo:rerun-if-changed=cpp/getauxval_fix.c");
} else {
// Non-Android hosts: compile the empty stub so lib.rs's extern
// declarations resolve when someone runs `cargo check` on macOS
// or Linux without an NDK.
cc::Build::new()
.cpp(true)
.std("c++17")
.file("cpp/oboe_stub.cpp")
.include("cpp")
.compile("wzp_native_oboe_bridge");
println!("cargo:rerun-if-changed=cpp/oboe_stub.cpp");
}
}
/// Recursively add all `.cpp` files from a directory to a cc::Build.
fn add_cpp_files_recursive(build: &mut cc::Build, dir: &std::path::Path) {
if !dir.is_dir() {
return;
}
for entry in std::fs::read_dir(dir).unwrap() {
let entry = entry.unwrap();
let path = entry.path();
if path.is_dir() {
add_cpp_files_recursive(build, &path);
} else if path.extension().map_or(false, |e| e == "cpp") {
build.file(&path);
}
}
}
/// Fetch or find Oboe headers + sources (v1.8.1). Same logic as the
/// legacy wzp-android crate's build.rs.
fn fetch_oboe() -> Option<PathBuf> {
let out_dir = PathBuf::from(std::env::var("OUT_DIR").unwrap());
let oboe_dir = out_dir.join("oboe");
if oboe_dir.join("include").join("oboe").join("Oboe.h").exists() {
return Some(oboe_dir);
}
let status = std::process::Command::new("git")
.args([
"clone",
"--depth=1",
"--branch=1.8.1",
"https://github.com/google/oboe.git",
oboe_dir.to_str().unwrap(),
])
.status();
match status {
Ok(s) if s.success() && oboe_dir.join("include").join("oboe").join("Oboe.h").exists() => {
Some(oboe_dir)
}
_ => None,
}
}

View File

@@ -0,0 +1,21 @@
// Override the broken static getauxval from compiler-rt/CRT.
// The static version reads from __libc_auxv which is NULL in shared libs
// loaded via dlopen, causing SIGSEGV in init_have_lse_atomics at load time.
// This version calls the real bionic getauxval via dlsym.
#ifdef __ANDROID__
#include <dlfcn.h>
#include <stdint.h>
typedef unsigned long (*getauxval_fn)(unsigned long);
unsigned long getauxval(unsigned long type) {
static getauxval_fn real_getauxval = (getauxval_fn)0;
if (!real_getauxval) {
real_getauxval = (getauxval_fn)dlsym((void*)-1L /* RTLD_DEFAULT */, "getauxval");
if (!real_getauxval) {
return 0;
}
}
return real_getauxval(type);
}
#endif

View File

@@ -0,0 +1,477 @@
// Full Oboe implementation for Android
// This file is compiled only when targeting Android
#include "oboe_bridge.h"
#ifdef __ANDROID__
#include <oboe/Oboe.h>
#include <android/log.h>
#include <cstring>
#include <atomic>
#include <chrono>
#include <thread>
#define LOG_TAG "wzp-oboe"
#define LOGI(...) __android_log_print(ANDROID_LOG_INFO, LOG_TAG, __VA_ARGS__)
#define LOGW(...) __android_log_print(ANDROID_LOG_WARN, LOG_TAG, __VA_ARGS__)
#define LOGE(...) __android_log_print(ANDROID_LOG_ERROR, LOG_TAG, __VA_ARGS__)
// ---------------------------------------------------------------------------
// Ring buffer helpers (SPSC, lock-free)
// ---------------------------------------------------------------------------
static inline int32_t ring_available_read(const wzp_atomic_int* write_idx,
const wzp_atomic_int* read_idx,
int32_t capacity) {
int32_t w = std::atomic_load_explicit(write_idx, std::memory_order_acquire);
int32_t r = std::atomic_load_explicit(read_idx, std::memory_order_relaxed);
int32_t avail = w - r;
if (avail < 0) avail += capacity;
return avail;
}
static inline int32_t ring_available_write(const wzp_atomic_int* write_idx,
const wzp_atomic_int* read_idx,
int32_t capacity) {
return capacity - 1 - ring_available_read(write_idx, read_idx, capacity);
}
static inline void ring_write(int16_t* buf, int32_t capacity,
wzp_atomic_int* write_idx, const wzp_atomic_int* read_idx,
const int16_t* src, int32_t count) {
int32_t w = std::atomic_load_explicit(write_idx, std::memory_order_relaxed);
for (int32_t i = 0; i < count; i++) {
buf[w] = src[i];
w++;
if (w >= capacity) w = 0;
}
std::atomic_store_explicit(write_idx, w, std::memory_order_release);
}
static inline void ring_read(int16_t* buf, int32_t capacity,
const wzp_atomic_int* write_idx, wzp_atomic_int* read_idx,
int16_t* dst, int32_t count) {
int32_t r = std::atomic_load_explicit(read_idx, std::memory_order_relaxed);
for (int32_t i = 0; i < count; i++) {
dst[i] = buf[r];
r++;
if (r >= capacity) r = 0;
}
std::atomic_store_explicit(read_idx, r, std::memory_order_release);
}
// ---------------------------------------------------------------------------
// Global state
// ---------------------------------------------------------------------------
static std::shared_ptr<oboe::AudioStream> g_capture_stream;
static std::shared_ptr<oboe::AudioStream> g_playout_stream;
// Value copy — the WzpOboeRings the Rust side passes us lives on the caller's
// stack frame and goes away as soon as wzp_oboe_start returns. The raw
// int16/atomic pointers INSIDE the struct point into the Rust-owned, leaked-
// for-the-lifetime-of-the-process AudioBackend singleton, so copying the
// struct by value is safe and keeps the inner pointers valid indefinitely.
// g_rings_valid guards the audio-callback-side read; clearing it in stop()
// signals "no backend" to the callbacks which then return silence + Stop.
static WzpOboeRings g_rings{};
static std::atomic<bool> g_rings_valid{false};
static std::atomic<bool> g_running{false};
static std::atomic<float> g_capture_latency_ms{0.0f};
static std::atomic<float> g_playout_latency_ms{0.0f};
// ---------------------------------------------------------------------------
// Capture callback
// ---------------------------------------------------------------------------
class CaptureCallback : public oboe::AudioStreamDataCallback {
public:
uint64_t calls = 0;
uint64_t total_frames = 0;
uint64_t total_written = 0;
uint64_t ring_full_drops = 0;
oboe::DataCallbackResult onAudioReady(
oboe::AudioStream* stream,
void* audioData,
int32_t numFrames) override {
if (!g_running.load(std::memory_order_relaxed) ||
!g_rings_valid.load(std::memory_order_acquire)) {
return oboe::DataCallbackResult::Stop;
}
const int16_t* src = static_cast<const int16_t*>(audioData);
int32_t avail = ring_available_write(g_rings.capture_write_idx,
g_rings.capture_read_idx,
g_rings.capture_capacity);
int32_t to_write = (numFrames < avail) ? numFrames : avail;
if (to_write > 0) {
ring_write(g_rings.capture_buf, g_rings.capture_capacity,
g_rings.capture_write_idx, g_rings.capture_read_idx,
src, to_write);
}
total_frames += numFrames;
total_written += to_write;
if (to_write < numFrames) {
ring_full_drops += (numFrames - to_write);
}
// Sample-range probe on the FIRST callback to prove we get real audio
if (calls == 0 && numFrames > 0) {
int16_t lo = src[0], hi = src[0];
int32_t sumsq = 0;
for (int32_t i = 0; i < numFrames; i++) {
if (src[i] < lo) lo = src[i];
if (src[i] > hi) hi = src[i];
sumsq += (int32_t)src[i] * (int32_t)src[i];
}
int32_t rms = (int32_t) (numFrames > 0 ? (int32_t)__builtin_sqrt((double)sumsq / (double)numFrames) : 0);
LOGI("capture cb#0: numFrames=%d sample_range=[%d..%d] rms=%d to_write=%d",
numFrames, lo, hi, rms, to_write);
}
// Heartbeat every 50 callbacks (~1s at 20ms/burst)
calls++;
if ((calls % 50) == 0) {
LOGI("capture heartbeat: calls=%llu numFrames=%d ring_avail_write=%d to_write=%d full_drops=%llu total_written=%llu",
(unsigned long long)calls, numFrames, avail, to_write,
(unsigned long long)ring_full_drops, (unsigned long long)total_written);
}
// Update latency estimate
auto result = stream->calculateLatencyMillis();
if (result) {
g_capture_latency_ms.store(static_cast<float>(result.value()),
std::memory_order_relaxed);
}
return oboe::DataCallbackResult::Continue;
}
};
// ---------------------------------------------------------------------------
// Playout callback
// ---------------------------------------------------------------------------
class PlayoutCallback : public oboe::AudioStreamDataCallback {
public:
uint64_t calls = 0;
uint64_t total_frames = 0;
uint64_t total_played_real = 0;
uint64_t underrun_frames = 0;
uint64_t nonempty_calls = 0;
oboe::DataCallbackResult onAudioReady(
oboe::AudioStream* stream,
void* audioData,
int32_t numFrames) override {
if (!g_running.load(std::memory_order_relaxed) ||
!g_rings_valid.load(std::memory_order_acquire)) {
memset(audioData, 0, numFrames * sizeof(int16_t));
return oboe::DataCallbackResult::Stop;
}
int16_t* dst = static_cast<int16_t*>(audioData);
int32_t avail = ring_available_read(g_rings.playout_write_idx,
g_rings.playout_read_idx,
g_rings.playout_capacity);
int32_t to_read = (numFrames < avail) ? numFrames : avail;
if (to_read > 0) {
ring_read(g_rings.playout_buf, g_rings.playout_capacity,
g_rings.playout_write_idx, g_rings.playout_read_idx,
dst, to_read);
nonempty_calls++;
}
// Fill remainder with silence on underrun
if (to_read < numFrames) {
memset(dst + to_read, 0, (numFrames - to_read) * sizeof(int16_t));
underrun_frames += (numFrames - to_read);
}
total_frames += numFrames;
total_played_real += to_read;
// First callback: log requested config + prove we're being called
if (calls == 0) {
LOGI("playout cb#0: numFrames=%d ring_avail_read=%d to_read=%d",
numFrames, avail, to_read);
}
// On the first callback that actually has data, log the sample range
// so we can tell if the samples coming out of the ring look like real
// audio vs constant-zeroes vs garbage.
if (to_read > 0 && nonempty_calls == 1) {
int16_t lo = dst[0], hi = dst[0];
int32_t sumsq = 0;
for (int32_t i = 0; i < to_read; i++) {
if (dst[i] < lo) lo = dst[i];
if (dst[i] > hi) hi = dst[i];
sumsq += (int32_t)dst[i] * (int32_t)dst[i];
}
int32_t rms = (to_read > 0) ? (int32_t)__builtin_sqrt((double)sumsq / (double)to_read) : 0;
LOGI("playout FIRST nonempty read: to_read=%d sample_range=[%d..%d] rms=%d",
to_read, lo, hi, rms);
}
// Heartbeat every 50 callbacks (~1s at 20ms/burst)
calls++;
if ((calls % 50) == 0) {
int state = (int)stream->getState();
auto xrunRes = stream->getXRunCount();
int xruns = xrunRes ? xrunRes.value() : -1;
LOGI("playout heartbeat: calls=%llu nonempty=%llu numFrames=%d ring_avail_read=%d to_read=%d underrun_frames=%llu total_played_real=%llu state=%d xruns=%d",
(unsigned long long)calls, (unsigned long long)nonempty_calls,
numFrames, avail, to_read,
(unsigned long long)underrun_frames, (unsigned long long)total_played_real,
state, xruns);
}
// Update latency estimate
auto result = stream->calculateLatencyMillis();
if (result) {
g_playout_latency_ms.store(static_cast<float>(result.value()),
std::memory_order_relaxed);
}
return oboe::DataCallbackResult::Continue;
}
};
static CaptureCallback g_capture_cb;
static PlayoutCallback g_playout_cb;
// ---------------------------------------------------------------------------
// Public C API
// ---------------------------------------------------------------------------
int wzp_oboe_start(const WzpOboeConfig* config, const WzpOboeRings* rings) {
if (g_running.load(std::memory_order_relaxed)) {
LOGW("wzp_oboe_start: already running");
return -1;
}
// Deep-copy the rings struct into static storage BEFORE we publish it to
// the audio callbacks — `rings` points at the caller's stack frame and
// goes away as soon as this function returns.
g_rings = *rings;
g_rings_valid.store(true, std::memory_order_release);
// Build capture stream
oboe::AudioStreamBuilder captureBuilder;
captureBuilder.setDirection(oboe::Direction::Input)
->setPerformanceMode(oboe::PerformanceMode::LowLatency)
->setSharingMode(oboe::SharingMode::Shared)
->setFormat(oboe::AudioFormat::I16)
->setChannelCount(config->channel_count)
->setSampleRateConversionQuality(oboe::SampleRateConversionQuality::Best)
->setDataCallback(&g_capture_cb);
if (config->bt_active) {
// BT SCO mode: do NOT set sample rate or input preset.
// Requesting 48kHz against a BT SCO device fails with
// "getInputProfile could not find profile". Letting the system
// choose the native rate (8/16kHz) and relying on Oboe's
// resampler (SampleRateConversionQuality::Best) to bridge
// to our 48kHz ring buffer is the only path that works.
// InputPreset::VoiceCommunication can also prevent BT SCO
// routing on some devices — skip it for BT.
LOGI("capture: BT mode — no sample rate or input preset set");
} else {
captureBuilder.setSampleRate(config->sample_rate)
->setFramesPerDataCallback(config->frames_per_burst)
->setInputPreset(oboe::InputPreset::VoiceCommunication);
}
oboe::Result result = captureBuilder.openStream(g_capture_stream);
if (result != oboe::Result::OK) {
LOGE("Failed to open capture stream: %s", oboe::convertToText(result));
return -2;
}
LOGI("capture stream opened: actualSR=%d actualCh=%d actualFormat=%d actualFramesPerBurst=%d actualFramesPerDataCallback=%d bufferCapacityInFrames=%d sharing=%d perfMode=%d",
g_capture_stream->getSampleRate(),
g_capture_stream->getChannelCount(),
(int)g_capture_stream->getFormat(),
g_capture_stream->getFramesPerBurst(),
g_capture_stream->getFramesPerDataCallback(),
g_capture_stream->getBufferCapacityInFrames(),
(int)g_capture_stream->getSharingMode(),
(int)g_capture_stream->getPerformanceMode());
// Build playout stream.
//
// Regression triangulation between builds:
// 96be740 (Usage::Media, default API): playout callback DID drain
// the ring at steady 50Hz (playout heartbeat: calls=1100,
// total_played_real=1055040). Audio not audible because OS routing
// sent it to a silent output.
//
// 8c36fb5 (Usage::VoiceCommunication + setAudioApi(AAudio) +
// ContentType::Speech): playout callback fired cb#0 once then
// stopped draining the ring entirely. written_samples stuck at
// ring capacity (7679) across all subsequent heartbeats, so Oboe
// accepted zero samples after startup. Still inaudible.
//
// Hypothesis: forcing setAudioApi(AAudio) + VoiceCommunication on
// Pixel 6 / Android 15 opens a stream that succeeds at cb#0 but
// then detaches from the real audio driver. Reverting to the
// config that at least drove callbacks correctly, plus the
// Kotlin-side MODE_IN_COMMUNICATION + setSpeakerphoneOn(true)
// handled in MainActivity.kt to route audio to the loud speaker.
// Usage::VoiceCommunication is the correct Oboe usage for a VoIP app
// — it respects Android's in-call audio routing and lets
// AudioManager.setSpeakerphoneOn/setBluetoothScoOn actually switch
// between earpiece, loudspeaker, and Bluetooth headset. Combined with
// MODE_IN_COMMUNICATION set from MainActivity.kt and
// speakerphoneOn=false by default, this produces handset/earpiece as
// the default output.
//
// IMPORTANT: do NOT add setAudioApi(AAudio) here. Build 8c36fb5 proved
// forcing AAudio with Usage::VoiceCommunication makes the playout
// callback stop draining the ring after cb#0, even though the stream
// opens successfully. Letting Oboe pick the API (which will be AAudio
// on API ≥ 27 but via a different codepath) kept callbacks firing in
// every other build.
oboe::AudioStreamBuilder playoutBuilder;
playoutBuilder.setDirection(oboe::Direction::Output)
->setPerformanceMode(oboe::PerformanceMode::LowLatency)
->setSharingMode(oboe::SharingMode::Shared)
->setFormat(oboe::AudioFormat::I16)
->setChannelCount(config->channel_count)
->setSampleRateConversionQuality(oboe::SampleRateConversionQuality::Best)
->setDataCallback(&g_playout_cb);
if (config->bt_active) {
LOGI("playout: BT mode — no sample rate set, using Usage::Media");
// Usage::Media instead of VoiceCommunication for BT output
// to avoid conflicts with the communication device routing.
playoutBuilder.setUsage(oboe::Usage::Media);
} else {
playoutBuilder.setSampleRate(config->sample_rate)
->setFramesPerDataCallback(config->frames_per_burst)
->setUsage(oboe::Usage::VoiceCommunication);
}
result = playoutBuilder.openStream(g_playout_stream);
if (result != oboe::Result::OK) {
LOGE("Failed to open playout stream: %s", oboe::convertToText(result));
g_capture_stream->close();
g_capture_stream.reset();
return -3;
}
LOGI("playout stream opened: actualSR=%d actualCh=%d actualFormat=%d actualFramesPerBurst=%d actualFramesPerDataCallback=%d bufferCapacityInFrames=%d sharing=%d perfMode=%d",
g_playout_stream->getSampleRate(),
g_playout_stream->getChannelCount(),
(int)g_playout_stream->getFormat(),
g_playout_stream->getFramesPerBurst(),
g_playout_stream->getFramesPerDataCallback(),
g_playout_stream->getBufferCapacityInFrames(),
(int)g_playout_stream->getSharingMode(),
(int)g_playout_stream->getPerformanceMode());
g_running.store(true, std::memory_order_release);
// Start both streams
result = g_capture_stream->requestStart();
if (result != oboe::Result::OK) {
LOGE("Failed to start capture: %s", oboe::convertToText(result));
g_running.store(false, std::memory_order_release);
g_capture_stream->close();
g_playout_stream->close();
g_capture_stream.reset();
g_playout_stream.reset();
return -4;
}
result = g_playout_stream->requestStart();
if (result != oboe::Result::OK) {
LOGE("Failed to start playout: %s", oboe::convertToText(result));
g_running.store(false, std::memory_order_release);
g_capture_stream->requestStop();
g_capture_stream->close();
g_playout_stream->close();
g_capture_stream.reset();
g_playout_stream.reset();
return -5;
}
// Log initial stream states right after requestStart() returns.
// On well-behaved HALs both will already be Started; on others
// (Nothing A059) they may still be in Starting state.
LOGI("requestStart returned: capture_state=%d playout_state=%d",
(int)g_capture_stream->getState(),
(int)g_playout_stream->getState());
// Poll until both streams report Started state, up to 2s timeout.
// Some Android HALs (Nothing A059) delay transitioning from Starting
// to Started; proceeding before the transition completes causes the
// first capture/playout callbacks to be dropped silently.
{
auto deadline = std::chrono::steady_clock::now() + std::chrono::milliseconds(2000);
int poll_count = 0;
while (std::chrono::steady_clock::now() < deadline) {
auto cap_state = g_capture_stream->getState();
auto play_state = g_playout_stream->getState();
if (cap_state == oboe::StreamState::Started &&
play_state == oboe::StreamState::Started) {
LOGI("both streams Started after %d polls", poll_count);
break;
}
poll_count++;
std::this_thread::sleep_for(std::chrono::milliseconds(10));
}
// Log final state even on timeout (helps diagnose HAL quirks)
LOGI("stream states after poll: capture=%d playout=%d (polls=%d)",
(int)g_capture_stream->getState(),
(int)g_playout_stream->getState(),
poll_count);
}
LOGI("Oboe started: sr=%d burst=%d ch=%d",
config->sample_rate, config->frames_per_burst, config->channel_count);
return 0;
}
void wzp_oboe_stop(void) {
g_running.store(false, std::memory_order_release);
// Tell the audio callbacks to stop touching g_rings BEFORE we tear down
// the streams, so any in-flight callback returns Stop instead of reading
// stale pointers.
g_rings_valid.store(false, std::memory_order_release);
if (g_capture_stream) {
g_capture_stream->requestStop();
g_capture_stream->close();
g_capture_stream.reset();
}
if (g_playout_stream) {
g_playout_stream->requestStop();
g_playout_stream->close();
g_playout_stream.reset();
}
LOGI("Oboe stopped");
}
float wzp_oboe_capture_latency_ms(void) {
return g_capture_latency_ms.load(std::memory_order_relaxed);
}
float wzp_oboe_playout_latency_ms(void) {
return g_playout_latency_ms.load(std::memory_order_relaxed);
}
int wzp_oboe_is_running(void) {
return g_running.load(std::memory_order_relaxed) ? 1 : 0;
}
#else
// Non-Android fallback — should not be reached; oboe_stub.cpp is used instead.
// Provide empty implementations just in case.
int wzp_oboe_start(const WzpOboeConfig* config, const WzpOboeRings* rings) {
(void)config; (void)rings;
return -99;
}
void wzp_oboe_stop(void) {}
float wzp_oboe_capture_latency_ms(void) { return 0.0f; }
float wzp_oboe_playout_latency_ms(void) { return 0.0f; }
int wzp_oboe_is_running(void) { return 0; }
#endif // __ANDROID__

View File

@@ -0,0 +1,44 @@
#ifndef WZP_OBOE_BRIDGE_H
#define WZP_OBOE_BRIDGE_H
#include <stdint.h>
#ifdef __cplusplus
#include <atomic>
typedef std::atomic<int32_t> wzp_atomic_int;
extern "C" {
#else
#include <stdatomic.h>
typedef atomic_int wzp_atomic_int;
#endif
typedef struct {
int32_t sample_rate;
int32_t frames_per_burst;
int32_t channel_count;
int32_t bt_active; /* nonzero = BT SCO mode: skip sample rate + input preset */
} WzpOboeConfig;
typedef struct {
int16_t* capture_buf;
int32_t capture_capacity;
wzp_atomic_int* capture_write_idx;
wzp_atomic_int* capture_read_idx;
int16_t* playout_buf;
int32_t playout_capacity;
wzp_atomic_int* playout_write_idx;
wzp_atomic_int* playout_read_idx;
} WzpOboeRings;
int wzp_oboe_start(const WzpOboeConfig* config, const WzpOboeRings* rings);
void wzp_oboe_stop(void);
float wzp_oboe_capture_latency_ms(void);
float wzp_oboe_playout_latency_ms(void);
int wzp_oboe_is_running(void);
#ifdef __cplusplus
}
#endif
#endif // WZP_OBOE_BRIDGE_H

View File

@@ -0,0 +1,27 @@
// Stub implementation for non-Android host builds (testing, cargo check, etc.)
#include "oboe_bridge.h"
#include <stdio.h>
int wzp_oboe_start(const WzpOboeConfig* config, const WzpOboeRings* rings) {
(void)config;
(void)rings;
fprintf(stderr, "wzp_oboe_start: stub (not on Android)\n");
return 0;
}
void wzp_oboe_stop(void) {
fprintf(stderr, "wzp_oboe_stop: stub (not on Android)\n");
}
float wzp_oboe_capture_latency_ms(void) {
return 0.0f;
}
float wzp_oboe_playout_latency_ms(void) {
return 0.0f;
}
int wzp_oboe_is_running(void) {
return 0;
}

View File

@@ -0,0 +1,427 @@
//! wzp-native — standalone Android cdylib for all the C++ audio code.
//!
//! Built with `cargo ndk`, NOT `cargo tauri android build`. Loaded at
//! runtime by the Tauri desktop cdylib (`wzp-desktop`) via libloading.
//! See `docs/incident-tauri-android-init-tcb.md` for why the split exists.
//!
//! Phase 2: real Oboe audio backend.
//!
//! Architecture: Oboe runs capture + playout streams on its own high-
//! priority AAudio callback threads inside the C++ bridge. Two SPSC ring
//! buffers (capture and playout) are shared between the C++ callbacks
//! and the Rust side via atomic indices — no locks on the hot path.
//! `wzp-desktop` drains the capture ring into its Opus encoder and fills
//! the playout ring with decoded PCM.
use std::sync::atomic::{AtomicI32, Ordering};
// ─── Phase 1 smoke-test exports (kept for sanity checks) ─────────────────
/// Returns 42. Used by wzp-desktop's setup() to verify dlopen + dlsym
/// work before any audio code runs.
#[unsafe(no_mangle)]
pub extern "C" fn wzp_native_version() -> i32 {
42
}
/// Writes a NUL-terminated string into `out` (capped at `cap`) and
/// returns bytes written excluding the NUL.
#[unsafe(no_mangle)]
pub unsafe extern "C" fn wzp_native_hello(out: *mut u8, cap: usize) -> usize {
const MSG: &[u8] = b"hello from wzp-native\0";
if out.is_null() || cap == 0 {
return 0;
}
let n = MSG.len().min(cap);
unsafe {
core::ptr::copy_nonoverlapping(MSG.as_ptr(), out, n);
*out.add(n - 1) = 0;
}
n - 1
}
// ─── C++ Oboe bridge FFI ─────────────────────────────────────────────────
#[repr(C)]
struct WzpOboeConfig {
sample_rate: i32,
frames_per_burst: i32,
channel_count: i32,
/// When nonzero, capture stream skips setSampleRate and setInputPreset
/// so the system can route to BT SCO at its native rate (8/16kHz).
/// Oboe's SampleRateConversionQuality::Best resamples to 48kHz.
bt_active: i32,
}
#[repr(C)]
struct WzpOboeRings {
capture_buf: *mut i16,
capture_capacity: i32,
capture_write_idx: *mut AtomicI32,
capture_read_idx: *mut AtomicI32,
playout_buf: *mut i16,
playout_capacity: i32,
playout_write_idx: *mut AtomicI32,
playout_read_idx: *mut AtomicI32,
}
// SAFETY: atomics synchronise producer/consumer; raw pointers are owned
// by the AudioBackend singleton below whose lifetime covers all calls.
unsafe impl Send for WzpOboeRings {}
unsafe impl Sync for WzpOboeRings {}
unsafe extern "C" {
fn wzp_oboe_start(config: *const WzpOboeConfig, rings: *const WzpOboeRings) -> i32;
fn wzp_oboe_stop();
fn wzp_oboe_capture_latency_ms() -> f32;
fn wzp_oboe_playout_latency_ms() -> f32;
fn wzp_oboe_is_running() -> i32;
}
// ─── SPSC ring buffer (shared with C++ via AtomicI32) ────────────────────
/// 20 ms @ 48 kHz mono = 960 samples.
const FRAME_SAMPLES: usize = 960;
/// ~160 ms headroom at 48 kHz.
const RING_CAPACITY: usize = 7680;
struct RingBuffer {
buf: Vec<i16>,
capacity: usize,
write_idx: AtomicI32,
read_idx: AtomicI32,
}
// SAFETY: SPSC with atomic read/write cursors; producer and consumer
// are always on different threads.
unsafe impl Send for RingBuffer {}
unsafe impl Sync for RingBuffer {}
impl RingBuffer {
fn new(capacity: usize) -> Self {
Self {
buf: vec![0i16; capacity],
capacity,
write_idx: AtomicI32::new(0),
read_idx: AtomicI32::new(0),
}
}
fn available_read(&self) -> usize {
let w = self.write_idx.load(Ordering::Acquire);
let r = self.read_idx.load(Ordering::Relaxed);
let avail = w - r;
if avail < 0 { (avail + self.capacity as i32) as usize } else { avail as usize }
}
fn available_write(&self) -> usize {
self.capacity - 1 - self.available_read()
}
fn write(&self, data: &[i16]) -> usize {
let count = data.len().min(self.available_write());
if count == 0 {
return 0;
}
let mut w = self.write_idx.load(Ordering::Relaxed) as usize;
let cap = self.capacity;
let buf_ptr = self.buf.as_ptr() as *mut i16;
for sample in &data[..count] {
unsafe { *buf_ptr.add(w) = *sample; }
w += 1;
if w >= cap { w = 0; }
}
self.write_idx.store(w as i32, Ordering::Release);
count
}
fn read(&self, out: &mut [i16]) -> usize {
let count = out.len().min(self.available_read());
if count == 0 {
return 0;
}
let mut r = self.read_idx.load(Ordering::Relaxed) as usize;
let cap = self.capacity;
let buf_ptr = self.buf.as_ptr();
for slot in &mut out[..count] {
unsafe { *slot = *buf_ptr.add(r); }
r += 1;
if r >= cap { r = 0; }
}
self.read_idx.store(r as i32, Ordering::Release);
count
}
fn buf_ptr(&self) -> *mut i16 {
self.buf.as_ptr() as *mut i16
}
fn write_idx_ptr(&self) -> *mut AtomicI32 {
&self.write_idx as *const AtomicI32 as *mut AtomicI32
}
fn read_idx_ptr(&self) -> *mut AtomicI32 {
&self.read_idx as *const AtomicI32 as *mut AtomicI32
}
}
// ─── AudioBackend singleton ──────────────────────────────────────────────
//
// There is one global AudioBackend instance because Oboe's C++ side
// holds its own singleton of the streams. The `Box::leak`'d statics own
// the ring buffers for the lifetime of the process — dropping them while
// Oboe is still running would cause use-after-free in the audio callback.
use std::sync::OnceLock;
struct AudioBackend {
capture: RingBuffer,
playout: RingBuffer,
started: std::sync::Mutex<bool>,
/// Per-write logging throttle counter for wzp_native_audio_write_playout.
playout_write_log_count: std::sync::atomic::AtomicU64,
/// Fix A (task #35): the playout ring's read_idx at the last
/// check. If audio_write_playout observes read_idx hasn't
/// advanced after N writes, the Oboe playout callback has
/// stopped firing → restart the streams.
playout_last_read_idx: std::sync::atomic::AtomicI32,
/// Number of writes since the last read_idx advance.
playout_stall_writes: std::sync::atomic::AtomicU32,
}
static BACKEND: OnceLock<&'static AudioBackend> = OnceLock::new();
fn backend() -> &'static AudioBackend {
BACKEND.get_or_init(|| {
Box::leak(Box::new(AudioBackend {
capture: RingBuffer::new(RING_CAPACITY),
playout: RingBuffer::new(RING_CAPACITY),
started: std::sync::Mutex::new(false),
playout_write_log_count: std::sync::atomic::AtomicU64::new(0),
playout_last_read_idx: std::sync::atomic::AtomicI32::new(0),
playout_stall_writes: std::sync::atomic::AtomicU32::new(0),
}))
})
}
// ─── C FFI for wzp-desktop ───────────────────────────────────────────────
/// Start the Oboe audio streams. Returns 0 on success, non-zero on error.
/// Idempotent — calling while already running is a no-op that returns 0.
#[unsafe(no_mangle)]
pub extern "C" fn wzp_native_audio_start() -> i32 {
audio_start_inner(false)
}
/// Start Oboe in Bluetooth SCO mode — skips sample rate and input preset
/// on capture so the system can route to the BT SCO device natively.
#[unsafe(no_mangle)]
pub extern "C" fn wzp_native_audio_start_bt() -> i32 {
audio_start_inner(true)
}
fn audio_start_inner(bt: bool) -> i32 {
let b = backend();
let mut started = match b.started.lock() {
Ok(g) => g,
Err(_) => return -1,
};
if *started {
return 0;
}
let config = WzpOboeConfig {
sample_rate: 48_000,
frames_per_burst: FRAME_SAMPLES as i32,
channel_count: 1,
bt_active: if bt { 1 } else { 0 },
};
let rings = WzpOboeRings {
capture_buf: b.capture.buf_ptr(),
capture_capacity: b.capture.capacity as i32,
capture_write_idx: b.capture.write_idx_ptr(),
capture_read_idx: b.capture.read_idx_ptr(),
playout_buf: b.playout.buf_ptr(),
playout_capacity: b.playout.capacity as i32,
playout_write_idx: b.playout.write_idx_ptr(),
playout_read_idx: b.playout.read_idx_ptr(),
};
let ret = unsafe { wzp_oboe_start(&config, &rings) };
if ret != 0 {
return ret;
}
*started = true;
0
}
/// Stop Oboe. Idempotent. Safe to call from any thread.
#[unsafe(no_mangle)]
pub extern "C" fn wzp_native_audio_stop() {
let b = backend();
if let Ok(mut started) = b.started.lock() {
if *started {
unsafe { wzp_oboe_stop() };
*started = false;
}
}
}
/// Read captured PCM samples from the capture ring. Returns the number
/// of `i16` samples actually copied into `out` (may be less than
/// `out_len` if the ring is empty).
#[unsafe(no_mangle)]
pub unsafe extern "C" fn wzp_native_audio_read_capture(out: *mut i16, out_len: usize) -> usize {
if out.is_null() || out_len == 0 {
return 0;
}
let slice = unsafe { std::slice::from_raw_parts_mut(out, out_len) };
backend().capture.read(slice)
}
/// Write PCM samples into the playout ring. Returns the number of
/// samples actually enqueued (may be less than `in_len` if the ring
/// is nearly full — in practice the caller should pace to 20 ms
/// frames and spin briefly if the ring is full).
#[unsafe(no_mangle)]
pub unsafe extern "C" fn wzp_native_audio_write_playout(input: *const i16, in_len: usize) -> usize {
if input.is_null() || in_len == 0 {
return 0;
}
let slice = unsafe { std::slice::from_raw_parts(input, in_len) };
let b = backend();
// Fix A (task #35): detect playout callback stall. If the
// playout ring's read_idx hasn't advanced in 50+ writes
// (~1 second at 50 writes/sec), the Oboe playout callback
// has stopped firing → restart the streams. This is the
// self-healing behavior that makes rejoin work: teardown +
// rebuild clears whatever HAL state locked up the callback.
let current_read_idx = b.playout.read_idx.load(std::sync::atomic::Ordering::Relaxed);
let last_read_idx = b.playout_last_read_idx.load(std::sync::atomic::Ordering::Relaxed);
if current_read_idx == last_read_idx {
let stall = b.playout_stall_writes.fetch_add(1, std::sync::atomic::Ordering::Relaxed);
if stall >= 50 {
// Callback hasn't drained anything in ~1 second.
// Force a stream restart.
unsafe {
android_log("playout STALL detected (50 writes, read_idx unchanged) — restarting Oboe streams");
}
b.playout_stall_writes.store(0, std::sync::atomic::Ordering::Relaxed);
// Release the started lock, stop, re-start.
// This is the same logic as the Rust-side
// audio_stop() + audio_start() but done inline
// because we can't call the extern "C" fns
// recursively. Just call the C++ side directly.
{
if let Ok(mut started) = b.started.lock() {
if *started {
unsafe { wzp_oboe_stop() };
*started = false;
}
}
}
// Clear the rings so the restart doesn't read stale data
b.playout.write_idx.store(0, std::sync::atomic::Ordering::Relaxed);
b.playout.read_idx.store(0, std::sync::atomic::Ordering::Relaxed);
b.capture.write_idx.store(0, std::sync::atomic::Ordering::Relaxed);
b.capture.read_idx.store(0, std::sync::atomic::Ordering::Relaxed);
// Re-start (stall detector — always non-BT mode)
let config = WzpOboeConfig {
sample_rate: 48_000,
frames_per_burst: FRAME_SAMPLES as i32,
channel_count: 1,
bt_active: 0,
};
let rings = WzpOboeRings {
capture_buf: b.capture.buf_ptr(),
capture_capacity: b.capture.capacity as i32,
capture_write_idx: b.capture.write_idx_ptr(),
capture_read_idx: b.capture.read_idx_ptr(),
playout_buf: b.playout.buf_ptr(),
playout_capacity: b.playout.capacity as i32,
playout_write_idx: b.playout.write_idx_ptr(),
playout_read_idx: b.playout.read_idx_ptr(),
};
let ret = unsafe { wzp_oboe_start(&config, &rings) };
if ret == 0 {
if let Ok(mut started) = b.started.lock() {
*started = true;
}
unsafe { android_log("playout restart OK — Oboe streams rebuilt"); }
} else {
unsafe { android_log(&format!("playout restart FAILED: {ret}")); }
}
b.playout_last_read_idx.store(0, std::sync::atomic::Ordering::Relaxed);
return 0; // caller will retry on next frame
}
} else {
// read_idx advanced — callback is alive, reset counter
b.playout_stall_writes.store(0, std::sync::atomic::Ordering::Relaxed);
b.playout_last_read_idx.store(current_read_idx, std::sync::atomic::Ordering::Relaxed);
}
let before_w = b.playout.write_idx.load(std::sync::atomic::Ordering::Relaxed);
let before_r = b.playout.read_idx.load(std::sync::atomic::Ordering::Relaxed);
let written = b.playout.write(slice);
// First few writes: log ring state + sample range so we can compare what
// engine.rs hands us to what the C++ playout callback reads.
let first_writes = b.playout_write_log_count.fetch_add(1, std::sync::atomic::Ordering::Relaxed);
if first_writes < 3 || first_writes % 50 == 0 {
let (mut lo, mut hi, mut sumsq) = (i16::MAX, i16::MIN, 0i64);
for &s in slice.iter() {
if s < lo { lo = s; }
if s > hi { hi = s; }
sumsq += (s as i64) * (s as i64);
}
let rms = (sumsq as f64 / slice.len() as f64).sqrt() as i32;
let avail_w_after = b.playout.available_write();
let avail_r_after = b.playout.available_read();
let msg = format!(
"playout WRITE #{first_writes}: in_len={} written={} range=[{lo}..{hi}] rms={rms} before_w={before_w} before_r={before_r} avail_read_after={avail_r_after} avail_write_after={avail_w_after}",
slice.len(), written
);
unsafe {
android_log(msg.as_str());
}
}
written
}
// Minimal android logcat shim so we can print from the cdylib without pulling
// in android_logger crate (which would add another dep that has to build with
// cargo-ndk). Uses libc's __android_log_print via extern linkage.
#[cfg(target_os = "android")]
unsafe extern "C" {
fn __android_log_write(prio: i32, tag: *const u8, text: *const u8) -> i32;
}
#[cfg(target_os = "android")]
unsafe fn android_log(msg: &str) {
// ANDROID_LOG_INFO = 4. Tag and text must be NUL-terminated.
let tag = b"wzp-native\0";
let mut buf = Vec::with_capacity(msg.len() + 1);
buf.extend_from_slice(msg.as_bytes());
buf.push(0);
unsafe { __android_log_write(4, tag.as_ptr(), buf.as_ptr()); }
}
#[cfg(not(target_os = "android"))]
#[allow(dead_code)]
unsafe fn android_log(_msg: &str) {}
/// Current capture latency reported by Oboe, in milliseconds. Returns
/// NaN / 0.0 if the stream isn't running.
#[unsafe(no_mangle)]
pub extern "C" fn wzp_native_audio_capture_latency_ms() -> f32 {
unsafe { wzp_oboe_capture_latency_ms() }
}
/// Current playout latency reported by Oboe, in milliseconds.
#[unsafe(no_mangle)]
pub extern "C" fn wzp_native_audio_playout_latency_ms() -> f32 {
unsafe { wzp_oboe_playout_latency_ms() }
}
/// Non-zero if both Oboe streams are currently running.
#[unsafe(no_mangle)]
pub extern "C" fn wzp_native_audio_is_running() -> i32 {
unsafe { wzp_oboe_is_running() }
}

View File

@@ -18,6 +18,12 @@ pub enum CodecId {
Codec2_1200 = 4,
/// Comfort noise descriptor (silence suppression)
ComfortNoise = 5,
/// Opus at 32kbps (studio low)
Opus32k = 6,
/// Opus at 48kbps (studio)
Opus48k = 7,
/// Opus at 64kbps (studio high)
Opus64k = 8,
}
impl CodecId {
@@ -27,6 +33,9 @@ impl CodecId {
Self::Opus24k => 24_000,
Self::Opus16k => 16_000,
Self::Opus6k => 6_000,
Self::Opus32k => 32_000,
Self::Opus48k => 48_000,
Self::Opus64k => 64_000,
Self::Codec2_3200 => 3_200,
Self::Codec2_1200 => 1_200,
Self::ComfortNoise => 0,
@@ -36,8 +45,7 @@ impl CodecId {
/// Preferred frame duration in milliseconds.
pub const fn frame_duration_ms(self) -> u8 {
match self {
Self::Opus24k => 20,
Self::Opus16k => 20,
Self::Opus24k | Self::Opus16k | Self::Opus32k | Self::Opus48k | Self::Opus64k => 20,
Self::Opus6k => 40,
Self::Codec2_3200 => 20,
Self::Codec2_1200 => 40,
@@ -48,7 +56,8 @@ impl CodecId {
/// Sample rate expected by this codec.
pub const fn sample_rate_hz(self) -> u32 {
match self {
Self::Opus24k | Self::Opus16k | Self::Opus6k => 48_000,
Self::Opus24k | Self::Opus16k | Self::Opus6k
| Self::Opus32k | Self::Opus48k | Self::Opus64k => 48_000,
Self::Codec2_3200 | Self::Codec2_1200 => 8_000,
Self::ComfortNoise => 48_000,
}
@@ -63,6 +72,9 @@ impl CodecId {
3 => Some(Self::Codec2_3200),
4 => Some(Self::Codec2_1200),
5 => Some(Self::ComfortNoise),
6 => Some(Self::Opus32k),
7 => Some(Self::Opus48k),
8 => Some(Self::Opus64k),
_ => None,
}
}
@@ -71,6 +83,12 @@ impl CodecId {
pub const fn to_wire(self) -> u8 {
self as u8
}
/// Returns true if this is an Opus variant.
pub const fn is_opus(self) -> bool {
matches!(self, Self::Opus6k | Self::Opus16k | Self::Opus24k
| Self::Opus32k | Self::Opus48k | Self::Opus64k)
}
}
/// Describes the complete quality configuration for a call session.
@@ -111,6 +129,30 @@ impl QualityProfile {
frames_per_block: 8,
};
/// Studio low: Opus 32kbps, minimal FEC.
pub const STUDIO_32K: Self = Self {
codec: CodecId::Opus32k,
fec_ratio: 0.1,
frame_duration_ms: 20,
frames_per_block: 5,
};
/// Studio: Opus 48kbps, minimal FEC.
pub const STUDIO_48K: Self = Self {
codec: CodecId::Opus48k,
fec_ratio: 0.1,
frame_duration_ms: 20,
frames_per_block: 5,
};
/// Studio high: Opus 64kbps, minimal FEC.
pub const STUDIO_64K: Self = Self {
codec: CodecId::Opus64k,
fec_ratio: 0.1,
frame_duration_ms: 20,
frames_per_block: 5,
};
/// Estimated total bandwidth in kbps including FEC overhead.
pub fn total_bitrate_kbps(&self) -> f32 {
let base = self.codec.bitrate_bps() as f32 / 1000.0;

View File

@@ -0,0 +1,312 @@
//! Continuous DRED tuning from real-time network metrics.
//!
//! Instead of locking DRED duration to 3 discrete quality tiers (100/200/500 ms),
//! `DredTuner` maps live path quality metrics to a continuous DRED duration and
//! expected-loss hint, updated every N packets. This makes DRED reactive within
//! ~200 ms instead of waiting for 3+ consecutive bad quality reports to trigger
//! a full tier transition.
//!
//! The tuner also implements pre-emptive jitter-spike detection ("sawtooth"
//! prediction): when jitter variance spikes >30% over a 200 ms window — typical
//! of Starlink satellite handovers — it temporarily boosts DRED to the maximum
//! allowed for the current codec before packets actually start dropping.
use crate::CodecId;
/// Output of a single tuning cycle.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct DredTuning {
/// DRED duration in 10 ms frame units (0104). Passed directly to
/// `OpusEncoder::set_dred_duration()`.
pub dred_frames: u8,
/// Expected packet loss percentage (0100). Passed to
/// `OpusEncoder::set_expected_loss()`. Floored at 15% by the encoder
/// itself, but we pass the real value so the encoder can override upward.
pub expected_loss_pct: u8,
}
/// Minimum DRED frames for any Opus codec (matches DRED_LOSS_FLOOR_PCT logic:
/// at 15% loss, libopus 1.5 emits ~95 ms of DRED, which needs at least 10
/// frames configured to be useful).
const MIN_DRED_FRAMES: u8 = 5;
/// Maximum DRED frames libopus supports (104 × 10 ms = 1040 ms).
const MAX_DRED_FRAMES: u8 = 104;
/// Jitter variance spike ratio that triggers pre-emptive DRED boost.
const JITTER_SPIKE_RATIO: f32 = 1.3;
/// How many tuning cycles a jitter-spike boost persists (at 25 packets/cycle
/// and 20 ms/packet, 10 cycles ≈ 5 seconds).
const SPIKE_BOOST_COOLDOWN_CYCLES: u32 = 10;
/// Maps codec tier to its baseline DRED frames (used when network is healthy).
fn baseline_dred_frames(codec: CodecId) -> u8 {
match codec {
CodecId::Opus32k | CodecId::Opus48k | CodecId::Opus64k => 10, // 100 ms
CodecId::Opus16k | CodecId::Opus24k => 20, // 200 ms
CodecId::Opus6k => 50, // 500 ms
_ => 0,
}
}
/// Maps codec tier to its maximum allowed DRED frames under spike/bad conditions.
fn max_dred_frames_for(codec: CodecId) -> u8 {
match codec {
// Studio: cap at 300 ms (don't waste bitrate on good links)
CodecId::Opus32k | CodecId::Opus48k | CodecId::Opus64k => 30,
// Normal: cap at 500 ms
CodecId::Opus16k | CodecId::Opus24k => 50,
// Degraded: allow full 1040 ms
CodecId::Opus6k => MAX_DRED_FRAMES,
_ => 0,
}
}
/// Continuous DRED tuner driven by network path metrics.
pub struct DredTuner {
/// Current codec (determines baseline and ceiling).
codec: CodecId,
/// Last computed tuning output.
last_tuning: DredTuning,
/// EWMA-smoothed jitter for spike detection (in ms).
jitter_ewma: f32,
/// Remaining cooldown cycles for a jitter-spike boost.
spike_cooldown: u32,
/// Whether the tuner has received at least one observation.
initialized: bool,
}
impl DredTuner {
/// Create a new tuner for the given codec.
pub fn new(codec: CodecId) -> Self {
let baseline = baseline_dred_frames(codec);
Self {
codec,
last_tuning: DredTuning {
dred_frames: baseline,
expected_loss_pct: 15, // match DRED_LOSS_FLOOR_PCT
},
jitter_ewma: 0.0,
spike_cooldown: 0,
initialized: false,
}
}
/// Update the active codec (e.g. on tier transition). Resets spike state.
pub fn set_codec(&mut self, codec: CodecId) {
self.codec = codec;
self.spike_cooldown = 0;
}
/// Feed network metrics and compute new DRED parameters.
///
/// Call this every tuning cycle (e.g. every 25 packets ≈ 500 ms at 20 ms
/// frame duration).
///
/// - `loss_pct`: observed packet loss (0.0100.0)
/// - `rtt_ms`: smoothed round-trip time
/// - `jitter_ms`: current jitter estimate (RTT variance)
///
/// Returns `Some(tuning)` if the output changed, `None` if unchanged.
pub fn update(&mut self, loss_pct: f32, rtt_ms: u32, jitter_ms: u32) -> Option<DredTuning> {
if !self.codec.is_opus() {
return None;
}
let baseline = baseline_dred_frames(self.codec);
let ceiling = max_dred_frames_for(self.codec);
// --- Jitter spike detection ---
let jitter_f = jitter_ms as f32;
if !self.initialized {
self.jitter_ewma = jitter_f;
self.initialized = true;
} else {
// Fast-up (alpha=0.3), slow-down (alpha=0.05) asymmetric EWMA
let alpha = if jitter_f > self.jitter_ewma { 0.3 } else { 0.05 };
self.jitter_ewma = alpha * jitter_f + (1.0 - alpha) * self.jitter_ewma;
}
// Detect spike: instantaneous jitter > EWMA × 1.3
if self.jitter_ewma > 1.0 && jitter_f > self.jitter_ewma * JITTER_SPIKE_RATIO {
self.spike_cooldown = SPIKE_BOOST_COOLDOWN_CYCLES;
}
// Decrement cooldown
if self.spike_cooldown > 0 {
self.spike_cooldown -= 1;
}
// --- Compute DRED frames ---
let dred_frames = if self.spike_cooldown > 0 {
// During spike boost: jump to ceiling
ceiling
} else {
// Continuous mapping: scale linearly between baseline and ceiling
// based on loss percentage.
// 0% loss → baseline
// 40% loss → ceiling
let loss_clamped = loss_pct.clamp(0.0, 40.0);
let t = loss_clamped / 40.0;
let raw = baseline as f32 + t * (ceiling - baseline) as f32;
(raw as u8).clamp(MIN_DRED_FRAMES, ceiling)
};
// --- Compute expected loss hint ---
// Pass the real loss so the encoder can clamp at its own floor (15%).
// For RTT-driven boost: high RTT suggests impending loss, so add a
// phantom loss contribution to keep DRED emitting generously.
let rtt_loss_phantom = if rtt_ms > 200 {
((rtt_ms - 200) as f32 / 40.0).min(15.0)
} else {
0.0
};
let expected_loss = (loss_pct + rtt_loss_phantom).clamp(0.0, 100.0) as u8;
let tuning = DredTuning {
dred_frames,
expected_loss_pct: expected_loss,
};
if tuning != self.last_tuning {
self.last_tuning = tuning;
Some(tuning)
} else {
None
}
}
/// Get the last computed tuning without updating.
pub fn current(&self) -> DredTuning {
self.last_tuning
}
/// Whether a jitter-spike boost is currently active.
pub fn spike_boost_active(&self) -> bool {
self.spike_cooldown > 0
}
}
#[cfg(test)]
mod tests {
use super::*;
#[test]
fn baseline_for_opus24k() {
let tuner = DredTuner::new(CodecId::Opus24k);
assert_eq!(tuner.current().dred_frames, 20); // 200 ms
}
#[test]
fn baseline_for_opus6k() {
let tuner = DredTuner::new(CodecId::Opus6k);
assert_eq!(tuner.current().dred_frames, 50); // 500 ms
}
#[test]
fn codec2_returns_none() {
let mut tuner = DredTuner::new(CodecId::Codec2_1200);
assert!(tuner.update(10.0, 100, 20).is_none());
}
#[test]
fn scales_with_loss() {
let mut tuner = DredTuner::new(CodecId::Opus24k);
// 0% loss → baseline (20 frames)
tuner.update(0.0, 50, 5);
assert_eq!(tuner.current().dred_frames, 20);
// 20% loss → midpoint between 20 and 50 = 35
tuner.update(20.0, 50, 5);
assert_eq!(tuner.current().dred_frames, 35);
// 40%+ loss → ceiling (50 frames)
tuner.update(40.0, 50, 5);
assert_eq!(tuner.current().dred_frames, 50);
}
#[test]
fn jitter_spike_triggers_boost() {
let mut tuner = DredTuner::new(CodecId::Opus24k);
// Establish baseline jitter
for _ in 0..20 {
tuner.update(0.0, 50, 10);
}
assert!(!tuner.spike_boost_active());
// Spike: jitter jumps to 50 ms (5x the EWMA of ~10)
tuner.update(0.0, 50, 50);
assert!(tuner.spike_boost_active());
// Should be at ceiling (50 frames = 500 ms for Opus24k)
assert_eq!(tuner.current().dred_frames, 50);
}
#[test]
fn spike_cooldown_decays() {
let mut tuner = DredTuner::new(CodecId::Opus24k);
// Establish baseline then spike
for _ in 0..20 {
tuner.update(0.0, 50, 10);
}
tuner.update(0.0, 50, 50);
assert!(tuner.spike_boost_active());
// Run through cooldown
for _ in 0..SPIKE_BOOST_COOLDOWN_CYCLES {
tuner.update(0.0, 50, 10);
}
assert!(!tuner.spike_boost_active());
// Should return to baseline
assert_eq!(tuner.current().dred_frames, 20);
}
#[test]
fn rtt_phantom_loss() {
let mut tuner = DredTuner::new(CodecId::Opus24k);
// High RTT (400ms) with 0% real loss
tuner.update(0.0, 400, 10);
// Phantom loss = (400-200)/40 = 5
assert_eq!(tuner.current().expected_loss_pct, 5);
}
#[test]
fn set_codec_resets_spike() {
let mut tuner = DredTuner::new(CodecId::Opus24k);
// Trigger spike
for _ in 0..20 {
tuner.update(0.0, 50, 10);
}
tuner.update(0.0, 50, 50);
assert!(tuner.spike_boost_active());
// Switch codec — spike should reset
tuner.set_codec(CodecId::Opus6k);
assert!(!tuner.spike_boost_active());
}
#[test]
fn opus6k_reaches_max_1040ms() {
let mut tuner = DredTuner::new(CodecId::Opus6k);
// High loss → should reach 104 frames (1040 ms)
tuner.update(40.0, 50, 5);
assert_eq!(tuner.current().dred_frames, MAX_DRED_FRAMES);
}
#[test]
fn returns_none_when_unchanged() {
let mut tuner = DredTuner::new(CodecId::Opus24k);
// First update always returns Some (initial → computed)
let first = tuner.update(0.0, 50, 5);
// Same inputs → None
let second = tuner.update(0.0, 50, 5);
assert!(first.is_some() || second.is_none());
}
}

View File

@@ -53,6 +53,15 @@ pub enum TransportError {
Timeout { ms: u64 },
#[error("io error: {0}")]
Io(#[from] std::io::Error),
/// Parsed wire bytes successfully but the payload didn't
/// deserialize into a known `SignalMessage` variant. Usually
/// means the peer is running a newer build with a variant we
/// don't know yet. Callers should **log and continue** rather
/// than tearing down the connection, so that forward-compat
/// additions to `SignalMessage` don't silently kill old
/// clients/relays.
#[error("signal deserialize: {0}")]
Deserialize(String),
#[error("internal transport error: {0}")]
Internal(String),
}

View File

@@ -273,10 +273,21 @@ impl JitterBuffer {
return;
}
// Check if packet is too old (already played out)
// Check if packet is too old (already played out).
// A backward jump of >100 seq (~2s at 50fps) indicates a new sender in a
// federation room — reset instead of dropping.
if self.stats.packets_played > 0 && seq_before(seq, self.next_playout_seq) {
self.stats.packets_late += 1;
return;
let backward_distance = self.next_playout_seq.wrapping_sub(seq);
tracing::warn!(seq, next = self.next_playout_seq, backward_distance, "jitter: backward seq detected");
if backward_distance > 100 {
tracing::info!(seq, next = self.next_playout_seq, "jitter: RESET — new sender detected");
self.buffer.clear();
self.next_playout_seq = seq;
self.stats.packets_late = 0;
} else {
self.stats.packets_late += 1;
return;
}
}
// If we haven't started playout yet, adjust next_playout_seq to earliest known
@@ -412,10 +423,21 @@ impl JitterBuffer {
return;
}
// Check if packet is too old (already played out)
// Check if packet is too old (already played out).
// A backward jump of >100 seq (~2s at 50fps) indicates a new sender in a
// federation room — reset instead of dropping.
if self.stats.packets_played > 0 && seq_before(seq, self.next_playout_seq) {
self.stats.packets_late += 1;
return;
let backward_distance = self.next_playout_seq.wrapping_sub(seq);
tracing::warn!(seq, next = self.next_playout_seq, backward_distance, "jitter: backward seq detected");
if backward_distance > 100 {
tracing::info!(seq, next = self.next_playout_seq, "jitter: RESET — new sender detected");
self.buffer.clear();
self.next_playout_seq = seq;
self.stats.packets_late = 0;
} else {
self.stats.packets_late += 1;
return;
}
}
// If we haven't started playout yet, adjust next_playout_seq to earliest known

View File

@@ -14,6 +14,7 @@
pub mod bandwidth;
pub mod codec_id;
pub mod dred_tuner;
pub mod error;
pub mod jitter;
pub mod packet;
@@ -25,10 +26,12 @@ pub mod traits;
pub use codec_id::{CodecId, QualityProfile};
pub use error::*;
pub use packet::{
HangupReason, MediaHeader, MediaPacket, MiniFrameContext, MiniHeader, QualityReport,
RoomParticipant, SignalMessage, TrunkEntry, TrunkFrame, FRAME_TYPE_FULL, FRAME_TYPE_MINI,
CallAcceptMode, HangupReason, MediaHeader, MediaPacket, MiniFrameContext, MiniHeader,
QualityReport, RoomParticipant, SignalMessage, TrunkEntry, TrunkFrame, FRAME_TYPE_FULL,
FRAME_TYPE_MINI,
};
pub use bandwidth::{BandwidthEstimator, CongestionState};
pub use dred_tuner::{DredTuner, DredTuning};
pub use quality::{AdaptiveQualityController, NetworkContext, Tier};
pub use session::{Session, SessionEvent, SessionState};
pub use traits::*;

View File

@@ -584,12 +584,38 @@ pub enum SignalMessage {
recommended_profile: crate::QualityProfile,
},
/// Phase 4 telemetry: loss-recovery counts for the current session.
/// Sent periodically from receivers to the relay so Prometheus metrics
/// can distinguish DRED reconstructions from classical PLC invocations.
/// Fields default to 0 on old receivers (`#[serde(default)]`), so
/// introducing this variant is backward-compatible with pre-Phase-4
/// relays — they'll just log "unknown signal variant" on receipt.
LossRecoveryUpdate {
/// Total frames reconstructed via DRED since call start (monotonic).
#[serde(default)]
dred_reconstructions: u64,
/// Total frames filled via classical Opus/Codec2 PLC since call
/// start (monotonic).
#[serde(default)]
classical_plc_invocations: u64,
/// Total frames decoded since call start. Used by the relay to
/// compute recovery rates as a fraction of total frames.
#[serde(default)]
frames_decoded: u64,
},
/// Connection keepalive / RTT measurement.
Ping { timestamp_ms: u64 },
Pong { timestamp_ms: u64 },
/// End the call.
Hangup { reason: HangupReason },
/// End the call. `call_id` is optional for backwards compatibility
/// with older clients that send Hangup without it — the relay falls
/// back to ending ALL active calls for the sender in that case.
Hangup {
reason: HangupReason,
#[serde(default, skip_serializing_if = "Option::is_none")]
call_id: Option<String>,
},
/// featherChat bearer token for relay authentication.
/// Sent as the first signal message when --auth-url is configured.
@@ -657,10 +683,260 @@ pub enum SignalMessage {
participants: Vec<RoomParticipant>,
},
/// Set or update the client's display name.
/// Sent by client after joining; relay updates the participant entry and
/// re-broadcasts a RoomUpdate to all participants.
SetAlias { alias: String },
// ── Federation signals (relay-to-relay) ──
/// Federation: initial handshake — the connecting relay identifies itself.
FederationHello {
/// TLS certificate fingerprint of the connecting relay.
tls_fingerprint: String,
},
/// Federation: this relay now has local participants in a global room.
GlobalRoomActive {
room: String,
/// Participants on the announcing relay (for federated presence).
#[serde(default)]
participants: Vec<RoomParticipant>,
},
/// Federation: this relay's last local participant left a global room.
GlobalRoomInactive {
room: String,
},
// ── Direct calling signals (client ↔ relay signaling) ──
/// Register on relay for direct calls. Sent on `_signal` connections
/// after optional AuthToken.
RegisterPresence {
/// Client's Ed25519 identity public key.
identity_pub: [u8; 32],
/// Signature over ("register-presence" || identity_pub).
signature: Vec<u8>,
/// Optional display name.
alias: Option<String>,
},
/// Relay confirms presence registration.
RegisterPresenceAck {
success: bool,
#[serde(skip_serializing_if = "Option::is_none")]
error: Option<String>,
/// Relay's build version (git short hash).
#[serde(default, skip_serializing_if = "Option::is_none")]
relay_build: Option<String>,
},
/// Direct call offer routed through the relay to a specific peer.
DirectCallOffer {
/// Caller's fingerprint.
caller_fingerprint: String,
/// Caller's display name.
caller_alias: Option<String>,
/// Target's fingerprint.
target_fingerprint: String,
/// Unique call session ID (UUID).
call_id: String,
/// Caller's Ed25519 identity pub.
identity_pub: [u8; 32],
/// Caller's ephemeral X25519 pub (for key exchange on media connect).
ephemeral_pub: [u8; 32],
/// Signature over (ephemeral_pub || target_fingerprint || call_id).
signature: Vec<u8>,
/// Supported quality profiles.
supported_profiles: Vec<crate::QualityProfile>,
/// Phase 3 (hole-punching): caller's own server-reflexive
/// address as learned via `SignalMessage::Reflect`. The
/// relay stashes this in its call registry and later
/// injects it into the callee's `CallSetup.peer_direct_addr`
/// so the callee can try a direct QUIC handshake to the
/// caller instead of routing media through the relay.
/// `None` means "caller doesn't want P2P, use relay only".
#[serde(default, skip_serializing_if = "Option::is_none")]
caller_reflexive_addr: Option<String>,
/// Phase 5.5 (ICE host candidates): caller's LAN-local
/// interface addresses paired with its signal endpoint's
/// port. Peers on the same physical LAN can direct-dial
/// these without going through the WAN reflex addr,
/// which is important because most consumer NATs
/// (including MikroTik masquerade) don't support NAT
/// hairpinning — the reflex addr is unreachable from
/// the same LAN.
#[serde(default, skip_serializing_if = "Vec::is_empty")]
caller_local_addrs: Vec<String>,
/// Build version (git short hash) for debugging.
#[serde(default, skip_serializing_if = "Option::is_none")]
caller_build_version: Option<String>,
},
/// Callee's response to a direct call.
DirectCallAnswer {
call_id: String,
/// How the callee accepts (or rejects).
accept_mode: CallAcceptMode,
/// Callee's identity pub (present when accepting).
#[serde(skip_serializing_if = "Option::is_none")]
identity_pub: Option<[u8; 32]>,
/// Callee's ephemeral pub (present when accepting).
#[serde(skip_serializing_if = "Option::is_none")]
ephemeral_pub: Option<[u8; 32]>,
/// Signature (present when accepting).
#[serde(skip_serializing_if = "Option::is_none")]
signature: Option<Vec<u8>>,
/// Chosen quality profile (present when accepting).
#[serde(skip_serializing_if = "Option::is_none")]
chosen_profile: Option<crate::QualityProfile>,
/// Phase 3 (hole-punching): callee's own server-reflexive
/// address, only populated on `AcceptTrusted` — privacy-mode
/// answers leave this `None` so the callee's real IP stays
/// hidden (the whole point of `AcceptGeneric`). The relay
/// carries it opaquely into the caller's `CallSetup`.
#[serde(default, skip_serializing_if = "Option::is_none")]
callee_reflexive_addr: Option<String>,
/// Phase 5.5 (ICE host candidates): callee's LAN-local
/// interface addresses. Same purpose as
/// `caller_local_addrs` in `DirectCallOffer`. Only
/// populated on `AcceptTrusted` alongside
/// `callee_reflexive_addr`.
#[serde(default, skip_serializing_if = "Vec::is_empty")]
callee_local_addrs: Vec<String>,
/// Build version (git short hash) for debugging.
#[serde(default, skip_serializing_if = "Option::is_none")]
callee_build_version: Option<String>,
},
/// Relay tells both parties: media room is ready.
CallSetup {
call_id: String,
/// Room name on the relay for the media session (e.g., "_call:a1b2c3d4").
room: String,
/// Relay address for the QUIC media connection.
relay_addr: String,
/// Phase 3 (hole-punching): the OTHER party's server-reflexive
/// address as the relay learned it from the offer/answer
/// exchange. When populated, clients attempt a direct QUIC
/// handshake to this address in parallel with the existing
/// relay path and use whichever connects first. `None`
/// means the relay path is the only option — either because
/// a peer didn't advertise its addr (Phase 1/2 relay or
/// privacy-mode answer) or because the relay decided P2P
/// wasn't viable.
#[serde(default, skip_serializing_if = "Option::is_none")]
peer_direct_addr: Option<String>,
/// Phase 5.5 (ICE host candidates): the OTHER party's LAN
/// host addresses (RFC1918 IPv4 + CGNAT + non-link-local
/// IPv6). On same-LAN calls these are directly dialable
/// and bypass the NAT-hairpinning problem that blocks
/// same-LAN peers from using `peer_direct_addr`.
/// Client-side race tries all of these in parallel.
#[serde(default, skip_serializing_if = "Vec::is_empty")]
peer_local_addrs: Vec<String>,
},
/// Ringing notification (relay → caller, callee received the offer).
CallRinging {
call_id: String,
},
// ── NAT reflection ("STUN for QUIC") ──────────────────────────────
/// Client → relay: "please tell me the source IP:port you see on
/// this connection". A QUIC-native replacement for classic STUN
/// that reuses the TLS-authenticated signal channel to the relay
/// instead of running a separate UDP reflection service on port
/// 3478. The relay answers with `ReflectResponse`.
///
/// No payload — the relay already knows which connection the
/// request arrived on, and `connection.remote_address()` gives it
/// the exact source address (post-NAT) as observed from the
/// server side of the TLS session.
Reflect,
/// Relay → client: response to `Reflect`. Carries the socket
/// address the relay observes as the client's source for this
/// QUIC connection in `SocketAddr::to_string()` form — "a.b.c.d:p"
/// for IPv4, "[::1]:p" for IPv6. Clients parse it with
/// `SocketAddr::from_str`.
ReflectResponse {
observed_addr: String,
},
// ── Phase 6: ICE-style path negotiation ─────────────────────
/// Phase 6: each side reports the result of its local dual-
/// path race to the other side through the relay. Both peers
/// send this after their race completes; both wait for the
/// other's report before committing a transport to the
/// CallEngine.
///
/// The decision rule is: if BOTH sides report `direct_ok =
/// true`, use the direct P2P connection. If EITHER reports
/// `direct_ok = false`, BOTH fall back to relay. This
/// eliminates the race condition where one side picks Direct
/// and the other picks Relay — they now agree on the path
/// before any media flows.
MediaPathReport {
call_id: String,
/// Did the direct QUIC connection (P2P dial or accept)
/// complete successfully on this side?
direct_ok: bool,
/// Which future won the local tokio::select race?
/// "Direct" or "Relay" — informational for debug logs.
#[serde(default)]
race_winner: String,
},
// ── Phase 4: cross-relay direct-call signaling ────────────────────
/// Phase 4: relay-to-relay envelope for forwarding direct-call
/// signaling across a federation link. When Alice on Relay A
/// sends a `DirectCallOffer` for Bob whose fingerprint isn't
/// in A's local SignalHub, Relay A wraps the offer in this
/// envelope and broadcasts it over every active federation
/// peer link. Whichever peer has Bob registered unwraps the
/// inner message and delivers it locally.
///
/// Never originated by clients — only relays create and
/// consume this variant.
///
/// Loop prevention: the receiving relay drops any forward
/// where `origin_relay_fp` matches its own federation TLS
/// fingerprint. With broadcast-to-all-peers this prevents
/// A→B→A echo loops; proper TTL + dedup will land when
/// multi-hop federation is added (Phase 4.2).
FederatedSignalForward {
/// The signal message being forwarded
/// (`DirectCallOffer`, `DirectCallAnswer`, `CallRinging`,
/// `Hangup`, ...). Boxed because `SignalMessage` is
/// relatively large and JSON serde handles recursion
/// cleanly.
inner: Box<SignalMessage>,
/// Federation TLS fingerprint of the sending relay.
/// Used (a) for loop prevention by the receiver and (b)
/// to route the peer's reply back through the same
/// federation link via `send_signal_to_peer`.
origin_relay_fp: String,
},
/// Relay-initiated quality directive: all participants should switch
/// to the recommended profile to match the weakest link.
QualityDirective {
recommended_profile: crate::QualityProfile,
#[serde(default, skip_serializing_if = "Option::is_none")]
reason: Option<String>,
},
}
/// How the callee responds to a direct call.
#[derive(Clone, Copy, Debug, PartialEq, Eq, Serialize, Deserialize)]
pub enum CallAcceptMode {
/// Reject the call.
Reject,
/// Accept with trust — in Phase 2, this enables P2P (reveals IP).
/// In Phase 1, behaves the same as AcceptGeneric.
AcceptTrusted,
/// Accept with privacy — relay always mediates media.
AcceptGeneric,
}
/// A participant entry in a RoomUpdate message.
@@ -670,6 +946,10 @@ pub struct RoomParticipant {
pub fingerprint: String,
/// Optional display name set by the client.
pub alias: Option<String>,
/// Relay label — identifies which relay this participant is connected to.
/// None for local participants, Some("Relay B") for federated.
#[serde(default)]
pub relay_label: Option<String>,
}
/// Reasons for ending a call.
@@ -783,6 +1063,272 @@ mod tests {
assert_eq!(packet.quality_report, decoded.quality_report);
}
#[test]
fn reflect_serialize_roundtrip() {
// Reflect is a unit variant — the client sends it with no
// payload and the relay answers with the observed source addr.
let req = SignalMessage::Reflect;
let json = serde_json::to_string(&req).unwrap();
let decoded: SignalMessage = serde_json::from_str(&json).unwrap();
assert!(matches!(decoded, SignalMessage::Reflect));
// ReflectResponse carries a string — exercise both IPv4 and
// IPv6 shapes because SocketAddr::to_string uses [::1]:port
// for v6 and the client side has to parse that back.
for addr in ["192.0.2.17:4433", "[2001:db8::1]:4433", "127.0.0.1:54321"] {
let resp = SignalMessage::ReflectResponse {
observed_addr: addr.to_string(),
};
let json = serde_json::to_string(&resp).unwrap();
let decoded: SignalMessage = serde_json::from_str(&json).unwrap();
match decoded {
SignalMessage::ReflectResponse { observed_addr } => {
assert_eq!(observed_addr, addr);
// Must parse back to a SocketAddr cleanly.
let _parsed: std::net::SocketAddr = observed_addr.parse()
.expect("observed_addr must parse as SocketAddr");
}
_ => panic!("wrong variant after roundtrip"),
}
}
}
#[test]
fn federated_signal_forward_roundtrip() {
// Wrap a DirectCallOffer inside FederatedSignalForward and
// prove both directions of serde preserve every field.
let inner = SignalMessage::DirectCallOffer {
caller_fingerprint: "alice".into(),
caller_alias: Some("Alice".into()),
target_fingerprint: "bob".into(),
call_id: "c1".into(),
identity_pub: [1u8; 32],
ephemeral_pub: [2u8; 32],
signature: vec![3u8; 64],
supported_profiles: vec![],
caller_reflexive_addr: Some("192.0.2.1:4433".into()),
caller_local_addrs: Vec::new(),
caller_build_version: None,
};
let forward = SignalMessage::FederatedSignalForward {
inner: Box::new(inner),
origin_relay_fp: "relay-a-tls-fp".into(),
};
let json = serde_json::to_string(&forward).unwrap();
let decoded: SignalMessage = serde_json::from_str(&json).unwrap();
match decoded {
SignalMessage::FederatedSignalForward { inner, origin_relay_fp } => {
assert_eq!(origin_relay_fp, "relay-a-tls-fp");
match *inner {
SignalMessage::DirectCallOffer {
caller_fingerprint,
target_fingerprint,
caller_reflexive_addr,
..
} => {
assert_eq!(caller_fingerprint, "alice");
assert_eq!(target_fingerprint, "bob");
assert_eq!(caller_reflexive_addr.as_deref(), Some("192.0.2.1:4433"));
}
_ => panic!("inner was not DirectCallOffer after roundtrip"),
}
}
_ => panic!("outer was not FederatedSignalForward"),
}
}
#[test]
fn federated_signal_forward_can_nest_any_inner() {
// Sanity check that every direct-call signaling variant
// we intend to forward survives being boxed + re-serialized.
let cases: Vec<SignalMessage> = vec![
SignalMessage::DirectCallAnswer {
call_id: "c1".into(),
accept_mode: CallAcceptMode::AcceptTrusted,
identity_pub: None,
ephemeral_pub: None,
signature: None,
chosen_profile: None,
callee_reflexive_addr: Some("198.51.100.9:4433".into()),
callee_local_addrs: Vec::new(),
callee_build_version: None,
},
SignalMessage::CallRinging { call_id: "c1".into() },
SignalMessage::Hangup { reason: HangupReason::Normal, call_id: None },
];
for inner in cases {
let inner_disc = std::mem::discriminant(&inner);
let forward = SignalMessage::FederatedSignalForward {
inner: Box::new(inner),
origin_relay_fp: "r".into(),
};
let json = serde_json::to_string(&forward).unwrap();
let decoded: SignalMessage = serde_json::from_str(&json).unwrap();
match decoded {
SignalMessage::FederatedSignalForward { inner, .. } => {
assert_eq!(std::mem::discriminant(&*inner), inner_disc);
}
_ => panic!("outer variant lost"),
}
}
}
#[test]
fn hole_punching_optional_fields_roundtrip() {
// DirectCallOffer with Some(caller_reflexive_addr)
let offer = SignalMessage::DirectCallOffer {
caller_fingerprint: "alice".into(),
caller_alias: None,
target_fingerprint: "bob".into(),
call_id: "c1".into(),
identity_pub: [0; 32],
ephemeral_pub: [0; 32],
signature: vec![],
supported_profiles: vec![],
caller_reflexive_addr: Some("192.0.2.1:4433".into()),
caller_local_addrs: Vec::new(),
caller_build_version: None,
};
let json = serde_json::to_string(&offer).unwrap();
assert!(
json.contains("caller_reflexive_addr"),
"Some field must serialize: {json}"
);
let decoded: SignalMessage = serde_json::from_str(&json).unwrap();
match decoded {
SignalMessage::DirectCallOffer { caller_reflexive_addr, .. } => {
assert_eq!(caller_reflexive_addr.as_deref(), Some("192.0.2.1:4433"));
}
_ => panic!("wrong variant"),
}
// DirectCallOffer with None — skip_serializing_if must
// OMIT the field from the JSON so older relays that don't
// know about caller_reflexive_addr don't see it.
let offer_none = SignalMessage::DirectCallOffer {
caller_fingerprint: "alice".into(),
caller_alias: None,
target_fingerprint: "bob".into(),
call_id: "c1".into(),
identity_pub: [0; 32],
ephemeral_pub: [0; 32],
signature: vec![],
supported_profiles: vec![],
caller_reflexive_addr: None,
caller_local_addrs: Vec::new(),
caller_build_version: None,
};
let json_none = serde_json::to_string(&offer_none).unwrap();
assert!(
!json_none.contains("caller_reflexive_addr"),
"None field must NOT serialize: {json_none}"
);
// DirectCallAnswer with callee_reflexive_addr.
let answer = SignalMessage::DirectCallAnswer {
call_id: "c1".into(),
accept_mode: CallAcceptMode::AcceptTrusted,
identity_pub: None,
ephemeral_pub: None,
signature: None,
chosen_profile: None,
callee_reflexive_addr: Some("198.51.100.9:4433".into()),
callee_local_addrs: Vec::new(),
callee_build_version: None,
};
let decoded: SignalMessage =
serde_json::from_str(&serde_json::to_string(&answer).unwrap()).unwrap();
match decoded {
SignalMessage::DirectCallAnswer { callee_reflexive_addr, .. } => {
assert_eq!(
callee_reflexive_addr.as_deref(),
Some("198.51.100.9:4433")
);
}
_ => panic!("wrong variant"),
}
// CallSetup with peer_direct_addr.
let setup = SignalMessage::CallSetup {
call_id: "c1".into(),
room: "call-c1".into(),
relay_addr: "203.0.113.5:4433".into(),
peer_direct_addr: Some("192.0.2.1:4433".into()),
peer_local_addrs: Vec::new(),
};
let decoded: SignalMessage =
serde_json::from_str(&serde_json::to_string(&setup).unwrap()).unwrap();
match decoded {
SignalMessage::CallSetup { peer_direct_addr, .. } => {
assert_eq!(peer_direct_addr.as_deref(), Some("192.0.2.1:4433"));
}
_ => panic!("wrong variant"),
}
}
#[test]
fn hole_punching_backward_compat_old_json_parses() {
// An older client/relay wouldn't include the new fields at
// all — the new code must still accept that JSON because
// of #[serde(default)] on the Option<String>.
let old_offer_json = r#"{
"DirectCallOffer": {
"caller_fingerprint": "alice",
"caller_alias": null,
"target_fingerprint": "bob",
"call_id": "c1",
"identity_pub": [0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0],
"ephemeral_pub": [0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0],
"signature": [],
"supported_profiles": []
}
}"#;
let decoded: SignalMessage = serde_json::from_str(old_offer_json).unwrap();
match decoded {
SignalMessage::DirectCallOffer { caller_reflexive_addr, .. } => {
assert!(caller_reflexive_addr.is_none());
}
_ => panic!("wrong variant"),
}
let old_setup_json = r#"{
"CallSetup": {
"call_id": "c1",
"room": "call-c1",
"relay_addr": "203.0.113.5:4433"
}
}"#;
let decoded: SignalMessage = serde_json::from_str(old_setup_json).unwrap();
match decoded {
SignalMessage::CallSetup { peer_direct_addr, .. } => {
assert!(peer_direct_addr.is_none());
}
_ => panic!("wrong variant"),
}
}
#[test]
fn reflect_backward_compat_with_existing_variants() {
// Adding Reflect/ReflectResponse at the end of the enum must
// not break JSON round-tripping of existing variants. Smoke-
// test a sample of the pre-existing ones.
let cases = vec![
SignalMessage::Ping { timestamp_ms: 12345 },
SignalMessage::Hold,
SignalMessage::Hangup { reason: HangupReason::Normal, call_id: None },
SignalMessage::CallRinging { call_id: "abcd".into() },
];
for m in cases {
let json = serde_json::to_string(&m).unwrap();
let decoded: SignalMessage = serde_json::from_str(&json).unwrap();
// Discriminant equality proves variant tag survived.
assert_eq!(
std::mem::discriminant(&m),
std::mem::discriminant(&decoded)
);
}
}
#[test]
fn hold_unhold_serialize() {
let hold = SignalMessage::Hold;
@@ -1127,6 +1673,41 @@ mod tests {
}
}
#[test]
fn quality_directive_roundtrip() {
let msg = SignalMessage::QualityDirective {
recommended_profile: crate::QualityProfile::DEGRADED,
reason: Some("weakest link degraded".into()),
};
let json = serde_json::to_string(&msg).unwrap();
let decoded: SignalMessage = serde_json::from_str(&json).unwrap();
match decoded {
SignalMessage::QualityDirective { recommended_profile, reason } => {
assert_eq!(recommended_profile.codec, CodecId::Opus6k);
assert_eq!(reason.as_deref(), Some("weakest link degraded"));
}
_ => panic!("wrong variant"),
}
}
#[test]
fn quality_directive_without_reason_roundtrip() {
let msg = SignalMessage::QualityDirective {
recommended_profile: crate::QualityProfile::GOOD,
reason: None,
};
let json = serde_json::to_string(&msg).unwrap();
// None reason should be omitted from JSON
assert!(!json.contains("reason"));
let decoded: SignalMessage = serde_json::from_str(&json).unwrap();
match decoded {
SignalMessage::QualityDirective { reason, .. } => {
assert!(reason.is_none());
}
_ => panic!("wrong variant"),
}
}
#[test]
fn mini_frame_disabled() {
// Simulate disabled mini-frames by always keeping frames_since_full at 0

View File

@@ -28,6 +28,13 @@ pub trait AudioEncoder: Send + Sync {
/// Enable/disable DTX (discontinuous transmission). No-op for Codec2.
fn set_dtx(&mut self, _enabled: bool) {}
/// Hint the encoder about expected packet loss (0100). In DRED mode the
/// encoder floors this at 15% internally. No-op for Codec2.
fn set_expected_loss(&mut self, _loss_pct: u8) {}
/// Set DRED duration in 10 ms frame units (0104). No-op for Codec2.
fn set_dred_duration(&mut self, _frames: u8) {}
}
/// Decodes compressed frames back to PCM audio.
@@ -132,6 +139,14 @@ pub trait CryptoSession: Send + Sync {
fn overhead(&self) -> usize {
16 // ChaCha20-Poly1305 tag
}
/// Short Authentication String (SAS) — 4-digit code for verbal verification.
/// Both peers derive the same code from the shared secret + identity keys.
/// If a MITM relay is intercepting, the codes will differ.
/// Returns None if SAS was not computed (e.g., relay-side sessions).
fn sas_code(&self) -> Option<u32> {
None
}
}
/// Key exchange using the Warzone identity model.

View File

@@ -29,6 +29,8 @@ axum = { version = "0.7", default-features = false, features = ["tokio", "http1"
tower-http = { version = "0.6", features = ["fs"] }
futures-util = "0.3"
dirs = "6"
sha2 = { workspace = true }
chrono = "0.4"
[[bin]]
name = "wzp-relay"

18
crates/wzp-relay/build.rs Normal file
View File

@@ -0,0 +1,18 @@
use std::process::Command;
fn main() {
// Get git hash at build time
let output = Command::new("git")
.args(["rev-parse", "--short", "HEAD"])
.output();
let hash = match output {
Ok(o) if o.status.success() => {
String::from_utf8_lossy(&o.stdout).trim().to_string()
}
_ => "unknown".to_string(),
};
println!("cargo:rustc-env=WZP_BUILD_HASH={hash}");
println!("cargo:rerun-if-changed=.git/HEAD");
}

View File

@@ -0,0 +1,354 @@
//! Direct call state tracking.
//!
//! Manages the lifecycle of 1:1 direct calls placed via the `_signal` channel.
//! Each call goes through: Pending → Ringing → Active → Ended.
use std::collections::HashMap;
use std::time::{Duration, Instant};
/// State of a direct call.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum DirectCallState {
/// Offer sent to callee, waiting for response.
Pending,
/// Callee acknowledged, ringing.
Ringing,
/// Call accepted, media room active.
Active,
/// Call ended (hangup, reject, timeout, or error).
Ended,
}
/// A tracked direct call between two users.
pub struct DirectCall {
pub call_id: String,
pub caller_fingerprint: String,
pub callee_fingerprint: String,
pub state: DirectCallState,
pub accept_mode: Option<wzp_proto::CallAcceptMode>,
/// Private room name (set when accepted).
pub room_name: Option<String>,
pub created_at: Instant,
pub answered_at: Option<Instant>,
pub ended_at: Option<Instant>,
/// Phase 3 (hole-punching): caller's server-reflexive address
/// as carried in the `DirectCallOffer`. The relay stashes it
/// here when the offer arrives so it can later inject it as
/// `peer_direct_addr` into the callee's `CallSetup`.
pub caller_reflexive_addr: Option<String>,
/// Phase 3 (hole-punching): callee's server-reflexive address
/// as carried in the `DirectCallAnswer`. Only populated for
/// `AcceptTrusted` answers — privacy-mode answers leave this
/// `None`. Fed into the caller's `CallSetup.peer_direct_addr`.
pub callee_reflexive_addr: Option<String>,
/// Phase 4 (cross-relay): federation TLS fingerprint of the
/// PEER RELAY that forwarded the offer/answer for this call.
/// `None` for local calls — caller and callee both
/// registered on this relay. `Some(fp)` when one side of
/// the call is on a remote relay reached through the
/// federation link identified by `fp`. The
/// `DirectCallAnswer` handling uses this to route the reply
/// back through the SAME link instead of broadcasting again.
pub peer_relay_fp: Option<String>,
/// Phase 5.5 (ICE host candidates): caller's LAN-local
/// interface addresses from the `DirectCallOffer`. Cross-
/// wired into the callee's `CallSetup.peer_local_addrs` so
/// the callee can direct-dial the caller over the same LAN
/// without going through the WAN reflex addr (NAT
/// hairpinning often doesn't work for same-LAN peers).
pub caller_local_addrs: Vec<String>,
/// Phase 5.5 (ICE host candidates): callee's LAN-local
/// interface addresses from the `DirectCallAnswer`. Cross-
/// wired into the caller's `CallSetup.peer_local_addrs`.
pub callee_local_addrs: Vec<String>,
}
/// Registry of active direct calls.
pub struct CallRegistry {
calls: HashMap<String, DirectCall>,
}
impl CallRegistry {
pub fn new() -> Self {
Self {
calls: HashMap::new(),
}
}
/// Create a new pending call. Returns the call_id.
pub fn create_call(&mut self, call_id: String, caller_fp: String, callee_fp: String) -> &DirectCall {
let call = DirectCall {
call_id: call_id.clone(),
caller_fingerprint: caller_fp,
callee_fingerprint: callee_fp,
state: DirectCallState::Pending,
accept_mode: None,
room_name: None,
created_at: Instant::now(),
answered_at: None,
ended_at: None,
caller_reflexive_addr: None,
callee_reflexive_addr: None,
peer_relay_fp: None,
caller_local_addrs: Vec::new(),
callee_local_addrs: Vec::new(),
};
self.calls.insert(call_id.clone(), call);
self.calls.get(&call_id).unwrap()
}
/// Phase 5.5: stash the caller's LAN host candidates from
/// the `DirectCallOffer`. Empty Vec is a valid value meaning
/// "caller has no LAN candidates" (e.g. old client).
pub fn set_caller_local_addrs(&mut self, call_id: &str, addrs: Vec<String>) {
if let Some(call) = self.calls.get_mut(call_id) {
call.caller_local_addrs = addrs;
}
}
/// Phase 5.5: stash the callee's LAN host candidates from
/// the `DirectCallAnswer`.
pub fn set_callee_local_addrs(&mut self, call_id: &str, addrs: Vec<String>) {
if let Some(call) = self.calls.get_mut(call_id) {
call.callee_local_addrs = addrs;
}
}
/// Phase 4: stash the federation TLS fingerprint of the peer
/// relay that originated (or will receive) the cross-relay
/// forward for this call. Safe to call with `None` to clear
/// a previously-set value.
pub fn set_peer_relay_fp(&mut self, call_id: &str, fp: Option<String>) {
if let Some(call) = self.calls.get_mut(call_id) {
call.peer_relay_fp = fp;
}
}
/// Phase 3: stash the caller's server-reflexive address read
/// off a `DirectCallOffer`. Safe to call on any call state;
/// a no-op if the call doesn't exist.
pub fn set_caller_reflexive_addr(&mut self, call_id: &str, addr: Option<String>) {
if let Some(call) = self.calls.get_mut(call_id) {
call.caller_reflexive_addr = addr;
}
}
/// Phase 3: stash the callee's server-reflexive address read
/// off a `DirectCallAnswer`. Safe to call on any call state;
/// a no-op if the call doesn't exist.
pub fn set_callee_reflexive_addr(&mut self, call_id: &str, addr: Option<String>) {
if let Some(call) = self.calls.get_mut(call_id) {
call.callee_reflexive_addr = addr;
}
}
/// Get a call by ID.
pub fn get(&self, call_id: &str) -> Option<&DirectCall> {
self.calls.get(call_id)
}
/// Get a mutable call by ID.
pub fn get_mut(&mut self, call_id: &str) -> Option<&mut DirectCall> {
self.calls.get_mut(call_id)
}
/// Transition to Ringing state.
pub fn set_ringing(&mut self, call_id: &str) -> bool {
if let Some(call) = self.calls.get_mut(call_id) {
if call.state == DirectCallState::Pending {
call.state = DirectCallState::Ringing;
return true;
}
}
false
}
/// Transition to Active state.
pub fn set_active(&mut self, call_id: &str, mode: wzp_proto::CallAcceptMode, room: String) -> bool {
if let Some(call) = self.calls.get_mut(call_id) {
if call.state == DirectCallState::Pending || call.state == DirectCallState::Ringing {
call.state = DirectCallState::Active;
call.accept_mode = Some(mode);
call.room_name = Some(room);
call.answered_at = Some(Instant::now());
return true;
}
}
false
}
/// End a call.
pub fn end_call(&mut self, call_id: &str) -> Option<DirectCall> {
if let Some(call) = self.calls.get_mut(call_id) {
call.state = DirectCallState::Ended;
call.ended_at = Some(Instant::now());
}
self.calls.remove(call_id)
}
/// Find active/pending calls involving a fingerprint.
pub fn calls_for_fingerprint(&self, fp: &str) -> Vec<&DirectCall> {
self.calls.values()
.filter(|c| {
c.state != DirectCallState::Ended
&& (c.caller_fingerprint == fp || c.callee_fingerprint == fp)
})
.collect()
}
/// Find the peer's fingerprint in a call.
pub fn peer_fingerprint(&self, call_id: &str, my_fp: &str) -> Option<&str> {
self.calls.get(call_id).map(|c| {
if c.caller_fingerprint == my_fp {
c.callee_fingerprint.as_str()
} else {
c.caller_fingerprint.as_str()
}
})
}
/// Remove calls that have been pending longer than the timeout.
/// Returns call IDs of expired calls.
pub fn expire_stale(&mut self, timeout: Duration) -> Vec<DirectCall> {
let now = Instant::now();
let expired: Vec<String> = self.calls.iter()
.filter(|(_, c)| {
c.state == DirectCallState::Pending
&& now.duration_since(c.created_at) > timeout
})
.map(|(id, _)| id.clone())
.collect();
expired.into_iter()
.filter_map(|id| self.calls.remove(&id))
.collect()
}
/// Number of active (non-ended) calls.
pub fn active_count(&self) -> usize {
self.calls.values()
.filter(|c| c.state != DirectCallState::Ended)
.count()
}
}
#[cfg(test)]
mod tests {
use super::*;
#[test]
fn call_lifecycle() {
let mut reg = CallRegistry::new();
reg.create_call("c1".into(), "alice".into(), "bob".into());
assert_eq!(reg.get("c1").unwrap().state, DirectCallState::Pending);
assert!(reg.set_ringing("c1"));
assert_eq!(reg.get("c1").unwrap().state, DirectCallState::Ringing);
assert!(reg.set_active("c1", wzp_proto::CallAcceptMode::AcceptGeneric, "_call:c1".into()));
assert_eq!(reg.get("c1").unwrap().state, DirectCallState::Active);
assert_eq!(reg.get("c1").unwrap().room_name.as_deref(), Some("_call:c1"));
let ended = reg.end_call("c1").unwrap();
assert_eq!(ended.state, DirectCallState::Ended);
assert_eq!(reg.active_count(), 0);
}
#[test]
fn expire_stale_calls() {
let mut reg = CallRegistry::new();
reg.create_call("c1".into(), "alice".into(), "bob".into());
// Not expired yet
let expired = reg.expire_stale(Duration::from_secs(30));
assert!(expired.is_empty());
// Force expiry with 0 timeout
let expired = reg.expire_stale(Duration::from_secs(0));
assert_eq!(expired.len(), 1);
assert_eq!(expired[0].call_id, "c1");
}
#[test]
fn peer_lookup() {
let mut reg = CallRegistry::new();
reg.create_call("c1".into(), "alice".into(), "bob".into());
assert_eq!(reg.peer_fingerprint("c1", "alice"), Some("bob"));
assert_eq!(reg.peer_fingerprint("c1", "bob"), Some("alice"));
}
#[test]
fn call_registry_stores_reflexive_addrs() {
let mut reg = CallRegistry::new();
reg.create_call("c1".into(), "alice".into(), "bob".into());
// Default: both addrs are None.
let c = reg.get("c1").unwrap();
assert!(c.caller_reflexive_addr.is_none());
assert!(c.callee_reflexive_addr.is_none());
// Caller advertises its reflex addr via DirectCallOffer.
reg.set_caller_reflexive_addr("c1", Some("192.0.2.1:4433".into()));
assert_eq!(
reg.get("c1").unwrap().caller_reflexive_addr.as_deref(),
Some("192.0.2.1:4433")
);
// Callee responds with AcceptTrusted + its own reflex addr.
reg.set_callee_reflexive_addr("c1", Some("198.51.100.9:4433".into()));
assert_eq!(
reg.get("c1").unwrap().callee_reflexive_addr.as_deref(),
Some("198.51.100.9:4433")
);
// Both addrs are independently readable — the relay uses
// them to cross-wire peer_direct_addr in CallSetup.
let c = reg.get("c1").unwrap();
assert_eq!(
c.caller_reflexive_addr.as_deref(),
Some("192.0.2.1:4433")
);
assert_eq!(
c.callee_reflexive_addr.as_deref(),
Some("198.51.100.9:4433")
);
// Setter on an unknown call is a no-op, not a panic.
reg.set_caller_reflexive_addr("does-not-exist", Some("x".into()));
}
#[test]
fn call_registry_stores_peer_relay_fp() {
let mut reg = CallRegistry::new();
reg.create_call("c1".into(), "alice".into(), "bob".into());
// Default: no peer relay.
assert!(reg.get("c1").unwrap().peer_relay_fp.is_none());
// Cross-relay call: origin relay's fp is stashed.
reg.set_peer_relay_fp("c1", Some("relay-a-tls-fp".into()));
assert_eq!(
reg.get("c1").unwrap().peer_relay_fp.as_deref(),
Some("relay-a-tls-fp")
);
// Clearing with None is a valid no-op and empties the field.
reg.set_peer_relay_fp("c1", None);
assert!(reg.get("c1").unwrap().peer_relay_fp.is_none());
// Unknown call is a no-op, not a panic.
reg.set_peer_relay_fp("does-not-exist", Some("x".into()));
}
#[test]
fn call_registry_clearing_reflex_addr_works() {
// Passing None to the setter must clear a previously-set value
// so callers that downgrade to privacy mode mid-flow don't
// leak a stale addr into CallSetup.
let mut reg = CallRegistry::new();
reg.create_call("c1".into(), "alice".into(), "bob".into());
reg.set_caller_reflexive_addr("c1", Some("192.0.2.1:4433".into()));
reg.set_caller_reflexive_addr("c1", None);
assert!(reg.get("c1").unwrap().caller_reflexive_addr.is_none());
}
}

View File

@@ -3,8 +3,41 @@
use serde::{Deserialize, Serialize};
use std::net::SocketAddr;
/// Configuration for the relay daemon.
/// A federated peer relay.
#[derive(Clone, Debug, Serialize, Deserialize)]
pub struct PeerConfig {
/// Address of the peer relay (e.g., "193.180.213.68:4433").
pub url: String,
/// Expected TLS certificate fingerprint (hex, with colons).
pub fingerprint: String,
/// Optional human-readable label.
#[serde(default)]
pub label: Option<String>,
}
/// A trusted relay — accepts inbound federation without needing the peer's address.
#[derive(Clone, Debug, Serialize, Deserialize)]
pub struct TrustedConfig {
/// Expected TLS certificate fingerprint (hex, with colons).
pub fingerprint: String,
/// Optional human-readable label.
#[serde(default)]
pub label: Option<String>,
}
/// A room declared global — bridged across all federated peers.
#[derive(Clone, Debug, Serialize, Deserialize)]
pub struct GlobalRoomConfig {
/// Room name to bridge (e.g., "android").
pub name: String,
}
/// Configuration for the relay daemon.
///
/// All fields have defaults, so a minimal TOML file only needs the
/// fields you want to override (e.g., just `[[peers]]`).
#[derive(Clone, Debug, Serialize, Deserialize)]
#[serde(default)]
pub struct RelayConfig {
/// Address to listen on for incoming connections (client-facing).
pub listen_addr: SocketAddr,
@@ -44,6 +77,22 @@ pub struct RelayConfig {
pub ws_port: Option<u16>,
/// Directory to serve static files from (HTML/JS/WASM for web clients).
pub static_dir: Option<String>,
/// Federation peer relays.
#[serde(default)]
pub peers: Vec<PeerConfig>,
/// Global rooms bridged across federation.
#[serde(default)]
pub global_rooms: Vec<GlobalRoomConfig>,
/// Trusted relay fingerprints — accept inbound federation from these relays.
/// Unlike [[peers]], no url is needed — the peer connects to us.
#[serde(default)]
pub trusted: Vec<TrustedConfig>,
/// Debug tap: log packet headers for matching rooms ("*" = all rooms).
/// Activated via --debug-tap <room> or debug_tap = "room" in TOML.
pub debug_tap: Option<String>,
/// JSONL event log path for protocol analysis (--event-log).
#[serde(skip)]
pub event_log: Option<String>,
}
impl Default for RelayConfig {
@@ -62,6 +111,100 @@ impl Default for RelayConfig {
trunking_enabled: false,
ws_port: None,
static_dir: None,
peers: Vec::new(),
global_rooms: Vec::new(),
trusted: Vec::new(),
debug_tap: None,
event_log: None,
}
}
}
/// Load relay configuration from a TOML file.
pub fn load_config(path: &str) -> Result<RelayConfig, anyhow::Error> {
let content = std::fs::read_to_string(path)?;
let config: RelayConfig = toml::from_str(&content)?;
Ok(config)
}
/// Info about this relay instance, used to generate personalized example configs.
pub struct RelayInfo {
pub listen_addr: String,
pub tls_fingerprint: String,
pub public_ip: Option<String>,
}
/// Load config from path, or create a personalized example config if it doesn't exist.
pub fn load_or_create_config(path: &str, info: Option<&RelayInfo>) -> Result<RelayConfig, anyhow::Error> {
let p = std::path::Path::new(path);
if p.exists() {
return load_config(path);
}
// Create parent directory if needed
if let Some(parent) = p.parent() {
std::fs::create_dir_all(parent)?;
}
// Generate personalized example config
let example = generate_example_config(info);
std::fs::write(p, &example)?;
eprintln!("Created example config at {path} — edit it and restart.");
let config: RelayConfig = toml::from_str(&example)?;
Ok(config)
}
/// Generate an example TOML config, personalized with this relay's info if available.
fn generate_example_config(info: Option<&RelayInfo>) -> String {
let listen = info.map(|i| i.listen_addr.as_str()).unwrap_or("0.0.0.0:4433");
let peer_example = if let Some(i) = info {
let ip = i.public_ip.as_deref().unwrap_or("this-relay-ip");
format!(
r#"# Other relays can peer with this relay using:
# [[peers]]
# url = "{ip}:{port}"
# fingerprint = "{fp}"
# label = "This Relay""#,
port = listen.rsplit(':').next().unwrap_or("4433"),
fp = i.tls_fingerprint,
)
} else {
"# To peer with another relay, add its url + fingerprint:".to_string()
};
format!(
r#"# WarzonePhone Relay Configuration
# See docs/ADMINISTRATION.md for full reference.
# Listen address for client connections
listen_addr = "{listen}"
# Maximum concurrent sessions
# max_sessions = 100
# Prometheus metrics endpoint (uncomment to enable)
# metrics_port = 9090
# featherChat auth endpoint (uncomment to enable)
# auth_url = "https://chat.example.com/v1/auth/validate"
{peer_example}
# Federation: peer relays we connect to (outbound)
# [[peers]]
# url = "other-relay.example.com:4433"
# fingerprint = "aa:bb:cc:dd:..."
# label = "Relay B"
# Federation: relays we trust inbound connections from
# [[trusted]]
# fingerprint = "ee:ff:00:11:..."
# label = "Relay X"
# Global rooms bridged across all federated peers
# [[global_rooms]]
# name = "general"
# Debug: log packet headers for a room ("*" for all)
# debug_tap = "*"
"#
)
}

View File

@@ -0,0 +1,200 @@
//! JSONL event log for protocol analysis.
//!
//! When `--event-log <path>` is set, every media packet emits a structured
//! event at each decision point (recv, forward, drop, deliver).
//! Use `wzp-analyzer` to correlate events across multiple relays.
use std::path::PathBuf;
use serde::Serialize;
use tokio::sync::mpsc;
use tracing::{error, info};
/// A single protocol event for JSONL output.
#[derive(Debug, Serialize)]
pub struct Event {
/// ISO 8601 timestamp with microseconds.
pub ts: String,
/// Event type.
pub event: &'static str,
/// Room name.
#[serde(skip_serializing_if = "Option::is_none")]
pub room: Option<String>,
/// Source address or peer label.
#[serde(skip_serializing_if = "Option::is_none")]
pub src: Option<String>,
/// Packet sequence number.
#[serde(skip_serializing_if = "Option::is_none")]
pub seq: Option<u16>,
/// Codec identifier.
#[serde(skip_serializing_if = "Option::is_none")]
pub codec: Option<String>,
/// FEC block ID.
#[serde(skip_serializing_if = "Option::is_none")]
pub fec_block: Option<u8>,
/// FEC symbol index.
#[serde(skip_serializing_if = "Option::is_none")]
pub fec_sym: Option<u8>,
/// Is FEC repair packet.
#[serde(skip_serializing_if = "Option::is_none")]
pub repair: Option<bool>,
/// Payload length in bytes.
#[serde(skip_serializing_if = "Option::is_none")]
pub len: Option<usize>,
/// Number of recipients.
#[serde(skip_serializing_if = "Option::is_none")]
pub to_count: Option<usize>,
/// Peer label (for federation events).
#[serde(skip_serializing_if = "Option::is_none")]
pub peer: Option<String>,
/// Drop/error reason.
#[serde(skip_serializing_if = "Option::is_none")]
pub reason: Option<String>,
/// Presence action (active/inactive).
#[serde(skip_serializing_if = "Option::is_none")]
pub action: Option<String>,
/// Participant count (presence events).
#[serde(skip_serializing_if = "Option::is_none")]
pub participants: Option<usize>,
}
impl Event {
fn now() -> String {
chrono::Utc::now().format("%Y-%m-%dT%H:%M:%S%.6fZ").to_string()
}
/// Create a minimal event with just type and timestamp.
pub fn new(event: &'static str) -> Self {
Self {
ts: Self::now(),
event,
room: None,
src: None,
seq: None,
codec: None,
fec_block: None,
fec_sym: None,
repair: None,
len: None,
to_count: None,
peer: None,
reason: None,
action: None,
participants: None,
}
}
/// Set room.
pub fn room(mut self, room: &str) -> Self { self.room = Some(room.to_string()); self }
/// Set source.
pub fn src(mut self, src: &str) -> Self { self.src = Some(src.to_string()); self }
/// Set packet header fields from a MediaPacket.
pub fn packet(mut self, pkt: &wzp_proto::MediaPacket) -> Self {
self.seq = Some(pkt.header.seq);
self.codec = Some(format!("{:?}", pkt.header.codec_id));
self.fec_block = Some(pkt.header.fec_block);
self.fec_sym = Some(pkt.header.fec_symbol);
self.repair = Some(pkt.header.is_repair);
self.len = Some(pkt.payload.len());
self
}
/// Set seq only (when full packet not available).
pub fn seq(mut self, seq: u16) -> Self { self.seq = Some(seq); self }
/// Set payload length.
pub fn len(mut self, len: usize) -> Self { self.len = Some(len); self }
/// Set recipient count.
pub fn to_count(mut self, n: usize) -> Self { self.to_count = Some(n); self }
/// Set peer label.
pub fn peer(mut self, peer: &str) -> Self { self.peer = Some(peer.to_string()); self }
/// Set drop reason.
pub fn reason(mut self, reason: &str) -> Self { self.reason = Some(reason.to_string()); self }
/// Set presence action.
pub fn action(mut self, action: &str) -> Self { self.action = Some(action.to_string()); self }
/// Set participant count.
pub fn participants(mut self, n: usize) -> Self { self.participants = Some(n); self }
}
/// Handle for emitting events. Cheap to clone.
#[derive(Clone)]
pub struct EventLog {
tx: mpsc::UnboundedSender<Event>,
}
impl EventLog {
/// Emit an event (non-blocking, drops if channel is full).
pub fn emit(&self, event: Event) {
let _ = self.tx.send(event);
}
}
/// No-op event log for when `--event-log` is not set.
/// All methods are no-ops that compile to nothing.
#[derive(Clone)]
pub struct NoopEventLog;
/// Unified event log handle — either real or no-op.
#[derive(Clone)]
pub enum EventLogger {
Active(EventLog),
Noop,
}
impl EventLogger {
pub fn emit(&self, event: Event) {
if let EventLogger::Active(log) = self {
log.emit(event);
}
}
pub fn is_active(&self) -> bool {
matches!(self, EventLogger::Active(_))
}
}
/// Start the event log writer. Returns an `EventLogger` handle.
pub fn start_event_log(path: Option<PathBuf>) -> EventLogger {
match path {
Some(path) => {
let (tx, rx) = mpsc::unbounded_channel();
tokio::spawn(writer_task(path, rx));
info!("event log enabled");
EventLogger::Active(EventLog { tx })
}
None => EventLogger::Noop,
}
}
/// Background task that writes events to a JSONL file.
async fn writer_task(path: PathBuf, mut rx: mpsc::UnboundedReceiver<Event>) {
use tokio::io::AsyncWriteExt;
let file = match tokio::fs::File::create(&path).await {
Ok(f) => f,
Err(e) => {
error!("failed to create event log {}: {e}", path.display());
return;
}
};
let mut writer = tokio::io::BufWriter::new(file);
let mut count: u64 = 0;
while let Some(event) = rx.recv().await {
match serde_json::to_string(&event) {
Ok(json) => {
if writer.write_all(json.as_bytes()).await.is_err() { break; }
if writer.write_all(b"\n").await.is_err() { break; }
count += 1;
// Flush every 100 events
if count % 100 == 0 {
let _ = writer.flush().await;
}
}
Err(e) => {
error!("event log serialize error: {e}");
}
}
}
let _ = writer.flush().await;
info!(events = count, "event log closed");
}

File diff suppressed because it is too large Load Diff

View File

@@ -78,31 +78,30 @@ pub async fn accept_handshake(
};
transport.send_signal(&answer).await?;
// Derive caller fingerprint from their identity public key (first 8 bytes as hex)
let caller_fp = caller_identity_pub[..8]
.iter()
.map(|b| format!("{b:02x}"))
.collect::<String>();
// Derive caller fingerprint: SHA-256(Ed25519 pub)[:16], formatted as xxxx:xxxx:...
// Must match the format used in signal registration and presence.
let caller_fp = {
use sha2::{Sha256, Digest};
let hash = Sha256::digest(&caller_identity_pub);
let fp = wzp_crypto::Fingerprint([
hash[0], hash[1], hash[2], hash[3], hash[4], hash[5], hash[6], hash[7],
hash[8], hash[9], hash[10], hash[11], hash[12], hash[13], hash[14], hash[15],
]);
fp.to_string()
};
Ok((session, chosen_profile, caller_fp, caller_alias))
}
/// Select the best quality profile from those the caller supports.
fn choose_profile(supported: &[QualityProfile]) -> QualityProfile {
// Prefer higher-quality profiles. Use GOOD as default if supported list is empty.
if supported.is_empty() {
return QualityProfile::GOOD;
}
// Pick the profile with the highest bitrate.
supported
.iter()
.max_by(|a, b| {
a.total_bitrate_kbps()
.partial_cmp(&b.total_bitrate_kbps())
.unwrap_or(std::cmp::Ordering::Equal)
})
.copied()
.unwrap_or(QualityProfile::GOOD)
///
/// The `_supported` list is currently ignored — we hardcode GOOD (24k) until
/// studio tiers (32k/48k/64k) have been validated across federation (large
/// packets may exceed path MTU and fragment in unpleasant ways). Once that's
/// tested, the body should pick the highest supported profile ≤ the relay's
/// configured ceiling.
fn choose_profile(_supported: &[QualityProfile]) -> QualityProfile {
QualityProfile::GOOD
}
#[cfg(test)]

View File

@@ -8,7 +8,11 @@
//! quality transitions.
pub mod auth;
pub mod call_registry;
pub mod config;
pub mod event_log;
pub mod federation;
pub mod signal_hub;
pub mod handshake;
pub mod metrics;
pub mod pipeline;

File diff suppressed because it is too large Load Diff

View File

@@ -16,12 +16,22 @@ pub struct RelayMetrics {
pub bytes_forwarded: IntCounter,
pub auth_attempts: IntCounterVec,
pub handshake_duration: Histogram,
// Federation metrics
pub federation_peer_status: IntGaugeVec,
pub federation_peer_rtt_ms: GaugeVec,
pub federation_packets_forwarded: IntCounterVec,
pub federation_packets_deduped: IntCounter,
pub federation_packets_rate_limited: IntCounter,
pub federation_active_rooms: IntGauge,
// Per-session metrics
pub session_buffer_depth: IntGaugeVec,
pub session_loss_pct: GaugeVec,
pub session_rtt_ms: GaugeVec,
pub session_underruns: IntCounterVec,
pub session_overruns: IntCounterVec,
// Phase 4: loss-recovery breakdown per session.
pub session_dred_reconstructions: IntCounterVec,
pub session_classical_plc: IntCounterVec,
registry: Registry,
}
@@ -60,6 +70,28 @@ impl RelayMetrics {
)
.expect("metric");
let federation_peer_status = IntGaugeVec::new(
Opts::new("wzp_federation_peer_status", "Peer connection status (0=disconnected, 1=connected)"),
&["peer"],
).expect("metric");
let federation_peer_rtt_ms = GaugeVec::new(
Opts::new("wzp_federation_peer_rtt_ms", "QUIC RTT to federated peer in milliseconds"),
&["peer"],
).expect("metric");
let federation_packets_forwarded = IntCounterVec::new(
Opts::new("wzp_federation_packets_forwarded_total", "Packets forwarded to/from federated peers"),
&["peer", "direction"],
).expect("metric");
let federation_packets_deduped = IntCounter::with_opts(
Opts::new("wzp_federation_packets_deduped_total", "Duplicate federation packets dropped"),
).expect("metric");
let federation_packets_rate_limited = IntCounter::with_opts(
Opts::new("wzp_federation_packets_rate_limited_total", "Federation packets dropped by rate limiter"),
).expect("metric");
let federation_active_rooms = IntGauge::with_opts(
Opts::new("wzp_federation_active_rooms", "Number of federated rooms currently active"),
).expect("metric");
let session_buffer_depth = IntGaugeVec::new(
Opts::new(
"wzp_relay_session_jitter_buffer_depth",
@@ -101,17 +133,42 @@ impl RelayMetrics {
)
.expect("metric");
let session_dred_reconstructions = IntCounterVec::new(
Opts::new(
"wzp_relay_session_dred_reconstructions_total",
"Frames reconstructed via DRED (Deep REDundancy) per session",
),
&["session_id"],
)
.expect("metric");
let session_classical_plc = IntCounterVec::new(
Opts::new(
"wzp_relay_session_classical_plc_total",
"Frames filled via classical Opus/Codec2 PLC per session",
),
&["session_id"],
)
.expect("metric");
registry.register(Box::new(active_sessions.clone())).expect("register");
registry.register(Box::new(active_rooms.clone())).expect("register");
registry.register(Box::new(packets_forwarded.clone())).expect("register");
registry.register(Box::new(bytes_forwarded.clone())).expect("register");
registry.register(Box::new(auth_attempts.clone())).expect("register");
registry.register(Box::new(handshake_duration.clone())).expect("register");
registry.register(Box::new(federation_peer_status.clone())).expect("register");
registry.register(Box::new(federation_peer_rtt_ms.clone())).expect("register");
registry.register(Box::new(federation_packets_forwarded.clone())).expect("register");
registry.register(Box::new(federation_packets_deduped.clone())).expect("register");
registry.register(Box::new(federation_packets_rate_limited.clone())).expect("register");
registry.register(Box::new(federation_active_rooms.clone())).expect("register");
registry.register(Box::new(session_buffer_depth.clone())).expect("register");
registry.register(Box::new(session_loss_pct.clone())).expect("register");
registry.register(Box::new(session_rtt_ms.clone())).expect("register");
registry.register(Box::new(session_underruns.clone())).expect("register");
registry.register(Box::new(session_overruns.clone())).expect("register");
registry.register(Box::new(session_dred_reconstructions.clone())).expect("register");
registry.register(Box::new(session_classical_plc.clone())).expect("register");
Self {
active_sessions,
@@ -120,11 +177,19 @@ impl RelayMetrics {
bytes_forwarded,
auth_attempts,
handshake_duration,
federation_peer_status,
federation_peer_rtt_ms,
federation_packets_forwarded,
federation_packets_deduped,
federation_packets_rate_limited,
federation_active_rooms,
session_buffer_depth,
session_loss_pct,
session_rtt_ms,
session_underruns,
session_overruns,
session_dred_reconstructions,
session_classical_plc,
registry,
}
}
@@ -176,6 +241,39 @@ impl RelayMetrics {
}
}
/// Phase 4: update per-session loss-recovery counters from a client's
/// `LossRecoveryUpdate` signal message. The client sends monotonic
/// totals (frames reconstructed since call start); we compute the
/// delta against the current Prometheus counter and increment by it.
/// IntCounterVec only increases, so a client restart that resets the
/// counter to 0 simply produces no delta until the new totals exceed
/// the Prometheus state.
pub fn update_session_loss_recovery(
&self,
session_id: &str,
dred_reconstructions: u64,
classical_plc: u64,
) {
let cur_dred = self
.session_dred_reconstructions
.with_label_values(&[session_id])
.get();
if dred_reconstructions > cur_dred {
self.session_dred_reconstructions
.with_label_values(&[session_id])
.inc_by(dred_reconstructions - cur_dred);
}
let cur_plc = self
.session_classical_plc
.with_label_values(&[session_id])
.get();
if classical_plc > cur_plc {
self.session_classical_plc
.with_label_values(&[session_id])
.inc_by(classical_plc - cur_plc);
}
}
/// Remove all per-session label values for a disconnected session.
pub fn remove_session_metrics(&self, session_id: &str) {
let _ = self.session_buffer_depth.remove_label_values(&[session_id]);
@@ -183,6 +281,10 @@ impl RelayMetrics {
let _ = self.session_rtt_ms.remove_label_values(&[session_id]);
let _ = self.session_underruns.remove_label_values(&[session_id]);
let _ = self.session_overruns.remove_label_values(&[session_id]);
let _ = self
.session_dred_reconstructions
.remove_label_values(&[session_id]);
let _ = self.session_classical_plc.remove_label_values(&[session_id]);
}
/// Get a reference to the underlying Prometheus registry.
@@ -377,10 +479,13 @@ mod tests {
};
m.update_session_quality("sess-cleanup", &report);
m.update_session_buffer("sess-cleanup", 42, 3, 1);
m.update_session_loss_recovery("sess-cleanup", 17, 4);
// Verify they appear
let output = m.metrics_handler();
assert!(output.contains("sess-cleanup"));
assert!(output.contains("wzp_relay_session_dred_reconstructions_total"));
assert!(output.contains("wzp_relay_session_classical_plc_total"));
// Remove and verify they are gone
m.remove_session_metrics("sess-cleanup");
@@ -388,6 +493,55 @@ mod tests {
assert!(!output.contains("sess-cleanup"));
}
/// Phase 4: LossRecoveryUpdate → per-session counters, monotonic delta
/// application.
#[test]
fn session_loss_recovery_monotonic_delta() {
let m = RelayMetrics::new();
let sess = "sess-dred";
// First update: 10 DRED, 2 PLC
m.update_session_loss_recovery(sess, 10, 2);
let dred1 = m
.session_dred_reconstructions
.with_label_values(&[sess])
.get();
let plc1 = m.session_classical_plc.with_label_values(&[sess]).get();
assert_eq!(dred1, 10);
assert_eq!(plc1, 2);
// Second update: 25 DRED, 5 PLC — counter advances by (15, 3)
m.update_session_loss_recovery(sess, 25, 5);
let dred2 = m
.session_dred_reconstructions
.with_label_values(&[sess])
.get();
let plc2 = m.session_classical_plc.with_label_values(&[sess]).get();
assert_eq!(dred2, 25);
assert_eq!(plc2, 5);
// Third update with LOWER values (e.g., client reset) — counters
// hold steady, no decrement.
m.update_session_loss_recovery(sess, 5, 1);
let dred3 = m
.session_dred_reconstructions
.with_label_values(&[sess])
.get();
let plc3 = m.session_classical_plc.with_label_values(&[sess]).get();
assert_eq!(dred3, 25, "counter must not decrease");
assert_eq!(plc3, 5, "counter must not decrease");
// Fourth update: client caught up and exceeded the old max.
m.update_session_loss_recovery(sess, 30, 8);
let dred4 = m
.session_dred_reconstructions
.with_label_values(&[sess])
.get();
let plc4 = m.session_classical_plc.with_label_values(&[sess]).get();
assert_eq!(dred4, 30);
assert_eq!(plc4, 8);
}
#[test]
fn metrics_increment() {
let m = RelayMetrics::new();

View File

@@ -10,14 +10,87 @@ use std::time::Duration;
use bytes::Bytes;
use tokio::sync::Mutex;
use tracing::{debug, error, info, trace, warn};
use tracing::{error, info, warn};
use wzp_proto::packet::TrunkFrame;
use wzp_proto::quality::{AdaptiveQualityController, Tier};
use wzp_proto::traits::QualityController;
use wzp_proto::MediaTransport;
use crate::metrics::RelayMetrics;
use crate::trunk::TrunkBatcher;
/// Debug tap: logs packet metadata for matching rooms.
#[derive(Clone)]
pub struct DebugTap {
/// Room name filter ("*" = all rooms, or specific room name/hash).
pub room_filter: String,
}
impl DebugTap {
pub fn matches(&self, room_name: &str) -> bool {
self.room_filter == "*" || self.room_filter == room_name
}
pub fn log_packet(&self, room: &str, dir: &str, addr: &std::net::SocketAddr, pkt: &wzp_proto::MediaPacket, fan_out: usize) {
let h = &pkt.header;
info!(
target: "debug_tap",
room = %room,
dir = dir,
addr = %addr,
seq = h.seq,
codec = ?h.codec_id,
ts = h.timestamp,
fec_block = h.fec_block,
fec_sym = h.fec_symbol,
repair = h.is_repair,
len = pkt.payload.len(),
fan_out,
"TAP"
);
}
}
/// Tracks network quality for a single participant in a room.
struct ParticipantQuality {
controller: AdaptiveQualityController,
current_tier: Tier,
}
impl ParticipantQuality {
fn new() -> Self {
Self {
controller: AdaptiveQualityController::new(),
current_tier: Tier::Good,
}
}
/// Feed a quality report and return the new tier if it changed.
fn observe(&mut self, report: &wzp_proto::packet::QualityReport) -> Option<Tier> {
let _ = self.controller.observe(report);
let new_tier = self.controller.tier();
if new_tier != self.current_tier {
self.current_tier = new_tier;
Some(new_tier)
} else {
None
}
}
}
/// Compute the weakest (worst) quality tier across all tracked participants.
fn weakest_tier<'a>(qualities: impl Iterator<Item = &'a ParticipantQuality>) -> Tier {
qualities
.map(|pq| pq.current_tier)
.min_by_key(|t| match t {
Tier::Good => 2,
Tier::Degraded => 1,
Tier::Catastrophic => 0,
})
.unwrap_or(Tier::Good)
}
/// Unique participant ID within a room.
pub type ParticipantId = u64;
@@ -27,6 +100,22 @@ fn next_id() -> ParticipantId {
NEXT_PARTICIPANT_ID.fetch_add(1, Ordering::Relaxed)
}
/// Events emitted by RoomManager for federation to observe.
#[derive(Clone, Debug)]
pub enum RoomEvent {
/// First local participant joined this room.
LocalJoin { room: String },
/// Last local participant left this room.
LocalLeave { room: String },
}
/// Outbound federation media from a local participant.
pub struct FederationMediaOut {
pub room_name: String,
pub room_hash: [u8; 8],
pub data: Bytes,
}
/// How to send data to a participant — either via QUIC transport or WebSocket channel.
#[derive(Clone)]
pub enum ParticipantSender {
@@ -132,6 +221,7 @@ impl Room {
.map(|p| wzp_proto::packet::RoomParticipant {
fingerprint: p.fingerprint.clone().unwrap_or_default(),
alias: p.alias.clone(),
relay_label: None, // local participant
})
.collect()
}
@@ -141,17 +231,6 @@ impl Room {
self.participants.iter().map(|p| p.sender.clone()).collect()
}
/// Update a participant's alias. Returns true if the participant was found.
fn set_alias(&mut self, id: ParticipantId, alias: String) -> bool {
if let Some(p) = self.participants.iter_mut().find(|p| p.id == id) {
info!(participant = id, %alias, "alias updated");
p.alias = Some(alias);
true
} else {
false
}
}
fn is_empty(&self) -> bool {
self.participants.is_empty()
}
@@ -168,24 +247,43 @@ pub struct RoomManager {
/// When `None`, rooms are open (no auth mode). When `Some`, only listed
/// fingerprints can join the corresponding room.
acl: Option<HashMap<String, HashSet<String>>>,
/// Channel for room lifecycle events (federation subscribes).
event_tx: tokio::sync::broadcast::Sender<RoomEvent>,
/// Per-participant quality tracking, keyed by (room_name, participant_id).
qualities: HashMap<(String, ParticipantId), ParticipantQuality>,
/// Current room-wide tier per room (to avoid repeated broadcasts).
room_tiers: HashMap<String, Tier>,
}
impl RoomManager {
pub fn new() -> Self {
let (event_tx, _) = tokio::sync::broadcast::channel(64);
Self {
rooms: HashMap::new(),
acl: None,
event_tx,
qualities: HashMap::new(),
room_tiers: HashMap::new(),
}
}
/// Create a room manager with ACL enforcement enabled.
pub fn with_acl() -> Self {
let (event_tx, _) = tokio::sync::broadcast::channel(64);
Self {
rooms: HashMap::new(),
acl: Some(HashMap::new()),
event_tx,
qualities: HashMap::new(),
room_tiers: HashMap::new(),
}
}
/// Subscribe to room lifecycle events (for federation).
pub fn subscribe_events(&self) -> tokio::sync::broadcast::Receiver<RoomEvent> {
self.event_tx.subscribe()
}
/// Grant a fingerprint access to a room.
pub fn allow(&mut self, room_name: &str, fingerprint: &str) {
if let Some(ref mut acl) = self.acl {
@@ -224,8 +322,14 @@ impl RoomManager {
warn!(room = room_name, fingerprint = ?fingerprint, "unauthorized room join attempt");
return Err("not authorized for this room".to_string());
}
let was_empty = !self.rooms.contains_key(room_name)
|| self.rooms.get(room_name).map_or(true, |r| r.is_empty());
let room = self.rooms.entry(room_name.to_string()).or_insert_with(Room::new);
let id = room.add(addr, sender, fingerprint.map(|s| s.to_string()), alias.map(|s| s.to_string()));
self.qualities.insert((room_name.to_string(), id), ParticipantQuality::new());
if was_empty {
let _ = self.event_tx.send(RoomEvent::LocalJoin { room: room_name.to_string() });
}
let update = wzp_proto::SignalMessage::RoomUpdate {
count: room.len() as u32,
participants: room.participant_list(),
@@ -246,12 +350,36 @@ impl RoomManager {
Ok(id)
}
/// Get list of active room names.
pub fn active_rooms(&self) -> Vec<String> {
self.rooms.keys().cloned().collect()
}
/// Get participant list for a room (fingerprint + alias).
pub fn local_participant_list(&self, room_name: &str) -> Vec<wzp_proto::packet::RoomParticipant> {
self.rooms.get(room_name)
.map(|room| room.participant_list())
.unwrap_or_default()
}
/// Get all senders for participants in a room (for federation inbound media delivery).
pub fn local_senders(&self, room_name: &str) -> Vec<ParticipantSender> {
self.rooms.get(room_name)
.map(|room| room.participants.iter()
.map(|p| p.sender.clone())
.collect())
.unwrap_or_default()
}
/// Leave a room. Returns (room_update_msg, remaining_senders) for broadcasting, or None if room is now empty.
pub fn leave(&mut self, room_name: &str, participant_id: ParticipantId) -> Option<(wzp_proto::SignalMessage, Vec<ParticipantSender>)> {
self.qualities.remove(&(room_name.to_string(), participant_id));
if let Some(room) = self.rooms.get_mut(room_name) {
room.remove(participant_id);
if room.is_empty() {
self.rooms.remove(room_name);
self.room_tiers.remove(room_name);
let _ = self.event_tx.send(RoomEvent::LocalLeave { room: room_name.to_string() });
info!(room = room_name, "room closed (empty)");
return None;
}
@@ -266,26 +394,6 @@ impl RoomManager {
}
}
/// Update a participant's alias and return a RoomUpdate + senders for broadcasting.
pub fn set_alias(
&mut self,
room_name: &str,
participant_id: ParticipantId,
alias: String,
) -> Option<(wzp_proto::SignalMessage, Vec<ParticipantSender>)> {
if let Some(room) = self.rooms.get_mut(room_name) {
if room.set_alias(participant_id, alias) {
let update = wzp_proto::SignalMessage::RoomUpdate {
count: room.len() as u32,
participants: room.participant_list(),
};
let senders = room.all_senders();
return Some((update, senders));
}
}
None
}
/// Get senders for all OTHER participants in a room.
pub fn others(
&self,
@@ -307,6 +415,58 @@ impl RoomManager {
pub fn list(&self) -> Vec<(String, usize)> {
self.rooms.iter().map(|(k, v)| (k.clone(), v.len())).collect()
}
/// Feed a quality report from a participant. If the room-wide weakest
/// tier changes, returns `(QualityDirective signal, all senders)` for
/// broadcasting.
pub fn observe_quality(
&mut self,
room_name: &str,
participant_id: ParticipantId,
report: &wzp_proto::packet::QualityReport,
) -> Option<(wzp_proto::SignalMessage, Vec<ParticipantSender>)> {
let key = (room_name.to_string(), participant_id);
let tier_changed = self.qualities
.get_mut(&key)
.and_then(|pq| pq.observe(report))
.is_some();
if !tier_changed {
return None;
}
// Compute the weakest tier across all participants in this room
let room_qualities = self.qualities.iter()
.filter(|((rn, _), _)| rn == room_name)
.map(|(_, pq)| pq);
let weakest = weakest_tier(room_qualities);
let current_room_tier = self.room_tiers.get(room_name).copied().unwrap_or(Tier::Good);
if weakest == current_room_tier {
return None;
}
// Room-wide tier changed — update and broadcast directive
self.room_tiers.insert(room_name.to_string(), weakest);
let profile = weakest.profile();
info!(
room = room_name,
old_tier = ?current_room_tier,
new_tier = ?weakest,
codec = ?profile.codec,
fec_ratio = profile.fec_ratio,
"room quality directive"
);
let directive = wzp_proto::SignalMessage::QualityDirective {
recommended_profile: profile,
reason: Some(format!("weakest link: {weakest:?}")),
};
let senders = self.rooms.get(room_name)
.map(|r| r.all_senders())
.unwrap_or_default();
Some((directive, senders))
}
}
// ---------------------------------------------------------------------------
@@ -326,18 +486,32 @@ impl TrunkedForwarder {
/// Create a new trunked forwarder.
///
/// `session_id` tags every entry pushed into the batcher so the receiver
/// can demultiplex packets by session.
/// can demultiplex packets by session. The batcher's `max_bytes` is
/// initialized from the transport's current PMTUD-discovered MTU so that
/// trunk frames fill the largest datagram the path supports (instead of
/// the conservative 1200-byte default).
pub fn new(transport: Arc<wzp_transport::QuinnTransport>, session_id: [u8; 2]) -> Self {
let mut batcher = TrunkBatcher::new();
if let Some(mtu) = transport.max_datagram_size() {
batcher.max_bytes = mtu;
}
Self {
transport,
batcher: TrunkBatcher::new(),
batcher,
session_id,
}
}
/// Push a media packet into the batcher. If the batcher is full it will
/// flush automatically and the resulting trunk frame is sent immediately.
///
/// Also refreshes `max_bytes` from the transport's PMTUD-discovered MTU
/// so the batcher fills larger datagrams as the path MTU grows.
pub async fn send(&mut self, pkt: &wzp_proto::MediaPacket) -> anyhow::Result<()> {
// Refresh batcher limit from PMTUD (cheap: reads an atomic in quinn).
if let Some(mtu) = self.transport.max_datagram_size() {
self.batcher.max_bytes = mtu;
}
let payload: Bytes = pkt.to_bytes();
if let Some(frame) = self.batcher.push(self.session_id, payload) {
self.send_frame(&frame)?;
@@ -381,6 +555,9 @@ pub async fn run_participant(
metrics: Arc<RelayMetrics>,
session_id: &str,
trunking_enabled: bool,
debug_tap: Option<DebugTap>,
federation_tx: Option<tokio::sync::mpsc::Sender<FederationMediaOut>>,
federation_room_hash: Option<[u8; 8]>,
) {
if trunking_enabled {
run_participant_trunked(
@@ -389,7 +566,7 @@ pub async fn run_participant(
.await;
} else {
run_participant_plain(
room_mgr, room_name, participant_id, transport, metrics, session_id,
room_mgr, room_name, participant_id, transport, metrics, session_id, debug_tap, federation_tx, federation_room_hash,
)
.await;
}
@@ -403,168 +580,174 @@ async fn run_participant_plain(
transport: Arc<wzp_transport::QuinnTransport>,
metrics: Arc<RelayMetrics>,
session_id: &str,
debug_tap: Option<DebugTap>,
federation_tx: Option<tokio::sync::mpsc::Sender<FederationMediaOut>>,
federation_room_hash: Option<[u8; 8]>,
) {
let addr = transport.connection().remote_address();
let mut packets_forwarded = 0u64;
let mut last_recv_instant = std::time::Instant::now();
let mut max_recv_gap_ms = 0u64;
let mut max_forward_ms = 0u64;
let mut send_errors = 0u64;
let mut last_log_instant = std::time::Instant::now();
// Media forwarding task (with debug logging from Android fixes)
let media_room_mgr = room_mgr.clone();
let media_room_name = room_name.clone();
let media_transport = transport.clone();
let media_metrics = metrics.clone();
let media_session_id = session_id.to_string();
let media_task = async move {
let mut packets_forwarded = 0u64;
let mut last_recv_instant = std::time::Instant::now();
let mut max_recv_gap_ms = 0u64;
let mut max_forward_ms = 0u64;
let mut send_errors = 0u64;
let mut last_log_instant = std::time::Instant::now();
info!(
room = %room_name,
participant = participant_id,
%addr,
session = session_id,
"forwarding loop started (plain)"
);
info!(
room = %media_room_name,
participant = participant_id,
%addr,
session = %media_session_id,
"forwarding loop started (plain)"
);
loop {
let pkt = match media_transport.recv_media().await {
Ok(Some(pkt)) => pkt,
Ok(None) => {
info!(%addr, participant = participant_id, forwarded = packets_forwarded, "disconnected (stream ended)");
break;
}
Err(e) => {
let msg = e.to_string();
if msg.contains("timed out") || msg.contains("reset") || msg.contains("closed") {
info!(%addr, participant = participant_id, forwarded = packets_forwarded, "connection closed: {e}");
} else {
error!(%addr, participant = participant_id, forwarded = packets_forwarded, "recv error: {e}");
}
break;
loop {
let pkt = match transport.recv_media().await {
Ok(Some(pkt)) => pkt,
Ok(None) => {
info!(%addr, participant = participant_id, forwarded = packets_forwarded, "disconnected (stream ended)");
break;
}
Err(e) => {
let msg = e.to_string();
if msg.contains("timed out") || msg.contains("reset") || msg.contains("closed") {
info!(%addr, participant = participant_id, forwarded = packets_forwarded, "connection closed: {e}");
} else {
error!(%addr, participant = participant_id, forwarded = packets_forwarded, "recv error: {e}");
}
break;
}
};
let recv_gap_ms = last_recv_instant.elapsed().as_millis() as u64;
last_recv_instant = std::time::Instant::now();
if recv_gap_ms > max_recv_gap_ms {
max_recv_gap_ms = recv_gap_ms;
}
// Log if recv gap is suspiciously large (>200ms = missed ~10 packets)
if recv_gap_ms > 200 {
warn!(
room = %room_name,
participant = participant_id,
recv_gap_ms,
seq = pkt.header.seq,
"large recv gap"
);
}
// Update per-session quality metrics if a quality report is present
if let Some(ref report) = pkt.quality_report {
metrics.update_session_quality(session_id, report);
}
// Get current list of other participants + check quality directive
let lock_start = std::time::Instant::now();
let (others, quality_directive) = {
let mut mgr = room_mgr.lock().await;
let directive = if let Some(ref report) = pkt.quality_report {
mgr.observe_quality(&room_name, participant_id, report)
} else {
None
};
let o = mgr.others(&room_name, participant_id);
(o, directive)
};
let lock_ms = lock_start.elapsed().as_millis() as u64;
if lock_ms > 10 {
warn!(
room = %room_name,
participant = participant_id,
lock_ms,
"slow room_mgr lock"
);
}
let recv_gap_ms = last_recv_instant.elapsed().as_millis() as u64;
last_recv_instant = std::time::Instant::now();
if recv_gap_ms > max_recv_gap_ms {
max_recv_gap_ms = recv_gap_ms;
}
if recv_gap_ms > 200 {
warn!(
room = %media_room_name,
participant = participant_id,
recv_gap_ms,
seq = pkt.header.seq,
"large recv gap"
);
}
// Broadcast quality directive to all participants if tier changed
if let Some((directive, all_senders)) = quality_directive {
broadcast_signal(&all_senders, &directive).await;
}
if let Some(ref report) = pkt.quality_report {
media_metrics.update_session_quality(&media_session_id, report);
// Debug tap: log packet metadata
if let Some(ref tap) = debug_tap {
if tap.matches(&room_name) {
tap.log_packet(&room_name, "in", &addr, &pkt, others.len());
}
}
let lock_start = std::time::Instant::now();
let others = {
let mgr = media_room_mgr.lock().await;
mgr.others(&media_room_name, participant_id)
};
let lock_ms = lock_start.elapsed().as_millis() as u64;
if lock_ms > 10 {
warn!(room = %media_room_name, participant = participant_id, lock_ms, "slow room_mgr lock");
}
let fwd_start = std::time::Instant::now();
let pkt_bytes = pkt.payload.len() as u64;
for other in &others {
match other {
ParticipantSender::Quic(t) => {
if let Err(e) = t.send_media(&pkt).await {
send_errors += 1;
if send_errors <= 5 || send_errors % 100 == 0 {
warn!(
room = %media_room_name,
participant = participant_id,
peer = %t.connection().remote_address(),
total_send_errors = send_errors,
"send_media error: {e}"
);
}
// Forward to all others
let fwd_start = std::time::Instant::now();
let pkt_bytes = pkt.payload.len() as u64;
for other in &others {
match other {
ParticipantSender::Quic(t) => {
if let Err(e) = t.send_media(&pkt).await {
send_errors += 1;
if send_errors <= 5 || send_errors % 100 == 0 {
warn!(
room = %room_name,
participant = participant_id,
peer = %t.connection().remote_address(),
total_send_errors = send_errors,
"send_media error: {e}"
);
}
}
ParticipantSender::WebSocket(_) => {
let _ = other.send_raw(&pkt.payload).await;
}
}
}
let fwd_ms = fwd_start.elapsed().as_millis() as u64;
if fwd_ms > max_forward_ms { max_forward_ms = fwd_ms; }
if fwd_ms > 50 {
warn!(room = %media_room_name, participant = participant_id, fwd_ms, fan_out = others.len(), "slow forward");
}
let fan_out = others.len() as u64;
media_metrics.packets_forwarded.inc_by(fan_out);
media_metrics.bytes_forwarded.inc_by(pkt_bytes * fan_out);
packets_forwarded += 1;
if last_log_instant.elapsed() >= Duration::from_secs(5) {
let room_size = {
let mgr = media_room_mgr.lock().await;
mgr.room_size(&media_room_name)
};
info!(
room = %media_room_name,
participant = participant_id,
forwarded = packets_forwarded,
room_size, fan_out, max_recv_gap_ms, max_forward_ms, send_errors,
"participant stats"
);
max_recv_gap_ms = 0;
max_forward_ms = 0;
last_log_instant = std::time::Instant::now();
}
}
};
// Signal handling task — processes SetAlias and other in-call signals
let signal_room_mgr = room_mgr.clone();
let signal_room_name = room_name.clone();
let signal_transport = transport.clone();
let signal_task = async move {
loop {
match signal_transport.recv_signal().await {
Ok(Some(wzp_proto::SignalMessage::SetAlias { alias })) => {
info!(%addr, participant = participant_id, %alias, "SetAlias received");
let mut mgr = signal_room_mgr.lock().await;
if let Some((update, senders)) =
mgr.set_alias(&signal_room_name, participant_id, alias)
{
drop(mgr);
broadcast_signal(&senders, &update).await;
}
}
Ok(Some(wzp_proto::SignalMessage::Hangup { .. })) => {
info!(%addr, participant = participant_id, "hangup received");
break;
}
Ok(Some(msg)) => {
info!(%addr, participant = participant_id, "signal: {:?}", std::mem::discriminant(&msg));
}
Ok(None) => break,
Err(e) => {
warn!(%addr, participant = participant_id, "signal recv error: {e}");
break;
ParticipantSender::WebSocket(_) => {
let _ = other.send_raw(&pkt.payload).await;
}
}
}
};
// Run both in parallel — exit when either finishes (disconnection)
tokio::select! {
_ = media_task => {}
_ = signal_task => {}
// Federation: forward to active peer relays via channel
if let Some(ref fed_tx) = federation_tx {
let data = pkt.to_bytes();
let _ = fed_tx.try_send(FederationMediaOut {
room_name: room_name.clone(),
room_hash: federation_room_hash.unwrap_or_else(|| crate::federation::room_hash(&room_name)),
data,
});
}
let fwd_ms = fwd_start.elapsed().as_millis() as u64;
if fwd_ms > max_forward_ms {
max_forward_ms = fwd_ms;
}
if fwd_ms > 50 {
warn!(
room = %room_name,
participant = participant_id,
fwd_ms,
fan_out = others.len(),
"slow forward"
);
}
let fan_out = others.len() as u64;
metrics.packets_forwarded.inc_by(fan_out);
metrics.bytes_forwarded.inc_by(pkt_bytes * fan_out);
packets_forwarded += 1;
// Periodic stats log every 5 seconds
if last_log_instant.elapsed() >= Duration::from_secs(5) {
let room_size = {
let mgr = room_mgr.lock().await;
mgr.room_size(&room_name)
};
info!(
room = %room_name,
participant = participant_id,
forwarded = packets_forwarded,
room_size,
fan_out,
max_recv_gap_ms,
max_forward_ms,
send_errors,
"participant stats"
);
max_recv_gap_ms = 0;
max_forward_ms = 0;
last_log_instant = std::time::Instant::now();
}
}
// Clean up — leave room and broadcast update to remaining participants
@@ -651,9 +834,15 @@ async fn run_participant_trunked(
}
let lock_start = std::time::Instant::now();
let others = {
let mgr = room_mgr.lock().await;
mgr.others(&room_name, participant_id)
let (others, quality_directive) = {
let mut mgr = room_mgr.lock().await;
let directive = if let Some(ref report) = pkt.quality_report {
mgr.observe_quality(&room_name, participant_id, report)
} else {
None
};
let o = mgr.others(&room_name, participant_id);
(o, directive)
};
let lock_ms = lock_start.elapsed().as_millis() as u64;
if lock_ms > 10 {
@@ -665,6 +854,11 @@ async fn run_participant_trunked(
);
}
// Broadcast quality directive to all participants if tier changed
if let Some((directive, all_senders)) = quality_directive {
broadcast_signal(&all_senders, &directive).await;
}
let fwd_start = std::time::Instant::now();
let pkt_bytes = pkt.payload.len() as u64;
for other in &others {
@@ -783,7 +977,7 @@ mod tests {
#[test]
fn room_join_leave() {
let mut mgr = RoomManager::new();
let mgr = RoomManager::new();
assert_eq!(mgr.room_size("test"), 0);
assert!(mgr.list().is_empty());
}
@@ -905,4 +1099,47 @@ mod tests {
// Batcher should now be empty — nothing to flush.
assert!(batcher.flush().is_none());
}
fn make_report(loss_pct_f: f32, rtt_ms: u16) -> wzp_proto::packet::QualityReport {
wzp_proto::packet::QualityReport {
loss_pct: (loss_pct_f / 100.0 * 255.0) as u8,
rtt_4ms: (rtt_ms / 4) as u8,
jitter_ms: 10,
bitrate_cap_kbps: 200,
}
}
#[test]
fn participant_quality_starts_good() {
let pq = ParticipantQuality::new();
assert_eq!(pq.current_tier, Tier::Good);
}
#[test]
fn participant_quality_degrades_on_bad_reports() {
let mut pq = ParticipantQuality::new();
let bad = make_report(50.0, 300);
// Feed enough bad reports to trigger downgrade (3 consecutive)
for _ in 0..5 {
pq.observe(&bad);
}
assert_ne!(pq.current_tier, Tier::Good, "should degrade from Good");
}
#[test]
fn weakest_tier_picks_worst() {
let good = ParticipantQuality::new();
// good stays at Good tier
let mut bad = ParticipantQuality::new();
let bad_report = make_report(50.0, 300);
for _ in 0..5 {
bad.observe(&bad_report);
}
// bad should be degraded or catastrophic
let participants = vec![good, bad];
let weakest = weakest_tier(participants.iter());
assert_ne!(weakest, Tier::Good, "weakest should not be Good when one participant is bad");
}
}

View File

@@ -0,0 +1,105 @@
//! Persistent signaling connection manager.
//!
//! Tracks clients connected via `_signal` SNI. Routes call signals
//! (DirectCallOffer, DirectCallAnswer, Hangup) between registered users.
use std::collections::HashMap;
use std::sync::Arc;
use std::time::Instant;
use tracing::info;
use wzp_proto::{MediaTransport, SignalMessage};
use wzp_transport::QuinnTransport;
/// A client connected via `_signal` for direct calling.
pub struct SignalClient {
pub fingerprint: String,
pub alias: Option<String>,
pub transport: Arc<QuinnTransport>,
pub connected_at: Instant,
}
/// Manages persistent signaling connections.
pub struct SignalHub {
clients: HashMap<String, SignalClient>,
}
impl SignalHub {
pub fn new() -> Self {
Self {
clients: HashMap::new(),
}
}
/// Register a new signaling client.
pub fn register(&mut self, fp: String, transport: Arc<QuinnTransport>, alias: Option<String>) {
info!(fingerprint = %fp, alias = ?alias, "signal client registered");
self.clients.insert(fp.clone(), SignalClient {
fingerprint: fp,
alias,
transport,
connected_at: Instant::now(),
});
}
/// Unregister a signaling client. Returns the client if found.
pub fn unregister(&mut self, fp: &str) -> Option<SignalClient> {
let client = self.clients.remove(fp);
if client.is_some() {
info!(fingerprint = %fp, "signal client unregistered");
}
client
}
/// Look up a client by fingerprint.
pub fn get(&self, fp: &str) -> Option<&SignalClient> {
self.clients.get(fp)
}
/// Check if a fingerprint is online.
pub fn is_online(&self, fp: &str) -> bool {
self.clients.contains_key(fp)
}
/// Send a signal message to a client by fingerprint.
pub async fn send_to(&self, fp: &str, msg: &SignalMessage) -> Result<(), String> {
match self.clients.get(fp) {
Some(client) => {
client.transport.send_signal(msg).await
.map_err(|e| format!("send to {fp}: {e}"))
}
None => Err(format!("{fp} not online")),
}
}
/// Number of connected signaling clients.
pub fn online_count(&self) -> usize {
self.clients.len()
}
/// List all online fingerprints.
pub fn online_fingerprints(&self) -> Vec<&str> {
self.clients.keys().map(|s| s.as_str()).collect()
}
/// Get alias for a fingerprint.
pub fn alias(&self, fp: &str) -> Option<&str> {
self.clients.get(fp).and_then(|c| c.alias.as_deref())
}
}
#[cfg(test)]
mod tests {
use super::*;
#[test]
fn register_unregister() {
let hub = SignalHub::new();
assert_eq!(hub.online_count(), 0);
assert!(!hub.is_online("alice"));
// Can't easily construct QuinnTransport in a unit test,
// so we just test the HashMap logic conceptually.
// Integration tests cover the full flow.
}
}

View File

@@ -0,0 +1,317 @@
//! Phase 4 integration test for cross-relay direct calling
//! (PRD: .taskmaster/docs/prd_phase4_cross_relay_p2p.txt).
//!
//! Drives the call-registry cross-wiring + a simulated federation
//! forward without spinning up actual relay binaries. The real
//! main-loop and dispatcher code are exercised end-to-end in
//! `reflect.rs` / `hole_punching.rs` already; this file focuses on
//! the *new* invariants Phase 4 adds:
//!
//! 1. When Relay A forwards a DirectCallOffer, its local registry
//! stashes caller_reflexive_addr and leaves peer_relay_fp
//! unset (broadcast, answer-side will identify itself).
//! 2. When Relay B's cross-relay dispatcher receives the forward,
//! its local registry stores the call with
//! peer_relay_fp = Some(relay_a_tls_fp).
//! 3. When Relay B processes the local callee's answer, it sees
//! peer_relay_fp.is_some() and MUST NOT deliver the answer via
//! local signal_hub — instead it routes through federation.
//! 4. When Relay A receives the forwarded answer via its
//! cross-relay dispatcher, it stashes callee_reflexive_addr
//! and emits a CallSetup to its local caller with
//! peer_direct_addr = callee_addr.
//! 5. Final state: Alice's CallSetup carries Bob's reflex addr,
//! Bob's CallSetup carries Alice's reflex addr — cross-wired
//! through two relays + a federation link.
use wzp_proto::{CallAcceptMode, SignalMessage};
use wzp_relay::call_registry::CallRegistry;
// ────────────────────────────────────────────────────────────────
// Simulated dispatch helpers — these reproduce the exact logic
// in main.rs without the tokio + federation boilerplate.
// ────────────────────────────────────────────────────────────────
const RELAY_A_TLS_FP: &str = "relay-A-tls-fingerprint";
const RELAY_B_TLS_FP: &str = "relay-B-tls-fingerprint";
const ALICE_ADDR: &str = "192.0.2.1:4433";
const BOB_ADDR: &str = "198.51.100.9:4433";
const RELAY_A_ADDR: &str = "203.0.113.5:4433";
const RELAY_B_ADDR: &str = "203.0.113.10:4433";
/// Helper that Alice's place_call sends.
fn alice_offer(call_id: &str) -> SignalMessage {
SignalMessage::DirectCallOffer {
caller_fingerprint: "alice".into(),
caller_alias: None,
target_fingerprint: "bob".into(),
call_id: call_id.into(),
identity_pub: [0; 32],
ephemeral_pub: [0; 32],
signature: vec![],
supported_profiles: vec![],
caller_reflexive_addr: Some(ALICE_ADDR.into()),
caller_local_addrs: Vec::new(),
caller_build_version: None,
}
}
/// Relay A receives Alice's offer. Target Bob is not local.
/// Relay A wraps + broadcasts over federation, stashes the call
/// locally with peer_relay_fp = None (broadcast — answer-side
/// identifies itself).
fn relay_a_handle_offer(reg_a: &mut CallRegistry, offer: &SignalMessage) -> SignalMessage {
match offer {
SignalMessage::DirectCallOffer {
caller_fingerprint,
target_fingerprint,
call_id,
caller_reflexive_addr,
..
} => {
reg_a.create_call(
call_id.clone(),
caller_fingerprint.clone(),
target_fingerprint.clone(),
);
reg_a.set_caller_reflexive_addr(call_id, caller_reflexive_addr.clone());
// peer_relay_fp stays None — we don't know which peer
// will respond yet.
}
_ => panic!("not an offer"),
}
// Build the federation envelope the main loop would
// broadcast.
SignalMessage::FederatedSignalForward {
inner: Box::new(offer.clone()),
origin_relay_fp: RELAY_A_TLS_FP.into(),
}
}
/// Relay B receives a FederatedSignalForward(DirectCallOffer).
/// This is the cross-relay dispatcher task code in main.rs —
/// reproduced here for the test.
fn relay_b_handle_forwarded_offer(reg_b: &mut CallRegistry, forward: &SignalMessage) {
let (inner, origin_relay_fp) = match forward {
SignalMessage::FederatedSignalForward { inner, origin_relay_fp } => {
(inner.as_ref().clone(), origin_relay_fp.clone())
}
_ => panic!("not a forward"),
};
// Loop-prevention: drop self-sourced.
assert_ne!(origin_relay_fp, RELAY_B_TLS_FP);
let SignalMessage::DirectCallOffer {
caller_fingerprint,
target_fingerprint,
call_id,
caller_reflexive_addr,
..
} = inner
else {
panic!("inner was not DirectCallOffer");
};
// Simulated: target is local to B (Bob is registered here).
reg_b.create_call(
call_id.clone(),
caller_fingerprint,
target_fingerprint,
);
reg_b.set_caller_reflexive_addr(&call_id, caller_reflexive_addr);
reg_b.set_peer_relay_fp(&call_id, Some(origin_relay_fp));
}
/// Bob's answer — AcceptTrusted with his reflex addr.
fn bob_answer(call_id: &str) -> SignalMessage {
SignalMessage::DirectCallAnswer {
call_id: call_id.into(),
accept_mode: CallAcceptMode::AcceptTrusted,
identity_pub: None,
ephemeral_pub: None,
signature: None,
chosen_profile: None,
callee_reflexive_addr: Some(BOB_ADDR.into()),
callee_local_addrs: Vec::new(),
callee_build_version: None,
}
}
/// Relay B handles the LOCAL callee's answer. If peer_relay_fp
/// is Some, wrap the answer in a FederatedSignalForward + emit the
/// local CallSetup to Bob. Returns the (forward_envelope,
/// bob_call_setup) pair.
fn relay_b_handle_local_answer(
reg_b: &mut CallRegistry,
answer: &SignalMessage,
) -> (SignalMessage, SignalMessage) {
let (call_id, mode, callee_addr) = match answer {
SignalMessage::DirectCallAnswer {
call_id,
accept_mode,
callee_reflexive_addr,
..
} => (call_id.clone(), *accept_mode, callee_reflexive_addr.clone()),
_ => panic!(),
};
// Stash callee addr + activate.
reg_b.set_active(&call_id, mode, format!("call-{call_id}"));
reg_b.set_callee_reflexive_addr(&call_id, callee_addr);
let call = reg_b.get(&call_id).unwrap();
let caller_addr = call.caller_reflexive_addr.clone();
let callee_addr = call.callee_reflexive_addr.clone();
assert!(
call.peer_relay_fp.is_some(),
"Relay B must know this call is cross-relay"
);
// Forward the answer back over federation.
let forward = SignalMessage::FederatedSignalForward {
inner: Box::new(answer.clone()),
origin_relay_fp: RELAY_B_TLS_FP.into(),
};
// Local CallSetup for Bob — peer_direct_addr = Alice's addr.
let setup_for_bob = SignalMessage::CallSetup {
call_id: call_id.clone(),
room: format!("call-{call_id}"),
relay_addr: RELAY_B_ADDR.into(),
peer_direct_addr: caller_addr,
peer_local_addrs: Vec::new(),
};
let _ = callee_addr;
(forward, setup_for_bob)
}
/// Relay A's cross-relay dispatcher receives the forwarded answer.
/// It stashes the callee addr, forwards the raw answer to local
/// Alice, and emits a CallSetup with peer_direct_addr = Bob's addr.
fn relay_a_handle_forwarded_answer(
reg_a: &mut CallRegistry,
forward: &SignalMessage,
) -> SignalMessage {
let (inner, origin_relay_fp) = match forward {
SignalMessage::FederatedSignalForward { inner, origin_relay_fp } => {
(inner.as_ref().clone(), origin_relay_fp.clone())
}
_ => panic!("not a forward"),
};
assert_ne!(origin_relay_fp, RELAY_A_TLS_FP);
let SignalMessage::DirectCallAnswer {
call_id,
accept_mode,
callee_reflexive_addr,
..
} = inner
else {
panic!("inner was not DirectCallAnswer");
};
assert_eq!(accept_mode, CallAcceptMode::AcceptTrusted);
reg_a.set_active(&call_id, accept_mode, format!("call-{call_id}"));
reg_a.set_callee_reflexive_addr(&call_id, callee_reflexive_addr.clone());
// Alice's CallSetup — peer_direct_addr = Bob's addr.
SignalMessage::CallSetup {
call_id: call_id.clone(),
room: format!("call-{call_id}"),
relay_addr: RELAY_A_ADDR.into(),
peer_direct_addr: callee_reflexive_addr,
peer_local_addrs: Vec::new(),
}
}
// ────────────────────────────────────────────────────────────────
// Tests
// ────────────────────────────────────────────────────────────────
#[test]
fn cross_relay_offer_forwards_and_stashes_peer_relay_fp() {
let mut reg_a = CallRegistry::new();
let mut reg_b = CallRegistry::new();
let offer = alice_offer("c-xrelay-1");
let forward = relay_a_handle_offer(&mut reg_a, &offer);
// Relay A's local view: call exists, caller addr stashed,
// peer_relay_fp still None (broadcast — answer identifies the
// peer).
let call_a = reg_a.get("c-xrelay-1").unwrap();
assert_eq!(call_a.caller_fingerprint, "alice");
assert_eq!(call_a.callee_fingerprint, "bob");
assert_eq!(call_a.caller_reflexive_addr.as_deref(), Some(ALICE_ADDR));
assert!(call_a.peer_relay_fp.is_none());
// Relay B dispatches the forward: creates the call locally
// and stashes peer_relay_fp = Relay A.
relay_b_handle_forwarded_offer(&mut reg_b, &forward);
let call_b = reg_b.get("c-xrelay-1").unwrap();
assert_eq!(call_b.caller_fingerprint, "alice");
assert_eq!(call_b.callee_fingerprint, "bob");
assert_eq!(call_b.caller_reflexive_addr.as_deref(), Some(ALICE_ADDR));
assert_eq!(call_b.peer_relay_fp.as_deref(), Some(RELAY_A_TLS_FP));
}
#[test]
fn cross_relay_answer_crosswires_peer_direct_addrs() {
let mut reg_a = CallRegistry::new();
let mut reg_b = CallRegistry::new();
// Full round trip: offer → forward → dispatch → answer →
// forward back → dispatch → both CallSetups.
let offer = alice_offer("c-xrelay-2");
let offer_forward = relay_a_handle_offer(&mut reg_a, &offer);
relay_b_handle_forwarded_offer(&mut reg_b, &offer_forward);
// Bob answers on Relay B.
let answer = bob_answer("c-xrelay-2");
let (answer_forward, setup_for_bob) =
relay_b_handle_local_answer(&mut reg_b, &answer);
// Bob's CallSetup carries Alice's addr.
match setup_for_bob {
SignalMessage::CallSetup { peer_direct_addr, relay_addr, .. } => {
assert_eq!(peer_direct_addr.as_deref(), Some(ALICE_ADDR));
assert_eq!(relay_addr, RELAY_B_ADDR);
}
_ => panic!("wrong variant"),
}
// Alice's dispatcher receives the forwarded answer and builds
// her CallSetup.
let setup_for_alice = relay_a_handle_forwarded_answer(&mut reg_a, &answer_forward);
match setup_for_alice {
SignalMessage::CallSetup { peer_direct_addr, relay_addr, .. } => {
assert_eq!(peer_direct_addr.as_deref(), Some(BOB_ADDR));
assert_eq!(relay_addr, RELAY_A_ADDR);
}
_ => panic!("wrong variant"),
}
// Both registries agree on caller + callee reflex addrs after
// the full round-trip.
for reg in [&reg_a, &reg_b] {
let c = reg.get("c-xrelay-2").unwrap();
assert_eq!(c.caller_reflexive_addr.as_deref(), Some(ALICE_ADDR));
assert_eq!(c.callee_reflexive_addr.as_deref(), Some(BOB_ADDR));
}
}
#[test]
fn cross_relay_loop_prevention_drops_self_sourced_forward() {
// A FederatedSignalForward that circles back to the origin
// relay should be dropped before it hits the call registry.
let forward = SignalMessage::FederatedSignalForward {
inner: Box::new(alice_offer("c-loop")),
origin_relay_fp: RELAY_B_TLS_FP.into(),
};
// The dispatcher in main.rs calls this explicit check before
// doing any work. Reproduce it inline.
let origin = match &forward {
SignalMessage::FederatedSignalForward { origin_relay_fp, .. } => origin_relay_fp.clone(),
_ => unreachable!(),
};
// Relay B sees origin == its own fp → drop.
assert_eq!(origin, RELAY_B_TLS_FP, "loop-prevention triggers on self-fp");
}

View File

@@ -63,11 +63,11 @@ async fn handshake_succeeds() {
accept_handshake(server_t.as_ref(), &callee_seed).await
});
let caller_session = perform_handshake(client_transport.as_ref(), &caller_seed)
let caller_session = perform_handshake(client_transport.as_ref(), &caller_seed, None)
.await
.expect("perform_handshake should succeed");
let (callee_session, chosen_profile) = callee_handle
let (callee_session, chosen_profile, _caller_fp, _caller_alias) = callee_handle
.await
.expect("join callee task")
.expect("accept_handshake should succeed");
@@ -124,11 +124,11 @@ async fn handshake_verifies_identity() {
accept_handshake(server_t.as_ref(), &callee_seed).await
});
let caller_session = perform_handshake(client_transport.as_ref(), &caller_seed)
let caller_session = perform_handshake(client_transport.as_ref(), &caller_seed, None)
.await
.expect("handshake must succeed even with different identities");
let (callee_session, _profile) = callee_handle
let (callee_session, _profile, _caller_fp, _caller_alias) = callee_handle
.await
.expect("join")
.expect("accept_handshake must succeed");
@@ -183,7 +183,7 @@ async fn auth_then_handshake() {
};
// 2. Run the cryptographic handshake
let (session, profile) = accept_handshake(server_t.as_ref(), &callee_seed)
let (session, profile, _caller_fp, _caller_alias) = accept_handshake(server_t.as_ref(), &callee_seed)
.await
.expect("accept_handshake after auth");
@@ -199,7 +199,7 @@ async fn auth_then_handshake() {
.await
.expect("send AuthToken");
let caller_session = perform_handshake(client_transport.as_ref(), &caller_seed)
let caller_session = perform_handshake(client_transport.as_ref(), &caller_seed, None)
.await
.expect("perform_handshake after auth");
@@ -270,6 +270,7 @@ async fn handshake_rejects_bad_signature() {
ephemeral_pub,
signature,
supported_profiles: vec![wzp_proto::QualityProfile::GOOD],
alias: None,
};
client_transport

View File

@@ -0,0 +1,294 @@
//! Phase 3 integration tests for hole-punching advertising
//! (PRD: .taskmaster/docs/prd_hole_punching.txt).
//!
//! These verify the end-to-end protocol cross-wiring:
//! caller (places offer with caller_reflexive_addr=A)
//! → relay (stashes A in registry)
//! → callee (reads A off the forwarded offer)
//! callee (sends AcceptTrusted answer with callee_reflexive_addr=B)
//! → relay (stashes B, emits CallSetup to both parties)
//! → caller receives CallSetup.peer_direct_addr = B
//! → callee receives CallSetup.peer_direct_addr = A
//!
//! The actual QUIC hole-punch race is a Phase 3.5 follow-up.
//! These tests only cover the signal-plane plumbing — that the
//! addrs make it from each peer's offer/answer through the relay
//! cross-wiring back out in CallSetup with the peer's addr.
//!
//! We drive the call registry + a minimal routing function
//! directly instead of spinning up a full relay process — easier
//! to reason about, no real network, and what we actually want to
//! test is the cross-wiring logic, not the whole signal stack.
use wzp_proto::{CallAcceptMode, SignalMessage};
use wzp_relay::call_registry::CallRegistry;
/// Helper: simulate the relay's handling of a DirectCallOffer. In
/// `wzp-relay/src/main.rs` this is the match arm that creates the
/// call in the registry and stashes the caller's reflex addr.
fn handle_offer(reg: &mut CallRegistry, offer: &SignalMessage) -> String {
match offer {
SignalMessage::DirectCallOffer {
caller_fingerprint,
target_fingerprint,
call_id,
caller_reflexive_addr,
..
} => {
reg.create_call(
call_id.clone(),
caller_fingerprint.clone(),
target_fingerprint.clone(),
);
reg.set_caller_reflexive_addr(call_id, caller_reflexive_addr.clone());
call_id.clone()
}
_ => panic!("not an offer"),
}
}
/// Helper: simulate the relay's handling of a DirectCallAnswer +
/// the subsequent CallSetup emission. Returns the two CallSetup
/// messages the relay would push: (for_caller, for_callee).
fn handle_answer_and_build_setups(
reg: &mut CallRegistry,
answer: &SignalMessage,
) -> (SignalMessage, SignalMessage) {
let (call_id, mode, callee_addr) = match answer {
SignalMessage::DirectCallAnswer {
call_id,
accept_mode,
callee_reflexive_addr,
..
} => (call_id.clone(), *accept_mode, callee_reflexive_addr.clone()),
_ => panic!("not an answer"),
};
reg.set_callee_reflexive_addr(&call_id, callee_addr);
let room = format!("call-{call_id}");
reg.set_active(&call_id, mode, room.clone());
let (caller_addr, callee_addr) = {
let c = reg.get(&call_id).unwrap();
(
c.caller_reflexive_addr.clone(),
c.callee_reflexive_addr.clone(),
)
};
let setup_for_caller = SignalMessage::CallSetup {
call_id: call_id.clone(),
room: room.clone(),
relay_addr: "203.0.113.5:4433".into(),
peer_direct_addr: callee_addr,
peer_local_addrs: Vec::new(),
};
let setup_for_callee = SignalMessage::CallSetup {
call_id,
room,
relay_addr: "203.0.113.5:4433".into(),
peer_direct_addr: caller_addr,
peer_local_addrs: Vec::new(),
};
(setup_for_caller, setup_for_callee)
}
fn mk_offer(call_id: &str, caller_reflexive_addr: Option<&str>) -> SignalMessage {
SignalMessage::DirectCallOffer {
caller_fingerprint: "alice".into(),
caller_alias: None,
target_fingerprint: "bob".into(),
call_id: call_id.into(),
identity_pub: [0; 32],
ephemeral_pub: [0; 32],
signature: vec![],
supported_profiles: vec![],
caller_reflexive_addr: caller_reflexive_addr.map(String::from),
caller_local_addrs: Vec::new(),
caller_build_version: None,
}
}
fn mk_answer(
call_id: &str,
mode: CallAcceptMode,
callee_reflexive_addr: Option<&str>,
) -> SignalMessage {
SignalMessage::DirectCallAnswer {
call_id: call_id.into(),
accept_mode: mode,
identity_pub: None,
ephemeral_pub: None,
signature: None,
chosen_profile: None,
callee_reflexive_addr: callee_reflexive_addr.map(String::from),
callee_local_addrs: Vec::new(),
callee_build_version: None,
}
}
// -----------------------------------------------------------------------
// Test 1: both peers advertise — CallSetup cross-wires correctly
// -----------------------------------------------------------------------
#[test]
fn both_peers_advertise_reflex_addrs_cross_wire_in_setup() {
let mut reg = CallRegistry::new();
let caller_addr = "192.0.2.1:4433";
let callee_addr = "198.51.100.9:4433";
let offer = mk_offer("c1", Some(caller_addr));
let call_id = handle_offer(&mut reg, &offer);
assert_eq!(call_id, "c1");
assert_eq!(
reg.get("c1").unwrap().caller_reflexive_addr.as_deref(),
Some(caller_addr)
);
let answer = mk_answer("c1", CallAcceptMode::AcceptTrusted, Some(callee_addr));
let (setup_caller, setup_callee) =
handle_answer_and_build_setups(&mut reg, &answer);
// The CALLER's setup should carry the CALLEE's addr as peer_direct_addr.
match setup_caller {
SignalMessage::CallSetup { peer_direct_addr, .. } => {
assert_eq!(
peer_direct_addr.as_deref(),
Some(callee_addr),
"caller's CallSetup must contain callee's addr"
);
}
_ => panic!("wrong variant"),
}
// The CALLEE's setup should carry the CALLER's addr.
match setup_callee {
SignalMessage::CallSetup { peer_direct_addr, .. } => {
assert_eq!(
peer_direct_addr.as_deref(),
Some(caller_addr),
"callee's CallSetup must contain caller's addr"
);
}
_ => panic!("wrong variant"),
}
}
// -----------------------------------------------------------------------
// Test 2: callee uses AcceptGeneric (privacy) — no addr leaks
// -----------------------------------------------------------------------
#[test]
fn privacy_mode_answer_omits_callee_addr_from_setup() {
let mut reg = CallRegistry::new();
let caller_addr = "192.0.2.1:4433";
handle_offer(&mut reg, &mk_offer("c2", Some(caller_addr)));
// AcceptGeneric explicitly passes None for callee_reflexive_addr —
// the whole point is to hide the callee's IP from the caller.
let answer = mk_answer("c2", CallAcceptMode::AcceptGeneric, None);
let (setup_caller, setup_callee) =
handle_answer_and_build_setups(&mut reg, &answer);
// CALLER should see peer_direct_addr = None (privacy preserved).
match setup_caller {
SignalMessage::CallSetup { peer_direct_addr, .. } => {
assert!(
peer_direct_addr.is_none(),
"privacy mode must not leak callee addr to caller"
);
}
_ => panic!("wrong variant"),
}
// CALLEE still gets the caller's addr — only the callee opted for
// privacy, the caller already volunteered its addr in the offer.
match setup_callee {
SignalMessage::CallSetup { peer_direct_addr, .. } => {
assert_eq!(
peer_direct_addr.as_deref(),
Some(caller_addr),
"callee's CallSetup should still carry caller's volunteered addr"
);
}
_ => panic!("wrong variant"),
}
}
// -----------------------------------------------------------------------
// Test 3: old caller (no addr) + new callee — relay path only
// -----------------------------------------------------------------------
#[test]
fn pre_phase3_caller_leaves_both_setups_relay_only() {
let mut reg = CallRegistry::new();
// Pre-Phase-3 client doesn't know about caller_reflexive_addr
// so the field is None.
handle_offer(&mut reg, &mk_offer("c3", None));
// New callee advertises its addr — doesn't matter because
// without caller_reflexive_addr the caller has nothing to
// attempt a direct handshake to, so the cross-wiring should
// still leave the caller's CallSetup without peer_direct_addr.
let answer = mk_answer(
"c3",
CallAcceptMode::AcceptTrusted,
Some("198.51.100.9:4433"),
);
let (setup_caller, setup_callee) =
handle_answer_and_build_setups(&mut reg, &answer);
match setup_caller {
SignalMessage::CallSetup { peer_direct_addr, .. } => {
// Phase 3 relay behavior: we always inject whatever
// addrs are in the registry, regardless of who
// advertised. The caller here gets the callee's addr
// because the callee did advertise.
assert_eq!(peer_direct_addr.as_deref(), Some("198.51.100.9:4433"));
}
_ => panic!("wrong variant"),
}
// The callee's setup has no caller addr (pre-Phase-3 offer).
match setup_callee {
SignalMessage::CallSetup { peer_direct_addr, .. } => {
assert!(
peer_direct_addr.is_none(),
"callee should see no caller addr when offer was pre-Phase-3"
);
}
_ => panic!("wrong variant"),
}
}
// -----------------------------------------------------------------------
// Test 4: neither side advertises — both CallSetups fall back cleanly
// -----------------------------------------------------------------------
#[test]
fn neither_peer_advertises_both_setups_are_relay_only() {
let mut reg = CallRegistry::new();
handle_offer(&mut reg, &mk_offer("c4", None));
let answer = mk_answer("c4", CallAcceptMode::AcceptTrusted, None);
let (setup_caller, setup_callee) =
handle_answer_and_build_setups(&mut reg, &answer);
for (label, setup) in [("caller", setup_caller), ("callee", setup_callee)] {
match setup {
SignalMessage::CallSetup { peer_direct_addr, relay_addr, .. } => {
assert!(
peer_direct_addr.is_none(),
"{label}'s CallSetup must have no peer_direct_addr"
);
// Relay addr is always filled — that's the fallback
// path and the existing behavior.
assert!(!relay_addr.is_empty(), "{label} relay_addr must be set");
}
_ => panic!("wrong variant"),
}
}
}

View File

@@ -0,0 +1,229 @@
//! Phase 2 integration tests for multi-relay NAT reflection
//! (PRD: .taskmaster/docs/prd_multi_relay_reflect.txt).
//!
//! These spin up one or two mock relays that implement the full
//! pre-reflect dance — RegisterPresence → RegisterPresenceAck →
//! Reflect → ReflectResponse — which is what the transient
//! probe helper in `wzp_client::reflect::probe_reflect_addr` does
//! against a real relay.
//!
//! Test matrix:
//! 1. `probe_reflect_addr_happy_path`
//! — single mock relay, assert the probe helper returns the
//! observed addr as 127.0.0.1:<client ephemeral port>
//! 2. `detect_nat_type_two_loopback_relays_is_cone`
//! — two mock relays, one client; loopback single-host means
//! every probe sees the same (127.0.0.1, same_port) so the
//! classifier returns `Cone` + a consensus addr
//! 3. `detect_nat_type_dead_relay_is_unknown`
//! — one alive relay + one dead address; aggregator returns
//! `Unknown` with a non-empty `error` field on the failed
//! probe
use std::net::{Ipv4Addr, SocketAddr};
use std::sync::Arc;
use std::time::Duration;
use wzp_client::reflect::{detect_nat_type, probe_reflect_addr, NatType};
use wzp_proto::{MediaTransport, SignalMessage};
use wzp_transport::{create_endpoint, server_config, QuinnTransport};
/// Minimal mock relay that loops accepting connections, handles
/// RegisterPresence + Reflect, and responds correctly. Mirrors the
/// two match arms from `wzp-relay/src/main.rs` that matter here.
///
/// Each accepted connection gets its own inner task so multiple
/// simultaneous probes work.
async fn spawn_mock_relay() -> (SocketAddr, tokio::task::JoinHandle<()>) {
let _ = rustls::crypto::ring::default_provider().install_default();
let (sc, _cert_der) = server_config();
let bind: SocketAddr = (Ipv4Addr::LOCALHOST, 0).into();
let endpoint = create_endpoint(bind, Some(sc)).expect("server endpoint");
let listen_addr = endpoint.local_addr().expect("local_addr");
let handle = tokio::spawn(async move {
loop {
// Accept the next incoming connection. `wzp_transport::accept`
// returns the established `quinn::Connection`.
let conn = match wzp_transport::accept(&endpoint).await {
Ok(c) => c,
Err(_) => break, // endpoint closed
};
let observed_addr = conn.remote_address();
let transport = Arc::new(QuinnTransport::new(conn));
// Per-connection handler. Keep servicing messages until
// the peer closes so one probe connection can do
// RegisterPresence → Ack → Reflect → Response without
// racing other incoming connections.
let t = transport;
tokio::spawn(async move {
loop {
match t.recv_signal().await {
Ok(Some(SignalMessage::RegisterPresence { .. })) => {
let _ = t
.send_signal(&SignalMessage::RegisterPresenceAck {
success: true,
error: None,
relay_build: None,
})
.await;
}
Ok(Some(SignalMessage::Reflect)) => {
let _ = t
.send_signal(&SignalMessage::ReflectResponse {
observed_addr: observed_addr.to_string(),
})
.await;
}
Ok(Some(_other)) => { /* ignore */ }
Ok(None) => break,
Err(_) => break,
}
}
});
}
});
(listen_addr, handle)
}
// -----------------------------------------------------------------------
// Test 1: probe_reflect_addr against a single mock relay
// -----------------------------------------------------------------------
#[tokio::test(flavor = "multi_thread", worker_threads = 2)]
async fn probe_reflect_addr_happy_path() {
let (relay_addr, _relay_handle) = spawn_mock_relay().await;
let (observed, latency_ms) = tokio::time::timeout(
Duration::from_secs(3),
probe_reflect_addr(relay_addr, 2000, None),
)
.await
.expect("probe must complete within 3s")
.expect("probe must succeed");
assert_eq!(
observed.ip().to_string(),
"127.0.0.1",
"loopback test should see 127.0.0.1"
);
assert_ne!(observed.port(), 0, "observed port must be non-zero");
// Latency on same host is dominated by the handshake — generously
// allow up to 2s (the timeout) rather than picking a tight number
// that would be flaky on busy CI runners.
assert!(latency_ms < 2000, "latency {latency_ms}ms too high");
}
// -----------------------------------------------------------------------
// Test 2: two loopback relays → probes succeed, classification is Unknown
// -----------------------------------------------------------------------
//
// With the private-IP filter added in the NAT classifier, loopback
// reflex addrs (127.0.0.1) are dropped before classification —
// they can't possibly indicate public-internet NAT state. So the
// test now asserts:
// - both probes succeed end-to-end (wire plumbing works)
// - both return 127.0.0.1 (same-host is visible)
// - the aggregated verdict is Unknown (no public probes)
#[tokio::test(flavor = "multi_thread", worker_threads = 4)]
async fn detect_nat_type_two_loopback_relays_probes_work_but_classify_unknown() {
let (addr_a, _h_a) = spawn_mock_relay().await;
let (addr_b, _h_b) = spawn_mock_relay().await;
let detection = detect_nat_type(
vec![
("RelayA".into(), addr_a),
("RelayB".into(), addr_b),
],
2000,
None,
)
.await;
assert_eq!(detection.probes.len(), 2);
for p in &detection.probes {
assert!(
p.observed_addr.is_some(),
"probe {:?} failed: {:?}",
p.relay_name,
p.error
);
}
let observed_ips: Vec<String> = detection
.probes
.iter()
.map(|p| {
p.observed_addr
.as_ref()
.and_then(|s| s.parse::<SocketAddr>().ok())
.map(|a| a.ip().to_string())
.unwrap_or_default()
})
.collect();
assert_eq!(observed_ips[0], "127.0.0.1");
assert_eq!(observed_ips[1], "127.0.0.1");
// Classification: loopback probes are filtered out of the
// public-NAT classifier, so with 0 public probes the result
// is Unknown.
assert_eq!(
detection.nat_type,
NatType::Unknown,
"loopback-only probes must not contribute to public NAT classification"
);
assert!(detection.consensus_addr.is_none());
}
// -----------------------------------------------------------------------
// Test 3: one alive relay + one dead address → Unknown
// -----------------------------------------------------------------------
#[tokio::test(flavor = "multi_thread", worker_threads = 4)]
async fn detect_nat_type_dead_relay_is_unknown() {
let (alive_addr, _alive_handle) = spawn_mock_relay().await;
// Dead relay: a port that nothing is listening on. OS will drop
// the packets, the probe should time out within the 600ms budget
// we give it. Pick a port unlikely to be in use — port 1 on
// loopback works on every OS I care about and fails fast.
let dead_addr: SocketAddr = "127.0.0.1:1".parse().unwrap();
let detection = detect_nat_type(
vec![
("Alive".into(), alive_addr),
("Dead".into(), dead_addr),
],
600, // tight timeout so the dead probe fails fast
None,
)
.await;
assert_eq!(detection.probes.len(), 2);
// Find the alive and dead probes by name (order of JoinSet
// completions is not guaranteed).
let alive = detection.probes.iter().find(|p| p.relay_name == "Alive").unwrap();
let dead = detection.probes.iter().find(|p| p.relay_name == "Dead").unwrap();
assert!(
alive.observed_addr.is_some(),
"alive probe must succeed: {:?}",
alive.error
);
assert!(
dead.observed_addr.is_none(),
"dead probe must fail, got addr {:?}",
dead.observed_addr
);
assert!(
dead.error.is_some(),
"dead probe must surface an error string"
);
// With only 1 successful probe, the classifier returns Unknown.
assert_eq!(detection.nat_type, NatType::Unknown);
assert!(detection.consensus_addr.is_none());
}

View File

@@ -0,0 +1,318 @@
//! Integration tests for the "STUN for QUIC" reflect protocol
//! (PRD: .taskmaster/docs/prd_reflect_over_quic.txt, Phase 1).
//!
//! We don't spin up the full relay binary — instead we exercise the
//! same wire-level request/response dance with a mock relay loop
//! that implements exactly the match arm added to
//! `wzp-relay/src/main.rs`. This isolates the protocol test from the
//! rest of the relay state (rooms, federation, call registry, ...).
//!
//! Three test cases:
//! 1. `reflect_happy_path` — client sends `Reflect`, mock relay
//! replies with `ReflectResponse { observed_addr }`, client
//! parses it back to a `SocketAddr` and confirms the IP is
//! `127.0.0.1` and the port matches its own bound port.
//! 2. `reflect_two_clients_distinct_ports` — two simultaneous
//! client connections on different ephemeral ports get back
//! different reflected ports, proving the relay uses
//! per-connection `remote_address` rather than a global.
//! 3. `reflect_old_relay_times_out` — mock relay that *doesn't*
//! handle `Reflect`; client side times out in the expected
//! window and does not hang.
//!
//! The third test uses a `tokio::time::timeout` wrapper directly
//! (the client-side `request_reflect` helper lives in
//! `desktop/src-tauri/src/lib.rs` which isn't a library we can
//! depend on from here, so we reproduce the timeout semantics
//! inline).
use std::net::{Ipv4Addr, SocketAddr};
use std::sync::Arc;
use std::time::Duration;
use wzp_proto::{MediaTransport, SignalMessage};
use wzp_transport::{client_config, create_endpoint, server_config, QuinnTransport};
/// Spawn a minimal mock relay that loops over `recv_signal`,
/// matches on `Reflect`, and responds with `ReflectResponse` using
/// the remote_address observed for this connection. Mirrors the
/// match arm in `crates/wzp-relay/src/main.rs`.
async fn spawn_mock_relay_with_reflect(
server_transport: Arc<QuinnTransport>,
) -> tokio::task::JoinHandle<()> {
tokio::spawn(async move {
// Observed remote address at the time the connection was
// accepted. Stable for the life of the connection under quinn's
// normal operation. This is exactly what the real relay does.
let observed = server_transport.connection().remote_address();
loop {
match server_transport.recv_signal().await {
Ok(Some(SignalMessage::Reflect)) => {
let resp = SignalMessage::ReflectResponse {
observed_addr: observed.to_string(),
};
// If the send fails the client has gone; just exit.
if server_transport.send_signal(&resp).await.is_err() {
break;
}
}
Ok(Some(_other)) => {
// Ignore anything else — not relevant to this test.
}
Ok(None) => break,
Err(_e) => break,
}
}
})
}
/// Spawn a mock relay that intentionally DOES NOT handle Reflect.
/// Models a pre-Phase-1 relay — it keeps reading signal messages and
/// logs them to stderr, but never produces a `ReflectResponse`.
async fn spawn_mock_relay_without_reflect(
server_transport: Arc<QuinnTransport>,
) -> tokio::task::JoinHandle<()> {
tokio::spawn(async move {
loop {
match server_transport.recv_signal().await {
Ok(Some(_msg)) => {
// Deliberately do nothing. Old relay.
}
Ok(None) => break,
Err(_) => break,
}
}
})
}
/// Build an in-process QUIC client/server pair on loopback and
/// return (client_transport, server_transport, endpoints). The
/// endpoints tuple must be kept alive for the test duration.
///
/// `client_port_hint` of 0 means "let OS pick". Pass an explicit
/// port to pin the client's source port (useful for the
/// distinct-ports test).
async fn connected_pair_with_port(
_client_port_hint: u16,
) -> (Arc<QuinnTransport>, Arc<QuinnTransport>, (quinn::Endpoint, quinn::Endpoint)) {
let _ = rustls::crypto::ring::default_provider().install_default();
let (sc, _cert_der) = server_config();
let server_addr: SocketAddr = (Ipv4Addr::LOCALHOST, 0).into();
let server_ep = create_endpoint(server_addr, Some(sc)).expect("server endpoint");
let server_listen = server_ep.local_addr().expect("server local addr");
// Always bind the client to an ephemeral port — we'll read back
// the actual assigned port via `local_addr()` in the assertions.
let client_bind: SocketAddr = (Ipv4Addr::LOCALHOST, 0).into();
let client_ep = create_endpoint(client_bind, None).expect("client endpoint");
let server_ep_clone = server_ep.clone();
let accept_fut = tokio::spawn(async move {
let conn = wzp_transport::accept(&server_ep_clone).await.expect("accept");
Arc::new(QuinnTransport::new(conn))
});
let client_conn =
wzp_transport::connect(&client_ep, server_listen, "localhost", client_config())
.await
.expect("connect");
let client_transport = Arc::new(QuinnTransport::new(client_conn));
let server_transport = accept_fut.await.expect("join accept task");
(client_transport, server_transport, (server_ep, client_ep))
}
// -----------------------------------------------------------------------
// Test 1: happy path — client learns its own port via Reflect
// -----------------------------------------------------------------------
#[tokio::test(flavor = "multi_thread", worker_threads = 2)]
async fn reflect_happy_path() {
let (client_transport, server_transport, (_server_ep, client_ep)) =
connected_pair_with_port(0).await;
// Grab the client's actual bound port so we can cross-check
// against the reflected response.
let client_port = client_ep
.local_addr()
.expect("client local addr")
.port();
assert_ne!(client_port, 0, "client must have a real bound port");
// Start the mock relay's reflect handler.
let _relay_handle = spawn_mock_relay_with_reflect(Arc::clone(&server_transport)).await;
// Client sends Reflect and awaits the response. The real
// request_reflect helper in desktop/src-tauri/src/lib.rs uses a
// oneshot channel driven off the spawned recv loop; here we just
// do it inline because there's no spawned loop yet in this test
// — this isolates the wire protocol from the client-side state
// machine.
client_transport
.send_signal(&SignalMessage::Reflect)
.await
.expect("send Reflect");
let resp = tokio::time::timeout(Duration::from_secs(2), client_transport.recv_signal())
.await
.expect("reflect response should arrive within 2s")
.expect("recv_signal ok")
.expect("some message");
let observed_addr = match resp {
SignalMessage::ReflectResponse { observed_addr } => observed_addr,
other => panic!("expected ReflectResponse, got {:?}", std::mem::discriminant(&other)),
};
let parsed: SocketAddr = observed_addr
.parse()
.expect("ReflectResponse.observed_addr must parse as SocketAddr");
// The relay should see the client on 127.0.0.1 (loopback in the
// test harness) and on the client's bound ephemeral port.
assert_eq!(parsed.ip().to_string(), "127.0.0.1");
assert_eq!(
parsed.port(),
client_port,
"reflected port must match the client's local_addr port"
);
drop(client_transport);
drop(server_transport);
}
// -----------------------------------------------------------------------
// Test 2: two clients get DIFFERENT reflected ports
// -----------------------------------------------------------------------
#[tokio::test(flavor = "multi_thread", worker_threads = 4)]
async fn reflect_two_clients_distinct_ports() {
let _ = rustls::crypto::ring::default_provider().install_default();
// Shared server: one endpoint, two incoming accepts.
let (sc, _cert_der) = server_config();
let server_addr: SocketAddr = (Ipv4Addr::LOCALHOST, 0).into();
let server_ep = create_endpoint(server_addr, Some(sc)).expect("server endpoint");
let server_listen = server_ep.local_addr().expect("server local addr");
// Accept two clients in parallel.
let server_ep_a = server_ep.clone();
let accept_a = tokio::spawn(async move {
let conn = wzp_transport::accept(&server_ep_a).await.expect("accept A");
Arc::new(QuinnTransport::new(conn))
});
let server_ep_b = server_ep.clone();
let accept_b = tokio::spawn(async move {
let conn = wzp_transport::accept(&server_ep_b).await.expect("accept B");
Arc::new(QuinnTransport::new(conn))
});
// Client A
let client_ep_a = create_endpoint((Ipv4Addr::LOCALHOST, 0).into(), None).expect("ep A");
let conn_a =
wzp_transport::connect(&client_ep_a, server_listen, "localhost", client_config())
.await
.expect("connect A");
let client_a = Arc::new(QuinnTransport::new(conn_a));
let port_a = client_ep_a.local_addr().unwrap().port();
// Client B
let client_ep_b = create_endpoint((Ipv4Addr::LOCALHOST, 0).into(), None).expect("ep B");
let conn_b =
wzp_transport::connect(&client_ep_b, server_listen, "localhost", client_config())
.await
.expect("connect B");
let client_b = Arc::new(QuinnTransport::new(conn_b));
let port_b = client_ep_b.local_addr().unwrap().port();
assert_ne!(
port_a, port_b,
"preconditions: OS must assign two clients different ephemeral ports"
);
let server_a = accept_a.await.expect("join A");
let server_b = accept_b.await.expect("join B");
// Spawn a reflect handler for each server-side transport.
let _relay_a = spawn_mock_relay_with_reflect(Arc::clone(&server_a)).await;
let _relay_b = spawn_mock_relay_with_reflect(Arc::clone(&server_b)).await;
// Each client requests reflect concurrently.
let reflect_for = |t: Arc<QuinnTransport>| async move {
t.send_signal(&SignalMessage::Reflect).await.expect("send");
let resp = tokio::time::timeout(Duration::from_secs(2), t.recv_signal())
.await
.expect("timeout")
.expect("ok")
.expect("some");
match resp {
SignalMessage::ReflectResponse { observed_addr } => observed_addr,
_ => panic!("wrong variant"),
}
};
let (addr_a, addr_b) = tokio::join!(reflect_for(client_a.clone()), reflect_for(client_b.clone()));
let parsed_a: SocketAddr = addr_a.parse().unwrap();
let parsed_b: SocketAddr = addr_b.parse().unwrap();
assert_eq!(parsed_a.port(), port_a, "client A's reflected port");
assert_eq!(parsed_b.port(), port_b, "client B's reflected port");
assert_ne!(
parsed_a.port(),
parsed_b.port(),
"each client must see its own port, not a shared one"
);
drop(client_a);
drop(client_b);
drop(server_a);
drop(server_b);
}
// -----------------------------------------------------------------------
// Test 3: old relay never answers — client times out cleanly
// -----------------------------------------------------------------------
#[tokio::test(flavor = "multi_thread", worker_threads = 2)]
async fn reflect_old_relay_times_out() {
let (client_transport, server_transport, _endpoints) =
connected_pair_with_port(0).await;
// Mock relay that ignores Reflect — simulates a pre-Phase-1 build.
let _relay_handle =
spawn_mock_relay_without_reflect(Arc::clone(&server_transport)).await;
client_transport
.send_signal(&SignalMessage::Reflect)
.await
.expect("send Reflect");
// 1100ms ceiling matches the 1s timeout baked into
// get_reflected_address plus a tiny bit of slack. If this
// regression ever fires it probably means recv_signal blocked
// longer than expected and the Tauri command would hang the UI.
let start = std::time::Instant::now();
let result =
tokio::time::timeout(Duration::from_millis(1100), client_transport.recv_signal()).await;
let elapsed = start.elapsed();
assert!(
result.is_err(),
"recv_signal must time out when the relay ignores Reflect"
);
assert!(
elapsed >= Duration::from_millis(1000),
"timeout fired too early ({:?})",
elapsed
);
assert!(
elapsed < Duration::from_millis(1200),
"timeout fired too late ({:?}), client would feel unresponsive",
elapsed
);
drop(client_transport);
drop(server_transport);
}

View File

@@ -15,7 +15,11 @@ tracing = { workspace = true }
async-trait = { workspace = true }
serde_json = "1"
rustls = { version = "0.23", default-features = false, features = ["ring", "std"] }
socket2 = { workspace = true }
rcgen = "0.13"
ed25519-dalek = { workspace = true }
hkdf = { workspace = true }
sha2 = { workspace = true }
[dev-dependencies]
tokio = { workspace = true, features = ["rt-multi-thread", "macros"] }

View File

@@ -6,20 +6,74 @@ use std::time::Duration;
use quinn::crypto::rustls::QuicClientConfig;
use quinn::crypto::rustls::QuicServerConfig;
/// Create a server configuration with a self-signed certificate (for testing).
/// Create a server configuration with a self-signed certificate (random keypair).
///
/// Tunes QUIC transport parameters for lossy VoIP:
/// - 30s idle timeout
/// - 5s keep-alive interval
/// - DATAGRAM extension enabled
/// - Conservative flow control for bandwidth-constrained links
/// The certificate changes on every call. Use `server_config_from_seed` for
/// a deterministic certificate that survives relay restarts.
pub fn server_config() -> (quinn::ServerConfig, Vec<u8>) {
let cert_key = rcgen::generate_simple_self_signed(vec!["localhost".to_string()])
.expect("failed to generate self-signed cert");
let cert_der = rustls::pki_types::CertificateDer::from(cert_key.cert);
let key_der =
rustls::pki_types::PrivateKeyDer::try_from(cert_key.key_pair.serialize_der()).unwrap();
build_server_config(cert_der, key_der)
}
/// Create a server configuration with a deterministic self-signed certificate
/// derived from a 32-byte seed. Same seed = same cert = same TLS fingerprint.
pub fn server_config_from_seed(seed: &[u8; 32]) -> (quinn::ServerConfig, Vec<u8>) {
use ed25519_dalek::pkcs8::EncodePrivateKey;
use ed25519_dalek::SigningKey;
use hkdf::Hkdf;
use sha2::Sha256;
// Derive Ed25519 key bytes from seed via HKDF
let hk = Hkdf::<Sha256>::new(None, seed);
let mut ed_bytes = [0u8; 32];
hk.expand(b"wzp-tls-ed25519", &mut ed_bytes)
.expect("HKDF expand failed");
// Create Ed25519 signing key and export as PKCS8 DER
let signing_key = SigningKey::from_bytes(&ed_bytes);
let pkcs8_doc = signing_key.to_pkcs8_der()
.expect("failed to encode Ed25519 key as PKCS8");
let key_der_for_rcgen = rustls::pki_types::PrivateKeyDer::try_from(pkcs8_doc.as_bytes().to_vec())
.expect("failed to wrap PKCS8 DER");
// Create rcgen KeyPair from DER
let key_pair = rcgen::KeyPair::from_der_and_sign_algo(
&key_der_for_rcgen,
&rcgen::PKCS_ED25519,
)
.expect("failed to create KeyPair from seed-derived Ed25519 key");
// Build self-signed cert with this deterministic keypair
let params = rcgen::CertificateParams::new(vec!["localhost".to_string()])
.expect("failed to create CertificateParams");
let cert = params.self_signed(&key_pair).expect("failed to self-sign cert");
let cert_der = rustls::pki_types::CertificateDer::from(cert.der().to_vec());
let key_der = rustls::pki_types::PrivateKeyDer::try_from(key_pair.serialize_der())
.expect("failed to serialize key DER");
build_server_config(cert_der, key_der)
}
/// Compute a hex-formatted SHA-256 fingerprint of a DER-encoded certificate.
///
/// Format: `xx:xx:xx:xx:...` (32 bytes = 64 hex chars with colons).
pub fn tls_fingerprint(cert_der: &[u8]) -> String {
use sha2::{Sha256, Digest};
let hash = Sha256::digest(cert_der);
hash.iter()
.map(|b| format!("{b:02x}"))
.collect::<Vec<_>>()
.join(":")
}
fn build_server_config(
cert_der: rustls::pki_types::CertificateDer<'static>,
key_der: rustls::pki_types::PrivateKeyDer<'static>,
) -> (quinn::ServerConfig, Vec<u8>) {
let mut server_crypto = rustls::ServerConfig::builder()
.with_no_client_auth()
.with_single_cert(vec![cert_der.clone()], key_der)
@@ -69,7 +123,6 @@ fn transport_config() -> quinn::TransportConfig {
config.keep_alive_interval(Some(Duration::from_secs(5)));
// Enable DATAGRAM extension for unreliable media packets.
// Allow datagrams up to 1200 bytes (conservative for lossy links).
config.datagram_receive_buffer_size(Some(65536));
// Conservative flow control for bandwidth-constrained links
@@ -80,6 +133,26 @@ fn transport_config() -> quinn::TransportConfig {
// Aggressive initial RTT estimate for high-latency links
config.initial_rtt(Duration::from_millis(300));
// PMTUD (Path MTU Discovery) — quinn 0.11 enables this by default but
// with conservative bounds (initial 1200, upper 1452). We keep the safe
// initial_mtu of 1200 so the first packets always get through, but raise
// upper_bound so the binary search can discover larger MTUs on paths that
// support them. Typical results:
// - Ethernet/fiber: discovers ~1452 (Ethernet MTU minus IP/UDP/QUIC)
// - WireGuard/VPN: discovers ~1380-1420
// - Starlink: discovers ~1400-1452
// - Cellular: stays at 1200-1300
// Black hole detection automatically falls back to 1200 if probes fail.
// This matters for future video frames which can be 1-50 KB and benefit
// from fewer application-layer fragments per frame.
let mut mtu_config = quinn::MtuDiscoveryConfig::default();
mtu_config
.upper_bound(1452)
.interval(Duration::from_secs(300)) // re-probe every 5 min
.black_hole_cooldown(Duration::from_secs(30)); // retry faster on lossy links
config.mtu_discovery_config(Some(mtu_config));
config.initial_mtu(1200); // safe starting point
config
}

View File

@@ -39,6 +39,71 @@ pub async fn connect(
Ok(connection)
}
/// Create an IPv6-only QUIC endpoint with `IPV6_V6ONLY=1`.
///
/// Tries `[::]:preferred_port` first (same port as the IPv4 signal
/// endpoint — allowed on Linux/Android when the AFs differ and
/// V6ONLY is set). Falls back to `[::]:0` (OS-assigned) if the
/// preferred port is already taken.
///
/// Must be called from within a tokio runtime (quinn needs the
/// async runtime handle for its I/O driver).
pub fn create_ipv6_endpoint(
preferred_port: u16,
server_config: Option<quinn::ServerConfig>,
) -> Result<quinn::Endpoint, TransportError> {
use socket2::{Domain, Protocol, Socket, Type};
use std::net::{Ipv6Addr, SocketAddrV6};
let sock = Socket::new(Domain::IPV6, Type::DGRAM, Some(Protocol::UDP))
.map_err(|e| TransportError::Internal(format!("ipv6 socket: {e}")))?;
// Critical: IPv6-only so this socket never intercepts IPv4.
// On Android some kernels default to V6ONLY=1 anyway, but we
// set it explicitly for cross-platform consistency.
sock.set_only_v6(true)
.map_err(|e| TransportError::Internal(format!("set_only_v6: {e}")))?;
sock.set_reuse_address(true)
.map_err(|e| TransportError::Internal(format!("set_reuse_address: {e}")))?;
// Try the preferred port (same as IPv4 signal endpoint), fall
// back to ephemeral if the OS rejects it.
let bind_addr = SocketAddrV6::new(Ipv6Addr::UNSPECIFIED, preferred_port, 0, 0);
if let Err(e) = sock.bind(&bind_addr.into()) {
if preferred_port != 0 {
tracing::debug!(
preferred_port,
error = %e,
"ipv6 bind to preferred port failed, falling back to ephemeral"
);
let fallback = SocketAddrV6::new(Ipv6Addr::UNSPECIFIED, 0, 0, 0);
sock.bind(&fallback.into())
.map_err(|e| TransportError::Internal(format!("ipv6 bind fallback: {e}")))?;
} else {
return Err(TransportError::Internal(format!("ipv6 bind: {e}")));
}
}
sock.set_nonblocking(true)
.map_err(|e| TransportError::Internal(format!("set_nonblocking: {e}")))?;
let udp_socket: std::net::UdpSocket = sock.into();
let runtime = quinn::default_runtime()
.ok_or_else(|| TransportError::Internal("no async runtime for ipv6 endpoint".into()))?;
let endpoint = quinn::Endpoint::new(
quinn::EndpointConfig::default(),
server_config,
udp_socket,
runtime,
)
.map_err(|e| TransportError::Internal(format!("ipv6 endpoint: {e}")))?;
Ok(endpoint)
}
/// Accept the next incoming connection on an endpoint.
pub async fn accept(endpoint: &quinn::Endpoint) -> Result<quinn::Connection, TransportError> {
let incoming = endpoint

View File

@@ -22,8 +22,13 @@ pub mod path_monitor;
pub mod quic;
pub mod reliable;
pub use config::{client_config, server_config};
pub use connection::{accept, connect, create_endpoint};
pub use config::{client_config, server_config, server_config_from_seed, tls_fingerprint};
pub use connection::{accept, connect, create_endpoint, create_ipv6_endpoint};
pub use path_monitor::PathMonitor;
pub use quic::QuinnTransport;
pub use quic::{QuinnPathSnapshot, QuinnTransport};
pub use wzp_proto::{MediaTransport, PathQuality, TransportError};
// Re-export the quinn Endpoint type so downstream crates (wzp-desktop) can
// thread a shared endpoint between signaling and media connections without
// needing to depend on quinn directly.
pub use quinn::Endpoint;

View File

@@ -2,11 +2,17 @@
//!
//! Tracks packet loss (via sequence number gaps), RTT, jitter, and bandwidth.
use std::collections::VecDeque;
use wzp_proto::PathQuality;
/// EWMA smoothing factor.
const ALPHA: f64 = 0.1;
/// Maximum number of RTT samples in the jitter variance sliding window.
/// At ~50 packets/sec (20 ms frame), 10 samples ≈ 200 ms.
const JITTER_VARIANCE_WINDOW_SIZE: usize = 10;
/// Monitors network path quality metrics.
pub struct PathMonitor {
/// EWMA-smoothed loss percentage (0.0 - 100.0).
@@ -31,6 +37,8 @@ pub struct PathMonitor {
last_rtt_ms: Option<f64>,
/// Whether we have any observations yet.
initialized: bool,
/// Sliding window of recent RTT samples for variance calculation.
rtt_window: VecDeque<f64>,
}
impl PathMonitor {
@@ -51,6 +59,7 @@ impl PathMonitor {
total_received: 0,
last_rtt_ms: None,
initialized: false,
rtt_window: VecDeque::with_capacity(JITTER_VARIANCE_WINDOW_SIZE),
}
}
@@ -122,6 +131,12 @@ impl PathMonitor {
} else {
self.rtt_ewma = ALPHA * rtt + (1.0 - ALPHA) * self.rtt_ewma;
}
// Maintain sliding window for variance calculation
if self.rtt_window.len() >= JITTER_VARIANCE_WINDOW_SIZE {
self.rtt_window.pop_front();
}
self.rtt_window.push_back(rtt);
}
/// Get the current estimated path quality.
@@ -155,6 +170,20 @@ impl PathMonitor {
0
}
/// Compute the jitter (RTT standard deviation) over the sliding window.
///
/// Returns the standard deviation in milliseconds, or 0.0 if insufficient
/// samples. Used by `DredTuner` for spike detection.
pub fn jitter_variance_ms(&self) -> f64 {
let n = self.rtt_window.len();
if n < 2 {
return 0.0;
}
let mean = self.rtt_window.iter().sum::<f64>() / n as f64;
let var = self.rtt_window.iter().map(|r| (r - mean).powi(2)).sum::<f64>() / n as f64;
var.sqrt()
}
/// Detect whether a network handoff likely occurred.
///
/// Returns `true` if the most recent RTT jitter measurement exceeds 3x

View File

@@ -13,6 +13,29 @@ use crate::datagram;
use crate::path_monitor::PathMonitor;
use crate::reliable;
/// Snapshot of quinn's QUIC-level path statistics.
///
/// Provides more accurate loss/RTT data than `PathMonitor`'s sequence-gap
/// heuristic because quinn sees ACK frames and congestion signals directly.
#[derive(Clone, Copy, Debug)]
pub struct QuinnPathSnapshot {
/// Smoothed RTT in milliseconds (from quinn's congestion controller).
pub rtt_ms: u32,
/// Cumulative loss percentage (lost_packets / sent_packets × 100).
pub loss_pct: f32,
/// Total congestion events observed by the QUIC stack.
pub congestion_events: u64,
/// Current congestion window in bytes.
pub cwnd: u64,
/// Total packets sent on this path.
pub sent_packets: u64,
/// Total packets lost on this path.
pub lost_packets: u64,
/// Current PMTUD-discovered maximum datagram payload size (bytes).
/// Starts at `initial_mtu` (1200) and grows as PMTUD probes succeed.
pub current_mtu: usize,
}
/// QUIC-based transport implementing the `MediaTransport` trait.
pub struct QuinnTransport {
connection: quinn::Connection,
@@ -33,6 +56,18 @@ impl QuinnTransport {
&self.connection
}
/// Remote address of the peer on this connection.
pub fn remote_address(&self) -> std::net::SocketAddr {
self.connection.remote_address()
}
/// Send raw bytes as a QUIC datagram (no MediaPacket framing).
pub fn send_raw_datagram(&self, data: &[u8]) -> Result<(), TransportError> {
self.connection
.send_datagram(bytes::Bytes::copy_from_slice(data))
.map_err(|e| TransportError::Internal(format!("datagram: {e}")))
}
/// Close the QUIC connection immediately (synchronous, no async needed).
/// The relay will detect the close and remove this participant from the room.
pub fn close_now(&self) {
@@ -54,6 +89,31 @@ impl QuinnTransport {
datagram::max_datagram_payload(&self.connection)
}
/// Snapshot of QUIC-level path stats from quinn, useful for DRED tuning.
///
/// Returns `(rtt_ms, loss_pct, congestion_events)` derived from quinn's
/// internal congestion controller — more accurate than our own sequence-gap
/// heuristic in `PathMonitor` because quinn sees ACK frames directly.
pub fn quinn_path_stats(&self) -> QuinnPathSnapshot {
let stats = self.connection.stats();
let rtt_ms = stats.path.rtt.as_millis() as u32;
let loss_pct = if stats.path.sent_packets > 0 {
(stats.path.lost_packets as f32 / stats.path.sent_packets as f32) * 100.0
} else {
0.0
};
let current_mtu = self.connection.max_datagram_size().unwrap_or(1200);
QuinnPathSnapshot {
rtt_ms,
loss_pct,
congestion_events: stats.path.congestion_events,
cwnd: stats.path.cwnd,
sent_packets: stats.path.sent_packets,
lost_packets: stats.path.lost_packets,
current_mtu,
}
}
/// Send an encoded [`TrunkFrame`] as a single QUIC datagram.
pub fn send_trunk(&self, frame: &TrunkFrame) -> Result<(), TransportError> {
let data = frame.encode();
@@ -136,7 +196,7 @@ impl MediaTransport for QuinnTransport {
}
};
match datagram::deserialize_media(data) {
match datagram::deserialize_media(data.clone()) {
Some(packet) => {
// Record receive observation
{
@@ -149,8 +209,10 @@ impl MediaTransport for QuinnTransport {
Ok(Some(packet))
}
None => {
tracing::warn!("received malformed media datagram");
Ok(None)
tracing::warn!(len = data.len(), "skipping malformed media datagram, continuing");
// Don't return Ok(None) — that signals connection closed.
// Recurse to read the next datagram instead.
Box::pin(self.recv_media()).await
}
}
}

View File

@@ -53,6 +53,13 @@ pub async fn recv_signal(recv: &mut quinn::RecvStream) -> Result<SignalMessage,
.await
.map_err(|e| TransportError::Internal(format!("stream read payload error: {e}")))?;
serde_json::from_slice(&payload)
.map_err(|e| TransportError::Internal(format!("signal deserialize error: {e}")))
serde_json::from_slice(&payload).map_err(|e| {
// Distinguish serde failures from transport failures so the
// caller (relay main loop, client recv loop) can continue on
// unknown-variant / parse errors instead of tearing down the
// whole signal connection. Forward-compat: adding a new
// `SignalMessage` variant in one side must not break the
// other side's signal connection.
TransportError::Deserialize(format!("{e}"))
})
}

View File

@@ -0,0 +1,115 @@
# Incident Report: SIGBUS in ART GC During Audio Thread JNI Calls
**Date:** 2026-04-06
**Severity:** High — app crash (SIGBUS) mid-call
**Status:** Root-caused, fix proposed
**Affects:** Android 16 (API 36) devices with concurrent mark-compact GC
## Summary
The app crashes with SIGBUS (signal 7, BUS_ADRERR) during an active call. The crash occurs in ART's garbage collector or JIT compiler, NOT in our Rust native code or AudioRing buffer. Both `wzp-capture` and `wzp-playout` Kotlin threads are affected.
## Crash Details
### Crash 1: wzp-capture (18:42, after 476s of call)
```
Fatal signal 7 (SIGBUS), code 2 (BUS_ADRERR), fault addr 0x720009be38
tid 19697 (wzp-capture), pid 17885 (com.wzp.phone)
```
**Backtrace:**
```
#00 art::StackVisitor::WalkStack
#01 art::Thread::VisitRoots
#02 art::gc::collector::MarkCompact::ThreadFlipVisitor::Run
#03 art::Thread::EnsureFlipFunctionStarted
#04 CheckJNI::ReleasePrimitiveArrayElements ← JNI boundary
#05 android_media_AudioRecord_readInArray ← AudioRecord.read()
#09 com.wzp.audio.AudioPipeline.runCapture
```
**Root cause:** ART's concurrent mark-compact GC (`MarkCompact::ThreadFlipVisitor`) is flipping thread roots while the capture thread is in the middle of a JNI call (`AudioRecord.read()`). The GC's `EnsureFlipFunctionStarted` triggers a stack walk that hits an invalid address.
### Crash 2: wzp-playout (19:17, mid-call)
```
Fatal signal 7 (SIGBUS), code 2 (BUS_ADRERR), fault addr 0x225eb98
tid 32574 (wzp-playout), pid 32479 (com.wzp.phone)
```
**Backtrace:**
```
#00 com.wzp.audio.AudioPipeline.runPlayout ← JIT-compiled code
#01 art_quick_osr_stub ← On-Stack Replacement
#02 art::jit::Jit::MaybeDoOnStackReplacement
#03-#04 art::interpreter::ExecuteSwitchImplCpp
```
**Root cause:** ART's JIT compiler performed On-Stack Replacement (OSR) on the hot playout loop. The OSR stub references a code address (`0x225eb98`) that is no longer valid — likely because the GC moved the compiled code in memory during concurrent compaction.
## Why This Happens
Android 16 introduced a new **concurrent mark-compact GC** (CMC) that moves objects in memory while other threads are running. This is safe for normal Java code because ART uses read barriers. But our audio threads have specific properties that stress this:
1. **`Thread.MAX_PRIORITY`** — audio threads run at the highest priority, starving the GC thread of CPU time. The GC may not complete its thread-flip before the audio thread resumes.
2. **Tight JNI loops**`runCapture()` and `runPlayout()` loop every 20ms calling `AudioRecord.read()` / `AudioTrack.write()` via JNI. Each JNI transition is a GC safepoint, but the thread spends most of its time in native code where the GC can't flip it.
3. **Long-running JIT-compiled code** — the hot loop gets JIT-compiled and may undergo OSR. If the GC compacts memory while OSR is in progress, the stub can reference stale addresses.
4. **Daemon threads that never exit** — our threads are parked with `Thread.sleep(Long.MAX_VALUE)` after the call ends (to avoid the libcrypto TLS destructor crash). These zombie threads accumulate GC root scan work.
## Evidence This Is Not Our Bug
| Component | Evidence |
|-----------|---------|
| **AudioRing** | Not in any backtrace. All crash frames are in `libart.so` (ART runtime) |
| **Rust native code** | `libwzp_android.so` not in any crash frame |
| **JNI bridge** | Crash happens during `ReleasePrimitiveArrayElements` (ART internal), not during our JNI calls |
| **Timing** | Crashes after 476s and mid-call — not during init or teardown |
## Proposed Fix
### Option A: Disable concurrent GC compaction for audio threads (recommended)
Use `dalvik.vm.gctype` or per-thread GC pinning to prevent the mark-compact collector from moving objects referenced by audio threads.
**Not directly controllable from app code.** But we can reduce GC pressure:
### Option B: Reduce JNI transitions in audio threads
Instead of calling `engine.writeAudio(pcm)` / `engine.readAudio(pcm)` via JNI on every 20ms frame, batch multiple frames or use `DirectByteBuffer` to share memory without JNI array copies.
**Implementation:**
- Allocate a `DirectByteBuffer` in Kotlin, share the pointer with Rust via JNI
- Audio threads write/read directly to the buffer (no JNI call per frame)
- Rust reads/writes from the same memory region
- Reduces JNI transitions from 100/sec to 0/sec per audio direction
### Option C: Use Android's Oboe (AAudio) natively from Rust
Skip the Kotlin AudioRecord/AudioTrack entirely. Use Oboe (which we already have as a dependency in `wzp-android/Cargo.toml`) to create native audio streams directly from Rust. The audio callbacks run in native code with no JNI, no GC interaction, no ART.
This is how the project was originally designed (see `audio_android.rs` with Oboe references) before switching to Kotlin AudioRecord for simplicity.
**Pros:** Eliminates the entire JNI audio path. No GC interaction. Lower latency.
**Cons:** Requires rewriting `AudioPipeline.kt` into Rust. Oboe setup is more complex.
### Option D: Pin audio thread objects to prevent GC movement
Use JNI `GetPrimitiveArrayCritical` instead of `GetShortArrayRegion` to pin the array in memory during the operation. This prevents the GC from moving the array while we're using it.
**Implementation:** Change `nativeWriteAudio` / `nativeReadAudio` JNI functions to use critical sections.
### Recommendation
**Short term: Option B** (DirectByteBuffer) — reduces JNI transitions without major refactoring.
**Long term: Option C** (Oboe from Rust) — eliminates the problem entirely. This is the architecturally correct solution and matches the original design intent.
## Data Files
- Logcat from Nothing A059 (Android 16, API 36)
- Two crashes in the same session: 18:42 (capture, after 476s) and 19:17 (playout)
- Both SIGBUS/BUS_ADRERR, both in ART internal frames

View File

@@ -0,0 +1,175 @@
# Incident Report: Native Crash in Capture Thread — Use-After-Free on Engine Handle
**Date:** 2026-04-06
**Severity:** Critical — app crash (SIGSEGV) on call hangup
**Status:** Root-caused, fix pending
**Affects:** Android client only
## Summary
The app crashes with a native SIGSEGV during or shortly after call hangup. The crash occurs in JIT-compiled code inside `AudioPipeline.runCapture()`. The root cause is a use-after-free: the capture thread calls `engine.writeAudio()` via JNI after the engine's native handle has been freed by `teardown()` on the ViewModel thread.
## Crash Stacktrace
```
04-06 13:05:42.707 F DEBUG: #09 pc 000000000250696c /memfd:jit-cache (deleted) (com.wzp.audio.AudioPipeline.runCapture+3228)
04-06 13:05:42.707 F DEBUG: #14 pc 0000000000005270 <anonymous:730900d000> (com.wzp.audio.AudioPipeline.start$lambda$0+0)
04-06 13:05:42.708 F DEBUG: #19 pc 00000000000044cc <anonymous:730900d000> (com.wzp.audio.AudioPipeline.$r8$lambda$0rYcivupwvyN4SgBXhsroKmTlo8+0)
04-06 13:05:42.708 F DEBUG: #24 pc 00000000000042e4 <anonymous:730900d000> (com.wzp.audio.AudioPipeline$$ExternalSyntheticLambda0.run+0)
```
This is a tombstone (signal crash), not a Java exception. The `F DEBUG` tag indicates a native crash handler (debuggerd) captured the signal.
## Root Cause
### The Race Condition
Two threads operate on the engine concurrently without synchronization:
**Thread 1: `wzp-capture` (AudioRecord thread, MAX_PRIORITY)**
```kotlin
// AudioPipeline.runCapture() — runs in a tight loop
while (running) {
val read = recorder.read(pcm, 0, FRAME_SAMPLES)
if (read > 0) {
engine.writeAudio(pcm) // <-- JNI call to native engine
}
}
```
**Thread 2: ViewModel/UI thread (normal priority)**
```kotlin
// CallViewModel.teardown()
stopAudio() // sets AudioPipeline.running = false
engine?.stopCall() // tells Rust to stop
engine?.destroy() // frees native memory, sets nativeHandle = 0L
engine = null
```
### The Kotlin Guard is Insufficient
`WzpEngine.writeAudio()` has a guard:
```kotlin
fun writeAudio(pcm: ShortArray): Int {
if (nativeHandle == 0L) return 0 // check
return nativeWriteAudio(nativeHandle, pcm) // use
}
```
This is a **TOCTOU (time-of-check/time-of-use) race**:
1. Capture thread checks `nativeHandle != 0L` → true
2. ViewModel thread calls `destroy()`, which calls `nativeDestroy(handle)` then sets `nativeHandle = 0L`
3. Capture thread calls `nativeWriteAudio(handle, pcm)` with the now-freed handle
4. The JNI function dereferences `handle` as a pointer → **SIGSEGV**
The same race exists for `readAudio()` on the `wzp-playout` thread.
### Why `stopAudio()` Doesn't Prevent This
`AudioPipeline.stop()` sets `running = false` but does **NOT join or wait** for the threads:
```kotlin
fun stop() {
running = false
// Don't join — threads are parked as daemons to avoid native TLS crash
captureThread = null
playoutThread = null
}
```
The threads are intentionally not joined because of a separate bug: exiting a JNI-calling thread triggers a `SIGSEGV in OPENSSL_free` due to libcrypto TLS destructors on Android. The threads instead "park" with `Thread.sleep(Long.MAX_VALUE)` after the loop exits.
But the problem is the **window between `running = false` and the thread actually checking it**. The capture thread may be blocked in `recorder.read()` (which blocks for 20ms per frame) or in the middle of `engine.writeAudio()` when `destroy()` is called.
### Timeline of the Crash
```
T=0ms ViewModel: stopAudio() → sets running=false
T=0ms ViewModel: stopStatsPolling()
T=0ms ViewModel: engine.stopCall() — Rust stops internal tasks
T=1ms ViewModel: engine.destroy() — frees native memory
↑ nativeHandle = 0L
T=0-20ms Capture thread: still in recorder.read() or writeAudio()
→ if in writeAudio(), the nativeHandle check passed BEFORE destroy()
→ JNI dereferences freed pointer → SIGSEGV
```
## Affected Code
### Files with the race
| File | Line(s) | Issue |
|------|---------|-------|
| `android/.../WzpEngine.kt` | 107-108, 116-117 | TOCTOU on `nativeHandle` in `writeAudio()` / `readAudio()` |
| `android/.../CallViewModel.kt` | 257-262 | `stopAudio()` + `destroy()` without waiting for audio threads to quiesce |
| `android/.../AudioPipeline.kt` | 80-82 | `stop()` doesn't synchronize with running threads |
### Files with the thread parking workaround
| File | Line(s) | Context |
|------|---------|---------|
| `android/.../AudioPipeline.kt` | 57-58, 69-70 | Threads parked after loop exit to avoid libcrypto TLS crash |
| `android/.../AudioPipeline.kt` | 96-101 | `parkThread()``Thread.sleep(Long.MAX_VALUE)` |
## Constraints for the Fix
1. **Cannot join audio threads** — joining triggers a separate SIGSEGV in `OPENSSL_free` when the thread's TLS destructors fire (documented in `AudioPipeline.kt` comments). The parking workaround must be preserved.
2. **Must guarantee no JNI calls after `destroy()`** — the native handle is a raw pointer; any dereference after free is undefined behavior.
3. **Must not add blocking waits on the UI thread**`teardown()` runs on the ViewModel thread which must remain responsive.
4. **The `@Volatile running` flag is necessary but not sufficient** — it prevents new loop iterations but doesn't help with in-flight JNI calls.
5. **Both `writeAudio` and `readAudio` have the same race** — the fix must cover both the capture and playout paths.
## Reproduction
The crash is timing-dependent. It's most likely to occur when:
- The capture thread is in the middle of a `writeAudio()` JNI call when `destroy()` is called
- More likely on slower devices or under CPU pressure (GC, thermal throttling)
- Can happen on every hangup, but only crashes ~10-30% of the time due to the timing window
## Analysis of Possible Fix Approaches
### Approach A: Add a synchronization gate in the JNI bridge
Use a `ReentrantReadWriteLock` or `AtomicBoolean` in `WzpEngine.kt`:
- Audio threads acquire a read lock / check the flag before JNI calls
- `destroy()` acquires a write lock / sets the flag and waits for in-flight calls to drain
**Pro:** Clean, solves the race directly.
**Con:** Adding a lock to the audio hot path (every 20ms). `ReentrantReadWriteLock` is not lock-free. However, the read-lock path is uncontended 99.99% of the time (write-lock only during destroy), so contention is negligible.
### Approach B: Defer `destroy()` until audio threads have stopped
Instead of calling `destroy()` in `teardown()`, set a flag and have the audio threads call `destroy()` after they exit the loop (before parking).
**Pro:** No locks on hot path.
**Con:** Complex lifecycle — which thread calls destroy? What if both threads race to destroy? Need a `CountDownLatch` or similar.
### Approach C: Make the JNI handle atomically invalidated
Use `AtomicLong` for `nativeHandle` and use `compareAndExchange` in `destroy()` + `getAndCheck` pattern in audio calls.
**Pro:** Lock-free.
**Con:** Still has a TOCTOU window — the thread can load the handle, then it gets CAS'd to 0, then the thread uses the stale handle. Doesn't fully solve the race without combining with a reference count or epoch.
### Approach D: Introduce a destroy latch
Add a `CountDownLatch(1)` that audio threads wait on before parking. `teardown()` sets `running=false`, then `await`s the latch (with timeout), then calls `destroy()`. Each audio thread counts down the latch after exiting the loop.
Actually this needs a `CountDownLatch(2)` — one for each thread (capture + playout).
**Pro:** Guarantees no in-flight JNI calls at destroy time. No locks on hot path.
**Con:** `teardown()` blocks for up to one frame duration (~20ms) waiting for threads to exit their loops. Acceptable for a hangup path.
### Recommendation
**Approach D (destroy latch)** is the cleanest. The 20ms worst-case wait is imperceptible on the hangup path, and it provides a hard guarantee that no JNI calls are in flight when `destroy()` runs. Combined with the existing `running` volatile flag, the audio threads exit their loops within one frame and count down the latch.
If the latch times out (e.g., AudioRecord.read() is stuck), `destroy()` proceeds anyway — the `panic::catch_unwind` in the JNI bridge will catch the invalid access as a panic rather than a SIGSEGV (though this is best-effort; a true SIGSEGV from freed memory is not catchable).
## Data Files
The crash was captured from the Nothing A059 device at 13:05:42 on 2026-04-06. The tombstone is in the device's `/data/tombstones/` directory. The logcat output shows the crash frames.

View File

@@ -2,7 +2,10 @@
<html lang="en">
<head>
<meta charset="UTF-8" />
<meta name="viewport" content="width=device-width, initial-scale=1.0" />
<meta
name="viewport"
content="width=device-width, initial-scale=1.0, maximum-scale=1.0, minimum-scale=1.0, user-scalable=no, viewport-fit=cover"
/>
<title>WarzonePhone</title>
<link rel="stylesheet" href="/src/style.css" />
</head>
@@ -21,7 +24,7 @@
</button>
</label>
<label>Room
<input id="room" type="text" value="android" />
<input id="room" type="text" value="general" />
</label>
<label>Alias
<input id="alias" type="text" placeholder="your name" />
@@ -33,7 +36,57 @@
</label>
<button id="settings-btn-home" class="icon-btn" title="Settings (Cmd+,)">&#9881;</button>
</div>
<button id="connect-btn" class="primary">Connect</button>
<!-- Mode toggle -->
<div class="mode-toggle" style="display:flex;gap:8px;margin-bottom:8px;">
<button id="mode-room" class="mode-btn active" style="flex:1">Room</button>
<button id="mode-direct" class="mode-btn" style="flex:1">Direct Call</button>
</div>
<!-- Room mode (default) -->
<div id="room-mode">
<button id="connect-btn" class="primary">Connect</button>
</div>
<!-- Direct call mode -->
<div id="direct-mode" class="hidden">
<button id="register-btn" class="primary" style="background:#2196F3">Register on Relay</button>
<div id="direct-registered" class="hidden" style="margin-top:12px">
<div class="direct-registered-header">
<p id="registered-status" style="color:var(--green);font-size:13px;margin:0">&#x2705; Registered — waiting for calls</p>
<button id="deregister-btn" class="secondary-btn small">Deregister</button>
</div>
<div id="incoming-call-panel" class="hidden" style="background:#1B5E20;padding:12px;border-radius:8px;margin:8px 0">
<p style="font-weight:bold;margin:0 0 4px 0">Incoming Call</p>
<p id="incoming-caller" style="font-size:12px;opacity:0.8;margin:0 0 8px 0">From: unknown</p>
<div style="display:flex;gap:8px">
<button id="accept-call-btn" style="flex:1;background:var(--green);color:white;border:none;padding:8px;border-radius:6px;cursor:pointer">Accept</button>
<button id="reject-call-btn" style="flex:1;background:var(--red);color:white;border:none;padding:8px;border-radius:6px;cursor:pointer">Reject</button>
</div>
</div>
<!-- Recent contacts -->
<div id="recent-contacts-section" class="hidden">
<div class="history-header">Recent contacts</div>
<div id="recent-contacts-list" class="history-list"></div>
</div>
<!-- Call history -->
<div id="call-history-section" class="hidden">
<div class="history-header">
History
<button id="clear-history-btn" class="link-btn">clear</button>
</div>
<div id="call-history-list" class="history-list"></div>
</div>
<label style="margin-top:8px">Call by fingerprint
<input id="target-fp" type="text" placeholder="xxxx:xxxx:xxxx:..." />
</label>
<button id="call-btn" class="primary" style="margin-top:8px">Call</button>
<p id="call-status-text" style="color:var(--yellow);font-size:13px;margin-top:4px"></p>
</div>
</div>
<p id="connect-error" class="error"></p>
</div>
<div class="identity-info">
@@ -58,6 +111,16 @@
<div class="level-meter">
<div id="level-bar" class="level-bar-fill"></div>
</div>
<!-- Direct-call phone layout — shown instead of the group
participant list when directCallPeer is set. Centered
identicon, name, fp, connection badge. Hidden for
room calls (directCallPeer == null). -->
<div id="direct-call-view" class="direct-call-view hidden">
<div id="dc-identicon" class="dc-identicon"></div>
<div id="dc-name" class="dc-name">Unknown</div>
<div id="dc-fp" class="dc-fp"></div>
<div id="dc-badge" class="dc-badge">Connecting...</div>
</div>
<div id="participants" class="participants"></div>
<div class="controls">
<button id="mic-btn" class="control-btn" title="Toggle Mic (m)">
@@ -91,6 +154,23 @@
</div>
<div class="settings-section">
<h3>Audio</h3>
<div class="quality-control">
<div class="quality-header">
<span class="setting-label">QUALITY</span>
<span id="s-quality-label" class="quality-label">Auto</span>
</div>
<input id="s-quality" type="range" min="0" max="7" step="1" value="3" class="quality-slider" />
<div class="quality-ticks">
<span>64k</span>
<span>48k</span>
<span>32k</span>
<span>Auto</span>
<span>24k</span>
<span>6k</span>
<span>C2</span>
<span>1.2k</span>
</div>
</div>
<label class="checkbox">
<input id="s-os-aec" type="checkbox" />
OS Echo Cancellation (macOS VoiceProcessingIO)
@@ -99,6 +179,29 @@
<input id="s-agc" type="checkbox" checked />
Automatic Gain Control
</label>
<label class="checkbox">
<input id="s-dred-debug" type="checkbox" />
DRED debug logs (verbose, dev only)
</label>
<label class="checkbox">
<input id="s-call-debug" type="checkbox" />
Call flow debug logs (trace every step of a call)
</label>
</div>
<div class="settings-section" id="s-call-debug-section" style="display:none">
<h3>Call Debug Log</h3>
<div id="s-call-debug-log" style="max-height:220px;overflow-y:auto;background:#0a0a0a;color:#e0e0e0;font-family:ui-monospace,Menlo,Monaco,'Courier New',monospace;font-size:10px;padding:6px;border-radius:4px;line-height:1.4;white-space:pre-wrap"></div>
<div style="display:flex;gap:6px;margin-top:6px">
<button id="s-call-debug-copy" class="secondary-btn" style="flex:1">Copy log</button>
<button id="s-call-debug-share" class="secondary-btn" style="flex:1">Share</button>
<button id="s-call-debug-clear" class="secondary-btn" style="flex:1">Clear log</button>
</div>
<small id="s-call-debug-copy-status" style="display:block;margin-top:4px;color:var(--text-dim);font-size:10px"></small>
<small style="color:var(--text-dim);display:block;margin-top:4px">
Rolling buffer of the last 200 call-flow events. Turned off by
default — the GUI overlay only populates when the checkbox above
is on, but logcat (adb) always keeps a copy regardless.
</small>
</div>
<div class="settings-section">
<h3>Identity</h3>
@@ -111,6 +214,29 @@
<span class="fp-display">~/.wzp/identity</span>
</div>
</div>
<div class="settings-section">
<h3>Network</h3>
<div class="setting-row">
<span class="setting-label">Public address</span>
<span id="s-reflected-addr" class="fp-display">(not queried)</span>
<button id="s-reflect-btn" class="secondary-btn">Detect</button>
</div>
<small style="color:var(--text-dim);display:block;margin-top:4px">
Asks the registered relay to echo back the IP:port it sees for this
connection (QUIC-native NAT reflection, replaces STUN).
</small>
<div class="setting-row" style="margin-top:10px">
<span class="setting-label">NAT type</span>
<span id="s-nat-type" class="fp-display">(not detected)</span>
<button id="s-nat-detect-btn" class="secondary-btn">Detect NAT</button>
</div>
<div id="s-nat-probes" style="margin-top:6px;font-size:11px;color:var(--text-dim)"></div>
<small style="color:var(--text-dim);display:block;margin-top:4px">
Probes every configured relay in parallel and compares the results
to classify the NAT: cone (P2P viable), symmetric (must relay),
multiple, or unknown.
</small>
</div>
<div class="settings-section">
<h3>Recent Rooms</h3>
<div id="s-recent-rooms" class="recent-rooms-list"></div>
@@ -137,6 +263,28 @@
</div>
</div>
</div>
<!-- Key changed warning dialog -->
<div id="key-warning" class="hidden">
<div class="settings-card key-warning-card">
<div class="key-warning-icon">&#9888;</div>
<h2>Server Key Changed</h2>
<p class="key-warning-text">The relay's identity has changed since you last connected. This usually happens when the server was restarted, but could also indicate a security issue.</p>
<div class="key-warning-fps">
<div class="key-fp-row">
<span class="key-fp-label">Previously known</span>
<code id="kw-old-fp" class="key-fp"></code>
</div>
<div class="key-fp-row">
<span class="key-fp-label">New key</span>
<code id="kw-new-fp" class="key-fp"></code>
</div>
</div>
<div class="key-warning-actions">
<button id="kw-accept" class="primary">Accept New Key</button>
<button id="kw-cancel" class="secondary-btn">Cancel</button>
</div>
</div>
</div>
</div>
<script type="module" src="/src/main.ts"></script>
</body>

View File

@@ -5,12 +5,38 @@ edition = "2024"
description = "WarzonePhone Desktop — encrypted VoIP client"
default-run = "wzp-desktop"
# Library target — required for Tauri mobile (Android/iOS link the app as a cdylib)
# and also used by the desktop binary below.
#
# `staticlib` was DROPPED from crate-type because rust-lang/rust#104707
# documents that having staticlib alongside cdylib leaks non-exported
# symbols from staticlibs into the cdylib. Bionic's private `__init_tcb`
# / `pthread_create` symbols end up bound LOCALLY inside our .so instead
# of resolved dynamically against libc.so at dlopen time — which crashes
# at launch as soon as tao tries to std::thread::spawn() from the JNI
# onCreate callback. The legacy wzp-android crate uses ["cdylib", "rlib"]
# and runs fine on the same phone with the same NDK + Rust toolchain.
#
# iOS Tauri builds that actually need staticlib can re-add it behind a
# target cfg if we ever ship on iOS.
[lib]
name = "wzp_desktop_lib"
crate-type = ["cdylib", "rlib"]
[[bin]]
name = "wzp-desktop"
path = "src/main.rs"
[build-dependencies]
tauri-build = { version = "2", features = [] }
# cc is no longer needed — all C++ moved to crates/wzp-native (built with
# cargo-ndk and loaded via libloading at runtime). wzp-desktop's .so on
# Android is now pure Rust.
[dependencies]
tauri = { version = "2", features = [] }
tauri-plugin-shell = "2"
tauri-plugin-notification = "2"
serde = { version = "1", features = ["derive"] }
serde_json = "1"
tokio = { version = "1", features = ["full"] }
@@ -19,18 +45,64 @@ tracing-subscriber = "0.3"
anyhow = "1"
rustls = { version = "0.23", default-features = false, features = ["ring", "std"] }
# WarzonePhone crates
# WarzonePhone crates — protocol layer is platform-independent
wzp-proto = { path = "../../crates/wzp-proto" }
wzp-codec = { path = "../../crates/wzp-codec" }
wzp-fec = { path = "../../crates/wzp-fec" }
wzp-crypto = { path = "../../crates/wzp-crypto" }
wzp-transport = { path = "../../crates/wzp-transport" }
# wzp-client pulls in CPAL on every desktop target and, additionally on
# macOS, VoiceProcessingIO (coreaudio-rs behind the "vpio" feature). The
# vpio feature MUST NOT be enabled on Windows / Linux because coreaudio-rs
# is Apple-framework-only and will fail to build. Task #24 will add a
# matching Windows Voice Capture DSP path behind its own feature; until
# then, Windows desktops use plain CPAL with AEC disabled.
# macOS: CPAL + VoiceProcessingIO (hardware AEC via Core Audio).
[target.'cfg(target_os = "macos")'.dependencies]
wzp-client = { path = "../../crates/wzp-client", features = ["audio", "vpio"] }
# Platform-specific
[target.'cfg(target_os = "macos")'.dependencies]
coreaudio-rs = "0.11"
# Windows: CPAL for playback + direct WASAPI for capture with OS-level
# AEC (AudioCategory_Communications). The wzp-client `windows-aec`
# feature swaps the default CPAL AudioCapture for a WASAPI one that
# opens the mic under AudioCategory_Communications, turning on Windows's
# communications audio processing chain (AEC, NS, AGC). The reference
# signal for AEC is the system render mix, so echo from our CPAL
# playback is cancelled automatically without extra plumbing.
[target.'cfg(target_os = "windows")'.dependencies]
wzp-client = { path = "../../crates/wzp-client", features = ["audio", "windows-aec"] }
# Linux: CPAL playback+capture baseline. AEC is enabled via the top-level
# `linux-aec` feature in wzp-desktop, which forwards to wzp-client/linux-aec.
# Keeping it opt-in at the wzp-desktop level (rather than forcing it always
# on here) lets `cargo tauri build` produce two variants from the same
# source tree — a noAEC baseline and an AEC build — by toggling the feature
# at build time: `cargo tauri build -- --features wzp-desktop/linux-aec`.
[target.'cfg(target_os = "linux")'.dependencies]
wzp-client = { path = "../../crates/wzp-client", features = ["audio"] }
# Android: no CPAL, no vpio — audio goes through the standalone wzp-native
# cdylib that we dlopen via libloading at runtime. See the wzp_native
# module in src/.
[target.'cfg(target_os = "android")'.dependencies]
wzp-client = { path = "../../crates/wzp-client", default-features = false }
# libloading: runtime dlopen of libwzp_native.so — the standalone cdylib
# crate that owns all C++ (Oboe bridge). Keeps wzp-desktop's .so free of
# any C/C++ static archives that would otherwise leak bionic's internal
# pthread_create into our cdylib and trigger the __init_tcb crash.
libloading = "0.8"
# jni + ndk-context: called from android_audio.rs to invoke
# AudioManager.setSpeakerphoneOn on the JVM side at runtime, so the
# Oboe playout stream (opened with Usage::VoiceCommunication) can route
# between earpiece and loud speaker without restarting.
jni = "0.21"
ndk-context = "0.1"
[features]
default = ["custom-protocol"]
custom-protocol = ["tauri/custom-protocol"]
# linux-aec: forwards to wzp-client/linux-aec so `cargo tauri build -- --features
# wzp-desktop/linux-aec` enables the WebRTC AEC3 backend on Linux. No-op on
# other targets because wzp-client/linux-aec is itself cfg(target_os = "linux").
linux-aec = ["wzp-client/linux-aec"]

View File

@@ -0,0 +1,21 @@
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
<!--
Custom Info.plist keys merged into the bundled WarzonePhone.app by
tauri-bundler. The base Info.plist (CFBundleIdentifier, version,
etc.) is generated from tauri.conf.json — only put *additional*
keys here.
NSMicrophoneUsageDescription is required by macOS TCC for any
app that opens an audio input unit. Without this string the OS
silently denies CoreAudio capture (input callbacks return zeros)
and the app never appears in System Settings → Privacy &
Security → Microphone. This was the root cause of the desktop
mic regression where phones could not hear the desktop client.
-->
<key>NSMicrophoneUsageDescription</key>
<string>WarzonePhone needs microphone access to transmit your voice during calls.</string>
</dict>
</plist>

View File

@@ -1,3 +1,26 @@
use std::process::Command;
fn main() {
// Capture short git hash so the running app can prove which build it is.
// Falls back to "unknown" if git isn't available (e.g. when building from
// a tarball without a .git dir).
let git_hash = Command::new("git")
.args(["rev-parse", "--short", "HEAD"])
.output()
.ok()
.filter(|o| o.status.success())
.and_then(|o| String::from_utf8(o.stdout).ok())
.map(|s| s.trim().to_string())
.unwrap_or_else(|| "unknown".into());
println!("cargo:rustc-env=WZP_GIT_HASH={git_hash}");
println!("cargo:rerun-if-changed=../../.git/HEAD");
println!("cargo:rerun-if-changed=../../.git/refs/heads");
// No cc::Build of ANY kind on Android — all C++ lives in the standalone
// `wzp-native` crate which is built separately with cargo-ndk and loaded
// via libloading at runtime. See docs/incident-tauri-android-init-tcb.md
// for why this split exists.
tauri_build::build()
}

View File

@@ -0,0 +1,30 @@
{
"$schema": "../gen/schemas/desktop-schema.json",
"identifier": "default",
"description": "Default capability — grants core APIs (events, path, window, app, clipboard) to the main window on every platform we ship to.",
"windows": ["main"],
"platforms": [
"linux",
"macOS",
"windows",
"android",
"iOS"
],
"permissions": [
"core:default",
"core:event:default",
"core:event:allow-listen",
"core:event:allow-unlisten",
"core:event:allow-emit",
"core:event:allow-emit-to",
"core:path:default",
"core:window:default",
"core:app:default",
"core:webview:default",
"shell:default",
"notification:default",
"notification:allow-notify",
"notification:allow-request-permission",
"notification:allow-is-permission-granted"
]
}

View File

@@ -0,0 +1,40 @@
<?xml version="1.0" encoding="utf-8"?>
<manifest xmlns:android="http://schemas.android.com/apk/res/android">
<uses-permission android:name="android.permission.INTERNET" />
<uses-permission android:name="android.permission.RECORD_AUDIO" />
<uses-permission android:name="android.permission.MODIFY_AUDIO_SETTINGS" />
<uses-feature android:name="android.hardware.microphone" android:required="true" />
<!-- AndroidTV support -->
<uses-feature android:name="android.software.leanback" android:required="false" />
<application
android:icon="@mipmap/ic_launcher"
android:label="@string/app_name"
android:theme="@style/Theme.wzp_desktop"
android:usesCleartextTraffic="${usesCleartextTraffic}">
<activity
android:configChanges="orientation|keyboardHidden|keyboard|screenSize|locale|smallestScreenSize|screenLayout|uiMode"
android:launchMode="singleTask"
android:label="@string/main_activity_title"
android:name=".MainActivity"
android:exported="true">
<intent-filter>
<action android:name="android.intent.action.MAIN" />
<category android:name="android.intent.category.LAUNCHER" />
<!-- AndroidTV support -->
<category android:name="android.intent.category.LEANBACK_LAUNCHER" />
</intent-filter>
</activity>
<provider
android:name="androidx.core.content.FileProvider"
android:authorities="${applicationId}.fileprovider"
android:exported="false"
android:grantUriPermissions="true">
<meta-data
android:name="android.support.FILE_PROVIDER_PATHS"
android:resource="@xml/file_paths" />
</provider>
</application>
</manifest>

View File

@@ -0,0 +1,103 @@
package com.wzp.desktop
import android.Manifest
import android.content.Context
import android.content.pm.PackageManager
import android.media.AudioManager
import android.os.Bundle
import android.util.Log
import androidx.activity.enableEdgeToEdge
import androidx.core.app.ActivityCompat
import androidx.core.content.ContextCompat
class MainActivity : TauriActivity() {
companion object {
private const val TAG = "WzpMainActivity"
private const val AUDIO_PERMISSIONS_REQUEST = 4242
private val REQUIRED_AUDIO_PERMISSIONS = arrayOf(
Manifest.permission.RECORD_AUDIO,
Manifest.permission.MODIFY_AUDIO_SETTINGS
)
}
override fun onCreate(savedInstanceState: Bundle?) {
enableEdgeToEdge()
super.onCreate(savedInstanceState)
// Request RECORD_AUDIO early so Oboe (inside libwzp_native.so) can open
// the AAudio input stream without silently failing. The grant is
// persisted, so after the first launch the dialog no longer appears.
// MODIFY_AUDIO_SETTINGS is needed to switch AudioManager mode + speaker.
val needsRequest = REQUIRED_AUDIO_PERMISSIONS.any {
ContextCompat.checkSelfPermission(this, it) != PackageManager.PERMISSION_GRANTED
}
if (needsRequest) {
Log.i(TAG, "requesting audio permissions")
ActivityCompat.requestPermissions(this, REQUIRED_AUDIO_PERMISSIONS, AUDIO_PERMISSIONS_REQUEST)
} else {
Log.i(TAG, "audio permissions already granted")
configureAudioForCall()
}
}
override fun onRequestPermissionsResult(
requestCode: Int,
permissions: Array<String>,
grantResults: IntArray
) {
super.onRequestPermissionsResult(requestCode, permissions, grantResults)
if (requestCode == AUDIO_PERMISSIONS_REQUEST) {
val allGranted = grantResults.isNotEmpty() &&
grantResults.all { it == PackageManager.PERMISSION_GRANTED }
Log.i(TAG, "audio permissions result: allGranted=$allGranted grants=${grantResults.toList()}")
if (allGranted) {
configureAudioForCall()
}
}
}
/**
* Put the phone into VoIP call mode with handset (earpiece) as the
* default output. The Oboe playout stream is opened with
* Usage::VoiceCommunication which honours this routing, so:
*
* MODE_IN_COMMUNICATION + speakerphoneOn=false → earpiece (handset)
* MODE_IN_COMMUNICATION + speakerphoneOn=true → loudspeaker
* MODE_IN_COMMUNICATION + bluetoothScoOn=true → bluetooth headset
*
* The speaker/handset/BT toggle itself is wired up via the Tauri
* command `set_speakerphone(on)` in a follow-up build. For now the
* default is handset, matching the user's stated preference.
*
* STREAM_VOICE_CALL volume is cranked to max since the in-call volume
* slider is separate from media volume on most devices.
*/
/**
* Pre-flight: only set volumes. Do NOT set MODE_IN_COMMUNICATION here —
* that hijacks the entire audio routing (music stops, BT A2DP drops to
* earpiece) even before a call starts. The Rust side sets the mode via
* JNI when the call engine actually starts, and restores MODE_NORMAL
* when the call ends.
*/
private fun configureAudioForCall() {
try {
val am = getSystemService(Context.AUDIO_SERVICE) as AudioManager
Log.i(TAG, "audio state: mode=${am.mode} speaker=${am.isSpeakerphoneOn} " +
"voiceVol=${am.getStreamVolume(AudioManager.STREAM_VOICE_CALL)}/" +
"${am.getStreamMaxVolume(AudioManager.STREAM_VOICE_CALL)} " +
"musicVol=${am.getStreamVolume(AudioManager.STREAM_MUSIC)}/" +
"${am.getStreamMaxVolume(AudioManager.STREAM_MUSIC)}")
// Crank both voice-call and music volumes so nothing silent slips
// through regardless of which stream actually ends up driving.
val maxVoice = am.getStreamMaxVolume(AudioManager.STREAM_VOICE_CALL)
am.setStreamVolume(AudioManager.STREAM_VOICE_CALL, maxVoice, 0)
val maxMusic = am.getStreamMaxVolume(AudioManager.STREAM_MUSIC)
am.setStreamVolume(AudioManager.STREAM_MUSIC, maxMusic, 0)
Log.i(TAG, "volumes set: voiceVol=$maxVoice musicVol=$maxMusic (mode left at ${am.mode})")
} catch (e: Throwable) {
Log.e(TAG, "configureAudioForCall failed: ${e.message}", e)
}
}
}

File diff suppressed because one or more lines are too long

View File

@@ -1 +1 @@
{}
{"default":{"identifier":"default","description":"Default capability — grants core APIs (events, path, window, app, clipboard) to the main window on every platform we ship to.","local":true,"windows":["main"],"permissions":["core:default","core:event:default","core:event:allow-listen","core:event:allow-unlisten","core:event:allow-emit","core:event:allow-emit-to","core:path:default","core:window:default","core:app:default","core:webview:default","shell:default","notification:default","notification:allow-notify","notification:allow-request-permission","notification:allow-is-permission-granted"],"platforms":["linux","macOS","windows","android","iOS"]}}

View File

@@ -2354,6 +2354,204 @@
"const": "core:window:deny-unminimize",
"markdownDescription": "Denies the unminimize command without any pre-configured scope."
},
{
"description": "This permission set configures which\nnotification features are by default exposed.\n\n#### Granted Permissions\n\nIt allows all notification related features.\n\n\n#### This default permission set includes:\n\n- `allow-is-permission-granted`\n- `allow-request-permission`\n- `allow-notify`\n- `allow-register-action-types`\n- `allow-register-listener`\n- `allow-cancel`\n- `allow-get-pending`\n- `allow-remove-active`\n- `allow-get-active`\n- `allow-check-permissions`\n- `allow-show`\n- `allow-batch`\n- `allow-list-channels`\n- `allow-delete-channel`\n- `allow-create-channel`\n- `allow-permission-state`",
"type": "string",
"const": "notification:default",
"markdownDescription": "This permission set configures which\nnotification features are by default exposed.\n\n#### Granted Permissions\n\nIt allows all notification related features.\n\n\n#### This default permission set includes:\n\n- `allow-is-permission-granted`\n- `allow-request-permission`\n- `allow-notify`\n- `allow-register-action-types`\n- `allow-register-listener`\n- `allow-cancel`\n- `allow-get-pending`\n- `allow-remove-active`\n- `allow-get-active`\n- `allow-check-permissions`\n- `allow-show`\n- `allow-batch`\n- `allow-list-channels`\n- `allow-delete-channel`\n- `allow-create-channel`\n- `allow-permission-state`"
},
{
"description": "Enables the batch command without any pre-configured scope.",
"type": "string",
"const": "notification:allow-batch",
"markdownDescription": "Enables the batch command without any pre-configured scope."
},
{
"description": "Enables the cancel command without any pre-configured scope.",
"type": "string",
"const": "notification:allow-cancel",
"markdownDescription": "Enables the cancel command without any pre-configured scope."
},
{
"description": "Enables the check_permissions command without any pre-configured scope.",
"type": "string",
"const": "notification:allow-check-permissions",
"markdownDescription": "Enables the check_permissions command without any pre-configured scope."
},
{
"description": "Enables the create_channel command without any pre-configured scope.",
"type": "string",
"const": "notification:allow-create-channel",
"markdownDescription": "Enables the create_channel command without any pre-configured scope."
},
{
"description": "Enables the delete_channel command without any pre-configured scope.",
"type": "string",
"const": "notification:allow-delete-channel",
"markdownDescription": "Enables the delete_channel command without any pre-configured scope."
},
{
"description": "Enables the get_active command without any pre-configured scope.",
"type": "string",
"const": "notification:allow-get-active",
"markdownDescription": "Enables the get_active command without any pre-configured scope."
},
{
"description": "Enables the get_pending command without any pre-configured scope.",
"type": "string",
"const": "notification:allow-get-pending",
"markdownDescription": "Enables the get_pending command without any pre-configured scope."
},
{
"description": "Enables the is_permission_granted command without any pre-configured scope.",
"type": "string",
"const": "notification:allow-is-permission-granted",
"markdownDescription": "Enables the is_permission_granted command without any pre-configured scope."
},
{
"description": "Enables the list_channels command without any pre-configured scope.",
"type": "string",
"const": "notification:allow-list-channels",
"markdownDescription": "Enables the list_channels command without any pre-configured scope."
},
{
"description": "Enables the notify command without any pre-configured scope.",
"type": "string",
"const": "notification:allow-notify",
"markdownDescription": "Enables the notify command without any pre-configured scope."
},
{
"description": "Enables the permission_state command without any pre-configured scope.",
"type": "string",
"const": "notification:allow-permission-state",
"markdownDescription": "Enables the permission_state command without any pre-configured scope."
},
{
"description": "Enables the register_action_types command without any pre-configured scope.",
"type": "string",
"const": "notification:allow-register-action-types",
"markdownDescription": "Enables the register_action_types command without any pre-configured scope."
},
{
"description": "Enables the register_listener command without any pre-configured scope.",
"type": "string",
"const": "notification:allow-register-listener",
"markdownDescription": "Enables the register_listener command without any pre-configured scope."
},
{
"description": "Enables the remove_active command without any pre-configured scope.",
"type": "string",
"const": "notification:allow-remove-active",
"markdownDescription": "Enables the remove_active command without any pre-configured scope."
},
{
"description": "Enables the request_permission command without any pre-configured scope.",
"type": "string",
"const": "notification:allow-request-permission",
"markdownDescription": "Enables the request_permission command without any pre-configured scope."
},
{
"description": "Enables the show command without any pre-configured scope.",
"type": "string",
"const": "notification:allow-show",
"markdownDescription": "Enables the show command without any pre-configured scope."
},
{
"description": "Denies the batch command without any pre-configured scope.",
"type": "string",
"const": "notification:deny-batch",
"markdownDescription": "Denies the batch command without any pre-configured scope."
},
{
"description": "Denies the cancel command without any pre-configured scope.",
"type": "string",
"const": "notification:deny-cancel",
"markdownDescription": "Denies the cancel command without any pre-configured scope."
},
{
"description": "Denies the check_permissions command without any pre-configured scope.",
"type": "string",
"const": "notification:deny-check-permissions",
"markdownDescription": "Denies the check_permissions command without any pre-configured scope."
},
{
"description": "Denies the create_channel command without any pre-configured scope.",
"type": "string",
"const": "notification:deny-create-channel",
"markdownDescription": "Denies the create_channel command without any pre-configured scope."
},
{
"description": "Denies the delete_channel command without any pre-configured scope.",
"type": "string",
"const": "notification:deny-delete-channel",
"markdownDescription": "Denies the delete_channel command without any pre-configured scope."
},
{
"description": "Denies the get_active command without any pre-configured scope.",
"type": "string",
"const": "notification:deny-get-active",
"markdownDescription": "Denies the get_active command without any pre-configured scope."
},
{
"description": "Denies the get_pending command without any pre-configured scope.",
"type": "string",
"const": "notification:deny-get-pending",
"markdownDescription": "Denies the get_pending command without any pre-configured scope."
},
{
"description": "Denies the is_permission_granted command without any pre-configured scope.",
"type": "string",
"const": "notification:deny-is-permission-granted",
"markdownDescription": "Denies the is_permission_granted command without any pre-configured scope."
},
{
"description": "Denies the list_channels command without any pre-configured scope.",
"type": "string",
"const": "notification:deny-list-channels",
"markdownDescription": "Denies the list_channels command without any pre-configured scope."
},
{
"description": "Denies the notify command without any pre-configured scope.",
"type": "string",
"const": "notification:deny-notify",
"markdownDescription": "Denies the notify command without any pre-configured scope."
},
{
"description": "Denies the permission_state command without any pre-configured scope.",
"type": "string",
"const": "notification:deny-permission-state",
"markdownDescription": "Denies the permission_state command without any pre-configured scope."
},
{
"description": "Denies the register_action_types command without any pre-configured scope.",
"type": "string",
"const": "notification:deny-register-action-types",
"markdownDescription": "Denies the register_action_types command without any pre-configured scope."
},
{
"description": "Denies the register_listener command without any pre-configured scope.",
"type": "string",
"const": "notification:deny-register-listener",
"markdownDescription": "Denies the register_listener command without any pre-configured scope."
},
{
"description": "Denies the remove_active command without any pre-configured scope.",
"type": "string",
"const": "notification:deny-remove-active",
"markdownDescription": "Denies the remove_active command without any pre-configured scope."
},
{
"description": "Denies the request_permission command without any pre-configured scope.",
"type": "string",
"const": "notification:deny-request-permission",
"markdownDescription": "Denies the request_permission command without any pre-configured scope."
},
{
"description": "Denies the show command without any pre-configured scope.",
"type": "string",
"const": "notification:deny-show",
"markdownDescription": "Denies the show command without any pre-configured scope."
},
{
"description": "This permission set configures which\nshell functionality is exposed by default.\n\n#### Granted Permissions\n\nIt allows to use the `open` functionality with a reasonable\nscope pre-configured. It will allow opening `http(s)://`,\n`tel:` and `mailto:` links.\n\n#### This default permission set includes:\n\n- `allow-open`",
"type": "string",

View File

@@ -2354,6 +2354,204 @@
"const": "core:window:deny-unminimize",
"markdownDescription": "Denies the unminimize command without any pre-configured scope."
},
{
"description": "This permission set configures which\nnotification features are by default exposed.\n\n#### Granted Permissions\n\nIt allows all notification related features.\n\n\n#### This default permission set includes:\n\n- `allow-is-permission-granted`\n- `allow-request-permission`\n- `allow-notify`\n- `allow-register-action-types`\n- `allow-register-listener`\n- `allow-cancel`\n- `allow-get-pending`\n- `allow-remove-active`\n- `allow-get-active`\n- `allow-check-permissions`\n- `allow-show`\n- `allow-batch`\n- `allow-list-channels`\n- `allow-delete-channel`\n- `allow-create-channel`\n- `allow-permission-state`",
"type": "string",
"const": "notification:default",
"markdownDescription": "This permission set configures which\nnotification features are by default exposed.\n\n#### Granted Permissions\n\nIt allows all notification related features.\n\n\n#### This default permission set includes:\n\n- `allow-is-permission-granted`\n- `allow-request-permission`\n- `allow-notify`\n- `allow-register-action-types`\n- `allow-register-listener`\n- `allow-cancel`\n- `allow-get-pending`\n- `allow-remove-active`\n- `allow-get-active`\n- `allow-check-permissions`\n- `allow-show`\n- `allow-batch`\n- `allow-list-channels`\n- `allow-delete-channel`\n- `allow-create-channel`\n- `allow-permission-state`"
},
{
"description": "Enables the batch command without any pre-configured scope.",
"type": "string",
"const": "notification:allow-batch",
"markdownDescription": "Enables the batch command without any pre-configured scope."
},
{
"description": "Enables the cancel command without any pre-configured scope.",
"type": "string",
"const": "notification:allow-cancel",
"markdownDescription": "Enables the cancel command without any pre-configured scope."
},
{
"description": "Enables the check_permissions command without any pre-configured scope.",
"type": "string",
"const": "notification:allow-check-permissions",
"markdownDescription": "Enables the check_permissions command without any pre-configured scope."
},
{
"description": "Enables the create_channel command without any pre-configured scope.",
"type": "string",
"const": "notification:allow-create-channel",
"markdownDescription": "Enables the create_channel command without any pre-configured scope."
},
{
"description": "Enables the delete_channel command without any pre-configured scope.",
"type": "string",
"const": "notification:allow-delete-channel",
"markdownDescription": "Enables the delete_channel command without any pre-configured scope."
},
{
"description": "Enables the get_active command without any pre-configured scope.",
"type": "string",
"const": "notification:allow-get-active",
"markdownDescription": "Enables the get_active command without any pre-configured scope."
},
{
"description": "Enables the get_pending command without any pre-configured scope.",
"type": "string",
"const": "notification:allow-get-pending",
"markdownDescription": "Enables the get_pending command without any pre-configured scope."
},
{
"description": "Enables the is_permission_granted command without any pre-configured scope.",
"type": "string",
"const": "notification:allow-is-permission-granted",
"markdownDescription": "Enables the is_permission_granted command without any pre-configured scope."
},
{
"description": "Enables the list_channels command without any pre-configured scope.",
"type": "string",
"const": "notification:allow-list-channels",
"markdownDescription": "Enables the list_channels command without any pre-configured scope."
},
{
"description": "Enables the notify command without any pre-configured scope.",
"type": "string",
"const": "notification:allow-notify",
"markdownDescription": "Enables the notify command without any pre-configured scope."
},
{
"description": "Enables the permission_state command without any pre-configured scope.",
"type": "string",
"const": "notification:allow-permission-state",
"markdownDescription": "Enables the permission_state command without any pre-configured scope."
},
{
"description": "Enables the register_action_types command without any pre-configured scope.",
"type": "string",
"const": "notification:allow-register-action-types",
"markdownDescription": "Enables the register_action_types command without any pre-configured scope."
},
{
"description": "Enables the register_listener command without any pre-configured scope.",
"type": "string",
"const": "notification:allow-register-listener",
"markdownDescription": "Enables the register_listener command without any pre-configured scope."
},
{
"description": "Enables the remove_active command without any pre-configured scope.",
"type": "string",
"const": "notification:allow-remove-active",
"markdownDescription": "Enables the remove_active command without any pre-configured scope."
},
{
"description": "Enables the request_permission command without any pre-configured scope.",
"type": "string",
"const": "notification:allow-request-permission",
"markdownDescription": "Enables the request_permission command without any pre-configured scope."
},
{
"description": "Enables the show command without any pre-configured scope.",
"type": "string",
"const": "notification:allow-show",
"markdownDescription": "Enables the show command without any pre-configured scope."
},
{
"description": "Denies the batch command without any pre-configured scope.",
"type": "string",
"const": "notification:deny-batch",
"markdownDescription": "Denies the batch command without any pre-configured scope."
},
{
"description": "Denies the cancel command without any pre-configured scope.",
"type": "string",
"const": "notification:deny-cancel",
"markdownDescription": "Denies the cancel command without any pre-configured scope."
},
{
"description": "Denies the check_permissions command without any pre-configured scope.",
"type": "string",
"const": "notification:deny-check-permissions",
"markdownDescription": "Denies the check_permissions command without any pre-configured scope."
},
{
"description": "Denies the create_channel command without any pre-configured scope.",
"type": "string",
"const": "notification:deny-create-channel",
"markdownDescription": "Denies the create_channel command without any pre-configured scope."
},
{
"description": "Denies the delete_channel command without any pre-configured scope.",
"type": "string",
"const": "notification:deny-delete-channel",
"markdownDescription": "Denies the delete_channel command without any pre-configured scope."
},
{
"description": "Denies the get_active command without any pre-configured scope.",
"type": "string",
"const": "notification:deny-get-active",
"markdownDescription": "Denies the get_active command without any pre-configured scope."
},
{
"description": "Denies the get_pending command without any pre-configured scope.",
"type": "string",
"const": "notification:deny-get-pending",
"markdownDescription": "Denies the get_pending command without any pre-configured scope."
},
{
"description": "Denies the is_permission_granted command without any pre-configured scope.",
"type": "string",
"const": "notification:deny-is-permission-granted",
"markdownDescription": "Denies the is_permission_granted command without any pre-configured scope."
},
{
"description": "Denies the list_channels command without any pre-configured scope.",
"type": "string",
"const": "notification:deny-list-channels",
"markdownDescription": "Denies the list_channels command without any pre-configured scope."
},
{
"description": "Denies the notify command without any pre-configured scope.",
"type": "string",
"const": "notification:deny-notify",
"markdownDescription": "Denies the notify command without any pre-configured scope."
},
{
"description": "Denies the permission_state command without any pre-configured scope.",
"type": "string",
"const": "notification:deny-permission-state",
"markdownDescription": "Denies the permission_state command without any pre-configured scope."
},
{
"description": "Denies the register_action_types command without any pre-configured scope.",
"type": "string",
"const": "notification:deny-register-action-types",
"markdownDescription": "Denies the register_action_types command without any pre-configured scope."
},
{
"description": "Denies the register_listener command without any pre-configured scope.",
"type": "string",
"const": "notification:deny-register-listener",
"markdownDescription": "Denies the register_listener command without any pre-configured scope."
},
{
"description": "Denies the remove_active command without any pre-configured scope.",
"type": "string",
"const": "notification:deny-remove-active",
"markdownDescription": "Denies the remove_active command without any pre-configured scope."
},
{
"description": "Denies the request_permission command without any pre-configured scope.",
"type": "string",
"const": "notification:deny-request-permission",
"markdownDescription": "Denies the request_permission command without any pre-configured scope."
},
{
"description": "Denies the show command without any pre-configured scope.",
"type": "string",
"const": "notification:deny-show",
"markdownDescription": "Denies the show command without any pre-configured scope."
},
{
"description": "This permission set configures which\nshell functionality is exposed by default.\n\n#### Granted Permissions\n\nIt allows to use the `open` functionality with a reasonable\nscope pre-configured. It will allow opening `http(s)://`,\n`tel:` and `mailto:` links.\n\n#### This default permission set includes:\n\n- `allow-open`",
"type": "string",

Binary file not shown.

After

Width:  |  Height:  |  Size: 2.0 KiB

View File

@@ -0,0 +1,359 @@
//! Runtime bridge to Android's `AudioManager` for in-call audio routing.
//!
//! We own a quinn+Oboe VoIP pipeline entirely from Rust, but routing the
//! playout stream between earpiece / loudspeaker / Bluetooth headset has to
//! happen at the JVM level because those toggles are AudioManager-only.
//! This module uses the global JavaVM handle that `ndk_context` exposes
//! (populated by Tauri's mobile runtime) + the `jni` crate to reach into
//! the Android framework without needing a Tauri plugin.
//!
//! All callers must be inside an Android target (`#[cfg(target_os = "android")]`).
#![cfg(target_os = "android")]
use jni::objects::{JObject, JString, JValue};
use jni::JavaVM;
/// Grab the JavaVM + current Activity from the ndk_context that Tauri's
/// mobile runtime sets up at process startup.
fn jvm_and_activity() -> Result<(JavaVM, JObject<'static>), String> {
let ctx = ndk_context::android_context();
let vm_ptr = ctx.vm() as *mut jni::sys::JavaVM;
if vm_ptr.is_null() {
return Err("ndk_context: JavaVM pointer is null".into());
}
let vm = unsafe { JavaVM::from_raw(vm_ptr) }
.map_err(|e| format!("JavaVM::from_raw: {e}"))?;
let activity_ptr = ctx.context() as jni::sys::jobject;
if activity_ptr.is_null() {
return Err("ndk_context: activity pointer is null".into());
}
// SAFETY: ndk_context guarantees the pointer lives for the process
// lifetime; we wrap it as a JObject<'static> for convenience.
let activity: JObject<'static> = unsafe { JObject::from_raw(activity_ptr) };
Ok((vm, activity))
}
/// Get Android's `AudioManager` via `activity.getSystemService("audio")`.
fn audio_manager<'local>(
env: &mut jni::AttachGuard<'local>,
activity: &JObject<'local>,
) -> Result<JObject<'local>, String> {
let svc_name: JString<'local> = env
.new_string("audio")
.map_err(|e| format!("new_string(audio): {e}"))?;
let am = env
.call_method(
activity,
"getSystemService",
"(Ljava/lang/String;)Ljava/lang/Object;",
&[JValue::Object(&svc_name)],
)
.and_then(|v| v.l())
.map_err(|e| format!("getSystemService(audio): {e}"))?;
if am.is_null() {
return Err("getSystemService returned null".into());
}
Ok(am)
}
/// Set `AudioManager.MODE_IN_COMMUNICATION`. Call when a VoIP call starts.
/// This tells the audio policy to route through the communication device
/// path (earpiece/BT SCO) instead of the media path (speaker/BT A2DP).
pub fn set_audio_mode_communication() -> Result<(), String> {
let (vm, activity) = jvm_and_activity()?;
let mut env = vm
.attach_current_thread()
.map_err(|e| format!("attach_current_thread: {e}"))?;
let am = audio_manager(&mut env, &activity)?;
// MODE_IN_COMMUNICATION = 3
env.call_method(&am, "setMode", "(I)V", &[JValue::Int(3)])
.map_err(|e| format!("setMode(MODE_IN_COMMUNICATION): {e}"))?;
tracing::info!("AudioManager: mode set to MODE_IN_COMMUNICATION");
Ok(())
}
/// Restore `AudioManager.MODE_NORMAL`. Call when a VoIP call ends.
pub fn set_audio_mode_normal() -> Result<(), String> {
let (vm, activity) = jvm_and_activity()?;
let mut env = vm
.attach_current_thread()
.map_err(|e| format!("attach_current_thread: {e}"))?;
let am = audio_manager(&mut env, &activity)?;
// MODE_NORMAL = 0
env.call_method(&am, "setMode", "(I)V", &[JValue::Int(0)])
.map_err(|e| format!("setMode(MODE_NORMAL): {e}"))?;
tracing::info!("AudioManager: mode set to MODE_NORMAL");
Ok(())
}
/// Switch between loud speaker (`true`) and earpiece/handset (`false`).
pub fn set_speakerphone(on: bool) -> Result<(), String> {
let (vm, activity) = jvm_and_activity()?;
let mut env = vm
.attach_current_thread()
.map_err(|e| format!("attach_current_thread: {e}"))?;
let am = audio_manager(&mut env, &activity)?;
env.call_method(
&am,
"setSpeakerphoneOn",
"(Z)V",
&[JValue::Bool(if on { 1 } else { 0 })],
)
.map_err(|e| format!("setSpeakerphoneOn({on}): {e}"))?;
tracing::info!(on, "AudioManager.setSpeakerphoneOn");
Ok(())
}
/// Query the current speakerphone state. Returns true if routing is on the
/// loud speaker, false if on earpiece / BT headset / wired headset.
pub fn is_speakerphone_on() -> Result<bool, String> {
let (vm, activity) = jvm_and_activity()?;
let mut env = vm
.attach_current_thread()
.map_err(|e| format!("attach_current_thread: {e}"))?;
let am = audio_manager(&mut env, &activity)?;
let on = env
.call_method(&am, "isSpeakerphoneOn", "()Z", &[])
.and_then(|v| v.z())
.map_err(|e| format!("isSpeakerphoneOn: {e}"))?;
Ok(on)
}
// ─── Bluetooth SCO routing ──────────────────────────────────────────────────
/// Start Bluetooth SCO audio routing.
///
/// On API 31+ uses `setCommunicationDevice()` which is the modern way to
/// route voice audio to a specific device. Falls back to the deprecated
/// `startBluetoothSco()` path on older APIs.
///
/// The caller must restart Oboe streams after this call.
pub fn start_bluetooth_sco() -> Result<(), String> {
let (vm, activity) = jvm_and_activity()?;
let mut env = vm
.attach_current_thread()
.map_err(|e| format!("attach_current_thread: {e}"))?;
let am = audio_manager(&mut env, &activity)?;
// Ensure speaker is off — mutually exclusive with BT.
env.call_method(
&am,
"setSpeakerphoneOn",
"(Z)V",
&[JValue::Bool(0)],
)
.map_err(|e| format!("setSpeakerphoneOn(false): {e}"))?;
// Try modern API first (API 31+): setCommunicationDevice(AudioDeviceInfo)
// Find a BT SCO or BLE device from getAvailableCommunicationDevices()
let used_modern = try_set_communication_device(&mut env, &am, true)?;
if !used_modern {
// Fallback: deprecated startBluetoothSco (API < 31)
tracing::info!("start_bluetooth_sco: falling back to deprecated startBluetoothSco");
env.call_method(&am, "startBluetoothSco", "()V", &[])
.map_err(|e| format!("startBluetoothSco: {e}"))?;
}
tracing::info!(used_modern, "AudioManager: Bluetooth SCO started");
Ok(())
}
/// Stop Bluetooth SCO audio routing, returning audio to the earpiece.
///
/// The caller must restart Oboe streams after this call.
pub fn stop_bluetooth_sco() -> Result<(), String> {
let (vm, activity) = jvm_and_activity()?;
let mut env = vm
.attach_current_thread()
.map_err(|e| format!("attach_current_thread: {e}"))?;
let am = audio_manager(&mut env, &activity)?;
// Modern API: clearCommunicationDevice() (API 31+)
let cleared = try_set_communication_device(&mut env, &am, false)?;
if !cleared {
// Fallback: deprecated stopBluetoothSco
env.call_method(&am, "stopBluetoothSco", "()V", &[])
.map_err(|e| format!("stopBluetoothSco: {e}"))?;
}
tracing::info!(cleared, "AudioManager: Bluetooth SCO stopped");
Ok(())
}
/// Try to use the modern `setCommunicationDevice` / `clearCommunicationDevice`
/// API (Android 12 / API 31+). Returns `true` if the modern API was used.
fn try_set_communication_device(
env: &mut jni::AttachGuard<'_>,
am: &JObject<'_>,
enable: bool,
) -> Result<bool, String> {
// Check SDK_INT >= 31 (Android 12)
let sdk_int = env
.get_static_field(
"android/os/Build$VERSION",
"SDK_INT",
"I",
)
.and_then(|v| v.i())
.unwrap_or(0);
if sdk_int < 31 {
return Ok(false);
}
if !enable {
// clearCommunicationDevice()
env.call_method(am, "clearCommunicationDevice", "()V", &[])
.map_err(|e| format!("clearCommunicationDevice: {e}"))?;
tracing::info!("clearCommunicationDevice: done");
return Ok(true);
}
// getAvailableCommunicationDevices() → List<AudioDeviceInfo>
let device_list = env
.call_method(
am,
"getAvailableCommunicationDevices",
"()Ljava/util/List;",
&[],
)
.and_then(|v| v.l())
.map_err(|e| format!("getAvailableCommunicationDevices: {e}"))?;
let size = env
.call_method(&device_list, "size", "()I", &[])
.and_then(|v| v.i())
.unwrap_or(0);
// Find first BT device: TYPE_BLUETOOTH_SCO (7), TYPE_BLUETOOTH_A2DP (8),
// TYPE_BLE_HEADSET (26), TYPE_BLE_SPEAKER (27)
for i in 0..size {
let device = env
.call_method(
&device_list,
"get",
"(I)Ljava/lang/Object;",
&[JValue::Int(i)],
)
.and_then(|v| v.l())
.map_err(|e| format!("list.get({i}): {e}"))?;
let device_type = env
.call_method(&device, "getType", "()I", &[])
.and_then(|v| v.i())
.unwrap_or(0);
// BT SCO = 7, A2DP = 8, BLE headset = 26, BLE speaker = 27
if matches!(device_type, 7 | 8 | 26 | 27) {
let ok = env
.call_method(
am,
"setCommunicationDevice",
"(Landroid/media/AudioDeviceInfo;)Z",
&[JValue::Object(&device)],
)
.and_then(|v| v.z())
.unwrap_or(false);
tracing::info!(
device_type,
ok,
"setCommunicationDevice: set BT device"
);
return Ok(ok);
}
}
tracing::warn!("setCommunicationDevice: no BT device in available list");
Ok(false)
}
/// Query whether Bluetooth audio is currently the active communication device.
///
/// On API 31+ checks `getCommunicationDevice()` type. Falls back to the
/// deprecated `isBluetoothScoOn()` on older APIs.
pub fn is_bluetooth_sco_on() -> Result<bool, String> {
let (vm, activity) = jvm_and_activity()?;
let mut env = vm
.attach_current_thread()
.map_err(|e| format!("attach_current_thread: {e}"))?;
let am = audio_manager(&mut env, &activity)?;
let sdk_int = env
.get_static_field("android/os/Build$VERSION", "SDK_INT", "I")
.and_then(|v| v.i())
.unwrap_or(0);
if sdk_int >= 31 {
// getCommunicationDevice() → AudioDeviceInfo (nullable)
let device = env
.call_method(am, "getCommunicationDevice", "()Landroid/media/AudioDeviceInfo;", &[])
.and_then(|v| v.l())
.unwrap_or(JObject::null());
if device.is_null() {
return Ok(false);
}
let device_type = env
.call_method(&device, "getType", "()I", &[])
.and_then(|v| v.i())
.unwrap_or(0);
// BT SCO = 7, A2DP = 8, BLE headset = 26, BLE speaker = 27
return Ok(matches!(device_type, 7 | 8 | 26 | 27));
}
// Fallback: deprecated API
env.call_method(&am, "isBluetoothScoOn", "()Z", &[])
.and_then(|v| v.z())
.map_err(|e| format!("isBluetoothScoOn: {e}"))
}
/// Check whether a Bluetooth audio device is currently connected.
///
/// Iterates `AudioManager.getDevices(GET_DEVICES_OUTPUTS)` and looks for
/// any Bluetooth device type. Many headsets only register as A2DP until
/// SCO is explicitly started, so we check for both SCO and A2DP types.
pub fn is_bluetooth_available() -> Result<bool, String> {
let (vm, activity) = jvm_and_activity()?;
let mut env = vm
.attach_current_thread()
.map_err(|e| format!("attach_current_thread: {e}"))?;
let am = audio_manager(&mut env, &activity)?;
// AudioManager.GET_DEVICES_OUTPUTS = 2
let devices = env
.call_method(
&am,
"getDevices",
"(I)[Landroid/media/AudioDeviceInfo;",
&[JValue::Int(2)],
)
.and_then(|v| v.l())
.map_err(|e| format!("getDevices(OUTPUTS): {e}"))?;
let arr = jni::objects::JObjectArray::from(devices);
let len = env
.get_array_length(&arr)
.map_err(|e| format!("get_array_length: {e}"))?;
for i in 0..len {
let device = env
.get_object_array_element(&arr, i)
.map_err(|e| format!("get_object_array_element({i}): {e}"))?;
let device_type = env
.call_method(&device, "getType", "()I", &[])
.and_then(|v| v.i())
.unwrap_or(0);
// TYPE_BLUETOOTH_SCO = 7, TYPE_BLUETOOTH_A2DP = 8
if device_type == 7 || device_type == 8 {
tracing::info!(device_type, idx = i, "is_bluetooth_available: found BT device");
return Ok(true);
}
}
Ok(false)
}

File diff suppressed because it is too large Load Diff

View File

@@ -0,0 +1,180 @@
//! Call history store.
//!
//! Keeps a rolling JSON file of the last N direct-call events so the UI can
//! show "recent contacts" + "call history with callback buttons" on the
//! direct-call screen. Storage lives in `<APP_DATA_DIR>/call_history.json`
//! alongside the identity file. The file is read lazily on first access and
//! cached in an RwLock behind a OnceLock.
//!
//! This is a v1 — no duration tracking yet, entries are logged at the
//! moment the direction is decided (placed / received / missed).
use std::path::PathBuf;
use std::sync::{OnceLock, RwLock};
use std::time::{SystemTime, UNIX_EPOCH};
use serde::{Deserialize, Serialize};
/// Maximum number of history entries we keep. Older ones are pruned FIFO.
const MAX_ENTRIES: usize = 200;
#[derive(Debug, Clone, Copy, PartialEq, Eq, Serialize, Deserialize)]
#[serde(rename_all = "lowercase")]
pub enum CallDirection {
/// Local user placed the call.
Placed,
/// Remote user called and local user answered.
Received,
/// Remote user called but local user did not answer (rejected or
/// missed entirely — the UI treats these identically).
Missed,
}
#[derive(Debug, Clone, Serialize, Deserialize)]
pub struct CallHistoryEntry {
pub call_id: String,
pub peer_fp: String,
pub peer_alias: Option<String>,
pub direction: CallDirection,
/// Seconds since UNIX epoch, UTC.
pub timestamp_unix: u64,
}
// ─── In-process store (loaded from disk once) ─────────────────────────────
static STORE: OnceLock<RwLock<Vec<CallHistoryEntry>>> = OnceLock::new();
fn store() -> &'static RwLock<Vec<CallHistoryEntry>> {
STORE.get_or_init(|| RwLock::new(load_from_disk()))
}
fn history_path() -> PathBuf {
crate::APP_DATA_DIR
.get()
.cloned()
.unwrap_or_else(|| {
let home = std::env::var("HOME").unwrap_or_else(|_| ".".into());
PathBuf::from(home).join(".wzp")
})
.join("call_history.json")
}
fn load_from_disk() -> Vec<CallHistoryEntry> {
let path = history_path();
let Ok(bytes) = std::fs::read(&path) else {
return Vec::new();
};
serde_json::from_slice::<Vec<CallHistoryEntry>>(&bytes)
.inspect_err(|e| tracing::warn!(path = %path.display(), error = %e, "call_history.json parse failed"))
.unwrap_or_default()
}
fn save_to_disk(entries: &[CallHistoryEntry]) {
let path = history_path();
if let Some(parent) = path.parent() {
let _ = std::fs::create_dir_all(parent);
}
let Ok(json) = serde_json::to_vec_pretty(entries) else { return };
// Atomic write via temp file + rename so a crash mid-write doesn't
// leave us with a half-file on disk.
let tmp = path.with_extension("json.tmp");
if std::fs::write(&tmp, &json).is_ok() {
let _ = std::fs::rename(&tmp, &path);
}
}
fn now_unix() -> u64 {
SystemTime::now()
.duration_since(UNIX_EPOCH)
.map(|d| d.as_secs())
.unwrap_or(0)
}
// ─── Public API ───────────────────────────────────────────────────────────
/// Append a new entry to the store and persist to disk. Trims the store to
/// `MAX_ENTRIES` after insertion.
pub fn log(
call_id: String,
peer_fp: String,
peer_alias: Option<String>,
direction: CallDirection,
) {
tracing::info!(
%call_id, %peer_fp, ?direction,
alias = ?peer_alias,
"history::log"
);
let entry = CallHistoryEntry {
call_id: call_id.clone(),
peer_fp,
peer_alias,
direction,
timestamp_unix: now_unix(),
};
let mut guard = store().write().unwrap();
// If an entry for this call_id already exists, update it in-place
// rather than appending a duplicate. Protects against the caller
// side adding a second Missed row when the callee's DirectCallOffer
// bounces back through federation / loopback, or when some future
// relay routing edge case double-emits a signal. The dedup keeps
// history tidy and matches what the user intuitively expects (one
// history row per call, not one per signal event).
if let Some(existing) = guard.iter_mut().rev().find(|e| e.call_id == call_id) {
tracing::info!(%call_id, from = ?existing.direction, to = ?direction, "history::log replacing existing entry");
existing.direction = direction;
existing.timestamp_unix = entry.timestamp_unix;
save_to_disk(&guard);
return;
}
guard.push(entry);
if guard.len() > MAX_ENTRIES {
let drop_n = guard.len() - MAX_ENTRIES;
guard.drain(0..drop_n);
}
save_to_disk(&guard);
}
/// Return a copy of all entries in reverse-chronological order
/// (most recent first).
pub fn all() -> Vec<CallHistoryEntry> {
let guard = store().read().unwrap();
guard.iter().rev().cloned().collect()
}
/// Unique peer contacts sorted by most recent interaction. Each contact
/// is represented by the newest history entry for that fingerprint.
pub fn contacts() -> Vec<CallHistoryEntry> {
let guard = store().read().unwrap();
let mut seen: std::collections::HashSet<String> = std::collections::HashSet::new();
let mut out = Vec::new();
// iterate newest → oldest
for entry in guard.iter().rev() {
if seen.insert(entry.peer_fp.clone()) {
out.push(entry.clone());
}
}
out
}
/// Clear the entire history and persist the empty file.
pub fn clear() {
let mut guard = store().write().unwrap();
guard.clear();
save_to_disk(&guard);
}
/// Find a Missed-candidate entry that matches `call_id` and hasn't been
/// answered yet. Used by the signal loop to turn "pending incoming" into
/// "Received" when the user accepts.
pub fn mark_received_if_pending(call_id: &str) -> bool {
let mut guard = store().write().unwrap();
for entry in guard.iter_mut().rev() {
if entry.call_id == call_id && entry.direction == CallDirection::Missed {
entry.direction = CallDirection::Received;
save_to_disk(&guard);
return true;
}
}
false
}

2011
desktop/src-tauri/src/lib.rs Normal file

File diff suppressed because it is too large Load Diff

Some files were not shown because too many files have changed in this diff Show More