- Wire AdaptiveQualityController into desktop engine send/recv tasks
(mirrors Android pattern: AtomicU8 pending_profile, auto-mode check)
- Wire same into Android engine send task (was only in recv before)
- QualityDirective SignalMessage variant for relay-initiated codec switch
- ParticipantQuality tracking in relay RoomManager (per-participant
AdaptiveQualityController, weakest-link tier computation)
- Relay broadcasts QualityDirective to all participants when room-wide
tier degrades (coordinated codec switching)
- Oboe stream state polling: poll getState() for up to 2s after
requestStart() to ensure both streams reach Started before proceeding
(fixes intermittent silent calls on cold start, Nothing Phone A059)
Tasks: #7, #25, #26, #31, #35
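A minimal sketch of the weakest-link computation described above, assuming a three-level tier enum. The `Tier` variants and both function names are hypothetical and are not taken from the relay code:

```rust
/// Hypothetical quality tiers, declared from worst to best so that the
/// derived ordering gives Degraded < Reduced < Full.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub enum Tier {
    Degraded,
    Reduced,
    Full,
}

/// The room-wide tier is the minimum tier across all participants.
/// Returns None for an empty room.
pub fn room_tier(participants: &[Tier]) -> Option<Tier> {
    participants.iter().copied().min()
}

/// A directive is broadcast only when the room-wide tier got worse.
pub fn should_broadcast(previous: Tier, current: Tier) -> bool {
    current < previous
}
```

One participant on a weak link is enough to pull the whole room down, which is why the relay has to coordinate the switch.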
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Without clearCommunicationDevice(), the BT headset stays locked in SCO
mode after the call. Media playback (video, music) can't route to BT
A2DP, requiring a device reboot to restore normal audio.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Reflects the current reality: setCommunicationDevice API 31+, deferred
MODE_IN_COMMUNICATION, BT-mode Oboe (bt_active flag), per-arch builds,
Hangup call_id fix, and network monitoring integration.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Root cause: MainActivity set MODE_IN_COMMUNICATION at app launch,
hijacking system audio routing immediately — BT A2DP music dropped to
earpiece, and the pre-existing communication mode confused subsequent
setCommunicationDevice calls for BT SCO.
Fix: MainActivity now only sets volumes. MODE_IN_COMMUNICATION is set
via JNI right before Oboe audio_start() in CallEngine, and MODE_NORMAL
is restored after audio_stop() when the call ends.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Root cause: Oboe capture at 48kHz with InputPreset::VoiceCommunication
cannot open against a BT SCO device, which only supports 8/16kHz. The
stream silently falls back to the built-in mic, delivering zeros.
Fix: add bt_active flag to WzpOboeConfig. When set, capture skips
setSampleRate and setInputPreset, letting the system route to BT SCO
at its native rate. Oboe's SampleRateConversionQuality::Best resamples
to 48kHz for our ring buffers. Playout uses Usage::Media in BT mode.
New API: wzp_native_audio_start_bt() for BT mode, called from
set_bluetooth_sco(on=true). Normal audio_start() restores the
standard config when switching back to earpiece/speaker.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Two fixes for BT audio silence:
1. Switch Oboe streams from Exclusive to Shared sharing mode. Exclusive
mode bypasses Oboe's internal resampler, so opening a 48kHz stream
against a BT SCO device (8/16kHz only) fails at the AudioPolicy
level. Shared mode lets Oboe's resampler bridge the gap.
2. Add 500ms post-SCO delay before Oboe restart. The audio policy needs
time to apply the bt-sco route after setCommunicationDevice returns.
Without the delay, Oboe opens against the old device (handset).
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
BT SCO devices only support 8kHz or 16kHz but our Oboe streams request
48kHz. Without resampling, AudioPolicyManager rejects the input stream
("getInputProfile could not find profile for... sampling rate 48000").
Fix: add setSampleRateConversionQuality(Best) to both capture and
playout stream builders. Oboe resamples internally so our ring buffers
stay at 48kHz regardless of the hardware sample rate.
Also removes the broken setBluetoothScoOn/isBluetoothScoOn calls from
stop_bluetooth_sco — just call stopBluetoothSco() unconditionally.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Root cause: setBluetoothScoOn(true) is silently rejected on Android 12+
for non-system apps ("is greater than FIRST_APPLICATION_UID exiting").
Audio policy routed to handset instead of BT despite SCO link being up.
Fix: use the modern setCommunicationDevice(AudioDeviceInfo) API on
API 31+ which properly routes voice audio to the BT device. Falls back
to deprecated startBluetoothSco() on older APIs.
Also uses getCommunicationDevice() for is_bluetooth_sco_on() and
clearCommunicationDevice() for stop, matching the modern API surface.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Three fixes for Bluetooth audio not working:
1. is_bluetooth_available() now checks for TYPE_BLUETOOTH_A2DP (8) in
addition to TYPE_BLUETOOTH_SCO (7) — many headsets only register as
A2DP until SCO is explicitly started.
2. set_bluetooth_sco(on=true) polls isBluetoothScoOn() for up to 3s
before restarting Oboe. startBluetoothSco() is async — the SCO link
takes 500ms-2s to establish. Without waiting, Oboe opens against
earpiece and audio goes nowhere.
3. Frontend skips redundant set_speakerphone(false) when transitioning
to BT — start_bluetooth_sco() handles speaker-off internally,
avoiding a double Oboe restart.
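The bounded wait in fix 2 can be sketched as a generic polling helper. This is an illustration only: `poll_until` is a hypothetical name, and the real code queries isBluetoothScoOn() through JNI rather than taking a closure:

```rust
use std::thread::sleep;
use std::time::{Duration, Instant};

/// Polls `ready` every `interval` until it returns true or `timeout`
/// elapses. Returns true if the condition was met in time.
pub fn poll_until<F: FnMut() -> bool>(
    mut ready: F,
    timeout: Duration,
    interval: Duration,
) -> bool {
    let deadline = Instant::now() + timeout;
    loop {
        if ready() {
            return true;
        }
        if Instant::now() >= deadline {
            return false;
        }
        sleep(interval);
    }
}
```

With a 3s timeout the caller proceeds to the Oboe restart only when the helper returns true, and can log and bail out otherwise.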
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Root cause: Hangup had no call_id field. The relay forwarded hangups to
ALL active calls for a user. When user A hung up call 1 and user B
immediately placed call 2, the relay's processing of A's hangup would
also kill call 2 (race window ~1-2s).
Fix: add optional call_id to Hangup (backwards-compatible via serde
skip_serializing_if). When present, the relay only ends the named call.
Old clients send call_id=None and get the legacy broadcast behavior.
Also: clear pending_path_report in Hangup recv handler and
internal_deregister to prevent stale oneshot channels from blocking
subsequent call setups.
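A rough model of the relay-side behavior, assuming a registry keyed by call_id. `CallRegistry` here is a hypothetical stand-in, not the relay's actual type, and the serde attributes are left out:

```rust
use std::collections::HashMap;

/// Hypothetical registry: call_id -> the two participant fingerprints.
pub struct CallRegistry {
    calls: HashMap<String, (String, String)>,
}

impl CallRegistry {
    pub fn new() -> Self {
        Self { calls: HashMap::new() }
    }

    pub fn insert(&mut self, call_id: &str, a: &str, b: &str) {
        self.calls
            .insert(call_id.to_string(), (a.to_string(), b.to_string()));
    }

    /// Ends calls for `user` and returns the ended ids, sorted.
    /// Some(id): only that call is ended, and only if `user` is part of it.
    /// None (old clients): every call involving `user` is ended.
    pub fn hangup(&mut self, user: &str, call_id: Option<&str>) -> Vec<String> {
        let mut ended: Vec<String> = self
            .calls
            .iter()
            .filter(|(id, parties)| {
                let involved = parties.0 == user || parties.1 == user;
                involved && call_id.map_or(true, |want| want == id.as_str())
            })
            .map(|(id, _)| id.clone())
            .collect();
        ended.sort();
        for id in &ended {
            self.calls.remove(id);
        }
        ended
    }
}
```

Scoping the hangup to one call_id is what closes the race window: a late hangup for call 1 no longer matches call 2.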
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Release builds from cargo-tauri are unsigned. After Gradle produces the
APK, zipalign + apksigner now sign it with the release keystore
(android/keystore/wzp-release.jks). Falls back to debug keystore if
release is missing.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Bluetooth: wire existing AudioRouteManager SCO support through both app
variants. Replace binary speaker toggle with 3-way route cycling
(Earpiece → Speaker → Bluetooth). Tauri side adds JNI bridge functions
(start/stop/query SCO, device availability) and Oboe stream restart.
Network awareness: integrate Android ConnectivityManager to detect
WiFi/cellular transitions and feed them to AdaptiveQualityController
via lock-free AtomicU8 signaling. Enables proactive quality downgrade
and FEC boost on network handoffs.
Build: add --arch flag to build-tauri-android.sh supporting arm64,
armv7, or all (separate per-arch APKs for smaller tester binaries).
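The lock-free handoff can be sketched with a single AtomicU8 slot. The sentinel value, the `NetworkSignal` type, and the numeric network codes are assumptions made for this example:

```rust
use std::sync::atomic::{AtomicU8, Ordering};

/// Sentinel meaning "no network change pending" (an assumption of this sketch).
pub const NO_CHANGE: u8 = 0xFF;

pub struct NetworkSignal {
    pending: AtomicU8,
}

impl NetworkSignal {
    pub const fn new() -> Self {
        Self { pending: AtomicU8::new(NO_CHANGE) }
    }

    /// Called from the platform callback thread; the last write wins.
    pub fn notify(&self, network_type: u8) {
        self.pending.store(network_type, Ordering::Release);
    }

    /// Called from the media task; each change is returned at most once.
    pub fn take(&self) -> Option<u8> {
        match self.pending.swap(NO_CHANGE, Ordering::AcqRel) {
            NO_CHANGE => None,
            v => Some(v),
        }
    }
}
```

Because the slot holds only the latest value, a burst of transitions collapses into one event, which suits a controller that only cares about the current network.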
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
PRD 4: Disable IPv6 direct dial/accept temporarily. IPv6 QUIC
handshakes succeed but connections die immediately on datagram
send ("connection lost"). IPv4 candidates work reliably. IPv6
candidates still gathered but filtered at dial time.
PRD 1: Close losing transport after Phase 6 negotiation. The
non-selected transport now gets an explicit QUIC close frame
instead of silently dropping after 30s idle timeout. Prevents
phantom connections from polluting future accept() calls.
PRD 2: Harden accept loop with max 3 stale retries. Stale
connections are explicitly closed (conn.close) and counted.
After 3 stale connections, the accept loop aborts instead of
spinning until the race timeout.
PRD 3: Resource cleanup — close old IPv6 endpoint before
creating a new one in place_call/answer_call. Add Drop impl
to CallEngine so tasks are signalled to stop on ungraceful
shutdown.
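The PRD 2 retry budget might look roughly like this. `Conn` is a hypothetical stand-in for a QUIC connection, and the iterator replaces the real async accept() call:

```rust
/// Minimal stand-in for a QUIC connection (hypothetical type).
#[derive(Debug, Clone, PartialEq)]
pub struct Conn {
    pub id: u32,
    /// Some(reason) means the connection is already closed (stale).
    pub close_reason: Option<String>,
}

pub const MAX_STALE: usize = 3;

/// Pulls connections from `incoming` until a live one appears.
/// Returns Err(stale_count) when the stale budget is exhausted or the
/// source runs dry.
pub fn accept_live<I: Iterator<Item = Conn>>(incoming: I) -> Result<Conn, usize> {
    let mut stale = 0;
    for conn in incoming {
        if conn.close_reason.is_none() {
            return Ok(conn);
        }
        // The real loop would explicitly close the stale connection here.
        stale += 1;
        if stale >= MAX_STALE {
            return Err(stale);
        }
    }
    Err(stale)
}
```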
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The originating relay (where the caller is) never set peer_relay_fp
because the call was created locally. When the callee's answer
arrived via federation, the cross-relay dispatcher handled it but
didn't mark the call as cross-relay. This meant the caller's
MediaPathReport was delivered via local hub.send_to() to a peer
fingerprint that isn't connected locally — silently dropped.
Fix: in the cross-relay answer dispatcher, call
reg.set_peer_relay_fp(call_id, Some(origin_relay_fp)) so the
originating relay knows to forward MediaPathReport via federation.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Add relay_build field to RegisterPresenceAck so the client logs
which relay version it connected to. Shows in the debug log as
register_signal:ack_received {"relay_build":"f843a93"}.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
MediaPathReport was only delivered via local signal_hub, so calls
between peers on different relays always hit peer_report_timeout
and fell back to relay — even when direct P2P worked perfectly.
Fix: check peer_relay_fp in call_registry (same pattern as
DirectCallAnswer). If the peer is on a remote relay, wrap in
FederatedSignalForward and send via federation link. Also fix
the cross-relay dispatcher to deliver to BOTH caller and callee
(not just caller), since the report can come from either side.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
When peers are on different relays, MediaPathReport can't be
forwarded — causing a 3s timeout and false relay fallback even
though direct P2P works perfectly.
Fix: on timeout, if local_direct_ok is true AND the direct
transport's connection is still alive (no close_reason), trust
the direct path instead of falling back to relay. The timeout
indicates a relay forwarding issue, not a direct path failure.
Also fix ALT build paste URL (paste.tbs.manko.yoga not amn.gg).
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The Acceptor's accept() on the shared signal endpoint can dequeue
a stale QUIC connection from a previous call that the Dialer has
already dropped. This results in "connection lost" errors when
media datagrams are sent — 100% drops on both sides.
Fix: after accepting a connection, check close_reason(). If the
connection is already closed, log a warning and re-accept. Also
verify max_datagram_size() is available before returning.
Additionally: emit transport details (remote addr, max_datagram,
close_reason) in the call_engine_starting debug event so stale
connection issues are visible in the user-facing debug log.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
When direct P2P calls show 100% datagram drops, we need to know
WHY send_media() fails. This commit adds:
- Remote address + stable_id logging on A-role accept and D-role
dial success (dual_path.rs) — tells us which candidate won
- Remote address + max_datagram_size on engine transport init —
verifies datagrams are negotiated
- last_send_err in send heartbeat — captures the actual error
from send_datagram() failures
- QuinnTransport::remote_address() helper
Also fixes UI badge: was looking for wrong event name
("dual_path_race_won" → "path_negotiated").
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The UI looked for event "connect:dual_path_race_won" which doesn't
exist — the actual event is "connect:path_negotiated" with a
use_direct boolean. Badge always showed "Via Relay" even when the
call was direct P2P.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
CLI binary was missing the new caller_build_version and
callee_build_version fields, causing E0063 compile errors on
Linux relay/client builds.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The peer's MediaPathReport can arrive while our dual_path::race is
still running. Previously, the oneshot was created AFTER the race
completed, so the recv loop had nowhere to deliver the report —
it was silently dropped, causing a 3s timeout and false relay
fallback on ~50% of calls.
Fix: create the oneshot and install it in SignalState BEFORE
starting the race. The oneshot::Receiver buffers the value so the
connect command can read it immediately after the race finishes.
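The ordering can be illustrated with a std channel standing in for the oneshot. The function name and the simulated 20ms race are inventions for this sketch; the real code uses an async oneshot installed in SignalState:

```rust
use std::sync::mpsc;
use std::thread;
use std::time::Duration;

/// Returns the peer's report if it is available right after the race.
pub fn race_with_preinstalled_channel() -> Option<bool> {
    // 1. Create the channel BEFORE the race and hand the sender to the
    //    signal recv loop.
    let (tx, rx) = mpsc::sync_channel::<bool>(1);

    // 2. Simulated recv loop: the peer's report arrives mid-race.
    let recv_loop = thread::spawn(move || {
        tx.send(true).expect("receiver outlives the race");
    });

    // 3. Simulated race.
    thread::sleep(Duration::from_millis(20));
    recv_loop.join().unwrap();

    // 4. The report was buffered while the race ran, so it can be read
    //    immediately without waiting for the timeout.
    rx.try_recv().ok()
}
```

If the channel were created in step 4 instead, the send in step 2 would have had no destination, which is the dropped-report case described above.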
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Add caller_build_version / callee_build_version (git short hash)
to DirectCallOffer and DirectCallAnswer so peers can identify each
other's build in debug logs. Also log own build at register time.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The CallSetup enum gained peer_direct_addr and peer_local_addrs
in Phase 5.5 but the wzp-android signal recv match arm was never
updated, breaking cargo ndk builds.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Adds a dedicated IPv6 QUIC endpoint (IPV6_V6ONLY=1 via socket2)
alongside the existing IPv4 signal endpoint for proper dual-stack
P2P connectivity. Previous [::]:0 dual-stack attempt broke IPv4
on Android; this uses separate sockets per address family like
WebRTC/libwebrtc.
- create_ipv6_endpoint(): socket2-based IPv6-only UDP socket,
tries same port as IPv4 signal EP, falls back to ephemeral
- local_host_candidates(v4_port, v6_port): now gathers IPv6
global-unicast (2000::/3) and unique-local (fc00::/7) addrs
- dual_path::race(): A-role accepts on both v4+v6 via select!,
D-role routes each candidate to matching-AF endpoint
- Graceful fallback: if IPv6 unavailable, .ok() → None → pure
IPv4 behavior identical to pre-Phase-7
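The two prefixes named above can be checked with plain bit masks on the first 16-bit segment. The function names below are illustrative and are not the ones used in local_host_candidates():

```rust
use std::net::Ipv6Addr;

/// 2000::/3: the top three bits of the address are 001.
pub fn is_global_unicast(addr: &Ipv6Addr) -> bool {
    (addr.segments()[0] & 0xe000) == 0x2000
}

/// fc00::/7: the top seven bits of the address are 1111110.
pub fn is_unique_local(addr: &Ipv6Addr) -> bool {
    (addr.segments()[0] & 0xfe00) == 0xfc00
}

/// Link-local (fe80::/10), loopback, and other ranges are excluded.
pub fn is_host_candidate(addr: &Ipv6Addr) -> bool {
    is_global_unicast(addr) || is_unique_local(addr)
}
```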
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Two fixes:
## Revert [::]:0 dual-stack sockets → back to 0.0.0.0:0
Android's IPV6_V6ONLY=1 default on some kernels (confirmed on
Nothing Phone) makes [::]:0 IPv6-only, silently killing ALL
IPv4 traffic. This broke P2P direct calls: IPv4 LAN candidates
(172.16.81.x) couldn't complete QUIC handshakes through the
IPv6-only socket, causing local_direct_ok=false and relay
fallback on every call after the first.
Reverted all bind sites to 0.0.0.0:0 (reliable IPv4). IPv6 host
candidates are disabled in local_host_candidates() until a
proper dual-socket approach (one IPv4 + one IPv6 endpoint,
Phase 7) is implemented.
## Fix A (task #35): Oboe playout callback stall auto-restart
The Nothing Phone's Oboe playout callback fires once (cb#0) and
then stops draining the ring on ~50% of cold-launch calls. Fix
D+C (stop+prime from previous commit) didn't help because
audio_stop is a no-op on cold launch.
New approach: self-healing watchdog in audio_write_playout.
Tracks the playout ring's read_idx across writes. If read_idx
hasn't advanced in 50 consecutive writes (~1 second), the Oboe
playout callback has stopped:
1. Log "playout STALL detected"
2. Call wzp_oboe_stop() to tear down the stuck streams
3. Clear both ring buffers (prevent stale data reads)
4. Call wzp_oboe_start() to rebuild fresh streams
5. Log success/failure
6. Return 0 (caller retries on next frame)
This is the same teardown+rebuild that "rejoin" does — but
triggered automatically from the first stalled call instead of
requiring the user to hang up and redial. The watchdog runs
on every write so it fires within 1s of the stall starting.
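A minimal sketch of the stall detector, assuming the ring exposes its read index as an integer. `PlayoutWatchdog` is a hypothetical name, and the stop/clear/start steps are left to the caller:

```rust
/// 50 writes of 20ms frames is roughly one second.
pub const STALL_WRITES: u32 = 50;

pub struct PlayoutWatchdog {
    last_read_idx: u64,
    stalled_writes: u32,
}

impl PlayoutWatchdog {
    pub fn new(read_idx: u64) -> Self {
        Self { last_read_idx: read_idx, stalled_writes: 0 }
    }

    /// Call once per playout write with the ring's current read index.
    /// Returns true when the streams should be torn down and rebuilt.
    pub fn on_write(&mut self, read_idx: u64) -> bool {
        if read_idx != self.last_read_idx {
            // The callback drained something, so the stream is alive.
            self.last_read_idx = read_idx;
            self.stalled_writes = 0;
            return false;
        }
        self.stalled_writes += 1;
        if self.stalled_writes >= STALL_WRITES {
            // Start counting from scratch after the rebuild.
            self.stalled_writes = 0;
            return true;
        }
        false
    }
}
```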
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Every QUIC endpoint was bound to 0.0.0.0:0 (IPv4-only). This
silently killed ALL IPv6 host candidates: the Dialer couldn't
send packets to [2a0d:...] addresses (wrong address family on
the socket), and the Acceptor couldn't receive incoming IPv6
QUIC handshakes. The IPv6 candidates were gathered and advertised
in DirectCallOffer/Answer but were completely non-functional.
On same-LAN with dual-stack (which both test phones have), this
meant:
- JoinSet fanned out 3+ candidates (2× IPv6 + 1× IPv4)
- IPv6 dials failed silently or timed out
- IPv4 dial worked but competed with failed IPv6 for JoinSet
attention
- Sometimes the JoinSet returned an IPv6 failure before the
IPv4 success, causing unnecessary fallback to relay
Fix: bind to [::]:0 (IPv6 any) instead of 0.0.0.0:0. On
dual-stack systems (Linux/Android default), [::]:0 creates a
socket that handles BOTH:
- IPv6 natively (global unicast, ULA)
- IPv4 via v4-mapped addresses (::ffff:172.16.81.x)
One socket, both protocols. All 7 bind sites updated:
- register_signal (signal endpoint)
- do_register_signal
- ping_relay
- probe_reflect_addr (fresh endpoint fallback)
- dual_path::race (A-role fresh, D-role fresh, relay fresh)
With this fix, same-LAN P2P should prefer the IPv6 path (no
NAT, direct routing, lower latency) and fall through to IPv4
if IPv6 fails — relay is the last resort after ALL candidates
are exhausted.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Addresses the first-join no-audio regression (tasks #35-37) where
the Oboe playout callback fires once (cb#0) and then stops
draining the ring on the Nothing Phone, causing written_samples
to freeze at 7679 (ring capacity minus one burst). Second call
(rejoin) always works because audio_stop tears down the streams
and audio_start rebuilds them fresh.
Two combined fixes:
**Fix D (task #37)**: always call audio_stop() before audio_start()
at the top of CallEngine::start. On a cold launch this is a no-op
(streams not yet started). On subsequent calls it guarantees a
clean teardown before rebuild — the same thing rejoin does. Added
a 50ms pause between stop and start to let the Android HAL release
the audio session.
**Fix C (task #36)**: after audio_start(), immediately write 960
samples (20ms) of silence into the playout ring. This ensures the
Oboe playout callback has data to drain on its first invocation.
On devices where an empty-ring first callback causes the stream
to self-pause (Nothing Phone's Qualcomm HAL), the priming data
keeps the callback loop alive until real decoded audio arrives
from the recv task.
Together these cover the two most likely root causes:
1. Stale Oboe state from a previous audio_start that didn't
clean up properly → Fix D forces a clean rebuild
2. Playout callback self-pausing on an empty ring → Fix C
ensures the ring is non-empty at callback time
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Critical Phase 6 bug: when the negotiation agreed on relay path
but delivered the relay transport via pre_connected_transport,
CallEngine saw is_some() = true → is_direct_p2p = true → skipped
perform_handshake. The relay couldn't authenticate the participant
→ room join silently failed → recv_fr: 0, both sides sending
into the void.
Fix: add explicit is_direct_p2p: bool parameter to CallEngine::
start (both android and desktop branches). The connect command
sets it from the Phase 6 negotiation result (use_direct), not
from whether pre_connected_transport is Some.
Now relay-negotiated calls correctly run perform_handshake,
and direct P2P calls correctly skip it.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The commit de007ec added a heuristic that forced relay-only when
peers had different public IPs. That was a stopgap for the race
condition where one side picked Direct and the other picked Relay.
Phase 6 (f5542ef) solved this properly via MediaPathReport
negotiation, but the heuristic wasn't cleaned up and was still
running BEFORE the Phase 6 code — suppressing the race entirely
for cross-network calls.
Removed. Phase 6 negotiation now handles ALL cases: both sides
race, exchange reports, and agree on the same path before
committing media. Cross-network calls that can't go P2P will
have both sides report direct_ok=false and agree on relay.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Before Phase 6, each side's dual-path race ran independently and
committed to whichever transport completed first. When one side
picked Direct and the other picked Relay, they sent media to
different places: TX > 0, RX: 0 on both sides, a completely silent call.
Phase 6 adds a negotiation step: after the local race completes,
each side sends a MediaPathReport { call_id, direct_ok, race_winner }
to the peer through the relay. Both wait for the other's report
before committing a transport to the CallEngine. The decision
rule is simple: if BOTH report direct_ok = true, use direct; if
EITHER reports false, BOTH use relay.
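The decision rule is small enough to write out. This sketch omits call_id and treats a missing peer report as relay, matching the timeout fallback described in this message; `negotiate` is a hypothetical function name:

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum WinningPath {
    Direct,
    Relay,
}

/// Simplified report; the real message also carries call_id.
#[derive(Debug, Clone, Copy)]
pub struct MediaPathReport {
    pub direct_ok: bool,
    pub race_winner: WinningPath,
}

/// Direct only if BOTH sides report direct_ok; relay otherwise,
/// including when the peer's report never arrived (None).
pub fn negotiate(local: &MediaPathReport, peer: Option<&MediaPathReport>) -> WinningPath {
    match peer {
        Some(p) if local.direct_ok && p.direct_ok => WinningPath::Direct,
        _ => WinningPath::Relay,
    }
}
```

The rule is symmetric in its two inputs, so both peers reach the same answer once each holds the other's report.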
## Wire protocol
New `SignalMessage::MediaPathReport { call_id, direct_ok,
race_winner }`. The relay forwards it to the call peer via the
same signal_hub routing used for DirectCallOffer/Answer. The
cross-relay dispatcher also forwards it.
## dual_path::race restructured
Returns `RaceResult` instead of `(Arc<QuinnTransport>, WinningPath)`:
- `direct_transport: Option<Arc<QuinnTransport>>`
- `relay_transport: Option<Arc<QuinnTransport>>`
- `local_winner: WinningPath`
Both paths are run as spawned tasks. After the first completes,
a 1s grace period lets the loser also finish. The connect
command gets BOTH transports (when available) and picks the
right one based on the negotiation outcome. The unused transport
is dropped.
## connect command flow (revised)
1. Run race() → RaceResult with both transports
2. Send MediaPathReport to relay with our direct_ok
3. Install oneshot; wait for peer's report (3s timeout)
4. Decision: both direct_ok → use direct; else → use relay
5. Start CallEngine with the agreed transport
If the peer never responds (old build, timeout), falls back to
relay — backward compatible.
## Relay forwarding
MediaPathReport is forwarded like DirectCallOffer/Answer: via
signal_hub.send_to(peer_fp) for same-relay calls, and via
cross-relay dispatcher for federated calls.
## Debug log events
- `connect:dual_path_race_done` — local race result
- `connect:path_report_sent` — our report to the peer
- `connect:peer_report_received` — peer's report
- `connect:peer_report_timeout` — peer didn't respond (3s)
- `connect:path_negotiated` — final agreed path with reasons
Full workspace test: 423 passing (no regressions).
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Race condition: when two phones are on different networks (WiFi
vs LTE, home vs office, etc.), each side's dual-path race runs
independently. One side may pick Direct while the other picks
Relay, causing both to send media to different places — TX > 0,
RX: 0 on both sides, completely silent call.
Root cause: the dual-path race doesn't have a negotiation step.
Each side picks the first transport that completes a QUIC
handshake, which may be a different path than the other side
picked. On same-LAN this doesn't matter because direct always
wins on both (the 500ms relay delay guarantees it). On cross-
network, the asymmetry bites.
Heuristic fix: compare own_reflex_addr IP to peer_reflex_addr
IP. If they're different → different networks → force relay-only
(set role = None, which skips the dual-path race entirely).
Same public IP means same LAN / same NAT:
→ LAN host candidates work, direct always wins on both sides
→ Safe for P2P
Different public IPs means cross-network:
→ Direct may work on one side but not the other
→ Relay is the safe choice for both
This preserves the proven same-LAN P2P and eliminates the broken
cross-network case. The full fix is ICE-style path negotiation
(Phase 6) where both sides exchange connectivity check results
through the signal plane and agree on a winner before committing
media — but that's a 500+ line protocol change.
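The heuristic itself reduces to an IP comparison. In this sketch `p2p_allowed` is a hypothetical helper, and treating an unknown reflexive address as cross-network is an assumption rather than something stated above:

```rust
use std::net::SocketAddr;

/// Returns true when the dual-path race may run, false when the call
/// should go relay-only. Ports are ignored; only the public IP matters.
pub fn p2p_allowed(own_reflex: Option<SocketAddr>, peer_reflex: Option<SocketAddr>) -> bool {
    match (own_reflex, peer_reflex) {
        (Some(own), Some(peer)) => own.ip() == peer.ip(),
        // Assumption: without both addresses, take the safe relay path.
        _ => false,
    }
}
```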
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Added warn-level log in handle_datagram when a federation
datagram arrives but no matching local room is found. Prints:
- room_hash (8-byte tag from the datagram)
- active_rooms (all rooms the relay currently has)
- seq + peer label
This diagnoses the cross-relay recv_fr=0 issue: if media IS
arriving from the peer relay but the room hash doesn't match any
active room, the log tells us exactly what hash is expected vs
what rooms exist locally. If no datagram log fires at all, the
issue is upstream (peer relay not forwarding, federation link
down, etc.).
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
When a P2P direct call establishes successfully but the underlying
network path dies (phone switched from WiFi to LTE mid-call, or
cross-relay media forwarding isn't working), the call stays up
silently with recv_fr frozen at 0. No feedback to the user.
New watchdog in the Android recv task: tracks consecutive
heartbeat ticks (2s each) where recv_fr hasn't advanced. After 3
ticks (6s) with no new packets, emits:
- call-event { kind: "media-degraded" } — user-facing warning
banner: "No audio — connection may be lost. Try hanging up and
reconnecting, or switch to a different relay."
- call-debug media:no_recv_timeout for the debug log
If packets resume (recv_fr advances), clears the banner via:
- call-event { kind: "media-recovered" }
JS listener creates/removes a red-tinted banner dynamically at
the top of the call screen. Banner is also cleaned up on
showConnectScreen (call end).
This covers:
- Direct P2P that established on WiFi but died when the phone
switched to LTE (stale NAT mapping, unreachable peer)
- Cross-relay calls where federation media isn't forwarding
(relay not upgraded, not federated, etc.)
- Any other "connected but silent" scenario
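The tick counting can be sketched as a small state machine. `RecvWatchdog` and `MediaEvent` are hypothetical names; in the real recv task the two events map to the call-event kinds listed above:

```rust
#[derive(Debug, PartialEq, Eq, Clone, Copy)]
pub enum MediaEvent {
    Degraded,
    Recovered,
}

/// 3 heartbeat ticks of 2s each, so 6s of silence.
pub const SILENT_TICKS: u32 = 3;

pub struct RecvWatchdog {
    last_recv_fr: u64,
    silent_ticks: u32,
    degraded: bool,
}

impl RecvWatchdog {
    pub fn new() -> Self {
        Self { last_recv_fr: 0, silent_ticks: 0, degraded: false }
    }

    /// Call once per heartbeat tick with the current recv_fr counter.
    /// Emits each event once per transition, not on every tick.
    pub fn on_tick(&mut self, recv_fr: u64) -> Option<MediaEvent> {
        if recv_fr != self.last_recv_fr {
            self.last_recv_fr = recv_fr;
            self.silent_ticks = 0;
            if self.degraded {
                self.degraded = false;
                return Some(MediaEvent::Recovered);
            }
            return None;
        }
        self.silent_ticks = self.silent_ticks.saturating_add(1);
        if self.silent_ticks >= SILENT_TICKS && !self.degraded {
            self.degraded = true;
            return Some(MediaEvent::Degraded);
        }
        None
    }
}
```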
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Two UX issues when the selected relay is unreachable (e.g. user
switched from WiFi to LTE and the LAN relay is gone):
1. Pressing Register blocked the UI for ~30s while the QUIC
connect timed out against a dead host. No way to abort.
2. No feedback that the relay was unreachable — just a long
wait followed by a cryptic error.
Fix:
**Pre-flight ping**: before attempting the full register flow,
run `ping_relay` (existing Tauri command, 3s QUIC handshake
timeout). If it fails, immediately show "Server unavailable:
<error>" and re-enable the Register button. No blocking, no
wasted time. If it succeeds, proceed to register_signal.
**Cancel button**: during the register_signal await, the
Register button becomes "Cancel". Tapping it calls `deregister`
which closes the in-flight transport and makes the connect
fail immediately, breaking the await. The button goes back to
"Register on Relay" with a "Registration cancelled" message.
Flow:
[Register] → "Checking..." (disabled, 3s ping) →
ping fails → "Server unavailable" (re-enabled)
ping ok → "Cancel" (enabled, register in flight) →
user taps Cancel → "Registration cancelled" (re-enabled)
register succeeds → registered panel shown
register fails → error shown (re-enabled)
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
All rooms with names starting with 'call-' are now treated as
global rooms by the federation pipeline. This enables relay-
mediated media fallback for cross-relay direct calls: when Alice
on Relay A and Bob on Relay B both join the same call-<id> room,
the federation media forwarding pipeline (GlobalRoomActive
announcements + datagram forwarding + presence replication)
kicks in automatically without any runtime registration step.
Previously, cross-relay direct calls that couldn't go P2P
(symmetric NAT on either side) failed with "no media path"
because the call-<id> room wasn't in the configured global_rooms
set and media datagrams weren't forwarded across the federation
link.
The relay's existing ACL for call-* rooms (only the two
authorized fingerprints from the call registry can join)
prevents random clients from creating or eavesdropping on
call rooms.
## Changes
### `is_global_room` (federation.rs)
Added `room.starts_with("call-")` check before the static
global_rooms set lookup. Returns true immediately for any
call-prefixed room.
### `resolve_global_room` (federation.rs)
Return type changed from `Option<&str>` to `Option<String>`
(owned) because call-* room names aren't stored on `self` —
they come from the caller and resolve to themselves as the
canonical name. The 13 callers continue to work via String-to-&str
deref coercion; 4 HashMap lookups needed explicit `.as_str()` or
`&` borrows.
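A simplified model of the two functions, assuming the static set is a HashSet of room names. The `Federation` struct here is a hypothetical stand-in, and any name canonicalization the real resolver performs is not modeled:

```rust
use std::collections::HashSet;

pub struct Federation {
    global_rooms: HashSet<String>,
}

impl Federation {
    pub fn new(rooms: &[&str]) -> Self {
        Self { global_rooms: rooms.iter().map(|r| r.to_string()).collect() }
    }

    /// call-* rooms are global without being listed in the static set.
    pub fn is_global_room(&self, room: &str) -> bool {
        room.starts_with("call-") || self.global_rooms.contains(room)
    }

    /// Returns an owned String because a call-* name is not stored on
    /// `self`, so there is nothing to borrow a &str from.
    pub fn resolve_global_room(&self, room: &str) -> Option<String> {
        if self.is_global_room(room) {
            Some(room.to_string())
        } else {
            None
        }
    }
}
```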
Full workspace test: 423 passing (no regressions).
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The call screen now shows two different layouts depending on
whether the call is a 1:1 direct call or a room/group call:
**Direct call (directCallPeer set):**
- Large centered identicon (96px circular with glow)
- Peer name (22px bold) + fingerprint (11px mono)
- Connection badge: "P2P Direct" (green), "Via Relay" (blue),
or "Connecting..." (yellow) — auto-detected from the
call-debug buffer's dual_path_race_won event
- Room name header shows the peer's alias/fp instead of "general"
- Group participant list is hidden
**Room/group call (directCallPeer null):**
- Existing group participant list layout — unchanged
The badge updates live from pollStatus by scanning the debug
buffer for the connect:dual_path_race_won event. If the path
was "Direct" → green P2P badge; if "Relay" → blue relay badge.
Before the race resolves, shows yellow "Connecting...".
directCallView is cleared on showConnectScreen (call end).
CSS in style.css: .direct-call-view, .dc-identicon, .dc-name,
.dc-fp, .dc-badge with .relay and .connecting modifiers.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
On relay-mediated calls, the relay broadcasts RoomUpdate with the
participant list and pollStatus renders it. On direct P2P calls
neither peer joins the relay's media room, so RoomUpdate never
fires and the UI showed "Waiting for participants..." even though
audio was flowing bidirectionally.
Fix: track the peer's identity (fingerprint + alias) from the
signal plane in a `directCallPeer` variable:
- Set on incoming call from the DirectCallOffer (caller_fp +
caller_alias)
- Set on outgoing call from the Call button click (target_fp)
- Cleared on showConnectScreen (call ended)
pollStatus now checks: if the engine's participant list is empty
AND directCallPeer is set, inject a synthetic participant entry
with relay_label = "P2P Direct". The participant row renders with
identicon + fingerprint + alias as normal, but grouped under a
"P2P Direct" header instead of "This Relay".
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Two regressions from Phase 5.5/5.6:
1. Room connect broken: the connect Tauri command required
peerLocalAddrs as a Vec<String>, but the room-join JS path
doesn't pass it (only the direct-call setup handler does).
Error: "invalid args 'peerLocalAddrs' for command 'connect':
command connect missing required key peerLocalAddrs".
Fix: change to Option<Vec<String>>, unwrap_or_default() at
usage sites. Room connect works again with zero peer addrs.
2. Direct P2P call connects but then CallEngine fails with
"expected CallAnswer, got Discriminant(0)". Root cause: after
the dual-path race picked a direct P2P transport, CallEngine
still ran perform_handshake() on it. That handshake is a
relay-specific protocol — sends a CallOffer signal and waits
for CallAnswer back. On a direct QUIC connection to a phone,
there's nobody running accept_handshake, so the handshake
reads garbage from the peer's first media packet and errors.
Fix: track is_direct_p2p = pre_connected_transport.is_some()
and skip perform_handshake when true. The direct connection
is already TLS-encrypted by QUIC, and both peers' identities
were verified through the signal channel (DirectCallOffer/
Answer carry identity_pub + ephemeral_pub + signature). Both
android and desktop branches updated.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Three fixes from a field-test log where same-LAN calls were
still losing the dual-path race to the relay path, peers were
getting stuck on an empty call screen when the other side
hung up, and 1-way audio was hard to diagnose because the
GUI debug log had no media-level events.
## 1. Direct-path 500ms head start (dual_path.rs)
The race was resolving in ~105ms with Relay winning even when
both phones were on the same MikroTik LAN with valid IPv6 host
candidates. Root cause: the relay dial is a plain outbound QUIC
connect that completes in whatever the client→relay RTT is
(~100ms), while the direct path needs the PEER to also process
its CallSetup, spin up its own race, and complete at least one
LAN dial back to us. That cross-client sequence reliably takes
longer than 100ms, so relay always won.
Fix: delay the relay_fut with
`tokio::time::sleep(Duration::from_millis(500))` before starting its
connect. Same-LAN direct dials typically complete in 30-50ms, so the
head start gives direct plenty of time to win cleanly. Users on
setups where direct genuinely can't work (LTE-to-LTE cross-carrier)
pay 500ms extra on the relay fallback, which is barely noticeable
during call setup.
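
The timing argument can be checked with a simplified model. This is a sketch only: `race_winner` and `relay_fallback_ms` are hypothetical helpers, and the 150ms direct time used in the usage note is illustrative, not a measured value.

```rust
/// Which path wins in this simplified timing model.
#[derive(Debug, PartialEq)]
enum Winner {
    Direct,
    Relay,
}

/// Time at which the relay path completes when its connect is delayed
/// by `head_start_ms`.
fn relay_fallback_ms(relay_ms: u64, head_start_ms: u64) -> u64 {
    head_start_ms + relay_ms
}

/// `direct_ms` is the time until the direct path completes end to end
/// (None if it never does). Ties go to Direct, an arbitrary choice here.
fn race_winner(direct_ms: Option<u64>, relay_ms: u64, head_start_ms: u64) -> Winner {
    let relay_done = relay_fallback_ms(relay_ms, head_start_ms);
    match direct_ms {
        Some(d) if d <= relay_done => Winner::Direct,
        _ => Winner::Relay,
    }
}
```

With a 100ms relay connect and a direct path that needs 150ms, `race_winner(Some(150), 100, 0)` is `Relay` and `race_winner(Some(150), 100, 500)` is `Direct`. When direct never completes, the relay fallback lands at `relay_fallback_ms(100, 500)`, i.e. 600ms.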
## 2. Hangup propagation via a new hangup_call command (lib.rs + main.ts)
The hangup button was calling `disconnect` which stopped the
local media engine but never sent a SignalMessage::Hangup to
the relay. The peer never got notified and was stuck on the
call screen with silent audio. My earlier fix (commit e75b045)
only handled the RECEIVE side — auto-dismiss call screen on
recv:Hangup — but the SEND side was still missing.
New Tauri command `hangup_call`:
1. Acquire state.signal.lock(), send SignalMessage::Hangup
over the signal transport (best-effort; log + continue if
signal is down)
2. Acquire state.engine.lock(), stop the CallEngine
JS hangupBtn click handler now calls hangup_call with a fallback
to raw disconnect if the command is missing (older builds).
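
The ordering of the two steps can be sketched without Tauri. The closures below are hypothetical stand-ins for "send SignalMessage::Hangup over the locked signal transport" and "stop the locked CallEngine"; the real command's signature will differ.

```rust
/// Sends Hangup best-effort, then always stops the engine.
/// Returns true if the peer was notified.
fn hangup_call(
    send_hangup: impl FnOnce() -> Result<(), String>,
    stop_engine: impl FnOnce(),
) -> bool {
    let notified = match send_hangup() {
        Ok(()) => true,
        Err(err) => {
            // Best-effort: log and continue so local teardown still runs.
            eprintln!("hangup signal failed, continuing: {}", err);
            false
        }
    };
    stop_engine();
    notified
}
```

The point of the shape is that a dead signal transport never prevents the local engine from stopping.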
## 3. Media debug events (engine.rs + lib.rs)
Threaded tauri::AppHandle into CallEngine::start so the send/
recv tasks can emit call-debug events when the user has debug
logs enabled. Added on the Android branch (desktop branch
accepts the arg for API symmetry but doesn't emit yet):
- media:first_send — emitted when the first encoded frame is
handed to the transport. Useful for 1-way audio diagnosis:
if this fires on side A but side B never sees media:first_recv,
A's outbound is broken.
- media:first_recv — emitted when the first packet from the
peer arrives. Mirror of first_send.
- media:send_heartbeat — every 2s with frames_sent, last_rms,
last_pkt_bytes, short_reads, drops. A last_rms stuck at 0
tells you the mic isn't producing samples; a frozen
frames_sent tells you the encode pipeline hung.
- media:recv_heartbeat — every 2s with recv_fr, decoded_frames,
last_written, written_samples, decode_errs, codec. The same
checks apply to the inbound direction.
All four are gated by `call_debug_logs_enabled()` via
`emit_call_debug`, so they only show up in the GUI log when the
user has the Call Flow Debug Logs checkbox on. `tracing::info!`
still runs unconditionally, so logcat (adb) keeps its copy
regardless.
The `emit_call_debug` fn in lib.rs is now `pub(crate)` so
engine.rs can call it via `crate::emit_call_debug`.
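
As a reading aid for the send heartbeat, here is a sketch of how two consecutive payloads could be classified. `SendHeartbeat`, `SendHealth`, and `classify` are hypothetical, and the field types (u64, f32) are assumptions; only the field names come from the event payload.

```rust
/// Subset of the `media:send_heartbeat` payload; field types are assumed.
#[derive(Clone, Copy)]
struct SendHeartbeat {
    frames_sent: u64,
    last_rms: f32,
}

#[derive(Debug, PartialEq)]
enum SendHealth {
    Ok,
    MicSilent,
    EncodeStalled,
}

/// Compares two consecutive heartbeats (emitted 2s apart).
fn classify(prev: SendHeartbeat, cur: SendHeartbeat) -> SendHealth {
    if cur.frames_sent == prev.frames_sent {
        // No new frames since the last heartbeat: encode pipeline hung.
        SendHealth::EncodeStalled
    } else if cur.last_rms == 0.0 {
        // Frames are flowing but carry silence: mic isn't producing samples.
        SendHealth::MicSilent
    } else {
        SendHealth::Ok
    }
}
```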
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Same-LAN P2P was failing because MikroTik masquerade (like most
consumer NATs) doesn't support NAT hairpinning — the advertised
WAN reflex addr is unreachable from a peer on the same LAN as
the advertiser. Phase 5 got us Cone NAT classification and fixed
the measurement artifact, but same-LAN direct dials still had
nowhere to land.
Phase 5.5 adds ICE-style host candidates: each client enumerates
its LAN-local network interface addresses, includes them in the
DirectCallOffer/Answer alongside the reflex addr, and the
dual-path race fans out to ALL peer candidates in parallel.
Same-LAN peers find each other via their RFC1918 IPv4 + ULA /
global-unicast IPv6 addresses without touching the NAT at all.
Dual-stack IPv6 is in scope from the start — on modern ISPs
(including Starlink) the v6 path often works even when v4
hairpinning doesn't, because there's no NAT on the v6 side.
## Changes
### `wzp_client::reflect::local_host_candidates(port)` (new)
Enumerates network interfaces via `if-addrs` and returns
SocketAddrs paired with the caller's port. Filters:
- IPv4: RFC1918 (10/8, 172.16/12, 192.168/16) + CGNAT (100.64/10)
- IPv6: global unicast (2000::/3) + ULA (fc00::/7)
- Skipped: loopback, link-local (169.254/16, fe80::/10), public v4
(already covered by reflex-addr), unspecified
Safe to call from any thread; costs one `getifaddrs(3)` call.
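
The filter rules can be sketched with std only. This is not the `if-addrs`-based implementation: `filter_candidates` and its two helpers are hypothetical and take an already-enumerated address list, so only the classification is shown.

```rust
use std::net::{IpAddr, Ipv4Addr, Ipv6Addr, SocketAddr};

/// IPv4: keep RFC1918 and CGNAT (100.64.0.0/10) only.
fn is_host_candidate_v4(ip: Ipv4Addr) -> bool {
    let o = ip.octets();
    let cgnat = o[0] == 100 && (o[1] & 0xC0) == 64;
    ip.is_private() || cgnat
}

/// IPv6: keep global unicast (2000::/3) and ULA (fc00::/7) only.
fn is_host_candidate_v6(ip: Ipv6Addr) -> bool {
    let first = ip.segments()[0];
    let global_unicast = (first & 0xE000) == 0x2000;
    let ula = (first & 0xFE00) == 0xFC00;
    global_unicast || ula
}

/// Keeps only allowed addresses and pairs each with the caller's port.
/// Loopback, link-local, unspecified, and public v4 fall outside the
/// allowed ranges, so they are dropped without explicit checks.
fn filter_candidates(ips: &[IpAddr], port: u16) -> Vec<SocketAddr> {
    ips.iter()
        .copied()
        .filter(|ip| match ip {
            IpAddr::V4(v4) => is_host_candidate_v4(*v4),
            IpAddr::V6(v6) => is_host_candidate_v6(*v6),
        })
        .map(|ip| SocketAddr::new(ip, port))
        .collect()
}
```

The allow-list shape is a design choice for the sketch: anything not explicitly in a listed range is skipped, which covers the "Skipped" bullet implicitly.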
### Wire protocol (wzp-proto/packet.rs)
Three new `#[serde(default, skip_serializing_if = "Vec::is_empty")]`
fields, backward-compat with pre-5.5 clients/relays by
construction:
- `DirectCallOffer.caller_local_addrs: Vec<String>`
- `DirectCallAnswer.callee_local_addrs: Vec<String>`
- `CallSetup.peer_local_addrs: Vec<String>`
### Call registry (wzp-relay/call_registry.rs)
`DirectCall` gains `caller_local_addrs` + `callee_local_addrs`
Vec<String> fields. New `set_caller_local_addrs` /
`set_callee_local_addrs` setters. Follow the same pattern as
the reflex addr fields.
### Relay cross-wiring (wzp-relay/main.rs)
Both the local-call and cross-relay-federation paths now track
the local_addrs through the registry and inject them into the
CallSetup's peer_local_addrs. Cross-wiring is identical to the
existing peer_direct_addr logic — each party's CallSetup
carries the OTHER party's LAN candidates.
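
A sketch of the cross-wiring, assuming trimmed-down `DirectCall` and `CallSetup` structs that carry only the new fields. `cross_wire` is a hypothetical helper, not a function in wzp-relay.

```rust
/// Hypothetical slice of the registry entry (new 5.5 fields only).
struct DirectCall {
    caller_local_addrs: Vec<String>,
    callee_local_addrs: Vec<String>,
}

/// Hypothetical slice of the CallSetup message.
struct CallSetup {
    peer_local_addrs: Vec<String>,
}

/// Returns (setup sent to the caller, setup sent to the callee).
/// Each side receives the OTHER side's LAN candidates.
fn cross_wire(call: &DirectCall) -> (CallSetup, CallSetup) {
    let for_caller = CallSetup {
        peer_local_addrs: call.callee_local_addrs.clone(),
    };
    let for_callee = CallSetup {
        peer_local_addrs: call.caller_local_addrs.clone(),
    };
    (for_caller, for_callee)
}
```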
### Client side (desktop/src-tauri/lib.rs)
- `place_call`: gathers local host candidates via
`local_host_candidates(signal_endpoint.local_addr().port())`
and includes them in `DirectCallOffer.caller_local_addrs`.
The port match is critical — it's the Phase 5 shared signal
socket, so incoming dials to these addrs land on the same
endpoint that's already listening.
- `answer_call`: same, but only for AcceptTrusted answers (privacy
mode keeps LAN addrs hidden too, for consistency with the reflex addr).
- `connect` Tauri command: new `peer_local_addrs: Vec<String>`
arg. Builds a `PeerCandidates` bundle and passes it to the
dual-path race.
- Recv loop's CallSetup handler: destructures + forwards the
new field to JS via the signal-event payload.
### `dual_path::race` (wzp-client/dual_path.rs)
Signature change: takes `PeerCandidates` (reflex + local Vec)
instead of a single SocketAddr. The D-role branch now fans out
N parallel dials via `tokio::task::JoinSet` — one per candidate
— and the first successful dial wins (losers are aborted
immediately via `set.abort_all()`). Only when ALL candidates
have failed do we return Err; individual candidate failures are
just traced at debug level and the race waits for the others.
LAN host candidates are tried BEFORE the reflex addr in
`PeerCandidates::dial_order()` — they're faster when they work,
and the reflex addr is the fallback for the not-on-same-LAN
case.
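
The candidate ordering can be sketched as below. The field names `reflexive` and `local` match the struct literal used in the tests; the `SocketAddr` field types and the method body are assumptions, and the real `dial_order()` may do more (for example, deduplication).

```rust
use std::net::SocketAddr;

/// Assumed shape of the candidate bundle passed to the race.
struct PeerCandidates {
    reflexive: Option<SocketAddr>,
    local: Vec<SocketAddr>,
}

impl PeerCandidates {
    /// LAN host candidates first, reflex addr last as the fallback.
    fn dial_order(&self) -> Vec<SocketAddr> {
        let mut order = self.local.clone();
        if let Some(reflex) = self.reflexive {
            order.push(reflex);
        }
        order
    }
}
```

Since all candidates are dialed in parallel, the order mainly determines which index shows up in the "direct dial succeeded on candidate N" trace.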
### JS side (desktop/main.ts)
`connect` invoke now passes `peerLocalAddrs: data.peer_local_addrs ?? []`
alongside the existing `peerDirectAddr`.
### Tests
All existing test callsites updated for the new Vec<String>
fields (defaults to Vec::new() in tests — they don't exercise
the multi-candidate path). `dual_path.rs` integration tests
wrap the single `dead_peer` / `acceptor_listen_addr` in a
`PeerCandidates { reflexive: Some(_), local: Vec::new() }`.
Full workspace test: 423 passing (same as before 5.5).
## Expected behavior on the reporter's setup
Two phones behind MikroTik, both on the same LAN:
    place_call:host_candidates {"local_addrs": ["192.168.88.21:XXX", "2001:...:YY:XXX"]}
    recv:DirectCallAnswer {"callee_local_addrs": ["192.168.88.22:ZZZ", "2001:...:WW:ZZZ"]}
    recv:CallSetup {"peer_direct_addr":"150.228.49.65:NN",
                    "peer_local_addrs":["192.168.88.22:ZZZ","2001:...:WW:ZZZ"]}
    connect:dual_path_race_start {"peer_reflex":"...","peer_local":[...]}
    dual_path: direct dial succeeded on candidate 0   ← LAN v4 wins
    connect:dual_path_race_won {"path":"Direct"}
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>