fix(p2p): Phase 5.6 — direct-path head start + hangup propagation + media debug events
Three fixes from a field-test log where same-LAN calls were
still losing the dual-path race to the relay path, peers were
getting stuck on an empty call screen when the other side
hung up, and 1-way audio was hard to diagnose because the
GUI debug log had no media-level events.
## 1. Direct-path 500ms head start (dual_path.rs)
The race was resolving in ~105ms with Relay winning even when
both phones were on the same MikroTik LAN with valid IPv6 host
candidates. Root cause: the relay dial is a plain outbound QUIC
connect that completes in whatever the client→relay RTT is
(~100ms), while the direct path needs the PEER to also process
its CallSetup, spin up its own race, and complete at least one
LAN dial back to us. That cross-client sequence reliably takes
longer than 100ms, so relay always won.
Fix: hold relay_fut back with a 500ms `tokio::time::sleep`
(`DIRECT_HEAD_START`) before it starts its connect. Same-LAN
direct dials complete in 30-50ms
typically, so the head start gives direct plenty of time to win
cleanly. Users on setups where direct genuinely can't work
(LTE-to-LTE cross-carrier) pay 500ms extra on the relay fallback,
which is invisible for a call setup.
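
The effect of the head start can be checked with a small timing model. This is only a sketch, not the code in dual_path.rs: `model_race`, `Winner` and `Outcome` are hypothetical names, the 150ms direct-path time used in the usage note is an illustrative figure, and sending ties to the direct path is an assumption of the model.

```rust
use std::time::Duration;

/// Which path wins the modeled dual-path race.
#[derive(Debug, PartialEq, Eq, Clone, Copy)]
pub enum Winner {
    Direct,
    Relay,
}

/// Winning path plus the time it completed, measured from race start.
#[derive(Debug, PartialEq, Eq, Clone, Copy)]
pub struct Outcome {
    pub winner: Winner,
    pub completed_at: Duration,
}

/// Hypothetical model of the race.
/// - `direct`: time the direct path needs to connect, or `None` if it
///   can never connect (e.g. LTE-to-LTE cross-carrier).
/// - `relay_rtt`: time the relay dial needs once it has started.
/// - `head_start`: how long the relay dial is held back.
pub fn model_race(
    direct: Option<Duration>,
    relay_rtt: Duration,
    head_start: Duration,
) -> Outcome {
    // The relay only finishes after the head start plus its own dial time.
    let relay_done = head_start + relay_rtt;
    match direct {
        // Assumption: a tie goes to the direct path.
        Some(d) if d <= relay_done => Outcome {
            winner: Winner::Direct,
            completed_at: d,
        },
        _ => Outcome {
            winner: Winner::Relay,
            completed_at: relay_done,
        },
    }
}
```

With no head start, a direct path that needs 150ms loses to a 100ms relay dial. With a 500ms head start the same direct path wins at 150ms, and a call where direct never connects falls back to the relay at 600ms instead of 100ms.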
## 2. Hangup propagation via a new hangup_call command (lib.rs + main.ts)
The hangup button was calling `disconnect` which stopped the
local media engine but never sent a SignalMessage::Hangup to
the relay. The peer never got notified and was stuck on the
call screen with silent audio. My earlier fix (commit e75b045)
only handled the RECEIVE side — auto-dismiss call screen on
recv:Hangup — but the SEND side was still missing.
New Tauri command `hangup_call`:
1. Acquire state.signal.lock(), send SignalMessage::Hangup
   over the signal transport (best-effort; log + continue if
   signal is down)
2. Acquire state.engine.lock(), stop the CallEngine
JS hangupBtn click handler now calls hangup_call with a fallback
to raw disconnect if the command is missing (older builds).
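
The two-step ordering of `hangup_call` can be sketched without Tauri as follows. This is an illustration under assumptions, not the command in lib.rs: the `SignalTransport` and `Engine` traits, the `AppState` layout and the `bool` return value are all hypothetical stand-ins for the real signal transport, CallEngine and Tauri state.

```rust
use std::sync::Mutex;

/// Hypothetical stand-in for the signal transport.
pub trait SignalTransport {
    fn send_hangup(&mut self) -> Result<(), String>;
}

/// Hypothetical stand-in for the media engine.
pub trait Engine {
    fn stop(&mut self);
}

/// Hypothetical state layout; `None` means "not connected / not running".
pub struct AppState<S, E> {
    pub signal: Mutex<Option<S>>,
    pub engine: Mutex<Option<E>>,
}

/// Returns true if the Hangup was handed to the signal transport.
/// The local engine is stopped either way.
pub fn hangup_call<S: SignalTransport, E: Engine>(state: &AppState<S, E>) -> bool {
    // Step 1: best-effort notification of the peer. A failure is logged
    // and does not abort the hangup.
    let sent = {
        let mut guard = state.signal.lock().unwrap();
        match guard.as_mut() {
            Some(signal) => match signal.send_hangup() {
                Ok(()) => true,
                Err(e) => {
                    eprintln!("hangup: signal send failed: {}", e);
                    false
                }
            },
            None => {
                eprintln!("hangup: signal transport is down, skipping send");
                false
            }
        }
    }; // The signal lock is released here, before the engine lock is taken.

    // Step 2: always stop the local engine.
    if let Some(mut engine) = state.engine.lock().unwrap().take() {
        engine.stop();
    }
    sent
}
```

Releasing the signal lock before taking the engine lock means the two mutexes are never held at the same time, so this sketch cannot deadlock against code that takes them in the opposite order.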
## 3. Media debug events (engine.rs + lib.rs)
Threaded tauri::AppHandle into CallEngine::start so the send/
recv tasks can emit call-debug events when the user has debug
logs enabled. Added on the Android branch (desktop branch
accepts the arg for API symmetry but doesn't emit yet):
- `media:first_send`: emitted when the first encoded frame is
  handed to the transport. Useful for 1-way audio diagnosis:
  if this fires on side A but side B never sees
  `media:first_recv`, A's outbound is broken.
- `media:first_recv`: emitted when the first packet from the
  peer arrives. Mirror of first_send.
- `media:send_heartbeat`: every 2s with frames_sent, last_rms,
  last_pkt_bytes, short_reads, drops. A stalled last_rms
  (== 0) tells you the mic isn't producing samples; a frozen
  frames_sent tells you the encode pipeline hung.
- `media:recv_heartbeat`: every 2s with recv_fr, decoded_frames,
  last_written, written_samples, decode_errs, codec. Mirror
  invariants for the inbound direction.
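
The two send-side invariants above can be written down as a check over consecutive heartbeats. This is a reading aid rather than code from engine.rs: the `SendHeartbeat` struct, its field types (`u64`, `f32`), `SendDiagnosis` and `diagnose_send` are assumptions, and only two of the five heartbeat fields are modeled.

```rust
/// Hypothetical subset of the media:send_heartbeat payload.
/// Field names come from the commit message; the types are assumed.
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct SendHeartbeat {
    pub frames_sent: u64,
    pub last_rms: f32,
}

/// What two consecutive send heartbeats suggest about the outbound path.
#[derive(Debug, PartialEq, Eq, Clone, Copy)]
pub enum SendDiagnosis {
    Healthy,
    /// last_rms is 0: the mic is not producing samples.
    MicSilent,
    /// frames_sent did not advance: the encode pipeline hung.
    EncodeStalled,
}

/// Compare the previous and current heartbeat (emitted 2s apart).
pub fn diagnose_send(prev: &SendHeartbeat, cur: &SendHeartbeat) -> SendDiagnosis {
    if cur.frames_sent == prev.frames_sent {
        // No new frames in a whole heartbeat interval.
        SendDiagnosis::EncodeStalled
    } else if cur.last_rms == 0.0 {
        // Frames are flowing but they carry silence.
        SendDiagnosis::MicSilent
    } else {
        SendDiagnosis::Healthy
    }
}
```

The stalled-counter check runs first because a hung encoder also leaves last_rms stale, so the RMS value is only meaningful while frames_sent is still advancing.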
All four are gated by `call_debug_logs_enabled()` via
`emit_call_debug`, so they only show up in the GUI log when the
user has the Call Flow Debug Logs checkbox on. `tracing::info!`
still runs unconditionally so logcat (adb) keeps its copy
regardless.
The `emit_call_debug` fn in lib.rs is now `pub(crate)` so
engine.rs can call it via `crate::emit_call_debug`.
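
The gating and the "first event only" behavior described above can be sketched with std atomics. None of this is the real API: `FirstEvent`, `emit_call_debug_sketch` and `on_frame_sent` are hypothetical, the real `emit_call_debug` presumably takes the tauri::AppHandle rather than a `Vec`, and `eprintln!` stands in for `tracing::info!`.

```rust
use std::sync::atomic::{AtomicBool, Ordering};

/// One-shot latch for events like media:first_send.
pub struct FirstEvent {
    fired: AtomicBool,
}

impl FirstEvent {
    pub const fn new() -> Self {
        FirstEvent { fired: AtomicBool::new(false) }
    }

    /// Returns true only on the first call, false on every later call.
    pub fn fire(&self) -> bool {
        !self.fired.swap(true, Ordering::Relaxed)
    }
}

/// Hypothetical stand-in for emit_call_debug: `gui_log` plays the role
/// of the GUI debug log, `enabled` the role of the debug-logs setting.
pub fn emit_call_debug_sketch(enabled: &AtomicBool, gui_log: &mut Vec<String>, event: &str) {
    // Unconditional log line (stand-in for tracing::info!).
    eprintln!("call-debug: {}", event);
    // The GUI copy is only produced when debug logs are enabled.
    if enabled.load(Ordering::Relaxed) {
        gui_log.push(event.to_string());
    }
}

/// Called by the send task for every encoded frame handed to the transport.
pub fn on_frame_sent(first: &FirstEvent, enabled: &AtomicBool, gui_log: &mut Vec<String>) {
    if first.fire() {
        emit_call_debug_sketch(enabled, gui_log, "media:first_send");
    }
}
```

In this sketch the latch trips even when debug logs are off, so turning the checkbox on mid-call would not produce a late `media:first_send`. Whether the real engine behaves that way is not stated in the commit.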
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@@ -291,7 +291,30 @@ pub async fn race(
    let relay_ep_for_fut = relay_ep.clone();
    let relay_client_cfg = wzp_transport::client_config();
    let relay_sni = room_sni.clone();
    // Phase 5.5 direct-path head-start: hold the relay dial for
    // 500ms before attempting it. On same-LAN cone-NAT pairs the
    // direct dial finishes in ~30-100ms, so giving direct a 500ms
    // head start means direct reliably wins when it's going to
    // work at all. The worst case adds 500ms to the fall-back-
    // to-relay scenario, which is imperceptible for users on
    // setups where direct isn't available anyway.
    //
    // Prior behavior (immediate race) caused the relay to win
    // ~105ms races on a MikroTik LAN because:
    //   - Acceptor role's direct_fut = accept() can only fire
    //     when the peer has completed its outbound LAN dial
    //   - Dialer role's parallel LAN dials need the peer's
    //     CallSetup processed + the race started on the other
    //     side before they can reach us
    //   - Meanwhile relay_fut is a plain dial that completes in
    //     whatever the client→relay RTT is (often <100ms)
    //
    // The 500ms head start is the minimum that empirically makes
    // same-LAN direct reliably beat relay, without penalizing
    // users who genuinely need the relay path.
    const DIRECT_HEAD_START: Duration = Duration::from_millis(500);
    let relay_fut = async move {
        tokio::time::sleep(DIRECT_HEAD_START).await;
        let conn =
            wzp_transport::connect(&relay_ep_for_fut, relay_addr, &relay_sni, relay_client_cfg)
                .await