fix(p2p): Phase 5.6 — direct-path head start + hangup propagation + media debug events
Three fixes from a field-test log where same-LAN calls were
still losing the dual-path race to the relay path, peers were
getting stuck on an empty call screen when the other side
hung up, and 1-way audio was hard to diagnose because the
GUI debug log had no media-level events.
## 1. Direct-path 500ms head start (dual_path.rs)
The race was resolving in ~105ms with Relay winning even when
both phones were on the same MikroTik LAN with valid IPv6 host
candidates. Root cause: the relay dial is a plain outbound QUIC
connect that completes in whatever the client→relay RTT is
(~100ms), while the direct path needs the PEER to also process
its CallSetup, spin up its own race, and complete at least one
LAN dial back to us. That cross-client sequence reliably takes
longer than 100ms, so relay always won.
Fix: delay the relay_fut with `tokio::time::sleep(500ms)` before
starting its connect. Same-LAN direct dials complete in 30-50ms
typically, so the head start gives direct plenty of time to win
cleanly. Users on setups where direct genuinely can't work
(LTE-to-LTE cross-carrier) pay 500ms extra on the relay fallback,
which is invisible for a call setup.
## 2. Hangup propagation via a new hangup_call command (lib.rs + main.ts)
The hangup button was calling `disconnect` which stopped the
local media engine but never sent a SignalMessage::Hangup to
the relay. The peer never got notified and was stuck on the
call screen with silent audio. My earlier fix (commit e75b045)
only handled the RECEIVE side — auto-dismiss call screen on
recv:Hangup — but the SEND side was still missing.
New Tauri command `hangup_call`:
1. Acquire state.signal.lock(), send SignalMessage::Hangup
over the signal transport (best-effort; log + continue if
signal is down)
2. Acquire state.engine.lock(), stop the CallEngine
JS hangupBtn click handler now calls hangup_call with a fallback
to raw disconnect if the command is missing (older builds).
## 3. Media debug events (engine.rs + lib.rs)
Threaded tauri::AppHandle into CallEngine::start so the send/
recv tasks can emit call-debug events when the user has debug
logs enabled. Added on the Android branch (desktop branch
accepts the arg for API symmetry but doesn't emit yet):
- media:first_send — emitted when the first encoded frame is
handed to the transport. Useful for 1-way audio diagnosis:
if this fires on side A but side B never sees media:first_recv,
A's outbound is broken.
- media:first_recv — emitted when the first packet from the
peer arrives. Mirror of first_send.
- media:send_heartbeat — every 2s with frames_sent, last_rms,
last_pkt_bytes, short_reads, drops. A stalled last_rms
(== 0) tells you the mic isn't producing samples; a frozen
frames_sent tells you the encode pipeline hung.
- media:recv_heartbeat — every 2s with recv_fr, decoded_frames,
last_written, written_samples, decode_errs, codec. Mirror
invariants for the inbound direction.
All four are gated by `call_debug_logs_enabled()` via
`emit_call_debug`, so they only show up in the GUI log when the
user has the Call Flow Debug Logs checkbox on. Tracing::info!
still runs unconditionally so logcat (adb) keeps its copy
regardless.
The `emit_call_debug` fn in lib.rs is now `pub(crate)` so
engine.rs can call it via `crate::emit_call_debug`.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
This commit is contained in:
@@ -302,6 +302,11 @@ impl CallEngine {
|
||||
// our own wzp_transport::connect step and use this
|
||||
// directly. If None, existing Phase 0 behavior.
|
||||
pre_connected_transport: Option<Arc<wzp_transport::QuinnTransport>>,
|
||||
// Phase 5.6: Tauri AppHandle for emitting call-debug
|
||||
// events from inside the send/recv tasks. Lets the
|
||||
// debug log pane show first-send/first-recv/heartbeat
|
||||
// events when the user has call debug logs enabled.
|
||||
app: tauri::AppHandle,
|
||||
event_cb: F,
|
||||
) -> Result<Self, anyhow::Error>
|
||||
where
|
||||
@@ -431,6 +436,7 @@ impl CallEngine {
|
||||
let send_quality = quality.clone();
|
||||
let send_tx_codec = tx_codec.clone();
|
||||
let send_t0 = call_t0;
|
||||
let send_app = app.clone();
|
||||
tokio::spawn(async move {
|
||||
let profile = resolve_quality(&send_quality);
|
||||
let config = match profile {
|
||||
@@ -516,7 +522,22 @@ impl CallEngine {
|
||||
}
|
||||
}
|
||||
}
|
||||
send_fs.fetch_add(1, Ordering::Relaxed);
|
||||
let before = send_fs.fetch_add(1, Ordering::Relaxed);
|
||||
if before == 0 {
|
||||
// First encoded frame successfully handed
|
||||
// to the transport. Useful for diagnosing
|
||||
// 1-way audio: if this fires but the
|
||||
// peer's media:first_recv never does,
|
||||
// outbound is broken on our side.
|
||||
crate::emit_call_debug(
|
||||
&send_app,
|
||||
"media:first_send",
|
||||
serde_json::json!({
|
||||
"t_ms": send_t0.elapsed().as_millis() as u64,
|
||||
"pkt_bytes": last_pkt_bytes,
|
||||
}),
|
||||
);
|
||||
}
|
||||
}
|
||||
Err(e) => error!("encode: {e}"),
|
||||
}
|
||||
@@ -533,6 +554,24 @@ impl CallEngine {
|
||||
send_drops = drops,
|
||||
"send heartbeat (android)"
|
||||
);
|
||||
// Phase 5.6: also emit to the GUI debug log
|
||||
// when call debug is enabled. Helps diagnose
|
||||
// 1-way audio — a stalled send heartbeat
|
||||
// (frames_sent == 0 or last_rms == 0) tells
|
||||
// you capture/mic is broken; a live one with
|
||||
// no peer recv tells you outbound is being
|
||||
// dropped somewhere in the media path.
|
||||
crate::emit_call_debug(
|
||||
&send_app,
|
||||
"media:send_heartbeat",
|
||||
serde_json::json!({
|
||||
"frames_sent": fs,
|
||||
"last_rms": last_rms,
|
||||
"last_pkt_bytes": last_pkt_bytes,
|
||||
"short_reads": short_reads,
|
||||
"drops": drops,
|
||||
}),
|
||||
);
|
||||
heartbeat = std::time::Instant::now();
|
||||
}
|
||||
}
|
||||
@@ -545,6 +584,7 @@ impl CallEngine {
|
||||
let recv_fr = frames_received.clone();
|
||||
let recv_rx_codec = rx_codec.clone();
|
||||
let recv_t0 = call_t0;
|
||||
let recv_app = app.clone();
|
||||
tokio::spawn(async move {
|
||||
let initial_profile = resolve_quality(&quality).unwrap_or(QualityProfile::GOOD);
|
||||
// Phase 3b/3c: use concrete AdaptiveDecoder (not Box<dyn
|
||||
@@ -620,6 +660,22 @@ impl CallEngine {
|
||||
"first-join diag: recv first media packet"
|
||||
);
|
||||
first_packet_logged = true;
|
||||
// Phase 5.6 GUI debug: first packet from
|
||||
// the peer. Useful for diagnosing 1-way
|
||||
// audio — if this fires and the peer
|
||||
// never sees media:first_recv, our
|
||||
// inbound path is fine and theirs is
|
||||
// broken, and vice versa.
|
||||
crate::emit_call_debug(
|
||||
&recv_app,
|
||||
"media:first_recv",
|
||||
serde_json::json!({
|
||||
"t_ms": recv_t0.elapsed().as_millis() as u64,
|
||||
"codec": format!("{:?}", pkt.header.codec_id),
|
||||
"payload_bytes": pkt.payload.len(),
|
||||
"is_repair": pkt.header.is_repair,
|
||||
}),
|
||||
);
|
||||
}
|
||||
if !pkt.header.is_repair && pkt.header.codec_id != CodecId::ComfortNoise {
|
||||
{
|
||||
@@ -814,6 +870,27 @@ impl CallEngine {
|
||||
"recv heartbeat (android)"
|
||||
);
|
||||
}
|
||||
// Phase 5.6: compact GUI debug emit.
|
||||
// recv_fr == 0 over time indicates inbound
|
||||
// media is not reaching the client — either
|
||||
// nothing is being sent by the peer, or the
|
||||
// transport is dropping packets, or we're
|
||||
// connected to the wrong side of the media
|
||||
// path. Combined with the peer's send_heartbeat
|
||||
// from the other log, this tells us exactly
|
||||
// where 1-way audio breaks.
|
||||
crate::emit_call_debug(
|
||||
&recv_app,
|
||||
"media:recv_heartbeat",
|
||||
serde_json::json!({
|
||||
"recv_fr": fr,
|
||||
"decoded_frames": decoded_frames,
|
||||
"last_written": last_written,
|
||||
"written_samples": written_samples,
|
||||
"decode_errs": decode_errs,
|
||||
"codec": format!("{:?}", current_codec),
|
||||
}),
|
||||
);
|
||||
heartbeat = std::time::Instant::now();
|
||||
}
|
||||
}
|
||||
@@ -893,6 +970,13 @@ impl CallEngine {
|
||||
// Phase 3.5: caller did the dual-path race and picked a
|
||||
// winning transport. If Some, skip our own connect step.
|
||||
pre_connected_transport: Option<Arc<wzp_transport::QuinnTransport>>,
|
||||
// Phase 5.6: Tauri AppHandle for call-debug event emits
|
||||
// from inside the send/recv tasks. See android branch for
|
||||
// the full rationale. Desktop branch accepts it for API
|
||||
// symmetry but doesn't yet thread it into the send/recv
|
||||
// tasks — android is where the reporter actually sees the
|
||||
// 1-way audio regression.
|
||||
_app: tauri::AppHandle,
|
||||
event_cb: F,
|
||||
) -> Result<Self, anyhow::Error>
|
||||
where
|
||||
|
||||
@@ -63,7 +63,7 @@ fn set_call_debug_logs_internal(on: bool) {
|
||||
/// Also mirrors to `tracing::info!` so logcat keeps its copy
|
||||
/// regardless of the flag — the toggle only controls the GUI
|
||||
/// overlay, not the underlying Android log stream.
|
||||
fn emit_call_debug(
|
||||
pub(crate) fn emit_call_debug(
|
||||
app: &tauri::AppHandle,
|
||||
step: &str,
|
||||
details: serde_json::Value,
|
||||
@@ -469,7 +469,8 @@ async fn connect(
|
||||
|
||||
let app_clone = app.clone();
|
||||
emit_call_debug(&app, "connect:call_engine_starting", serde_json::json!({}));
|
||||
match CallEngine::start(relay, room, alias, os_aec, quality, reuse_endpoint, pre_connected_transport, move |event_kind, message| {
|
||||
let app_for_engine = app.clone();
|
||||
match CallEngine::start(relay, room, alias, os_aec, quality, reuse_endpoint, pre_connected_transport, app_for_engine, move |event_kind, message| {
|
||||
let _ = app_clone.emit(
|
||||
"call-event",
|
||||
CallEvent {
|
||||
@@ -1531,6 +1532,73 @@ async fn deregister(state: tauri::State<'_, Arc<AppState>>) -> Result<(), String
|
||||
Ok(())
|
||||
}
|
||||
|
||||
/// End the current call, telling the peer via a signal-plane
|
||||
/// `Hangup` message before tearing down the local media engine.
|
||||
///
|
||||
/// Prior to this command existing, the hangup button just called
|
||||
/// `disconnect` which stopped the local engine but didn't notify
|
||||
/// the peer — so the OTHER party stayed on the call screen with
|
||||
/// nothing to hear. The relay DOES notice the media connection
|
||||
/// closing but doesn't forward anything to the peer on its own,
|
||||
/// so a real `SignalMessage::Hangup` is the only reliable signal.
|
||||
///
|
||||
/// Best-effort: if the signal transport is down (e.g. the relay
|
||||
/// dropped us mid-call), we still tear down the engine locally
|
||||
/// and return success. The peer's CallEngine will eventually
|
||||
/// notice the media side dying and the signal-event hangup
|
||||
/// handler will fire on receiving it from their signal loop if
|
||||
/// the relay is still up on their side.
|
||||
#[tauri::command]
|
||||
async fn hangup_call(
|
||||
state: tauri::State<'_, Arc<AppState>>,
|
||||
app: tauri::AppHandle,
|
||||
) -> Result<(), String> {
|
||||
use wzp_proto::SignalMessage;
|
||||
|
||||
emit_call_debug(&app, "hangup_call:start", serde_json::json!({}));
|
||||
|
||||
// Step 1: send Hangup over the signal channel so the relay
|
||||
// forwards it to the peer. Do this FIRST so the peer gets
|
||||
// the notification even if the engine shutdown takes a beat.
|
||||
{
|
||||
let sig = state.signal.lock().await;
|
||||
if let Some(ref transport) = sig.transport {
|
||||
match transport
|
||||
.send_signal(&SignalMessage::Hangup {
|
||||
reason: wzp_proto::HangupReason::Normal,
|
||||
})
|
||||
.await
|
||||
{
|
||||
Ok(()) => {
|
||||
tracing::info!("hangup_call: Hangup signal sent to relay");
|
||||
emit_call_debug(&app, "hangup_call:signal_sent", serde_json::json!({}));
|
||||
}
|
||||
Err(e) => {
|
||||
tracing::warn!(error = %e, "hangup_call: failed to send Hangup signal");
|
||||
emit_call_debug(
|
||||
&app,
|
||||
"hangup_call:signal_send_failed",
|
||||
serde_json::json!({ "error": e.to_string() }),
|
||||
);
|
||||
}
|
||||
}
|
||||
} else {
|
||||
tracing::debug!("hangup_call: no signal transport, skipping Hangup send");
|
||||
emit_call_debug(&app, "hangup_call:no_signal_transport", serde_json::json!({}));
|
||||
}
|
||||
}
|
||||
|
||||
// Step 2: tear down the local media engine.
|
||||
let mut engine_lock = state.engine.lock().await;
|
||||
if let Some(engine) = engine_lock.take() {
|
||||
engine.stop().await;
|
||||
emit_call_debug(&app, "hangup_call:engine_stopped", serde_json::json!({}));
|
||||
} else {
|
||||
emit_call_debug(&app, "hangup_call:no_engine", serde_json::json!({}));
|
||||
}
|
||||
Ok(())
|
||||
}
|
||||
|
||||
// ─── App entry point ─────────────────────────────────────────────────────────
|
||||
|
||||
/// Shared Tauri app builder. Used by the desktop `main.rs` and the mobile
|
||||
@@ -1598,6 +1666,7 @@ pub fn run() {
|
||||
connect, disconnect, toggle_mic, toggle_speaker, get_status,
|
||||
register_signal, place_call, answer_call, get_signal_status,
|
||||
get_reflected_address, detect_nat_type,
|
||||
hangup_call,
|
||||
deregister,
|
||||
set_speakerphone, is_speakerphone_on,
|
||||
get_call_history, get_recent_contacts, clear_call_history,
|
||||
|
||||
@@ -897,7 +897,20 @@ spkBtn.addEventListener("click", async () => {
|
||||
});
|
||||
hangupBtn.addEventListener("click", async () => {
|
||||
userDisconnected = true;
|
||||
try { await invoke("disconnect"); } catch {}
|
||||
// Use the new hangup_call command instead of raw disconnect —
|
||||
// it sends a Hangup signal to the relay FIRST so the peer
|
||||
// gets auto-dismissed from the call screen, then tears down
|
||||
// our local engine. Plain `disconnect` would leave the peer
|
||||
// stuck on the call screen with silent audio.
|
||||
try {
|
||||
await invoke("hangup_call");
|
||||
} catch {
|
||||
// Fall back to plain disconnect if hangup_call errors
|
||||
// (older Rust build without the new command).
|
||||
try {
|
||||
await invoke("disconnect");
|
||||
} catch {}
|
||||
}
|
||||
showConnectScreen();
|
||||
});
|
||||
|
||||
|
||||
Reference in New Issue
Block a user