Two features in one commit because they ship and test together:
Phase 3.5 closes the hole-punching loop, and the call-flow debug
logs give the user live visibility into every step of a call so
that the new P2P path can be debugged on real hardware.
## Phase 3.5 — dual-path QUIC connect race
Completes the hole-punching work that Phase 3 scaffolded. On
receiving a CallSetup with peer_direct_addr, the client now races
a direct QUIC handshake against the relay dial and uses whichever
completes first. Symmetric role assignment avoids the
two-conns-per-call problem:

- Both peers compare `own_reflex_addr` vs `peer_reflex_addr`
  lexicographically.
- Smaller addr → **Acceptor** (A-role): builds a server-capable
  dual endpoint, awaits an incoming QUIC session. Does NOT dial.
- Larger addr → **Dialer** (D-role): builds a client-only
  endpoint, dials the peer's addr with `call-<id>` SNI. Does NOT
  listen.
- Both sides always dial the relay in parallel as fallback.
- `tokio::select!` with `biased` preference for direct, plus
  `tokio::pin!` so each branch can still await the other,
  not-yet-finished future as its fallback.
- Direct timeout 2s, relay fallback timeout 5s (so 7s worst case
  from CallSetup to "no media path" error).

New crate module `wzp_client::dual_path::{race, WinningPath}`
(moved here from desktop/src-tauri so it's testable from a
workspace test). `determine_role` in `wzp_client::reflect` is
a pure function and unit-tested.
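
The role rule above can be sketched as a small pure function. This is an illustration only, not the code in `wzp_client::reflect`: the name `determine_role_sketch`, the string-typed addresses, and the `Option<Role>` return type are assumptions, as is reading "lexicographically" as a plain string comparison.

```rust
use std::cmp::Ordering;

/// Which side of the direct path this peer plays.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Role {
    /// Awaits the peer's incoming QUIC session; never dials.
    Acceptor,
    /// Dials the peer's address; never listens.
    Dialer,
}

/// Hypothetical stand-in for `determine_role`. Compares the two
/// reflexive addresses as strings (an assumption) and returns
/// `None` whenever no role can be assigned.
pub fn determine_role_sketch(own: Option<&str>, peer: Option<&str>) -> Option<Role> {
    // A missing address on either side means no direct attempt.
    let (own, peer) = (own?, peer?);
    match own.cmp(peer) {
        Ordering::Less => Some(Role::Acceptor),
        Ordering::Greater => Some(Role::Dialer),
        // Equal addresses cannot be ordered, so no role is assigned.
        Ordering::Equal => None,
    }
}
```

Because both peers evaluate the same comparison with the arguments swapped, exactly one of them ends up as Acceptor and the other as Dialer, which is what prevents two direct connections per call.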
### CallEngine integration
- New `pre_connected_transport: Option<Arc<QuinnTransport>>` arg
  on both android + desktop `CallEngine::start` branches. Skips
  the internal wzp_transport::connect step when Some.
  Backward-compat: None keeps Phase 0 relay-only behavior.
- `connect` Tauri command reads own_reflex_addr from SignalState,
  computes role, runs the race, passes the winning transport
  into CallEngine. If ANY input is missing (no peer addr, no own
  addr, equal addrs), falls back to the classic relay path,
  identical to pre-Phase-3.5 behavior.
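
The optional-transport branch described above might look roughly like the following. Everything here is a stand-in: `TransportStub` replaces `QuinnTransport`, `connect_relay_stub` replaces the internal connect step, and `resolve_transport` is a hypothetical helper name, so this shows only the shape of the `Some`/`None` decision.

```rust
use std::sync::Arc;

/// Stub standing in for `QuinnTransport`; it only records how it
/// was obtained.
#[derive(Debug, PartialEq, Eq)]
pub struct TransportStub {
    pub origin: &'static str,
}

/// Hypothetical stand-in for the internal relay connect step.
fn connect_relay_stub() -> Result<Arc<TransportStub>, String> {
    Ok(Arc::new(TransportStub { origin: "relay-connect" }))
}

/// Reuses the race winner when one is supplied; otherwise falls
/// back to the relay-only connect, as in the pre-3.5 flow.
pub fn resolve_transport(
    pre_connected_transport: Option<Arc<TransportStub>>,
) -> Result<Arc<TransportStub>, String> {
    match pre_connected_transport {
        // The caller already ran the race: skip connecting entirely.
        Some(transport) => Ok(transport),
        // No pre-connected transport: keep the old behavior.
        None => connect_relay_stub(),
    }
}
```

Keeping the argument an `Option` means existing call sites only need to pass `None` to retain their current behavior.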
### Tests (9 new, all passing)
- 6 unit tests for `determine_role` truth table in
  `wzp-client/src/reflect.rs` (smaller=Acceptor, larger=Dialer,
  port-only diff, equal, missing-side, symmetry)
- 3 integration tests in `crates/wzp-client/tests/dual_path.rs`:
  * `dual_path_direct_wins_on_loopback`: two-endpoint test
    rig, Dialer wins direct path vs loopback mock relay
  * `dual_path_relay_wins_when_direct_is_dead`: dead peer
    port, 2s direct timeout, relay fallback wins
  * `dual_path_errors_cleanly_when_both_paths_dead`: <10s
    error, no hang
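
The three outcomes these integration tests exercise can be approximated with a pure timing model. This is not the `race` implementation (which is async and built on `tokio::select!`); it is a sketch that assumes the relay fallback window starts when the direct timeout expires, that a path counts only if it completes inside its deadline, and that ties go to direct. `model_race` and `Path` are hypothetical names.

```rust
use std::time::Duration;

/// Which path produced the winning handshake in the model.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Path {
    Direct,
    Relay,
}

const DIRECT_TIMEOUT: Duration = Duration::from_secs(2);
const RELAY_FALLBACK_TIMEOUT: Duration = Duration::from_secs(5);

/// `direct` / `relay` are the times at which each handshake would
/// complete, or `None` if it never does. Returns the winner and
/// when it won, or `Err` with the time at which the race gives up.
pub fn model_race(
    direct: Option<Duration>,
    relay: Option<Duration>,
) -> Result<(Path, Duration), Duration> {
    // Assumed worst case: direct timeout plus the fallback window.
    let give_up = DIRECT_TIMEOUT + RELAY_FALLBACK_TIMEOUT;
    // A handshake only counts if it lands inside its deadline.
    let direct = direct.filter(|t| *t <= DIRECT_TIMEOUT);
    let relay = relay.filter(|t| *t <= give_up);
    match (direct, relay) {
        // Ties go to direct, mirroring the `biased` preference.
        (Some(d), Some(r)) if d <= r => Ok((Path::Direct, d)),
        (Some(_), Some(r)) => Ok((Path::Relay, r)),
        (Some(d), None) => Ok((Path::Direct, d)),
        (None, Some(r)) => Ok((Path::Relay, r)),
        (None, None) => Err(give_up),
    }
}
```

Under these assumptions the both-dead case gives up at 7s, which is why a 10s upper bound in the third test leaves some slack for scheduling.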
## GUI call-flow debug logs
Runtime-toggled structured events at every step of a call so the
user can see where a call progressed or stalled on real hardware.
Modeled on the existing DRED_VERBOSE_LOGS pattern.
### Rust side
- `static CALL_DEBUG_LOGS: AtomicBool` +
  `emit_call_debug(&app, step, details)` helper. Always logs via
  `tracing::info!` (logcat always has a copy); GUI Tauri
  `call-debug-log` event only fires when the flag is on.
- Tauri commands `set_call_debug_logs` / `get_call_debug_logs`.
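
A dependency-free sketch of the flag-plus-helper pattern follows. The two sink closures are hypothetical: `log_sink` stands in for the unconditional `tracing::info!` call and `gui_sink` for emitting the Tauri `call-debug-log` event, so the real helper's signature (which takes `&app`) differs. The `Relaxed` ordering is also an assumption.

```rust
use std::sync::atomic::{AtomicBool, Ordering};

/// Runtime toggle for GUI call-flow events; off by default.
static CALL_DEBUG_LOGS: AtomicBool = AtomicBool::new(false);

pub fn set_call_debug_logs(enabled: bool) {
    CALL_DEBUG_LOGS.store(enabled, Ordering::Relaxed);
}

pub fn get_call_debug_logs() -> bool {
    CALL_DEBUG_LOGS.load(Ordering::Relaxed)
}

/// Hypothetical stand-in for `emit_call_debug`: the log copy is
/// always written, the GUI copy only when the flag is on.
pub fn emit_call_debug_sketch(
    step: &str,
    details: &str,
    mut log_sink: impl FnMut(&str),
    mut gui_sink: impl FnMut(&str),
) {
    let line = format!("{} {}", step, details);
    // Unconditional: the system log always gets a copy.
    log_sink(&line);
    // Gated: the GUI only sees events while the toggle is enabled.
    if get_call_debug_logs() {
        gui_sink(&line);
    }
}
```

Gating only the GUI event keeps the cost of a disabled toggle to a single atomic load per call site while the log record stays available for later inspection.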
### Instrumented steps (24 emit_call_debug sites)
- `register_signal`: start, identity loaded, endpoint created,
  connect failed/ok, RegisterPresence sent, ack received/failed,
  recv loop spawning
- Recv loop: CallRinging, DirectCallOffer (w/ caller_reflexive_addr),
  DirectCallAnswer (w/ callee_reflexive_addr), CallSetup (w/
  peer_direct_addr), Hangup
- `place_call`: start, reflect query start/ok/none, offer sent,
  send failed
- `answer_call`: start, reflect query start/ok/none or privacy
  skip, answer sent, send failed
- `connect`: start, dual_path_race_start (w/ role), won (w/
  path), failed, skipped (w/ reasons), call_engine_starting/
  started/failed
### JS side
- New `callDebugLogs: boolean` field on Settings type.
- Boot-time hydrate of the Rust flag from localStorage so the
  choice survives restarts (like `dredDebugLogs`).
- Settings panel: new "Call flow debug logs" checkbox alongside
  the DRED toggle.
- New "Call Debug Log" section that ONLY shows when the flag is
  on. Rolling in-memory buffer of the last 200 events, rendered
  as monospace `HH:MM:SS.mmm step {details}` lines with
  auto-scroll and a Clear button.
- `listen("call-debug-log", ...)` subscribed at app startup,
  appends to the buffer, re-renders on every event.
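
The rolling buffer and line format above live in the JS front end; the sketch below restates the same logic in Rust purely to illustrate the eviction rule and the `HH:MM:SS.mmm` formatting. `CallDebugBuffer`, `format_clock`, and the milliseconds-since-midnight input are assumptions, not names from the commit.

```rust
use std::collections::VecDeque;

/// Cap taken from the description above.
const MAX_EVENTS: usize = 200;

/// Rolling buffer of already-formatted log lines.
#[derive(Default)]
pub struct CallDebugBuffer {
    lines: VecDeque<String>,
}

impl CallDebugBuffer {
    /// Appends one event, evicting the oldest once the cap is hit.
    pub fn push(&mut self, ms_since_midnight: u64, step: &str, details: &str) {
        if self.lines.len() == MAX_EVENTS {
            self.lines.pop_front();
        }
        let stamp = format_clock(ms_since_midnight);
        self.lines.push_back(format!("{} {} {}", stamp, step, details));
    }

    /// Backs the Clear button.
    pub fn clear(&mut self) {
        self.lines.clear();
    }

    pub fn len(&self) -> usize {
        self.lines.len()
    }

    /// Oldest line still held, if any.
    pub fn first(&self) -> Option<&str> {
        self.lines.front().map(String::as_str)
    }
}

/// Formats milliseconds since midnight as `HH:MM:SS.mmm`.
pub fn format_clock(ms_since_midnight: u64) -> String {
    let ms = ms_since_midnight % 1000;
    let secs = ms_since_midnight / 1000;
    let (h, m, s) = (secs / 3600 % 24, secs / 60 % 60, secs % 60);
    format!("{:02}:{:02}:{:02}.{:03}", h, m, s, ms)
}
```

A fixed cap keeps memory bounded during long sessions, at the cost of losing the earliest events of a very chatty call.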

Full workspace test suite goes from 404 → 413 passing. Clippy
clean on touched crates.

PRD: .taskmaster/docs/prd_phase35_dual_path_race.txt
Tasks: 61-69 all completed

Next: APK + desktop build carrying everything (Phase 2 NAT
detect, Phase 3 advertising, Phase 3.5 dual-path + call debug
logs, plus the earlier Android first-join diagnostics) so the
user can validate the P2P path on real hardware with live
per-step visibility into where any failures happen.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
200 lines
8.0 KiB
Rust
//! Phase 3.5 integration tests for the dual-path QUIC race.
//!
//! The race takes a role (Acceptor or Dialer), a peer_direct_addr,
//! a relay_addr, and two SNI strings, then returns whichever QUIC
//! handshake completes first wrapped in a `QuinnTransport`. These
//! tests validate that:
//!
//! 1. On loopback with two real clients playing A + D roles, the
//!    direct path wins (fewer hops than relay).
//! 2. When the direct peer is dead (nothing listening) but the
//!    relay is up, the relay wins within the fallback window.
//! 3. When both paths are dead, the race errors cleanly rather
//!    than hanging forever.
//!
//! The "relay" in these tests is a minimal mock that just accepts
//! an incoming QUIC connection and drops it; we don't need any
//! protocol handling, just a TCP-ish listen-and-accept.

use std::net::{Ipv4Addr, SocketAddr};
use std::time::Duration;

use wzp_client::dual_path::{race, WinningPath};
use wzp_client::reflect::Role;
use wzp_transport::{create_endpoint, server_config};

/// Spin up a "relay-ish" mock server on loopback that accepts
/// incoming QUIC connections and does nothing with them. Used to
/// give the relay branch of the race a real target to dial.
/// Returns the bound address + a join handle (kept alive to keep
/// the endpoint up).
async fn spawn_mock_relay() -> (SocketAddr, tokio::task::JoinHandle<()>) {
    let _ = rustls::crypto::ring::default_provider().install_default();
    let (sc, _cert_der) = server_config();
    let bind: SocketAddr = (Ipv4Addr::LOCALHOST, 0).into();
    let ep = create_endpoint(bind, Some(sc)).expect("relay endpoint");
    let addr = ep.local_addr().expect("local_addr");

    let handle = tokio::spawn(async move {
        // Accept loop: hold the connection alive for a short
        // while so the race result isn't killed by the peer
        // closing before the winning transport is returned.
        while let Some(incoming) = ep.accept().await {
            if let Ok(_conn) = incoming.await {
                tokio::time::sleep(Duration::from_secs(5)).await;
            }
        }
    });
    (addr, handle)
}

// -----------------------------------------------------------------------
// Test 1: direct path wins when both sides are up
// -----------------------------------------------------------------------
//
// Spawn a mock relay, then set up a two-client test where one
// client plays the Acceptor role and the other plays the Dialer
// role. The Dialer's `peer_direct_addr` is the Acceptor's listen
// address. Because the direct path is a single loopback hop and
// the relay dial also terminates on loopback, both complete
// essentially instantly, so the `biased` tokio::select in race()
// should pick direct.

#[tokio::test(flavor = "multi_thread", worker_threads = 4)]
async fn dual_path_direct_wins_on_loopback() {
    let _ = rustls::crypto::ring::default_provider().install_default();
    let (relay_addr, _relay_handle) = spawn_mock_relay().await;

    // Acceptor task: run race(Role::Acceptor, peer_addr_placeholder, ...).
    // Since the acceptor doesn't dial, the peer_direct_addr arg is
    // unused on the direct branch but we still pass a placeholder
    // because the API takes one. Use a stub addr that would error
    // if it were ever dialed, proving the Acceptor really doesn't
    // reach it.
    let unused_addr: SocketAddr = "127.0.0.1:2".parse().unwrap();

    // We can't race both sides in the same task because each race
    // call has its own direct endpoint that needs to talk to the
    // OTHER side's endpoint. So spawn the Acceptor in a task and
    // let it expose its listen addr via a oneshot back to the test,
    // then run the Dialer in the test's main task.
    //
    // There's a chicken-and-egg issue: the Acceptor's listen addr
    // is only known after race() creates its endpoint. To avoid
    // reaching into race()'s internals, we instead play a slight
    // trick: create the Acceptor's endpoint ourselves (outside
    // race()) to learn its addr, spin up an accept loop on it
    // ourselves, and pass THAT addr as the Dialer's peer addr.
    // This tests the Dialer->Acceptor handshake end-to-end without
    // running the full race() on both sides.

    let (sc, _cert_der) = server_config();
    let acceptor_bind: SocketAddr = (Ipv4Addr::LOCALHOST, 0).into();
    let acceptor_ep = create_endpoint(acceptor_bind, Some(sc)).expect("acceptor ep");
    let acceptor_listen_addr = acceptor_ep.local_addr().expect("acceptor addr");

    // Drop the external acceptor after the test finishes, not
    // before: spawn a dedicated accept task.
    let acceptor_accept_task = tokio::spawn(async move {
        // Accept one connection and hold it for a while so the
        // Dialer side can complete its QUIC handshake.
        if let Some(incoming) = acceptor_ep.accept().await {
            if let Ok(_conn) = incoming.await {
                tokio::time::sleep(Duration::from_secs(5)).await;
            }
        }
    });

    // Now run the Dialer in the race: peer_direct_addr = acceptor's
    // listen addr. The relay is the mock from above. Direct path
    // should win.
    let result = race(
        Role::Dialer,
        acceptor_listen_addr,
        relay_addr,
        "test-room".into(),
        "call-test".into(),
    )
    .await
    .expect("race must succeed");

    assert_eq!(result.1, WinningPath::Direct, "direct should win on loopback");

    // Cancel the acceptor accept task so the test finishes.
    acceptor_accept_task.abort();
    // Suppress unused-var warning for the placeholder.
    let _ = unused_addr;
}

// -----------------------------------------------------------------------
// Test 2: relay wins when the direct peer is dead
// -----------------------------------------------------------------------
//
// Dialer role, peer_direct_addr = a port nothing is listening on,
// relay is the working mock. Direct dial will sit waiting for a
// QUIC handshake that never comes; the 2s direct timeout kicks in
// and the relay path wins the fallback.

#[tokio::test(flavor = "multi_thread", worker_threads = 4)]
async fn dual_path_relay_wins_when_direct_is_dead() {
    let _ = rustls::crypto::ring::default_provider().install_default();
    let (relay_addr, _relay_handle) = spawn_mock_relay().await;

    // A port that nothing is listening on: dead direct target.
    // Port 1 on loopback is almost never bound and UDP packets to
    // it will be dropped silently, so the QUIC handshake times out.
    let dead_peer: SocketAddr = "127.0.0.1:1".parse().unwrap();

    let result = race(
        Role::Dialer,
        dead_peer,
        relay_addr,
        "test-room".into(),
        "call-test".into(),
    )
    .await
    .expect("race must succeed via relay fallback");

    assert_eq!(
        result.1,
        WinningPath::Relay,
        "relay should win when direct dial has nowhere to land"
    );
}

// -----------------------------------------------------------------------
// Test 3: race errors cleanly when both paths are dead
// -----------------------------------------------------------------------
//
// Dialer role, peer_direct_addr = dead, relay_addr = dead.
// Expected: race returns an Err within ~7s (2s direct timeout +
// 5s relay timeout fallback).

#[tokio::test(flavor = "multi_thread", worker_threads = 4)]
async fn dual_path_errors_cleanly_when_both_paths_dead() {
    let _ = rustls::crypto::ring::default_provider().install_default();

    let dead_peer: SocketAddr = "127.0.0.1:1".parse().unwrap();
    let dead_relay: SocketAddr = "127.0.0.1:2".parse().unwrap();

    let start = std::time::Instant::now();
    let result = race(
        Role::Dialer,
        dead_peer,
        dead_relay,
        "test-room".into(),
        "call-test".into(),
    )
    .await;
    let elapsed = start.elapsed();

    assert!(result.is_err(), "both-dead must return Err");
    // Upper bound: direct 2s timeout + relay 5s fallback + small
    // slack for scheduling. If this blows, something is looping.
    assert!(
        elapsed < Duration::from_secs(10),
        "race took too long to give up: {:?}",
        elapsed
    );
}