Files
wz-phone/crates/wzp-client/tests/dual_path.rs
Siavash Sameni fa038df057 feat(p2p): Phase 5.5 — ICE LAN host candidates (IPv4 + IPv6)
Same-LAN P2P was failing because MikroTik masquerade (like most
consumer NATs) doesn't support NAT hairpinning — the advertised
WAN reflex addr is unreachable from a peer on the same LAN as
the advertiser. Phase 5 got us Cone NAT classification and fixed
the measurement artifact, but same-LAN direct dials still had
nowhere to land.

Phase 5.5 adds ICE-style host candidates: each client enumerates
its LAN-local network interface addresses, includes them in the
DirectCallOffer/Answer alongside the reflex addr, and the
dual-path race fans out to ALL peer candidates in parallel.
Same-LAN peers find each other via their RFC1918 IPv4 + ULA /
global-unicast IPv6 addresses without touching the NAT at all.

Dual-stack IPv6 is in scope from the start — on modern ISPs
(including Starlink) the v6 path often works even when v4
hairpinning doesn't, because there's no NAT on the v6 side.

## Changes

### `wzp_client::reflect::local_host_candidates(port)` (new)

Enumerates network interfaces via `if-addrs` and returns
SocketAddrs paired with the caller's port. Filters:

- IPv4: RFC1918 (10/8, 172.16/12, 192.168/16) + CGNAT (100.64/10)
- IPv6: global unicast (2000::/3) + ULA (fc00::/7)
- Skipped: loopback, link-local (169.254, fe80::), public v4
  (already covered by reflex-addr), unspecified

Safe from any thread, one `getifaddrs(3)` syscall.

### Wire protocol (wzp-proto/packet.rs)

Three new `#[serde(default, skip_serializing_if = "Vec::is_empty")]`
fields, backward-compat with pre-5.5 clients/relays by
construction:

- `DirectCallOffer.caller_local_addrs: Vec<String>`
- `DirectCallAnswer.callee_local_addrs: Vec<String>`
- `CallSetup.peer_local_addrs: Vec<String>`

### Call registry (wzp-relay/call_registry.rs)

`DirectCall` gains `caller_local_addrs` + `callee_local_addrs`
Vec<String> fields. New `set_caller_local_addrs` /
`set_callee_local_addrs` setters. Follow the same pattern as
the reflex addr fields.

### Relay cross-wiring (wzp-relay/main.rs)

Both the local-call and cross-relay-federation paths now track
the local_addrs through the registry and inject them into the
CallSetup's peer_local_addrs. Cross-wiring is identical to the
existing peer_direct_addr logic — each party's CallSetup
carries the OTHER party's LAN candidates.

### Client side (desktop/src-tauri/lib.rs)

- `place_call`: gathers local host candidates via
  `local_host_candidates(signal_endpoint.local_addr().port())`
  and includes them in `DirectCallOffer.caller_local_addrs`.
  The port match is critical — it's the Phase 5 shared signal
  socket, so incoming dials to these addrs land on the same
  endpoint that's already listening.
- `answer_call`: same, AcceptTrusted only (privacy mode keeps
  LAN addrs hidden too, for consistency with the reflex addr).
- `connect` Tauri command: new `peer_local_addrs: Vec<String>`
  arg. Builds a `PeerCandidates` bundle and passes it to the
  dual-path race.
- Recv loop's CallSetup handler: destructures + forwards the
  new field to JS via the signal-event payload.

### `dual_path::race` (wzp-client/dual_path.rs)

Signature change: takes `PeerCandidates` (reflex + local Vec)
instead of a single SocketAddr. The D-role branch now fans out
N parallel dials via `tokio::task::JoinSet` — one per candidate
— and the first successful dial wins (losers are aborted
immediately via `set.abort_all()`). Only when ALL candidates
have failed do we return Err; individual candidate failures are
just traced at debug level and the race waits for the others.

LAN host candidates are tried BEFORE the reflex addr in
`PeerCandidates::dial_order()` — they're faster when they work,
and the reflex addr is the fallback for the not-on-same-LAN
case.

### JS side (desktop/main.ts)

`connect` invoke now passes `peerLocalAddrs: data.peer_local_addrs ?? []`
alongside the existing `peerDirectAddr`.

### Tests

All existing test callsites updated for the new Vec<String>
fields (defaults to Vec::new() in tests — they don't exercise
the multi-candidate path). `dual_path.rs` integration tests
wrap the single `dead_peer` / `acceptor_listen_addr` in a
`PeerCandidates { reflexive: Some(_), local: Vec::new() }`.

Full workspace test: 423 passing (same as before 5.5).

## Expected behavior on the reporter's setup

Two phones behind MikroTik, both on the same LAN:

  place_call:host_candidates {"local_addrs": ["192.168.88.21:XXX", "2001:...:YY:XXX"]}
  recv:DirectCallAnswer {"callee_local_addrs": ["192.168.88.22:ZZZ", "2001:...:WW:ZZZ"]}
  recv:CallSetup {"peer_direct_addr":"150.228.49.65:NN",
                  "peer_local_addrs":["192.168.88.22:ZZZ","2001:...:WW:ZZZ"]}
  connect:dual_path_race_start {"peer_reflex":"...","peer_local":[...]}
  dual_path: direct dial succeeded on candidate 0   ← LAN v4 wins
  connect:dual_path_race_won {"path":"Direct"}

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-12 07:34:49 +04:00

212 lines
8.5 KiB
Rust

//! Phase 3.5 integration tests for the dual-path QUIC race.
//!
//! The race takes a role (Acceptor or Dialer), a peer_direct_addr,
//! a relay_addr, and two SNI strings, then returns whichever QUIC
//! handshake completes first wrapped in a `QuinnTransport`. These
//! tests validate that:
//!
//! 1. On loopback with two real clients playing A + D roles, the
//! direct path wins (fewer hops than relay).
//! 2. When the direct peer is dead (nothing listening) but the
//! relay is up, the relay wins within the fallback window.
//! 3. When both paths are dead, the race errors cleanly rather
//! than hanging forever.
//!
//! The "relay" in these tests is a minimal mock that just accepts
//! an incoming QUIC connection and drops it — we don't need any
//! protocol handling, just a TCP-ish listen-and-accept.
use std::net::{Ipv4Addr, SocketAddr};
use std::time::Duration;
use wzp_client::dual_path::{race, PeerCandidates, WinningPath};
use wzp_client::reflect::Role;
use wzp_transport::{create_endpoint, server_config};
/// Spin up a "relay-ish" mock server on loopback that accepts
/// incoming QUIC connections and does nothing with them. Used to
/// give the relay branch of the race a real target to dial.
/// Returns the bound address + a join handle (kept alive to keep
/// the endpoint up).
async fn spawn_mock_relay() -> (SocketAddr, tokio::task::JoinHandle<()>) {
let _ = rustls::crypto::ring::default_provider().install_default();
let (sc, _cert_der) = server_config();
let bind: SocketAddr = (Ipv4Addr::LOCALHOST, 0).into();
let ep = create_endpoint(bind, Some(sc)).expect("relay endpoint");
let addr = ep.local_addr().expect("local_addr");
let handle = tokio::spawn(async move {
// Accept loop — hold the connection alive for a short
// while so the race result isn't killed by the peer
// closing before the winning transport is returned.
while let Some(incoming) = ep.accept().await {
if let Ok(_conn) = incoming.await {
tokio::time::sleep(Duration::from_secs(5)).await;
}
}
});
(addr, handle)
}
// -----------------------------------------------------------------------
// Test 1: direct path wins when both sides are up
// -----------------------------------------------------------------------
//
// Spawn a mock relay, then set up a two-client test where one
// client plays the Acceptor role and the other plays the Dialer
// role. The Dialer's `peer_direct_addr` is the Acceptor's listen
// address. Because the direct path is a single loopback hop and
// the relay dial also terminates on loopback, both complete
// essentially instantly — the `biased` tokio::select in race()
// should pick direct.
#[tokio::test(flavor = "multi_thread", worker_threads = 4)]
async fn dual_path_direct_wins_on_loopback() {
let _ = rustls::crypto::ring::default_provider().install_default();
let (relay_addr, _relay_handle) = spawn_mock_relay().await;
// Acceptor task: run race(Role::Acceptor, peer_addr_placeholder, ...).
// Since the acceptor doesn't dial, the peer_direct_addr arg is
// unused on the direct branch but we still pass a placeholder
// because the API takes one. Use a stub addr that would error
// if it were ever dialed — proving the Acceptor really doesn't
// reach it.
let unused_addr: SocketAddr = "127.0.0.1:2".parse().unwrap();
// We can't race both sides in the same task because each race
// call has its own direct endpoint that needs to talk to the
// OTHER side's endpoint. So spawn the Acceptor in a task and
// let it expose its listen addr via a oneshot back to the test,
// then run the Dialer in the test's main task.
//
// There's a chicken-and-egg issue: the Acceptor's listen addr
// is only known after race() creates its endpoint. To avoid
// reaching into race()'s internals, we instead play a slight
// trick: create the Acceptor's endpoint ourselves (outside
// race()) to learn its addr, spin up an accept loop on it
// ourselves, and pass THAT addr as the Dialer's peer addr.
// This tests the Dialer->Acceptor handshake end-to-end without
// running the full race() on both sides.
let (sc, _cert_der) = server_config();
let acceptor_bind: SocketAddr = (Ipv4Addr::LOCALHOST, 0).into();
let acceptor_ep = create_endpoint(acceptor_bind, Some(sc)).expect("acceptor ep");
let acceptor_listen_addr = acceptor_ep.local_addr().expect("acceptor addr");
// Drop the external acceptor after the test finishes, not
// before — spawn a dedicated accept task.
let acceptor_accept_task = tokio::spawn(async move {
// Accept one connection and hold it for a while so the
// Dialer side can complete its QUIC handshake.
if let Some(incoming) = acceptor_ep.accept().await {
if let Ok(_conn) = incoming.await {
tokio::time::sleep(Duration::from_secs(5)).await;
}
}
});
// Now run the Dialer in the race — peer_direct_addr = acceptor's
// listen addr. The relay is the mock from above. Direct path
// should win.
let result = race(
Role::Dialer,
PeerCandidates {
reflexive: Some(acceptor_listen_addr),
local: Vec::new(),
},
relay_addr,
"test-room".into(),
"call-test".into(),
None, // Phase 5: tests use fresh endpoints (no shared signal)
)
.await
.expect("race must succeed");
assert_eq!(result.1, WinningPath::Direct, "direct should win on loopback");
// Cancel the acceptor accept task so the test finishes.
acceptor_accept_task.abort();
// Suppress unused-var warning for the placeholder.
let _ = unused_addr;
}
// -----------------------------------------------------------------------
// Test 2: relay wins when the direct peer is dead
// -----------------------------------------------------------------------
//
// Dialer role, peer_direct_addr = a port nothing is listening on,
// relay is the working mock. Direct dial will sit waiting for a
// QUIC handshake that never comes; the 2s direct timeout kicks in
// and the relay path wins the fallback.
#[tokio::test(flavor = "multi_thread", worker_threads = 4)]
async fn dual_path_relay_wins_when_direct_is_dead() {
let _ = rustls::crypto::ring::default_provider().install_default();
let (relay_addr, _relay_handle) = spawn_mock_relay().await;
// A port that nothing is listening on — dead direct target.
// Port 1 on loopback is almost never bound and UDP packets to
// it will be dropped silently, so the QUIC handshake times out.
let dead_peer: SocketAddr = "127.0.0.1:1".parse().unwrap();
let result = race(
Role::Dialer,
PeerCandidates {
reflexive: Some(dead_peer),
local: Vec::new(),
},
relay_addr,
"test-room".into(),
"call-test".into(),
None, // Phase 5: tests use fresh endpoints (no shared signal)
)
.await
.expect("race must succeed via relay fallback");
assert_eq!(
result.1,
WinningPath::Relay,
"relay should win when direct dial has nowhere to land"
);
}
// -----------------------------------------------------------------------
// Test 3: race errors cleanly when both paths are dead
// -----------------------------------------------------------------------
//
// Dialer role, peer_direct_addr = dead, relay_addr = dead.
// Expected: race returns an Err within ~7s (2s direct timeout +
// 5s relay timeout fallback).
#[tokio::test(flavor = "multi_thread", worker_threads = 4)]
async fn dual_path_errors_cleanly_when_both_paths_dead() {
let _ = rustls::crypto::ring::default_provider().install_default();
let dead_peer: SocketAddr = "127.0.0.1:1".parse().unwrap();
let dead_relay: SocketAddr = "127.0.0.1:2".parse().unwrap();
let start = std::time::Instant::now();
let result = race(
Role::Dialer,
PeerCandidates {
reflexive: Some(dead_peer),
local: Vec::new(),
},
dead_relay,
"test-room".into(),
"call-test".into(),
None, // Phase 5: tests use fresh endpoints (no shared signal)
)
.await;
let elapsed = start.elapsed();
assert!(result.is_err(), "both-dead must return Err");
// Upper bound: direct 2s timeout + relay 5s fallback + small
// slack for scheduling. If this blows, something is looping.
assert!(
elapsed < Duration::from_secs(10),
"race took too long to give up: {:?}",
elapsed
);
}