Same-LAN P2P was failing because MikroTik masquerade (like most
consumer NATs) doesn't do NAT hairpinning by default: the advertised
WAN reflex addr is unreachable from a peer on the same LAN as
the advertiser. Phase 5 got us Cone NAT classification and fixed
the measurement artifact, but same-LAN direct dials still had
nowhere to land.

Phase 5.5 adds ICE-style host candidates. Each client enumerates
its LAN-local network interface addresses and includes them in the
DirectCallOffer/Answer alongside the reflex addr; the dual-path
race then fans out to ALL peer candidates in parallel. Same-LAN
peers find each other via their RFC1918 IPv4 + ULA /
global-unicast IPv6 addresses without touching the NAT at all.

Dual-stack IPv6 is in scope from the start. On modern ISPs
(including Starlink) the v6 path often works even when v4
hairpinning doesn't, because there's no NAT on the v6 side.
## Changes
### `wzp_client::reflect::local_host_candidates(port)` (new)
Enumerates network interfaces via `if-addrs` and returns each
qualifying address as a SocketAddr with the caller's port. Filters:
- IPv4: RFC1918 (10/8, 172.16/12, 192.168/16) + CGNAT (100.64/10)
- IPv6: global unicast (2000::/3) + ULA (fc00::/7)
- Skipped: loopback, link-local (169.254/16, fe80::/10), public v4
  (already covered by the reflex addr), unspecified (0.0.0.0, ::)

Safe to call from any thread; costs one `getifaddrs(3)` call on Unix.
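The filter rules above can be sketched with `std::net` alone. This is a hypothetical helper, not the crate's code. It assumes the interface IPs have already been enumerated (by `if-addrs` in the real function) and only shows how the allow-list and port pairing might look. Loopback, link-local, public IPv4 and unspecified addresses are never matched by the allow-list, so they are rejected without explicit checks.

```rust
use std::net::{IpAddr, SocketAddr};

/// Hypothetical predicate mirroring the Phase 5.5 filter rules.
/// Allow-list only: anything not matched (loopback, link-local,
/// public IPv4, unspecified) is rejected implicitly.
fn is_host_candidate(ip: IpAddr) -> bool {
    match ip {
        IpAddr::V4(v4) => {
            let o = v4.octets();
            // 10/8, 172.16/12, 192.168/16: exactly what `is_private` checks.
            let rfc1918 = v4.is_private();
            // CGNAT shared address space, 100.64.0.0/10.
            let cgnat = o[0] == 100 && (o[1] & 0xC0) == 0x40;
            rfc1918 || cgnat
        }
        IpAddr::V6(v6) => {
            let first = v6.segments()[0];
            // Global unicast, 2000::/3: top 3 bits are 001.
            let global_unicast = (first & 0xE000) == 0x2000;
            // Unique local addresses, fc00::/7: top 7 bits are 1111110.
            let ula = (first & 0xFE00) == 0xFC00;
            global_unicast || ula
        }
    }
}

/// Hypothetical stand-in for `local_host_candidates(port)`: takes the
/// already-enumerated interface IPs instead of calling getifaddrs(3).
fn host_candidates_from(iface_ips: &[IpAddr], port: u16) -> Vec<SocketAddr> {
    iface_ips
        .iter()
        .copied()
        .filter(|ip| is_host_candidate(*ip))
        // Pair each survivor with the shared signal socket's port.
        .map(|ip| SocketAddr::new(ip, port))
        .collect()
}
```

With this shape the port is supplied by the caller, as in `place_call`, so every candidate points at the same already-listening endpoint.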
### Wire protocol (wzp-proto/packet.rs)
Three new `#[serde(default, skip_serializing_if = "Vec::is_empty")]`
fields, backward-compat with pre-5.5 clients/relays by
construction:
- `DirectCallOffer.caller_local_addrs: Vec<String>`
- `DirectCallAnswer.callee_local_addrs: Vec<String>`
- `CallSetup.peer_local_addrs: Vec<String>`
### Call registry (wzp-relay/call_registry.rs)
`DirectCall` gains `caller_local_addrs` + `callee_local_addrs`
`Vec<String>` fields, with new `set_caller_local_addrs` /
`set_callee_local_addrs` setters. Both follow the same pattern as
the reflex addr fields.
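The following is a minimal sketch of what that setter pattern might look like. It assumes the registry is a `HashMap` keyed by call id and that setters report whether the call exists. Both the map layout and the `bool` return are assumptions for illustration, not the relay's actual code.

```rust
use std::collections::HashMap;

/// Hypothetical, trimmed-down registry entry. The real `DirectCall`
/// also carries the reflex addrs and other per-call state.
#[derive(Debug, Default)]
struct DirectCall {
    caller_local_addrs: Vec<String>,
    callee_local_addrs: Vec<String>,
}

/// Hypothetical registry keyed by call id.
#[derive(Default)]
struct CallRegistry {
    calls: HashMap<String, DirectCall>,
}

impl CallRegistry {
    /// Stores the caller's LAN candidates (from DirectCallOffer).
    /// Returns false if the call id is unknown.
    fn set_caller_local_addrs(&mut self, call_id: &str, addrs: Vec<String>) -> bool {
        match self.calls.get_mut(call_id) {
            Some(call) => {
                call.caller_local_addrs = addrs;
                true
            }
            None => false,
        }
    }

    /// Stores the callee's LAN candidates (from DirectCallAnswer).
    /// Returns false if the call id is unknown.
    fn set_callee_local_addrs(&mut self, call_id: &str, addrs: Vec<String>) -> bool {
        match self.calls.get_mut(call_id) {
            Some(call) => {
                call.callee_local_addrs = addrs;
                true
            }
            None => false,
        }
    }
}
```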
### Relay cross-wiring (wzp-relay/main.rs)
Both the local-call and cross-relay-federation paths now track
the local_addrs through the registry and inject them into the
CallSetup's peer_local_addrs. Cross-wiring is identical to the
existing peer_direct_addr logic — each party's CallSetup
carries the OTHER party's LAN candidates.
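The cross-wiring rule is simple to get backwards, so here is a hedged sketch of it. Both structs are hypothetical and trimmed to the one field this rule touches; the real `CallSetup` also carries `peer_direct_addr`. The only point shown is that the caller's setup is filled from the callee's candidates and vice versa.

```rust
/// Hypothetical, trimmed-down registry entry.
struct DirectCall {
    caller_local_addrs: Vec<String>,
    callee_local_addrs: Vec<String>,
}

/// Hypothetical CallSetup with only the Phase 5.5 field.
#[derive(Debug, PartialEq)]
struct CallSetup {
    peer_local_addrs: Vec<String>,
}

/// Returns (setup sent to the caller, setup sent to the callee).
/// Each side receives the OTHER party's LAN candidates.
fn cross_wire(call: &DirectCall) -> (CallSetup, CallSetup) {
    let to_caller = CallSetup {
        peer_local_addrs: call.callee_local_addrs.clone(),
    };
    let to_callee = CallSetup {
        peer_local_addrs: call.caller_local_addrs.clone(),
    };
    (to_caller, to_callee)
}
```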
### Client side (desktop/src-tauri/lib.rs)
- `place_call`: gathers local host candidates via
`local_host_candidates(signal_endpoint.local_addr().port())`
and includes them in `DirectCallOffer.caller_local_addrs`.
The port match is critical — it's the Phase 5 shared signal
socket, so incoming dials to these addrs land on the same
endpoint that's already listening.
- `answer_call`: same, but only in AcceptTrusted mode (privacy
mode withholds LAN addrs too, for consistency with the reflex addr).
- `connect` Tauri command: new `peer_local_addrs: Vec<String>`
arg. Builds a `PeerCandidates` bundle and passes it to the
dual-path race.
- Recv loop's CallSetup handler: destructures + forwards the
new field to JS via the signal-event payload.
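Candidates cross the JS boundary as strings, so the `connect` command has to parse them before the race. The sketch below shows one hypothetical way to build the bundle. It assumes `PeerCandidates` holds `Option<SocketAddr>` and `Vec<SocketAddr>` (the field names match the integration tests, but the types are a guess) and that unparseable entries are silently dropped. The real handler may log or reject them instead, and `build_peer_candidates` is an illustrative name.

```rust
use std::net::SocketAddr;

/// Assumed shape; field names match the integration tests.
#[derive(Debug, Clone, PartialEq)]
struct PeerCandidates {
    reflexive: Option<SocketAddr>,
    local: Vec<SocketAddr>,
}

/// Hypothetical helper: turn the CallSetup strings into a
/// PeerCandidates bundle, dropping anything that doesn't parse.
fn build_peer_candidates(
    peer_direct_addr: Option<&str>,
    peer_local_addrs: &[String],
) -> PeerCandidates {
    PeerCandidates {
        reflexive: peer_direct_addr.and_then(|s| s.parse().ok()),
        local: peer_local_addrs
            .iter()
            .filter_map(|s| s.parse().ok())
            .collect(),
    }
}
```

Because an old relay sends no `peer_local_addrs` (JS falls back to `[]`), the bundle degrades to the reflex-only case.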
### `dual_path::race` (wzp-client/dual_path.rs)
Signature change: takes `PeerCandidates` (reflex + local Vec)
instead of a single SocketAddr. The D-role branch now fans out
N parallel dials via `tokio::task::JoinSet` — one per candidate
— and the first successful dial wins (losers are aborted
immediately via `set.abort_all()`). Only when ALL candidates
have failed do we return Err; individual candidate failures are
just traced at debug level and the race waits for the others.
LAN host candidates are tried BEFORE the reflex addr in
`PeerCandidates::dial_order()` — they're faster when they work,
and the reflex addr is the fallback for the not-on-same-LAN
case.
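The dial order and the "first success wins, Err only when all fail" rule can be illustrated without tokio. This sketch swaps `tokio::task::JoinSet` for plain `std::thread`s and an `mpsc` channel. Unlike the real race, it cannot abort losing dials: they run to completion and their results are discarded. The field types of `PeerCandidates` and the `race_candidates` helper are assumptions for illustration.

```rust
use std::net::SocketAddr;
use std::sync::{mpsc, Arc};
use std::thread;

/// Assumed shape; field names match the integration tests.
struct PeerCandidates {
    reflexive: Option<SocketAddr>,
    local: Vec<SocketAddr>,
}

impl PeerCandidates {
    /// LAN host candidates first, reflex addr last as the off-LAN fallback.
    fn dial_order(&self) -> Vec<SocketAddr> {
        let mut order = self.local.clone();
        order.extend(self.reflexive); // Option is IntoIterator: 0 or 1 items
        order
    }
}

/// std-thread analogue of the JoinSet fan-out: one dial per candidate,
/// the first Ok wins, and Err is returned only once every candidate failed.
fn race_candidates<T, E, F>(candidates: Vec<SocketAddr>, dial: F) -> Result<(usize, T), Vec<E>>
where
    T: Send + 'static,
    E: Send + 'static,
    F: Fn(SocketAddr) -> Result<T, E> + Send + Sync + 'static,
{
    let dial = Arc::new(dial);
    let (tx, rx) = mpsc::channel();
    for (idx, addr) in candidates.into_iter().enumerate() {
        let (tx, dial) = (tx.clone(), Arc::clone(&dial));
        // The send fails harmlessly once a winner has dropped the receiver.
        thread::spawn(move || {
            let _ = tx.send((idx, dial(addr)));
        });
    }
    drop(tx); // rx.recv() now errors once every dialer has reported
    let mut errors = Vec::new();
    while let Ok((idx, outcome)) = rx.recv() {
        match outcome {
            Ok(conn) => return Ok((idx, conn)),
            Err(e) => errors.push(e), // real code: traced at debug level
        }
    }
    Err(errors)
}
```

The returned index is the candidate's position in `dial_order()`, which matches how the "candidate 0" log line reads.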
### JS side (desktop/main.ts)
`connect` invoke now passes `peerLocalAddrs: data.peer_local_addrs ?? []`
alongside the existing `peerDirectAddr`.
### Tests
All existing test callsites are updated for the new `Vec<String>`
fields, which default to `Vec::new()` in tests (they don't exercise
the multi-candidate path). The `dual_path.rs` integration tests
wrap the single `dead_peer` / `acceptor_listen_addr` in a
`PeerCandidates { reflexive: Some(_), local: Vec::new() }`.
Full workspace test: 423 passing (same as before 5.5).
## Expected behavior on the reporter's setup
Two phones behind MikroTik, both on the same LAN:
place_call:host_candidates {"local_addrs": ["192.168.88.21:XXX", "2001:...:YY:XXX"]}
recv:DirectCallAnswer {"callee_local_addrs": ["192.168.88.22:ZZZ", "2001:...:WW:ZZZ"]}
recv:CallSetup {"peer_direct_addr":"150.228.49.65:NN",
"peer_local_addrs":["192.168.88.22:ZZZ","2001:...:WW:ZZZ"]}
connect:dual_path_race_start {"peer_reflex":"...","peer_local":[...]}
dual_path: direct dial succeeded on candidate 0 ← LAN v4 wins
connect:dual_path_race_won {"path":"Direct"}
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
212 lines
8.5 KiB
Rust
//! Phase 3.5 integration tests for the dual-path QUIC race.
//!
//! The race takes a role (Acceptor or Dialer), a `PeerCandidates`
//! bundle (the peer's reflexive addr plus its LAN host candidates),
//! a relay_addr, two SNI strings, and an optional shared signal
//! endpoint, then returns whichever QUIC handshake completes first,
//! wrapped in a `QuinnTransport`. These tests validate that:
//!
//! 1. On loopback, with a Dialer-role race pointed at a live
//!    acceptor endpoint, the direct path wins (fewer hops than relay).
//! 2. When the direct peer is dead (nothing listening) but the
//!    relay is up, the relay wins within the fallback window.
//! 3. When both paths are dead, the race errors cleanly rather
//!    than hanging forever.
//!
//! The "relay" in these tests is a minimal mock that just accepts
//! incoming QUIC connections and holds them briefly; we don't need
//! any protocol handling, just a TCP-ish listen-and-accept.

use std::net::{Ipv4Addr, SocketAddr};
use std::time::Duration;

use wzp_client::dual_path::{race, PeerCandidates, WinningPath};
use wzp_client::reflect::Role;
use wzp_transport::{create_endpoint, server_config};

/// Spin up a "relay-ish" mock server on loopback that accepts
/// incoming QUIC connections and does nothing with them. Used to
/// give the relay branch of the race a real target to dial.
/// Returns the bound address + a join handle (kept alive to keep
/// the endpoint up).
async fn spawn_mock_relay() -> (SocketAddr, tokio::task::JoinHandle<()>) {
    let _ = rustls::crypto::ring::default_provider().install_default();
    let (sc, _cert_der) = server_config();
    let bind: SocketAddr = (Ipv4Addr::LOCALHOST, 0).into();
    let ep = create_endpoint(bind, Some(sc)).expect("relay endpoint");
    let addr = ep.local_addr().expect("local_addr");

    let handle = tokio::spawn(async move {
        // Accept loop: hold each connection alive for a short
        // while so the race result isn't killed by the peer
        // closing before the winning transport is returned.
        while let Some(incoming) = ep.accept().await {
            if let Ok(_conn) = incoming.await {
                tokio::time::sleep(Duration::from_secs(5)).await;
            }
        }
    });
    (addr, handle)
}

// -----------------------------------------------------------------------
// Test 1: direct path wins when both sides are up
// -----------------------------------------------------------------------
//
// Spawn a mock relay, then point a Dialer-role race at a hand-built
// acceptor endpoint. The Dialer's reflexive candidate is the
// acceptor's listen address. Because the direct path is a single
// loopback hop and the relay dial also terminates on loopback, both
// complete essentially instantly, and the `biased` tokio::select in
// race() should pick direct.

#[tokio::test(flavor = "multi_thread", worker_threads = 4)]
async fn dual_path_direct_wins_on_loopback() {
    let _ = rustls::crypto::ring::default_provider().install_default();
    let (relay_addr, _relay_handle) = spawn_mock_relay().await;

    // Running race() on both sides would require the Acceptor's
    // listen addr, which only exists once race() has created its
    // endpoint: a chicken-and-egg problem unless we reach into
    // race()'s internals. Instead, create the Acceptor's endpoint
    // ourselves (outside race()) to learn its addr, run an accept
    // loop on it, and pass THAT addr as the Dialer's peer candidate.
    // This tests the Dialer->Acceptor handshake end-to-end without
    // running the full race() on both sides.

    let (sc, _cert_der) = server_config();
    let acceptor_bind: SocketAddr = (Ipv4Addr::LOCALHOST, 0).into();
    let acceptor_ep = create_endpoint(acceptor_bind, Some(sc)).expect("acceptor ep");
    let acceptor_listen_addr = acceptor_ep.local_addr().expect("acceptor addr");

    // Keep the external acceptor alive until the test finishes by
    // moving it into a dedicated accept task.
    let acceptor_accept_task = tokio::spawn(async move {
        // Accept one connection and hold it for a while so the
        // Dialer side can complete its QUIC handshake.
        if let Some(incoming) = acceptor_ep.accept().await {
            if let Ok(_conn) = incoming.await {
                tokio::time::sleep(Duration::from_secs(5)).await;
            }
        }
    });

    // Now run the Dialer in the race, with the acceptor's listen addr
    // as the peer's reflexive candidate. The relay is the mock from
    // above. The direct path should win.
    let result = race(
        Role::Dialer,
        PeerCandidates {
            reflexive: Some(acceptor_listen_addr),
            local: Vec::new(),
        },
        relay_addr,
        "test-room".into(),
        "call-test".into(),
        None, // Phase 5: tests use fresh endpoints (no shared signal)
    )
    .await
    .expect("race must succeed");

    assert_eq!(result.1, WinningPath::Direct, "direct should win on loopback");

    // Cancel the acceptor accept task so the test finishes.
    acceptor_accept_task.abort();
}

// -----------------------------------------------------------------------
// Test 2: relay wins when the direct peer is dead
// -----------------------------------------------------------------------
//
// Dialer role, peer reflexive candidate = a port nothing is listening
// on, relay is the working mock. The direct dial will sit waiting for
// a QUIC handshake that never comes; the 2s direct timeout kicks in
// and the relay path wins the fallback.

#[tokio::test(flavor = "multi_thread", worker_threads = 4)]
async fn dual_path_relay_wins_when_direct_is_dead() {
    let _ = rustls::crypto::ring::default_provider().install_default();
    let (relay_addr, _relay_handle) = spawn_mock_relay().await;

    // A port that nothing is listening on: a dead direct target.
    // Port 1 on loopback is almost never bound and UDP packets to
    // it will be dropped silently, so the QUIC handshake times out.
    let dead_peer: SocketAddr = "127.0.0.1:1".parse().unwrap();

    let result = race(
        Role::Dialer,
        PeerCandidates {
            reflexive: Some(dead_peer),
            local: Vec::new(),
        },
        relay_addr,
        "test-room".into(),
        "call-test".into(),
        None, // Phase 5: tests use fresh endpoints (no shared signal)
    )
    .await
    .expect("race must succeed via relay fallback");

    assert_eq!(
        result.1,
        WinningPath::Relay,
        "relay should win when direct dial has nowhere to land"
    );
}

// -----------------------------------------------------------------------
// Test 3: race errors cleanly when both paths are dead
// -----------------------------------------------------------------------
//
// Dialer role, peer reflexive candidate = dead, relay_addr = dead.
// Expected: race returns an Err within ~7s (2s direct timeout +
// 5s relay timeout fallback).

#[tokio::test(flavor = "multi_thread", worker_threads = 4)]
async fn dual_path_errors_cleanly_when_both_paths_dead() {
    let _ = rustls::crypto::ring::default_provider().install_default();

    let dead_peer: SocketAddr = "127.0.0.1:1".parse().unwrap();
    let dead_relay: SocketAddr = "127.0.0.1:2".parse().unwrap();

    let start = std::time::Instant::now();
    let result = race(
        Role::Dialer,
        PeerCandidates {
            reflexive: Some(dead_peer),
            local: Vec::new(),
        },
        dead_relay,
        "test-room".into(),
        "call-test".into(),
        None, // Phase 5: tests use fresh endpoints (no shared signal)
    )
    .await;
    let elapsed = start.elapsed();

    assert!(result.is_err(), "both-dead must return Err");
    // Upper bound: direct 2s timeout + relay 5s fallback + small
    // slack for scheduling. If this blows, something is looping.
    assert!(
        elapsed < Duration::from_secs(10),
        "race took too long to give up: {:?}",
        elapsed
    );
}